Help

You can read this document from start to finish, or just call it up as needed. Throughout IMAGEne, wherever help is available, there is a bold link that takes you to the relevant section of this document.

What is IMAGE?
What is IMAGEne?
User Interface
  Search
    Build Information
      Known Genes
        Full
        Predicted Full
        Unknown
        Partial
        Empties
      Candidate Genes
        Multi-Member
        Singletons
    Search By
      Gene Name / Keyword
      IMAGE Clone ID
      GB Accession #
      Sequence
      Cluster ID
    Species
    Minimum Blast2 Score
  Search Results
    Cluster ID
    Description
    Fulls
    Predicted Fulls
    Unknowns
    Partials
  Display
    Ordered Clone List Frame
      GenBank
      Coverage
      Library
      Vector
      Tissue
      Length
      Sequence Verified By
      Other Clusters
    Alignments Frame
      Show Alignments
      Contig to Display

What is IMAGE?

IMAGE stands for Integrated Molecular Analysis of Genomes and their Expression. The I.M.A.G.E. Consortium was founded in 1993 to accelerate gene discovery through the use of arrayed cDNA libraries, and to aid in the accumulation of sequence, map, and expression information for all genes. One of the initial goals of the Consortium was to create a non-redundant set of unique genes representing the complete set of human transcripts, and to provide this resource to the research community as a basis for the analysis of the human genome. Recently, the Consortium has begun to focus on the genomes of model organisms, such as mouse and zebrafish, to complement the work being done with human clones. In addition, the I.M.A.G.E. Consortium is a part of the NCI Cancer Genome Anatomy Project, in which cDNA clones derived from tumor libraries will be used to study gene expression patterns in tumors, and of the NIH Mammalian Gene Collection, which focuses on obtaining full-length cDNA clones.

All clones are available from any of our authorized distributors, and all sequence obtained from the clones is submitted immediately to GenBank.

More information is available through our web page at http://image.llnl.gov

What is IMAGEne?

IMAGEne is a software package for clustering IMAGE clones/ESTs to known genes, and to each other. It is a useful tool to aid in the re-array of IMAGE clones for public distribution. The publicly accessible web interface that by now you've seen has a dual purpose.

User Interface

IMAGEne's user interface is broken up into three main pages: Search, Search Results and Display.

Search

This page is where you can enter search parameters to find one or more IMAGEne clusters.

Build Information

IMAGEne's data is rebuilt periodically. This area provides summary information about the current build. IMAGEne's data set is divided into two categories: Known Genes and Candidate Genes.

Known Genes

Known gene clusters are groups of Clones / ESTs that are homologous to the best known representations of known gene sequences: NCBI Reference Sequences. All known gene clusters fall into five categories, and a count is provided for each.

Full - A known gene cluster that has one or more full-coding clones in it.
Predicted Full - Predicted fulls have been computationally predicted by NCBI for the MGC (Mammalian Gene Collection) project to contain the entire ORF based on the 5' EST only. In most cases the 3' EST from that same clone has not been determined but is assumed to contain the complete 3' end of the gene.
Unknown - Unknowns describe 1) those clones for which it is not known whether the clone represents the entire ORF for that gene (i.e., if only one EST has been determined and it covers only one side of the coding region), and 2) any clone in a cluster that does not correspond to a known gene (since the transcript size is not known for novel genes).
Partial - Partials describe those clones for which there is evidence that the clone does not represent the entire ORF for that gene (either the 5' or 3' EST does not cover the coding region).
Empties - A known gene cluster that does not contain any clones.

Candidate Genes

Candidate gene clusters are gene clusters that are derived from EST and full-insert sequences, but do not correspond to any members of the NCBI Reference Sequence set.

Multi-Member - A candidate gene cluster containing sequences from more than one clone.
Singletons - A singleton is an IMAGE clone whose ESTs do not cluster with any other clone, and contain a minimum of 50 consecutive base pairs of non-repetitive sequence (as determined by RepeatMasker).
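
The 50 base pair requirement for singletons can be illustrated with a minimal sketch. It assumes the sequence has already been run through RepeatMasker and that repetitive bases have been hard-masked (for example as N or X characters); the function names and the way the threshold is applied are illustrative assumptions, not IMAGEne's actual code.

```python
import re

# Threshold taken from the singleton definition above.
MIN_UNMASKED_RUN = 50


def longest_unmasked_run(masked_seq):
    """Length of the longest run of uppercase A/C/G/T in a masked sequence.

    Assumption: any other character (N, X, lowercase) marks a masked
    or ambiguous base and therefore ends a run.
    """
    runs = re.findall(r"[ACGT]+", masked_seq)
    return max((len(run) for run in runs), default=0)


def passes_singleton_filter(masked_seq):
    """True if the sequence has at least 50 consecutive unmasked bases."""
    return longest_unmasked_run(masked_seq) >= MIN_UNMASKED_RUN
```

A sequence whose longest unmasked stretch is 49 bases on either side of a masked base would fail this check, even though it contains 98 unmasked bases in total.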

Search By

IMAGEne's data set can be searched using any one of five distinct methods. (Note: the search tool is case-sensitive.)

Gene Name/Keyword

This method allows queries by one or more keywords, logically ANDed together. Keywords may be portions of the gene's GenBank accession number, its proper name, its abbreviated name, or related words. None of these is guaranteed to exist, but in most cases some or all of them are available.

Ex. Query on the phrase "card". A table of results is returned, with matches on words such as cardiac and cardiotrophin.

If a query produces too many results, you may wish to refine your search. This can be done in two ways: additional letters, or additional words. Searching on "cardio" returns fewer matches than "card". When all relevant letters have been specified, it may be necessary to use additional words in the query. To illustrate, search for "cardiac protein". Only those entries containing both words are returned.

This query can also be used when you know the GenBank accession number of a gene. Searching on "NM_000206" returns the cluster pertaining to that sequence. Because partial matches are allowed, you can also search a range of accession numbers by entering a prefix. For example, searching on "NM_00020" returns all the genes with GenBank accession numbers in the range NM_000200 to NM_000209.
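
The keyword behaviour described above (case-sensitive partial matches, with all keywords required) can be sketched as follows. This is only an illustration of the matching rule, not IMAGEne's implementation; the function name and the sample entries are made up for the example.

```python
def keyword_search(entries, query):
    """Return IDs of entries whose text contains every keyword in the query.

    Matching is a case-sensitive substring test, and the keywords are
    ANDed together. `entries` maps an ID to its searchable text.
    """
    keywords = query.split()
    if not keywords:
        return []
    return [entry_id for entry_id, text in entries.items()
            if all(keyword in text for keyword in keywords)]


# Hypothetical entries, invented for illustration only.
sample_entries = {
    "NM_000200": "NM_000200 cardiac protein example",
    "NM_000206": "NM_000206 cardiotrophin example",
    "NM_000310": "NM_000310 kinase example",
}

print(keyword_search(sample_entries, "card"))
print(keyword_search(sample_entries, "cardiac protein"))
print(keyword_search(sample_entries, "NM_00020"))
```

With these sample entries, "card" matches the first two, "cardiac protein" matches only the first, and the prefix "NM_00020" matches both accession numbers that begin with it. A query of "Card" matches nothing, because the comparison is case-sensitive.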

IMAGE Clone ID

This is a much simpler query. Provide one or more IMAGE Clone IDs, and a complete list of the known or candidate gene clusters containing any of those clones is returned. Substring matches are not permitted.

Ex. Query on "510700 123456".

GB Accession #

This method is identical to searching by clone except that here the query is performed against the GenBank accession number(s) of the ESTs or the GenBank accession (of the RefSeq entry) for the known gene.

Ex. Query on "R91111 AA132727".

Sequence

When no identifier is available for reference, searching can be done by sequence. Nucleotide sequence can be pasted into the query window, and compared (by blastn v2.0.8) to the known genes.

Matches are returned sorted by the quality of the match.

The number of matches can be controlled by changing the Minimum Blast2 Score.

Cluster ID

This method allows searching by one or more known gene or candidate gene cluster IDs.

Species

This option allows a search to be conducted against all species for which IMAGEne has clusters, or against just a particular species of interest.

Minimum Blast2 Score

This parameter is only significant when searching by sequence. For all other searches it can be safely ignored. When it is set below Blast2's default, it is ignored and all possible matches are displayed. Setting it too high will cause no matches to be returned. What counts as a 'reasonable' setting varies with the particular sequences you are using. If you are having difficulty, try setting this to 0 and then increasing it if you find more matches than you expect.
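
The effect of a minimum score on a sequence search can be sketched as a simple filter followed by a sort. The sketch assumes each match is a (cluster ID, score) pair and that "quality of the match" means the score; it does not model the special case in which a value below Blast2's default is ignored. The function name and sample values are hypothetical.

```python
def filter_matches(matches, min_score=0):
    """Keep matches scoring at least min_score, best score first.

    `matches` is a list of (cluster_id, score) pairs.
    """
    kept = [match for match in matches if match[1] >= min_score]
    return sorted(kept, key=lambda match: match[1], reverse=True)


# Hypothetical matches, invented for illustration only.
sample_matches = [("cluster_a", 120.0), ("cluster_b", 45.0), ("cluster_c", 300.0)]

print(filter_matches(sample_matches, min_score=0))
print(filter_matches(sample_matches, min_score=100))
print(filter_matches(sample_matches, min_score=1000))
```

A threshold of 0 keeps every match, a moderate threshold trims the weakest ones, and a threshold above the best score returns an empty list.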

Search Results

Cluster ID

Each row in this table is the entry for the cluster of a particular gene. The button links to the display page for that cluster. Note that Candidate Gene clusters do NOT have consistent IDs from build to build. These clusters are dynamic: the consensus sequence for a cluster can change in subsequent builds as new clones enter it.

Description

This description is the reduced description for a gene. Common phrases such as "gene", "human" and "complete cds" have been removed. This should help avoid false matches.

Fulls

Fulls displays the number of clones which cover the entire coding segment. If a cluster has any fulls, it is considered a full cluster.

Predicted Fulls

Predicted Fulls displays the number of clones that have been computationally predicted by the NCBI for the MGC (Mammalian Gene Collection) project to contain the entire ORF based on the 5' EST only. In most cases the 3' EST from that same clone has not been determined but is assumed to contain the complete 3' end of the gene.

Unknowns

Unknowns displays the number of clones 1) for which it is not known whether the clone represents the entire ORF for that gene (i.e., if only one EST has been determined and it covers only one side of the coding region), and 2) that do not correspond to a known gene (since the transcript size is not known for novel genes).

Partials

Partials displays the number of clones which do not cover the entire coding segment, or have unknown coverage. Unknown coverage is usually due to only having a single EST from a clone. When a cluster has no fulls but does have partials, it is considered a partial cluster.

If all of these categories display 0, then no clones have yet been found which cluster with this gene. The cluster is considered to be an empty cluster.
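
The rules stated in this section for labelling a cluster from its four counts can be summarised in a short sketch. It encodes only what is stated here: any fulls make a full cluster, no fulls but some partials make a partial cluster, and all zeros make an empty cluster. The function name and the "not stated" label are illustrative assumptions; this section does not say how a cluster with only predicted fulls or unknowns is labelled.

```python
def classify_cluster(fulls, predicted_fulls, unknowns, partials):
    """Label a cluster from its clone counts, using only the stated rules."""
    if fulls > 0:
        return "full"
    if partials > 0:
        return "partial"
    if predicted_fulls == 0 and unknowns == 0:
        return "empty"
    # Only predicted fulls and/or unknowns are present: this section
    # does not state which label applies.
    return "not stated"


print(classify_cluster(2, 0, 1, 3))
print(classify_cluster(0, 1, 0, 4))
print(classify_cluster(0, 0, 0, 0))
```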

Display

Ordered Clone List Frame

GenBank

In this column are gene and clone identifiers with links to NCBI's Entrez browser. Above is a button which will bring up descriptions of the gene. Below are links to the dbEST entries for clones. These links are intended to provide a jumping-off point when information outside the scope of IMAGEne is necessary.

Sometimes an asterisk (*) will appear after a GenBank accession number. This indicates that the EST referenced by that accession number is derived from the corresponding clone, but does not appear in this cluster. This can happen for many reasons, both biological (e.g., alternative splicing) and computational (e.g., low-quality sequence data). An example is a clone containing two ESTs where one EST passed the clustering criteria established in the IMAGEne algorithm but the other one failed. The aligned sequence for this cluster would be assembled from only the qualifying EST, and the clone would be noted as a partial-coding clone with a length greater than or equal to the length of that EST.

Coverage

Coverage is the classification of a clone relative to the gene from which it is believed to be derived. If a clone appears to cover the entire coding segment at both the 5' and 3' ends, it is considered full-coding; if not, it is partial. By definition, if a clone is in a Candidate Gene cluster, its coverage is unknown, because the mRNA length is not known.

Library

Library indicates the source that a clone was taken from. Detailed information on each library can be found at our resources page or by using our library query tool. The relevant Library information on our resources page can also be reached directly by simply clicking on the Library name on the display page.

Vector

Vector indicates the cloning vector used for this library/clone. Detailed information on each vector, including full sequences and maps when available, can be found at our vector page.

Tissue

Tissue indicates the generic tissue type from which the clone was derived. It does NOT indicate whether the tissue was normal or abnormal, nor does it include information about specific sub-tissue(s) or the tissue source. These details and more can be found at our resources page or by using our library query tool.

Length

When a clone has ESTs from the 5' and 3' ends which both match well to the same known gene, the length of the clone can be easily and accurately determined. When only one EST from a clone has been determined, or only one is included in a cluster, a greater-than symbol (>) appears in front of the length, indicating that this clone is at least as long as the EST (but is most probably longer).
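
As a small illustration of the display rule above, the following sketch formats a length with or without the greater-than prefix. The function and parameter names are hypothetical, and the way IMAGEne actually computes the length is not covered here.

```python
def format_length(length, both_ends_in_cluster):
    """Format a clone length for display.

    If both the 5' and 3' ESTs are in the cluster, the length is shown
    as-is. Otherwise it is a lower bound and is prefixed with '>'.
    """
    return str(length) if both_ends_in_cluster else f">{length}"


print(format_length(1500, True))
print(format_length(420, False))
```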

Sequence Verified By

If this clone's sequence, as it appears in GenBank, has been verified, the name of the group who has reported the verification appears here as a link to either the group's homepage or to a page containing any additional information.

Other Clusters

Normally, a clone can only have been derived from a single gene. Even so, it is sometimes ambiguous just which gene that is.

For this reason, a count of the number of other clusters a clone belongs to, or could belong to, is provided. Usually this will be 0, as it should be. However, when a clone is found in two or more clusters, the count is displayed as a link. This link searches by clone and returns all of the genes the clone belongs to. Any of these might be the true origin of the clone. "Other clusters" links most often occur with gene family members or other closely related genes.

Alignments Frame

This is where you can view the alignments of the clones or sequences to the known gene or consensus sequences.

Show Alignments

This pulldown menu item controls the alignments applet in the bottom frame of the display. The applet can display merged alignments for clones in the cluster or just the alignments for each sequence.

  • by Clone - IMAGEne will load the applet and display matches with one clone per line. The top line always contains the cluster_id and the alignment of the known gene or consensus sequence. Each subsequent line contains the alignment for one clone: the clone_id, followed by a colon, followed by the coverage, followed by the alignment. If a clone has multiple sequences that clustered, all of those sequences are condensed into one line and shown in the alignment.
  • by EST - IMAGEne will load the applet and display matches with one sequence per line. Since this data cannot be broken down any further, what you see is just the sequence as it matches the known gene or the consensus sequence.

Contig to Display

This pulldown menu item appears only when viewing candidate gene clusters that contain more than one contig. It controls which contig of the current Candidate Gene cluster is displayed. Multiple contigs can arise when sequences are clustered together during the build: two groups of sequences might be pulled together by a single clone whose two sequences come from opposite ends of the clone, with no representative sequence for the middle of the clone.







© Copyright 1997 All Rights Reserved
LLNL Disclaimer
UCRL-MI-119848
Web page maintained by
imagene@image.llnl.gov