You can read this document in the traditional manner, or just call it up as needed. Throughout IMAGEne whenever help is available (and hopefully whenever needed) there will be a Bold Link, which will take you to the relevant section.
Integrated Molecular Analysis of Genomes and their Expression. The I.M.A.G.E. Consortium was founded in 1993 to accelerate gene discovery through the use of arrayed cDNA libraries, and to aid in the accumulation of sequence, map, and expression information for all genes. One of the initial goals of the Consortium was to create a non-redundant set of unique genes representing the complete set of human transcripts, and to provide this resource to the research community as a basis for the analysis of the human genome. Recently, the Consortium has begun to focus on the genomes of model organisms, such as mouse and zebrafish, to complement the work being done with human clones. In addition, the I.M.A.G.E. Consortium is a part of the NCI Cancer Genome Anatomy Project, in which cDNA clones derived from tumor libraries will be used to study gene expression patterns in tumors, and of the NIH Mammalian Gene Collection, which focuses on obtaining full-length cDNA clones.
All clones are available from any of our authorized distributors, and all sequence obtained from the clones is submitted immediately to GenBank. More information is available through our web page at http://image.llnl.gov
IMAGEne is a software package for clustering IMAGE clones/ESTs to known genes, and to each other. It is a useful tool to aid in the re-array of IMAGE clones for public distribution. The publicly accessible web interface that by now you've seen has a dual purpose.
Known gene clusters are groups of Clones / ESTs that are homologous to the best known representations of known gene sequences: NCBI Reference Sequences. All known gene clusters fall into five categories, and a count is provided for each.
Candidate gene clusters are gene clusters that are derived from EST and full-insert sequences, but do not correspond to any members of the NCBI Reference Sequence set.
IMAGEne's data set can be searched using any one of five distinct methods. (Note: the search tool is case sensitive.)
A keyword search allows queries by one or more keywords, logically ANDed together. Keywords may be portions of the gene's GenBank accession number, its proper name, abbreviated name, or related words. None of these is guaranteed to exist for a given gene, but in most cases some or all of them are available.
Ex. Query on the phrase "card". A table of results will be returned. Matches were found on the words cardiac and cardiotrophin.
If a query produces too many results you may wish to refine your search. This can be done in two ways: additional letters, or additional words. Searching on "cardio" returns fewer matches than searching on "card". When all relevant letters have been specified, it may be necessary to use additional words in the query. To illustrate, search for "cardiac protein". Only those entries containing both words are returned.
This query can also be used when you know the GenBank accession number of a gene. Searching on "NM_000206" will return the cluster pertaining to that sequence. You can also search a range of accession numbers by entering only the leading portion of the number. For example, searching on "NM_00020" will return all the genes with GenBank IDs in the range NM_000200-NM_000209.
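The keyword matching described above can be sketched in a few lines of Python. This is only an illustration, assuming that a keyword matches when it occurs as a case-sensitive substring of a cluster's accession number or description; the `Cluster` class, the function names, and the toy descriptions are hypothetical and are not part of IMAGEne.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    accession: str    # GenBank accession of the known gene
    description: str  # reduced gene description (toy values below)


def matches(cluster, query):
    """True when every whitespace-separated keyword is a substring (AND logic)."""
    text = f"{cluster.accession} {cluster.description}"
    # Plain `in` is case sensitive, like the search tool described above.
    return all(word in text for word in query.split())


def search(clusters, query):
    """Return the clusters that match all keywords in the query."""
    return [c for c in clusters if matches(c, query)]


# Toy data for illustration only; the descriptions are invented.
clusters = [
    Cluster("NM_000206", "cardiac protein alpha"),
    Cluster("NM_000207", "cardiotrophin 1"),
    Cluster("NM_000310", "kinase beta"),
]

for query in ("card", "cardiac protein", "NM_00020", "Card"):
    hits = [c.accession for c in search(clusters, query)]
    print(f"{query!r}: {hits}")
```

In this sketch a partial accession such as "NM_00020" behaves like a range simply because it is a prefix shared by NM_000200 through NM_000209.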
This is a much simpler query. Provide one or more IMAGE Clone IDs, and the known or candidate gene clusters containing any of these clones are returned. Substring matches are not permitted.
Ex. Query on "510700 123456".
Ex. Query on "R91111 AA132727".
Matches are returned sorted by the quality of the match.
The number of matches can be controlled by changing the Minimum Blast2 Score.
This method allows searching by one or more known gene or candidate gene cluster IDs.
Allows a search to be conducted against all species for which IMAGEne has clusters, or against just a particular species of interest.
This parameter is only significant when doing a search by sequence. At all other times it can be safely ignored. When it has been set below Blast2's default it will be ignored and all possible matches will be displayed. Setting it too high will cause no matches to be returned. What counts as a 'reasonable' setting varies with the particular sequences you are using. If you are having difficulty, try setting this to 0 and increase it if you are finding more matches than you expect.
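The effect of the Minimum Blast2 Score can be pictured as a simple threshold applied to the list of hits before they are sorted. The following is a minimal sketch, assuming each hit is a `(cluster_id, score)` pair; the function name and the sample scores are hypothetical, and the sketch does not reproduce how IMAGEne or Blast2 compute scores.

```python
def filter_hits(hits, min_score=0):
    """Keep hits scoring at least min_score, sorted best match first."""
    kept = [hit for hit in hits if hit[1] >= min_score]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Invented scores for illustration only.
hits = [("cluster_1", 120.0), ("cluster_2", 45.5), ("cluster_3", 300.0)]

print(filter_hits(hits, 0))     # every hit, best first
print(filter_hits(hits, 100))   # only the stronger hits
print(filter_hits(hits, 1000))  # threshold too high: nothing is returned
```

Starting at 0 and raising the threshold step by step mirrors the advice above: begin with every match and tighten only when there are too many.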
Each row in this table is the entry for the cluster of a particular gene. The button links to the display. Note that Candidate Gene clusters do NOT have consistent IDs from build to build. This is due to the nature of the clusters being dynamic and the fact that the consensus sequence for a cluster could change in subsequent builds due to the entrance of new clones.
This description is the reduced description for a gene. Common phrases such as "gene", "human" and "complete cds" have been removed. This should help avoid false matches.
Fulls displays the number of clones which cover the entire coding segment. If a cluster has any fulls, it is considered a full cluster.
Predicted Fulls displays the number of clones that have been computationally predicted by the NCBI for the MGC (Mammalian Gene Collection) project to contain the entire ORF based on the 5' EST only. In most cases the 3' EST from that same clone has not been determined but is assumed to contain the complete 3' end of the gene.
Unknowns displays the number of clones 1) for which it is not known whether the clone represents the entire ORF for that gene (i.e., if only one EST has been determined and it covers only one side of the coding region), and 2) that do not correspond to a known gene (since the transcript size is not known for novel genes).
Partials displays the number of clones which do not cover the entire coding segment, or have unknown coverage. Unknown coverage is usually due to only having a single EST from a clone. When a cluster has no fulls but does have partials, it is considered a partial cluster.
If all of these categories display 0, then no clones have yet been found which cluster with this gene. The cluster is considered to be an empty cluster.
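The rules above for labelling a cluster from its four counts can be summarised in a short Python sketch. The function name is hypothetical, and the "unclassified" result is an assumption: the text does not say how a cluster with only predicted fulls or unknowns is labelled.

```python
def classify_cluster(fulls, predicted_fulls, unknowns, partials):
    """Label a cluster from its clone counts, following the rules in the text."""
    if fulls > 0:
        return "full"      # any full-coding clone makes a full cluster
    if partials > 0:
        return "partial"   # no fulls, but at least one partial
    if predicted_fulls == 0 and unknowns == 0:
        return "empty"     # all four counts are 0
    # Assumption: the help text does not name this case.
    return "unclassified"


print(classify_cluster(2, 0, 1, 5))  # full
print(classify_cluster(0, 0, 0, 3))  # partial
print(classify_cluster(0, 0, 0, 0))  # empty
```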
In this column are gene and clone identifiers with links to NCBI's Entrez browser. Above is a button which will bring up descriptions of the gene. Below are links to the dbEST entries for clones. These links are intended to provide a jumping-off point when information outside the scope of IMAGEne is necessary. Note that sometimes an asterisk (*) will appear after a GenBank accession number. This indicates that the EST referenced by that accession number is derived from the corresponding clone, but does not appear in this cluster. This can happen for many reasons, both biological (e.g., alternate splicing) and computational (e.g., low-quality sequence data). An example is a clone containing two ESTs where one EST passed the clustering criteria established in the IMAGEne algorithm but the other one failed. The aligned sequence for this cluster would be assembled from only the qualifying EST, and the clone would be noted as a partial-coding clone with a length greater than or equal to the length of that EST.
Coverage is the classification of a clone to the gene it is believed to be derived from. If a clone seems to cover the entire coding segment on both 5' and 3' ends, it is considered full-coding; if not, it is partial. By definition if a clone is in a Candidate Gene cluster, then its coverage is unknown, because the mRNA length is not known.
Library indicates the source that a clone was taken from. Detailed information on each library can be found at our resources page or by using our library query tool. The relevant Library information on our resources page can also be reached directly by simply clicking on the Library name on the display page.
Vector indicates the cloning vector used for this library/clone. Detailed information on each vector, including full sequences and maps when available, can be found at our vector page.
Tissue indicates the generic tissue type from which the clone was derived. It does NOT indicate normal vs. abnormal, nor does it include information about specific sub-tissue(s) or tissue source. These details and more can be found at our resources page or by using our library query tool.
When a clone has ESTs from the 5' and 3' ends which both match well to the same known gene, the length of the clone can be easily and accurately determined. When only one EST from a clone has been determined, or is included in a cluster, a greater-than symbol (>) appears in front of the length, indicating that this clone is at least as long as the length of the EST (but is most probably longer).
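As a small illustration of the display convention just described, the sketch below prefixes the length with a greater-than symbol when only one EST supports it. The function and parameter names are hypothetical.

```python
def format_length(length, both_ends_matched):
    """Show an exact length, or a lower bound marked with '>'."""
    if both_ends_matched:
        return str(length)  # 5' and 3' ESTs both matched the gene
    return f">{length}"     # only one EST: the clone is at least this long


print(format_length(1850, True))
print(format_length(420, False))
```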
If this clone's sequence, as it appears in GenBank, has been verified, the name of the group who has reported the verification appears here as a link to either the group's homepage or to a page containing any additional information.
Normally, a clone could only have been derived from a single gene. Even so it is sometimes ambiguous just which gene it might be.
For this reason a count of the number of other clusters a clone belongs to, or could belong to, is provided. Usually this will be 0, as it should be. However when a clone is found in two or more clusters it will be displayed as a link. This link will search by clone, and return all of the genes it belongs to. Any of these might be the true origin of the clone. "Other clusters" links most often occur with gene family members or other closely related genes.
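One way to picture the "other clusters" count is to invert a cluster-to-clones mapping and count, for each clone, the clusters it appears in beyond the first. This is a minimal sketch under that assumption; the mapping layout, function name, and cluster names are hypothetical and do not describe IMAGEne's internal data structures.

```python
from collections import defaultdict


def other_cluster_counts(clusters):
    """Map each clone ID to the number of additional clusters containing it.

    `clusters` maps a cluster ID to an iterable of clone IDs.
    """
    membership = defaultdict(set)
    for cluster_id, clones in clusters.items():
        for clone in clones:
            membership[clone].add(cluster_id)
    # A clone found in exactly one cluster has 0 other clusters.
    return {clone: len(ids) - 1 for clone, ids in membership.items()}


# Invented memberships for illustration only.
example = {
    "gene_A": ["510700", "123456"],
    "gene_B": ["510700"],
}
print(other_cluster_counts(example))
```

A clone with a non-zero count here corresponds to the case where the display shows a link rather than a plain 0.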
This pulldown menu item controls the alignments applet in the bottom frame of the display. The applet can display merged alignments for clones in the cluster or just the alignments for each sequence.
This pulldown menu item appears only when viewing Candidate Gene clusters with more than one contig in the cluster. It controls which contig of the current Candidate Gene cluster you wish to view. Multiple contigs can arise when sequences are clustered together during the build: two groups of sequences may be pulled together by a single clone whose two sequences come from opposite ends of the clone, while no representative sequence exists for the middle of the clone.
© Copyright 1997 All Rights Reserved
LLNL Disclaimer UCRL-MI-119848
Web page maintained by imagene@image.llnl.gov