Integrated Molecular Analysis of Genomes and their Expression. The I.M.A.G.E. Consortium was founded in 1993 to accelerate gene discovery through the use of arrayed cDNA libraries, and to aid in the accumulation of sequence, map, and expression information for all genes. One of the initial goals of the Consortium was to create a non-redundant set of unique genes representing the complete set of human transcripts, and to provide this resource to the research community as a basis for the analysis of the human genome. Recently, the Consortium has begun to focus on the genomes of model organisms, such as mouse and zebrafish, to complement the work being done with human clones. In addition, the I.M.A.G.E. Consortium is a part of the NCI Cancer Genome Anatomy Project, in which cDNA clones derived from tumor libraries will be used to study gene expression patterns in tumors, and of the NIH Mammalian Gene Collection, which focuses on obtaining full-length cDNA clones.
All clones are available from any of our authorized distributors, and all sequence obtained from the clones is submitted immediately to GenBank. More information is available through our web page at http://image.llnl.gov
IMAGEne is a software package for clustering IMAGE clones/ESTs to known genes, and to each other. It is a useful tool to aid in the re-array of IMAGE clones for public distribution. The publicly accessible web interface that you have seen by now serves many purposes.
We created a document expressly to address this. It can be found here.
A variety of rearrays of IMAGE cDNAs are now available through our authorized distributors. Please contact them for the most current descriptions.
The IMAGEne database is rebuilt with each major release of GenBank, which is currently every two months. For details of each release see our release notes.
Email us at imagene@image.llnl.gov to discuss your interest.
Consensus sequences can be obtained via our anonymous ftp site at image.llnl.gov. The data is located within the /image/imagene directory. Each filename carries the IMAGEne version number, so IMAGEne 3.0 data can be found in the file called consensus_seq_3.0.fasta.
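As a minimal sketch of the naming convention described above, the following Python helper builds the expected location of a consensus file from a release number. The function name and the ftp:// URL form are assumptions made for illustration; only the host, the directory, and the filename pattern come from the text above.

```python
def consensus_fasta_url(version: str) -> str:
    """Build the expected FTP location of the consensus file for a release.

    Hypothetical helper: host, directory, and filename pattern are taken
    from the FAQ text; the URL form itself is an assumption.
    """
    host = "image.llnl.gov"        # anonymous FTP host named in the FAQ
    directory = "/image/imagene"   # directory named in the FAQ
    filename = f"consensus_seq_{version}.fasta"
    return f"ftp://{host}{directory}/{filename}"


print(consensus_fasta_url("3.0"))
```

For release 3.0 this yields the filename given in the example above, consensus_seq_3.0.fasta, under /image/imagene.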
Master and candidate_gold listings can be obtained via our anonymous ftp site at image.llnl.gov. These listings are located within the /image directory and are appended with the IMAGEne version number they were derived from.
A master listing contains the "best" clone for each known gene. A candidate_gold listing is a subset of the master list, containing only full-coding clones.
Use of an IMAGE clone can be referenced as: Lennon, G.G., Auffray, C., Polymeropoulos, M., Soares, M.B. The I.M.A.G.E. Consortium: An Integrated Molecular Analysis of Genomes and their Expression. Genomics 33:151-152 [1996].
Use of the IMAGEne tool can be referenced as: Cariaso, M., Folta, P., Lennon, G., Wagner, M., Kuczmarski, T.; IMAGEne I: The clustering of ESTs corresponding to known genes. Bioinformatics [Volume 15, Number 11 pp 965-973].
Please see the Linking to Imagene web page.
There are both biological and computational reasons for this. Two examples are alternative splicing and low-quality sequence.
An EST may not be included in a known gene cluster even if another EST from the same clone is included. Some reasons for this are sequencing errors or incorrect EST to clone ID associations in GenBank. Beginning with release 3.3, we have made efforts to improve the accuracy of the clusters by noting any problems found/reported with IMAGE clones in our problem database, and excluding those clones from future IMAGEne builds.
All cluster IDs not in GenBank format belong to the Candidate Gene clusters and appear in the form CXXXXXX-YY, where X and Y are digits (e.g., C001496-01). The 'C' in the ID stands for cluster, the X's are the cluster number, and the Y's are the contig number within that cluster. So, C001496-01 is cluster 1496, contig 1. The candidate gene cluster IDs that we generate have reserved ranges to accommodate multiple species. IDs less than C100000 are for human, and the remaining IDs less than C200000 are for mouse.
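The ID format above can be decoded mechanically. The following Python sketch is an illustration only: the names `parse_cluster_id` and `CandidateClusterId` are hypothetical, and the "unknown" label for IDs at or above C200000 is an assumption, since the text only defines the human and mouse ranges.

```python
import re
from typing import NamedTuple

# 'C', six digits of cluster number, a dash, two digits of contig number.
CLUSTER_ID = re.compile(r"C(\d{6})-(\d{2})")


class CandidateClusterId(NamedTuple):
    cluster: int
    contig: int
    species: str


def parse_cluster_id(text: str) -> CandidateClusterId:
    """Split a candidate gene cluster ID into cluster, contig, and species."""
    match = CLUSTER_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a candidate gene cluster ID: {text!r}")
    cluster = int(match.group(1))
    contig = int(match.group(2))
    if cluster < 100000:
        species = "human"
    elif cluster < 200000:
        species = "mouse"
    else:
        species = "unknown"  # assumption: no range is documented beyond mouse
    return CandidateClusterId(cluster, contig, species)


print(parse_cluster_id("C001496-01"))
```

Applied to the example from the text, C001496-01 decodes to cluster 1496, contig 1, in the human range.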
The difference between a cluster and a contig comes from the way we form clusters. A candidate gene cluster is formed by first running BLAST on all ESTs against each other (level 1 clustering) to determine similarity. We then merge the resulting sets if they have more than one clone in common (level 2 clustering). Because clone sequences are ESTs and usually do not represent the full insert of the clone, the consensus sequence of a cluster can contain gaps. Each gap splits a cluster into contigs, or groups of contiguous sequence. The most common cause of multiple contigs is that the 5' ends of clones cluster together and the 3' ends of clones cluster together, but the two groups of sequences do not overlap with each other.
So, to see all of the sequence of a candidate gene cluster, you must view all of its contigs, keeping in mind that the gaps between contigs are of unknown length.
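The two ideas above, merging sets that share clones and splitting a cluster into contigs, can be sketched in Python. This is not the IMAGEne implementation. It assumes each level 1 set is represented by the clone IDs of its ESTs, and that EST placements are given as hypothetical (start, end) coordinates on a common axis. In real data the distance between contigs is unknown, so the coordinates here only serve to show where overlap stops.

```python
def merge_level2(groups):
    """Merge level 1 sets that have more than one clone in common."""
    merged = [set(g) for g in groups]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if len(merged[i] & merged[j]) > 1:
                    merged[i] |= merged[j]   # absorb set j into set i
                    del merged[j]
                    changed = True
                    break
            if changed:
                break                         # restart the scan after a merge
    return merged


def split_into_contigs(intervals):
    """Group overlapping or touching (start, end) spans into contigs."""
    contigs = []
    for start, end in sorted(intervals):
        if contigs and start <= contigs[-1][1]:
            contigs[-1][1] = max(contigs[-1][1], end)  # extend current contig
        else:
            contigs.append([start, end])               # a gap starts a new contig
    return [tuple(c) for c in contigs]


level1 = [{"a", "b", "c"}, {"b", "c", "d"}, {"d", "e"}]
print(merge_level2(level1))

# Two 5' reads overlap each other, two 3' reads overlap each other,
# but the 5' and 3' groups never meet: the cluster has two contigs.
print(split_into_contigs([(0, 400), (300, 650), (1200, 1600), (1500, 1900)]))
```

In the first example, the first two sets share two clones and are merged, while the third shares only one clone with the result and stays separate. The second example mirrors the common case described above, where the 5' and 3' reads form separate contigs.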
The IMAGEne algorithms currently make no serious attempt to address alternative splices in the EST cluster for a known gene. This may change in the future. Right now the clustering results for alternatively spliced genes depend on the number of sequences in the cluster and the length of the alternatively spliced section.
In about 13% of the known genes from GenBank, the specified coding sequence does not start with ATG. This may indicate an incomplete coding sequence, among other possibilities. The current IMAGEne algorithm will mark a clone as full-coding if it is homologous to both the 3' and 5' ends of the coding sequence as specified in GenBank.
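The rule above can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the IMAGEne algorithm: the function names, the representation of homology as (start, end) spans in coding-sequence coordinates, and the `tolerance` parameter are all hypothetical. The point it shows is that the full-coding test looks only at coverage of both ends of the annotated coding sequence, and does not check for an ATG start.

```python
def starts_with_atg(cds: str) -> bool:
    """Report whether an annotated coding sequence begins with ATG."""
    return cds.upper().startswith("ATG")


def is_full_coding(cds_length, hits, tolerance=0):
    """Decide whether a clone's matches reach both ends of the coding sequence.

    hits: (start, end) half-open spans, in coding-sequence coordinates,
    where the clone's ESTs match. 'tolerance' (hypothetical) is how many
    bases short of either end a match may stop and still count.
    """
    covers_5prime = any(start <= tolerance for start, _ in hits)
    covers_3prime = any(end >= cds_length - tolerance for _, end in hits)
    return covers_5prime and covers_3prime


# A 5' read and a 3' read that together touch both ends of a 900-base CDS.
print(is_full_coding(900, [(0, 350), (600, 900)]))
# Only the 5' end is matched, so the clone is not marked full-coding.
print(is_full_coding(900, [(0, 350)]))
# The annotated CDS itself may still lack an ATG start.
print(starts_with_atg("CTGGCC"))
```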
The number of extra dashes ('-') placed before and after each sequence in the Java display is determined by the alignment of the other sequences in the cluster. There is at least one sequence in the cluster beginning at the leftmost edge, and at least one ending at the rightmost edge.
Because clusters can become very large, only the top sequences (currently 500), ordered by relevance, are displayed. This was done to speed up the HTML and Java display and to avoid bogging down the user's system. If the sequences that determine the leftmost and rightmost edges fall outside this limit, they will not be displayed, producing the empty space that you are seeing.
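A short Python sketch can show how this empty space arises. It is an illustration under assumptions, not the display code: the function name `pad_rows` and the (offset, sequence) representation are hypothetical. The key point is that the padding width is computed from every sequence in the cluster, while only the first `limit` rows are shown.

```python
def pad_rows(rows, limit=500):
    """Pad aligned sequences with dashes and keep only the top 'limit' rows.

    rows: (offset, sequence) pairs ordered by relevance, where offset is the
    column at which the sequence starts within the cluster alignment.
    """
    # The overall width comes from ALL rows, including those not displayed.
    width = max(offset + len(seq) for offset, seq in rows)
    shown = rows[:limit]
    return [
        "-" * offset + seq + "-" * (width - offset - len(seq))
        for offset, seq in shown
    ]


cluster = [(2, "ACGT"), (3, "CGTA"), (0, "TTACG"), (4, "GTACC")]

# With a limit of 2, the rows that define the left and right edges are
# hidden, so every displayed row has dashes on both sides.
for line in pad_rows(cluster, limit=2):
    print(line)
```

Raising the limit to 4 in this toy example brings back the rows that start at column 0 and end at the last column, and the blank margins disappear.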
© Copyright 1997 All Rights Reserved
LLNL Disclaimer UCRL-MI-119848
Web page maintained by imagene@image.llnl.gov