diArk | faq

General

Purpose of the database

The number of completed eukaryotic genome sequences and cDNA projects has increased exponentially in the past few years, although most of them have not been published yet. In addition, many microarray analyses have yielded thousands of sequenced EST and cDNA clones. For the researcher interested in single gene analyses (from a phylogenetic, a structural biology or other perspective) it is therefore important to have up-to-date knowledge about the various resources providing primary data. diArk provides comprehensive search modules, each with detailed options, and three different views of the selected data.

Usage

The web interface has been designed using next-generation internet features (Web 2.0). The site therefore makes extensive use of Ajax (Asynchronous JavaScript and XML) to present the user with a feature-rich interface while minimizing the amount of transferred data. To use the database, you must enable JavaScript and session cookies.

Usage of the browser search plugin

You can search for any species in diArk. Just type in the scientific name, the common name, the German name, or the taxonomy. The result shows all matches to your search, e.g. all mammals when searching for "Mammalia".
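The matching behaviour described above can be sketched as follows. This is only an illustration, not diArk's actual implementation: the `Species` record, the `search_species` function, the case-insensitive substring matching, and the three example entries are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Species:
    """Hypothetical record; the field names are assumptions for this sketch."""
    scientific_name: str
    common_name: str
    german_name: str
    taxonomy: Tuple[str, ...]  # lineage, from broad to narrow


def search_species(query: str, species_list: List[Species]) -> List[Species]:
    """Return every species for which any name or any taxon contains the query.

    Case-insensitive substring matching is an assumption; the real search
    may match differently.
    """
    q = query.strip().lower()
    if not q:
        return []
    hits = []
    for sp in species_list:
        fields = [sp.scientific_name, sp.common_name, sp.german_name, *sp.taxonomy]
        if any(q in field.lower() for field in fields):
            hits.append(sp)
    return hits


# Illustrative entries only, not an extract of the database.
EXAMPLE_SPECIES = [
    Species("Homo sapiens", "human", "Mensch",
            ("Eukaryota", "Metazoa", "Mammalia")),
    Species("Mus musculus", "house mouse", "Hausmaus",
            ("Eukaryota", "Metazoa", "Mammalia")),
    Species("Ostreococcus tauri", "", "",
            ("Eukaryota", "Viridiplantae", "Chlorophyta")),
]

if __name__ == "__main__":
    for sp in search_species("Mammalia", EXAMPLE_SPECIES):
        print(sp.scientific_name)
```

Searching the taxonomy field is what lets a query such as "Mammalia" return every mammal rather than a single species.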

Either click on the image above (Firefox only), or pull down the search menu at the top right corner of your browser window and select "add diArk species search" to install the plugin.

How long has the database been available?

The first version of the database went online in October 2006 with release v 1.0 including 364 species, 742 sequencing projects, and 228 publications. For recent updates have a look at the News pages.

How do I cite diArk?

If you used data or tools from diArk in your research, please cite:

F. Odronitz, M. Hellkamp & M. Kollmar (2007). diArk - a resource for eukaryotic genome research. BMC Genomics 8, 103. (Open Access)

Thank you.

Who assembled / moderates / pays for the database?

Look at the Team and Funding pages for more information.

Browse Database

Why does diArk not list the species that I have recently found at JGI/NCBI/Broad?

All data in diArk are entered manually, so coverage depends strongly on whether we notice new projects or publications. Although we continuously check the corresponding databases and know the status of ongoing sequencing projects, in some cases you may spot new data before we do (by a matter of days; we are confident that we will detect new data within two weeks). We provide weekly updates of the database, but during holiday periods the interval will be longer. In our experience, genome assemblies are released in batches rather than continuously. To make sure you do not miss any update of diArk, you can save your searches, and you will be notified about changes.

What criterion is used to include a publication?

We included publications that refer to specific cDNA datasets (e.g. the large-scale cDNA sequencing of the nematodes), or that give the first description of the genome sequence of a certain species (e.g. the publication of the Ostreococcus tauri genome). The list might not be complete for species for which many cDNA datasets have been published or whose genome sequence has been published in several parts (e.g. Homo sapiens). The problem with these publications is that they are very difficult to identify (e.g. if the cDNA dataset was part of a microarray study). We might also miss later descriptions of new releases of already published genomes.

When is genome or cDNA sequencing supposed to be complete?

The term completeness is intended to describe the coverage of the genome and the chance of finding all homologs of the gene of interest. In this respect, EST/cDNA data is always incomplete, as most genes are either only partially covered or not covered at all. Genomic sequencing is considered complete once the assembly reaches a certain quality and coverage. Genome sequences with low assembly coverage (below 3x) and/or short assembled contigs (a few kbp) do not provide enough information to reconstruct even medium-sized genes and are also considered incomplete. For example, the species included in the Mammalian Genome Project have only been sequenced to a coverage of 2x. The resulting contigs are short, and most genes are spread over several contigs, in many cases with several exons missing. Another example is the low-coverage sequencing of 13 yeast genomes as part of the Genolevures 1 project. Those genome sequences are regarded as incomplete.
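The rule of thumb above can be written down as a small check. This is a minimal sketch, not the procedure used to curate diArk: the function name is hypothetical, the 3x coverage threshold comes from the text, and the 10 kbp contig threshold is an assumption standing in for "a few kbp", which the text does not quantify.

```python
from typing import Optional

# From the text: assemblies below 3x coverage are considered incomplete.
MIN_COVERAGE_X = 3.0
# ASSUMPTION: the text only says "a few kbp"; 10 kbp is an illustrative cut-off.
ASSUMED_MIN_CONTIG_KBP = 10.0


def is_considered_complete(
    data_type: str,
    coverage_x: Optional[float] = None,
    typical_contig_kbp: Optional[float] = None,
) -> bool:
    """Apply the completeness rule of thumb to one dataset.

    data_type is "EST", "cDNA" or "genome" (case-insensitive).
    """
    kind = data_type.strip().lower()
    if kind in ("est", "cdna"):
        # EST/cDNA data is always incomplete: most genes are only
        # partially covered or not covered at all.
        return False
    if kind != "genome":
        raise ValueError(f"unknown data type: {data_type!r}")
    if coverage_x is None or typical_contig_kbp is None:
        raise ValueError("genome data needs both coverage and contig length")
    # Low coverage and/or short contigs both make an assembly incomplete,
    # so both conditions must hold for it to count as complete.
    return (coverage_x >= MIN_COVERAGE_X
            and typical_contig_kbp >= ASSUMED_MIN_CONTIG_KBP)


if __name__ == "__main__":
    # A 2x assembly is incomplete regardless of contig length.
    print(is_considered_complete("genome", coverage_x=2.0, typical_contig_kbp=50.0))
```

Because the text says "and/or", either a low coverage or short contigs alone is enough to mark a genome as incomplete in this sketch.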

Why do I just get a BLAST search form if I click on a "GenBank - NIH genetic sequence database" link?

For many species, the sequence information is not available via a dedicated species home page but only via GenBank. Some of them can be searched using the genomicBLAST pages. At NCBI, however, there are two ways to BLAST against genomic assembly and cDNA/EST data: either directly, e.g. using TBLASTN and choosing the WGS or EST database, or by selecting one of the genomicBLAST tables. There are strong discrepancies between the WGS database and the assemblies available via genomicBLAST. The WGS database contains 145 species, while the genomicBLAST tables list only 130 organisms, of which 2 are redundant. Species missing from the genomicBLAST tables include, for example, the fish Gasterosteus aculeatus, the plants Ricinus communis and Populus trichocarpa, and the fungus Batrachochytrium dendrobatidis. Therefore, the "GenBank" links (associated with the GenBank reference) of the species projects provide BLAST search forms that are preset with the corresponding database (some data is only available from the WGS database, other data only from the EST database) and the corresponding species name.
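The comparison described above amounts to a set difference after removing redundant entries. If the 2 redundant entries are read as duplicates, the genomicBLAST tables cover 130 - 2 = 128 distinct organisms against 145 in WGS. The sketch below shows one way such a comparison could be made. It is an illustration only: the function name is hypothetical, and the two lists are toy data built from the four species named in the text plus clearly marked placeholders, not the actual NCBI listings.

```python
from typing import Iterable, List


def missing_from(reference: Iterable[str], other: Iterable[str]) -> List[str]:
    """Return the names present in `reference` but absent from `other`.

    Names are compared case-insensitively, and redundant entries in either
    list are counted once.
    """
    other_keys = {name.strip().lower() for name in other}
    seen = set()
    missing = []
    for name in reference:
        key = name.strip().lower()
        if key not in other_keys and key not in seen:
            seen.add(key)
            missing.append(name.strip())
    return sorted(missing)


# Toy data: four species named in the text plus two placeholders.
wgs_species = [
    "Gasterosteus aculeatus",
    "Ricinus communis",
    "Populus trichocarpa",
    "Batrachochytrium dendrobatidis",
    "Placeholder species A",
    "Placeholder species B",
]
# The repeated placeholder mimics a redundant table entry.
genomic_blast_species = [
    "Placeholder species A",
    "Placeholder species B",
    "Placeholder species A",
]

if __name__ == "__main__":
    print(len(set(genomic_blast_species)), "distinct genomicBLAST entries")
    for name in missing_from(wgs_species, genomic_blast_species):
        print("only in WGS:", name)
```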