First data release: SNP data downloads and genome browser representing four high coverage individuals

2008-12-23 00:00:00 +0000

The first set of SNP calls, representing the preliminary analysis of four genome sequences, is now available to download through the EBI FTP site and the NCBI FTP site. The README file describing the FTP structure will help you find the data you are looking for.

The data can also be viewed directly through the 1000 Genomes browser at http://browser.1000genomes.org. Launch the browser and view a sample region here.

For more information about using the 1000 Genomes browser, download the Quick start guide.

The 1000 Genomes Project announces the release of the first set of SNP calls for 4 individuals that are part of the high coverage pilot project. These SNPs represent the preliminary analysis of a portion of the data so far collected and are released in accordance with the Ft. Lauderdale agreement for community resource projects.

This preliminary release is designed both to provide data to the community and to test the systems used to make the data available, including the 1000 Genomes browser at http://browser.1000genomes.org. Additional updates of this site and the 1000 Genomes browser will take place throughout January.

In addition to the SNP files and the 1000 Genomes browser, raw project data is made available as soon as possible by the NCBI and the EBI. The data consist of README files, fastq files (nucleotides and qualities) suitable for use with most aligners, and md5 checksum data.
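The md5 checksum data can be used to confirm that a downloaded file arrived intact. The following is a minimal Python sketch, not project tooling. It assumes the checksums are listed one per line as "checksum filename" (the layout produced by the md5sum utility); the actual layout on the FTP sites may differ, and the function names here are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large fastq files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse 'checksum filename' lines into {filename: checksum}.
    ASSUMPTION: md5sum-style layout; adjust to the real manifest."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        checksum, name = line.split(None, 1)
        # md5sum marks binary-mode entries with a leading '*'
        expected[name.lstrip("*").strip()] = checksum.lower()
    return expected


def verify(manifest_text, directory):
    """Return {filename: True/False} for every manifest entry
    that exists in the given directory; absent files are skipped."""
    results = {}
    for name, checksum in parse_manifest(manifest_text).items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == checksum
    return results
```

A file reported as False should be downloaded again before it is passed to an aligner.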

Users from the Americas should download the mirrored 1000 Genomes data from NCBI via ftp at: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ or via the Aspera high speed data transfer client at: http://fasp.ncbi.nlm.nih.gov/1000genomes.html.

Users from Europe and the rest of the world should download the mirrored 1000 Genomes data from the EBI at: ftp://ftp.1000genomes.ebi.ac.uk/. The EBI will be implementing an Aspera client in the near future.

The data available at the EBI and the NCBI are identical. Depending on server load and Internet connectivity, some users in the Americas may have faster FTP downloads from the EBI, and some users in Europe and the rest of the world may have faster downloads from the NCBI.

Raw data for a portion of the project is already available through the Short Read Archive at the NCBI and the European Read Archive at the EBI. These comprehensive archives are specifically designed for short read data and will make the complete project data available as soon as possible.

The 1000 Genomes Project plans to release summary data, including positions of variants in individuals and populations. These data releases will include SNP and CNV analysis for the six high-coverage individuals and all of the low-coverage individuals. Quarterly data releases are planned starting in January 2009.