What do the pilot project, phase 1, phase 2 and phase 3 mean?

The 1000 Genomes Project was divided into stages. Initially, a set of pilot projects was undertaken, followed by the main project, which was broken into three phases.

The initial part of the Project was called the pilot project. This was split into three pilot studies: the low coverage pilot (pilot 1), the high coverage pilot (pilot 2) and the exon-targeted pilot (pilot 3). The pilot data set was completed in 2009 and the results were published in Nature in 2010. All of the data associated with the pilot projects is available in the pilot_data directory on the FTP site.

Phase 1 represented low coverage and exome data analysis for the first 1092 samples. The phase 1 low coverage alignments and exome alignments are available in the phase 1 directory on the FTP site. Analysis of phase 1 was published in 2012. The analysis results associated with the paper can be found in the phase 1 analysis_results directory. The low coverage sequence data from phase 1 is listed in the 20101123 sequence index and the exome data in the 20110521 sequence index.
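As a rough illustration of working with a sequence index such as those named above, the sketch below counts the distinct samples listed in one. It assumes the index is tab-delimited text with a header row and a sample-name column called SAMPLE_NAME, and that any metadata lines start with "##"; check these assumptions against the actual file before relying on them. The function name count_samples and the example rows are hypothetical and are not project data.

```python
import csv
import io


def count_samples(index_text, sample_column="SAMPLE_NAME"):
    """Count distinct sample names in tab-delimited sequence index text.

    Assumes a header row naming the columns; the column name
    SAMPLE_NAME is an assumption and can be overridden.
    """
    # Drop blank lines and "##" metadata lines before parsing the header row.
    lines = [
        line
        for line in index_text.splitlines()
        if line.strip() and not line.startswith("##")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return len({row[sample_column] for row in reader})


# Made-up rows for illustration only; not real project data.
example = (
    "RUN_ID\tSAMPLE_NAME\tANALYSIS_GROUP\n"
    "RUN_1\tSAMPLE_A\tlow coverage\n"
    "RUN_2\tSAMPLE_A\tlow coverage\n"
    "RUN_3\tSAMPLE_B\texome\n"
)
print(count_samples(example))
```

A sample usually has several sequencing runs, so counting unique sample names rather than rows is what gives a figure comparable to the sample counts quoted for each phase.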

During phase 2, the sample set expanded to around 1700. The sequence data is represented in the 20111114 sequence index. This data was used for method development, both to improve on the methods used in phase 1 and to develop new methods for handling features such as multi-allelic variant sites and the true integration of complex variation and structural variants.

Phase 3 represents 2504 samples, including additional African samples and samples from South Asia. The methods developed in phase 2 were applied to this data set and a final catalogue of variation was released on the FTP site. These results were published in two papers in 2015, one covering the main project and the other focusing on structural variation.
