Does the 1000 Genomes Project use HapMap data?


The 1000 Genomes Project shares some samples with the HapMap project; any sample whose name starts with NA was likely part of the HapMap project. In the pilot stages of the project, HapMap genotypes were also used to help quality-control the data and to identify sample swaps and contamination. Since phase 1, HapMap data has not been used by the 1000 Genomes Project, and all genotypes were independently identified by 1000 Genomes.
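As a rough illustration of the naming rule above, the following minimal sketch flags sample names that begin with NA. The function name `likely_hapmap` and the example list are hypothetical and not part of any 1000 Genomes tool. The prefix is only a hint, so a match means "probably" rather than "definitely".

```python
def likely_hapmap(sample_id: str) -> bool:
    """Return True if the sample name starts with 'NA'.

    This is only a heuristic: an NA prefix suggests, but does not prove,
    that the sample was also part of the HapMap project.
    """
    return sample_id.startswith("NA")


# Illustrative sample names; substitute your own list.
samples = ["NA12878", "HG00096", "NA19240"]
candidates = [s for s in samples if likely_hapmap(s)]
print(candidates)
```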


How do I navigate the FTP site to find the files I need?


The easiest way to find the files you’re looking for is with the Data Portal. You can search for individuals, populations and data collections, and filter the files by data type and technology. This gives you the locations of the files, which you can use to download them directly or export as a list for use with a download manager.


How were exome and exon-targeted sequencing used?


The 1000 Genomes Project has run two different pull-down experiments. These are labelled as “exon targetted” and “exome”.

An exon targetted run is part of the pilot study, which targeted 1000 genes in nearly 700 individuals. The targets for this pilot can be found in the pilot_data/technical/reference directory.

An exome run is part of the whole exome sequencing project, which targeted the entirety of the CCDS gene set. The targets used for the phase 1 data release of 1092 samples can be found in technical/reference/exome_pull_down_targets_phases1_and_2; the targets for the phase 3 analysis can be found in technical/reference/exome_pull_down_targets/.

The phase 1 and 2 targets are intersections of the different technologies used and the CCDS gene list. For phase 3 we used a union of two different pull-down lists: NimbleGen EZ_exome v1 and Agilent SureSelect v2. In phase 3 very little exome-specific calling took place. Instead, analysis groups tended to call variants using the low coverage and exome data together in an integrated manner.
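To make the difference between an intersection and a union of target lists concrete, here is a minimal sketch in Python. It assumes targets are BED-style half-open intervals given as (chromosome, start, end) tuples. The function names and the toy coordinates are hypothetical, and this is not the procedure the project itself used to build its target files.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open as in BED files
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by either target list."""
    return merge(list(a) + list(b))


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by both target lists."""
    a, b = merge(a), merge(b)
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in sorted chromosome order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            result.append((chrom_a, start, end))
        # Advance the interval that finishes first.
        if end_a < end_b:
            i += 1
        else:
            j += 1
    return result


# Toy coordinates, purely illustrative.
kit_a = [("1", 100, 200), ("1", 300, 400)]
kit_b = [("1", 150, 350)]
print(union(kit_a, kit_b))         # [('1', 100, 400)]
print(intersection(kit_a, kit_b))  # [('1', 150, 200), ('1', 300, 350)]
```

An intersection keeps only the bases every kit captures, so coverage is comparable across centres. A union keeps every base that any kit captures, at the cost of uneven coverage between samples.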

Capture technology

Different centres have used different pull-down technologies for the exome sequencing done for the 1000 Genomes Project.

Baylor College of Medicine used NimbleGen SeqCap_EZ_Exome_v2 for its SOLiD-based exome sequencing. For its more recent Illumina-based exome sequencing it used a custom array, HGSC VCRome.

The Broad Institute has used Agilent SureSelect_All_Exon_V2 (using ELID: S0293689).

The BGI used NimbleGen SeqCap EZ exome V1 for the phase 1 samples and NimbleGen SeqCap_EZ_Exome_v2 for phases 2 and 3 (the v1 files were obtained from BGI directly; they have been discontinued by NimbleGen).

The Washington University Genome Center used Agilent SureSelect_All_Exon_V2 (using ELID: S0293689) for phase 1 and phase 2, and NimbleGen SeqCap_EZ_Exome v3 for phase 3.
