Which reference assembly do you use?

The reference assembly the 1000 Genomes Project has mapped sequence data to has changed over the course of the project.

For the pilot phase we mapped data to NCBI36. A copy of our reference fasta file can be found on the ftp site.

For the phase 1 and phase 3 analysis we mapped to GRCh37. Our fasta file which can be found here called human_g1k_v37.fasta.gz, it contains the autosomes, X, Y and MT but no haplotype sequence or EBV.

We are currently in the process of remapping the final phase 3 data onto GRCh38.

Related questions: