Where are the alignment files for the exon targeted individuals?


The alignment files that were used as part of the pilot project can be found on the ftp site. These are all aligned to NCBI36.

There are also GRCh37 alignments available for both the high coverage individuals and the exon targeted individuals.

Please be aware that much of the sequence data these alignments are based on is very short read data (36bp) generated around 2008, so it may not reflect current sequencing data.

A more modern data set for the CEU trio is available from Illumina on their Platinum Genomes page.

Which reference assembly do you use?


The reference assembly the 1000 Genomes Project has mapped sequence data to has changed over the course of the project.

For the pilot phase we mapped data to NCBI36. A copy of our reference fasta file can be found on the ftp site.

For the phase 1 and phase 3 analysis we mapped to GRCh37. Our fasta file, human_g1k_v37.fasta.gz, can be found on our ftp site. It contains the autosomes, X, Y and MT but no haplotype sequence or EBV.

Our most recent alignment release was mapped to GRCh38. This reference also contained decoy sequence, alternative haplotypes and EBV. The data was mapped using an alt-aware version of BWA-MEM. The fasta files can be found on our ftp site.
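
Because the reference has changed between project phases, it can be useful to check which assembly a particular alignment file was mapped to. The sketch below is one possible way to do that and is not part of the project's own tooling: it reads the @SQ lines of a SAM/BAM header (for example the text printed by `samtools view -H`) and compares the length of chromosome 1 with the published lengths for NCBI36, GRCh37 and GRCh38. The function names `parse_sq_lines` and `guess_assembly` and the example header are illustrative assumptions.

```python
# Published chromosome 1 lengths for the three assemblies named above.
CHR1_LENGTHS = {
    247249719: "NCBI36",
    249250621: "GRCh37",
    248956422: "GRCh38",
}


def parse_sq_lines(header_text):
    """Return {sequence_name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab separated TAG:VALUE pairs, e.g. SN:1 and LN:249250621.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def guess_assembly(header_text):
    """Guess the assembly from the chromosome 1 length, or return 'unknown'."""
    lengths = parse_sq_lines(header_text)
    # Chromosome 1 may be named "1" or "chr1" depending on the reference file.
    for name in ("1", "chr1"):
        if name in lengths:
            return CHR1_LENGTHS.get(lengths[name], "unknown")
    return "unknown"


# Hypothetical header text, in the form printed by `samtools view -H`.
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:MT\tLN:16569\n"
)
print(guess_assembly(example_header))  # prints "GRCh37"
```

A single chromosome length is only a quick heuristic. It does not distinguish between different builds of the same assembly, such as versions with or without decoy sequence.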

