Our reference data sets are in technical/reference/, which includes the reference genome, ancestral alignments and standard annotation sets.
A frozen version of the reference data used for the pilot project is also available in pilot_data/technical/reference.
The reference assembly to which the 1000 Genomes Project maps sequence data has changed over the course of the project.
For the pilot phase we mapped data to NCBI36. A copy of our reference FASTA file can be found on the FTP site.
For the phase 1 and phase 3 analyses we mapped to GRCh37. Our FASTA file for this assembly, human_g1k_v37.fasta.gz, contains the autosomes, X, Y and MT but no haplotype sequences or EBV.
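If you want to confirm which sequences a downloaded copy of the GRCh37 FASTA holds, one option is to read its header lines directly. The sketch below is illustrative only: the function names `fasta_sequence_names` and `missing_primary` are hypothetical, and it assumes sequence names are the first token after `>` and use the unprefixed style ("1", "X", "MT" rather than "chr1"). It checks only that the autosomes, X, Y and MT are present and deliberately ignores any other sequence names rather than assuming the file contains nothing else.

```python
import gzip
from typing import List

# Assumed naming convention: unprefixed names such as "1", "X", "MT".
EXPECTED_PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def fasta_sequence_names(path: str) -> List[str]:
    """Return sequence names (first token after '>') from a gzip-compressed FASTA."""
    names: List[str] = []
    # gzip.open in text mode also reads multi-member gzip streams (e.g. BGZF).
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:  # skip malformed empty headers
                    names.append(tokens[0])
    return names


def missing_primary(names: List[str]) -> List[str]:
    """List the expected autosomes/X/Y/MT names that are absent from `names`."""
    present = set(names)
    return [name for name in EXPECTED_PRIMARY if name not in present]
```

For example, passing the output of `fasta_sequence_names("human_g1k_v37.fasta.gz")` on a local copy to `missing_primary` should return an empty list if all the primary sequences are present. Reading only header lines keeps memory use low, although the whole compressed file is still streamed once.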
We are currently in the process of remapping the final phase 3 data onto GRCh38.