What is the difference between your data directory and the pilot_data/data directory?

The data directory holds the most up-to-date sequence and alignment data available for the project. The pilot_data directory holds a frozen data set: the data aligned for the pilot project, as published in Nature in 2010.

An important difference is that the main project data is all mapped to the GRCh37 assembly, whereas the pilot project was mapped to the NCBI36 assembly. Positions of variants and alignments reported in the pilot_data directory will therefore differ from those in the main project and in many genome browsers. Genome browsers and variant databases also display the 1000 Genomes variants remapped to GRCh38, which gives different coordinates again; GRCh37 views are available in the Ensembl and UCSC genome browsers.
