GRCh38 mapping of the 1000 Genomes low coverage data is now available

2015-10-10 00:00:00 +0100

We have realigned the low coverage 1000 Genomes sequence data to GRCh38. We aligned to the full assembly, including the GRC-maintained alternate loci sequences, plus decoy sequences and additional HLA sequences from IMGT. Our fasta file can be found in our reference directory. The alignment was carried out using a new alt-aware version of BWA-MEM.
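
Because the reference mixes several kinds of sequence, it can help to sort contig names by type when summarising alignments. The sketch below assumes the naming conventions commonly used for GRCh38 analysis sets (an `_alt` suffix for alternate loci, a `_decoy` suffix for decoys, an `HLA-` prefix for HLA sequences). These conventions and the helper name `classify_contig` are assumptions rather than anything stated in this announcement, so check them against the headers of the fasta file before relying on them.

```python
import re

# Assumed naming conventions; verify against the fasta headers.
PRIMARY = re.compile(r"^chr([1-9]|1\d|2[0-2]|X|Y|M)$")


def classify_contig(name: str) -> str:
    """Return a coarse category for a reference sequence name (hypothetical helper)."""
    if name.startswith("HLA-"):
        return "hla"
    if name.endswith("_decoy"):
        return "decoy"
    if name.endswith("_alt"):
        return "alt"
    if PRIMARY.match(name):
        return "primary"
    # Anything else, e.g. unplaced or unlocalised scaffolds.
    return "other"


if __name__ == "__main__":
    examples = [
        "chr1",
        "chr1_KI270762v1_alt",
        "chrUn_JTFH01000001v1_decoy",
        "HLA-A*01:01:01:01",
    ]
    for contig in examples:
        print(contig, classify_contig(contig))
```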

The alignment files themselves can be found in the data_collections/1000_genomes_project/data directory.

The alignment index and sequence.index can be found in the data_collections/1000_genomes_project directory.

Please note that these files are now distributed in CRAM format rather than BAM format. You can find more details about CRAM in this README.
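
CRAM compresses reads against a reference, so tools generally need access to the reference the reads were aligned to in order to decode a file. As a rough sketch, assuming `samtools` is installed and the fasta file has been downloaded locally, the helper below builds a `samtools view` command that converts a CRAM file to BAM. The helper name and the file names are hypothetical placeholders.

```python
from typing import List


def cram_to_bam_command(cram_path: str, reference_fasta: str, bam_path: str) -> List[str]:
    """Build a samtools command that decodes a CRAM file and writes BAM.

    -b  write BAM output
    -T  reference fasta the CRAM was aligned to
    -o  output file
    """
    return ["samtools", "view", "-b", "-T", reference_fasta, "-o", bam_path, cram_path]


if __name__ == "__main__":
    # Placeholder file names; substitute the files you downloaded.
    cmd = cram_to_bam_command("sample.cram", "reference.fa", "sample.bam")
    print(" ".join(cmd))
```

To execute the conversion, the returned list can be passed to `subprocess.run(cmd, check=True)`.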

Full details of our alignment pipeline can be found in the alignment pipeline README.

If you have any questions please email