GRCh38 alignments for Exome and High Coverage 1000 Genomes Data

2015-12-16 00:00:00 +0000

We have realigned exome data from 2692 samples and high coverage PCR-free data from 24 samples, generated for the 1000 Genomes Project, to the GRCh38 human assembly.

The alignment is against the full assembly, including the GRC-maintained alternate loci sequences, together with decoy sequences and additional HLA sequences from IMGT.

Our FASTA file can be found in the reference directory.

The alignment was carried out using a new alt-aware version of BWA-MEM. The alignment files themselves can be found in the data_collections/1000_genomes_project/data directory. The exome alignment index, the high coverage alignment index and sequence.index can be found in the data_collections/1000_genomes_project directory.

Please note that these files are now distributed in CRAM format rather than BAM format. You can find more details about CRAM in this README. Full details of our alignment pipeline can be found in the alignment pipeline README.
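For tools that still expect BAM input, a CRAM file can usually be converted with samtools, provided the same reference FASTA used for the alignment is available. The following is a minimal sketch, not part of our pipeline: the helper name cram_to_bam_command and the file names are hypothetical, and it assumes samtools is installed and on your PATH. It only builds the command, using the samtools view options -b (write BAM), -T (reference FASTA) and -o (output file).

```python
from typing import List


def cram_to_bam_command(cram_path: str, reference_fasta: str, bam_path: str) -> List[str]:
    """Build a samtools command line that converts a CRAM file to BAM.

    The reference FASTA should be the one the CRAM was aligned against,
    because CRAM stores reads as differences from the reference.
    """
    return [
        "samtools", "view",
        "-b",                   # write BAM output
        "-T", reference_fasta,  # reference FASTA needed to decode the CRAM
        "-o", bam_path,         # output file
        cram_path,              # input CRAM
    ]


# Hypothetical file names, for illustration only.
command = cram_to_bam_command("sample.cram", "reference.fa", "sample.bam")
print(" ".join(command))
```

To actually run the conversion, the list can be passed to subprocess.run(command, check=True). Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.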

If you have any questions, please email info@1000genomes.org.