GRCh38 genome accessibility masks for 1000 Genomes data

2016-06-23 00:00:00 +0100

The 1000 Genomes Project assessed which parts of the genome were accessible to the sequencing methods it used. This was done by examining the amount of sequence data that aligned to each location in the reference genome used by the project (GRCh37). Mask files were then created that can be used to exclude regions of the genome considered inaccessible to those technologies.

The sequence data from the 1000 Genomes Project has since been aligned to the newer reference genome, GRCh38. Based on these alignments, new accessibility masks for GRCh38 have been created and are now available.
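As an illustration of how a mask of this kind might be applied, the sketch below keeps only variant positions that fall inside accessible regions. It assumes the accessible regions are supplied as BED-style lines (chromosome, 0-based start, exclusive end) and that variant positions are 1-based, as in VCF. The function names load_mask and is_accessible are hypothetical, and the actual format of the released mask files should be checked against their documentation before adapting this.

```python
from bisect import bisect_right
from collections import defaultdict


def load_mask(bed_lines):
    """Parse BED-style lines into sorted, merged intervals per chromosome.

    Assumption: each data line holds chrom, 0-based start and exclusive end.
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    mask = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        # Store starts and ends separately so starts can be binary-searched.
        mask[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return mask


def is_accessible(mask, chrom, pos):
    """Return True if the 1-based position lies inside an accessible interval."""
    if chrom not in mask:
        return False
    starts, ends = mask[chrom]
    pos0 = pos - 1  # convert to 0-based to match BED coordinates
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


# Hypothetical example data, not taken from the real mask files.
example_mask = load_mask(["chr1\t100\t200", "chr1\t150\t300", "chr2\t0\t10"])
variants = [("chr1", 100), ("chr1", 101), ("chr1", 301), ("chr2", 1)]
kept = [v for v in variants if is_accessible(example_mask, *v)]
```

Merging intervals up front keeps each lookup to a single binary search, which matters when filtering many variant positions against a genome-wide mask.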

Further details on the new mask files are available. Information on how the original mask files were used can be found in the main publication from the 1000 Genomes Project.