3,202 samples at high coverage from NYGC

2020-08-14 00:00:00 +0100

Earlier this year, the New York Genome Center (NYGC) released high-coverage (30x) data for an additional 698 samples from the 1000 Genomes Project sample collections. These 698 samples are related to the original set of 2,504 samples previously sequenced by NYGC. Those 2,504 samples are unrelated to each other and made up the panel used by the 1000 Genomes Project in its third (and final) phase. The new release brings the total number of samples sequenced to high coverage by NYGC to 3,202, in work funded by NHGRI.

NYGC aligned the data to the GRCh38 reference assembly, and the resulting CRAM files have been shared and are listed in our data portal. These files can be accessed from FTP sites hosted by EMBL-EBI and NCBI, and are also hosted on AWS and AnVIL. Details on accessing and using the data can be found on our page for this data collection.

This high-coverage data adds to the previous data sets, giving us:

These data collections, covering large numbers of samples, are supplemented by other data collections in IGSR where a wider range of technologies has been applied to subsets of the samples. Genomic sequence data is also available for samples that were not part of the 1000 Genomes Project. Our data portal can be used to explore the main data sets in IGSR, with additional (and preliminary) data sets available via our FTP site.