The New York Genome Center (NYGC), funded by NHGRI, has sequenced 3202 samples from the 1000 Genomes Project sample collection to 30x coverage. The 2504 unrelated samples in the phase three panel of the 1000 Genomes Project were sequenced first. An additional 698 samples, each related to a sample in the 2504 panel, were sequenced afterwards. NYGC aligned the data to GRCh38, and those alignments are publicly available along with a data reuse statement. Details, including URLs for the data in ENA, are in our data portal and are listed on our FTP site. The alignments can be accessed at the following locations:
NYGC have performed variant calling on the data, and the resulting call sets are available on our FTP site. These include:
The initial GATK call set for the 2504 unrelated samples remains available.
A preprint describing this work is available and can be used for citation purposes.