Overview

The International Genome Sample Resource (IGSR) was established at EMBL-EBI in January 2015. The resource was established with three main aims, to:

  1. Ensure maximal usefulness and relevance of the existing 1000 Genomes data resources
  2. Extend the resource for the existing populations
  3. Expand the resource to new populations

The first aim will start with the remapping of the existing low coverage and exome data to the new version of the human reference assembly, GRCh38. The second aim will bring together other functional and sequence data that has been generated on the 1000 Genomes cell lines, such as the Geuvadis RNA-Seq data and the high coverage and long read data that the 1000 Genomes Structural Variant group is continuing to generate, in order to present a uniform analysis set. The final aim is to expand the resource to new populations; the IGSR has been funded to support the addition of new populations to the 1000 Genomes dataset and this document describes the principles of that process.

The IGSR recognises that the current 1000 Genomes Project samples do not reflect all populations. An important aim for the IGSR is to expand the populations represented in the collection and to ensure that the public data represents the maximum possible population diversity. This will ensure that the 1000 Genomes dataset remains a valuable open resource for the community over the next five years. The IGSR will work with the groups who were unable to contribute samples to the 1000 Genomes Project prior to the completion of sample collection, and will investigate collaborations with other groups to ensure that population diversity gaps are filled.

Here we propose a process for oversight of consent, sample collection, data production and data and cell line dissemination. The IGSR wishes to ensure that any new population cohorts collected and their associated data allow for public data release, and broad use of the data and samples. In this way, the data can be made available alongside the HapMap and 1000 Genomes Project samples and data.

The IGSR has no funds to support sample collection or data generation. This aspect of the project will need to be self-funded by sample collection groups or funded by third parties. The IGSR is funded to provide ethical review and data coordination, as well as analysis and integration of new population collections established and accepted into the IGSR.

In order for a population collection to be accepted as part of the IGSR, it must meet the following criteria:

  1. Confirmation that the Consent, Ethics Review and Sampling Process meet the criteria established for the 1000 Genomes Project. We suggest that the applicant confirm to the P3G-IPAC that these criteria have been sought prior to sample collection, as was done by the Samples and ELSI subgroup for 1000 Genomes Project sample sets. P3G-IPAC will then provide approval for acceptance for deposit in IGSR. Once approved, a three-letter code would be provided, as for HapMap or 1000 Genomes Project sample sets;
  2. Recommendation by the IGSR Geographical and Population Advisory Board (GPAB) that the population represents a valuable addition to the IGSR dataset and expands the global diversity found in the data;
  3. Collection of primary sequence data and deposition in the public sequence archives (GenBank, ENA). The IGSR will confirm that these data meet the quantity and quality criteria already established as part of the 1000 Genomes Project. The IGSR will undertake alignment and variant calling, and provide an integrated set of haplotypes that will include the new sample collection within the existing 1000 Genomes data. The IGSR will provide unrestricted public access for download as well as interactive use; and,
  4. If possible, establishment of cell lines and deposition of them in an approved cell line repository, either Coriell or another equivalent repository.

The following information covers useful details for groups planning to create new sample collections. We cover details about educating local Institutional Review Boards/Research Ethics Committees (IRB/RECs) about the types of procedures that were used successfully for sample collection from other populations, in a way that allowed public data release and broad use of data and samples. This commentary includes considerations with respect to cell line production.

The 1000 Genomes Samples and ELSI group determined that all consent forms for the 1000 Genomes Project were required to explicitly state the following items. A similar review would be undertaken for any new populations being added to the IGSR.

Sampling process

Sample collection review process

Cell lines

Data production

Data release

Costs

The IGSR includes no funding for sample collection or data production. The IGSR will work with the sample collection group to create alignments and variant calls integrated with the main 1000 Genomes datasets once sequence data is submitted to the public archives. Groups wishing to contribute samples and data need to cover the costs of the sample collection process review, sample collection, cell line transformation, and sequence and genotype data production.

This document was approved on 30th June 2015.