How do I find the familial relationships between the individuals?


While the main outputs of the 1000 Genomes Project's phase three work focused on 2,504 unrelated individuals, we also hold data from related samples. Many of these are trios (two parents and a child), and some families also include further generations.

For the 1000 Genomes Project phase three analysis, the relationships between the samples are recorded in a .ped pedigree file on our FTP site. This is based on both the known relationships and analysis of the data generated for this work.

In some instances, analysis of a given set of data may suggest relationships that differ from those originally recorded. Possible reasons include an error when the relationship was recorded or an accidental sample swap during data generation. Where such concerns exist for a data set, they are noted in the pedigree file. Because pedigree files accompany analyses of different data sets, two pedigree files may occasionally list different relationships or concerns for the same samples, although such cases are very rare.

In general, to understand the relationships between large sets of samples, consulting the accompanying pedigree file is the best approach.
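This FAQ does not describe the pedigree file's column layout. As a minimal sketch, assuming a tab-separated file with a header row whose first four columns are family ID, individual ID, paternal ID and maternal ID (the usual PED convention, with "0" marking a parent that is not recorded), the following Python lists trios in which both parents also appear in the file. The function names and sample IDs are hypothetical; check the header of the actual file before relying on these column positions.

```python
import csv
import io
from typing import List, NamedTuple, TextIO, Tuple


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str


def read_pedigree(handle: TextIO) -> List[PedRecord]:
    """Read a tab-separated pedigree file that starts with a header row.

    Assumption: the first four columns are family ID, individual ID,
    paternal ID and maternal ID, with "0" meaning "parent not recorded".
    Any further columns are ignored.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    records = []
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or truncated lines
        fam, ind, pat, mat = (field.strip() for field in row[:4])
        records.append(PedRecord(fam, ind, pat, mat))
    return records


def find_trios(records: List[PedRecord]) -> List[Tuple[str, str, str]]:
    """Return (child, father, mother) for children whose parents are both listed."""
    listed = {r.individual_id for r in records}
    return [
        (r.individual_id, r.paternal_id, r.maternal_id)
        for r in records
        if r.paternal_id in listed and r.maternal_id in listed
    ]


# Hypothetical example data in the assumed layout.
example = (
    "Family ID\tIndividual ID\tPaternal ID\tMaternal ID\n"
    "FAM1\tKID1\tDAD1\tMUM1\n"
    "FAM1\tDAD1\t0\t0\n"
    "FAM1\tMUM1\t0\t0\n"
    "FAM2\tSOLO1\t0\t0\n"
)
print(find_trios(read_pedigree(io.StringIO(example))))
# [('KID1', 'DAD1', 'MUM1')]
```

With a real file, pass a handle from `open(path, newline="")` in place of `io.StringIO`.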

Relationship information is also recorded in our data portal for all samples. In addition, Coriell and CEPH hold sample relationship information alongside the cell lines.


Which datasets include related individuals?


The 1000 Genomes Project, which concluded in 2015, focused its analysis on unrelated individuals: the panel analysed consisted of 2,504 unrelated individuals. However, the project did generate data for some related individuals, and these samples are listed in the files relating to related individuals.

Subsequent projects have included related individuals in their full analyses. These include the initial phase and subsequent work of the Human Genome Structural Variation Consortium (HGSVC), and the generation of high-coverage data for the full set of 1000 Genomes Project samples. That high-coverage set contains the 2,504 unrelated individuals plus 698 samples related to individuals in the 2,504 panel, for a total of 3,202 samples with high-coverage data.

To check the relationships in a given data set, consult the accompanying publications, pedigree files and similar documentation. Individual sample relationships are also listed in our data portal.
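Once relationships have been loaded from a pedigree file, checking a particular data set largely comes down to asking which of its samples have a recorded parent that is also present in the same set. The sketch below assumes the pedigree has already been reduced to a mapping from individual ID to (paternal ID, maternal ID), with "0" for a parent that is not recorded; the function name and IDs are hypothetical. It only detects parent-child pairs, so siblings and more distant relationships noted in a pedigree file would need separate handling.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical pedigree layout: individual ID -> (paternal ID, maternal ID),
# with "0" marking a parent that is not recorded.
Pedigree = Dict[str, Tuple[str, str]]


def related_pairs_in_dataset(
    samples: Iterable[str], pedigree: Pedigree
) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs where both samples are in the data set."""
    present: Set[str] = set(samples)
    pairs = []
    for child in sorted(present):
        # Samples missing from the pedigree are treated as having no recorded parents.
        father, mother = pedigree.get(child, ("0", "0"))
        for parent in (father, mother):
            if parent != "0" and parent in present:
                pairs.append((child, parent))
    return pairs


# Hypothetical example: the mother is not part of this data set.
pedigree = {
    "KID1": ("DAD1", "MUM1"),
    "DAD1": ("0", "0"),
    "MUM1": ("0", "0"),
    "SOLO1": ("0", "0"),
}
dataset = ["KID1", "DAD1", "SOLO1"]
print(related_pairs_in_dataset(dataset, pedigree))
# [('KID1', 'DAD1')]
```

An empty result only means no parent-child pairs are recorded within the set, not that the samples are confirmed to be unrelated.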
