Were the same analysis tools used for every sample in one data collection?

Answer:

The analysis tools used for samples in a given data collection can vary depending on the technologies (e.g. PacBio, exome sequencing) used to generate data for each sample, and these technologies may not be the same across all samples in a collection.

Generally, however, for any given analysis the data types will be the same or very similar and will have been analysed in a consistent manner.

For data collections whose analysis has been published, the publication will describe the methods used; these details are sometimes found only in the supplementary material. We list publications on our data collections page. For analyses that may not have been published, README files in each data collection's directory on our FTP site provide further information.


Which datasets include related individuals?

Answer:

The 1000 Genomes Project, which concluded in 2015, focussed its analysis on unrelated individuals: the panel analysed comprised 2,504 unrelated individuals. However, the project did also generate data for some related individuals, and these are listed in the project's files on related samples.

Subsequent projects have included related individuals in their full analyses. These include the initial phase and subsequent work of the Human Genome Structural Variation Consortium (HGSVC), and the generation of high-coverage data for the full set of 1000 Genomes Project samples. The high-coverage data set covers the 2,504 unrelated individuals plus 698 samples related to individuals in that panel, giving a total of 3,202 samples.

To check the relationships in a given data set, consult the accompanying publications, pedigree files and similar resources. Individual sample relationships are also listed in our data portal.
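If you want to check relationships programmatically, a pedigree file can be read with a few lines of code. The sketch below is a minimal, hypothetical example: it assumes the common six-column PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), whitespace-separated, with "0" meaning an unknown parent. The function names and sample IDs are illustrative only, and the pedigree files we distribute may have extra columns or a header line, so check each file's README before relying on this.

```python
"""Minimal sketch: list parent-child links recorded in a PED-style pedigree file.

Assumes (hypothetically) the six-column PED layout: family ID, individual ID,
paternal ID, maternal ID, sex, phenotype; whitespace-separated; "0" = unknown.
"""
from typing import Dict, List, Tuple


def parse_ped(lines: List[str], skip_header: bool = False) -> Dict[str, Tuple[str, str, str]]:
    """Map each individual ID to (family ID, father ID, mother ID)."""
    samples: Dict[str, Tuple[str, str, str]] = {}
    for index, line in enumerate(lines):
        line = line.strip()
        if skip_header and index == 0:
            continue  # ignore a column-name line if the file has one
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        family_id, individual_id, father_id, mother_id = line.split()[:4]
        samples[individual_id] = (family_id, father_id, mother_id)
    return samples


def parent_child_links(samples: Dict[str, Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs where the parent is also listed in the file."""
    links: List[Tuple[str, str]] = []
    for child, (_, father, mother) in samples.items():
        for parent in (father, mother):
            if parent != "0" and parent in samples:
                links.append((child, parent))
    return links


if __name__ == "__main__":
    # Hypothetical example rows, not real sample IDs
    example = [
        "FAM1 CHILD1 DAD1 MUM1 1 0",
        "FAM1 DAD1 0 0 1 0",
        "FAM1 MUM1 0 0 2 0",
        "FAM2 SOLO1 0 0 2 0",
    ]
    for child, parent in parent_child_links(parse_ped(example)):
        print(f"{child} is a child of {parent}")
```

Only parents who are themselves present in the file are reported, so a sample whose parents were never sequenced will not show up as related by this check.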


Why are there gaps in the X and Y chromosomes?

Answer:

The X and Y chromosomes frequently require different analytical approaches because, unlike the autosomes (chromosomes 1-22), they are not present in two copies in every sample. As a result, the X and Y chromosomes are not always included in analyses, or may be analysed using an alternative methodology.

The pseudoautosomal regions (PARs) are regions of the X and Y chromosomes that share homology. These regions can therefore be analysed in a similar manner to the autosomes, and may be included in analyses where the remainder of these chromosomes is excluded.

The analysis of the high-coverage sequencing of the 1000 Genomes Project samples by the New York Genome Center (NYGC) covered both the X and Y chromosomes in full on GRCh38. Both chromosomes were also analysed in phase 3 of the original 1000 Genomes Project, which used GRCh37. Full details are in the associated publications, listed alongside the data collections.

Other collections, such as the reanalysis of the original 1000 Genomes Project data realigned to GRCh38, consider only the PARs. Again, to understand the work that was done, we recommend consulting the publications listed with the data collections.
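To see whether a given chrX position would fall inside a PAR, and so might appear in a PAR-only collection, a simple interval check is enough. The sketch below assumes the GRC's published GRCh38 chrX PAR boundaries (1-based, inclusive); the function and constant names are hypothetical, and you should confirm the coordinates against the reference files and chromosome naming you are actually using. It does not reproduce how any particular collection filtered its variants.

```python
"""Minimal sketch: is a GRCh38 chrX position inside a pseudoautosomal region?

Assumes the GRC-published GRCh38 chrX PAR boundaries, 1-based and inclusive.
Coordinates differ on GRCh37 and on chrY, so do not reuse them there.
"""
from typing import Optional

# Hypothetical constant name; boundaries are GRCh38 chrX PAR1 and PAR2
GRCH38_CHRX_PARS = {
    "PAR1": (10_001, 2_781_479),
    "PAR2": (155_701_383, 156_030_895),
}


def grch38_chrx_par(position: int) -> Optional[str]:
    """Return "PAR1"/"PAR2" if the 1-based chrX position is in a PAR, else None."""
    for name, (start, end) in GRCH38_CHRX_PARS.items():
        if start <= position <= end:
            return name
    return None


if __name__ == "__main__":
    for pos in (60_000, 5_000_000, 155_800_000):
        print(pos, grch38_chrx_par(pos) or "non-PAR")
```

A position such as 5,000,000 falls outside both PARs, so in a collection that considers only the PARs you would expect no calls there.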
