
Are all the genotype calls in the 1000 Genomes Project current release VCF files bi-allelic?

Answer:

No. While bi-allelic calling was used in earlier phases of the 1000 Genomes Project, multi-allelic SNPs, indels, and a diverse set of structural variants (SVs) were called in the final phase 3 call set. More information can be found in the main phase 3 publication from the 1000 Genomes Project and the structural variation publication. The supplementary information for both papers provides further detail.

In earlier phases of the 1000 Genomes Project, the programs used for genotyping were unable to genotype sites with more than two alleles. In most cases, the highest-frequency alternative allele was chosen and genotyped. Depth of coverage, base quality and mapping quality were also used when making this decision. This was the approach used in phase 1 of the 1000 Genomes Project. As methods were developed over the course of the project, we recommend using the final phase 3 data in preference to earlier call sets.
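As a rough illustration of how to check this in a VCF file, the sketch below flags records with more than one alternative allele. It relies only on the VCF convention that the fifth tab-separated column (ALT) lists alternative alleles separated by commas. The helper names and the two example records are hypothetical and are not taken from the 1000 Genomes release files.

```python
def alt_alleles(vcf_line):
    """Return the ALT alleles of one tab-separated VCF data line as a list."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]  # column 5 of a VCF data line is ALT
    # "." means no alternative allele is recorded at this site
    return [] if alt == "." else alt.split(",")


def is_multiallelic(vcf_line):
    """True when the record lists more than one alternative allele."""
    return len(alt_alleles(vcf_line)) > 1


# Hypothetical example records (illustrative values only)
biallelic_record = "1\t1000\t.\tA\tG\t100\tPASS\tAC=1\tGT\t0|1"
multiallelic_record = "1\t2000\t.\tA\tG,T\t100\tPASS\tAC=1,2\tGT\t1|2"

for record in (biallelic_record, multiallelic_record):
    print(record.split("\t")[1], alt_alleles(record), is_multiallelic(record))
```

Header lines (those starting with `#`) should be skipped before calling these helpers on a real file.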


Can I get individual genotype information from browser.1000genomes.org?

Answer:

The 1000 Genomes browser at browser.1000genomes.org has been retired, but all data is accessible from the Ensembl browser at grch37.ensembl.org. You can see individual genotype information in the browser by looking at the Sample Genotypes section of a variant page. This can be reached from the menu on the left-hand side of the page. You can find a particular variant by putting its rs number in the search box visible at the top right-hand corner of every browser page.


What does Genotype Dosage mean in the phase 1 integrated call set?

Answer:

The Genotype Dosage value comes from MaCH/Thunder, the imputation engine used for genotype refinement in the phase 1 data set.

The Dosage represents the predicted dosage of the non-reference allele given the data available; it will always have a value between 0 and 2.

The formula is Dosage = Pr(Het|Data) + 2*Pr(Alt|Data)

The dosage value gives an indication of how well the genotype is supported by the imputation engine. The genotype likelihood gives an indication of how well the genotype is supported by the sequence data.
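The dosage formula can be sketched in a few lines of Python. This is only an illustration: the function name and the probabilities are hypothetical, and it assumes that Pr(Alt|Data) in the formula refers to the homozygous non-reference genotype and that the three genotype probabilities sum to 1.

```python
def genotype_dosage(p_ref, p_het, p_alt):
    """Expected number of non-reference alleles: Pr(Het|Data) + 2*Pr(Alt|Data).

    p_ref, p_het and p_alt are the probabilities of the homozygous reference,
    heterozygous and homozygous non-reference genotypes (assumed to sum to 1).
    """
    total = p_ref + p_het + p_alt
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_het + 2.0 * p_alt


# Hypothetical probabilities, not taken from the phase 1 call set
print(genotype_dosage(1.0, 0.0, 0.0))  # confident homozygous reference
print(genotype_dosage(0.0, 0.5, 0.5))  # split between het and hom non-reference
print(genotype_dosage(0.0, 0.0, 1.0))  # confident homozygous non-reference
```

Because each genotype contributes 0, 1 or 2 copies of the non-reference allele, the result stays between 0 and 2, and values far from a whole number indicate a less certain genotype.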


What is the HLA Diversity in 1000 Genomes Samples?

Answer:

HLA diversity was not studied directly by the 1000 Genomes Project; however, other groups have looked at the HLA diversity of the samples in the 1000 Genomes Project.

2018 data

The most recent of these studies was published by Laurent Abi-Rached, Julien Paganini and colleagues in 2018 and covers 2,693 samples from the work of the 1000 Genomes Project. Details of the study and data used in this work are available via the publication and the HLA types are available on our FTP site at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HLA_types/.

2014 data

The FTP site also hosts data from an earlier study by Pierre-Antoine Gourraud, Jorge Oksenberg and colleagues at UCSF, who carried out an HLA typing assay on DNA sourced from Coriell for 1000 Genomes samples. This earlier study looks at only the 1,267 samples that were available at that time.

The earlier work assessing HLA diversity is published in “HLA diversity in the 1000 Genome Dataset”, with data available from the 1000 Genomes FTP site at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20140725_hla_genotypes/.
