How do I find out information about a single variant?

Our VCF files contain global and super-population alternative allele frequencies; you can see this in our most recent release. For multi-allelic variants, the frequency of each alternative allele is given in a comma-separated list.

An example record whose INFO column contains this information looks like this:

1 15211 rs78601809 T G 100 PASS AC=3050;AF=0.609026;AN=5008;NS=2504;DP=32245;EAS_AF=0.504;AMR_AF=0.6772;AFR_AF=0.5371;EUR_AF=0.7316;SAS_AF=0.6401;AA=t|||;VT=SNP
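To pull a single value out of a record like this one, for example EUR_AF, a small awk helper is enough. This is a minimal sketch, assuming one uncompressed VCF data line with the INFO column in column 8, as the VCF format defines; the function name info_get is hypothetical and is not a vcftools command.

```shell
# info_get LINE KEY
# Print the value of KEY from the INFO column (column 8) of one VCF data line.
# Prints nothing if KEY is absent or is a flag without a value.
# Hypothetical helper, not part of vcftools.
info_get() {
  printf '%s\n' "$1" | awk -v key="$2" '
    {
      n = split($8, fields, ";")            # INFO entries are ;-separated
      for (i = 1; i <= n; i++) {
        eq = index(fields[i], "=")
        # exact key match, so AF does not match EUR_AF
        if (eq > 0 && substr(fields[i], 1, eq - 1) == key) {
          print substr(fields[i], eq + 1)
          exit
        }
      }
    }'
}

# awk splits on runs of spaces or tabs, so this works on the example as
# printed here and on tab-separated VCF lines.
line='1 15211 rs78601809 T G 100 PASS AC=3050;AF=0.609026;AN=5008;NS=2504;DP=32245;EAS_AF=0.504;AMR_AF=0.6772;AFR_AF=0.5371;EUR_AF=0.7316;SAS_AF=0.6401;AA=t|||;VT=SNP'
for pop in EAS AMR AFR EUR SAS; do
  printf '%s\t%s\n' "$pop" "$(info_get "$line" "${pop}_AF")"
done
```

For a multi-allelic site, the value printed is the whole comma-separated list, one entry per alternative allele.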

If you want population-specific allele frequencies, you have three options:

  * For a single variant, you can look at the population genetics page for that variant in the Ensembl browser. This gives you pie charts and a table for a single site.
  * For a genomic region, you can use our allele frequency calculator tool, which gives a set of allele frequencies for selected populations.
  * If you would like sub-population allele frequencies for a whole file, you are best off using the vcftools command-line tool.

This is done by combining two vcftools commands: vcf-subset, which keeps only the sample columns you choose, and fill-an-ac, which recalculates the AC and AN INFO fields for that subset.

An example set of commands using files from our phase 1 release, extracting the CEU samples for chromosome 13, would look like this:

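```shell
# List the CEU sample IDs (column 1 of the panel file)
grep CEU integrated_call_samples.20101123.ALL.panel | cut -f1 > CEU.samples.list

# Keep only the CEU samples, recalculate AC/AN for them, and compress the result
vcf-subset -c CEU.samples.list ALL.chr13.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz | fill-an-ac |
    bgzip -c > CEU.chr13.phase1.vcf.gz
```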

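Once you have this file, you can calculate the allele frequency of each variant by dividing AC (allele count) by AN (allele number). For the example record above, 3050 / 5008 ≈ 0.609026, which matches its AF value.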
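The division can be scripted over a whole file. The sketch below assumes uncompressed VCF text on standard input; ac_an_freq is a hypothetical helper name, not a vcftools command. Because multi-allelic sites carry one comma-separated AC value per alternative allele, it prints one frequency per allele.

```shell
# ac_an_freq: read VCF text on stdin and print CHROM, POS, ID and AC/AN for
# each record. Multi-allelic sites carry one comma-separated AC value per ALT
# allele, so one frequency per allele is printed, comma-separated.
# Hypothetical helper name; not part of vcftools.
ac_an_freq() {
  awk '
    /^#/ { next }                              # skip meta-information and header lines
    {
      ac = ""; an = ""
      n = split($8, info, ";")                 # INFO is column 8, entries are ;-separated
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^AC=/) ac = substr(info[i], 4)
        else if (info[i] ~ /^AN=/) an = substr(info[i], 4)
      }
      if (ac == "" || an == "" || an + 0 == 0) next   # nothing to divide
      k = split(ac, counts, ",")
      out = ""
      for (j = 1; j <= k; j++)
        out = out (j > 1 ? "," : "") sprintf("%.6f", counts[j] / an)
      print $1, $2, $3, out
    }'
}

# Example: frequencies for the CEU subset file from the commands above
# (bgzip output can be read with gzip -dc).
# gzip -dc CEU.chr13.phase1.vcf.gz | ac_an_freq
```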

Please note that some early VCF files from the main project used LD information and other variables to help estimate the allele frequency. This means that in these files AF does not always equal AC/AN. In the phase 1 and phase 3 releases, AC/AN should always match the allele frequency quoted.
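If you are unsure which kind of file you have, you can check it directly. This is a minimal sketch: af_mismatches is a hypothetical helper name, and the default tolerance of 0.0001 is an assumed value meant to absorb the rounding of AF to a few decimal places. It prints every record whose quoted AF differs from AC/AN by more than the tolerance, comparing multi-allelic sites allele by allele, so for phase 1 and phase 3 files it would normally print nothing.

```shell
# af_mismatches [TOLERANCE] < file.vcf
# Print records whose quoted AF differs from AC/AN by more than TOLERANCE.
# The default of 0.0001 is an assumption, not a project-defined threshold.
# Hypothetical helper name; not part of vcftools.
af_mismatches() {
  awk -v tol="${1:-0.0001}" '
    /^#/ { next }                              # skip header lines
    {
      ac = ""; an = ""; af = ""
      n = split($8, info, ";")                 # INFO is column 8
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^AC=/) ac = substr(info[i], 4)
        else if (info[i] ~ /^AN=/) an = substr(info[i], 4)
        else if (info[i] ~ /^AF=/) af = substr(info[i], 4)
      }
      if (ac == "" || an == "" || af == "" || an + 0 == 0) next
      k = split(ac, counts, ",")               # one AC per ALT allele
      split(af, freqs, ",")                    # one AF per ALT allele
      for (j = 1; j <= k; j++) {
        d = counts[j] / an - freqs[j]
        if (d < 0) d = -d
        if (d > tol + 0) {                     # report the record once
          print $1, $2, $3, "AF=" af, "AC/AN=" counts[j] / an
          next
        }
      }
    }'
}

# Example (file name illustrative):
# gzip -dc ALL.chr13.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz | af_mismatches
```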

Lists of identifiers

You can get information about a list of variant identifiers using Ensembl’s BioMart.

This YouTube video gives a tutorial on how to do it.

The basic steps are:

  1. Select the Ensembl Variation Database
  2. Select the Homo sapiens Short Variants (SNPs and indels excluding flagged variants) dataset
  3. Select the Filters menu from the left hand side
  4. Expand the General Variant Filters section
  5. Check the Filter by Variant Name (e.g. rs123, CM000001) [Max 500 advised] box
  6. Add your list of rs numbers to the box or browse for a file which contains this list
  7. Click on the Results Button in the headline section
  8. This should provide you with a table of results which you can also download in Excel or CSV format

If you would like coordinates on GRCh38, you should use the main Ensembl site; if you would like coordinates on GRCh37, you should use the dedicated GRCh37 site.
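The same lookup can also be scripted, since BioMart accepts a query written as XML (the web interface shows the XML for the query you have built via its XML button). The sketch below builds such a query from a file of rs IDs, one per line. The dataset name hsapiens_snp, the filter snp_filter and the attribute names are assumptions based on Ensembl's variation mart, so compare them with the XML exported from your own web query before relying on them; biomart_query_xml is a hypothetical helper name.

```shell
# biomart_query_xml IDS_FILE
# Build a BioMart XML query for the rs IDs listed one per line in IDS_FILE.
# ASSUMPTIONS: dataset "hsapiens_snp", filter "snp_filter" and the attribute
# names below; verify them against the XML exported by the BioMart web page.
biomart_query_xml() {
  # Drop blank lines and Windows line endings, then join the IDs with commas
  ids=$(grep -v '^[[:space:]]*$' "$1" | tr -d '\r' | paste -sd, -)
  n=$(grep -c -v '^[[:space:]]*$' "$1")
  if [ "$n" -gt 500 ]; then
    echo "warning: $n IDs; BioMart advises at most 500 per query" >&2
  fi
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="hsapiens_snp" interface="default">
    <Filter name="snp_filter" value="$ids"/>
    <Attribute name="refsnp_id"/>
    <Attribute name="chr_name"/>
    <Attribute name="chrom_start"/>
    <Attribute name="allele"/>
  </Dataset>
</Query>
EOF
}

# Example usage (network step commented out):
# printf 'rs78601809\n' > ids.txt
# biomart_query_xml ids.txt > query.xml
# GRCh38 coordinates:
# curl -s --data-urlencode 'query@query.xml' https://www.ensembl.org/biomart/martservice > results.tsv
# GRCh37 coordinates: use https://grch37.ensembl.org/biomart/martservice instead
```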
