This directory contains files associated with the variant calling carried out for phase1 of the 1000 Genomes Project, together with other ancillary files associated with the phase1 analysis.
The phase1 analysis results directory contains a number of subdirectories with different content, which are listed here.
This directory contains information about the local ancestry inference carried out on the admixed populations in the 1000 Genomes phase1 samples: African Americans (ASW), Colombians (CLM), Mexicans (MXL) and Puerto Ricans (PUR).
These directories contain the consensus call sets and genotype likelihoods which were used to produce the final integrated release. Please note that the indel file in this directory still contains indels which were subsequently filtered out of our integrated data release as a result of validation efforts. These can be identified by looking at the excluded_indel_sites list [EBI|NCBI].
This directory contains information about which sites were validated for the different variant types and the results of the validation processes.
This contains two directories: annotation_sets, which contains bed and gtf files describing the gene and non-coding annotation which our variant sets were compared with, and annotation_vcfs, which contains the actual variant annotation in vcf format.
This directory contains all the union call sets for the snps (both low coverage and exome), indels and deletions that make up the integrated release. The directory contains several vcf files; in each file, any variant whose filter column reads PASS should be part of the integrated release.
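As an illustration of the PASS rule described above, the following is a minimal sketch of how the passing variants could be selected from one of these vcf files. It assumes only the standard tab-separated vcf layout, in which the filter value is the seventh column; the function names pass_records and open_vcf are hypothetical and are not part of any 1000 genomes tooling.

```python
import gzip
from typing import Iterable, Iterator


def pass_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield vcf data lines whose filter column (7th field) is exactly PASS."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        # Standard vcf columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


def open_vcf(path: str):
    """Open a plain or gzip-compressed vcf file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Small in-memory example; the records below are made up for illustration.
    example = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\trs1\tA\tG\t50\tPASS\t.\n",
        "1\t200\trs2\tC\tT\t10\tLowQual\t.\n",
    ]
    for record in pass_records(example):
        print(record, end="")
```

To apply the same filter to a real file, the lines can be read with open_vcf, for example `with open_vcf(path) as handle: kept = list(pass_records(handle))`, where path points at one of the vcf files in this directory.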
This directory contains our final variant calls for the phase1 data sets. The majority of the data in this directory is identical to what can be found in ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521, but this directory also contains chrY calls for snps and deletions and chrMT calls for snps.