The MAGE RNA-seq study provides transcriptome data from a large sample of 1000 Genomes Project cell lines. Poly-A selected bulk RNA from 731 lymphoblastoid cell lines (LCLs) spanning all 26 canonical 1000G populations was sequenced on the Illumina NovaSeq 6000 platform (150 bp paired-end, unstranded). A subset of 24 samples was sequenced in triplicate, with replicates both within and between sequencing batches.

The work is described in the article “Sources of gene expression variation in a globally diverse human cohort” by Taylor et al. (2024).

Available data:

Raw FASTQ reads are available from the Sequence Read Archive (accession: PRJNA851328) and are hosted on the ENA FTP server (ftp.sra.ebi.ac.uk). Forward (R1) and reverse (R2) paired-end reads are provided as separate files. These files can be browsed by sample, population, and data type below.
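For scripted access, the R1 and R2 files for each run can be paired from an ENA file report. The sketch below is one possible approach, not part of the MAGE pipeline. It assumes the ENA Portal API "filereport" endpoint and its fastq_ftp field, which lists a run's files separated by semicolons, and it assumes that paired files end in _1.fastq.gz and _2.fastq.gz. The run accession and path in the example row are placeholders, not real MAGE records.

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API file-report endpoint (an assumption of this sketch;
# it is not named on this page).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession="PRJNA851328"):
    """Build the file-report query URL for a study accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,sample_alias,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_filereport(tsv_text):
    """Map each run accession to its (R1, R2) FASTQ URLs.

    Runs without both a _1 and a _2 file are skipped.
    """
    pairs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        urls = [u for u in row["fastq_ftp"].split(";") if u]
        r1 = next((u for u in urls if u.endswith("_1.fastq.gz")), None)
        r2 = next((u for u in urls if u.endswith("_2.fastq.gz")), None)
        if r1 and r2:
            # The report lists hosts without a scheme, so add one.
            pairs[row["run_accession"]] = ("ftp://" + r1, "ftp://" + r2)
    return pairs


# Placeholder row: made-up run accession and path, for illustration only.
EXAMPLE_TSV = (
    "run_accession\tsample_alias\tfastq_ftp\n"
    "SRR0000001\tEXAMPLE\t"
    "ftp.sra.ebi.ac.uk/example/SRR0000001_1.fastq.gz;"
    "ftp.sra.ebi.ac.uk/example/SRR0000001_2.fastq.gz\n"
)

if __name__ == "__main__":
    print(filereport_url())
    for run, (r1, r2) in parse_filereport(EXAMPLE_TSV).items():
        print(run, r1, r2)
```

To use this against the real study, download the text returned by filereport_url() and pass it to parse_filereport(). The resulting URLs can then be fetched with any FTP-capable client.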

Processed gene expression matrices, eQTL and sQTL mapping results, and other downstream data are described and linked on the MAGE GitHub repository and are available for download from Zenodo.

Cell lines:

The data in this collection are from immortalised cell lines generated from samples collected by the 1000 Genomes Project. These cell lines are available from the Coriell Institute.

Questions:

Questions about these data can be directed to Rajiv McCoy (rajiv.mccoy@jhu.edu).