Cell Line - Methods summary

Summary

The Cell Lines section contains information on genome-wide RNA expression profiles of human protein-coding genes in human cell lines. The transcriptomics analysis covers 69 human cell lines and includes classification based on specificity, distribution and expression clusters.

Key publication

Uhlen M et al. (2019) “A genome-wide transcriptomic analysis of protein-coding genes in human blood cells” Science 366 (6472): aax9198

What can you learn from the Cell Lines section?

Learn about:

  • if a gene is enriched in a particular human cell line (specificity)
  • which genes have a similar expression profile across the cell lines (expression cluster)
  • the catalogue of genes elevated in each of the cell lines

How has the data been generated?

A genome-wide expression analysis of 69 human cell lines was performed using RNA-seq, with early-split samples serving as duplicates.

How has the data been analyzed?

The transcript abundance of each protein-coding gene was estimated using the average TPM (transcripts per million) value of the individual samples for each cell line. The transcriptomics data was then used to cluster both the cell lines and the genes in order to find (i) the relation between the cell lines and (ii) the highest correlating genes, and further to classify all genes according to their cell line-specific expression.
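The averaging step described above can be sketched as follows. This is a minimal illustration and not the pipeline used to generate the data: the function name, the dictionary layout, the sample and cell line identifiers, and the TPM values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_tpm_per_cell_line(sample_tpm, sample_to_cell_line):
    """Average per-sample TPM values into one value per gene and cell line.

    sample_tpm:          {sample_id: {gene: tpm}}
    sample_to_cell_line: {sample_id: cell_line}
    Returns              {gene: {cell_line: mean_tpm}}
    """
    # Collect the TPM values of all replicate samples per gene and cell line.
    grouped = defaultdict(lambda: defaultdict(list))
    for sample_id, gene_tpm in sample_tpm.items():
        cell_line = sample_to_cell_line[sample_id]
        for gene, tpm in gene_tpm.items():
            grouped[gene][cell_line].append(tpm)

    # Reduce each list of replicate values to its arithmetic mean.
    return {
        gene: {cell_line: mean(values) for cell_line, values in by_line.items()}
        for gene, by_line in grouped.items()
    }


# Hypothetical example: two duplicate samples for LINE_1, one sample for LINE_2.
example_tpm = {
    "s1": {"GENE_A": 10.0, "GENE_B": 0.0},
    "s2": {"GENE_A": 14.0, "GENE_B": 2.0},
    "s3": {"GENE_A": 1.0, "GENE_B": 5.0},
}
example_samples = {"s1": "LINE_1", "s2": "LINE_1", "s3": "LINE_2"}
averaged = mean_tpm_per_cell_line(example_tpm, example_samples)
```

Keeping the replicate values in lists until the final step makes it easy to swap the mean for another summary statistic if needed.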

What is presented in the section?

The RNA expression levels were determined for all protein-coding genes (n = 20090) across the 69 human cell lines and the results are presented on the gene summary page of the Cell Lines section as exemplified in the figure.


How has the classification of all protein-coding genes been done?

A genome-wide classification of the protein-coding genes with regard to cell line distribution as well as specificity has been performed using between-sample normalized data. The results can serve as a reference for researchers interested in expression profiles of human cell lines. The genes were classified according to specificity into (i) cell line enriched genes with at least four-fold higher expression levels in one cell line as compared with any other analyzed cell line; (ii) group-enriched genes with enriched expression in a small number of cell lines (2 to 5); and (iii) cell line enhanced genes with only moderately elevated expression. In addition, all genes were classified according to distribution, in which each gene is scored according to its presence (expression levels higher than a cut-off) in the cell lines.
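The specificity and distribution rules above can be illustrated with a short sketch. Only the four-fold rule for cell line enriched genes and the group size of 2 to 5 come from the text. The group rule used here (group mean at least four-fold above the highest value outside the group) is an assumption, the enhanced category is not resolved because no threshold is stated for it, and the detection cut-off is left as a required argument because its value is not given. All function names and expression values are hypothetical.

```python
def classify_specificity(expr, fold=4.0, max_group=5):
    """Classify one gene from its {cell_line: expression} profile.

    Returns (category, cell_lines), where category is one of
    "cell line enriched", "group enriched" or "other".
    """
    ranked = sorted(expr.items(), key=lambda item: item[1], reverse=True)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]

    if len(values) < 2 or values[0] <= 0:
        return "other", []

    # Rule from the text: at least four-fold higher in one cell line
    # than in any other cell line.
    if values[0] >= fold * values[1]:
        return "cell line enriched", names[:1]

    # ASSUMED rule: the mean of the top k cell lines (k = 2..max_group) is at
    # least four-fold higher than the highest value outside that group.
    for k in range(2, min(max_group, len(values) - 1) + 1):
        group_mean = sum(values[:k]) / k
        if group_mean >= fold * values[k]:
            return "group enriched", names[:k]

    # Enhanced and non-specific genes are not separated in this sketch.
    return "other", []


def count_detected(expr, cutoff):
    """Distribution score: number of cell lines with expression above cutoff."""
    return sum(1 for value in expr.values() if value > cutoff)


# Hypothetical profile with two clearly elevated cell lines.
example = {"LINE_1": 100.0, "LINE_2": 90.0, "LINE_3": 10.0, "LINE_4": 5.0}
category, lines = classify_specificity(example)
```

Sorting the profile once lets both rules be checked by comparing neighboring positions in the ranked list.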

The cell line enriched and group-enriched genes are displayed in the interactive plot below, in which clicking on the red and orange circles brings up gene lists for the corresponding cell line enriched and group-enriched genes, respectively.

Finally, a new classification has been introduced in which genes are clustered based on similarity in expression across the cell lines. The results are presented as an interactive UMAP plot in which mouse-over displays general information for the clusters, and clicking on a cluster displays more information and plots for that specific cluster, as well as a clickable list of all clusters.
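The text does not state which similarity measure or clustering algorithm underlies the expression clusters. As a rough sketch of what "similarity in expression across the cell lines" can mean, the example below assumes Pearson correlation between gene profiles and finds the highest correlating gene for a query gene. It does not perform clustering or UMAP projection, and the function names, gene names, and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A flat profile has no defined correlation; treat it as uncorrelated.
        return 0.0
    return cov / (sd_x * sd_y)


def most_similar_gene(query, profiles):
    """Return (gene, correlation) for the gene correlating best with query.

    profiles: {gene: [expression in cell line 1, cell line 2, ...]}
    """
    scores = {
        gene: pearson(profiles[query], values)
        for gene, values in profiles.items()
        if gene != query
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical profiles across four cell lines.
example_profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
best_gene, best_r = most_similar_gene("GENE_A", example_profiles)
```

A pairwise similarity of this kind is a typical input to clustering and to neighbor-based embeddings, although the specific method used for the section is not described here.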