The Pathology section - methods summary

Summary

The pathology/cancer section consists of two parts: 1) information on the association between genome-wide RNA expression levels and the survival of cancer patients (for nearly 8000 cancer patients representing 17 major types of cancer), and 2) examples of protein expression patterns in cancer tissues (for 216 tumors representing the 20 most common forms of human cancer).

Key publication

Uhlen M et al. (2017) “A pathology atlas of the human cancer transcriptome.” Science 357 (6352): aan2507

What can you learn from the Pathology section?

Learn about:

if the mRNA expression of a gene is prognostic for patient survival in each of the cancer types
if a gene is enriched in a particular cancer type (specificity)
the catalogue of genes elevated in each of the cancer types

How has the data been generated?

Cancer tissues used for protein expression analysis were obtained from the Department of Pathology, Uppsala University Hospital, Uppsala, Sweden, as part of the sample collection governed by the Uppsala Biobank (http://www.uppsalabiobank.uu.se/en/). Cases were selected after microscopic examination of representative HE sections. Cores with a diameter of 1 mm were subsequently obtained from the corresponding tissue blocks and transferred into cancer tissue microarrays. All human tissue samples used in the present study were anonymized in accordance with the approval and advisory report from the Uppsala Ethical Review Board.

Cancer patient samples used for mRNA expression and survival analysis were collected from The Cancer Genome Atlas (TCGA) project, from the initial release of the Genomic Data Commons (GDC) on June 6, 2016. Information regarding sex, age and other clinical variables can be found at https://gdc-portal.nci.nih.gov/. Only samples with both clinical information and transcriptomic data available at that time point were used in this study.

How has the data been analyzed?

For protein expression analysis, sections from cancer tissue microarrays were immunohistochemically stained and corresponding slides scanned to generate digital images. All images were then analyzed by pathologists and annotated with respect to staining intensity and fraction of positive cancer cells for all approved antibodies. The result of immunohistochemistry-based protein expression was then summarized as high, medium, low or not detected.

For RNA expression analysis, quantified raw sequencing data were downloaded from https://gdc-portal.nci.nih.gov/ in the available format (FPKM tables). Each of the 20,090 genes with mapped RNA-seq data was classified into one of six categories based on its FPKM levels in the 17 cancer types:

(1) Not detected: FPKM <1 in all cancers
(2) Enriched: at least a 5-fold higher FPKM level in one cancer than in all other cancers
(3) Group enriched: a 5-fold higher average FPKM value in a group of 2-7 cancers than in all other cancers
(4) Expressed in all: detected in all 17 cancers with FPKM >1
(5) Enhanced: at least a 5-fold higher FPKM level in one cancer than the average value of all 17 cancers
(6) Mixed: the remaining genes detected in 1-16 cancers with FPKM >1 that did not fit the above categories
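The six categories can be read as a decision procedure. The following is a minimal sketch of one such reading, not the pipeline used to build the atlas. It assumes that the categories are tested in the order listed, that a gene counts as detected at FPKM of 1 or above, and that "all other cancers" is represented by the highest FPKM among the remaining cancers. The function name `classify_gene` and the dictionary input are hypothetical.

```python
from statistics import mean

FOLD = 5.0      # fold-change threshold stated in the text
DETECTED = 1.0  # FPKM detection threshold stated in the text


def classify_gene(fpkm_by_cancer):
    """Assign one of the six categories from a {cancer: FPKM} mapping.

    Assumption: categories are checked in the order they are listed,
    so an earlier match takes precedence over a later one.
    """
    values = sorted(fpkm_by_cancer.values(), reverse=True)
    n = len(values)
    n_detected = sum(v >= DETECTED for v in values)

    if n_detected == 0:
        return "Not detected"
    # Enriched: the top cancer is at least 5-fold above every other cancer.
    if values[0] >= FOLD * values[1]:
        return "Enriched"
    # Group enriched: the mean of the top k cancers (k = 2..7) is at least
    # 5-fold above the highest of the remaining cancers.
    for k in range(2, min(7, n - 1) + 1):
        if mean(values[:k]) >= FOLD * values[k]:
            return "Group enriched"
    if n_detected == n:
        return "Expressed in all"
    # Enhanced: the top cancer is at least 5-fold above the overall mean.
    if values[0] >= FOLD * mean(values):
        return "Enhanced"
    return "Mixed"
```

For example, a gene with FPKM 100 in one cancer and FPKM 2 in the other 16 is classified as "Enriched" under these assumptions, while a gene with FPKM 10 in all 17 cancers is classified as "Expressed in all".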

Based on the FPKM value of each gene, we classified the patients into two groups and examined their prognoses. In the analysis, we excluded genes with low expression, i.e., those with a median expression among samples of less than FPKM 1. The prognosis of each group of patients was examined by Kaplan-Meier survival estimators, and the survival outcomes of the two groups were compared by log-rank tests. To choose the FPKM cut-off that separated the patients most significantly, all FPKM values from the 20th to the 80th percentile were used in turn to group the patients, the difference in survival outcome between the groups was tested for each, and the value yielding the lowest log-rank P value was selected. Genes with log-rank P values less than 0.001 were defined as prognostic genes. In addition, if the group of patients with high expression of a selected prognostic gene had more observed events than expected events, the gene was classified as an unfavorable prognostic gene; otherwise, it was classified as a favorable prognostic gene.
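The cut-off search described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the code used for the atlas: the function names (`kaplan_meier`, `logrank`, `best_cutoff`) are hypothetical, the high-expression group is taken to be the patients with FPKM strictly above the cut-off, the percentile bounds are taken by rank in the sorted values, and the P value comes from the standard chi-square approximation with one degree of freedom.

```python
import math
import statistics


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(ti >= t for ti in times)  # patients still at risk at t
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1 - d / n
        curve.append((t, surv))
    return curve


def logrank(times, events, in_high):
    """Two-group log-rank test: (p_value, observed_high, expected_high)."""
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(in_high[i] for i in at_risk)
        dead = [i for i in at_risk if times[i] == t and events[i]]
        d = len(dead)
        observed += sum(in_high[i] for i in dead)
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0, observed, expected
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2)), observed, expected


def best_cutoff(fpkm, times, events, alpha=0.001):
    """Scan cut-offs between the 20th and 80th percentile for the lowest P."""
    if statistics.median(fpkm) < 1:
        return None  # low-expression genes are excluded
    ranked = sorted(fpkm)
    n = len(ranked)
    lo, hi = math.ceil(0.2 * (n - 1)), math.floor(0.8 * (n - 1))
    best = None
    for cutoff in sorted(set(ranked[lo:hi + 1])):
        in_high = [x > cutoff for x in fpkm]  # assumption: strictly above
        p, obs, exp = logrank(times, events, in_high)
        if best is None or p < best["p"]:
            best = {"cutoff": cutoff, "p": p, "obs_high": obs, "exp_high": exp}
    best["prognostic"] = best["p"] < alpha
    best["direction"] = None
    if best["prognostic"]:
        worse = best["obs_high"] > best["exp_high"]
        best["direction"] = "unfavorable" if worse else "favorable"
    return best
```

Here `times` holds follow-up times, `events` holds 1 for an observed death and 0 for a censored patient, and `fpkm` holds the expression of one gene in the same patient order. Because the reported P value is the minimum over many candidate cut-offs, it is not corrected for that selection in this sketch.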

What is presented in the section?

Kaplan-Meier survival plots showing the prognostic association between the RNA expression of each protein-coding gene and patient survival in each of the 17 cancer types were generated. A summary of the significant prognostic results is provided on the gene summary page. In addition, the Kaplan-Meier survival plots, as well as a scatter plot showing the correlation between RNA expression of the gene and patient survival in a specific cancer type, are shown on a cancer type specific gene summary page. The page is interactive: users can select a subgroup of patients based on, for example, tumor stage, and immediately generate plots for the selected subgroup on the website. The user can also use any specific expression cut-off (FPKM value) to produce different Kaplan-Meier and scatter plots. An example of the cancer-specific Kaplan-Meier and scatter plots for a gene is shown below.

The RNA expression levels were summarized across 17 cancer types for all protein-coding genes. The results are presented as shown in the figure for an example gene enriched in liver cancer.

Similarly, the protein levels were determined across 20 cancer types for all protein-coding genes, and the results are presented as shown in the figure for the same gene as above.

Moreover, to exemplify protein expression patterns both within one cancer type and between different types of cancer, a multitude of IHC images for more than 17,000 protein-coding genes in 20 human cancer types are also provided in this section. An example IHC image for a gene from a selected cancer patient is shown below.