The Human Protein Atlas

The Human Protein Atlas is a Sweden-based program initiated in 2003 with the aim of mapping all human proteins in cells, tissues, and organs by integrating various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology. All the data in the knowledge resource is open access, allowing scientists in both academia and industry to freely explore the human proteome.
The Human Protein Atlas consists of ten separate sections, each focusing on a particular aspect of the genome-wide analysis of the human proteins:

  • The Tissue section, showing the distribution of the proteins across all major tissues and organs in the human body
  • The Brain section, exploring the distribution of proteins in various regions of the mammalian brain
  • The Single Cell Type section, showing expression of protein-coding genes in single human cell types based on scRNA-seq
  • The Tissue Cell Type section, showing expression of protein-coding genes in human cell types based on bulk RNAseq data
  • The Pathology section, showing the impact of protein levels on the survival of patients with cancer
  • The Immune Cell section, showing expression of protein-coding genes in immune cell types
  • The Blood Protein section, describing proteins detected in blood and proteins secreted by human tissues
  • The Subcellular section, showing the subcellular localization of proteins in single cells
  • The Cell Line section, showing expression of protein-coding genes in human cell lines
  • The Metabolic section, exploring expression of protein-coding genes in the context of the human metabolic network

The Human Protein Atlas program has already contributed to several thousand publications in the field of human biology and disease, and it has been selected by the organization ELIXIR (www.elixir-europe.org) as a European core resource due to its fundamental importance for the wider life science community. The Human Protein Atlas consortium is mainly funded by the Knut and Alice Wallenberg Foundation.

The full publication list is available here.

Tissue

This section of the Human Protein Atlas focuses on the expression profiles of genes in human tissues at both the mRNA and protein level. The protein expression data from 44 normal human tissue types is derived from antibody-based protein profiling using immunohistochemistry. All underlying images of immunohistochemically stained normal tissues are available together with knowledge-based annotation of protein expression levels. The protein data covers 15323 genes (76%) for which there are available antibodies. The mRNA expression data is derived from deep sequencing of RNA (RNA-seq) from 256 different normal tissue types.

More information about the specific content and the generation and analysis of the data in the section can be found in the Methods Summary.

Learn about:

  • protein localization in tissues at a single-cell level
  • if a gene is enriched in a particular tissue (specificity)
  • which genes have a similar expression profile across tissues (expression cluster)

Example:

MYL7
Myosin, light chain 7, regulatory.

Selective cytoplasmic expression in cardiomyocytes at the protein level, tissue enriched in heart muscle at the mRNA level.
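
A specificity category such as "tissue enriched" can be thought of as a rule applied to a gene's mRNA levels across tissues. The Python sketch below is illustrative only: the function name, the four-fold threshold, the detection cutoff, the category labels other than "tissue enriched", and the toy values are all assumptions made for this example and are not taken from this page. The criteria actually used are described in the Methods Summary.

```python
def classify_specificity(expression, fold=4.0, detection_cutoff=1.0):
    """Classify one gene from a {tissue: mRNA level} mapping.

    `fold` and `detection_cutoff` are assumed, illustrative thresholds.
    Returns (category, list of tissues the category refers to).
    """
    ranked = sorted(expression.items(), key=lambda item: item[1], reverse=True)
    top_tissue, top_level = ranked[0]

    # Nothing reaches the detection cutoff in any tissue
    if top_level < detection_cutoff:
        return "not detected", []

    # Enriched: the top tissue is at least `fold` times the runner-up
    if len(ranked) == 1 or top_level >= fold * ranked[1][1]:
        return "tissue enriched", [top_tissue]

    # Enhanced: the top tissue is at least `fold` times the mean of the rest
    others = [level for _, level in ranked[1:]]
    if top_level >= fold * (sum(others) / len(others)):
        return "tissue enhanced", [top_tissue]

    return "low tissue specificity", []


# Invented toy values, not measurements from the atlas
toy_gene = {"heart muscle": 850.0, "skeletal muscle": 12.0, "liver": 0.4, "brain": 0.1}
print(classify_specificity(toy_gene))  # ('tissue enriched', ['heart muscle'])
```

Comparing the top tissue against the runner-up, rather than against the average, is what separates a gene dominated by one tissue from a gene that is merely higher than most.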


Brain

The Brain section gives an overview of protein expression and distribution in the mammalian brain. Externally and in-house generated data are integrated to explore regional protein expression in the human, pig and mouse brain. Protein expression data are based on quantification of messenger RNA using RNA sequencing techniques and in situ hybridization. Protein distribution data are generated using antibody-based immunohistochemistry and immunofluorescence techniques. The Brain section can be used to create an overview of regional and cross-species expression of proteins of interest, or to identify regionally or functionally clustered genes based on expression levels across regions of the brain. More information about the specific content and the generation and analysis of the data in this section can be found in the Methods Summary.

Learn about:

  • Expression levels for all human proteins in regions and subregions of the human brain
  • Expression levels for all proteins with human orthologs in regions and subregions of the pig and mouse brain
  • Brain enriched genes with higher expression in any of the regions of the brain compared to peripheral organs
  • Regional enriched genes with higher expression in a single or few regions of the brain
  • Cell-type and cell-compartment distribution of selected proteins in the human and mouse brain
  • Differences in gene expression between mammalian species

Example:

NECAB1
N-terminal EF-hand calcium binding protein 1.

Subsets of neurons show distinct somato-dendritic immunoreactivity throughout the brain. The image shows protein location in subsets of neurons in the hippocampus of the mouse brain.

Single Cell Type

This section contains Single Cell Type information based on single cell RNA sequencing (scRNAseq) data from 25 human tissues and peripheral blood mononuclear cells (PBMCs), together with in-house generated immunohistochemically stained tissue sections visualizing the corresponding spatial protein expression patterns. The scRNAseq analysis was based on publicly available genome-wide expression data and comprises all protein-coding genes in 444 individual cell type clusters corresponding to 15 different cell type groups. A specificity and distribution classification was performed to determine the number of genes elevated in these single cell types, and the number of genes detected in one, several or all cell types, respectively. The genes expressed in each of the cell types can be explored in interactive UMAP plots and bar charts, with links to corresponding immunohistochemical stainings in human tissues.

More information about the specific content and the generation and analysis of the data in the section can be found in the Methods Summary.

Learn about:

  • mRNA and protein expression in single cell types
  • if a gene is enriched in a particular cell type (specificity)
  • which genes have a similar expression profile across cell types (expression cluster)

Example:

TSPY2
Testis specific protein, Y-linked 2.

Selective nuclear expression in spermatogonia at the protein level, enriched in spermatogonia at the mRNA level.
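
The distribution classification described in this section (genes detected in one, several or all cell types) can be sketched as a count of the cell types in which a gene exceeds a detection cutoff. The Python example below is a minimal illustration: the function name, the cutoff of 1.0, the category labels, and the toy expression values are assumptions and do not come from the atlas.

```python
from collections import Counter


def classify_distribution(levels, detection_cutoff=1.0):
    """Label a gene by how many cell types express it above an assumed cutoff."""
    detected = [cell for cell, level in levels.items() if level >= detection_cutoff]
    if not detected:
        return "not detected"
    if len(detected) == 1:
        return "detected in single"
    if len(detected) == len(levels):
        return "detected in all"
    return "detected in several"


# Invented toy values: gene -> {cell type: expression level}
toy_atlas = {
    "gene_a": {"spermatogonia": 40.0, "hepatocytes": 0.0, "T-cells": 0.2},
    "gene_b": {"spermatogonia": 5.0, "hepatocytes": 7.0, "T-cells": 6.0},
    "gene_c": {"spermatogonia": 0.0, "hepatocytes": 3.0, "T-cells": 2.0},
}

# Number of genes falling into each distribution category
summary = Counter(classify_distribution(levels) for levels in toy_atlas.values())
print(dict(summary))
```

Counting the categories over all genes gives the kind of per-category totals that the section refers to.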

Tissue Cell Type

The Tissue Cell Type section contains cell type expression specificity predictions for all human protein coding genes, generated using integrated network analysis of publicly available bulk RNAseq data. A specificity classification is used to predict which genes are enriched in each constituent cell type within an individual tissue. The data can be explored on a tissue-by-tissue basis, together with in-house generated immunohistochemically stained tissue sections. In addition, a core cell type analysis focuses on the cell types found in all, or the majority, of the profiled tissues, e.g., endothelial cells or macrophages. Here, genes with predicted specificity in these core cell types in multiple tissues are detailed. More information about the specific content and data analysis in the section can be found in the Methods Summary.


Learn about:

  • if a gene is predicted to have cell type specificity within a given tissue
  • which genes have a common cell type specificity prediction within each tissue
  • the catalogue of genes with predicted specificity in core cell types across tissues

Example:

KRTAP2-1
Keratin associated protein 2-1.

Selective expression in hair follicle cortex cells at the protein level, mRNA specificity prediction in skin: hair follicle cortex cells.

Pathology

This section contains pathology information based on mRNA and protein expression data from 17 different forms of human cancer, together with millions of in-house generated images of immunohistochemically stained tissue sections and Kaplan-Meier plots showing the correlation between the mRNA expression of each human protein-coding gene and cancer patient survival. More information about the specific content and the generation and analysis of the data in the section can be found in the Methods Summary.

Cancer statistics from relevant international and Swedish databases are summarized here, and hallmarks of cancer are described here.

Learn about:

  • if the mRNA expression of a gene is prognostic for patient survival in each of the cancer types
  • if a gene is enriched in a particular cancer type (specificity)
  • the catalogue of genes elevated in each of the cancer types

Example:

MKI67
Marker of proliferation Ki-67.

Nuclear expression in varying fractions of tumor cells in all cancer types at the protein level, and expressed in all cancers at the mRNA level. High expression of this gene is associated with unfavorable prognosis in renal, liver and pancreatic cancer.
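
A Kaplan-Meier plot of the kind mentioned in this section compares estimated survival over time between patient groups, for example patients with high and low expression of a gene. The Python sketch below implements the standard Kaplan-Meier product-limit estimator and a simple split at a fixed expression cutoff. The function names, the way the cutoff is chosen, and the toy patient data are assumptions made for illustration; how the atlas selects cutoffs and tests significance is described in the Methods Summary.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with at least one death.

    durations: follow-up time for each patient
    events:    True if the patient died at that time, False if censored
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def curves_by_expression(expression, durations, events, cutoff):
    """Split patients at an assumed expression cutoff and estimate one curve per group."""
    groups = {"high": ([], []), "low": ([], [])}
    for level, t, died in zip(expression, durations, events):
        key = "high" if level > cutoff else "low"
        groups[key][0].append(t)
        groups[key][1].append(died)
    return {key: kaplan_meier(d, e) for key, (d, e) in groups.items()}


# Invented toy cohort: expression level, follow-up in years, death observed?
curves = curves_by_expression(
    expression=[10, 1, 12, 2],
    durations=[1, 5, 2, 6],
    events=[True, True, True, False],
    cutoff=5,
)
print(curves)
```

Censored patients lower the number at risk without lowering the survival estimate, which is why the estimator is preferred over a simple fraction of patients alive.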


Immune Cells

The Immune Cell section contains single cell information on genome-wide RNA expression profiles of human protein-coding genes covering various B- and T-cells, monocytes, granulocytes and dendritic cells. The transcriptomics analysis covers 18 cell types isolated with cell sorting and includes classification based on specificity, distribution and expression cluster across all immune cells. More information about the specific content and the generation and analysis of the data in the section can be found in the Methods Summary.

Learn about:

  • if a gene is enriched in a particular immune cell type (specificity)
  • which genes have a similar expression profile across the immune cells (expression cluster)
  • the catalogue of genes elevated in each of the immune cell types

Blood Proteins

The Blood Proteins section presents estimated plasma concentrations of the proteins detected in human blood from mass spectrometry-based proteomics studies, published immunoassay data and a longitudinal study based on proximity extension assay (PEA). Further, an analysis of the “human secretome” is presented, including annotation of the genes predicted to be actively secreted to human blood, as well as to other compartments or organ systems of the human body such as the digestive tract or the brain. More information about the specific content and the generation and analysis of the data in this section can be found in the Methods Summary.

Learn about:

  • the plasma levels of blood proteins in a longitudinal study of healthy individuals
  • the levels of plasma proteins using immunoassays and mass spectrometry-based proteomics
  • the classification of the human secretome (proteins secreted from human cells)

Subcellular

The Subcellular section of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13041 genes (65% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different cell lines, selected from a subset of 36 of the cell lines found in the Cell Line Section. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 35 different organelles and fine subcellular structures. In addition, the section includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations.

The Subcellular Section offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the section, as well as the generation and analysis of the data, can be found in the Methods Summary.

Learn about:

  • the subcellular distribution of proteins in human cell lines
  • the proteomes of different organelles and subcellular structures
  • single-cell variability in the expression levels and/or localizations of proteins

Example:

CCNB1
Cyclin B1.

The protein localizes to the cytosol in human and mouse cells, and is expressed in a cell cycle-dependent manner. The location has been validated by siRNA mediated gene silencing, analysis of GFP-tagged protein and independent antibodies.


Cell Lines

The Cell Line section contains information on genome-wide RNA expression profiles of human protein-coding genes in 69 human cell lines. The transcriptomics analysis includes classification based on specificity, distribution and expression cluster analysis across all cell lines. More information about the specific content and the generation and analysis of the data in the section can be found in the Methods Summary.

Learn about:

  • if a gene is enriched in a particular human cell line (specificity)
  • which genes have a similar expression profile across the cell lines (expression cluster)
  • the catalogue of genes elevated in each of the cell lines

Metabolic

The Metabolic section enables exploration of protein function and tissue-specific gene expression in the context of the most extensively curated human metabolic network. For proteins involved in metabolism, a metabolic summary is provided that describes the metabolic subsystems/pathways, cellular compartments, and number of reactions associated with the protein. Over 120 manually curated metabolic pathway maps facilitate the visualization of each protein's participation in different metabolic processes. Each pathway map is accompanied by a heatmap detailing the mRNA levels across 256 different tissue types for all proteins involved in the metabolic pathway. More information about the human metabolic network, including how it was generated and what information it provides, can be found in the Methods Summary.

Learn about:

  • what pathways/subsystems a metabolic gene is part of
  • which genes are nearby in the metabolic network
  • how the expression of the genes in a pathway/subsystem varies across different tissues
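
One way to read "nearby in the metabolic network" is that two genes are neighbors when the reactions they are associated with share a metabolite. The Python sketch below illustrates that reading on a three-reaction toy table. The table layout, reaction identifiers, and function name are hypothetical, cofactors such as ATP are deliberately left out, and the network model used by the atlas may define proximity differently.

```python
from collections import defaultdict

# Hypothetical toy table: reaction id -> (associated genes, metabolites involved).
# Cofactors such as ATP/ADP are omitted so they do not connect unrelated reactions.
REACTIONS = {
    "R1": ({"HK1"}, {"glucose", "glucose 6-phosphate"}),
    "R2": ({"GPI"}, {"glucose 6-phosphate", "fructose 6-phosphate"}),
    "R3": ({"PFKM"}, {"fructose 6-phosphate", "fructose 1,6-bisphosphate"}),
}


def nearby_genes(gene, reactions):
    """Genes whose reactions share at least one metabolite with a reaction of `gene`."""
    # Index: metabolite -> all genes associated with a reaction that involves it
    genes_by_metabolite = defaultdict(set)
    for genes, metabolites in reactions.values():
        for metabolite in metabolites:
            genes_by_metabolite[metabolite] |= genes

    neighbors = set()
    for genes, metabolites in reactions.values():
        if gene in genes:
            for metabolite in metabolites:
                neighbors |= genes_by_metabolite[metabolite]
    neighbors.discard(gene)  # a gene is not its own neighbor
    return neighbors


print(sorted(nearby_genes("GPI", REACTIONS)))  # ['HK1', 'PFKM']
```

Leaving out highly connected cofactors is a common design choice in this kind of analysis, since otherwise almost every gene would appear to be a neighbor of every other.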

Background and History

The Human Protein Atlas project was initiated in 2003 with funding from the Knut and Alice Wallenberg Foundation. Primarily based in Sweden, the project involves the joint efforts of the Royal Institute of Technology in Stockholm, Uppsala University, Uppsala Akademiska University Hospital, and, more recently, Science for Life Laboratory, based in both Uppsala and Stockholm. Formal collaborations exist with groups in India, South Korea, Japan, China, Germany, France, Switzerland, USA, Canada, Denmark, Finland, The Netherlands, Spain, and Italy.

The pathologists and staff at the Pathology Clinic, Uppsala University Hospital, Uppsala, Sweden, are gratefully acknowledged for all their efforts in the handling and diagnostics of the tissues used in the Human Protein Atlas. Dr Sanjay Navani and Lab Surgpath, Mumbai, India, are also acknowledged for their major contribution to the annotation of immunohistochemically stained normal and cancer tissues.

The first version of the Human Protein Atlas website was launched in 2005 and consisted of protein expression data based on approximately 700 antibodies. Since then, each new release has added more data as well as new functionality and features to the website.

Important additions include:

Version  Year released  Feature
2        2006           Inclusion of cell-line data and confocal images showing subcellular localizations.
3        2007           A new search function that allowed advanced query-based searches was included.
4        2008           The overall database structure was shifted from an antibody-centric structure to a gene-centric structure in order to include information on all genes predicted by Ensembl.
7        2010           A major restructure accompanying the introduction of the concept of annotated protein expression for paired antibodies (two independent antibodies directed against different, non-overlapping epitopes on the same protein).
12       2013           The protein atlas database was complemented with transcriptomics profiles from 27 normal tissues, and the format with four sub-atlases was introduced.
13       2014           An analysis of all major organs and tissues in the human body using transcriptomics and antibody-based profiling was included. The results were summarized on interactive knowledge pages divided into 7 human proteomes and 27 tissues and organs.
14       2015           A new mouse brain atlas was introduced.
15       2016           Inclusion of RNA-seq data from the Genotype-Tissue Expression (GTEx) consortium.
16       2016           A new Cell Atlas was launched with subcellular localization corresponding to over 12,000 protein-coding genes, together with a new approach for visualization of antibody validation and the inclusion of transcriptomics data from the FANTOM5 program.
17       2017           The Pathology Atlas was launched, where a systems-level approach based on genome-wide transcriptomics data and clinical metadata of almost 8000 patients was used to analyze the proteome of 17 major cancer types.
19       2019           Three new Atlases were introduced: a Blood Atlas, a Brain Atlas and a Metabolic Atlas.