The cell cycle dependent transcriptome and proteome
The cell cycle is an ordered and tightly regulated series of events over which the cell grows and divides into two daughter cells. It consists of four stages, during which the cell increases in size (G1), replicates its genome (S), increases further in size and prepares for mitosis (G2), and finally goes through mitosis as well as cytokinesis (M). Depending on external and internal signals, the cell may also exit the replicative cell cycle from G1 and enter a non-replicative resting state (G0). Dysregulation of the cell cycle is known to have devastating consequences, such as uncontrolled cell proliferation, genomic instability (Malumbres M et al. (2009)), and cancer (Massagué J. (2004); Hartwell LH et al. (1994)). Therefore, the cell cycle needs to be tightly controlled, while at the same time remaining responsive to various intracellular and extracellular signals (Barnum KJ et al. (2014)). The cell cycle control system involves an intricate network of proteins that are tightly regulated by mechanisms such as transcriptional regulation (Weinberg RA. (1995)), protein post-translational modifications (PTMs) (Morgan DO. (1995)), and protein degradation (Teixeira LK et al. (2013); King RW et al. (1996)).
In asynchronous cell cultures, the cell cycle is a fundamental source of cell-to-cell variation in both transcript and protein abundances (Cho RJ et al. (2001); Whitfield ML et al. (2002); Boström J et al. (2017); Lane KR et al. (2013); Ohta S et al. (2010); Ly T et al. (2014); Pagliuca FW et al. (2011); Ly T et al. (2015)). The Subcellular Section provides a resource to explore protein heterogeneity at the single cell level in unperturbed log-phase growing cells. Among the 13041 genes in the Subcellular Section, a quarter (3193) show cell-to-cell variation in terms of expression level and/or spatial distribution of the encoded protein(s) in at least one cell line in the regular ICC-IF pipeline.
For a subset of these genes, the temporal protein and RNA expression patterns have been further characterized in individual cells using the Fluorescent Ubiquitination-based Cell Cycle Indicator (FUCCI) U-2 OS cell line (Mahdessian D et al. (2021)). In this study, 311 of the genes now present in the Subcellular Section were found to correlate with progression through interphase. In addition, there are currently 354 genes encoding proteins that are defined as cell cycle dependent (CCD) by their localization to mitotic structures, giving a total of 640 CCD proteins. Single-cell sequencing of FUCCI U-2 OS cells sorted according to cell cycle phase has also identified 529 genes that encode CCD transcripts. This spatially resolved proteomic map of the cell cycle has been integrated into the Subcellular Section in order to provide a resource for molecular insights into the human cell cycle and cellular proliferation.
Single-cell variation in the Subcellular Section
Genetically identical cells may exhibit differences in their patterns of gene and protein expression. This phenomenon is often referred to as cell-to-cell variation or single-cell variation (SCV). While it is hypothesized that there is an underlying functional importance to this variability, the scale and significance of variations at the single-cell level remain poorly understood (Dueck H et al. (2016)). Environmental changes, DNA damage, cell cycle progression, and stochasticity are examples of factors that may cause changes in RNA and protein expression within isogenic cell populations, and thus serve as sources of single-cell heterogeneity (Snijder B et al. (2011)). This may create different phenotypic characteristics within individual cells and provide them with a molecular and phenotypic fingerprint.
Identification of all human proteins that display single-cell variation lays a foundation for characterizing the driving forces of single-cell heterogeneity, and for understanding the functional consequences. In an immunofluorescence (IF) image, single-cell protein variations can be observed as differences in the staining intensity or spatial distribution between cells, as exemplified in Figure 1. Interestingly, as many as 3193 of all human proteins localized in the Subcellular Section show single-cell variations (Thul PJ et al. (2017)). Of these, 3074 proteins show variations in expression level (staining intensity), 206 proteins show variations in spatial distribution, and 87 proteins show both types of variation.
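The three counts above are consistent with one another under inclusion-exclusion: proteins varying in intensity plus proteins varying in spatial distribution, minus those counted in both groups, gives the total. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of any HPA tool.

```python
def count_with_any_variation(intensity: int, spatial: int, both: int) -> int:
    """Inclusion-exclusion: size of the union of two overlapping groups."""
    return intensity + spatial - both


# Counts quoted in the text above.
total = count_with_any_variation(intensity=3074, spatial=206, both=87)
print(total)  # 3193

# Share of the 13041 genes in the Subcellular Section (figure from the introduction).
print(round(total / 13041, 3))  # 0.245
```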
Figure 1. Examples of proteins showing single-cell variation. GTPBP8 is a GTP binding protein (detected in U-2 OS cells). CLCN6 is a chloride transport protein (detected in U-2 OS cells). INCENP is a component of the chromosomal passenger complex (CPC) that is a key regulator of mitosis (detected in MCF7 cells). RACGAP1 has a key role in controlling cell growth and cell division (detected in U-2 OS cells). RRM2 provides precursors necessary for DNA synthesis (detected in U-2 OS cells). KIF20A is a mitotic kinesin required for cytokinesis (detected in U-2 OS cells). DUSP18 and DUSP19 are phosphatases (detected in A-431 and SK-MEL-30 cells, respectively). CCNB1 is a key regulator of the cell cycle at the G2/M transition for cell division (detected in U-2 OS cells). The target protein is shown in green, microtubules in red, and the nucleus in blue.
Single-cell variation is most commonly observed for proteins in the nucleoplasm, cytosol, vesicles, nucleoli and mitochondria (Figure 2). Gene Ontology (GO)-based enrichment analysis of genes encoding proteins with single-cell variation at the protein level reveals an enrichment of GO terms describing numerous biological processes, including DNA repair, translation, apoptosis, transcription, cell cycle progression and metabolism (Figure 3). The enriched terms for the GO domain Molecular Function describe many different enzymatic activities as well as binding to DNA, RNA and chromatin (Figure 4).
Figure 2. Localizations of proteins showing single-cell variations to the different organelles, grouped by meta-compartments.
Figure 3. Gene Ontology-based enrichment analysis for genes encoding proteins with single-cell variations, showing the significantly enriched terms for the GO domain Biological Process. Each bar is clickable and gives a search result of proteins that belong to the selected category.
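GO enrichment analyses of this kind are commonly based on a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes in a gene list drawn from a background population. The text does not state which test was used here, so the sketch below is an assumption about the general technique rather than the HPA procedure; the function and parameter names are illustrative, and a real analysis would also correct for multiple testing across GO terms.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of background genes
    annotated:  background genes carrying the GO term
    sample:     genes in the list of interest (e.g. variable proteins)
    overlap:    genes in the list that carry the GO term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    if not (0 <= overlap <= min(annotated, sample)):
        raise ValueError("overlap cannot exceed annotated or sample")
    upper = min(annotated, sample)
    # Exact integer arithmetic avoids floating-point underflow for large counts.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, sample)


# Toy numbers chosen for easy checking, not taken from the HPA data.
p = enrichment_p_value(population=10, annotated=4, sample=3, overlap=2)
print(round(p, 4))  # 0.3333
```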
Figure 4. Gene Ontology-based enrichment analysis for genes encoding proteins with single-cell variations, showing the significantly enriched terms for the GO domain Molecular Function. Each bar is clickable and gives a search result of proteins that belong to the selected category.
Interphase proteogenomics in single cells
Previous studies of transcript and protein abundance in different phases of the human cell cycle have revealed variations in the expression of 400-1,200 genes (Cho RJ et al. (2001); Whitfield ML et al. (2002); Boström J et al. (2017)) and 300-700 proteins (Lane KR et al. (2013); Ohta S et al. (2010); Ly T et al. (2014); Pagliuca FW et al. (2011); Ly T et al. (2015)). However, cell synchronization is known to alter gene expression (Cooper S et al. (2007)), cell morphology and metabolism (Davis PK et al. (2001)), and precludes the discovery of expression changes within cell cycle phases. The use of single-cell RNA sequencing has allowed the analysis of transcriptional changes without the need for synchronization and has enabled the discovery of additional cell cycle regulated genes (Domenighetti G et al. (1988); Scialdone A et al. (2015)). However, studies of cell cycle dependent (CCD) variations in protein expression at single-cell level have been lacking due to technological limitations.
The HPA Subcellular Section now includes a targeted single-cell transcriptomic analysis, as well as proteomic imaging (i.e., imaging proteogenomics, Figure 5) of 1137 variable proteins that are expressed in FUCCI U-2 OS cells (Sakaue-Sawano A et al. (2008); Mahdessian D et al. (2021)). This cell line expresses a pair of fluorescently tagged marker proteins, Cdt1 tagged with red fluorescent protein (RFP) and Geminin tagged with green fluorescent protein (GFP), which enable visualization of interphase progression in individual cells.
The intensities of the RFP- and GFP-tagged cell cycle markers can be used to create a linear representation of cell cycle pseudotime, enabling protein and RNA expression in individual cells to be plotted along an axis representing progression through interphase.
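One way to picture this mapping is to treat each cell as a point in the plane of red versus green marker intensity, measure its angle around a central point, and unroll that angle into a value between 0 and 1. The sketch below is a simplified illustration under stated assumptions, not the published fitting procedure: it assumes log-transformed intensities, uses the mean of the data (or a supplied point) as the center instead of a fitted one, and takes the low-red, low-green direction as the hypothetical start of the cycle. All names are illustrative.

```python
import math


def fucci_pseudotime(red, green, center=None, start_angle=-3 * math.pi / 4):
    """Map paired red/green marker intensities to pseudotime in [0, 1).

    red, green:  per-cell (log) intensities of the RFP and GFP markers
    center:      (red, green) point to measure angles around; defaults to the mean
    start_angle: angle treated as pseudotime 0 (assumed: both markers low)
    """
    if len(red) != len(green) or len(red) == 0:
        raise ValueError("red and green must be non-empty and of equal length")
    if center is None:
        center = (sum(red) / len(red), sum(green) / len(green))
    cx, cy = center
    full_turn = 2 * math.pi
    pseudotime = []
    for r, g in zip(red, green):
        # Counter-clockwise angle: red-high (G1) -> both high (G1/S) -> green-high (S/G2).
        theta = math.atan2(g - cy, r - cx)
        t = ((theta - start_angle) % full_turn) / full_turn
        if t >= 1.0:  # guard against floating-point wrap-around at the start angle
            t = 0.0
        pseudotime.append(t)
    return pseudotime


# Three toy cells: red only, red and green, green only (center assumed known).
times = fucci_pseudotime([2.0, 2.0, 0.0], [0.0, 2.0, 2.0], center=(1.0, 1.0))
print([round(t, 2) for t in times])  # [0.25, 0.5, 0.75]
```

With this convention the red-only cell comes first, the double-positive cell second and the green-only cell last, matching the G1, G1-S and S/G2 order of the FUCCI markers.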
Figure 5. Schematic overview of the single-cell imaging proteogenomic workflow. FUCCI U-2 OS cells express two fluorescently tagged cell cycle markers, CDT1 during G1 phase (red, RFP-tagged) and Geminin during S and G2 phases (green, GFP-tagged); these markers are co-expressed during the G1-S transition (yellow). By fitting a polar model to the red and green fluorescence intensities, a linear representation of cell cycle pseudotime is obtained. Independent measurements of RNA and protein expression are compared after pseudotime alignment of individual cells.
The single-cell RNA-sequencing data from the FUCCI U-2 OS cells enable analysis of RNA abundance in relation to cell cycle progression. This analysis has led to the identification of 529 genes whose RNA expression levels vary in correlation with interphase cell cycle progression. In the single-cell proteomic imaging analysis, 311 proteins display variation in protein expression levels that temporally correlates with interphase progression through G1, S and G2. These cell cycle dependent (CCD) proteins include known cell cycle regulators, such as the cyclin CCNB1 and ANLN, which is required for cytokinesis, but also novel CCD proteins, such as DUSP18 (Figure 6). However, most proteins (826) show cell-to-cell variations that are largely unexplained by cell cycle progression (non-CCD). This opens up intriguing avenues for further exploration of the stochastic or deterministic factors that govern these variations, as well as the role of spatiotemporal proteome dynamics in regulating other cellular states and functions.
Figure 6. Examples of temporal expression profiles for single-cell protein (blue) and RNA (orange) expression. The boxplot shows a mock-up bulk proteomic experiment.
Proteins in mitotic structures
In addition to proteins that show single-cell variations due to progression through interphase, there are 354 genes in the Subcellular Section encoding proteins that are defined as cell cycle dependent (CCD) as they localize to mitotic structures, including mitotic chromosomes (70), mitotic spindle (89), kinetochores (5), cytokinetic bridge (160), midbody (56), midbody ring (30) and cleavage furrow (1). Examples of these can be seen in Figure 7.
Figure 7. Example images of proteins localized to mitotic substructures: KIF20A to cleavage furrow, TAF1D, TACC3, KIF11 and CKAP2L to mitotic spindle, BIRC5 to cytokinetic bridge, DVL3 and CTTNBP2 to midbody ring, and SGO1 to kinetochores.
Localizations of the cell cycle dependent proteome
In total, there are 640 genes encoding variable proteins that have been identified as cell cycle dependent (CCD) and 826 genes encoding variable proteins that have been identified as cell cycle independent (non-CCD) in the Subcellular Section. The high resolution of the HPA Subcellular Section dataset allows us to look at the subcellular localizations of proteins showing CCD and non-CCD variability in protein expression (Figure 8). Larger fractions of the CCD proteins are found in mitotic structures, while larger fractions of the non-CCD variable proteins localize to e.g. the cytosol, mitochondria and plasma membrane. Almost half of the CCD variable proteins reside in the nuclear meta-compartment, including the nucleus, nuclear speckles, nuclear bodies, and nucleoli. This is consistent with one of the main functions of the nucleus, the replication and separation of DNA during the cell cycle.
Figure 8. Bar plot showing the subcellular localizations enriched for CCD proteins (blue) and non-CCD proteins (red) relative to the proteome mapped in the HPA.
Temporal delay between RNA and protein
Previous studies have shown that many RNAs peak in expression in the G1 phase, which is also the longest period of the cell cycle (Boström J et al. (2017); Grant GD et al. (2013)). Among the 529 genes for which RNA expression is correlated to the cell cycle in FUCCI U-2 OS cells, 248 peak in G1. However, most proteins that show cell cycle dependent expression (241) peak towards the end of the cell cycle, corresponding to late S and G2 (Figure 9). This seems to reflect a temporal delay between RNA and protein expression (Mahdessian D et al. (2021)).
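Assigning a peak phase to a gene can be illustrated by ordering cells along pseudotime, smoothing the expression values with a moving average, and reading off where the smoothed profile is highest. The sketch below is only an illustration of that idea: the window size, the non-circular window, and in particular the phase boundaries (`g1_end`, `s_end`) are hypothetical placeholders, not values from the study.

```python
def peak_pseudotime(pseudotime, expression, window=3):
    """Return the pseudotime at which the moving-average expression is highest."""
    if len(pseudotime) != len(expression):
        raise ValueError("pseudotime and expression must have equal length")
    if window < 1 or len(pseudotime) < window:
        raise ValueError("need at least `window` cells")
    # Sort cells by pseudotime so that neighbours in the list are neighbours in time.
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    times = [pseudotime[i] for i in order]
    values = [expression[i] for i in order]
    best_mean, best_time = float("-inf"), times[0]
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        if mean > best_mean:
            best_mean = mean
            best_time = sum(times[start:start + window]) / window
    return best_time


def phase_for_pseudotime(t, g1_end=0.5, s_end=0.75):
    """Translate pseudotime into a phase label using placeholder boundaries."""
    if t < g1_end:
        return "G1"
    if t < s_end:
        return "S"
    return "G2"


# Toy profile with a late peak; values are invented for illustration.
cell_times = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cell_expression = [1, 1, 1, 1, 1, 5, 6, 5, 1]
peak = peak_pseudotime(cell_times, cell_expression)
print(round(peak, 2), phase_for_pseudotime(peak))  # 0.7 S
```

Comparing the peak pseudotime of a protein with that of its transcript, computed the same way, gives a simple estimate of the delay between the two.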
Figure 9. The number of proteins peaking in each phase (interactive blue text) and the number of transcripts peaking in each phase (interactive orange text).
Interestingly, only 84 of the genes encoding proteins identified as CCD proteins in interphase also display cell cycle dependent variations in RNA expression in interphase, while a large majority of the CCD proteins in interphase have non-CCD transcripts (n=375) (Figure 10). Their variation in protein expression thus cannot be attributed to transcript cycling. The small overlap of CCD proteins and transcripts is corroborated by external RNA datasets (Grant GD et al. (2013); Semple JW et al. (2006)) and indicates that the temporal dynamics of proteome regulation may be largely maintained at a post-transcriptional level.
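The comparison above amounts to a set operation on gene identifiers: genes whose protein and transcript both cycle form the intersection, and genes whose protein cycles without a cycling transcript form the difference. A minimal sketch follows, using placeholder identifiers rather than real classifications; the function name and dictionary keys are illustrative.

```python
def split_ccd_proteins(ccd_protein_genes, ccd_transcript_genes):
    """Partition CCD-protein genes by whether their transcript also cycles."""
    proteins = set(ccd_protein_genes)
    transcripts = set(ccd_transcript_genes)
    return {
        "transcript_cycling": proteins & transcripts,
        "no_transcript_cycling": proteins - transcripts,
    }


# Placeholder gene identifiers, not actual HPA classifications.
groups = split_ccd_proteins(
    ccd_protein_genes=["GENE_A", "GENE_B", "GENE_C"],
    ccd_transcript_genes=["GENE_B", "GENE_D"],
)
print(sorted(groups["transcript_cycling"]))     # ['GENE_B']
print(sorted(groups["no_transcript_cycling"]))  # ['GENE_A', 'GENE_C']
```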
Figure 10. The numbers of cell cycle dependent proteins and transcripts, displayed as an interactive bar plot on the left. On the right, we highlight the overlap of these categories as transcriptionally regulated and non-transcriptionally regulated cell cycle dependent proteins as an interactive bar plot.
Functional roles of novel cell cycle proteins in proliferation
Analysis of RNA expression of the CCD proteins across normal human tissues and tumor tissues reveals a significantly higher expression in proliferative tissues compared to non-proliferative tissues (Figure 11). This indicates that, while the majority of the CCD proteins are not accompanied by cycling transcripts, the overall transcript levels of the genes encoding these proteins could be important for cell proliferation.
Figure 11. A) Hierarchical clustering of bulk transcript expression (log-transformed TPM values) for CCD proteins derived from RNA sequencing of various normal and cancer tissue types. The expression levels of the proliferation markers MCM6, CDK1, PCNA, MCM2 and KI67 are highlighted on top as a general measure of the proliferative activity of the tissues. Four clusters are identified: (1) contains normal tissues with low proliferative activity, (2) contains cerebral tissues and testis, (3) contains mostly normal tissues with midrange expression level of the proliferation markers and (4) contains tissues with high expression of the proliferation markers, including tumors. B) Box plots of the average transcript level for known and novel CCD proteins, respectively, for the four different clusters from A.
To confirm a functional role in proliferation, we performed siRNA-mediated gene silencing for a few selected novel CCD proteins. Silencing of DUSP18, KLHL38, CD2BP2 and SOX12 decreased cell proliferation rate relative to the control, whereas silencing of JPH3 increased cellular proliferation (Figure 12).
Figure 12. Silencing of CCD proteins DUSP18, CD2BP2, KLHL38 and JPH3. Immunofluorescence images of the control and siRNA samples, where the staining intensity is shown as a gradient from low intensity (blue) to high intensity (white). Bar plots show the differences in cell counts for control (Ctrl) and siRNA samples, and boxplots show the significant decrease of the measured intensity (too few cells were observed in DUSP18 and KLHL38 siRNA samples to make this comparison).
Cellular proliferation also plays an important role in tumorigenesis. The Pathology Atlas of the HPA is a comprehensive resource for studying the correlation between RNA expression for human protein-coding genes in cancer tissues and the clinical outcomes for almost 8000 cancer patients. Prognostic associations are significantly overrepresented among the genes encoding CCD proteins (424, 70%), corroborating the functional role of CCD proteins in proliferation. The novel CCD proteins, such as FAM50B and CD2BP2 (Figure 13), include both inhibitors and enhancers of proliferation, with potential anti-oncogenic or oncogenic functions. Thus, some of the novel CCD proteins may have potential to be novel diagnostic or therapeutic targets for human cancers.
Figure 13. Kaplan-Meier plots showing the correlation between survival and gene expression (FPKM) for CD2BP2 (top panel) and FAM50B (bottom). Higher expression of FAM50B was associated with longer survival (favorable) in renal cancer, and higher expression of CD2BP2 was associated with shorter survival (unfavorable) in liver cancer. Immunohistochemistry images (target protein: brown, nuclei: blue) show lower expression of FAM50B in renal cancer than normal kidney and higher expression of CD2BP2 in liver cancer than normal liver.
Relevant links and publications
Uhlen M et al., A proposal for validation of antibodies. Nat Methods. (2016)