Performing complete deconvolution analysis of bulk RNA-seq data to obtain both cell-type-specific gene expression profiles (GEPs) and relative cell abundances is a challenging task. One of the fundamental models used, nonnegative matrix factorization (NMF), is mathematically ill-posed. Although several complete deconvolution methods have been developed, and their estimates appear promising when compared with ground truth for some datasets, a comprehensive understanding of how to circumvent the ill-posedness and improve solution accuracy is lacking. In this paper, we first investigate the requirements a given dataset must meet to satisfy the solvability conditions in NMF theory. Even when the solvability conditions hold, the "unique" solutions of NMF are determined only up to a rescaling matrix. Therefore, we provide estimates of the converged local minima and the possible rescaling matrix, based on informative initial conditions. Using these strategies, we developed a new pipeline, a pseudo-bulk tissue data augmented, geometric structure guided NMF model (GSNMF). In our approach, pseudo-bulk tissue data were generated from single-cell RNA-seq (scRNA-seq) data and pseudo cellular compositions simulated from statistical distributions, and then mixed with the original dataset. The constituent matrices of the hybrid dataset then satisfy the weak solvability conditions of NMF. Furthermore, an estimated rescaling matrix was used to adjust the minimizer of the NMF, which was expected to reduce the root mean square errors of the solutions. Our algorithms were tested on several realistic bulk-tissue datasets and showed significant improvements in scenarios with singular cellular compositions.
Citation: Shaoyu Li, Su Xu, Xue Wang, Nilüfer Ertekin-Taner, Duan Chen. An augmented GSNMF model for complete deconvolution of bulk RNA-seq data[J]. Mathematical Biosciences and Engineering, 2025, 22(4): 988-1018. doi: 10.3934/mbe.2025036
Today, in addition to chemical compounds, many biologics have been developed for clinical application. For both categories of drug, validation of pharmacological efficacy by pharmacokinetics (PK) and pharmacodynamics (PD) studies is important for successful development of safe and effective therapeutics [1,2]. In conventional PK/PD analysis, drug concentrations are examined in a blood or tumor sample. The profile of a small-molecule chemical compound can be predicted based on the profiles of similar compounds, or by using any of several pharmacological compartment models [2,3,4].
Unlike chemical compounds, biologics (especially antibodies) have unique PK/PD profiles and multiple mechanisms of action, including neutralizing effects, induction of apoptosis, and immunoreactions such as ADCC (antibody-dependent cellular cytotoxicity), CDC (complement-dependent cytotoxicity), and ADCP (antibody-dependent cellular phagocytosis) [5,6]. Moreover, the PK profile of an antibody has distinct features that depend on the structure of the antibody itself or the biology of the targeted antigen [5,7,8,9,10]. In regard to PD, efficient drug delivery into the targeted site and accumulation in normal tissues should be checked carefully. Traditionally, however, such PK/PD studies are conducted only in the middle or late stage of drug discovery and development.
A more convenient evaluation method for visualizing drug distribution in the early stage of development would improve the success rate. Mass spectrometry imaging (MSI) is a method for viewing a biomolecule or metabolite in a tissue sample using MS analysis [11,12,13,14,15,16,17,18,19,20]. We applied this approach to drug imaging as a form of in situ PK/PD analysis. Although in situ evaluations usually employ autoradiograms with radiolabeled drugs, the associated techniques are complicated and expensive [14,21,22]. By contrast, MSI is convenient, and (excluding the cost of the analytical device itself) the running cost is low.
Here, we review advances in the use of MSI for early drug discovery and development, and describe our recent relevant work.
Chemical compounds have linear PK profiles, and their biodistribution can be predicted using pharmacological compartment models [2,3,4]. By contrast, antibodies have complex non-linear PK profiles with substantial between- and within-patient variability. In particular, the target-binding activity and immunogenicity of the antibody strongly influence the PK profile. The former phenomenon is referred to as target-mediated drug disposition (TMDD). High-affinity target binding or high levels of target will strongly influence the PK profile, typically resulting in non-linear PK characteristics [5,8,23]. The immunogenicity of an antibody is another important factor influencing the PK profile [9,24].
Several factors are involved in forming an anti-drug antibody (ADA) against a therapeutic antibody. ADAs are classified into two types, binding antibodies (BAbs) and neutralizing antibodies (NAbs). BAbs are produced even against fully human therapeutic antibodies, which avoid immunogenicity based on species differences. BAbs react to allotypes that vary based on genetic differences between populations. On the other hand, NAbs react to the epitope that determines the specificity of the antibody. In immunology, this phenomenon is referred to as an idiotype network. Both B and T lymphocytes recognize antigens specifically, either via antibodies (B cells) or T-cell receptors. Idiotype networks enable the host to avoid the expansion of autoreactive B or T cells [7,9,24,25].
ADAs can be produced not only against antibodies, but also against chemical compounds. Because it is still difficult to abolish the immunogenicity of a drug, especially an antibody drug, the detection, characterization, and control of ADAs have become more and more important. These factors, which influence drug development, can be predicted using molecular imaging and DDS technologies. Genetically engineered mice with targets derived from humans or monkeys would be helpful for such predictions [7,9].
In drug discovery, determination of the mode of action (MOA) and its uniqueness in comparison with conventional drugs are also important factors influencing the success rate of drug development. Chemical compounds, especially molecular targeting drugs such as kinase inhibitors, have simple, straightforward MOAs. By contrast, antibodies have multiple functions: (1) neutralizing effect; (2) apoptosis-inducing effect; (3) ADCC (antibody-dependent cellular cytotoxicity); (4) CDC (complement-dependent cytotoxicity); (5) ADCP (antibody-dependent cellular phagocytosis) [5,6]. Furthermore, next-generation antibody therapeutics have additional distinctive actions. An antibody-drug conjugate (ADC), a combination of an antibody and a chemical compound connected via a specialized linker, has a chemotherapeutic effect as an additional action [14,26,27,28,29,30,31]. In radioimmunotherapy (RIT) and photoimmunotherapy (PIT), the new properties are radiotherapeutic and phototherapeutic, respectively [32,33]. The recently approved anti-PD-1, anti-PD-L1, and anti-CTLA-4 antibodies have a unique MOA; they block the immune checkpoint to enhance the anti-tumor immunoreaction [34,35,36].
A drug delivery system (DDS) is a method that delivers a required dose of drug into a specified area in the body when needed. Anticancer agents (ACAs) are distributed throughout the body, leading to adverse side effects. On the other hand, a DDS can target tumors, thereby enhancing the action of the drug and minimizing toxicity [14,37,38]. The EPR (enhanced permeability and retention) effect is a very important concept for DDSs. In normal tissues, low-molecular weight (MW) agents can extravasate easily, whereas high-MW (HMW) agents cannot. By contrast, in tumors, HMW agents can extravasate due to leaky tumor vessels and increased permeability [37,39]. HMW agents from 10 to 200 nm in size can extravasate into the tumor selectively, dependent on the EPR effect (Figure 1).
Accordingly, many DDS drugs, such as liposomes or micelles, have been developed [37,38,40,41,42,43,44,45,46,47]. We consider antibody-drug conjugates (ADCs) to be DDS drugs because the antibody ranges in size from 10 to 20 nm and can extravasate via the EPR effect [14,39]. Figure 2 shows a comparison among liposomes, micelles, and ADCs. Liposomes range in size from 50 to 200 nm, and the drugs they contain are released by natural breakdown or enzymatic cleavage [41,42,48]. Micelles range in size from 30 to 100 nm, and the drugs they contain are released by natural breakdown or unique technologies such as pH-dependent cleavage [38,40,47]. ADCs range in size from 10 to 20 nm. In an ADC, a specialized linker is used to control drug release. Importantly, ACAs are very small, below 1 nm; consequently, they can distribute into both tumor and normal tissues, often causing adverse effects. By contrast, DDS drugs cannot distribute in this fashion, and the frequency of adverse effects is correspondingly reduced. DDS drugs and ADCs act via four steps: (1) systemic circulation; (2) EPR effect (passive targeting); (3) penetration within tumor tissue; (4) action on cells [14]. To evaluate these four steps, delivery of the drug carrier or antibody should be examined. In addition, to evaluate the final step, controlled release should be examined. For this final step, our strategy of using MSI to visualize the drug released from a DDS drug or ADC is very useful (Figure 3).
MSI is a method for directly visualizing biomolecules or metabolites in tissue samples [11,12,13,14,15,16,17,18,19]. In MS, it is important to ionize the targeted molecule. Methods for ionization include MALDI (matrix-assisted laser desorption ionization), ESI (electrospray ionization), and SIMS (secondary-ion mass spectrometry). In MALDI, the matrix must be sprayed on the tissue sample. After the treated sample is laser-irradiated, protons are transferred between the ionized matrix and the analyte molecules (i.e., biomolecules and metabolites), and the analyte molecules thereby become ionized [11,12,18]. ESI uses an electrospray in which a high voltage is applied to a liquid to create charged aerosol droplets. A wide range of molecules, including chemical compounds, can be ionized without any addition of matrix under ambient conditions [12,13,15,49,50,51]. DESI (desorption electrospray ionization) is a specialized ambient molecular imaging technology, performed without pretreatment, that allows visualization of the spatial localization of targeted molecules, including chemical compounds [52,53,54].
In SIMS, the molecules are secondarily ionized from the surface of the sample, which is bombarded by an energetic primary ion beam (e.g., metal ions such as Au or Bi). This method can provide detailed chemical information about a material of interest, and is therefore useful for identification and localization of metal composites such as gold, magnetic, and semiconducting nanoparticles. Ionized molecules are analyzed by transmission quadrupole, time-of-flight (TOF) or Fourier transform MS [12,16,18,55]. MALDI-TOF-MS and ESI with LC-MS are described in later sections.
The principles underlying MALDI are shown in Figure 4. Tissue samples are prepared with a sprayed matrix. After the matrix is irradiated by the laser, it absorbs energy, causing it to be desorbed and ionized. Biomolecules are not ionized directly, but are instead desorbed with the matrix around the irradiated site. Subsequently, protons are exchanged between the ionized matrix and the biomolecules, and the biomolecules are ultimately ionized. Matrices used for MALDI include sinapinic acid (SA), α-cyano-4-hydroxycinnamic acid (CHCA), and 2,5-dihydroxybenzoic acid (DHBA). SA is commonly used for protein analysis. Both CHCA and DHBA are used for analysis of peptides or tryptic peptides. Low-MW metabolites, lipids, or compounds are often measured as a matrix in MALDI [11,12,18,56].
Recently, other matrices (e.g., graphene, 2-amino-4,5-diphenylfuran-3-carboxylic acid, and graphene oxide (GO)-modified sinapinic acid) or the integration of nanotechnology with mass spectrometry have been used to measure ionized molecules with high sensitivity and selectivity [57,58,59,60,61,62].
In TOF-MS, ionized molecules are accelerated and fly to the detector (Figure 5). Light ions arrive sooner than heavy ions, so molecular mass can be determined from the flight time. A doubly charged ion also arrives sooner than a singly charged ion of the same mass (in an idealized linear TOF analyzer, its flight time is shorter by a factor of √2). Therefore, to state it more precisely, it is the mass-to-charge ratio (m/z) that is measured. The tissue distribution of a targeted molecule can be acquired as an image after the measurement of multiple small areas by MALDI-TOF-MS. In some cases, MS/MS analysis can be used to verify the identity of drugs and drug metabolites.
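The dependence of flight time on m/z can be illustrated with a minimal sketch of an idealized linear TOF analyzer, in which an ion of charge z accelerated through a potential U acquires kinetic energy zeU and then drifts a length L. The acceleration voltage, drift length, and function name below are illustrative assumptions, not parameters of the instrument described in this review.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit in kilograms


def flight_time(mass_da, charge, accel_voltage=20_000.0, drift_length=1.0):
    """Flight time (s) in an idealized linear TOF analyzer.

    Energy balance: z*e*U = m*v**2 / 2, hence t = L / v = L * sqrt(m / (2*z*e*U)).
    accel_voltage (V) and drift_length (m) are hypothetical example values.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2.0 * charge * E_CHARGE * accel_voltage / mass_kg)
    return drift_length / velocity


t_light = flight_time(500.0, 1)     # light, singly charged ion
t_heavy = flight_time(2000.0, 1)    # heavy, singly charged ion
t_double = flight_time(2000.0, 2)   # same heavy ion, doubly charged

print(f"500 Da, 1+ : {t_light * 1e6:.2f} us")
print(f"2000 Da, 1+: {t_heavy * 1e6:.2f} us")
print(f"2000 Da, 2+: {t_double * 1e6:.2f} us")
```

In this idealized model, ions with the same m/z (for example 1000 Da at 1+ and 2000 Da at 2+) have identical flight times, which is why the measurement reports m/z rather than mass alone.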
The MSI device we used for drug imaging is called a mass microscope [21,22,63,64,65,66]. This instrument is a conceptually new imaging device that combines an optical microscope system with a high resolution (≤ 10 µm) with atmospheric-pressure MALDI, which is distinct from conventional MALDI-TOF-MS. A transmitted light image of a target tissue can be obtained directly via the dedicated microscope apparatus, making comparison with the MSI image easy. Figure 6 shows the difference in resolution between 100 µm and 10 µm. At a resolution of 100 µm, gray matter and white matter can be discriminated, but it is difficult to obtain more detail. On the other hand, at a resolution of 10 µm, the distribution of the granule layer in white matter can be easily distinguished. Thus, the use of a high-resolution mass microscope is advantageous for observing finer tissue distributions.
Figure 7 shows a schematic of drug imaging using a mass microscope. The results of this analysis, which used a dilution series of paclitaxel as a standard, revealed that the imaging intensity of the drug was correlated with the amount of the drug. The imaging intensity can therefore be used as a semi-quantitative measurement (Figure 7).
As mentioned above, many DDS drugs have been developed to take advantage of the EPR effect. To date, SMANCS (polymer-conjugated neocarzinostatin), Doxil (doxorubicin-enclosing liposome), and Abraxane (paclitaxel/albumin suspension) have already been clinically applied [67,68,69]. In Japan, which leads the world in DDS research based on nanotechnology, excellent nanocarriers such as micelles, MENDs (multifunctional envelope-type nano devices), and improved liposomes have been successfully developed [40,41,47,70]. In addition, many technologies, including antibody or ligand conjugation, size reduction, and pH-dependent dissociation, have been exploited to improve cancer targeting, tumor penetration, and controlled release of the drug [38,40,47,71,72].
Another representative DDS drug is paclitaxel (PTX)-encapsulated micelles [73]. In preclinical settings, PTX-micelles exerted stronger anti-tumor effects than free PTX. Peripheral neuropathy is a major adverse effect of PTX, but in a mouse model of mechanical sensory threshold testing, free PTX caused peripheral neuropathy whereas PTX-micelle did not.
We evaluated the drug distribution in both tumor and peripheral neuronal tissue using a mass microscope. In the tumor, free PTX and PTX released from micelles were detected. A free-PTX signal was detected 15 min and 1 hr after administration, but decreased gradually and disappeared by 24 hr. In contrast to free PTX, released PTX was detected more than 24 hr after the administration, and the signal intensity was greater at 24 hr than at 15 min or 1 hr (Figure 8) [22].
Next, we conducted drug imaging in normal neuronal tissue. A strong free-PTX signal was detected in perineuronal lesions 30 min and 1 hr after administration. By contrast, the signal of PTX released from PTX-micelles was extremely weak around the neuron (Figure 8). This significant difference in distribution may explain why PTX-micelles do not cause neurotoxicity. Thus, we succeeded in visualizing the EPR effect for the first time [22].
Several ADCs have been used in the clinic already, and more than 40 ADCs have been studied in clinical trials worldwide [26,27,30]. One important application of ADCs is in treating relapsed or refractory malignant diseases. For example, SGN-35 is effective for patients with CD30-positive relapsed or refractory malignant lymphoma, and T-DM1 is also effective for patients with HER2-positive advanced or remnant breast cancer previously treated with standard drugs, including naked anti-HER2 antibody [34,74,75,76]. An ADC has three parts: antibody, linker, and drug. The drug is conjugated to the antibody via the specialized linker.
As described above, ADCs act via four steps, and evaluation of antibody delivery and controlled release throughout these steps is very important. ADCs are also capable of active targeting, which depends on specific recognition of and binding to the target antigen [10,28]. The linker must be stable in the bloodstream, but should efficiently release the drug in the tumor cells or within their microenvironment [27,77,78]. The total number of drug molecules that can be conjugated with a single antibody molecule is usually about four, and at most eight. Because so few drug molecules are delivered per antibody, highly toxic agents must be used [26,27,77,78].
Linker technology is a typical controlled-release technology in DDS. Therefore, ADCs should clearly be considered to belong to the DDS drug category. We previously succeeded in visualizing the four steps of antibody delivery using molecular imaging modalities such as fluorescence or PET (positron emission tomography)/SPECT (single photon emission computed tomography) [79,80]. We also developed an anti-tissue factor (TF)-ADC, which had a significant anti-tumor effect in a xenograft model of pancreatic cancer (PC) [81]. We then sought to visualize the controlled release of drug from the anti-TF-ADC in the final step.
The MW of monomethyl auristatin E (MMAE) is 717.5. In MS analysis, three positive-ion peaks are derived from MMAE: m/z 718.4, 740.4, and 756.4, representing singly charged hydrogen [M + H]+, sodium [M + Na]+, and potassium [M + K]+ ions, respectively. Finally, we selected and confirmed the prominent fragment m/z 496.3 as a MMAE-specific fragment peak, detected when m/z 740.4 was used as a precursor ion. Thus, we succeeded in visualizing and quantifying MMAE separately from other biomolecules. We also found that most MMAE was not released from the ADC during the MALDI process. Therefore, we concluded that controlled release of ADC can be visualized and quantified by MSI [21].
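As a rough consistency check on the three precursor peaks, the expected m/z of singly charged proton, sodium, and potassium adducts can be computed from the neutral mass. The sketch below assumes the neutral mass of 717.5 quoted above and uses standard monoisotopic cation masses (atomic mass minus one electron); the function and variable names are illustrative only.

```python
# Monoisotopic masses of the adduct cations in Da (atomic mass minus one electron).
PROTON = 1.007276
SODIUM_ION = 22.989221
POTASSIUM_ION = 38.963158


def adduct_mz(neutral_mass, adduct_mass, charge=1):
    """m/z of an ion formed by attaching `charge` identical cations to a neutral molecule."""
    return (neutral_mass + charge * adduct_mass) / charge


MMAE_MASS = 717.5  # neutral mass of MMAE as quoted in the text

predicted = {
    "[M + H]+": adduct_mz(MMAE_MASS, PROTON),
    "[M + Na]+": adduct_mz(MMAE_MASS, SODIUM_ION),
    "[M + K]+": adduct_mz(MMAE_MASS, POTASSIUM_ION),
}
reported = {"[M + H]+": 718.4, "[M + Na]+": 740.4, "[M + K]+": 756.4}

for name, mz in predicted.items():
    print(f"{name}: predicted {mz:.2f}, reported {reported[name]}")
```

With the mass rounded to one decimal place, the predicted values fall within about 0.1 of the reported peaks, and the spacing between adducts (roughly 22 between the proton and sodium adducts, roughly 16 between sodium and potassium) matches the reported pattern.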
Control ADC or anti-TF-ADC was administered into a mouse bearing a human pancreatic cancer tumor. When the tumor samples were examined by MSI, a stronger released MMAE signal was detected from anti-TF-ADC than from control ADC. The signal was strongest at 24 hr after the administration (Figure 9). We concluded that ADC distribution and controlled drug release in the tumor area were successful [21].
MALDI-MSI, including mass microscopy, is widely used worldwide. Some drugs, however, cannot be visualized because of low ionization efficiency due to matrix dependency. Moreover, matrix preparation makes it difficult to visualize living cells. By contrast, essentially all drugs can be ionized using the ESI method [12,13,15,16,50]. In addition, ESI can generate multiply charged ion species, thereby effectively extending the measurable mass range. Even if a molecule has a MW of 10,000, its 20-fold charged ions would appear at approximately m/z 500 and its 40-fold charged ions at approximately m/z 250, which could be identified by MS (Figure 10).
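The arithmetic behind this example can be sketched as follows, assuming positive-mode ESI in which each charge is carried by one attached proton. The function name is illustrative, and the MW of 10,000 is the example value from the text.

```python
PROTON = 1.007276  # mass of a proton in Da


def multiply_charged_mz(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion, assuming every charge comes from an added proton."""
    return (neutral_mass + charge * PROTON) / charge


for z in (1, 10, 20, 40):
    print(f"z = {z:2d}: m/z = {multiply_charged_mz(10_000.0, z):.2f}")
```

Under this assumption the 20+ and 40+ ions appear near m/z 501 and 251, that is, the MW divided by the charge plus about 1 for the proton, which is the approximation behind the values of 500 and 250 given above.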
The molecular weight is then determined computationally from the series of multiply charged ion peaks. Indeed, ESI has been widely used for analysis of not only drugs but also other materials, such as natural products and biopolymers [11,12,15]. Because the ionization process in ESI is very soft, the method is suitable for visualization of highly polar, barely volatile, and thermally unstable drugs. DESI, derived from ESI, is an ambient ionization technique performed without pretreatment that can be coupled to mass spectrometry to visualize analyte molecules under atmospheric conditions (Figure 11) [52,53,54]. However, ionized molecules must be transferred to the mass analyzer via the same inlet, posing the risk of contamination caused by carryover between sequential samples [82].
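One simple way such a calculation can work is to take two adjacent peaks of the charge-state series, which differ by exactly one charge, and solve for the charge and the neutral mass. The sketch below is a textbook two-peak calculation under the assumption of protonated ions, not the algorithm of any particular instrument software; the function name and peak values are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def deconvolve_adjacent_peaks(mz_lower_charge, mz_higher_charge):
    """Estimate charge and neutral mass from two adjacent charge-state peaks.

    mz_lower_charge  : m/z of the [M + zH]z+ ion (the larger m/z value)
    mz_higher_charge : m/z of the [M + (z+1)H](z+1)+ ion (the smaller m/z value)
    Since m/z - PROTON equals M/z, it follows that
    z = (mz_higher_charge - PROTON) / (mz_lower_charge - mz_higher_charge).
    """
    z = round((mz_higher_charge - PROTON) / (mz_lower_charge - mz_higher_charge))
    neutral_mass = z * (mz_lower_charge - PROTON)
    return z, neutral_mass


# Hypothetical adjacent peaks simulated for a 10,000 Da molecule at 20+ and 21+.
peak_20 = 10_000.0 / 20 + PROTON
peak_21 = 10_000.0 / 21 + PROTON

charge, mass = deconvolve_adjacent_peaks(peak_20, peak_21)
print(f"charge of first peak: {charge}, estimated neutral mass: {mass:.2f} Da")
```

Real spectra contain noise and overlapping series, so practical software typically combines many peaks rather than a single pair; the pairwise version is shown only because it makes the underlying relation explicit.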
Therefore, we have focused on ESI-MSI using liquid extraction surface analysis (LESA) as another type of ambient ionization technique performed without pretreatment. This method has the advantage of fully automated liquid extraction-based surface sampling, and can provide information about drug distribution (Figure 12). Because the tips and nanospray nozzles for ESI-MSI with LESA are single-use, there is no carryover-dependent contamination [82]. We injected imatinib, a small compound, into mice bearing gastrointestinal stromal tumors (GISTs), and observed strong imatinib signals in the tumor (Figure 13) [83]. Next, we visualized controlled release of MMAE from the ADC in a mouse model of brain tumor. Unlike in the skin xenograft model, we could hardly detect the released MMAE signal in the brain tumor samples. We speculated that antibody distribution was suppressed by low blood supply or the blood-brain barrier. Subsequently, we attempted to use LESA-ESI-MSI for the visualization of controlled release of MMAE. As expected, we observed a strong signal of released MMAE (our unpublished data).
The measurement sensitivity of ESI-MSI is higher than that of MALDI-MSI, although the spatial resolution of mass microscopy (5-10 µm) is still superior to that of LESA-ESI-MSI (0.5-1 mm). Recently, however, technologies for single-cell ESI-MSI analysis have progressed considerably. For example, a nano-ESI tip robotically controlled under a TV monitor enables the evaluation of single-organelle proteomics from living cells [72,84]. These technologies will be useful for improving the spatial resolution of ESI-MSI.
We emphasize the usefulness of MSI, which can provide information about not only drug delivery but also controlled release of the drug. Moreover, many biomolecules can be visualized at the same time as drugs and their metabolites. Markers involved in the efficacy and toxicity of drugs can thus be evaluated, and it should be possible to discover new biomarkers to predict or monitor drug efficacy and toxicity. We developed several ADCs, such as stroma-targeting ADC (CAST therapy), anti-TF-ADC, and anti-IL-7R ADC [81,85,86,87,88]. We used fluorescence and PET/SPECT for the evaluation of antibody delivery [21,80].
We also conducted general PK analysis using homogenized tissue samples; however, this approach provided only the average drug concentration and no information about drug delivery, controlled release, or action on cells within tumor tissue. Visual observation of these aspects is necessary to determine the MOA of CAST therapy. To address this need, we introduced MSI into our approach to early drug discovery and development. Although the spatial resolution and measurement sensitivity are not yet sufficient for wide use, technological advances (including single-cell analysis) will make more general application possible in the near future [84].
Single-cell MS analysis will also provide molecular-level insight into cancer cell heterogeneity and complex microenvironments consisting of multiple varieties of cells. In addition, it would be helpful for finding or validating druggable targets and biomarkers of drug efficacy and toxicity. Therefore, MSI is a very attractive and beneficial method for early drug discovery and development.
The authors thank S. Saijou, S. Hanaoka, and R. Tsumura for assistance in producing antibodies. We also thank Y. Fujiwara for assistance with the study of MS imaging, and M. Nakayama and M. Shimada for secretarial support. This work was financially supported by grants from the National Cancer Center Research and Development Fund (27-S-5 and 29-S-1 to Masahiro Yasunaga and 26-A-14 to Yasuhiro Matsumura); a Grant-in-Aid for Scientific Research on Priority Areas from the Ministry of Education, Culture, Sports and Science of Japan (Yasuhiro Matsumura); JSPS KAKENHI Grants (15H04316 to Masahiro Yasunaga, and 16K15600 and 16H05419 to Toshirou Nishida and Masahiro Yasunaga); Practical Research for Innovative Cancer Control (16ck0106114h0003) from the Japan Agency for Medical Research and Development, AMED (Masahiro Yasunaga); Project Mirai Cancer Research Grants (Masahiro Yasunaga); the Princess Takamatsu Cancer Research Fund (Masahiro Yasunaga); the Japan Leukemia Research Fund (Masahiro Yasunaga); and the Kawano Masanori Memorial Public Interest Incorporated Foundation for Promotion of Pediatrics (Masahiro Yasunaga).
The authors declare no competing financial interests.
[1] |
Z. Cang, Q. Nie, Inferring spatial and signaling relationships between cells from single cell transcriptomic data, Nat. Commun., 11 (2020), 1–13. https://doi.org/10.1016/S1350-4789(20)30374-3 doi: 10.1016/S1350-4789(20)30374-3
[2] |
S. Jin, L. Zhang, Q. Nie, scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles, Genome Biol., 21 (2020), 1–19. https://doi.org/10.1186/s13059-020-1932-8 doi: 10.1186/s13059-020-1932-8
[3] |
J. Zhang, Q. Nie, T. Zhou, Revealing dynamic mechanisms of cell fate decisions from single-cell transcriptomic data, Front. Genet., 10 (2019), 1280. https://doi.org/10.1039/C9PY90042J doi: 10.1039/C9PY90042J
[4] | H. Harrington, E. Drellich, A. Gainer-Dewar, Q. He, C. Heitsch, S. Poznanovic, Geometric Combinatorics and Computational Molecular Biology: Branching Polytopes for RNA Sequences, 2017. |
[5] |
Ö. İş, X. Wang, J. S. Reddy, Y. Min, E. Yilmaz, P. Bhattarai, et al., Gliovascular transcriptional perturbations in Alzheimer's disease reveal molecular mechanisms of blood brain barrier dysfunction, Nat. Commun., 15 (2024), 4758. https://doi.org/10.1038/s41467-024-48926-6 doi: 10.1038/s41467-024-48926-6
[6] |
X. Wang, M. Allen, S. Li, Z. S. Quicksall, T. A. Patel, T. P. Carnwath, et al., Deciphering cellular transcriptional alterations in Alzheimer's disease brains, Mol. Neurodegener., 15 (2020), 38. https://doi.org/10.1186/s13024-020-00392-6 doi: 10.1186/s13024-020-00392-6
[7] |
Y. Min, X. Wang, Ö. İş, T. A. Patel, J. Gao, J. S. Reddy, et al., Cross species systems biology discovers glial DDR2, STOM, and KANK2 as therapeutic targets in progressive supranuclear palsy, Nat. Commun., 14 (2023), 6801. https://doi.org/10.1038/s41467-023-42626-3 doi: 10.1038/s41467-023-42626-3
[8] |
X. Wang, M. Allen, Ö. İş, J. S. Reddy, F. Q. Tutor-New, M. C. Casey, et al., Alzheimer's disease and progressive supranuclear palsy share similar transcriptomic changes in distinct brain regions, J. Clin. Invest., 132 (2022). https://doi.org/10.1172/JCI149904 doi: 10.1172/JCI149904
[9] |
J. S. Reddy, M. Allen, C. C. Ho, S. R. Oatman, Ö. İş, Z. S. Quicksall, et al., Genome-wide analysis identifies a novel LINC-PINT splice variant associated with vascular amyloid pathology in Alzheimer's disease, Acta Neuropathol. Commun., 9 (2021). https://doi.org/10.1186/s40478-021-01199-2 doi: 10.1186/s40478-021-01199-2
[10] |
M. Allen, X. Wang, J. D. Burgess, J. Watzlawik, D. J. Serie, C. S. Younkin, et al., Conserved brain myelination networks are altered in Alzheimer's and other neurodegenerative diseases, Alzheimer's Dementia, 14 (2018), 352–366. https://doi.org/10.1016/j.jalz.2017.09.012 doi: 10.1016/j.jalz.2017.09.012
[11] |
Y. Zhang, S. A. Sloan, L. E. Clarke, C. Caneda, C. A. Plaza, P. D. Blumenthal, et al., Purification and characterization of progenitor and mature human astrocytes reveals transcriptional and functional differences with mouse, Neuron, 89 (2016), 37–53. https://doi.org/10.1016/j.neuron.2015.11.013 doi: 10.1016/j.neuron.2015.11.013
[12] | S. Darmanis, S. A. Sloan, Y. Zhang, M. Enge, C. Caneda, L. M. Shuer, et al., A survey of human brain transcriptome diversity at the single cell level, PNAS, 112 (2015), 7285–7290. https://doi.org/10.1073/pnas.150712511 |
[13] | B. B. Lake, S. Chen, B. C. Sos, J. Fan, G. E. Kaeser, Y. C. Yung, et al., Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain, Nat. Biotechnol., 36 (2018), 70–80. https://doi.org/10.1038/nbt.4038 |
[14] | H. M. Davey, D. B. Kell, Flow cytometry and cell sorting of heterogeneous microbial populations: the importance of single-cell analyses, Microbiol. Rev., 60 (1996), 641–696. https://doi.org/10.1128/mr.60.4.641-696.1996 |
[15] | A. R. Whitney, M. Diehn, S. J. Popper, A. A. Alizadeh, J. C. Boldrick, D. A. Relman, et al., Individuality and variation in gene expression patterns in human blood, PNAS, 100 (2003), 1896–1901. https://doi.org/10.1073/pnas.252784499 |
[16] | D. de Ridder, C. Van Der Linden, T. Schonewille, W. Dik, M. Reinders, J. Van Dongen, et al., Purity for clarity: the need for purification of tumor cells in DNA microarray studies, Leukemia, 19 (2005), 618–627. https://doi.org/10.1038/sj.leu.2403685 |
[17] | A. T. McKenzie, S. Moyon, M. Wang, I. Katsyv, W. M. Song, X. Zhou, et al., Multiscale network modeling of oligodendrocytes reveals molecular components of myelin dysregulation in Alzheimer's disease, Mol. Neurodegener., 12 (2017), 82. https://doi.org/10.1186/s13024-017-0219-3 |
[18] | S. Mostafavi, C. Gaiteri, S. E. Sullivan, C. C. White, S. Tasaki, J. Xu, et al., A molecular network of the aging human brain provides insights into the pathology and cognitive decline of Alzheimer's disease, Nat. Neurosci., 21 (2018), 811–819. https://doi.org/10.1038/s41593-018-0154-9 |
[19] | P. L. De Jager, Y. Ma, C. McCabe, J. Xu, B. N. Vardarajan, D. Felsky, et al., A multi-omic atlas of the human frontal cortex for aging and Alzheimer's disease research, Sci. Data, 5 (2018), 180142. https://doi.org/10.1038/sdata.2018.142 |
[20] | M. Allen, M. M. Carrasquillo, C. Funk, B. D. Heavner, F. Zou, C. S. Younkin, et al., Human whole genome genotype and transcriptome data for Alzheimer's and other neurodegenerative diseases, Sci. Data, 3 (2016), 160089. https://doi.org/10.1038/sdata.2016.89 |
[21] | A. Kuhn, D. Thu, H. J. Waldvogel, R. L. Faull, R. Luthi-Carter, Population-specific expression analysis (PSEA) reveals molecular changes in diseased brain, Nat. Methods, 8 (2011), 945–947. https://doi.org/10.1038/nmeth.1710 |
[22] | M. Chikina, E. Zaslavsky, S. C. Sealfon, CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations, Bioinformatics, 31 (2015), 1584–1591. https://doi.org/10.1093/bioinformatics/btv015 |
[23] | X. Wang, J. Park, K. Susztak, N. R. Zhang, M. Li, Bulk tissue cell type deconvolution with multi-subject single-cell expression reference, Nat. Commun., 10 (2019), 1–9. https://doi.org/10.1038/s41467-018-08023-x |
[24] | A. M. Newman, C. B. Steen, C. L. Liu, A. J. Gentles, A. A. Chaudhuri, F. Scherer, et al., Determining cell type abundance and expression from bulk tissues with digital cytometry, Nat. Biotechnol., 37 (2019), 773–782. https://doi.org/10.1038/s41587-019-0114-2 |
[25] | K. Zaitsev, M. Bambouskova, A. Swain, M. N. Artyomov, Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures, Nat. Commun., 10 (2019), 1–16. https://doi.org/10.1038/s41467-019-09990-5 |
[26] | T. Chu, Z. Wang, D. Pe'er, C. G. Danko, Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology, Nat. Cancer, 3 (2022), 505–517. https://doi.org/10.1038/s43018-022-00356-3 |
[27] | E. Becht, N. A. Giraldo, L. Lacroix, B. Buttard, N. Elarouci, F. Petitprez, et al., Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression, Genome Biol., 17 (2016), 1–20. https://doi.org/10.1186/s13059-016-1070-5 |
[28] | D. Aran, Z. Hu, A. J. Butte, xCell: digitally portraying the tissue cellular heterogeneity landscape, Genome Biol., 18 (2017), 1–14. https://doi.org/10.1186/s13059-017-1349-1 |
[29] | J. Ahn, Y. Yuan, G. Parmigiani, M. B. Suraokar, L. Diao, I. I. Wistuba, et al., DeMix: deconvolution for mixed cancer transcriptomes using raw measured data, Bioinformatics, 29 (2013), 1865–1871. https://doi.org/10.1093/bioinformatics/btt301 |
[30] | X. L. Peng, R. A. Moffitt, R. J. Torphy, K. E. Volmar, J. J. Yeh, De novo compartment deconvolution and weight estimation of tumor samples using DECODER, Nat. Commun., 10 (2019), 4729. https://doi.org/10.1038/s41467-019-12517-7 |
[31] | K. Kang, Q. Meng, I. Shats, D. M. Umbach, M. Li, Y. Li, et al., CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data, PLoS Comput. Biol., 15 (2019), e1007510. https://doi.org/10.1371/journal.pcbi.1007510 |
[32] | G. Monaco, B. Lee, W. Xu, S. Mustafah, Y. Y. Hwang, C. Carré, et al., RNA-Seq signatures normalized by mRNA abundance allow absolute deconvolution of human immune cell types, Cell Rep., 26 (2019), 1627–1640. |
[33] | Y. Im, Y. Kim, A comprehensive overview of RNA deconvolution methods and their application, Mol. Cells, 46 (2023), 99–105. https://doi.org/10.14348/molcells.2023.2178 |
[34] | H. Nguyen, H. Nguyen, D. Tran, S. Draghici, T. Nguyen, Fourteen years of cellular deconvolution: methodology, applications, technical evaluation and outstanding challenges, Nucleic Acids Res., 52 (2024), 4761–4783. https://doi.org/10.1093/nar/gkae267 |
[35] | P. Paatero, U. Tapper, Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, 5 (1994), 111–126. https://doi.org/10.1002/env.3170050203 |
[36] | W. K. Ma, J. M. Bioucas-Dias, T. H. Chan, N. Gillis, P. Gader, A. J. Plaza, et al., A signal processing perspective on hyperspectral unmixing: Insights from remote sensing, IEEE Signal Process. Mag., 31 (2013), 67–81. https://doi.org/10.1109/MSP.2013.2279731 |
[37] | X. Fu, W. K. Ma, T. H. Chan, J. M. Bioucas-Dias, Self-dictionary sparse regression for hyperspectral unmixing: Greedy pursuit and pure pixel search are related, IEEE J. Sel. Top. Signal Process., 9 (2015), 1128–1141. https://doi.org/10.1109/JSTSP.2015.2410763 |
[38] | S. Zhang, W. Wang, J. Ford, F. Makedon, Learning from incomplete ratings using non-negative matrix factorization, in Proceedings of the 2006 SIAM International Conference on Data Mining, SIAM, (2006), 549–553. https://doi.org/10.1137/1.9781611972764.58 |
[39] | M. D. Craig, Minimum-volume transforms for remotely sensed data, IEEE Trans. Geosci. Remote Sens., 32 (1994), 542–552. https://doi.org/10.1109/36.297973 |
[40] | D. D. Lee, H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401 (1999), 788–791. https://doi.org/10.1038/44565 |
[41] | D. Tsoucas, R. Dong, H. Chen, Q. Zhu, G. Guo, G. C. Yuan, Accurate estimation of cell-type composition from gene expression data, Nat. Commun., 10 (2019), 1–9. https://doi.org/10.1038/s41467-019-10802-z |
[42] | F. Avila Cobos, J. Vandesompele, P. Mestdagh, K. De Preter, Computational deconvolution of transcriptomics data from mixed cell populations, Bioinformatics, 34 (2018), 1969–1979. https://doi.org/10.1093/bioinformatics/bty019 |
[43] | S. Mohammadi, N. Zuckerman, A. Goldsmith, A. Grama, A critical survey of deconvolution methods for separating cell types in complex tissues, Proc. IEEE, 105 (2016), 340–366. https://doi.org/10.1109/JPROC.2016.2607121 |
[44] | A. M. Newman, C. L. Liu, M. R. Green, A. J. Gentles, W. Feng, Y. Xu, et al., Robust enumeration of cell subsets from tissue expression profiles, Nat. Methods, 12 (2015), 453–457. https://doi.org/10.1038/nmeth.3337 |
[45] | W. Qiao, G. Quon, E. Csaszar, M. Yu, Q. Morris, P. W. Zandstra, PERT: a method for expression deconvolution of human blood samples from varied microenvironmental and developmental conditions, PLoS Comput. Biol., 8 (2012), e1002838. https://doi.org/10.1371/journal.pcbi.1002838 |
[46] | Y. Zhong, Y. W. Wan, K. Pang, L. M. Chow, Z. Liu, Digital sorting of complex tissues for cell type-specific gene expression profiles, BMC Bioinf., 14 (2013), 89. https://doi.org/10.1186/1471-2105-14-89 |
[47] | T. Gong, J. D. Szustakowski, DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data, Bioinformatics, 29 (2013), 1083–1085. https://doi.org/10.1093/bioinformatics/btt090 |
[48] | A. Cui, G. Quon, A. M. Rosenberg, R. S. Yeung, Q. Morris, B. S. Consortium, Gene expression deconvolution for uncovering molecular signatures in response to therapy in juvenile idiopathic arthritis, PLoS One, 11 (2016), e0156055. https://doi.org/10.1371/journal.pone.0156055 |
[49] | A. R. Abbas, K. Wolslegel, D. Seshasayee, Z. Modrusan, H. F. Clark, Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus, PLoS One, 4 (2009), e6098. https://doi.org/10.1371/journal.pone.0006098 |
[50] | R. Gaujoux, C. Seoighe, Semi-supervised nonnegative matrix factorization for gene expression deconvolution: a case study, Infect. Genet. Evol., 12 (2012), 913–921. https://doi.org/10.1016/j.meegid.2011.08.014 |
[51] | S. S. Shen-Orr, R. Gaujoux, Computational deconvolution: extracting cell type-specific information from heterogeneous samples, Curr. Opin. Immunol., 25 (2013), 571–578. https://doi.org/10.1016/j.coi.2013.09.015 |
[52] | N. Gillis, Nonnegative Matrix Factorization: Complexity, Algorithms and Applications, Ph.D. thesis, Université catholique de Louvain. Louvain-La-Neuve: CORE, 2011. |
[53] | A. Cichocki, R. Zdunek, A. H. Phan, S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation, John Wiley & Sons, 2009. |
[54] | D. Lee, H. S. Seung, Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems, 13 (2000). |
[55] | Y. X. Wang, Y. J. Zhang, Nonnegative matrix factorization: A comprehensive review, IEEE Trans. Knowl. Data Eng., 25 (2012), 1336–1353. https://doi.org/10.1109/TKDE.2012.51 |
[56] | X. Fu, K. Huang, N. D. Sidiropoulos, W. K. Ma, Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications, IEEE Signal Process. Mag., 36 (2019), 59–80. https://doi.org/10.1109/MSP.2018.2877582 |
[57] | K. Huang, N. D. Sidiropoulos, A. Swami, Non-negative matrix factorization revisited: Uniqueness and algorithm for symmetric decomposition, IEEE Trans. Signal Process., 62 (2013), 211–224. https://doi.org/10.1109/TSP.2013.2285514 |
[58] | D. Donoho, V. Stodden, When does non-negative matrix factorization give a correct decomposition into parts?, in Advances in Neural Information Processing Systems 16 (NIPS 2003), (2004), 1141–1148. |
[59] | H. Laurberg, M. G. Christensen, M. D. Plumbley, L. K. Hansen, S. H. Jensen, Theorems on positive data: On the uniqueness of NMF, Comput. Intell. Neurosci., 2008 (2008). https://doi.org/10.1155/2008/764206 |
[60] | D. Chen, S. Li, X. Wang, Geometric structure guided model and algorithms for complete deconvolution of gene expression data, Found. Data Sci., 4 (2022), 441. https://doi.org/10.3934/fods.2022013 |
[61] | M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in Advances in Neural Information Processing Systems 14 (NIPS 2001), (2002), 585–591. |
[62] | D. Cai, X. Wang, X. He, Probabilistic dyadic data analysis with local and global consistency, in Proceedings of the 26th Annual International Conference on Machine Learning, (2009), 105–112. https://doi.org/10.1145/1553374.1553388 |
[63] | X. He, P. Niyogi, Locality preserving projections, in Advances in Neural Information Processing Systems 16 (NIPS 2003), (2004), 153–160. |
[64] | U. Von Luxburg, A tutorial on spectral clustering, Stat. Comput., 17 (2007), 395–416. https://doi.org/10.1007/s11222-007-9033-z |
[65] | J. Qin, H. Lee, J. T. Chi, Y. Lou, J. Chanussot, A. L. Bertozzi, Fast blind hyperspectral unmixing based on graph Laplacian, in 2019 10th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), IEEE, (2019), 1–5. https://doi.org/10.1109/WHISPERS.2019.8921375 |
[66] | J. Eckstein, W. Yao, Augmented Lagrangian and alternating direction methods for convex optimization: A tutorial and some illustrative computational results, RUTCOR Res. Rep., 32 (2012), 44. |
[67] | S. S. Shen-Orr, R. Tibshirani, P. Khatri, D. L. Bodian, F. Staedtler, N. M. Perry, et al., Cell type–specific gene expression differences in complex tissues, Nat. Methods, 7 (2010), 287–289. https://doi.org/10.1038/nmeth.1439 |
[68] | H. Mathys, J. Davila-Velderrain, Z. Peng, F. Gao, S. Mohammadi, J. Z. Young, et al., Single-cell transcriptomic analysis of Alzheimer's disease, Nature, 570 (2019), 332–337. https://doi.org/10.1038/s41586-019-1195-2 |
[69] | W. V. Li, J. J. Li, A statistical simulator scDesign for rational scRNA-seq experimental design, Bioinformatics, 35 (2019), i41–i50. https://doi.org/10.1093/bioinformatics/btz321 |