I will be joining AITHYRA in Vienna, Austria as a principal investigator. I completed my PhD in the EECS department at MIT, advised by Prof. Caroline Uhler. Our group, part of a highly interdisciplinary institute, develops machine learning methods that integrate multimodal and spatiotemporal data to achieve a holistic understanding of cell states, tissue microenvironments, and perturbation effects. The first call for PhD students is out, with applications due September 10th: https://apply.cemm.at/. We also invite outstanding candidates to apply for postdoctoral research positions (postdoc call).

Publications

An Information Criterion for Controlled Disentanglement of Multimodal Data. Wang, C., Gupta, S., Zhang, X., Tonekaboni, S., Jegelka, S., Jaakkola, T. & Uhler, C., ICLR 2025.
Prediction of protein subcellular localization in single cells. Zhang, X.*, Tseo, Y.*, Bai, Y., Chen, F., & Uhler, C., Nat. Methods 2025 (*: equal contribution). https://github.com/uhlerlab/PUPS.
- The subcellular localization of a protein is important for its function, and its mislocalization is linked to numerous diseases. Existing datasets cover only a limited number of protein and cell line pairs, and existing protein localization prediction models either miss cell-type specificity or cannot generalize to unseen proteins. Our model, PUPS, combines a protein language model and an image inpainting model to utilize both protein sequence and cellular images. We demonstrate that the protein sequence input enables generalization to unseen proteins, and the cellular image input captures single-cell variability, enabling cell-type-specific predictions. Experimental validation shows that PUPS can predict protein localization in newly performed experiments outside of the Human Protein Atlas used for training. Our model provides a framework for predicting protein localization, which enables quantification of mutation effects and of differential protein localization across cell lines and across single cells within a cell line.
Partially Shared Multi-Modal Embedding Learns Holistic Representation of Cell State. Zhang, X., Shivashankar, G. V. & Uhler, C., bioRxiv 2024. https://github.com/uhlerlab/APOLLO/.
- Using measurements with multiple modalities on the same cells, our method disentangles the information shared between data modalities from the information unique to a particular modality, which could lead to a better understanding of the underlying cellular regulatory mechanisms. We demonstrate applications to paired scRNA-seq and scATAC-seq, paired scRNA-seq and surface protein data, as well as multiplexed imaging. We elucidate whether the activation of a particular genetic pathway is captured by scRNA-seq, scATAC-seq, or both, and which morphological features of protein staining are also captured by chromatin staining. While the causes of single-cell variations in protein subcellular localization are not well understood, our method is able to determine which cellular components are associated with the observed localization variations for a particular protein. Our model is a general framework that can be applied to any multi-modal data, well beyond the single-cell domain, including, for example, large-scale medical biobanks.
Unsupervised representation learning of chromatin images identifies changes in cell state and tissue organization in DCIS. Zhang, X., Venkatachalapathy, S., Paysan, D., Schaerer, P., Tripodo, C., Uhler, C. & Shivashankar, G. V., Nat. Commun. 15, 6112 (2024). https://github.com/uhlerlab/DCISprogression.
- Ductal carcinoma in situ (DCIS), accounting for 25% of breast cancer diagnoses, is characterized by its pathological heterogeneity and the consequent high variability in the choice of treatment. It is thus important to develop a pipeline for efficient characterization of DCIS samples. We demonstrate that cheap and efficient chromatin staining paired with a machine learning model can be used for quantitatively comparing cell states and tissue microenvironments across patients. Our pipeline enabled the collection of 560 samples from 122 patients, and we found that cell states enriched in invasive cancer exist in small fractions in normal breast tissue. Tissue-level analysis reveals significant changes in the spatial organization of cell states across disease stages, and this organization is predictive of disease stage and phenotypic category.
Graph-based autoencoder integrates spatial transcriptomics with chromatin images and identifies joint biomarkers for Alzheimer’s disease. Zhang, X., Wang, X., Shivashankar, G. V. & Uhler, C., Nat. Commun. 13, 7480 (2022). https://github.com/uhlerlab/STACI.
- We developed a computational method that enables the analysis of cell states based on cellular organization in tissue, nuclear morphology, and gene expression, which can be jointly measured by spatial transcriptomic technologies. Our method enables downstream analysis that simultaneously takes into account all three modalities, prediction of spatial transcriptomic data from nuclear images in unseen tissue sections, and built-in batch-effect correction of both gene expression and tissue morphology. We apply our model to analyze the spatio-temporal progression of Alzheimer’s disease in mouse models, identifying a previously unannotated separation of cortex regions that differ in the number and size of amyloid plaques as well as in the gene expression and chromatin condensation states of cells.
Phototoxic effects of nonlinear optical microscopy on cell cycle, oxidative states, and gene expression. Zhang, X., Dorlhiac, G., Landry, M. P. & Streets, A., Sci. Rep. 12, 18796 (2022).

Xinyi Zhang
