I am a PhD candidate in the Department of Statistics and Data Science at Carnegie Mellon University. I am very fortunate to be advised by Professor Kathryn Roeder. While she is my primary thesis advisor, I also collaborate extensively with Professors Tianyu Zhang and Yuting Wei on a variety of projects, many of which are joint efforts with Professor Roeder. Together, we develop and apply deep learning methods for high-dimensional, nonparametric data, particularly in settings with strong dependencies. I am also interested in algorithm design for improving transformer architectures and large language models (LLMs) via reinforcement learning methods such as PPO and GRPO, in close collaboration with Professor Linjun Zhang. I have also worked extensively with Professor Weijing Tang on LLM benchmarking and statistical network inference. Before coming to CMU, I obtained a Bachelor's degree in Mathematics from the University of Science and Technology of China in 2023.
Research Interests
I'm interested in developing and applying machine learning methodologies to real-data applications, including:
- Improving LLM performance and benchmarking frameworks with reinforcement learning algorithms such as PPO and GRPO
- Single-cell RNA sequencing analysis (both novel scientific questions and methodological developments)
- Generative models and synthetic data analysis (especially diffusion models and data debiasing)
- Statistical Network Inference
Publications
- Ergan Shang, Yuting Wei, Kathryn Roeder, “Predicting the unseen: a diffusion-based debiasing framework for transcriptional response prediction at single-cell resolution”. Paper
- Tianyu Zhang, Ergan Shang and Kathryn Roeder, “Genetic Convergence Analysis of CRISPR Perturbations Deciphers Gene Functional Similarity”. Paper
Working Papers
- Ergan Shang, Weijing Tang and Yinqiu He, “LLM evaluation, out-of-sample performance prediction, multidimensional item response theory, contextual embeddings”
- Ergan Shang, Kathryn Roeder and Yuting Wei, "SCARF: A Single-Cell perturbAtion pRediction Framework"
- Ergan Shang, Weijing Tang and Yuan Zhang, “Inference for Balance Theory in Time-Varying Signed Network”
Software
HMC. GitHub. R CRAN. Tutorial (co-developed with Tianyu Zhang):
Tianyu Zhang, Jing Lei, and Kathryn Roeder. "Debiased Projected Two-Sample Comparisons for Single-Cell Expression Data."
News
- 2026.05. I will start as a Machine Learning Research Scientist Intern at Meta in Seattle.
- 2025.06. I presented a poster titled "Inference for Balance Theory in Time-Varying Signed Network" at the 2025 Workshop on Statistical Network Analysis and Beyond (SNAB 2025) in Tokyo, Japan.
- 2025.04. I am honored to have received the Mihaela Serban Award for the Best Student Poster at the American Statistical Association (ASA) Banquet in Pittsburgh.
Life and Hobbies
I'm a big fan of Japanese culture, J-pop, anime, and animation. I've been teaching myself Japanese for more than two years so that I can sing Japanese songs and sometimes watch Japanese TV series without subtitles. My favorite animated character is Usagi from Chiikawa, and my favorite anime is Evangelion. Here is a photo of my collection of Evangelion souvenirs.
