Alan Sun

awsun @ mit.edu

In Fall 2026, I will start my PhD at MIT CSAIL, where I will be advised by Yoon Kim and Jacob Andreas. I am currently a second-year MSCS student at Carnegie Mellon University, and I’m grateful to be supported by an NSF Graduate Research Fellowship.

Before CMU, I was a visiting scholar at the Max Planck Institute for Software Systems, advised by Mariya Toneva. I earned my undergraduate degree in Computer Science and Mathematics from Dartmouth College with high honors. At Dartmouth, I worked with Soroush Vosoughi, developing a formal framework to characterize the robustness of language models.

Research

I'm broadly interested in improving the reliability of language models. My work approaches this along three axes: (a) evaluation: how can we develop practical, principled measures of performance? (b) attribution: how do models encode structural patterns and use them downstream? (c) intervention: how can we distill actionable insights from attributions to improve and control models?

Most recently, I've become interested in (1) understanding how and when language models acquire foundational, atomic skills during pre-training, and how these acquisition schedules affect their ability to acquire capabilities downstream; (2) significantly reducing pre-training latency through data pruning; and (3) diffusion language models as a path toward human-like language generation and understanding.

Publications

(Reverse chronological order)

Tracking Equivalent Mechanistic Interpretations Across Neural Networks.
Alan Sun, Mariya Toneva.
ICLR (2026)
Circuit Stability Characterizes Language Model Generalization.
Alan Sun.
ACL (2025)
Achieving Domain-Independent Certified Robustness via Knowledge Continuity.
Alan Sun, Chiyu Ma, Kenneth Ge, Soroush Vosoughi.
NeurIPS (2024)
Deciphering Stereotypes in Pre-Trained Language Models.
Weicheng Ma, Henry Scheible, Brian Wang, Goutham Veeramachaneni, Pratim Chowdhary, Alan Sun, Andrew Koulogeorge, Lili Wang, Diyi Yang, Soroush Vosoughi.
EMNLP (2023)
ThanosNet: A Novel Trash Classification Method Using Metadata.
Alan Sun, Harry Xiao.
IEEE Big Data (2020)