Course Staff

Chenhao Tan
Instructor
Dang Nguyen
Teaching Assistant
Harvey Fu
Teaching Assistant

Logistics

Content

What is this course about?

Large language models are rapidly reshaping machine learning research and practice, yet many questions remain about how they work, how to ensure they behave as intended, and how to build reliable systems on top of them. This course dives into three core areas at the frontier of LLM research: interpretability, alignment, and agents. Students will learn to analyze circuits and internal representations, probe the geometry of model features through sparse autoencoders and linear representations, and reason about scalable oversight and emergent misalignment. The course will also cover how LLMs are deployed as autonomous agents for software engineering and scientific research, how they are used to simulate human behavior, and how they can complement human decision-making. This is an advanced course and assumes familiarity with transformers and language modeling. We will read and discuss recent publications, with importance placed on analyzing, interpreting, and making arguments from necessarily incomplete empirical evidence. Students will get hands-on experience through assignments and a quarter-long research project that pushes into open problems in the field.

Prerequisites

You are expected to have understood the transformer architecture and to have experience training and analyzing language models. Prior research experience is also preferred.

Coursework

Grading

Quizzes

Short quizzes will be held at the beginning of the lecture to assess understanding of the readings.

Roast or Toast

Students will either critically analyze (roast) a paper or propose (toast) an extension or question based on the course readings.

Assignments

There will be three assignments throughout the quarter.

Project

Compute

Modal has generously offered compute to each student. See details on Ed.

Textbook

There is no required textbook. Reading materials for each week will be a combination of technical papers and online resources.

Honor Code

We expect students not to look at solutions or implementations online. Like all other classes at UChicago, we take academic honesty very seriously. Please make sure to read the UChicago Academic Honesty page.

Collaboration policy

For individual assignments, collaboration with fellow students is encouraged as long as it is properly disclosed for each submission. However, you should not share any written work or code for your assignments. After discussing a problem with others, you should write up the solution by yourself. For final projects, you are expected to work in groups of 1-2, preferably 1.

AI tools policy

Using generative AI tools such as Claude Code and ChatGPT is allowed as long as their use is properly disclosed for each submission. You are encouraged to use AI (e.g., NeuriCo) heavily for the project.

Additional course policies can be found on Canvas.

Submitting Coursework

Late Days


Other Resources


Preliminary Schedule

# Date Topic Readings Deadlines
Interpretability
1 Mon Mar 23 Introduction Lecture
2 Wed Mar 25 Attention Lecture
*What Does BERT Look At? An Analysis of BERT's Attention by Clark et al., 2019;
Play around with BertViz by Jesse Vig for at least 10 minutes.
*Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small by Wang et al., 2022;
Function Vectors in Large Language Models by Todd et al., 2023.
Project Proposal due (Fri Mar 28)
3 Mon Mar 30 MLPs and Factual Recall Lecture
*Locating and Editing Factual Associations in GPT by Meng et al., 2022;
*What does the Knowledge Neuron Thesis Have to do with Knowledge? by Niu et al., 2024;
Transformer Feed-Forward Layers Are Key-Value Memories by Geva et al., 2020
4 Wed Apr 1 Transformer Circuits Lecture
*A Mathematical Framework for Transformer Circuits by Elhage et al., 2021 (read until but not including Two Layer Transformers);
*In-context Learning and Induction Heads by Olsson et al., 2022 (read through "Arguments");
interpreting GPT: the logit lens by nostalgebraist
Proposal Revision due (Fri Apr 4)
5 Mon Apr 6 Geometry of Representations (guest lecture by Todd Nief) *The Linear Representation Hypothesis and the Geometry of Large Language Models by Park et al., 2023;
*The Information Geometry of Softmax: Probing and Steering by Park et al., 2026;
The Geometry of Truth by Marks & Tegmark, 2023 (main body, first ten pages)
6 Wed Apr 8 Superposition & Sparse Autoencoders Toy Models of Superposition by Elhage et al., 2022;
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning by Bricken et al., 2023;
Scaling and Evaluating Sparse Autoencoders by Gao et al., 2024
Blog Entry 1 due (Fri Apr 10)
7 Mon Apr 13 Chain of Thought Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting by Turpin et al., 2023;
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation by Baker et al., 2025;
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety by Korbak et al., 2025
8 Wed Apr 15 Interpretability for Science BERTology Meets Biology: Interpreting Attention in Protein Language Models by Vig et al., 2020;
From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models by Lam et al., 2025;
Protein Language Models Learn Evolutionary Statistics of Interacting Sequence Motifs by Zhang et al., 2024
Blog Entry 2 due (Fri Apr 17)
Alignment
9 Mon Apr 20 The Alignment Problem Concrete Problems in AI Safety by Amodei et al., 2016;
What Failure Looks Like by Christiano (blog post);
Scheming AIs: Will AIs Fake Alignment During Training in Order to Get Power? by Carlsmith, 2023
10 Wed Apr 22 Scalable Oversight Measuring Progress on Scalable Oversight for Large Language Models by Bowman et al., 2022;
Debating with More Persuasive LLMs Leads to More Truthful Answers by Khan et al., 2024;
On Scalable Oversight with Weak LLMs Judging Strong LLMs by Kenton et al., 2024
Blog Entry 3 due (Fri Apr 24)
11 Mon Apr 27 Emergent Misalignment (guest lecture by Shi Feng) Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs by Betley et al., 2025;
Risks from Learned Optimization in Advanced Machine Learning Systems by Hubinger et al., 2019;
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training by Hubinger et al., 2024
12 Wed Apr 29 Sycophancy Towards Understanding Sycophancy in Language Models by Sharma et al., 2023;
Sycophancy to Subterfuge: Investigating Reward Tampering in Language Models by Denison et al., 2024;
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback by Casper et al., 2023
Blog Entry 4 due (Fri May 1)
13 Mon May 4 Finding Novel Behavior Alignment Faking in Large Language Models by Greenblatt et al., 2024;
Frontier Models are Capable of In-context Scheming by Meinke et al., 2024;
AI Sandbagging: Language Models can Strategically Underperform on Evaluations by van der Weij et al., 2024
Agents
14 Wed May 6 Agents & Agentic RL (guest lecture by Ofir Press) ReAct: Synergizing Reasoning and Acting in Language Models by Yao et al., 2022;
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? by Jimenez et al., 2023;
Agentless: Demystifying LLM-based Software Engineering Agents by Xia et al., 2024
First Draft due (Fri May 8)
15 Mon May 11 Research Agents AlphaEvolve: A Gemini-based Coding Agent for Mathematical and Algorithmic Discovery by Novikov et al., 2025;
The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research by Bai et al., 2026;
Agent Laboratory: Using LLM Agents as Research Assistants by Schmidgall et al., 2025
16 Wed May 13 Simulation Generative Agents: Interactive Simulacra of Human Behavior by Park et al., 2023;
Out of One, Many: Using Language Models to Simulate Human Samples by Argyle et al., 2022;
Synthetic Replacements for Human Survey Data? The Perils of Large Language Models by Bisbee et al., 2024
Blog Entry 5 due (Fri May 15)
17 Mon May 18 Complementary AI Superhuman Artificial Intelligence Can Improve Human Decision-Making by Increasing Novelty by Shin et al., 2023;
How AI Impacts Skill Formation by Shen and Tamkin, 2026;
Machine Explanations and Human Understanding by Chen et al., 2022
18 Wed May 20 Final Presentations Final Report due (Fri May 22)

Acknowledgments

This course website is adapted from the Stanford CS336 course website.