Pythia: A suite for analyzing large language models across training and scaling | S Biderman, H Schoelkopf, QG Anthony, H Bradley, K O'Brien, E Hallahan, ... | ICML 2023, 2397-2430, 2023 | 973 | 2023 |
Composable interventions for language models | A Kolbeinsson, K O'Brien, T Huang, S Gao, S Liu, JR Schwarz, A Vaidya, ... | ICLR 2025, 2024 | 4 | 2024 |
Recite, reconstruct, recollect: Memorization in LMs as a multifaceted phenomenon | US Prashanth*, A Deng*, K O'Brien*, J SV*, MA Khan, J Borkar, ... | ICLR 2025, 2024 | 3 | 2024 |
Improving Black-box Robustness with In-Context Rewriting | K O'Brien, N Ng, I Puri, J Mendez, H Palangi, Y Kim, M Ghassemi, ... | TMLR, 2024 | 2 | 2024 |
Steering language model refusal with sparse autoencoders | K O'Brien, D Majercak, X Fernandes, R Edgar, J Chen, H Nori, ... | arXiv preprint arXiv:2411.11296, 2024 | 1 | 2024 |