Pythia: A suite for analyzing large language models across training and scaling S Biderman, H Schoelkopf, Q Anthony, H Bradley, K O'Brien, E Hallahan, ... International conference on machine learning (ICML), 2023 | 1113 | 2023 |
GPT-NeoX-20B: An open-source autoregressive language model S Black, S Biderman, E Hallahan, Q Anthony, L Gao, L Golding, H He, ... Proceedings of the ACL Workshop on Challenges & Perspectives in Creating …, 2022 | 932* | 2022 |
RWKV: Reinventing RNNs for the transformer era B Peng, E Alcaide, Q Anthony, A Albalak, S Arcadinho, S Biderman, ... arXiv preprint arXiv:2305.13048, 2023 | 554 | 2023 |
Emergent and Predictable Memorization in Large Language Models S Biderman, US Prashanth, L Sutawika, H Schoelkopf, Q Anthony, ... https://arxiv.org/pdf/2304.11158.pdf, 2023 | 171 | 2023 |
Continual Pre-Training of Large Language Models: How to (re)warm your model? K Gupta, B Thérien, A Ibrahim, ML Richter, Q Anthony, E Belilovsky, I Rish, ... | 78 | 2023 |
Simple and scalable strategies to continually pre-train large language models A Ibrahim, B Thérien, K Gupta, ML Richter, Q Anthony, T Lesort, ... arXiv preprint arXiv:2403.08763, 2024 | 62 | 2024 |
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training A Jain, AA Awan, AM Aljuhani, JM Hashmi, QG Anthony, H Subramoni, ... SC20: international conference for high performance computing, networking …, 2020 | 61 | 2020 |
Eagle and Finch: RWKV with matrix-valued states and dynamic recurrence B Peng, D Goldstein, Q Anthony, A Albalak, E Alcaide, S Biderman, ... arXiv preprint arXiv:2404.05892, 2024 | 52 | 2024 |
Performance characterization of DNN training using TensorFlow and PyTorch on modern clusters A Jain, AA Awan, Q Anthony, H Subramoni, DK Panda 2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-11, 2019 | 51 | 2019 |
GPT-NeoX-20B: An open-source autoregressive language model S Black, S Biderman, E Hallahan, Q Anthony, L Gao, L Golding, H He, ... https://arxiv.org/abs/2204.06745, 2022 | 44 | 2022 |
BlackMamba: Mixture of experts for state-space models Q Anthony, Y Tokpanov, P Glorioso, B Millidge arXiv preprint arXiv:2402.01771, 2024 | 40 | 2024 |
trlX: A framework for large scale reinforcement learning from human feedback A Havrilla, M Zhuravinskyi, D Phung, A Tiwari, J Tow, S Biderman, ... Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 38 | 2023 |
RedPajama: An open dataset for training large language models M Weber, D Fu, Q Anthony, Y Oren, S Adams, A Alexandrov, X Lyu, ... Advances in neural information processing systems 37, 116462-116492, 2024 | 36 | 2024 |
Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Bolun Wang, Johan S Wind, Stanislaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng … B Peng, E Alcaide, Q Anthony, A Albalak, S Arcadinho, S Biderman, ... 2023 | 32 | 2023 |
Zamba: A compact 7B SSM hybrid model P Glorioso, Q Anthony, Y Tokpanov, J Whittington, J Pilault, A Ibrahim, ... arXiv preprint arXiv:2405.16712, 2024 | 31 | 2024 |
Accelerating MPI all-to-all communication with online compression on modern GPU clusters Q Zhou, P Kousha, Q Anthony, K Shafie Khorassani, A Shafi, ... International Conference on High Performance Computing, 3-25, 2022 | 26 | 2022 |
Adaptive and hierarchical large message all-to-all communication algorithms for large-scale dense GPU systems KS Khorassani, CH Chu, QG Anthony, H Subramoni, DK Panda 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet …, 2021 | 17 | 2021 |
HyPar-Flow: Exploiting MPI and Keras for scalable hybrid-parallel DNN training using TensorFlow AA Awan, A Jain, Q Anthony, H Subramoni, DK Panda arXiv preprint arXiv:1911.05146, 2019 | 15* | 2019 |
GPT-NeoX: Large scale autoregressive language modeling in PyTorch A Andonian, Q Anthony, S Biderman, S Black, P Gali, L Gao, E Hallahan, ... | 14* | 2021 |
Accelerating distributed deep learning training with compression assisted allgather and reduce-scatter communication Q Zhou, Q Anthony, L Xu, A Shafi, M Abduljabbar, H Subramoni, ... 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2023 | 12 | 2023 |