Mayank Mishra
MIT-IBM Watson Lab
Verified email at ibm.com - Homepage
Title · Cited by · Year
Bloom: A 176b-parameter open-access multilingual language model
T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...
JMLR 2024, 2023
Cited by 1627 · 2023
StarCoder: may the source be with you!
R Li, LB Allal, Y Zi, N Muennighoff, D Kocetkov, C Mou, M Marone, C Akiki, ...
arXiv preprint arXiv:2305.06161, 2023
Cited by 722 · 2023
SantaCoder: don't reach for the stars!
LB Allal, R Li, D Kocetkov, C Mou, C Akiki, CM Ferrandis, N Muennighoff, ...
arXiv preprint arXiv:2301.03988, 2023
Cited by 205 · 2023
StarCoder 2 and The Stack v2: The Next Generation
A Lozhkov, R Li, LB Allal, F Cassano, J Lamy-Poirier, N Tazi, A Tang, ...
arXiv preprint arXiv:2402.19173, 2024
Cited by 142 · 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
M Mishra, M Stallone, G Zhang, Y Shen, A Prasad, AM Soria, M Merler, ...
arXiv preprint arXiv:2405.04324, 2024
Cited by 28 · 2024
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
W Brandon, M Mishra, A Nrusimha, R Panda, JR Kelly
NeurIPS 2024, 2024
Cited by 21 · 2024
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
B Pan, Y Shen, H Liu, M Mishra, G Zhang, A Oliva, C Raffel, R Panda
arXiv preprint arXiv:2404.05567, 2024
Cited by 11 · 2024
Adversarial approximate inference for speech to electroglottograph conversion
AP Prathosh, V Srivastava, M Mishra
IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (12 …, 2019
Cited by 8 · 2019
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the US Executive Order
T Nakamura, M Mishra, S Tedeschi, Y Chai, JT Stillerman, F Friedrich, ...
arXiv preprint arXiv:2404.00399, 2024
Cited by 7 · 2024
Variational Inference with Latent Space Quantization for Adversarial Resilience
V Kyatham, M Mishra, TK Yadav, D Mishra, AP Prathosh
arXiv preprint arXiv:1903.09940, 2019
Cited by 5 · 2019
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
A Nrusimha, M Mishra, N Wang, D Alistarh, R Panda, Y Kim
arXiv preprint arXiv:2404.03605, 2024
Cited by 4 · 2024
Prompting with Pseudo-Code Instructions
M Mishra, P Kumar, R Bhat, R Murthy V, D Contractor, S Tamilselvam
EMNLP 2023, 2023
Cited by 3 · 2023
Bloom: A 176b-parameter open-access multilingual language model
BS Workshop, TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, ...
arXiv preprint arXiv:2211.05100, 2022
Cited by 3 · 2022
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
Y Shen, M Stallone, M Mishra, G Zhang, S Tan, A Prasad, AM Soria, ...
arXiv preprint arXiv:2408.13359, 2024
Cited by 2 · 2024
Enhancing Training Efficiency Using Packing with Flash Attention
A Kundu, RD Lee, L Wynter, RK Ganti, M Mishra
arXiv preprint arXiv:2407.09105, 2024
Cited by 2 · 2024
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
G Pandey, Y Nandwani, T Naseem, M Mishra, G Xu, D Raghu, S Joshi, ...
ICML 2024, 2024
Cited by 2 · 2024
Variational Learning for Unsupervised Knowledge Grounded Dialogs
M Mishra, D Madan, G Pandey, D Contractor
31st International Joint Conference on Artificial Intelligence (IJCAI 2022), 2021
Cited by 2 · 2021
Scaling Granite Code Models to 128K Context
M Stallone, V Saxena, L Karlinsky, B McGinn, T Bula, M Mishra, AM Soria, ...
arXiv preprint arXiv:2407.13739, 2024
Cited by 1 · 2024
The infrastructure powering IBM's Gen AI model development
T Gershon, S Seelam, B Belgodere, M Bonilla, L Hoang, D Barnett, ...
arXiv preprint arXiv:2407.05467, 2024
Cited by 1 · 2024
Granite 3.0 Language Models
IBM Granite Team
Cited by 1 · 2024
Articles 1–20