Nitish Shirish Keskar

Cited by

	All	Since 2019
Citations	11848	11159
h-index	28	27
i10-index	41	41

Citations per year

Year	Citations
2017	136
2018	504
2019	1003
2020	1556
2021	1692
2022	2175
2023	3012
2024	1710

Public access

5 articles available, 0 articles not available (based on funding mandates)

Co-authors

Richard Socher - you.com - Verified email at stanford.edu
Caiming Xiong - Salesforce Research - Verified email at salesforce.com
Bryan McCann - You.com - Verified email at you.com
Jorge Nocedal - Professor, Industrial Engineering, Northwestern University - Verified email at NORTHWESTERN.EDU
Dheevatsa Mudigere - Distinguished Engineer, NVIDIA - Verified email at nvidia.com
Mikhail Smelyanskiy - Facebook - Verified email at intel.com
Lav R. Varshney - University of Illinois Urbana-Champaign - Verified email at illinois.edu
Stephen Merity - Verified email at smerity.com
Nikhil Naik - MIT - Verified email at mit.edu
Akhilesh Deepak Gotmare - Salesforce Research - Verified email at salesforce.com
Ali Madani - Profluent Bio - Verified email at berkeley.edu
Nazneen Rajani - Hugging Face - Verified email at huggingface.co
Huan Wang - Salesforce Research - Verified email at yale.edu
Semih Yavuz - Salesforce Research - Verified email at salesforce.com
Albert S. Berahas - Assistant Professor, University of Michigan - Verified email at umich.edu
Karim Ahmed - Dartmouth College, Samsung Research America - Verified email at dartmouth.edu
Tong Niu - Salesforce Research - Verified email at salesforce.com
Raphael R Eguchi - Stanford University - Verified email at alumni.stanford.edu
Jasdeep Singh - Stanford University - Verified email at stanford.edu
Wojciech Kryściński - Cohere - Verified email at cohere.com

Nitish Shirish Keskar

OpenAI

Verified email at openai.com

Deep Learning, Mathematical Optimization, Natural Language Processing


Title - Authors - Venue	Cited by	Year
On large-batch training for deep learning: Generalization gap and sharp minima - NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang - arXiv preprint arXiv:1609.04836, 2016	3326	2016
GPT-4 technical report - J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ... - arXiv preprint arXiv:2303.08774, 2023	1290*	2023
Regularizing and optimizing LSTM language models - S Merity, NS Keskar, R Socher - arXiv preprint arXiv:1708.02182, 2017	1263	2017
CTRL: A conditional transformer language model for controllable generation - NS Keskar, B McCann, LR Varshney, C Xiong, R Socher - arXiv preprint arXiv:1909.05858, 2019	1088	2019
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models - A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... - arXiv preprint arXiv:2206.04615, 2022	724	2022
The natural language decathlon: Multitask learning as question answering - B McCann, NS Keskar, C Xiong, R Socher - arXiv preprint arXiv:1806.08730, 2018	649	2018
Improving generalization performance by switching from Adam to SGD - NS Keskar, R Socher - arXiv preprint arXiv:1712.07628, 2017	621	2017
Neural text summarization: A critical evaluation - W Kryściński, NS Keskar, B McCann, C Xiong, R Socher - arXiv preprint arXiv:1908.08960, 2019	375	2019
GeDi: Generative discriminator guided sequence generation - B Krause, AD Gotmare, B McCann, NS Keskar, S Joty, R Socher, ... - arXiv preprint arXiv:2009.06367, 2020	296	2020
A closer look at deep learning heuristics: Learning rate restarts, warmup and distillation - A Gotmare, NS Keskar, C Xiong, R Socher - arXiv preprint arXiv:1810.13243, 2018	276	2018
ProGen: Language modeling for protein generation - A Madani, B McCann, N Naik, NS Keskar, N Anand, RR Eguchi, ... - arXiv preprint arXiv:2004.03497, 2020	232	2020
An analysis of neural language modeling at multiple scales - S Merity, NS Keskar, R Socher - arXiv preprint arXiv:1803.08240, 2018	188	2018
Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains - N Naik, A Madani, A Esteva, NS Keskar, MF Press, D Ruderman, DB Agus, ... - Nature communications 11 (1), 5727, 2020	173	2020
Weighted transformer network for machine translation - K Ahmed, NS Keskar, R Socher - arXiv preprint arXiv:1711.02132, 2017	155	2017
Balancing communication and computation in distributed optimization - AS Berahas, R Bollapragada, NS Keskar, E Wei - IEEE Transactions on Automatic Control 64 (8), 3141-3155, 2018	114	2018
Sequence-to-sequence prediction using a neural network model - NS Keskar, K Ahmed, R Socher - US Patent 11,928,600, 2024	107	2024
Multitask learning as question answering - NS Keskar, B McCann, C Xiong, R Socher - US Patent 11,501,076, 2022	86	2022
Multitask learning as question answering - B McCann, NS Keskar, C Xiong, R Socher - US Patent 10,776,581, 2020	83	2020
Hybrid training of deep networks - NS Keskar, R Socher - US Patent 11,276,002, 2022	78	2022
XLDA: Cross-lingual data augmentation for natural language inference and question answering - J Singh, B McCann, NS Keskar, C Xiong, R Socher - arXiv preprint arXiv:1905.11471, 2019	77	2019
