Mixed precision training of convolutional neural networks using integer operations D Das, N Mellempudi, D Mudigere, D Kalamkar, S Avancha, K Banerjee, ... arXiv preprint arXiv:1802.00930, 2018 | 219 | 2018 |
Distributed deep learning using synchronous stochastic gradient descent D Das, S Avancha, D Mudigere, K Vaidynathan, S Sridharan, D Kalamkar, ... arXiv preprint arXiv:1602.06709, 2016 | 218 | 2016 |
Software-hardware co-design for fast and scalable training of deep learning recommendation models D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ... Proceedings of the 49th Annual International Symposium on Computer …, 2022 | 175* | 2022 |
Machine learning accelerator mechanism A Bleiweiss, A Ramesh, A Mishra, D Marr, J Cook, S Sridharan, ... US Patent 11,373,088, 2022 | 118 | 2022 |
Deep learning at 15pf: supervised and semi-supervised classification for scientific data T Kurth, J Zhang, N Satish, E Racah, I Mitliagkas, MMA Patwary, T Malas, ... Proceedings of the International Conference for High Performance Computing …, 2017 | 103 | 2017 |
Deep learning training in facebook data centers: Design of scale-up and scale-out systems M Naumov, J Kim, D Mudigere, S Sridharan, X Wang, W Zhao, S Yilmaz, ... arXiv preprint arXiv:2003.09518, 2020 | 99 | 2020 |
Fine-grain compute communication execution for deep learning frameworks via hardware accelerated point-to-point primitives S Sridharan, D Mudigere US Patent 12,154,028, 2024 | 90 | 2024 |
Abstraction layers for scalable distributed machine learning DD Kalamkar, K Vaidyanathan, S Sridharan, D Das US Patent 11,094,029, 2021 | 82 | 2021 |
Hardware implemented point to point communication primitives for machine learning S Sridharan, K Vaidyanathan, D Das US Patent 11,488,008, 2022 | 81 | 2022 |
Communication optimizations for distributed machine learning S Sridharan, K Vaidyanathan, D Das, C Sakthivel, ME Smorkalov US Patent 11,270,201, 2022 | 78 | 2022 |
Astra-sim: Enabling sw/hw co-design exploration for distributed dl training platforms S Rashidi, S Sridharan, S Srinivasan, T Krishna 2020 IEEE International Symposium on Performance Analysis of Systems and …, 2020 | 60 | 2020 |
Enabling efficient multithreaded MPI communication through a library-based implementation of MPI endpoints S Sridharan, J Dinan, DD Kalamkar SC'14: Proceedings of the International Conference for High Performance …, 2014 | 57 | 2014 |
Dynamic precision management for integer deep learning primitives N Mellempudi, D Mudigere, D Das, S Sridharan US Patent 10,643,297, 2020 | 56 | 2020 |
Enabling compute-communication overlap in distributed deep learning training platforms S Rashidi, M Denton, S Sridharan, S Srinivasan, A Suresh, J Nie, ... 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021 | 44 | 2021 |
Thread migration to improve synchronization performance S Sridharan, B Keck, R Murphy, S Chandra, P Kogge Workshop on Operating System Interference in High Performance Applications, 2006 | 41 | 2006 |
Data parallelism and halo exchange for distributed machine learning D Das, K Vaidyanathan, S Sridharan US Patent 11,373,266, 2022 | 39 | 2022 |
High-performance, distributed training of large-scale deep learning recommendation models D Mudigere, Y Hao, J Huang, A Tulloch, S Sridharan, X Liu, M Ozdal, ... arXiv preprint arXiv:2104.05158, 2021 | 37 | 2021 |
On scale-out deep learning training for cloud and hpc S Sridharan, K Vaidyanathan, D Kalamkar, D Das, ME Smorkalov, ... arXiv preprint arXiv:1801.08030, 2018 | 35 | 2018 |
Astra-sim2. 0: Modeling hierarchical networks and disaggregated systems for large-model training at scale W Won, T Heo, S Rashidi, S Sridharan, S Srinivasan, T Krishna 2023 IEEE International Symposium on Performance Analysis of Systems and …, 2023 | 32 | 2023 |
Themis: A network bandwidth-aware collective scheduling policy for distributed training of dl models S Rashidi, W Won, S Srinivasan, S Sridharan, T Krishna Proceedings of the 49th Annual International Symposium on Computer …, 2022 | 30 | 2022 |