Sreepathi Pai
Title | Cited by | Year
Improving GPGPU concurrency with elastic kernels
S Pai, MJ Thazhuthaveetil, R Govindarajan
ACM SIGARCH Computer Architecture News 41 (1), 407-418, 2013
Cited by: 280 | Year: 2013
Groute: An asynchronous multi-GPU programming model for irregular computations
T Ben-Nun, M Sutton, S Pai, K Pingali
ACM SIGPLAN Notices 52 (8), 235-248, 2017
Cited by: 155 | Year: 2017
A compiler for throughput optimization of graph algorithms on GPUs
S Pai, K Pingali
Proceedings of the 2016 ACM SIGPLAN International Conference on Object …, 2016
Cited by: 118 | Year: 2016
Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme
S Pai, R Govindarajan, MJ Thazhuthaveetil
Proceedings of the 21st international conference on Parallel architectures …, 2012
Cited by: 66 | Year: 2012
Controlled kernel launch for dynamic parallelism in GPUs
X Tang, A Pattnaik, H Jiang, O Kayiran, A Jog, S Pai, M Ibrahim, ...
2017 IEEE International Symposium on High Performance Computer Architecture …, 2017
Cited by: 60 | Year: 2017
Parallel triangle counting and k-truss identification using graph-centric methods
C Voegele, YS Lu, S Pai, K Pingali
2017 IEEE High Performance Extreme Computing Conference (HPEC), 1-7, 2017
Cited by: 46 | Year: 2017
Stochastic gradient descent on GPUs
R Kaleem, S Pai, K Pingali
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 81-89, 2015
Cited by: 43 | Year: 2015
Locality analysis through static parallel sampling
D Chen, F Liu, C Ding, S Pai
ACM SIGPLAN Notices 53 (4), 557-570, 2018
Cited by: 26 | Year: 2018
Why GPUs are slow at executing NFAs and how to make them faster
H Liu, S Pai, A Jog
Proceedings of the Twenty-Fifth International Conference on Architectural …, 2020
Cited by: 25 | Year: 2020
PLASMA: Portable programming for SIMD heterogeneous accelerators
S Pai, R Govindarajan, MJ Thazhuthaveetil
Workshop on Language, Compiler, and Architecture Support for GPGPU, held in …, 2010
Cited by: 24 | Year: 2010
Architectural support for efficient large-scale automata processing
H Liu, M Ibrahim, O Kayiran, S Pai, A Jog
2018 51st Annual IEEE/ACM International Symposium on Microarchitecture …, 2018
Cited by: 22 | Year: 2018
Bounded exhaustive test-input generation on GPUs
A Celik, S Pai, S Khurshid, M Gligoric
Proceedings of the ACM on Programming Languages 1 (OOPSLA), 1-25, 2017
Cited by: 21 | Year: 2017
Synchronization trade-offs in GPU implementations of graph algorithms
R Kaleem, A Venkat, S Pai, M Hall, K Pingali
2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2016
Cited by: 21 | Year: 2016
Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels
S Pai, R Govindarajan, MJ Thazhuthaveetil
Proceedings of the 23rd international conference on Parallel architectures …, 2014
Cited by: 19 | Year: 2014
Groute: Asynchronous multi-GPU programming model with applications to large-scale graph processing
T Ben-Nun, M Sutton, S Pai, K Pingali
ACM Transactions on Parallel Computing (TOPC) 7 (3), 1-27, 2020
Cited by: 14 | Year: 2020
One size doesn't fit all: Quantifying performance portability of graph applications on GPUs
T Sorensen, S Pai, AF Donaldson
2019 IEEE International Symposium on Workload Characterization (IISWC), 155-166, 2019
Cited by: 10 | Year: 2019
Efficient execution of graph algorithms on CPU with SIMD extensions
R Zheng, S Pai
2021 IEEE/ACM International Symposium on Code Generation and Optimization …, 2021
Cited by: 8 | Year: 2021
Adaptive work-efficient connected components on the GPU
M Sutton, T Ben-Nun, A Barak, S Pai, K Pingali
arXiv preprint arXiv:1612.01178, 2016
Cited by: 6 | Year: 2016
Horus: A modular GPU emulator framework
AS Elhelw, S Pai
2020 IEEE International Symposium on Performance Analysis of Systems and …, 2020
Cited by: 5 | Year: 2020
Limits of data-level parallelism
S Pai, R Govindarajan, M Thazhuthaveetil
14th Annual IEEE International Conference on High Performance Computing, 2007
Cited by: 3 | Year: 2007