Devansh Arpit
A closer look at memorization in deep networks
D Arpit, S Jastrzębski, N Ballas, D Krueger, E Bengio, MS Kanwal, ...
ICML 2017 (arXiv preprint arXiv:1706.05394), 2017
On the spectral bias of deep neural networks
N Rahaman, D Arpit, A Baratin, F Draxler, M Lin, FA Hamprecht, Y Bengio, ...
ICML 2019 (arXiv preprint arXiv:1806.08734), 2018
Three factors influencing minima in SGD
S Jastrzębski, Z Kenton, D Arpit, N Ballas, A Fischer, Y Bengio, A Storkey
ICANN 2018 (arXiv preprint arXiv:1711.04623), 2017
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
S Jastrzębski, M Szymczak, S Fort, D Arpit, J Tabor, K Cho, K Geras
ICLR 2020 (arXiv preprint arXiv:2002.09572), 2020
Normalization propagation: A parametric technique for removing internal covariate shift in deep networks
D Arpit, Y Zhou, BU Kota, V Govindaraju
ICML 2016 (arXiv preprint arXiv:1603.01431), 2016
Residual connections encourage iterative inference
S Jastrzębski, D Arpit, N Ballas, V Verma, T Che, Y Bengio
ICLR 2018 (arXiv preprint arXiv:1710.04773), 2017
A walk with SGD
C Xing, D Arpit, C Tsirigotis, Y Bengio
arXiv preprint arXiv:1802.08770, 2018
Ensemble of averages: Improving model selection and boosting performance in domain generalization
D Arpit, H Wang, Y Zhou, C Xiong
NeurIPS 2022, 2021
Why regularized auto-encoders learn sparse representation?
D Arpit, Y Zhou, H Ngo, V Govindaraju
ICML 2016 (arXiv preprint arXiv:1505.05561), 2015
Deep Nets Don't Learn via Memorization
D Krueger, N Ballas, S Jastrzębski, D Arpit, MS Kanwal, T Maharaj, ...
ICLR 2017 Workshop, 2017
Fraternal Dropout
K Zolna, D Arpit, D Suhubdy, Y Bengio
ICLR 2018 (arXiv preprint arXiv:1711.00066), 2017
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
D Arpit, V Campos, Y Bengio
NeurIPS 2019, 2019
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
S Jastrzębski, D Arpit, O Astrand, G Kerg, H Wang, C Xiong, R Socher, ...
ICML 2021, 2020
BOLAA: Benchmarking and orchestrating LLM-augmented autonomous agents
Z Liu, W Yao, J Zhang, L Xue, S Heinecke, R Murthy, Y Feng, Z Chen, ...
arXiv preprint arXiv:2308.05960, 2023
h-detach: Modifying the LSTM Gradient Towards Better Optimization
D Arpit, B Kanuparthi, G Kerg, NR Ke, I Mitliagkas, Y Bengio
ICLR 2019 (arXiv preprint arXiv:1810.03023), 2018
Variational Bi-LSTMs
S Shabanian, D Arpit, A Trischler, Y Bengio
arXiv preprint arXiv:1711.05717, 2017
Is joint training better for deep auto-encoders?
Y Zhou, D Arpit, I Nwogu, V Govindaraju
arXiv preprint arXiv:1405.1380, 2014
Finding Flatter Minima with SGD
S Jastrzębski, Z Kenton, D Arpit, N Ballas, A Fischer, Y Bengio, A Storkey
ICLR 2018 Workshop, 2018
Retroformer: Retrospective large language agents with policy gradient optimization
W Yao, S Heinecke, JC Niebles, Z Liu, Y Feng, L Xue, R Murthy, Z Chen, ...
arXiv preprint arXiv:2308.02151, 2023
The benefits of over-parameterization at initialization in deep ReLU networks
D Arpit, Y Bengio
arXiv preprint arXiv:1901.03611, 2019