Csaba Szepesvari

Cited by

	All	Since 2019
Citations	33328	21710
h-index	78	70
i10-index	245	192

Citations per year

Year	Citations
2003	114
2004	96
2005	129
2006	95
2007	216
2008	319
2009	382
2010	522
2011	769
2012	843
2013	927
2014	1102
2015	1146
2016	1361
2017	1315
2018	1742
2019	2413
2020	3393
2021	4272
2022	4706
2023	4887
2024	1918

Public access (based on funding mandates)

Available	68 articles
Not available	0 articles

Co-authors

Name	Affiliation	Email
Tor Lattimore	DeepMind	Verified email at google.com
Rémi Munos	DeepMind	Verified email at inria.fr
Yasin Abbasi Yadkori	DeepMind	Verified email at google.com
Branislav Kveton	Amazon	Verified email at amazon.com
Dale Schuurmans	University of Alberta, Google DeepMind	Verified email at cs.ualberta.ca
Kocsis Levente	MTA SZTAKI	Verified email at sztaki.hu
Richard S. Sutton	Keen, Amii, and University of Alberta	Verified email at richsutton.com
Dávid Pál	Staff Machine Learning Engineer, Instacart	Verified email at instacart.com
Mohammad Ghavamzadeh	Amazon	Verified email at amazon.com
András Antos	Budapest University of Technology and Economics	Verified email at cs.bme.hu
Amir-massoud Farahmand	University of Toronto	Verified email at cs.toronto.edu
Zheng Wen	Google DeepMind	Verified email at google.com
Shalabh Bhatnagar	Professor in the Department of Computer Science and Automation, Indian Institute of Science	Verified email at iisc.ac.in
Lorincz, Andras	Eotvos Lorand University	Verified email at inf.elte.hu
Hamid Maei	Netflix	Verified email at netflix.com
Mengdi Wang	Center for Statistics & Machine Learning, ECE, Princeton University	Verified email at princeton.edu
Nevena Lazic	DeepMind	Verified email at google.com
Michael Littman	Brown University	Verified email at brown.edu
Jincheng Mei	Research Scientist, Google Brain	Verified email at google.com
Doina Precup	DeepMind and McGill University	Verified email at cs.mcgill.ca

Csaba Szepesvari

DeepMind & University of Alberta

Verified email at cs.ualberta.ca

Research interests: machine learning, learning theory, online learning, reinforcement learning, Markov Decision Processes


Title	Authors	Venue	Cited by	Year
Bandit based monte-carlo planning	L Kocsis, C Szepesvári	European conference on machine learning, 282-293, 2006	4171	2006
Bandit algorithms	T Lattimore, C Szepesvári	Cambridge University Press, 2020	2609	2020
Algorithms for Reinforcement Learning	C Szepesvari	Morgan and Claypool, 2010	2098*	2010
Improved algorithms for linear stochastic bandits	Y Abbasi-Yadkori, C Szepesvári, D Pál	Advances in Neural Information Processing Systems, 2312-2320, 2011	1879	2011
Convergence results for single-step on-policy reinforcement-learning algorithms	S Singh, T Jaakkola, ML Littman, C Szepesvári	Machine learning 38, 287-308, 2000	988	2000
Exploration–exploitation tradeoff using variance estimates in multi-armed bandits	JY Audibert, R Munos, C Szepesvári	Theoretical Computer Science 410 (19), 1876-1902, 2009	761	2009
Fast gradient-descent methods for temporal-difference learning with linear function approximation	RS Sutton, HR Maei, D Precup, S Bhatnagar, D Silver, C Szepesvári, ...	Proceedings of the 26th annual international conference on machine learning …, 2009	699	2009
Finite-Time Bounds for Fitted Value Iteration	R Munos, C Szepesvári	Journal of Machine Learning Research 9 (5), 2008	612	2008
Parametric bandits: The generalized linear case	S Filippi, O Cappe, A Garivier, C Szepesvári	Advances in neural information processing systems 23, 2010	522	2010
X-Armed Bandits	S Bubeck, R Munos, G Stoltz, C Szepesvári	Journal of Machine Learning Research 12 (5), 2011	490	2011
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path	A Antos, C Szepesvári, R Munos	Machine Learning 71, 89-129, 2008	490	2008
Learning with a strong adversary	R Huang, B Xu, D Schuurmans, C Szepesvári	arXiv preprint arXiv:1511.03034, 2015	430	2015
Regret bounds for the adaptive control of linear quadratic systems	Y Abbasi-Yadkori, C Szepesvári	Proceedings of the 24th Annual Conference on Learning Theory, 1-26, 2011	410	2011
A generalized reinforcement-learning model: Convergence and applications	ML Littman, C Szepesvári	ICML 96, 310-318, 1996	344	1996
Toward off-policy learning control with function approximation	HR Maei, C Szepesvári, S Bhatnagar, RS Sutton	ICML 10, 719-726, 2010	332	2010
Convergent temporal-difference learning with arbitrary smooth function approximation	H Maei, C Szepesvari, S Bhatnagar, D Precup, D Silver, RS Sutton	Advances in neural information processing systems 22, 2009	329	2009
Apprenticeship learning using inverse reinforcement learning and gradient methods	G Neu, C Szepesvári	arXiv preprint arXiv:1206.5264, 2012	317	2012
The grand challenge of computer Go: Monte Carlo tree search and extensions	S Gelly, L Kocsis, M Schoenauer, M Sebag, D Silver, C Szepesvári, ...	Communications of the ACM 55 (3), 106-113, 2012	315	2012
Multi-criteria reinforcement learning	Z Gábor, Z Kalmár, C Szepesvári	ICML 98, 197-205, 1998	309	1998
Cascading bandits: Learning to rank in the cascade model	B Kveton, C Szepesvari, Z Wen, A Ashkan	International conference on machine learning, 767-776, 2015	307	2015

Articles 1–20