Hao Tan
Title | Cited by | Year
LXMERT: Learning cross-modality encoder representations from transformers
H Tan, M Bansal
Proceedings of the 2019 Conference on Empirical Methods in Natural Language …, 2019
Cited by: 419 | Year: 2019
A joint speaker-listener-reinforcer model for referring expressions
L Yu, H Tan, M Bansal, TL Berg
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2017
Cited by: 165 | Year: 2017
Learning to navigate unseen environments: Back translation with environmental dropout
H Tan, L Yu, M Bansal
arXiv preprint arXiv:1904.04195, 2019
Cited by: 92 | Year: 2019
Vokenization: improving language understanding with contextualized, visual-grounded supervision
H Tan, M Bansal
arXiv preprint arXiv:2010.06775, 2020
Cited by: 16 | Year: 2020
The curse of performance instability in analysis datasets: Consequences, source, and suggestions
X Zhou, Y Nie, H Tan, M Bansal
arXiv preprint arXiv:2004.13606, 2020
Cited by: 13 | Year: 2020
Enabling robots to understand incomplete natural language instructions using commonsense reasoning
H Chen, H Tan, A Kuntz, M Bansal, R Alterovitz
2020 IEEE International Conference on Robotics and Automation (ICRA), 1963-1969, 2020
Cited by: 12 | Year: 2020
Object ordering with bidirectional matchings for visual reasoning
H Tan, M Bansal
arXiv preprint arXiv:1804.06870, 2018
Cited by: 11 | Year: 2018
Unifying vision-and-language tasks via text generation
J Cho, J Lei, H Tan, M Bansal
arXiv preprint arXiv:2102.02779, 2021
Cited by: 10 | Year: 2021
Expressing visual relationships via language
H Tan, F Dernoncourt, Z Lin, T Bui, M Bansal
arXiv preprint arXiv:1906.07689, 2019
Cited by: 9 | Year: 2019
Diagnosing the environment bias in vision-and-language navigation
Y Zhang, H Tan, M Bansal
arXiv preprint arXiv:2005.03086, 2020
Cited by: 8 | Year: 2020
Modality-balanced models for visual dialogue
H Kim, H Tan, M Bansal
Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 8091-8098, 2020
Cited by: 7 | Year: 2020
Source-target inference models for spatial instruction understanding
H Tan, M Bansal
Thirty-Second AAAI Conference on Artificial Intelligence, 2018
Cited by: 7 | Year: 2018
MAF: Multimodal alignment framework for weakly-supervised phrase grounding
Q Wang, H Tan, S Shen, MW Mahoney, Z Yao
arXiv preprint arXiv:2010.05379, 2020
Cited by: 3 | Year: 2020
ArraMon: A joint navigation-assembly instruction interpretation task in dynamic environments
H Kim, A Zala, G Burri, H Tan, M Bansal
arXiv preprint arXiv:2011.07660, 2020
Cited by: 2 | Year: 2020
Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information
J Li, H Tan, M Bansal
arXiv preprint arXiv:2104.09580, 2021
Cited by: 1 | Year: 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
S Shen, LH Li, H Tan, M Bansal, A Rohrbach, KW Chang, Z Yao, ...
arXiv preprint arXiv:2107.06383, 2021
Year: 2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Z Tang, J Cho, H Tan, M Bansal
arXiv preprint arXiv:2107.02681, 2021
Year: 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
H Tan, J Lei, T Wolf, M Bansal
arXiv preprint arXiv:2106.11250, 2021
Year: 2021
An Effective Framework for Weakly-Supervised Phrase Grounding
Q Wang, H Tan, S Shen, M Mahoney, Z Yao
Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020
Year: 2020