mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ... arXiv preprint arXiv:2304.14178, 2023 | 413 | 2023 |
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval Y Ma, G Xu, X Sun, M Yan, J Zhang, R Ji Proceedings of the 30th ACM International Conference on Multimedia, 638-647, 2022 | 137 | 2022 |
Semi-autoregressive neural machine translation C Wang, J Zhang, H Chen arXiv preprint arXiv:1808.08583, 2018 | 87 | 2018 |
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration Q Ye, H Xu, J Ye, M Yan, H Liu, Q Qian, J Zhang, F Huang, J Zhou arXiv preprint arXiv:2311.04257, 2023 | 73 | 2023 |
mPLUG-2: A modularized multi-modal foundation model across text, image and video H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li, B Bi, Q Qian, W Wang, G Xu, ... arXiv preprint arXiv:2302.00402, 2023 | 73 | 2023 |
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections C Li, H Xu, J Tian, W Wang, M Yan, B Bi, J Ye, H Chen, G Xu, Z Cao, ... arXiv preprint arXiv:2205.12005, 2022 | 68 | 2022 |
AliMeKG: Domain Knowledge Graph Construction and Application in E-commerce FL Li, H Chen, G Xu, T Qiu, F Ji, J Zhang, H Chen Proceedings of the 29th ACM International Conference on Information …, 2020 | 59 | 2020 |
A deep cascade model for multi-document reading comprehension M Yan, J Xia, C Wu, B Bi, Z Zhao, J Zhang, L Si, R Wang, W Wang, ... Proceedings of the AAAI Conference on Artificial Intelligence 33, 7354-7361, 2019 | 57 | 2019 |
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross-and Intra-modal Knowledge Integration Y Cui, Z Yu, C Wang, Z Zhao, J Zhang, M Wang, J Yu arXiv preprint arXiv:2108.07073, 2021 | 54 | 2021 |
HiTeA: Hierarchical temporal-aware video-language pre-training Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 47 | 2023 |
mPLUG-DocOwl: Modularized multimodal large language model for document understanding J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ... arXiv preprint arXiv:2307.02499, 2023 | 45 | 2023 |
Evaluation and analysis of hallucination in large vision-language models J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ... arXiv preprint arXiv:2308.15126, 2023 | 43 | 2023 |
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding J Ye, J Tian, M Yan, X Yang, X Wang, J Zhang, L He, X Lin arXiv preprint arXiv:2203.15442, 2022 | 31 | 2022 |
CValues: Measuring the values of Chinese large language models from safety to responsibility G Xu, J Liu, M Yan, H Xu, J Si, Z Zhou, P Yi, X Gao, J Sang, R Zhang, ... arXiv preprint arXiv:2307.09705, 2023 | 29 | 2023 |
CAT-MNER: Multimodal Named Entity Recognition with Knowledge-Refined Cross-Modal Attention X Wang, J Ye, Z Li, J Tian, Y Jiang, M Yan, J Zhang, Y Xiao 2022 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2022 | 29 | 2022 |
UReader: Universal OCR-free visually-situated language understanding with multimodal large language model J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ... arXiv preprint arXiv:2310.05126, 2023 | 28 | 2023 |
AdaVQA: Overcoming language priors with adapted margin cosine loss Y Guo, L Nie, Z Cheng, F Ji, J Zhang, A Del Bimbo arXiv preprint arXiv:2105.01993, 2021 | 25 | 2021 |
KACE: Generating Knowledge-Aware Contrastive Explanations for Natural Language Inference Q Chen, F Ji, X Zeng, FL Li, J Zhang, H Chen, Y Zhang | 22* | |
DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning Q Chen, FL Li, G Xu, M Yan, J Zhang, Y Zhang arXiv preprint arXiv:2208.00635, 2022 | 17 | 2022 |
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, M Yan, J Zhang, J Sang arXiv preprint arXiv:2311.07397, 2023 | 16 | 2023 |