Zhixi Cai

Monash University, Melbourne, Australia

📘 I'm currently a Research Fellow (Post-Doctoral) at VL4AI Lab, Faculty of IT, Monash University. My core research area is video understanding and reasoning using neurosymbolic methods and large language models.

🎓 I obtained my PhD in artificial intelligence from Monash University, supervised by Dr. Kalin Stefanov, A/Prof. Abhinav Dhall and Dr. Munawar Hayat. My thesis, "Content-Driven Multimodal Deepfake Generation and Temporal Localization", focuses mainly on deepfakes and video understanding.

🔬 I have published papers in CVPR, ECCV, CVIU, and DICTA. Please refer to the publication page for more details.

🔎 I have also been invited to review for IEEE TAFFC, IEEE TMM, ACM TKDD, CVIU, INFFUS, KBS, IEEE TAI, ECCV, ACM MM, ACM ICMI, MBE, IET-CVI, DICTA, CVIP, and ICVGIP.

🖥️ I enjoy programming and implementing cool ideas. I have developed several interesting open-source applications and libraries in my spare time. Please refer to the projects page for more details.

🛠️ I also love discovering and fine-tuning the tools I use, both software and physical.

news

Jul 15, 2024 A paper was accepted at ACM MM 2024 (Oral).
Jul 08, 2024 A paper was accepted at ACII 2024 (Demo).
Jul 01, 2024 A paper was accepted at ECCV 2024.
Apr 25, 2024 Hosting the MRAC: Multimodal, Generative and Responsible Affective Computing Workshop at ACM MM 2024.
Mar 09, 2024 Hosting the 1M-Deepfakes Detection Challenge at ACM MM 2024.

selected publications

  1. AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
    Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, and Kalin Stefanov
    arXiv preprint arXiv:2311.15308, 2023
  2. Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
    Zhixi Cai†, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, and Munawar Hayat
    Computer Vision and Image Understanding, 2023
  3. MARLIN: Masked Autoencoder for facial video Representation LearnINg
    Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, and Munawar Hayat
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
  4. Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
    Zhixi Cai, Kalin Stefanov, Abhinav Dhall, and Munawar Hayat
    In 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022