Zhixi Cai

Monash University. Melbourne. Australia.


📘 I'm currently a Research Fellow (Post-Doctoral) at VL4AI Lab, Faculty of IT, Monash University, supervised by Dr. Hamid Rezatofighi. My core research area is video understanding and reasoning using neurosymbolic and large language models.

🎓 I obtained my PhD in artificial intelligence from Monash University, supervised by Dr. Kalin Stefanov, A/Prof. Abhinav Dhall and Dr. Munawar Hayat. My thesis, "Content-Driven Multimodal Deepfake Generation and Temporal Localization", mainly focuses on deepfakes and video understanding.

🔬 I have published papers in CVPR, ECCV, ACM MM, and other venues, and received two best paper awards during my PhD. Please refer to the publication page for more details.

🔎 I have also been invited to review for CVPR, ECCV, ACM MM, ICRA, TPAMI, TMM, TAFFC, and more.

šŸ–„ļø I enjoy programming and implementing some cool ideas. I have developped several interesting open source applications and libraries in my spare time. Please refer to the projects page for more details.

šŸ› ļø Also, I love discovering and fine-tuning tools in my hand, including both software tools and physical tools.

news

Jan 28, 2025 A paper was accepted at ICRA 2025.
Oct 30, 2024 The paper received the best student paper award at ACM MM 2024.
Jul 15, 2024 A paper was accepted as an Oral at ACM MM 2024.
Jul 08, 2024 A paper was accepted as a Demo at ACII 2024.
Jul 01, 2024 A paper was accepted at ECCV 2024.

selected publications

  1. AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
    Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, and Kalin Stefanov
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  2. Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
    Zhixi Cai†, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, and Munawar Hayat
    Computer Vision and Image Understanding, 2023
  3. MARLIN: Masked Autoencoder for facial video Representation LearnINg
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
  4. Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
    Zhixi Cai, Kalin Stefanov, Abhinav Dhall, and Munawar Hayat
    In 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022