Zhixi Cai

Monash University. Melbourne. Australia.

📘 I'm currently a Research Fellow (Post-Doctoral) at VL4AI Lab, Faculty of IT, Monash University. My core research area is video understanding and reasoning using neurosymbolic methods and large language models.

🎓 I obtained my PhD in artificial intelligence from Monash University, supervised by Dr. Kalin Stefanov, A/Prof. Abhinav Dhall, and Dr. Munawar Hayat. My thesis, "Content-Driven Multimodal Deepfake Generation and Temporal Localization", focuses mainly on deepfakes and video understanding.

🔬 I have published papers in CVPR, ECCV, ACM MM, and other venues, and received two best paper awards during my PhD. Please refer to the publication page for more details.

🔎 I have also been invited to review for IEEE TAFFC, IEEE TMM, ACM TKDD, CVIU, INFFUS, KBS, IEEE TAI, ECCV, ACM MM, ICRA, and more.

🖥️ I enjoy programming and implementing cool ideas. I have developed several interesting open source applications and libraries in my spare time. Please refer to the projects page for more details.

🛠️ I also love discovering and fine-tuning the tools I use, both software and physical.

news

Jul 15, 2024 The paper was awarded Best Student Paper at ACM MM 2024.
Jul 15, 2024 A paper was accepted to ACM MM 2024 as an Oral.
Jul 08, 2024 A paper was accepted to ACII 2024 as a Demo.
Jul 01, 2024 A paper was accepted to ECCV 2024.
Apr 25, 2024 Hosting the MRAC: Multimodal, Generative and Responsible Affective Computing Workshop at ACM MM 2024.

selected publications

  1. AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
    Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, and Kalin Stefanov
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  2. Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
    Zhixi Cai†, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, and Munawar Hayat
    Computer Vision and Image Understanding, 2023
  3. MARLIN: Masked Autoencoder for facial video Representation LearnINg
    Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, and Munawar Hayat
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
  4. Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
    Zhixi Cai, Kalin Stefanov, Abhinav Dhall, and Munawar Hayat
    In 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022