publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2024

  1. Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting
    Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, and Hamid Rezatofighi
    arXiv preprint arXiv:2409.12518, 2024
  2. NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
    Zhixi Cai*, Cristian Rojas Cardenas*, Kevin Leo*, Chenyuan Zhang*, Kal Backman*, Hanbing Li*, Boying Li, Mahsa Ghorbanali, Stavya Datta, Lizhen Qu, and 7 more authors
    arXiv preprint arXiv:2409.10196, 2024
  3. MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing
    Shreya Ghosh, Zhixi Cai, Abhinav Dhall, Dimitrios Kollias, Roland Goecke, and Tom Gedeon
    In Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
  4. 1M-Deepfakes Detection Challenge
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  5. Content-Driven Multimodal Deepfake Generation and Temporal Localization
    Zhixi Cai
    Monash University, 2024
  6. JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
    Simindokht Jahangard, Zhixi Cai, Shiki Wen, and Hamid Rezatofighi
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
  7. HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
    Fucai Ke*, Zhixi Cai*, Simindokht Jahangard*, Weiqing Wang, Pari Delir Haghighi, and Hamid Rezatofighi
    In European Conference on Computer Vision, 2024
  8. AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
    Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, and Kalin Stefanov
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024

2023

  1. Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit
    Shreya Ghosh*, Zhixi Cai*, Parul Gupta, Garima Sharma, Abhinav Dhall, Munawar Hayat, and Tom Gedeon
    arXiv preprint arXiv:2305.05255, 2023
  2. Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase
    Shreya Ghosh, Rakibul Hasan, Pradyumna Agrawal, Zhixi Cai, Susannah Soon, Abhinav Dhall, and Tom Gedeon
    arXiv preprint arXiv:2305.06110, 2023
  3. Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
    Computer Vision and Image Understanding, 2023
  4. MARLIN: Masked Autoencoder for facial video Representation LearnINg
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

2022

  1. Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
    In 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022