publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2026

  1. MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
    In International Conference on Learning Representations, 2026
  2. VIEW2SPACE: Studying Multi-View Visual Reasoning from Sparse Observations
    Fucai Ke, Zhixi Cai, Boying Li, Long Chen, Beibei Lin, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, and Hamid Rezatofighi
    arXiv preprint arXiv:2603.16506, 2026
  3. JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics
    Simindokht Jahangard, Mehrzad Mohammadi, Yi Shen, Zhixi Cai, and Hamid Rezatofighi
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2026
  4. Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents
    arXiv preprint arXiv:2604.17019, 2026

2025

  1. NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  2. NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
    Zhixi Cai*‡, Cristian Rojas Cardenas*, Kevin Leo*, Chenyuan Zhang*, Kal Backman*, Hanbing Li*, Boying Li, Mahsa Ghorbanali, Stavya Datta, Lizhen Qu, and 7 more authors
    IEEE Robotics and Automation Letters, 2025
  3. AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations
    Zhixi Cai, Kartik Kuckreja, Shreya Ghosh, Akanksha Chuchra, Muhammad Haris Khan, Usman Tariq, Tom Gedeon, and Abhinav Dhall
    In Proceedings of the 33rd ACM International Conference on Multimedia, 2025
  4. Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting
    In 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025
  5. Multimodal Deepfake Generation and Detection: Challenges, Methods, and Future Directions
    Abhinav Dhall, Zhixi Cai, and Shreya Ghosh
    In Companion Proceedings of the 27th International Conference on Multimodal Interaction, 2025
  6. Explain Before You Answer: A Survey on Compositional Visual Reasoning
    Fucai Ke, Joy Hsu, Zhixi Cai, Zixian Ma, Xin Zheng, Xindi Wu, Sukai Huang, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, and 3 more authors
    arXiv preprint arXiv:2508.17298, 2025
  7. MRAC 2025: 3rd International Workshop on Multimodal, Generative and Responsible Affective Computing
    Zheng Lian, Shreya Ghosh, Erik Cambria, Zhixi Cai, Guoying Zhao, Abhinav Dhall, Björn W. Schuller, Roland Goecke, Jianhua Tao, and Tom Gedeon
    In Proceedings of the 33rd ACM International Conference on Multimedia, 2025
  8. DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
    Fucai Ke, Vijay Kumar B G, Xingjian Leng, Zhixi Cai, Zaid Khan, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi, and Manmohan Chandraker
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

2024

  1. AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
    Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, and Kalin Stefanov
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  2. 1M-Deepfakes Detection Challenge
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  3. Content-Driven Multimodal Deepfake Generation and Temporal Localization
    Zhixi Cai
    Monash University, 2024
  4. HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
    Fucai Ke*, Zhixi Cai*, Simindokht Jahangard*, Weiqing Wang, Pari Delir Haghighi, and Hamid Rezatofighi
    In European Conference on Computer Vision, 2024
  5. Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit
    Shreya Ghosh*, Zhixi Cai*, Parul Gupta, Garima Sharma, Abhinav Dhall, Munawar Hayat, and Tom Gedeon
    In 12th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 2024
  6. JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
    Simindokht Jahangard, Zhixi Cai, Shiki Wen, and Hamid Rezatofighi
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
  7. MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing
    Shreya Ghosh, Zhixi Cai, Abhinav Dhall, Dimitrios Kollias, Roland Goecke, and Tom Gedeon
    In Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

2023

  1. MARLIN: Masked Autoencoder for facial video Representation LearnINg
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
  2. Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
    Computer Vision and Image Understanding, 2023
  3. Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase
    Shreya Ghosh, Rakibul Hasan, Pradyumna Agrawal, Zhixi Cai, Susannah Soon, Abhinav Dhall, and Tom Gedeon
    arXiv preprint arXiv:2305.06110, 2023

2022

  1. Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
    In 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022