The Second Perception Test Challenge

Workshop at ECCV 2024, September 29, AM, Room: Suite 7

Overview

Following the successful 2023 iteration, we organise the second Perception Test Challenge with the goal of benchmarking multimodal perception models on the Perception Test (blog, github) - a diagnostic benchmark created by Google DeepMind to comprehensively probe the abilities of multimodal models across:

  • three modalities: video, audio, and text
  • four skill areas: Memory, Abstraction, Physics, Semantics
  • four types of reasoning: Descriptive, Explanatory, Predictive, Counterfactual
  • six computational tasks: multiple-choice video-QA, grounded video-QA, object tracking, point tracking, action localisation, sound localisation

You can try the Perception Test yourself here.

Check the Perception Test github repo for details about the data and annotation formats, baselines, and metrics.
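
For a concrete feel of the annotation format, a minimal sketch in Python is shown below, assuming JSON annotation files keyed by video id; the file name and the field names mc_question, options, and answer_id are illustrative assumptions, so defer to the github repo for the authoritative schema.

    import json

    # Sketch: iterate over multiple-choice video-QA annotations.
    # The file name and the field names ("mc_question", "options",
    # "answer_id") are assumptions; the github repo documents the
    # actual schema.
    with open("mc_question_valid_annotations.json") as f:
        annotations = json.load(f)  # assumed: a dict keyed by video id

    for video_id, video_ann in annotations.items():
        for qa in video_ann.get("mc_question", []):
            question = qa["question"]
            options = qa["options"]      # candidate answer strings
            answer_id = qa["answer_id"]  # assumed index of the correct option
            print(video_id, question, options[answer_id])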

Check the Computer Perception workshop at ECCV 2022 for recorded talks and slides introducing the Perception Test benchmark.

Check the First Perception Test Challenge for details of the previous edition.

The Perception Test overview slides from the 2024 workshop are available here.

Contact: viorica at google.com, perception-test at google.com

Challenge

The Second Perception Test Challenge includes the six original Perception Test tasks, plus an additional task focused on hour-long video QA. (Use the links to access the eval.ai challenge pages.)
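
For tracks hosted on eval.ai, a submission is typically a single predictions file. The sketch below assembles one for the multiple-choice video-QA track; the layout (video id mapped to per-question predicted options) and every field name are assumptions for illustration, and each eval.ai track page specifies the actual required format.

    import json

    def predict_answer(question: str, options: list[str]) -> int:
        """Placeholder model: always predicts the first option."""
        return 0

    # Hypothetical submission layout: {video_id: [{"id": ..., "answer_id": ...}]}.
    # The real format is documented on the corresponding eval.ai track page.
    with open("mc_question_valid_annotations.json") as f:
        annotations = json.load(f)

    predictions = {
        video_id: [
            {"id": qa["id"],
             "answer_id": predict_answer(qa["question"], qa["options"])}
            for qa in video_ann.get("mc_question", [])
        ]
        for video_id, video_ann in annotations.items()
    }

    with open("predictions.json", "w") as f:
        json.dump(predictions, f)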

We offer cash prizes totalling EUR 20K to top competitors across tasks, with special awards for models that complete multiple or all tasks under a zero-shot evaluation regime.

Timeline

  • June 10th, 2024: Challenge server goes live with data from the validation split
  • July 1st, 2024: Held-out test split released
  • September 14th, 2024: Deadline for submissions
  • September 22nd, 2024: Winners announced
  • September 29th, 2024: Challenge workshop at ECCV 2024, Milan

The Second Perception Test Challenge winners

We received 680 submissions from 123 teams across the seven tracks, and awarded best-performance and runner-up prizes in each track.

Single Object Tracking

  • Best performance: Team NJUST-THU (Zhiqiang Zhong, Yang Yang, Fengqiang Wan, Henglu Wei, Xiangyang Ji) [report]
  • Runner-up: Team FAUgeddaboudit (Amin Heydarshahi, Shubhaankar Gupta, Bernhard Egger) [report]

Single Point Tracking

  • Best performance: Team SV (Hengzhi Zhang, Ricoh Software Research Center Beijing Co., Ltd.) [report]
  • Runner-up: Team NJUST_kmg (Yuxuan Zhang, Pengsong Niu, Kun Yu, Qingguo Chen, Yang Yang) [report]

Temporal Action Localisation

  • Best performance: Team NJUST--_KMG (Yinan Han, Qingyuan Jiang, Hongming Mei, Yang Yang, Jinhui Tang) [report]
  • Runner-up: Team AITC (Songlian Li, Zitao Gao, Huili Huang, Xinlong Sun) [report]

Temporal Sound Localisation

  • Best performance: Team NJUST_KMG0 (Haowei Gu, Weihao Zhu, Yang Yang) [report]
  • Runner-up: Team JNU-Boat (Linze Li, Rongchang Li, Cong Wu, Tianyang Xu, Xiao-Jun Wu, Josef Kittler) [report]

Multiple-Choice Video Question-Answering

  • Best performance: Team SEU-2023 (Yingzhe Peng, Yixiao Yuan, Zitian Ao, Huapeng Zhou, Kangqi Wang, Qipeng Zhu, Xu Yang) [report]
  • Runner-up: Team TTgogogo (Dongshuai Li, Xingxian Liu, Fuyu Lv) [report]

Grounded Video Question-Answering

  • Best performance: Team Research newbie (Yi-Jing Wu, Jo-Ting Chen, Hsing-Chen Lee, Jun-Cheng Chen) [report]
  • Runner-up: Team UCF_CRCV (Joseph Fioresi, Tina Tran, Mubarak Shah) [report]

Hour-Long Video Question-Answering

  • Best performance: Team blackmonkey (Bozheng Li, Yangguang Ji, Yongliang Wu, Jiawang Cao, Wenbo Zhu, Jay Wu, Xu Yang) [report]
  • Runner-up: Team JJ_James (Yi Lu, Licheng Tang, Yuyang Sun, Wenyu Zhang, Weiheng Chi, Yalun Dai, Jing Wang) [report]

Workshop

Agenda (Room: Suite 7)

  • 09:30 - 09:45 Welcome and introduction
  • 09:45 - 10:15 Overview of Perception Test
  • 10:15 - 10:45 Keynote: Abhinav Gupta
  • 10:45 - 11:15 Coffee break
  • 11:15 - 11:30 Challenges overview and winner announcement
  • 11:30 - 12:00 Oral presentations from the challenge winners
  • 12:00 - 12:30 Keynote: Josh Tenenbaum
  • 12:30 - 13:00 Roundtable and closing notes

Speakers

Keynote talks by Abhinav Gupta and Josh Tenenbaum (see the agenda above).

Organizers

Joe Heyward

Google DeepMind
Research Engineer at Google DeepMind.
Research: computer vision.

Joao Carreira

Google DeepMind
Research Scientist at Google DeepMind.
Research: video processing, general perception systems.

Dima Damen

University of Bristol
Professor of Computer Vision.
Research: computer vision, video understanding, perception benchmarks.

Andrew Zisserman

University of Oxford
Professor of Computer Vision Engineering at Oxford and a Royal Society Research Professor.
Research: computer vision, machine learning.

Viorica Pătrăucean

Google DeepMind
Research Scientist at Google DeepMind.
Research: computer vision, scalable learning, biologically plausible learning.