With the rise of large multimodal models (e.g. Flamingo, BEiT-3, GPT-4), integrated perception systems that can achieve human-level scene understanding may be on the horizon. Making progress towards this ambitious goal requires robust and comprehensive evaluation benchmarks and strategies to reveal the strengths and weaknesses (including biases) of these models and to guide research. Many benchmarks in the multimodal space have driven impressive progress in the field, but each targets a restricted aspect of perception: image benchmarks exclude temporal aspects; visual question-answering tends to focus on image-level semantic understanding; object tracking tasks generally capture the lower-level appearance of individual objects, such as colour or texture. Some important aspects, such as memory skills or physics understanding, remain poorly covered.
The proposed challenge-workshop aims to benchmark multimodal perception models by organising a competition around the Perception Test benchmark (blog, github). The Perception Test is a diagnostic benchmark created by DeepMind to address the limitations of existing benchmarks mentioned above by comprehensively probing the abilities of multimodal models across video, audio, and text modalities, in four skill areas (Memory, Abstraction, Physics, Semantics), four types of reasoning (descriptive, explanatory, predictive, counterfactual), and six computational tasks (multiple-choice video-QA, grounded video-QA, object tracking, point tracking, action localisation, sound localisation). The training and public test sets were released in October 2022, and the held-out test set will be released together with the evaluation server for this competition.
You can try the Perception Test yourself here.
Check the Perception Test GitHub repo for details about the data and annotation formats, baselines, and metrics.
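As a concrete illustration of how a track like multiple-choice video-QA can be scored, the minimal Python sketch below loads ground-truth answers from a JSON annotation file and computes top-1 accuracy against a dictionary of predicted option indices. The JSON layout, the field names (`mc_question`, `id`, `answer_id`) and the file path are assumptions made for illustration only; the actual annotation schema and evaluation code are documented in the GitHub repo.

```python
import json


def load_questions(annotation_path):
    """Load multiple-choice video-QA ground truth.

    Assumes (hypothetically) a JSON file mapping video ids to annotations,
    each containing a list of 'mc_question' entries with an 'id' and the
    index of the correct option in 'answer_id'.
    """
    with open(annotation_path) as f:
        annotations = json.load(f)
    questions = {}
    for video_id, video_ann in annotations.items():
        for q in video_ann.get("mc_question", []):
            # Key each question by (video id, question id) so predictions can be matched.
            questions[(video_id, q["id"])] = q["answer_id"]
    return questions


def top1_accuracy(ground_truth, predictions):
    """Fraction of questions where the predicted option matches the answer."""
    correct = sum(
        predictions.get(key) == answer for key, answer in ground_truth.items()
    )
    return correct / max(len(ground_truth), 1)


if __name__ == "__main__":
    gt = load_questions("valid_annotations.json")  # hypothetical file name
    preds = {key: 0 for key in gt}                 # trivial baseline: always pick option 0
    print(f"top-1 accuracy: {top1_accuracy(gt, preds):.3f}")
```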
Check the Computer Perception workshop at ECCV2022 for recorded talks and slides introducing the Perception Test benchmark.
We will host the first Perception Test challenge with the following tracks (check the links below to access the eval.ai challenges):
Prizes totalling 15k EUR are available across all challenges.
We received 475 submissions from 63 teams across all six tracks. We awarded best-performance and runner-up prizes per track, plus two awards for the most novel submissions across tracks.
Single Object Tracking
Single Point Tracking
Temporal Action Localisation
Temporal Sound Localisation
Multiple-Choice Video Question-Answering
Grounded Video Question-Answering
Most novel submissions