The Second Perception Test Challenge

Workshop at ECCV 2024, September 29, AM, Room: Suite 7


Following the successful 2023 iteration, we organise the second Perception Test Challenge with the goal of benchmarking multimodal perception models on the Perception Test (blog, github) - a diagnostic benchmark created by Google DeepMind to comprehensively probe the abilities of multimodal models across:

  • three modalities: video, audio, and text
  • four skill areas: Memory, Abstraction, Physics, Semantics
  • four types of reasoning: Descriptive, Explanatory, Predictive, Counterfactual
  • six computational tasks: multiple-choice video-QA, grounded video-QA, object tracking, point tracking, action localisation, sound localisation

You can try the Perception Test yourself here.

Check the Perception Test github repo for details about the data and annotations format, baselines, and metrics.

Check the Computer Perception workshop at ECCV 2022 for recorded talks and slides introducing the Perception Test benchmark.

Check the First Perception Test challenge for details of the previous challenge.

Contact: viorica at, perception-test at


The Second Perception Test challenge includes the six original Perception Test tasks, plus an additional task focused on hour-long video-QA. (Check the links to access the challenge pages.)

We will offer cash prizes totalling EUR 20K to top competitors across tasks, with special awards for models that complete multiple or all tasks under a zero-shot evaluation regime.


  • June 10th, 2024: Challenge server goes live with data from the validation split
  • July 1st, 2024: Held-out test split released
  • September 14th, 2024: Deadline for submissions
  • September 22nd, 2024: Winners announced
  • September 29th, 2024: Challenge workshop at ECCV 2024, Milan


Agenda (Room: Suite 7)

  • 09:30 - 09:45 Welcome and introduction
  • 09:45 - 10:15 Overview of Perception Test
  • 10:15 - 10:45 Keynote: Abhinav Gupta
  • 10:45 - 11:15 Coffee break
  • 11:15 - 11:30 Challenges overview and winner announcement
  • 11:30 - 12:00 Oral presentations from the challenge winners
  • 12:00 - 12:30 Keynote: Josh Tenenbaum
  • 12:30 - 13:00 Roundtable and closing notes



Joe Heyward

Google DeepMind
Research Engineer at Google DeepMind.
Research: computer vision.

Joao Carreira

Google DeepMind
Research Scientist at Google DeepMind.
Research: video processing, general perception systems.

Dima Damen

Bristol University
Professor of Computer Vision.
Research: computer vision, video understanding, perception benchmarks.

Andrew Zisserman

University of Oxford
Professor of Computer Vision Engineering at Oxford and a Royal Society Research Professor.
Research: computer vision, machine learning.

Viorica Pătrăucean

Google DeepMind
Research Scientist at Google DeepMind.
Research: computer vision, scalable learning, biologically plausible learning.