The Second Perception Test Challenge

Workshop at ECCV 2024

Overview

Following the successful 2023 iteration, we organise the second Perception Test Challenge with the goal of benchmarking multimodal perception models on the Perception Test (blog, github), a diagnostic benchmark created by Google DeepMind to comprehensively probe the abilities of multimodal models across:

  • video, audio, and text modalities
  • four skill areas: Memory, Abstraction, Physics, Semantics
  • four types of reasoning: Descriptive, Explanatory, Predictive, Counterfactual
  • six computational tasks: multiple-choice video-QA, grounded video-QA, object tracking, point tracking, action localisation, sound localisation

You can try the Perception Test yourself here.

Check the Perception Test GitHub repo for details about the data and annotation formats, baselines, and metrics.

Check the Computer Perception workshop at ECCV 2022 for recorded talks and slides introducing the Perception Test benchmark.

Check the First Perception Test challenge for details of the previous challenge.

Challenge

Details about the challenge tracks are coming soon. Prizes totalling 15k EUR are available.

Timeline

  • May 15th - June 15th, 2024: Challenge server goes live with data from the validation split
  • End of June, 2024: Held-out test split released
  • September 14th, 2024: Deadline for submissions
  • September 21st, 2024: Winners announced
  • Early October, 2024: Challenge-workshop at ECCV 2024, Milan

Workshop

Provisional agenda

  • 09:00 - 09:15 Welcome and introduction
  • 09:15 - 09:45 Overview of Perception Test
  • 09:45 - 10:15 Keynote: Abhinav Gupta
  • 10:15 - 10:45 Coffee break
  • 10:45 - 11:00 Challenges overview and winner announcement
  • 11:00 - 11:45 Oral presentations from the challenge winners
  • 11:45 - 12:15 Keynote: Josh Tenenbaum
  • 12:15 - 13:00 Roundtable and closing notes

Speakers

Organizers

Joe Heyward

Google DeepMind
Research Engineer at Google DeepMind.
Research: computer vision.

Joao Carreira

Google DeepMind
Research Scientist at Google DeepMind.
Research: video processing, general perception systems.

Dima Damen

Bristol University
Professor of Computer Vision.
Research: computer vision, video understanding, perception benchmarks.

Andrew Zisserman

University of Oxford
Professor of Computer Vision Engineering at Oxford and a Royal Society Research Professor.
Research: computer vision, machine learning.

Viorica Pătrăucean

Google DeepMind
Research Scientist at Google DeepMind.
Research: computer vision, scalable learning, biologically plausible learning.