Following the successful 2023 iteration, we organise the second Perception Test Challenge with the goal of benchmarking multimodal perception models on the Perception Test (blog, github) - a diagnostic benchmark created by Google DeepMind to comprehensively probe the abilities of multimodal models across:
You can try the Perception Test yourself here.
Check the Perception Test GitHub repo for details about the data and annotation formats, baselines, and metrics.
Check the Computer Perception workshop at ECCV 2022 for recorded talks and slides introducing the Perception Test benchmark.
Check the First Perception Test challenge for details of the previous challenge.
Perception Test overview slides from the 2024 workshop are available here.
Contact: viorica at google.com, perception-test at google.com
The Second Perception Test challenge includes the six original Perception Test tasks, plus an additional task focused on hour-long video QA. (Check the links to access the eval.ai challenge pages.)
We offer cash prizes totalling EUR 20K to top competitors across tasks, with special awards for models that complete multiple or all tasks under a zero-shot evaluation regime.
We received 680 submissions from 123 teams across all seven tracks. In each track, we awarded prizes for best performance and runner-up.
Single Object Tracking
Single Point Tracking
Temporal Action Localisation
Temporal Sound Localisation
Multiple-Choice Video Question-Answering
Grounded Video Question-Answering
Hour-Long Video Question-Answering