Human vs. Machine Minds: Ego-Centric Action Recognition Compared


* These authors contributed equally
[1] University of Surrey [2] University of Newcastle
IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025 Workshop on Multimodal Algorithmic Reasoning (MAR'25)

Explore the differences between human and AI action recognition in ego-centric videos.

Research pipeline for comparing human and AI performance

Our research pipeline outlines our approach to comparing human and AI performance in ego-centric video action recognition. We first employed a classifier to pre-select Easy and Hard video sets. To enable a comparison between how humans and AI models recognise activities in video, we artificially and systematically reduced each video's spatial resolution. Then, using human participants and an AI model as classifiers, we evaluated and compared performance on these spatially reduced videos to quantify the difference in recognition between humans and the AI model.
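As a concrete illustration of the reduction step, the sketch below crops each frame to a progressively smaller centred window and records a classifier's confidence in the true action at each stage. This is a minimal sketch rather than the exact procedure from the paper: the crop factor, the centred window, and the classifier callable are illustrative assumptions.

import numpy as np

def reduce_spatially(frames: np.ndarray, stage: int, factor: float = 0.8) -> np.ndarray:
    """Crop every frame to a centred window shrunk by `factor` per stage.

    frames: (T, H, W, C) video clip; stage 0 returns the original clip,
    and each further stage removes more surrounding spatial context.
    """
    t, h, w, c = frames.shape
    scale = factor ** stage
    ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
    top, left = (h - ch) // 2, (w - cw) // 2
    return frames[:, top:top + ch, left:left + cw, :]

def confidence_per_stage(frames, true_label, classifier, n_stages=8):
    """Record the classifier's confidence in the ground-truth class at each
    reduction stage; `classifier` is assumed to map a clip to a dict of
    class probabilities."""
    return [classifier(reduce_spatially(frames, s)).get(true_label, 0.0)
            for s in range(n_stages)]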

Abstract: Human vs. Machine Action Recognition

Humans reliably surpass the performance of the most advanced AI models in action recognition, especially in real-world scenarios with low resolution, occlusions, and visual clutter. These models broadly resemble humans in using architectures that allow hierarchical feature extraction, yet they prioritise different features, leading to notable differences in their recognition. This study investigates these differences by introducing Epic ReduAct, a dataset derived from Epic-Kitchens-100. It consists of Easy and Hard ego-centric videos across various action classes. Critically, our dataset incorporates the concepts of Minimal Recognisable Configuration (MIRC) and sub-MIRC, derived by progressively reducing the spatial content of the action videos across multiple stages. This enables a controlled evaluation of recognition difficulty for humans and AI models. While humans, unlike AI models, demonstrate proficiency in recognising Hard videos, they experience a sharp decline in recognition ability as visual information is reduced, ultimately reaching a threshold beyond which recognition is no longer possible. In contrast, the AI models examined in this study appear more resilient in this specific context, with recognition confidence decreasing gradually or, in some cases, even increasing at later reduction stages. These findings suggest that the limitations observed in human recognition do not directly translate to AI models, highlighting the distinct nature of their processing mechanisms.
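Along a single reduction chain, the MIRC can be read off as the last stage that is still recognised and the sub-MIRC as the first following stage that is not. The sketch below shows this bookkeeping under an assumed recognition threshold of 0.5; the threshold value is an illustrative assumption, not one taken from the paper.

def find_mirc_pair(stage_scores, threshold=0.5):
    """stage_scores[i] is the recognition score at reduction stage i
    (stage 0 = full video, later stages = less spatial content).

    Returns (mirc_stage, sub_mirc_stage): the last stage still recognised
    above the threshold and the first following stage that is not.
    """
    for stage, score in enumerate(stage_scores):
        if score < threshold:
            if stage == 0:
                return None, None  # never recognised, so no MIRC exists
            return stage - 1, stage
    return None, None  # recognised at every stage, no sub-MIRC reached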

Want to Learn More?

Download the full paper or explore the Epic ReduAct dataset to dive deeper into our research.


Epic ReduAct Dataset


The Epic ReduAct dataset is derived from Epic-Kitchens-100, a large-scale ego-centric video dataset for action recognition, and is specifically designed to investigate the differences between human and AI performance in ego-centric action recognition. It consists of 36 videos, 18 classified as Easy and 18 as Hard, representing different levels of activity recognition difficulty. The spatial information of each video is systematically reduced across eight hierarchical levels, allowing a controlled evaluation of recognition difficulty for both humans and AI models. The dataset incorporates the concepts of Minimal Recognisable Configuration (MIRC) and sub-MIRC, derived by progressively reducing the spatial content of the action videos. This enables a detailed analysis of how humans and AI models recognise actions under varying levels of visual information.
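To give a feel for how the 36 clips, the Easy/Hard split, and the eight reduction levels might be traversed programmatically, here is a small sketch over a hypothetical root/<Easy|Hard>/<clip_id>/level_<k>.mp4 layout; the directory names and file pattern are assumptions for illustration, not the released format.

from pathlib import Path

def iter_reduact(root: str):
    """Yield (difficulty, clip_id, level, path) for every reduced clip,
    assuming a root/<Easy|Hard>/<clip_id>/level_<k>.mp4 layout."""
    for difficulty in ("Easy", "Hard"):
        for clip_dir in sorted(Path(root, difficulty).iterdir()):
            for level in range(8):  # eight hierarchical reduction levels
                path = clip_dir / f"level_{level}.mp4"
                if path.exists():
                    yield difficulty, clip_dir.name, level, path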

Frequently Asked Questions

What is the Epic ReduAct dataset?

The Epic ReduAct dataset is derived from the Epic-Kitchens-100 dataset and is designed to compare human and AI performance in ego-centric action recognition.

How does this research benefit AI development?

Our research highlights the differences in recognition mechanisms between humans and AI, providing insights for improving AI models in challenging real-world scenarios.

Where can I access the dataset and code?

You can access the dataset and code on our GitHub repository.

Key Findings from the Epic ReduAct Dataset

This figure presents the recognition-gap frequency distributions for the Easy, Hard, and combined sets (a, b, c), allowing a comparison between humans and the AI model. Our results show a distribution pattern similar to previous work on images (d): the AI model sometimes improves as content is reduced, whereas human accuracy consistently declines, and humans show a sharper decrease in recognition performance than the AI model (d). Our results further show that humans are susceptible to substantial losses in recognition confidence, whereas spatial reductions can enhance the AI model's ability to detect actions, as evidenced by negative recognition gaps. The frequency distributions are also broader for humans than for the AI model, whose recognition gaps are smaller and more gradual. These findings indicate that, despite advancements in AI models, the gap between human and machine recognition capabilities persists.
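In the MIRC literature, the recognition gap for a video is typically the recognition score at its MIRC minus the score at its sub-MIRC, so positive gaps mean recognition collapses after the extra reduction and negative gaps mean it improves. The sketch below computes per-video gaps and a simple frequency distribution of the kind plotted above; the bin edges are an illustrative assumption.

import numpy as np

def recognition_gap(mirc_score: float, sub_mirc_score: float) -> float:
    """Positive gaps: recognition drops after the extra reduction.
    Negative gaps: the extra reduction helped, as observed for the AI
    model on some clips."""
    return mirc_score - sub_mirc_score

def gap_histogram(gaps, bins=np.linspace(-1.0, 1.0, 21)):
    """Frequency distribution of recognition gaps, one value per video."""
    counts, edges = np.histogram(gaps, bins=bins)
    return counts, edges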

BibTeX

@inproceedings{Rahmani:HumanvsMachine:CVPRWS:2025,
        AUTHOR = "Rahmani, Sadegh and Rybansky, Filip and Vuong, Quoc and Guerin, Frank and Gilbert, Andrew",
        TITLE = "Human vs. Machine Minds: Ego-Centric Action Recognition Compared",
        BOOKTITLE = "IEEE/CVF Conference on Computer Vision and Pattern Recognition - Workshop on Multimodal Algorithmic Reasoning (MAR'25)",
        YEAR = "2025",
        }