NYU Researchers Build AI that “Sees” Through a Child’s Eyes

Researchers at New York University (NYU) have taken a significant step in artificial intelligence (AI) development by creating a system that learns from video footage captured from a child’s perspective.

This groundbreaking approach, detailed in the journal Science, sheds light on how both humans and AI can learn effectively from limited data.

Inspired by Children’s Learning

The study drew inspiration from how children learn: by absorbing vast amounts of sensory information from their surroundings and gradually making sense of the world.

To replicate this process, the team created a unique dataset: 60 hours of first-person video recorded by head-mounted cameras worn by children aged six months to two years. This dataset gave the AI a child’s-eye view of the world.

Understanding Actions and Changes Without Labels

The researchers then trained a self-supervised learning (SSL) AI model using this dataset. Unlike traditional methods that rely heavily on labeled data, SSL approaches enable AI models to learn patterns and structures in the data without explicit labels.

This allowed the AI to grasp actions and changes by analyzing temporal information in the videos, much as a child learns by observing movement and interactions.
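
To make the idea concrete, here is a minimal sketch of one common self-supervised objective for unlabeled video: pulling embeddings of temporally adjacent frames together while pushing apart frames from other clips. This is an illustrative toy, not the NYU team’s actual model; the encoder architecture, loss, and frame-sampling scheme are all assumptions.

```python
# Toy temporal self-supervised learning on unlabeled video frames (PyTorch).
# Assumption: frames close in time should map to nearby embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameEncoder(nn.Module):
    """Tiny convolutional encoder mapping an RGB frame to a unit-norm embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(1)
        return F.normalize(self.proj(h), dim=-1)

def temporal_contrastive_loss(z_t: torch.Tensor, z_next: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: adjacent frames are positives (the diagonal),
    frames from other clips in the batch are negatives. No labels required."""
    logits = z_t @ z_next.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_t.size(0))       # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

# Usage with stand-in data: pairs of nearby frames sampled from headcam video.
encoder = FrameEncoder()
frames_t = torch.randn(8, 3, 64, 64)      # frames at time t (random stand-ins)
frames_next = torch.randn(8, 3, 64, 64)   # frames at time t + 1
loss = temporal_contrastive_loss(encoder(frames_t), encoder(frames_next))
loss.backward()
```

The key point is that the supervisory signal comes from the structure of the video itself, which is why no human-provided labels are needed.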

Learning Efficiency and Impressive Performance

The results were impressive. Despite the video data covering only 1% of the child’s waking hours, the AI system could learn numerous words and concepts, showcasing the efficiency of learning from limited but targeted data.

Here are some highlights:

  • Action Recognition: The AI models trained on this dataset excelled at recognizing actions in videos, even with minimal labeled examples. They performed competitively on large benchmarks like Kinetics-700, suggesting the child-centric footage provided a rich learning environment (see the sketch after this list).
  • Video Interpolation: The models even learned to predict missing segments within video sequences, mirroring human perception and prediction of actions.
  • Robust Object Recognition: The study revealed that video-trained models developed more robust object representations than those trained on static images, highlighting the value of temporal information in learning versatile models.
  • Data Scaling and Performance: As expected, the models’ performance improved with more video data, implying that access to extensive, realistic data holds the key to further advancements.
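
To illustrate the "minimal labeled examples" point above, a standard way to evaluate a pretrained video model with few labels is a linear probe: freeze the encoder and fit only a single linear classifier on its features. The sketch below assumes a hypothetical frozen encoder and stand-in features; it does not reproduce the paper’s actual evaluation protocol.

```python
# Few-shot action recognition via a linear probe on frozen video features (PyTorch).
# Assumptions: 512-dim embeddings from a pretrained, frozen encoder; 5 action classes.
import torch
import torch.nn as nn

def linear_probe(features: torch.Tensor, labels: torch.Tensor,
                 num_classes: int, epochs: int = 100) -> nn.Linear:
    """Fit a single linear layer on frozen features from a few labeled clips."""
    probe = nn.Linear(features.size(1), num_classes)
    optim = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optim.zero_grad()
        loss = loss_fn(probe(features), labels)
        loss.backward()
        optim.step()
    return probe

# Usage with stand-in data: 10 labeled clips, each already encoded by the frozen model.
features = torch.randn(10, 512)          # would come from the pretrained encoder
labels = torch.randint(0, 5, (10,))      # hypothetical action labels
probe = linear_probe(features, labels, num_classes=5)
predictions = probe(features).argmax(dim=1)
```

Because only the small linear layer is trained, strong probe accuracy with just a handful of labeled clips indicates that the self-supervised pretraining already captured useful action structure.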

Andrew Dennis

Andrew Dennis is a tech enthusiast passionate about AI, technology, and businesses using the AI ecosystem to scale. He simplifies complex concepts to engage readers and loves exploring the latest in AI innovations in his free time.
