‘Millions of images and YouTube videos, linked and tagged to teach computers what a spoon is.’
YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research
YouTube-8M is a large-scale labeled video dataset that consists of 8 million YouTube video IDs and associated labels from a diverse vocabulary of 4800 visual entities. It also comes with precomputed state-of-the-art vision features from billions of frames, which fit on a single hard disk. This makes it possible to train video models from hundreds of thousands of video hours in less than a day on 1 GPU!
Our goal is to accelerate research on large-scale video understanding, representation learning, noisy data modeling, transfer learning, and domain adaptation approaches for video. More details about the dataset and initial experiments can be found in our technical report.’