OpenAI is testing a new machine learning method called Video PreTraining (VPT), in which an AI learns to perform tasks from video footage found on the web. The test case: an AI is to craft a diamond pickaxe in a fresh Minecraft world. OpenAI states that crafting a diamond pickaxe takes skilled human players over 20 minutes and around 24,000 actions. While previous Minecraft agents relied on simplified action spaces, OpenAI's agent uses the game's native mouse and keyboard interface and plays at a rate of 20 Hz.
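To make the native interface concrete, here is a minimal sketch of what one mouse-and-keyboard action at a 20 Hz control rate might look like. The field names and structure are illustrative assumptions, not OpenAI's actual API; the point is that the agent emits raw input events, not abstract game commands.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NativeAction:
    """One agent action per 50 ms tick (20 Hz), mirroring human input:
    held keys plus relative mouse movement. Field names are hypothetical."""
    keys: frozenset = frozenset()      # keyboard keys held this tick
    mouse_dx: int = 0                  # relative mouse movement (pixels)
    mouse_dy: int = 0
    buttons: frozenset = frozenset()   # mouse buttons held this tick

TICK_SECONDS = 1 / 20  # the 20 Hz rate mentioned in the article

# Sanity check on the article's figures: 20 minutes of play at 20 Hz
# is 20 * 60 * 20 = 24,000 ticks, which lines up with the roughly
# 24,000 actions quoted for a skilled human run.
ticks_in_20_minutes = 20 * 60 * 20

action = NativeAction(keys=frozenset({"w"}), mouse_dx=5)
print(ticks_in_20_minutes, action.mouse_dx)
```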
Video pre-training: how it works
There are thousands of hours of video on the internet for just about anything, and people naturally use them to learn. Humans can draw on contextual knowledge: which key combination leads to which result on the screen or in a game. Machine learning agents lack this context, which is why an AI cannot simply scrape videos from the web and learn from them directly. OpenAI therefore first trained a model on 2,000 hours of labeled video, for which crowdworkers recorded both their screen and their keyboard and mouse inputs. That model was then used to process 70,000 hours of unlabeled internet video.
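The pipeline described above can be sketched in a few lines: a small labeled dataset teaches a model to infer which action was taken between two frames, that model pseudo-labels the large unlabeled corpus, and a policy is then trained on the result by imitation. The toy code below uses dictionaries where the real system uses large neural networks, and all names are hypothetical; it only illustrates the data flow.

```python
# Toy sketch of the VPT data pipeline. Real VPT uses neural networks;
# the dictionaries here merely stand in for learned models.

def train_inverse_dynamics_model(labeled_clips):
    """Learn to infer the action taken between two consecutive frames.
    Stands in for the model trained on the 2,000 hours of labeled video."""
    idm = {}
    for frames, actions in labeled_clips:
        for (f0, f1), a in zip(zip(frames, frames[1:]), actions):
            idm[(f0, f1)] = a  # a real model generalizes; this one memorizes
    return idm

def pseudo_label(idm, unlabeled_clips):
    """Attach inferred action labels to raw, unlabeled internet video."""
    labeled = []
    for frames in unlabeled_clips:
        actions = [idm.get((f0, f1)) for f0, f1 in zip(frames, frames[1:])]
        labeled.append((frames, actions))
    return labeled

def behavioral_cloning(pseudo_labeled):
    """Train a frame -> action policy on the pseudo-labeled corpus.
    Stands in for the VPT foundation model."""
    policy = {}
    for frames, actions in pseudo_labeled:
        for f, a in zip(frames, actions):
            if a is not None:
                policy[f] = a
    return policy

# Tiny worked example: frames and actions are just labels here.
labeled = [(["f0", "f1", "f2"], ["jump", "attack"])]
unlabeled = [["f0", "f1", "f2"]]

idm = train_inverse_dynamics_model(labeled)
policy = behavioral_cloning(pseudo_label(idm, unlabeled))
print(policy["f0"])  # -> jump
```

The key design point this illustrates: the expensive human labeling is only needed for the small dataset; the bulk of the training signal comes from cheap, unlabeled web video.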
According to OpenAI's blog post, Minecraft was chosen because a lot of video material is available and the game's open world allows a large number of tasks. The VPT foundation model copies the behavior of human players in game situations it knows from the videos. According to the developers, agents trained this way can handle tasks that were previously difficult to achieve with classic, reward-based deep learning approaches. In addition to targeted resource collection and crafting, the AI also builds the good old "nerd pole" out of dirt blocks in certain situations.
Models and training data available on GitHub
With the new VPT method, OpenAI promises further advances toward generalized agents. Only recently, DeepMind introduced its Gato agent in this area, which draws on the architecture of large language models. Whether advanced agents and language AIs resemble humans or merely imitate them has recently been hotly debated in connection with the chatbot LaMDA.
To advance the VPT approach, OpenAI has released the crowdworkers' data, the Minecraft environment, the model code, and the model weights for free on GitHub. The company is also a partner of this year's MineRL NeurIPS competition, in which participants can try to solve certain tasks in Minecraft by fine-tuning the VPT model. There is a $20,000 prize pool, plus an additional $100,000 should a participant achieve a special, unforeseen breakthrough.