You might not know Viggle AI, but you’ve likely seen the viral memes it created. The Canadian AI startup is responsible for dozens of videos remixing the rapper Lil Yachty bouncing on stage at a summer music festival. In one video, Lil Yachty is replaced by Joaquin Phoenix’s Joker. In another, Jesus appears to be hyping up the crowd. Users made countless versions of this video, but one AI startup was fueling the memes. And Viggle’s CEO says YouTube videos fuel its AI models.
Viggle trained a 3D-video foundation model, JST-1, to have a “genuine understanding of physics,” as the company claims in its press release. Viggle CEO Hang Chu says the key difference between Viggle and other AI video models is that Viggle allows users to specify the motion they want characters to take on. Other AI video models will often create unrealistic character motions that don’t abide by the laws of physics, but Chu claims Viggle’s models are different.
“We are essentially building a new type of graphics engine, but purely with neural networks,” said Chu in an interview. “The model itself is quite different from existing video generators, which are mainly pixel based, and don’t really understand structure and properties of physics. Our model is designed to have such understanding, and that’s why it’s been significantly better in terms of controllability and efficiency of generation.”
To create the video of the Joker as Lil Yachty, for instance, a user simply uploads the original video (Lil Yachty dancing on stage) and an image of the character (the Joker) to take on that motion. Alternatively, users can upload images of characters alongside text prompts with instructions on how to animate them. As a third option, Viggle allows users to create animated characters from scratch with text prompts alone.
But meme-making accounts for only a small share of Viggle’s users; Chu says the model has seen wide adoption as a visualization tool for creatives. The videos are far from perfect – they’re shaky and the faces are expressionless – but Chu says the tool has proven effective for filmmakers, animators, and video game designers turning their ideas into something visual. Right now, Viggle’s models only create characters, but Chu hopes to enable more complex videos later on.
Viggle currently offers a free, limited version of its AI model on Discord and its web app. The company also offers a $9.99 subscription for increased capacity, and gives some creators special access through a creator program. The CEO says Viggle is in talks with film and video game studios about licensing the technology, but he is also seeing adoption among independent animators and content creators.
On Monday, Viggle announced it had raised a $19 million series A led by Andreessen Horowitz, with participation from Two Small Fish. The startup says this round will help Viggle scale, accelerate product development, and expand its team. Viggle tells TechCrunch that it partners with Google Cloud, among other cloud providers, to train and run its AI models. Those Google Cloud partnerships often include access to GPU and TPU clusters, but typically not YouTube videos to train AI models on.
Training data
During TechCrunch’s interview with Chu, we asked what data Viggle’s AI video models were trained on.
“So far we’ve been relying on data that has been publicly available,” said Chu, relaying a similar line to what OpenAI’s CTO Mira Murati answered about Sora’s training data.
Asked if Viggle’s training data set included YouTube videos, Chu responded plainly: “Yeah.”
That might be a problem. In April, YouTube CEO Neal Mohan told Bloomberg that using YouTube videos to train an AI text-to-video generator would be a “clear violation” of the platform’s terms of service. The comments were in the context of OpenAI potentially having used YouTube videos to train Sora.
Mohan clarified that Google, which owns YouTube, may have contracts with certain creators to use their videos in training datasets for Google DeepMind’s Gemini. However, harvesting video from the platform is not allowed, according to Mohan and YouTube’s terms of service, without obtaining prior permission from the company.
After TechCrunch’s interview with Viggle’s CEO, a spokesperson for Viggle emailed to backtrack on Chu’s statement, telling TechCrunch the CEO “spoke too soon in regards to if Viggle uses YouTube data as training. In truth, Hang/Viggle is unable to share details of their training data.”
We pointed out that Chu had already done so on the record, however, and asked for a clear statement on the matter. Viggle’s spokesperson confirmed in their reply that the AI startup trains on YouTube videos:
Viggle leverages a variety of public sources, including YouTube, to generate AI content. Our training data has been carefully curated and refined, ensuring compliance with all terms of service throughout the process. We prioritize maintaining strong relationships with platforms like YouTube, and we are committed to respecting their terms by avoiding massive amounts of downloads and any other actions that would involve unauthorized video downloads.
This approach to compliance seems to conflict with Mohan’s comments in April that YouTube’s video corpus is not a public source. We reached out to spokespeople for YouTube and Google, but have yet to hear back.
The startup joins others in a grey area by using YouTube as training data. Many AI model developers – reportedly including OpenAI, Nvidia, Apple, and Anthropic – use YouTube video transcriptions or clips for training. It’s the dirty secret in Silicon Valley that’s not so secret: everybody is likely doing it. What’s actually rare is saying it out loud.
Source: TechCrunch