Creating engaging videos isn’t only about the visuals; much of the appeal of good video content comes down to the audio. But finding (or creating) the right audio effects can be a time-consuming process. At its annual Max conference, Adobe is showing off Project Super Sonic, an experimental prototype that shows how you could one day use text-to-audio, object recognition, and even your own voice to quickly generate background audio and sound effects for your video projects.
Being able to generate audio effects from a text prompt is fun, but given that ElevenLabs and others already offer this commercially, that part on its own isn’t especially groundbreaking.
What’s more interesting is that Adobe takes this a step further with two additional modes for creating these soundtracks. The first uses Adobe’s object recognition models: you click on any part of a video frame, and the tool writes a prompt for that object and then generates the matching sound. That’s a smart way to combine multiple models into a single workflow.
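Adobe hasn’t said which models power this, but the basic click-to-prompt idea is easy to sketch. The snippet below is a rough illustration, not Adobe’s implementation: it uses an off-the-shelf torchvision detector to label whatever sits under a click and turns that label into a text-to-audio prompt. The detector choice, the 0.5 score threshold, and the prompt template are all assumptions.

```python
# Hypothetical sketch of a click-to-prompt workflow, using a stock
# torchvision detector in place of Adobe's (undisclosed) models.
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
labels = weights.meta["categories"]  # COCO class names

def prompt_for_click(frame: torch.Tensor, x: float, y: float):
    """Return a sound prompt for the object under the click point (x, y).

    `frame` is a float tensor of shape (3, H, W) with values in [0, 1].
    """
    with torch.no_grad():
        detections = model([frame])[0]
    # Detections come back sorted by score; pick the first confident
    # box that actually contains the click.
    for box, label, score in zip(
        detections["boxes"], detections["labels"], detections["scores"]
    ):
        x1, y1, x2, y2 = box.tolist()
        if score > 0.5 and x1 <= x <= x2 and y1 <= y <= y2:
            return f"the sound of a {labels[int(label)]}"
    return None
```

The resulting prompt (say, “the sound of a horse”) would then be handed off to the text-to-audio model, which is presumably how the two models get chained into one workflow.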
The actual wow moment, however, comes with the third mode, which lets you record yourself imitating the sounds you’re looking for (timed to the video) and then has Project Super Sonic generate the appropriate audio automatically.
Justin Salamon, the head of Sound Design AI at Adobe, told me that the team started with the text-to-audio model, and he noted that, as with all of Adobe’s generative AI projects, the team only used licensed data.
“What we really wanted is to give our users control over the process. We want this to be a tool for creators, for sound designers, for everyone who wants to elevate their video with sound. And so we wanted to go beyond the initial workflow of text to sound and that’s why we worked on like the vocal control that really gives you that precise control over energy and timing, that really turns it into an expressive tool,” Salamon explained.
For the vocal control, the tool analyzes the characteristics of your voice and the spectrum of the sound you’re making, and uses those to guide the generation process. Salamon noted that while the demo uses voice, users could also clap their hands or play an instrument.
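Salamon didn’t go into the model internals, but the kind of vocal analysis he describes, extracting energy (timing) and spectral envelopes from a recording, can be sketched with standard audio tooling. A minimal sketch follows, assuming nothing about Adobe’s actual pipeline; it uses the open-source librosa library, and the function name and hop size are illustrative choices, not part of Project Super Sonic.

```python
# Hypothetical feature extraction for voice-guided audio generation.
import numpy as np
import librosa

def extract_vocal_guidance(path: str, hop_length: int = 512):
    """Return per-frame timing (energy) and timbre (centroid) envelopes."""
    y, sr = librosa.load(path, sr=None, mono=True)

    # RMS energy per frame: captures *when* the imitated sound happens and
    # how loud it is, i.e. the energy and timing control Salamon mentions.
    rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]

    # Spectral centroid per frame: a coarse proxy for how "bright" or
    # "dark" each moment of the imitation sounds.
    centroid = librosa.feature.spectral_centroid(
        y=y, sr=sr, hop_length=hop_length
    )[0]

    times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop_length)
    return times, rms, centroid
```

In a real system, envelopes like these would presumably be fed to the generative model as a conditioning signal so the generated effect tracks the performance. And since energy and spectrum aren’t voice-specific, the same analysis would work for a hand clap or an instrument, which lines up with Salamon’s point.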
It’s worth noting that Adobe Max always features a number of what it calls ‘sneaks.’ These, like Project Super Sonic, are meant to be showcases of some of the experimental features the company is working on right now. While many of these projects do find their way into Adobe’s Creative Suite, there’s no guarantee that they will. And while Project Super Sonic would surely be a useful addition to something like Adobe Premiere, there’s also a chance that we will never see it again.
One reason I believe this project will make it into production is that the same group also worked on the audio portion of Generative Extend, a feature of its Firefly generative AI model that extends short video clips by a few seconds — including their audio track. As of now, though, Project Super Sonic remains a demo.