Stability AI, the London-based startup behind the open-source image-generating AI model Stable Diffusion, has released Stable Audio, a tool that uses a technique called ‘latent diffusion’ to create high-quality music for commercial use.
The release comes just one year after the AI startup began exploring generative audio with the soft release of its Dance Diffusion AI music tool, and as Stability AI is reportedly under pressure from investors to turn more than $100 million in capital into revenue-generating products.
The company has raised over $125 million in funding, most recently an additional $25 million in June through a convertible note, as first reported by Bloomberg. The deal was expected to increase the startup’s initial $1 billion valuation to $4 billion.
How It Works
Stable Audio was developed by Stability’s audio team, which was formalized in April and drew inspiration from Dance Diffusion, according to an exclusive TechCrunch report.
The new tool, according to Stability AI, is its “first product for music and sound effect generation,” capable of creating high-quality 44.1 kHz stereo sound for commercial use. Because its audio model uses latent diffusion, it can also generate instrumental music that is more coherent and melodic than the output of some other generative AI models.
Notably, generated tracks can run up to 90 seconds, staying coherent over a much longer span than the short audio snippets typically produced by other AI tools.
Unlike Dance Diffusion, which generated short, random audio clips from a limited sampling of sounds, Stable Audio lets users supply a text prompt to steer a track’s content and choose its length.
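To put those figures in perspective, the short Python sketch below works out how much raw audio data a maximum-length clip represents. The sample rate, channel count, and 90-second length come from the figures reported here; the 16-bit sample depth is an assumption made only for illustration, and the function names are hypothetical.

```python
# Figures reported in the article.
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz
CHANNELS = 2             # stereo
MAX_SECONDS = 90         # longest track length

# Assumption for illustration only: 16-bit (2-byte) samples.
ASSUMED_BYTES_PER_SAMPLE = 2


def raw_sample_count(seconds, sample_rate_hz=SAMPLE_RATE_HZ, channels=CHANNELS):
    """Number of individual sample values in an uncompressed clip."""
    return seconds * sample_rate_hz * channels


def raw_size_bytes(seconds, bytes_per_sample=ASSUMED_BYTES_PER_SAMPLE):
    """Uncompressed size of a clip under the assumed sample depth."""
    return raw_sample_count(seconds) * bytes_per_sample


if __name__ == "__main__":
    print("samples in a 90-second clip:", raw_sample_count(MAX_SECONDS))
    print("approx. size in MB:", raw_size_bytes(MAX_SECONDS) / 1_000_000)
```

Millions of values per clip is one reason a model might work on a compressed representation of the audio rather than on the raw waveform.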
The secret behind Stable Audio’s output is ‘latent diffusion,’ a technique similar to the one Stable Diffusion uses to generate images. The model learns to gradually subtract noise from a starting clip made almost entirely of noise, moving it step by step closer to the provided text description. It was trained on a collection of songs supplied by AudioSparx, a commercial music library, with vocal tracks excluded to avoid ethical and copyright issues.
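The idea of removing noise step by step can be illustrated with a toy Python sketch. This is not Stability AI’s model or code: a real system uses a trained neural network, conditioned on the text prompt, to estimate the noise, and works in a learned compressed (“latent”) representation of audio. Here the “latent” is just a short list of numbers, the target vector stands in for whatever the prompt describes, and the noise estimate is computed by cheating with the known target. All names are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_denoise(target_latent, steps=50, seed=0):
    """Start from pure noise and remove a share of the noise at each step.

    Returns the final latent and the distance to the target after each step
    (the first entry is the distance before any denoising).
    """
    rng = random.Random(seed)
    # Begin with a latent made entirely of Gaussian noise.
    latent = [rng.gauss(0.0, 1.0) for _ in target_latent]
    history = [distance(latent, target_latent)]

    for step in range(steps):
        # Stand-in for the trained network: the "noise" is simply the gap
        # between the current latent and the (normally unknown) target.
        estimated_noise = [x - t for x, t in zip(latent, target_latent)]
        # Remove a growing share of the remaining noise; the last step
        # removes all of it.
        fraction = 1.0 / (steps - step)
        latent = [x - fraction * n for x, n in zip(latent, estimated_noise)]
        history.append(distance(latent, target_latent))

    return latent, history


if __name__ == "__main__":
    target = [0.5, -1.0, 0.25, 2.0]  # hypothetical "prompt-matching" latent
    final_latent, history = toy_denoise(target)
    print("distance before denoising:", history[0])
    print("distance after denoising:", history[-1])
```

In an actual latent diffusion system, a separate decoder would then turn the final latent back into a waveform; that stage is omitted here.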
Ed Newton-Rex, VP of audio for Stability AI, told TechCrunch via email that the startup is building “foundational AI models” across various content types, or “modalities.”
He described how Stability AI has expanded beyond Stable Diffusion to include language, code, and music. “We believe the future of generative AI is multimodality,” he explained.
Stable Audio is not open source and is currently offered through a web app. The Pro tier costs $11.99 per month and lets users create up to 500 commercializable tracks per month, each up to 90 seconds long.
Free users, by contrast, are limited to 20 non-commercializable tracks per month, each up to 20 seconds long.
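For a rough sense of what each tier provides, the arithmetic can be sketched in a few lines of Python. The track counts, lengths, and price are the figures reported above; the dictionary layout and function names are hypothetical, and the calculation assumes every track is generated at the maximum allowed length.

```python
# Tier limits as reported in the article.
TIERS = {
    "pro": {"tracks_per_month": 500, "max_seconds": 90, "price_usd": 11.99},
    "free": {"tracks_per_month": 20, "max_seconds": 20, "price_usd": 0.0},
}


def monthly_audio_minutes(tier):
    """Maximum minutes of audio per month if every track is full length."""
    limits = TIERS[tier]
    return limits["tracks_per_month"] * limits["max_seconds"] / 60


def price_per_track(tier):
    """Monthly price divided by the monthly track allowance."""
    limits = TIERS[tier]
    return limits["price_usd"] / limits["tracks_per_month"]


if __name__ == "__main__":
    for name in TIERS:
        print(name, "minutes per month:", monthly_audio_minutes(name))
        print(name, "USD per track:", price_per_track(name))
```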
According to TechCrunch, Stable Audio’s Terms of Service also indicate that Stability may use users’ prompts, songs, and data for various purposes, including developing future models and services.
Copyright Concerns
While generative AI tools like Stable Audio have the potential for commercial use, they raise copyright and ethical concerns.
While the U.S. Copyright Office (USCO) has yet to release an official position on AI-generated music, a federal judge ruled last month that a work generated entirely by AI cannot be copyrighted.
In Stephen Thaler v. Shira Perlmutter and The United States Copyright Office, Judge Beryl Howell emphasized that “human authorship is a bedrock requirement of copyright” and noted that the “public is the primary beneficiary of copyright law.” A spokesperson for the USCO agreed with Judge Howell’s decision but didn’t provide any further comment.
Earlier this month, the USCO refused to grant copyright protection to another AI-generated art project, one made with Midjourney, stating that it was “not the product of human authorship.”
Stability’s approach also does not compensate artists directly for the use of their work in training the model, although some artists had the option to remove their work from the training dataset. AudioSparx, Stability’s partner, offers revenue sharing to musicians on its platform, allowing them to profit from Stable Audio if they choose to participate in the training or contribute to future versions.
Stability AI recently faced financial troubles, including delayed payment of wages and payroll taxes, although the company has denied any such struggles. While the company aims to turn its fortunes around with Stable Audio and its other AI models, it still faces challenges in becoming a viable contender in generative AI and music.
Editor’s note: This article was written by an nft now staff member in collaboration with OpenAI’s GPT-3.5.