Over fall break, Professor of Physics and Belmont Data Collaborative Senior Fellow Scott H. Hawley attended the San Francisco launch event of “unicorn” startup Stability AI, which recently received a $1 billion valuation. CEO Emad Mostaque’s plan to create “AI by the people, for the people” and “democratize AI” by removing barriers to accessing significant computing resources has been making waves around the world, including at Belmont University.
Stability made an Open Source release of their text-to-image generative model “Stable Diffusion” in August to great fanfare, not only due to the model’s capabilities but because of the way it was made accessible to everyone for free. Machine Learning (ML) guru Andrew Ng said, “the open way that Stable Diffusion’s image generation model was released — allowing users to run it on their own machines, not just via API — has made it a landmark event for AI.” The landmark launch event was attended by such luminaries as Google co-founder Sergey Brin, former head of Tesla AI Andrej Karpathy, Peter Wang (CEO of Anaconda), and several VC firms such as Stability investors Coatue and Lightspeed.
Hawley’s collaboration with Stability began in May through its audio group, Harmonai. In Harmonai, Hawley found kindred souls who are dedicated to providing musicians, producers and audio engineers with powerful creative tools for making “pro” musical audio, with features that are often overlooked by the mainstream ML research community: high sample rates, low noise and multi-channel output.
“The connection with Harmonai came at the same time that [Belmont Audio Engineering Technology professor] Joe Baldridge and I began working on a new way to help musicians get paid,” said Hawley. “It was a great fit between the needs Joe and I had (for computing resources) and Belmont’s opportunity to provide well-sourced training data for the machine learning systems that could help power a new generation of AI-powered royalty-tracking software.” In doing so, Hawley and Baldridge have been driving the conversation forward at Belmont about data usage rights and informed consent. “I teach a course called Deep Learning and AI Ethics (DLAIE),” Hawley continued, “so I have a great opportunity to lead by example in helping Belmont to develop and implement policies of best practices in data ethics.”
Stability’s CEO Emad Mostaque funded Hawley’s research travel throughout the UK this past summer, allowing Hawley to participate in the CogX AI festival and to speak on a panel on “AI and Human Rights” at Queen’s University in Belfast. He also traveled with the Belmont Honors Program for its two-week residency in Belfast, Northern Ireland, and had an extended stay with members of the Musical Acoustics research group at the University of Edinburgh. Along the way, Hawley spent time with researchers at DeepMind, Cambridge University and the Intelligent Sound Engineering group at Queen Mary University of London. At the end of the summer, Hawley was offered the role of Technical Fellow with Stability so that his contribution to Harmonai’s efforts could continue. “While I’m teaching I mostly just write support code, libraries such as aeiou (‘audio engineering input/output utilities’),” Hawley explained.
These efforts were recently showcased in a video by “ML-ops” powerhouse Weights and Biases (“WandB”), which interviewed Harmonai Director Zach Evans and Hawley about Harmonai’s new open-source release, “Dance Diffusion”, and other efforts at Harmonai. Dance Diffusion (DD) is a generative audio model trained on large datasets donated by artists such as Jonathan Mann and archives such as Google’s MAESTRO dataset of piano tunes. DD represents a first step toward a new way of producing sophisticated generative musical audio pieces. The research for Dance Diffusion has relied heavily on WandB’s tracking and efficiency tools, the same ones that Hawley uses in his work with Belmont students.
“I’m a huge fan of WandB and have found it essential to work such as the journal paper that Belmont senior physics and AET double-major Grant Morgan had published in February in JASA Express Letters,” said Hawley. When WandB heard that Hawley uses their services in his courses, they sent “swag” to the whole DLAIE class, a few of whom are shown below:
Hawley has always been involved in Open Source. As he said, “scientific computing is all open source.” His pre-Belmont career of doing supercomputing simulations of black holes was all built with open-source systems. His background has served him well in collaborating with High-Performance Computing researchers at Stability, several of whom also come from the world of physics. “Even apart from the HPC world which tends to be physics heavy, there’s a huge cross-over between data science and physics – such as Belmont’s own Director of the Data Science Program, Dr. Christina Davis, who comes from the field of computational astrophysics,” he said. “Even the main creator of the Stable Diffusion (image generation) algorithm, Robin Rombach, has his undergrad degree in physics, and the class of generative models that are dominating the research scene now are all inspired by physics! There’s never been a better time to major in physics at Belmont!”
This collaboration will benefit the Belmont community in the following ways:
- Hands-on AI Art and Design Workshop: On Nov. 28, the Watkins College of Art and Design will host a Stable Diffusion event, in which artist Katie “@KaliYuga” May, Stability AI Education Specialist, will conduct a hands-on workshop on incorporating text-to-image models into artistic and design workflows. Participants will work on their own creations during this workshop. Laptops are required. RSVP via BruinLink. Seats are limited. This event is co-sponsored by the Belmont Data Collaborative and the College of Sciences and Mathematics.
- Cloud computing company CoreWeave has begun providing GPU compute to Belmont students in the DLAIE course. Hawley explains, “This is thanks to Stability-supported organization EleutherAI, who heard of my students’ plight and offered to donate some of their CoreWeave allocation for the rest of our semester! Thanks, CoreWeave and Eleuther!”
- Hawley hosts a regular online AI-audio seminar series attended by students, faculty, hobbyists, DJs and producers from all over the world, including Nashville: “Harmonai Hangouts” occurs Tuesdays at noon Central on the Harmonai Play Discord Server. See Harmonai.org for more information.
The benefits of the collaboration have not been one-sided: The 3D data visualization methods that Hawley developed for teaching his classes have proved invaluable for Harmonai’s research efforts into diffusion models, specifically for examining the way audio is encoded in multidimensional spaces known as “embeddings” and the reconstructed audio that results from them. Harmonai Director Zach Evans said, “We would know nothing about how good these reconstructions are, nor the beauty of the latent space, if it weren’t for [Hawley’s] visualization utilities.” He added that the embedding clouds were his favorite part of all of his research.
“This is about great people getting a chance to work together with people who have admired each other for quite a while, and to share everything,” Hawley said. “Stability CEO Emad Mostaque wants to make more people happy by giving things away: ‘AI by the people, for the people.’ It’s an amazing opportunity for collaboration.”