In a paper recently accepted to the 2021 International Conference on Acoustics, Speech and Signal Processing (ICASSP), the IEEE’s premier conference on signal processing, researchers from Dolby Laboratories and the Music Technology Group at Universitat Pompeu Fabra in Barcelona, Spain, demonstrated a machine learning (ML) model for automated multitrack audio mixing that can also emulate audio effects such as compression and equalization.
Their first results highlight the model’s performance on the “SignalTrain LA2A Dataset,” a large (21 GB) corpus of audio recordings developed and recorded in Belmont’s Janet Ayers Academic Center acoustics lab by Physics/AET double-major Benjamin Colburn in 2019. Colburn, now a graduate student at the University of Florida, recorded the dataset of compressor effects in conjunction with Dr. Scott Hawley’s development of “SignalTrain,” a machine learning model for emulating audio effects.
Referring to Hawley’s SignalTrain model as “the current state of the art,” Dolby researchers Christian Steinmetz, Jordi Pons, Santiago Pascual, and Joan Serrà go on to describe improvements that bring the exciting goal of real-time ML-based audio processing within reach.
For the Dolby group’s new paper and demonstrations, see here (their first example includes a recording of one of Hawley’s songs, mixed by AET instructor Justin Dowse). For a survey and interactive graphical demo of Colburn and Hawley’s SignalTrain work, see here.