NeuralNote: Open-Source Audio-to-MIDI Plugin Brings Spotify's AI to DAWs

NeuralNote is a new open-source audio plugin that transcribes audio to MIDI using deep learning. It uses Spotify's basic-pitch model to convert recordings of any tonal instrument into editable MIDI inside a digital audio workstation.

NeuralNote represents an interesting addition to the growing ecosystem of AI-powered audio tools, bringing Spotify's basic-pitch technology directly into digital audio workstations through a plugin format. Developed by Damien Ronssin and Tibor Vass, this open-source project aims to make audio-to-MIDI transcription more accessible to musicians and producers.

At its core, NeuralNote is built upon Spotify's basic-pitch model, which has shown promising results in converting monophonic and polyphonic audio into MIDI data. The plugin implements this technology using RTNeural for the convolutional neural network (CNN) portion and ONNXRuntime for feature extraction, specifically handling the Constant-Q transform (CQT) calculation and harmonic stacking. As part of this project, the developers contributed back to RTNeural by adding 2D convolution support, demonstrating the collaborative nature of open-source development.
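
To make the harmonic stacking step more concrete, here is a minimal sketch in plain Python. It is not NeuralNote's code: the function name, the harmonic list, and the bin resolution are illustrative assumptions. The idea is that on a log-frequency axis such as the CQT's, multiplying a frequency by a harmonic number corresponds to a fixed bin shift, so the network can see a note's harmonics aligned at the same bin index across channels.

```python
import math

def harmonic_stack(cqt_frame, harmonics, bins_per_octave):
    """Stack frequency-shifted copies of one CQT frame, one channel per harmonic.

    cqt_frame: list of magnitudes, one per log-spaced frequency bin.
    Channel k holds, at bin i, the magnitude found at harmonics[k] times the
    frequency of bin i (0.0 when that frequency falls outside the frame).
    """
    n_bins = len(cqt_frame)
    channels = []
    for h in harmonics:
        # On a log-frequency axis, scaling frequency by h is a constant bin shift.
        shift = round(bins_per_octave * math.log2(h))
        channel = [
            cqt_frame[i + shift] if 0 <= i + shift < n_bins else 0.0
            for i in range(n_bins)
        ]
        channels.append(channel)
    return channels

# Toy frame (assumed values): 3 octaves at 12 bins per octave, with energy at a
# fundamental (bin 5) and at its second harmonic one octave higher (bin 17).
frame = [0.0] * 36
frame[5] = 1.0
frame[17] = 0.5
stacked = harmonic_stack(frame, harmonics=[0.5, 1, 2, 3], bins_per_octave=12)
```

In this toy example, the channel for harmonic 2 carries the octave-up energy back down to bin 5, so the fundamental and its harmonic line up at the same index.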

The implementation does not run the full basic-pitch model as a single graph. Instead, the developers split the original CNN into four sequential models wired together so that they can be run with RTNeural. The features_model.onnx file was generated by converting a Keras model containing only the CQT + Harmonic Stacking portion of the basic-pitch graph, while the CNN weights were extracted from the TensorFlow-JS model available in the basic-pitch-ts repository and stored as JSON files that RTNeural can load.

From a practical standpoint, NeuralNote offers several features that make it appealing for music production:

Polyphonic Support: Unlike many earlier audio-to-MIDI solutions that could only handle single notes at a time, NeuralNote supports polyphonic transcription, meaning it can identify multiple simultaneous notes.
Pitch Bend Detection: The plugin can capture pitch bend information, which is crucial for expressive instruments like guitar, violin, or cello.
Interactive Adjustment: Users can adjust parameters while listening to the transcription, allowing for immediate feedback and refinement.
Direct MIDI Editing: The plugin can scale-quantize and time-quantize the transcribed MIDI directly within the interface, streamlining the workflow.
Cross-platform Support: Available for Windows, macOS (Universal Binary), and Linux, making it accessible to producers across different operating systems.
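
As a rough illustration of the MIDI editing features listed above, the sketch below shows one simple way to quantize notes in time and in pitch. It assumes that "scaling" refers to snapping pitches to a musical scale. The function names, the 16th-note grid, and the C major scale are hypothetical choices for this example and do not reflect NeuralNote's actual implementation.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the root

def quantize_time(beats, grid=0.25):
    """Snap a position in beats to the nearest multiple of `grid`."""
    return round(beats / grid) * grid

def quantize_pitch(midi_note, root=0, scale=MAJOR_SCALE):
    """Snap a MIDI note to the nearest pitch class in the scale.

    When the notes below and above are equally close, the lower one wins.
    """
    allowed = {(root + s) % 12 for s in scale}
    for distance in range(7):  # any pitch class is at most 6 semitones away
        for candidate in (midi_note - distance, midi_note + distance):
            if candidate % 12 in allowed:
                return candidate
    return midi_note  # only reached if the scale is empty

# Hypothetical transcribed notes as (onset in beats, MIDI pitch) pairs.
notes = [(1.1, 61), (1.4, 64), (2.05, 66)]
quantized = [(quantize_time(t), quantize_pitch(p)) for t, p in notes]
```

With a grid of 0.25 beats and C major, the slightly early or late onsets land on the nearest 16th note, and the out-of-scale pitches 61 and 66 move down to 60 and 65.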

The installation process is straightforward. Installers are available for Windows and macOS and let users choose between the Standalone, VST3, and AU (macOS only) formats. Linux users can install the plugin by copying the provided VST3 and Standalone binaries to the appropriate locations. The macOS version is code-signed; the Windows version is not, so Windows users may need to take additional steps before they can run it.

NeuralNote is designed to be simple to use: users can either apply it to an audio track in their DAW and record the incoming audio, or drag and drop an audio file directly onto the plugin interface. Once the audio has been captured, the transcribed MIDI appears in the piano roll section, where users can listen to the result and adjust settings as needed. Once satisfied, they can export the MIDI by dragging it onto a MIDI track.

Despite its capabilities, NeuralNote has notable limitations. The most significant is that it cannot perform real-time transcription due to technical constraints in the underlying basic-pitch model. The Constant-Q transform requires audio chunks longer than one second to get accurate amplitude values for low-frequency bins, introducing unacceptable latency for real-time use. Additionally, the basic-pitch CNN adds approximately 120ms of latency, and the note event creation algorithm processes posteriorgrams backward (from future to past), making it non-causal.
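
The CQT latency argument can be checked with a short calculation. The sketch below uses the textbook constant-Q relation, in which each bin's analysis window spans Q periods of its center frequency and Q depends only on the number of bins per octave. The specific values (a lowest bin at 27.5 Hz and 36 bins per octave) are assumptions chosen for illustration, not figures taken from this article.

```python
def cqt_window_seconds(freq_hz, bins_per_octave):
    """Approximate analysis window length, in seconds, for one CQT bin.

    Textbook relation: Q = 1 / (2**(1/B) - 1) for B bins per octave,
    and the window covers Q periods of the bin's center frequency.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return q / freq_hz

# Assumed values: lowest bin at A0 (27.5 Hz), 3 bins per semitone.
lowest = cqt_window_seconds(27.5, bins_per_octave=36)
a4 = cqt_window_seconds(440.0, bins_per_octave=36)
print(f"27.5 Hz bin: {lowest:.2f} s, 440 Hz bin: {a4:.3f} s")
```

Under these assumptions, the lowest bin needs a window of roughly 1.87 seconds, while a bin at 440 Hz needs only about 0.12 seconds. The low-frequency bins are what push the required audio chunk past one second.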

For developers interested in the implementation details, the project can be built from source using git, cmake, and the appropriate compiler suite for the target operating system. The transcription engine code is contained in the Lib/Model directory, with model weights in Lib/ModelData/, making it possible to reuse just the transcription portion in other projects. The developers plan to further isolate this code into a standalone library in the future.

The project is published under the Apache-2.0 license, with dependencies including JUCE (JUCE Starter), RTNeural (BSD-3-Clause), ONNXRuntime (MIT), ort-builder (MIT), basic-pitch (Apache-2.0), basic-pitch-ts (Apache-2.0), and minimp3 (CC0-1.0).

NeuralNote joins a growing number of AI-powered music tools that leverage machine learning to automate traditionally difficult tasks in music production. While it may not replace skilled transcription work entirely, it offers a valuable starting point for producers looking to quickly convert audio ideas into MIDI for further editing and manipulation. The open-source nature of the project also allows for community contributions and potential improvements to the underlying technology.

For more information or to download NeuralNote, visit the official GitHub repository. Those interested in the underlying technology can explore Spotify's basic-pitch blog post and research paper for deeper technical insights.
