MusicCaps, released by Google alongside MusicLM, is a dataset of 5,521 10-second music clips. Each clip is labeled with an aspect list and a free-text caption written by musicians. The aspect list is a series of short descriptive terms that summarize how the music sounds, covering details such as genre, instrumental qualities, and vocal features.
The free-text caption gives a fuller description of the music, including the instruments used and the emotional tone it conveys. The clips in MusicCaps are drawn from the AudioSet dataset and are divided into evaluation and training splits.
The dataset is released under the Creative Commons BY-SA 4.0 license. Each clip comes with the following metadata: the YouTube ID of the video containing the music segment, the start and end times of the segment within that video, labels sourced from AudioSet, the aspect list, the caption, an author ID (for grouping samples by the author who wrote them), a flag indicating whether the clip belongs to a balanced subset, and a flag indicating whether it belongs to the AudioSet eval split.
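To make the metadata layout concrete, here is a minimal Python sketch of how one record might be represented and parsed. It rests on several assumptions: the metadata is distributed as a CSV file, the column names (`ytid`, `start_s`, `end_s`, `audioset_positive_labels`, `aspect_list`, `caption`, `author_id`, `is_balanced_subset`, `is_audioset_eval`) match the ones used here, and the aspect list is stored as a Python-style list literal. The `Clip` class, the `parse_clips` helper, and both sample rows are hypothetical and invented purely for illustration; they are not real entries from the dataset.

```python
import ast
import csv
import io
from dataclasses import dataclass


@dataclass
class Clip:
    """One annotated clip. Field names are assumed, not confirmed."""
    ytid: str
    start_s: int
    end_s: int
    audioset_labels: list
    aspect_list: list
    caption: str
    author_id: int
    is_balanced_subset: bool
    is_audioset_eval: bool

    @property
    def url(self) -> str:
        # Link to the YouTube video, starting at the clip's offset.
        return f"https://www.youtube.com/watch?v={self.ytid}&t={self.start_s}s"


# Invented rows for illustration only; the IDs and captions are not real.
SAMPLE_CSV = '''ytid,start_s,end_s,audioset_positive_labels,aspect_list,caption,author_id,is_balanced_subset,is_audioset_eval
abc123XYZ_0,30,40,"/m/04rlf,/m/05r5c","['mellow piano', 'no vocals']",A slow solo piano piece.,4,False,True
def456UVW_1,10,20,/m/04rlf,"['rock', 'male vocal']",An upbeat rock song.,7,True,False
'''


def parse_clips(text: str) -> list:
    """Parse CSV text into Clip records (hypothetical helper)."""
    clips = []
    for row in csv.DictReader(io.StringIO(text)):
        clips.append(Clip(
            ytid=row["ytid"],
            start_s=int(row["start_s"]),
            end_s=int(row["end_s"]),
            # Assumed: AudioSet label IDs are comma-separated in one field.
            audioset_labels=row["audioset_positive_labels"].split(","),
            # Assumed: the aspect list is stored as a list literal.
            aspect_list=ast.literal_eval(row["aspect_list"]),
            caption=row["caption"],
            author_id=int(row["author_id"]),
            is_balanced_subset=row["is_balanced_subset"] == "True",
            is_audioset_eval=row["is_audioset_eval"] == "True",
        ))
    return clips


clips = parse_clips(SAMPLE_CSV)

# Assumed: clips not flagged as AudioSet eval form the training split.
eval_clips = [c for c in clips if c.is_audioset_eval]
train_clips = [c for c in clips if not c.is_audioset_eval]

for clip in clips:
    print(clip.url, clip.aspect_list, clip.caption)
```

With a real copy of the metadata file, the same `parse_clips` function could be given the file's contents instead of `SAMPLE_CSV`, provided the actual column names and encodings match the assumptions above.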
The dataset is intended primarily for music description tasks, supporting research in music analysis and understanding.