The list of datasets presented here pulls from the mir-datasets repository; if you’d like to add a dataset, create an issue or pull request there and the changes will propagate here.

The community owes a huge thanks to the hard work of Alexander Lerch in compiling and maintaining this list of datasets.

| dataset | metadata | contents | with audio |
| --- | --- | --- | --- |
| 200DrumMachines | audio samples | 7371 one-shots | yes |
| ACM_MIRUM | tempo | 1410 excerpts (60s) | yes |
| AcousticBrainz-Genre | 15-31 genres with 265-745 subgenres | audio features for about 2000000 songs | no |
| ADC2004 | predominant pitch | 20 excerpts | yes |
| Acoustic Event Dataset | 28 event classes | 5223 audio snippets | yes |
| AIST Dance Video Database | street dance videos | 13,940 videos for 60 pieces | yes |
| Amg1608 | valence & arousal | 1608 excerpts (30s) | no |
| AMT-pilot | structure by multiple annotators | 8 songs | yes |
| Automatic Practice Logging | piano practice | 620 segments | yes |
| artist20 | 20 artists | 1413 songs | no |
| ASAP | aligned MIDI/audio performances and MIDI/XML scores, beats, downbeats, time signatures, key signatures | 1068 MIDI performances, 520 audio performances, 222 scores | yes (see MAESTRO) |
| AudioSet | 632 event classes | 2084320 clips (10s) | no |
| bach10 | aligned multitrack MIDI | 10 chorales | yes |
| ballroom | 8 genres, tempo, beats, bars / downbeats | 698 excerpts (30s) | yes |
| beatboxset1 | percussion annotation | 14 clips | yes |
| BPS-FH Beethoven Piano Sonata with Function Harmony | functional annotation | 32 sonatas | no |
| C224a | 14 genres | 224 artists | no |
| C3ka | 18 genres | 3000 artists | no |
| C49ka-C111ka | genres | 48800/110588 artists | no |
| CAL10k | tags | 10870 songs | no |
| CAL500 | tags | 502 songs | yes |
| CarnaticRhythm | sama, beats | 176 pieces | on request |
| Chordify Annotator Subjectivity Dataset | chords by 4 annotators | 50 songs | no |
| CBFdataset | 4 playing techniques (Chinese Bamboo Flute) | 10 performers | yes |
| CCMixter | vocal track, background track | 50 mixes | yes |
| Chopin22 | aligned MIDI | 44 recordings | yes |
| Clotho | 5 descriptive captions | 4981 snippets | yes |
| CMMSD | note/rest/transition, onsets, vibrato | 36 excerpts | no |
| Coidach | 55 genres | 26420 songs | no |
| corpusCOFLA | editorial, predominant melody | 1800 flamenco recordings | no |
| covers80 | cover songs | 80 song pairs | yes |
| Cross-Composer | 11 composers, piece, key, era, instrumentation | 1100 chromagrams and chord labels | no |
| Cross-Era | composer, piece, key, era, instrumentation | 2000 chromagrams and chord labels | no |
| Choral Singing Dataset | f0, MIDI | 48 recordings | yes |
| Da-TACOS | cover songs | 25000 songs | no |
| Dataset of synchronised Audio, LyrIcs and vocal notes | aligned notes and lyrics | 5358 songs | no |
| DAMP | karaoke performances, aligned lyrics, pronunciation assessment | 34000 monophonic recordings | yes |
| Dagstuhl ChoirSet | beats, time-aligned scores, F0 | 81 takes | yes |
| DEAM - The MediaEval Database for Emotional Analysis of Music | valence & arousal | 1802 excerpts | yes |
| DEAPDataset | valence & arousal, dominance, physiological data | 120 music video excerpts | no |
| DESED | 10 audio event classes | approx. 20k 10s clips (unlabeled, weakly/strongly labeled) | yes |
| DREANSS | onset times, percussion instruments | 18 excerpts | yes |
| DrumPt | 4 playing techniques | app. 2000 annotations | yes (see ENST) |
| DSD100 | multitrack recordings, stems for vocals, drums, bass and accompaniment | 100 songs | yes |
| EMO-Soundscapes | arousal & valence | 1213 soundscape recordings | yes |
| emoMusic | arousal & valence | 744 excerpts (45s) | yes |
| Emotify | induced emotion | 400 excerpts | yes |
| EMusic | arousal & valence | 100 excerpts (experimental music) | yes |
| ENST-Drums | onset times, perc. instruments, playing technique | 318 segments | yes |
| Erkomaishvili Dataset | sheet music, structure, F0, note onsets | 118 tracks | yes |
| Extendedballroom | 9 genres, tempo | 4000 excerpts (30s) | downloadable |
| ExtraSensory | 51 context labels | 300000 sensor recordings from 60 users | yes |
| ffuhrmann | 11 predom. instr. | 6951 excerpts from 220 songs | yes/no |
| Flamenco database | editorial, biographical, musicological information on flamenco, 1102 artists, 74 palos, 2860 albums | 13311 tracks | no |
| FMA-full | 161 genres | 106574 songs | yes |
| FMA-large | 161 genres | 106574 excerpts (30s) | yes |
| FMA-medium | 16 genres | 25000 excerpts (30s) | yes |
| FMA-small | 8 genres | 8000 excerpts (30s) | yes |
| FSD-Kaggle2019 | 80 tags | 29000 clips | yes |
| Fugue Analyses | fugue structure, patterns, cadences | 36 fugues (Bach & Shostakovich) | no |
| GiantStepsKey | key | 604 files | no |
| GiantStepsTempo | tempo | 664 files | no |
| GiantStepsTempo:alternate | tempo | 664 files | no |
| Greek Music Dataset | genre, valence, arousal | 1400 songs | downloadable |
| Gracenote Music Identification 2014 | timestamp, country | 110M music ID matches | no |
| GoodSounds | 12 instruments, pitch, sound quality | 8750 notes | yes |
| GPT | 7 guitar playing techniques | 6580 clips | yes |
| Groove MIDI Dataset | drum timing | 1150 MIDI recordings | rendered |
| Guitar Solo Dataset | start/stop of guitar solos | 60 songs | no |
| GTZAN | 10 genres, tempo labels, key labels (lerch), key labels (li), beat/downbeat, metrical levels | 1000 excerpts (30s) | yes |
| GuitarSet | midi, pitch, beat, chords | 360 guitar excerpts (30s) with hexaphonic audio | yes |
| Hainsworth | tempo | 245 excerpts (60s) | yes |
| HarmonixSet | beats, downbeats, structure | 912 pop songs | no |
| HHDS | multitrack, style, tempo | 18 songs | yes |
| holzapfel:onset | onset times | 78 excerpts | yes |
| homburg | 9 genres | 1889 excerpts (10s) | yes |
| IADS | valence & arousal, dominance | 111 sound snippets | yes |
| IDMT Multitrack | multitrack, style | 12 songs | yes |
| IDMT-SMT-Audio-Effects | effects on bass and guitar notes | 55044 recordings | yes |
| IDMT-SMT-Bass | bass performance styles | 4300 excerpts | yes |
| IDMT-SMT-Bass-SINGLE-TRACK | style annotated bass lines | 17 bass lines (?) | yes |
| IDMT-SMT-Drum | onset times, perc. instruments | 518 files | yes |
| IDMT-SMT-Guitar | 9 guitar playing techniques | 4700+400 note events | yes |
| iKala | singing voice tracks, background tracks | 252 excerpts (30s) | yes |
| INRIA:EuroVision | structure | 124 songs | no |
| INRIA:Quaero | structure | 159 songs | no |
| IRMAS | 11 instruments | 2874 excerpts | yes |
| ISMIR2004Genre | 6 genres | 729 excerpts (30s) | yes |
| ISMIR2004Tempo | tempo | 465 excerpts (20s) | yes |
| Jazz Audio-Aligned Harmony Dataset | structure, key, chords, beats | 113 songs | no |
| Jamendo-VAD | voice activity | 61+16+16 songs | yes |
| JGDB | multitrack, MIDI | random generated excerpts | yes |
| JKU-ScoFo | audio, MIDI | 16 recordings | yes |
| Josquin La Rue Secure Duo Dataset | symbolic scores | 77 duos (Josquin & La Rue) | no |
| Jordan:Classical | structure | 15 pieces | yes |
| Jordan:Jazz | structure | 15 pieces | yes |
| LabROSA:APT | MIDI | 29 piano excerpts | yes |
| LabROSA:MIDI | audio, MIDI | 4 songs | yes |
| data set | listening habits | 992 users | no |
| LFM-1b | listening habits | 120000 users | no |
| Lyrical Influence Networks Dataset | lyrics-based artist and genre graphs | 42802 artists/214 genres | no |
| Lakh MIDI Dataset | MIDI, tempo, key | 176581 MIDI files | no |
| LMD - Latin | 10 genres | 3160 songs | no |
| M-DJCUE | cue points | 134 tracks | no |
| MAESTRO | audio aligned midi, velocity, sustain | 172 hours of piano | yes |
| magnatagatune | similarity, tags | 25863 excerpts (30s) | yes |
| MAPS | piano notes/chords/pieces, tempo/key | 238 pieces | yes |
| MARD | album reviews | 66566 songs | no |
| MARG-AMT | MIDI pitch, onset/offset times | 30 melodies | yes |
| MAST | vocal performance assessment | 1018 performances | no |
| MAST-Rhythm | rhythm performance assessment | 3721 performances | yes |
| McGill Billboard | chords | 740 songs | no |
| MDBDrums | onset times, perc. instrument, playing technique | 23 excerpts | yes |
| Medley-solos-DB: a cross-collection dataset for musical instrument recognition | 8 instruments | 21572 excerpts | yes |
| MedleyDB | multitrack, genre, melody f0, instrument activation | 122 songs | yes |
| MER500 | emotion | 500 clips | yes |
| MIR-1K | vocal tracks, background tracks | 1000 excerpts | yes |
| mirex05Train | predominant pitch | 13 excerpts | yes |
| mirex06Train | tempo, beats | 20 excerpts (30s) | yes |
| Mid Level Perceptual Music Features | 7 perceptual features | 5000 audio files | yes |
| Million Musical Tweets | listening behavior | 1086808 tweets | no |
| Modal | onset times | 71 snippets | yes |
| MOODetector:Bi-Modal | lyrics, valence & arousal | 133 excerpts | yes |
| MOODetector:Multi-Modal | lyrics, MIDI, mood | 903 excerpts (30s) | yes |
| moodswings | arousal & valence | 240 excerpts (30s) | no |
| Mozart’s String Quartets | sonata form structure, cadences | 32 movements | no |
| Million Song Dataset | metadata, proprietary features | 1000000 songs | no |
| Multimodal Sheet Music Dataset | piano notes/chords/pieces, synthetic audio, aligned MIDI, aligned sheet music images, OMR | 497 pieces | no |
| The Meertens Tune Collections | phrases, key, meter | 18000 melodies | partially |
| A Multimodal Dataset of Musical Themes for MIR Research | sheet music, symbolic encodings, audio snippets, symbolic-audio alignments, composer, work, recording, and theme characteristics | 2067 Themes | yes |
| MTG-Jamendo | tags (genre, instruments, mood) | 55000 tracks | yes |
| MTG-Query by Humming | title, artist | 118 queries/481 songs | yes/no |
| MUSDB18 | multitrack recordings, stems for vocals, drums, bass and accompaniment | 150 songs | yes |
| MUSIC4ALL | tags, lyrics | 109,269 excerpts (30s) | on request |
| musiclef2012 | tags | 1355 songs | no |
| MusicMicro | music listening patterns | 136866 users | no |
| MusicNet | pitch, onsets | 330 recordings | implicitly |
| NES-MDB | multi-track MIDI, aligned audio | 5000 songs | on request |
| Nine Inch Nails Multitracks | multitrack | 66 songs | yes |
| NMED-H - Naturalistic Music EEG Dataset Hindi | EEG | 24 trials x 16 excerpts (4.5min) | no |
| Naturalistic Music EEG Dataset - Rhythm Pilot | EEG | 20 trials x 10 excerpts (4.5min) | no |
| Naturalistic Music EEG Dataset - Tempo | EEG | 30 trials x 16 excerpts (30sec) | no |
| NSynth | instrument, pitch | 305979 single notes | yes |
| NUS-48E | aligned phonemes | 48 pairs of sung and spoken | yes |
| ODB | onset times | 19 excerpts | yes |
| Onset_Leveau | onset times | 21 excerpts | yes |
| Open Broadcast Media Audio from TV | 6 classes for music presence | 1647 excerpts (60s) | yes |
| OpenMIC-2018 | 20 instruments | 20000 excerpts (10s) | yes |
| Orchset | predominant pitch | 64 excerpts | yes |
| Piano Gestures Dataset | video, intentions, audio | 210 clips | yes |
| Phenicx-Anechoic | audio, aligned MIDI | 4 pieces | yes |
| Phonation | pitch, vowel, phonation mode | 900 monophonic snippets | yes |
| PlaylistDataset | playlists | 75262 songs/2840553 transitions | no |
| QBT-Extended | taps | 3365 queries/51 songs | MIDI |
| QMUL:Beatles | structure, key, chords, beats | 181 songs | no |
| QMUL:King | structure, key, chords | 14 songs | no |
| QMUL:MichaelJackson | structure | 38 songs | no |
| QMUL:MixEvaluation | multitrack, mixes | 18 songs/180 mixes | yes |
| QMUL:Queen | structure, key, chords | 51/31 songs | no |
| QMUL:RSS | structure | 60 songs | no |
| QMUL:Zweieck | structure, key, chords, beats | 18 songs | no |
| QUASI | multitrack | 11 songs | yes |
| RobbieWilliamsAnnotations | chords, keys, beats | 65 songs | no |
| RockCorpus | chords, melody, bars | 200 songs | no |
| RWC | lyrics, 10 genre, 50 instruments, chords, structure, aligned MIDI | 115 songs/50 classical/100 songs | yes |
| SALAMI | structure | 1447 songs | no |
| SAMBASET | recording date, escolas, beats | 392 | no |
| Sargon | structure | 4 songs | yes |
| Semantic Artist Similarity | artist biographies, similarity | 268+2336 artists | no |
| Schenker Analyses | MusicXML, Schenker analysis | 41 pieces | no |
| SCP - EEG-Recorded Responses to Short Chord Progressions | EEG | 108/648 trials x 12 stimuli (5s) | yes |
| Sample detection dataset | start of samples | 80 songs, 80 samples | no |
| SEILS | scores in different symbolic formats | 30 madrigals | no |
| Seyerlehner:1517-Artists | 19 genres | 3180 songs | yes |
| Seyerlehner:Annotated | 19 genres | 190 songs | yes |
| Seyerlehner:Pop | tempo | 1105 songs | yes |
| Seyerlehner:Unique | 14 genres | 3115 excerpts (30s) | yes |
| SHS100K | cover songs | ca. 10,000 songs with 100,000 tracks | no |
| SISEC2013 | multitrack, mix | 5 excerpts | yes |
| SLAKH | MIDI, synthesized audio (tracks + mix) | 2100 mixes | yes |
| SMC:MIREX | tempo, beats | 217 excerpts | yes |
| SMD | audio, aligned MIDI | 50 recordings | yes |
| SoundTracks | valence, energy, tension, mood | 360+110 excerpts | yes |
| SPAM | structure | 50 songs | no |
| Shazam Research Dataset Offsets | in-song query times | 188M queries over 20 songs | no |
| Su-AMT | onset times, pitch | 10 excerpts | yes |
| SUPRA-RW | piano roll performances | 478 performances | yes |
| Schubert Winterreise Dataset (SWD) | lyrics, scores (image, symbolic, MIDI), audio, measures, chords, local keys, global keys, structure | 24 songs, 9 performances | yes |
| Texture in String Quartets | texture | 11 movements | no |
| Traditional Flute Dataset | audio, aligned MIDI | 30 excerpts | yes |
| ThisIsMyJam | favorite songs, artists | 131k users | no |
| TinySOL, an audio dataset of isolated musical notes | instrument, pitch, dynamics, string number (if applicable) | 2913 isolated notes | yes |
| TONAS | pitch | 72 single-voiced excerpts | yes |
| Track Popularity | popularity rating | 23385 songs | no |
| Tunebot | title, artist | 10000 queries/? songs | yes/no |
| UIOWA:MIS | single instrument notes | many | yes |
| UMA-Piano | piano chords | 275040 recordings | yes |
| UnmixDB | DJ mix parameters | 37 playlists | yes |
| URBAN-SED | 9 event classes | 10000 recordings | yes |
| UrbanSound8k | 10 event classes | 8732 slices | yes |
| Multi-modal Music Performance | score-aligned video and audio | 44 recordings | yes |
| uspop2002 | tags, genre, chords | 8752 songs | no |
| Violin Gestures Dataset | EMG, playing techniques, audio | 960 recordings | yes |
| VocalSet | 17 vocal techniques | 3560 recordings | yes |
| YousicianUkulele | evaluated notes and chords | 500000 exercises by 1000 users | no |
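
Each entry in the list above carries a dataset name, its metadata, its contents, and a note on whether audio is included. As a rough illustration of how the list might be searched, here is a minimal Python sketch. It assumes a handful of rows have been copied by hand into records. The `DatasetEntry` class and the `find` helper are hypothetical names chosen for this example and are not part of the mir-datasets repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One row of the list (hypothetical structure, for illustration only)."""
    name: str
    metadata: str
    contents: str
    with_audio: str  # free text in the list: "yes", "no", "on request", ...


# A few rows copied by hand from the list above.
ENTRIES = [
    DatasetEntry("ballroom", "8 genres, tempo, beats, bars / downbeats",
                 "698 excerpts (30s)", "yes"),
    DatasetEntry("CarnaticRhythm", "sama, beats", "176 pieces", "on request"),
    DatasetEntry("HarmonixSet", "beats, downbeats, structure",
                 "912 pop songs", "no"),
    DatasetEntry("SALAMI", "structure", "1447 songs", "no"),
]


def find(entries, keyword, audio_only=False):
    """Return names of entries whose metadata mentions `keyword`.

    With audio_only=True, keep only rows whose audio note starts with "yes"
    (this also keeps notes such as "yes (see MAESTRO)" or "yes/no").
    """
    keyword = keyword.lower()
    return [
        e.name
        for e in entries
        if keyword in e.metadata.lower()
        and (not audio_only or e.with_audio.lower().startswith("yes"))
    ]


if __name__ == "__main__":
    print(find(ENTRIES, "beats"))
    print(find(ENTRIES, "beats", audio_only=True))
```

The audio note is kept as free text rather than a boolean because the list uses values such as "on request", "partially", and "downloadable" that do not reduce cleanly to yes or no.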