The list of datasets presented here pulls from the mir-datasets repository; if you’d like to add a dataset, create an issue or pull request there and the changes will propagate here.
The community owes a huge thanks to the hard work of Alexander Lerch in compiling and maintaining this list of datasets.
status | dataset | metadata | contents | with audio |
---|---|---|---|---|
☠ | 200DrumMachines | audio samples | 7371 one-shots | yes |
✅ | ACM_MIRUM | tempo | 1410 excerpts (60s) | yes |
✅ | AcousticBrainz-Genre | 15-31 genres with 265-745 subgenres | audio features for about 2000000 songs | no |
☠ | ADC2004 | predominant pitch | 20 excerpts | yes |
✅ | Acoustic Event Dataset | 28 event classes | 5223 audio snippets | yes |
✅ | AIST Dance Video Database | street dance videos | 13,940 videos for 60 pieces | yes |
✅ | Amg1608 | valence & arousal | 1608 excerpts (30s) | no |
✅ | AMT-pilot | structure by multiple annotators | 8 songs | yes |
✅ | Automatic Practice Logging | piano practice | 620 segments | yes |
☠ | artist20 | 20 artists | 1413 songs | no |
✅ | ASAP | aligned MIDI/audio performances and MIDI/XML scores, beats, downbeats, time signatures, key signatures | 1068 MIDI performances, 520 audio performances, 222 scores | yes (see MAESTRO) |
✅ | AudioSet | 632 event classes | 2084320 clips (10s) | no |
✅ | bach10 | aligned multitrack MIDI | 10 chorales | yes |
✅ | ballroom | 8 genres, tempo, beats, bars / downbeats | 698 excerpts (30s) | yes |
✅ | beatboxset1 | percussion annotation | 14 clips | yes |
✅ | BPS-FH Beethoven Piano Sonata with Function Harmony | functional annotation | 32 sonatas | no |
✅ | C224a | 14 genres | 224 artists | no |
✅ | C3ka | 18 genres | 3000 artists | no |
✅ | C49ka-C111ka | genres | 48800/110588 artists | no |
✅ | CAL10k | tags | 10870 songs | no |
✅ | CAL500 | tags | 502 songs | yes |
✅ | CarnaticRhythm | sama, beats | 176 pieces | on request |
✅ | Chordify Annotator Subjectivity Dataset | chords by 4 annotators | 50 songs | no |
✅ | CBFdataset | 4 playing techniques (Chinese Bamboo Flute) | 10 performers | yes |
✅ | CCMixter | vocal track, background track | 50 mixes | yes |
✅ | Chopin22 | aligned MIDI | 44 recordings | yes |
✅ | Clotho | 5 descriptive captions | 4981 snippets | yes |
✅ | CMMSD | note/rest/transition, onsets, vibrato | 36 excerpts | no |
✅ | Coidach | 55 genres | 26420 songs | no |
✅ | corpusCOFLA | editorial, predominant melody | 1800 flamenco recordings | no |
☠ | covers80 | cover songs | 80 song pairs | yes |
✅ | Cross-Composer | 11 composers, piece, key, era, instrumentation | 1100 chromagrams and chord labels | no |
✅ | Cross-Era | composer, piece, key, era, instrumentation | 2000 chromagrams and chord labels | no |
✅ | Choral Singing Dataset | f0, MIDI | 48 recordings | yes |
✅ | Da-TACOS | cover songs | 25000 songs | no |
✅ | Dataset of synchronised Audio, LyrIcs and vocal notes | aligned notes and lyrics | 5358 songs | no |
✅ | DAMP | karaoke performances, aligned lyrics, pronunciation assessment | 34000 monophonic recordings | yes |
✅ | Dagstuhl ChoirSet | beats, time-aligned scores, F0 | 81 takes | yes |
✅ | DEAM - The MediaEval Database for Emotional Analysis of Music | valence & arousal | 1802 excerpts | yes |
✅ | DEAPDataset | valence & arousal, dominance, physiological data | 120 music video excerpts | no |
✅ | DESED | 10 audio event classes | approx. 20k 10s clips (unlabeled, weakly/strongly labeled) | yes |
✅ | DREANSS | onset times, percussion instruments | 18 excerpts | yes |
✅ | DrumPt | 4 playing techniques | approx. 2000 annotations | yes (see ENST) |
✅ | DSD100 | multitrack recordings, stems for vocals, drums, bass and accompaniment | 100 songs | yes |
✅ | EMO-Soundscapes | arousal & valence | 1213 soundscape recordings | yes |
✅ | emoMusic | arousal & valence | 744 excerpts (45s) | yes |
✅ | Emotify | induced emotion | 400 excerpts | yes |
✅ | EMusic | arousal & valence | 100 excerpts (experimental music) | yes |
☠ | ENST-Drums | onset times, perc. instruments, playing technique | 318 segments | yes |
✅ | Erkomaishvili Dataset | sheet music, structure, F0, note onsets | 118 tracks | yes |
✅ | Extendedballroom | 9 genres, tempo | 4000 excerpts (30s) | downloadable |
✅ | ExtraSensory | 51 context labels | 300000 sensor recordings from 60 users | yes |
☠ | ffuhrmann | 11 predom. instr. | 6951 excerpts from 220 songs | yes/no |
✅ | Flamenco database | editorial, biographical, musicological information on flamenco, 1102 artists, 74 palos, 2860 albums | 13311 tracks | no |
✅ | FMA-full | 161 genres | 106574 songs | yes |
✅ | FMA-large | 161 genres | 106574 excerpts (30s) | yes |
✅ | FMA-medium | 16 genres | 25000 excerpts (30s) | yes |
✅ | FMA-small | 8 genres | 8000 excerpts (30s) | yes |
✅ | FSD-Kaggle2019 | 80 tags | 29000 clips | yes |
✅ | Fugue Analyses | fugue structure, patterns, cadences | 36 fugues (Bach & Shostakovich) | no |
✅ | GiantStepsKey | key | 604 files | no |
✅ | GiantStepsTempo | tempo | 664 files | no |
✅ | GiantStepsTempo:alternate | tempo | 664 files | no |
✅ | Greek Music Dataset | genre, valence, arousal | 1400 songs | downloadable |
☠ | Gracenote Music Identification 2014 | timestamp, country | 110M music ID matches | no |
✅ | GoodSounds | 12 instruments, pitch, sound quality | 8750 notes | yes |
✅ | GPT | 7 guitar playing techniques | 6580 clips | yes |
✅ | Groove MIDI Dataset | drum timing | 1150 MIDI recordings | rendered |
✅ | Guitar Solo Dataset | start/stop of guitar solos | 60 songs | no |
✅ | GTZAN | 10 genres, tempo labels, key labels (lerch), key labels (li), beat/downbeat, metrical levels | 1000 excerpts (30s) | yes |
✅ | GuitarSet | midi, pitch, beat, chords | 360 guitar excerpts (30s) with hexaphonic audio | yes |
✅ | Hainsworth | tempo | 245 excerpts (60s) | yes |
✅ | HarmonixSet | beats, downbeats, structure | 912 pop songs | no |
✅ | HHDS | multitrack, style, tempo | 18 songs | yes |
☠ | holzapfel:onset | onset times | 78 excerpts | yes |
✅ | homburg | 9 genres | 1889 excerpts (10s) | yes |
✅ | IADS | valence & arousal, dominance | 111 sound snippets | yes |
✅ | IDMT Multitrack | multitrack, style | 12 songs | yes |
✅ | IDMT-SMT-Audio-Effects | effects on bass and guitar notes | 55044 recordings | yes |
✅ | IDMT-SMT-Bass | bass performance styles | 4300 excerpts | yes |
✅ | IDMT-SMT-Bass-SINGLE-TRACK | style annotated bass lines | 17 bass lines (?) | yes |
✅ | IDMT-SMT-Drum | onset times, perc. instruments | 518 files | yes |
✅ | IDMT-SMT-Guitar | 9 guitar playing techniques | 4700+400 note events | yes |
✅ | iKala | singing voice tracks, background tracks | 252 excerpts (30s) | yes |
✅ | INRIA:EuroVision | structure | 124 songs | no |
✅ | INRIA:Quaero | structure | 159 songs | no |
✅ | IRMAS | 11 instruments | 2874 excerpts | yes |
☠ | ISMIR2004Genre | 6 genres | 729 excerpts (30s) | yes |
✅ | ISMIR2004Tempo | tempo | 465 excerpts (20s) | yes |
✅ | Jazz Audio-Aligned Harmony Dataset | structure, key, chords, beats | 113 songs | no |
✅ | Jamendo-VAD | voice activity | 61+16+16 songs | yes |
✅ | JGDB | multitrack, MIDI | random generated excerpts | yes |
✅ | JKU-ScoFo | audio, MIDI | 16 recordings | yes |
✅ | Josquin La Rue Secure Duo Dataset | symbolic scores | 77 duos (Josquin & La Rue) | no |
✅ | Jordan:Classical | structure | 15 pieces | yes |
✅ | Jordan:Jazz | structure | 15 pieces | yes |
☠ | LabROSA:APT | MIDI | 29 piano excerpts | yes |
☠ | LabROSA:MIDI | audio, MIDI | 4 songs | yes |
☠ | last.fm data set | listening habits | 992 users | no |
✅ | LFM-1b | listening habits | 120000 users | no |
✅ | Lyrical Influence Networks Dataset | lyrics-based artist and genre graphs | 42802 artists/214 genres | no |
☠ | Lakh MIDI Dataset | MIDI, tempo, key | 176581 MIDI files | no |
✅ | LMD - Latin | 10 genres | 3160 songs | no |
✅ | M-DJCUE | cue points | 134 tracks | no |
✅ | MAESTRO | audio aligned midi, velocity, sustain | 172 hours of piano | yes |
✅ | magnatagatune | similarity, tags | 25863 excerpts (30s) | yes |
☠ | MAPS | piano notes/chords/pieces, tempo/key | 238 pieces | yes |
✅ | MARD | album reviews | 66566 songs | no |
✅ | MARG-AMT | MIDI pitch, onset/offset times | 30 melodies | yes |
✅ | MAST | vocal performance assessment | 1018 performances | no |
✅ | MAST-Rhythm | rhythm performance assessment | 3721 performances | yes |
✅ | McGill Billboard | chords | 740 songs | no |
✅ | MDBDrums | onset times, perc. instrument, playing technique | 23 excerpts | yes |
✅ | Medley-solos-DB: a cross-collection dataset for musical instrument recognition | 8 instruments | 21572 excerpts | yes |
✅ | MedleyDB | multitrack, genre, melody f0, instrument activation | 122 songs | yes |
✅ | MER500 | emotion | 500 clips | yes |
✅ | MIR-1K | vocal tracks, background tracks | 1000 excerpts | yes |
☠ | mirex05Train | predominant pitch | 13 excerpts | yes |
✅ | mirex06Train | tempo, beats | 20 excerpts (30s) | yes |
✅ | Mid Level Perceptual Music Features | 7 perceptual features | 5000 audio files | yes |
✅ | Million Musical Tweets | listening behavior | 1086808 tweets | no |
✅ | Modal | onset times | 71 snippets | yes |
✅ | MOODetector:Bi-Modal | lyrics, valence & arousal | 133 excerpts | yes |
✅ | MOODetector:Multi-Modal | lyrics, MIDI, mood | 903 excerpts (30s) | yes |
☠ | moodswings | arousal & valence | 240 excerpts (30s) | no |
✅ | Mozart’s String Quartets | sonata form structure, cadences | 32 movements | no |
☠ | Million Song Dataset | metadata, proprietary features | 1000000 songs | no |
✅ | Multimodal Sheet Music Dataset | piano notes/chords/pieces, synthetic audio, aligned MIDI, aligned sheet music images, OMR | 497 pieces | no |
✅ | The Meertens Tune Collections | phrases, key, meter | 18000 melodies | partially |
✅ | A Multimodal Dataset of Musical Themes for MIR Research | sheet music, symbolic encodings, audio snippets, symbolic-audio alignments, composer, work, recording, and theme characteristics | 2067 Themes | yes |
✅ | MTG-Jamendo | tags (genre, instruments, mood) | 55000 tracks | yes |
✅ | MTG-Query by Humming | title, artist | 118 queries/481 songs | yes/no |
✅ | MUSDB18 | multitrack recordings, stems for vocals, drums, bass and accompaniment | 150 songs | yes |
✅ | MUSIC4ALL | tags, lyrics | 109,269 excerpts (30s) | on request |
✅ | musiclef2012 | tags | 1355 songs | no |
✅ | MusicMicro | music listening patterns | 136866 users | no |
✅ | MusicNet | pitch, onsets | 330 recordings | implicitly |
✅ | NES-MDB | multi-track MIDI, aligned audio | 5000 songs | on request |
☠ | Nine Inch Nails Multitracks | multitrack | 66 songs | yes |
✅ | NMED-H - Naturalistic Music EEG Dataset Hindi | EEG | 24 trials x 16 excerpts (4.5min) | no |
✅ | Naturalistic Music EEG Dataset – Rhythm Pilot | EEG | 20 trials x 10 excerpts (4.5min) | no |
✅ | Naturalistic Music EEG Dataset - Tempo | EEG | 30 trials x 16 excerpts (30sec) | no |
✅ | NSynth | instrument, pitch | 305979 single notes | yes |
☠ | NUS-48E | aligned phonemes | 48 pairs of sung and spoken | yes |
✅ | ODB | onset times | 19 excerpts | yes |
☠ | Onset_Leveau | onset times | 21 excerpts | yes |
✅ | Open Broadcast Media Audio from TV | 6 classes for music presence | 1647 excerpts (60s) | yes |
✅ | OpenMIC-2018 | 20 instruments | 20000 excerpts (10s) | yes |
✅ | Orchset | predominant pitch | 64 excerpts | yes |
✅ | Piano Gestures Dataset | video, intentions, audio | 210 clips | yes |
✅ | Phenicx-Anechoic | audio, aligned MIDI | 4 pieces | yes |
✅ | Phonation | pitch, vowel, phonation mode | 900 monophonic snippets | yes |
✅ | PlaylistDataset | playlists | 75262 songs/2840553 transitions | no |
✅ | QBT-Extended | taps | 3365 queries/51 songs | MIDI |
✅ | QMUL:Beatles | structure, key, chords, beats | 181 songs | no |
✅ | QMUL:King | structure, key, chords | 14 songs | no |
✅ | QMUL:MichaelJackson | structure | 38 songs | no |
✅ | QMUL:MixEvaluation | multitrack, mixes | 18 songs/180 mixes | yes |
✅ | QMUL:Queen | structure, key, chords | 51/31 songs | no |
✅ | QMUL:RSS | structure | 60 songs | no |
✅ | QMUL:Zweieck | structure, key, chords, beats | 18 songs | no |
☠ | QUASI | multitrack | 11 songs | yes |
✅ | RobbieWilliamsAnnotations | chords, keys, beats | 65 songs | no |
✅ | RockCorpus | chords, melody, bars | 200 songs | no |
✅ | RWC | lyrics, 10 genres, 50 instruments, chords, structure, aligned MIDI | 115 songs/50 classical/100 songs | yes |
✅ | SALAMI | structure | 1447 songs | no |
✅ | SAMBASET | recording date, escolas, beats | 392 | no |
✅ | Sargon | structure | 4 songs | yes |
✅ | Semantic Artist Similarity | artist biographies, similarity | 268+2336 artists | no |
✅ | Schenker Analyses | MusicXML, Schenker analysis | 41 pieces | no |
✅ | SCP - EEG-Recorded Responses to Short Chord Progressions | EEG | 108/648 trials x 12 stimuli (5s) | yes |
✅ | Sample detection dataset | start of samples | 80 songs, 80 samples | no |
✅ | SEILS | scores in different symbolic formats | 30 madrigals | no |
☠ | Seyerlehner:1517-Artists | 19 genres | 3180 songs | yes |
☠ | Seyerlehner:Annotated | 19 genres | 190 songs | yes |
☠ | Seyerlehner:Pop | tempo | 1105 songs | yes |
☠ | Seyerlehner:Unique | 14 genres | 3115 excerpts (30s) | yes |
✅ | SHS100K | cover songs | ca. 10,000 songs with 100,000 tracks | no |
✅ | SISEC2013 | multitrack, mix | 5 excerpts | yes |
✅ | SLAKH | MIDI, synthesized audio (tracks + mix) | 2100 mixes | yes |
✅ | SMC:MIREX | tempo, beats | 217 excerpts | yes |
✅ | SMD | audio, aligned MIDI | 50 recordings | yes |
✅ | SoundTracks | valence, energy, tension, mood | 360+110 excerpts | yes |
✅ | SPAM | structure | 50 songs | no |
✅ | Shazam Research Dataset Offsets | in-song query times | 188M queries over 20 songs | no |
✅ | Su-AMT | onset times, pitch | 10 excerpts | yes |
✅ | SUPRA-RW | piano roll performances | 478 performances | yes |
✅ | Schubert Winterreise Dataset (SWD) | lyrics, scores (image, symbolic, MIDI), audio, measures, chords, local keys, global keys, structure | 24 songs, 9 performances | yes |
✅ | Texture in String Quartets | texture | 11 movements | no |
✅ | Traditional Flute Dataset | audio, aligned MIDI | 30 excerpts | yes |
✅ | ThisIsMyJam | favorite songs, artists | 131k users | no |
✅ | TinySOL, an audio dataset of isolated musical notes | instrument, pitch, dynamics, string number (if applicable) | 2913 isolated notes | yes |
✅ | TONAS | pitch | 72 single-voiced excerpts | yes |
✅ | Track Popularity | popularity rating | 23385 songs | no |
✅ | Tunebot | title, artist | 10000 queries/? songs | yes/no |
✅ | UIOWA:MIS | single instrument notes | many | yes |
✅ | UMA-Piano | piano chords | 275040 recordings | yes |
✅ | UnmixDB | DJ mix parameters | 37 playlists | yes |
✅ | URBAN-SED | 9 event classes | 10000 recordings | yes |
✅ | UrbanSound8k | 10 event classes | 8732 slices | yes |
✅ | Multi-modal Music Performance | score-aligned video and audio | 44 recordings | yes |
☠ | uspop2002 | tags, genre, chords | 8752 songs | no |
✅ | Violin Gestures Dataset | EMG, playing techniques, audio | 960 recordings | yes |
✅ | VocalSet | 17 vocal techniques | 3560 recordings | yes |
✅ | YousicianUkulele | evaluated notes and chords | 500000 exercises by 1000 users | no |