A Dataset of Music Production Metadata

The Semantic Audio Feature Extraction Dataset (SAFE-DB) is a continually updated database of semantically annotated music production metadata, collected from an international user group of sound engineers. The data comes from four audio effects: a dynamic range compressor, an overdrive distortion, a parametric equaliser and an algorithmic reverb. Each entry in the dataset contains the following items:

  • Descriptor: A string representing a user’s description of the audio transformation (e.g. warm, bright, fluffy).
  • Audio Features: An array of audio features extracted from the audio file, taken before and after processing is applied to the signal. Feature list: Mean, Variance, StdDev, RMS_Amplitude, Zero_Crossing_Rate, Spectral_Centroid, Spectral_Variance, Spectral_Standard_Deviation, Spectral_Skewness, Spectral_Kurtosis, Irregularity_J, Irregularity_K, F0 (autocorrelation), Smoothness, Spectral_Roll_Off, Spectral_Flatness, Spectral_Crest, Spectral_Slope, Peak_Spectral_Centroid, Peak_Spectral_Variance, Peak_Spectral_Standard_Deviation, Peak_Spectral_Skewness, Peak_Spectral_Kurtosis, Peak_Irregularity_J, Peak_Irregularity_K, Peak_Tristimulus (x3), Inharmonicity, Harmonic_Spectral_Centroid, Harmonic_Spectral_Variance, Harmonic_Spectral_Standard_Deviation, Harmonic_Spectral_Skewness, Harmonic_Spectral_Kurtosis, Harmonic_Irregularity_J, Harmonic_Irregularity_K, Harmonic_Tristimulus (x3), Noisiness, Parity_Ratio, Bark_Coefficients (x25), MFCCs (x13)
  • Audio Effect Parameters: A variable-sized array of plug-in parameters used to apply processing to the signal.
  • User Data: A series of strings representing the user’s age, location, language and experience, along with the genre of the music and the instrument being processed. These fields can be left blank.
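As a rough illustration of how one entry with the four items above might be held in memory, here is a minimal Python sketch. The class names, field names, the `effect` field and all values are assumptions made for this example; they are not the actual column names or file layout of the SAFE-DB download.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserData:
    """Optional free-text fields; None stands for a field left blank."""
    age: Optional[str] = None
    location: Optional[str] = None
    language: Optional[str] = None
    experience: Optional[str] = None
    genre: Optional[str] = None
    instrument: Optional[str] = None


@dataclass
class SafeEntry:
    """Hypothetical in-memory view of one dataset entry."""
    descriptor: str                    # e.g. "warm"
    effect: str                        # assumed label for the plug-in used
    features_before: Dict[str, float]  # features of the unprocessed signal
    features_after: Dict[str, float]   # features of the processed signal
    parameters: List[float]            # variable-sized plug-in parameter array
    user: UserData = field(default_factory=UserData)

    def feature_change(self) -> Dict[str, float]:
        """Return after-minus-before for every feature present in both sets."""
        return {
            name: self.features_after[name] - value
            for name, value in self.features_before.items()
            if name in self.features_after
        }


# Made-up values, for illustration only.
entry = SafeEntry(
    descriptor="warm",
    effect="equaliser",
    features_before={"Spectral_Centroid": 2400.0, "RMS_Amplitude": 0.25},
    features_after={"Spectral_Centroid": 1900.0, "RMS_Amplitude": 0.5},
    parameters=[-3.0, 120.0, 0.7],
    user=UserData(genre="rock", instrument="guitar"),
)
print(entry.feature_change())
```

Keeping the before and after features as separate mappings makes it straightforward to ask how a transformation described as, say, "warm" shifted each feature.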

Note: To limit the file size of the download, feature sets are averaged. For data from a specific date, or for multi-channel time-series features, please get in touch.
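The page does not say exactly how the averaging is carried out. As a sketch only, assuming it means collapsing frame-by-frame features into a single mean value per feature, the reduction could look like the following. The function name `average_features` and the sample values are hypothetical.

```python
from statistics import fmean


def average_features(frames):
    """Collapse a list of per-frame feature dicts into one dict of means.

    Assumes every frame carries the same feature names as the first one.
    """
    if not frames:
        raise ValueError("need at least one frame")
    names = list(frames[0].keys())
    return {name: fmean(frame[name] for frame in frames) for name in names}


# Made-up time series of three analysis frames.
frames = [
    {"Spectral_Centroid": 1800.0, "Zero_Crossing_Rate": 0.10},
    {"Spectral_Centroid": 2000.0, "Zero_Crossing_Rate": 0.20},
    {"Spectral_Centroid": 2200.0, "Zero_Crossing_Rate": 0.30},
]
averaged = average_features(frames)
print(averaged)
```

A reduction like this shrinks each entry to one value per feature, which is why the time-series and multi-channel versions have to be requested separately.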


Last Updated: Monday 11th July 2016 


R. Stables, S. Enderby, B. De Man, G. Fazekas, and J. D. Reiss, “SAFE: A system for the extraction and retrieval of semantic audio descriptors”, The International Society for Music Information Retrieval (ISMIR), 2014.

Data Visualisation

Using multidimensional scaling, we can map the descriptive terms from the dataset into a low-dimensional, more intuitive space. The interactive visualisation below shows the feature space of the SAFE-DB in three dimensions.
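To give a feel for the idea, the sketch below applies classical (Torgerson) multidimensional scaling to a handful of descriptors using NumPy. It is an illustration under stated assumptions, not the code behind the visualisation: the page does not say which MDS variant or distance measure is used, and the descriptors, feature values and the function name `classical_mds` are made up for this example.

```python
import numpy as np


def classical_mds(distances, n_components=3):
    """Embed points so their Euclidean distances approximate `distances`."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram (inner product) matrix.
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (d ** 2) @ centring
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    scale = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return eigenvectors[:, order] * scale


# Hypothetical mean feature vectors, one row per descriptor (made-up numbers).
descriptors = ["warm", "bright", "crunchy", "airy", "boomy"]
features = np.array([
    [0.2, 1.5, 0.3, 0.9],
    [0.9, 0.4, 1.1, 0.2],
    [0.5, 1.2, 0.8, 1.4],
    [1.0, 0.3, 1.3, 0.1],
    [0.1, 1.8, 0.2, 1.0],
])

# Pairwise Euclidean distances between the descriptor vectors.
distances = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)

coords = classical_mds(distances, n_components=3)
for name, point in zip(descriptors, coords):
    print(name, np.round(point, 3))
```

Descriptors whose feature vectors are similar end up close together in the three-dimensional embedding, which is the property an interactive plot of this kind relies on.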