Moodbar Generator
Generate moodbars used by various music players to navigate audio tracks. The RGB color bands are derived from the Fourier transform provided by Python’s librosa.
Moodbars are visualizations of audio features. Their main purpose is to make it easy to navigate to a certain point in time, but they also depict basic characteristics of the music. The underlying approach was proposed by Wood and O’Keefe in 2005.
Several implementations are already available: moodbar as one of the earliest, the generators built into Clementine and other music players, or pymoodbar as a Python implementation that tries to replicate Clementine.
Background
Usually, the music track is first divided into 1000 samples, for example 0.3 seconds each for a 5-minute track, resulting in an output width of 1000 pixels. The “Bandwise Spectral Magnitude” approach then maps low, medium, and high frequencies to red, green, and blue, respectively. Originally, 3 × 8 “bark band” frequencies are chosen, with a split that can be generalized to:
R ≤ 920 Hz < G ≤ 3150 Hz < B
This is followed by a normalization per RGB channel. Such a per-band scaling adapts to different styles and loudness levels, but means the results are not comparable in absolute terms. The normalization implementations in moodbar, Clementine, and pymoodbar are considerably more elaborate than the original paper’s min/max scaling, though.
The pymood script provides a very straightforward implementation for further experiments that depends only on librosa, which is used for reading various audio formats and for the FFT analysis. The “native” frequency decomposition of the squared power spectrum is summed into the three RGB channels according to the bark band limits. The otherwise potentially complex normalization step is reduced to a simple linear mapping between the lower and upper 5th percentiles.
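The core of this approach can be sketched in a few lines of librosa and NumPy. The function name, STFT parameters, chunking, and exact percentile handling below are illustrative assumptions and not necessarily identical to pymood.py:

```python
import numpy as np
import librosa

def moodbar(path, width=1000):
    """Sketch of the bandwise spectral magnitude approach described above."""
    y, sr = librosa.load(path, sr=None, mono=True)       # decode the audio file
    S = np.abs(librosa.stft(y)) ** 2                     # squared power spectrum
    freqs = librosa.fft_frequencies(sr=sr, n_fft=2 * (S.shape[0] - 1))
    # split the spectrum at the (generalized) bark band limits
    bands = [freqs <= 920, (freqs > 920) & (freqs <= 3150), freqs > 3150]
    rgb = np.stack([S[b].sum(axis=0) for b in bands])     # 3 x n_frames
    # collapse the frames into `width` columns
    cols = np.stack([chunk.mean(axis=1)
                     for chunk in np.array_split(rgb, width, axis=1)], axis=1)
    # linear normalization per channel between the lower and upper 5th percentiles
    lo = np.percentile(cols, 5, axis=1, keepdims=True)
    hi = np.percentile(cols, 95, axis=1, keepdims=True)
    cols = np.clip((cols - lo) / (hi - lo + 1e-12), 0.0, 1.0)
    return (cols * 255).astype(np.uint8)                  # 3 x width, R/G/B rows

if __name__ == "__main__":
    bar = moodbar("track.mp3")                            # input path is a placeholder
    with open("track.mood", "wb") as f:
        f.write(bar.T.tobytes())                          # 1000 raw RGB triples
```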
Moodbar Comparison
Using a well-known sample audio track, the mood output of the following tools is compared:
- moodbar
- pymood
- Clementine
- pymoodbar
Moodbars can be directly interpreted as RGB via PPM images, which happen to have the same binary representation of the pixel data.
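For a 1000 × 1 moodbar this amounts to nothing more than prepending a binary P6 header to the raw .mood bytes; the file names in this small sketch are placeholders:

```python
# Wrap a raw .mood file (1000 RGB triples) into a binary P6 PPM image.
with open("track.mood", "rb") as f:
    rgb = f.read()                       # 1000 pixels x 3 bytes = 3000 bytes
with open("track.ppm", "wb") as f:
    f.write(b"P6\n1000 1\n255\n" + rgb)  # PPM header, then the unchanged RGB bytes
```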
When injecting the above moods into Clementine’s moodbar cache, they are rendered as follows, respectively:
Clementine’s file cache is based on QNetworkDiskCache. For moodbars, the filename starts at offset 16, its length is stored at offset 12, and the actual content can be found in the last 3000 bytes. For experimenting with custom data, the respective cache entry can be edited in place simply by:
cd ~/.cache/Clementine/moodbarcache/data8/9/
head -c -3000 3mm1qz2y.d > 3mm1qz2y.d.head
cat 3mm1qz2y.d.head ~/new.mood > 3mm1qz2y.d
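A cache entry can also be inspected from Python using the offsets described above. The sketch assumes the length field is a big-endian 32-bit integer and the filename a UTF-16 string, as Qt’s QDataStream writes them by default; the cache file name is just the example from above:

```python
import struct

# Read a Clementine moodbar cache entry.
with open("3mm1qz2y.d", "rb") as f:
    data = f.read()

# Assumption: QDataStream defaults (big-endian length, UTF-16 string data).
name_len, = struct.unpack(">I", data[12:16])        # filename length in bytes
name = data[16:16 + name_len].decode("utf-16-be")   # cached URL / filename
mood = data[-3000:]                                  # 1000 RGB triples of mood data

print(name, len(mood))
```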
Usage & Installation
usage: pymood.py [-h] [--mood-out MOOD] [--image-out PPM] --audio-in MP3
Generate moodbars from audio files, using `librosa` for FFT analysis and a simple normalization function.
options:
-h, --help show this help message and exit
--mood-out MOOD output .mood file
--image-out PPM output .ppm file
--audio-in MP3 input audio file
The script is self-contained, with librosa as its sole dependency.
python3 -m venv venv
. venv/bin/activate
pip install librosa
./pymood.py -h