
How music similarity measures could be used

Lately I’ve been thinking about the possibilities of Songbird (see screencast below) for collecting music distributed under Creative Commons licences or the like from blogs. Wouldn’t it be great to search for music precisely that way? You could say something like “search for music that sounds like what I’m listening to right now”. That means the method I might be working on should be efficient enough to calculate the relevant vectors for different pieces of music quickly. Otherwise, there would have to be some central service doing fingerprinting again for all the tracks around. But, as said before, that’s not what I’m after; rather, something to get a mutual measure between two tracks, pairwise.
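To make that “pairwise measure” idea a bit more concrete, here is a toy Python sketch: each track is boiled down to a tiny feature vector locally, and any two tracks are compared by cosine similarity, with no central service involved. The features (per-frame RMS energy and zero-crossing rate) are placeholders I picked for illustration, not any real system’s analysis front end.

```python
import numpy as np

def feature_vector(signal, frame_size=1024):
    """Boil a mono signal down to a tiny descriptive vector:
    mean/std of per-frame RMS energy and zero-crossing rate.
    These features are placeholders, not a real analysis front end."""
    frames = [signal[i:i + frame_size]
              for i in range(0, len(signal) - frame_size + 1, frame_size)]
    rms = [np.sqrt(np.mean(f ** 2)) for f in frames]
    zcr = [np.mean(np.abs(np.diff(np.sign(f)))) / 2 for f in frames]
    return np.array([np.mean(rms), np.std(rms), np.mean(zcr), np.std(zcr)])

def similarity(a, b):
    """Cosine similarity of two feature vectors: 1.0 means
    'sounds alike' under this (very crude) measure."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Since both tracks are analysed locally and only the two small vectors are compared, this is exactly the “mutual measure, pairwise” shape rather than a fingerprint lookup against a central database.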

As you can see in the screencast, Songbird can already find related music by using diverse search services. But, as far as I can judge right now, that’s user- or expert-based: the music has to be known to a certain degree for there to be data on what a newly found song is like. And it’s also text-based.

First Impressions of Musicovery

Even though I’m hardly into any of those services, I do understand that if some Pandora maniac cheats on it with an alternative, that alternative has to be worth a glimpse. So Musicovery impressed me as well … at first. And I’ll admit it was mostly because of the blinky-blinky. But there’s more to it than just effects grabbing attention. From an HMI point of view, the Musicovery people have really made an effort. It is easy to start listening to what you want, without any reading, or, if you really want to, very little. In one word: I’d call it intuitive.

You are presented with those, and only those, selections you need to make, and you can combine them to make your choice distinct enough to gather fitting songs. The other direction of “communication”, machine to human, also has some promising approaches, like the “neighbourhood map” and colours for genres. One can even drag (move) that map around. The playlist is shown as a path through the graph of audio tracks.

But then, of course, the hacker in me came to the surface and I had to test that stuff. After a few clicks on the “dark” mood I was presented with Shakira’s “Objection”. Sure, there’s no accounting for taste, but I wouldn’t call “Objection” a dark-mood song. And Black Eyed Peas’ “Shut Up” was yet to come… I don’t know about you; I couldn’t keep my feet still while listening, and there was absolutely no “I hate the world” or “Where is my gun to get a rampage going” (just being sarcastic here). While the “energetic” direction worked fine for a while, “dark” seems more and more to be a bad label.

To conclude: Musicovery nevertheless sounds very promising. I’d really like to know the “music selection techniques” behind it, though, since the longer I listen, the more the tracks picked for a selected mood fail to satisfy me, just like the rest.

Edit: I just caught myself letting my imagination drift away: wouldn’t it be possible, in a few years’ time, to have some HMI gear so that one brachiates through a playlist just like the one displayed at Musicovery, but as some sort of hologram, or only imagined (not directly visible), more like that Wii stuff? So if you want to fast-forward to a track on the playlist (displayed in some sort of 3D neighbourhood map/grid as a ball, e.g.), you grab it and drag it to the middle of the cube, or punch it to play it, pet it to have information displayed about it, …

One Step Further towards the Diploma Thesis

While doing yet another search for companies or institutes, i.e. a possible supervisor, related to what I’m looking for (automated music similarity analysis), I got one big step forward by finding projects at the Fraunhofer IDMT (Institute for Digital Media Technology) that sound really interesting. What I’m interested in is doing some sort of waveform analysis to find different characteristics, different descriptive measures, that make two pieces of music “sound similar” independent of genre, same artist or whatever, and those that make two other pieces different. Most interesting would be to derive them from how we humans would decide, which, of course, is not always deterministic, i.e. fuzzy. The long-term dream would be to have an automaton find the right music for a given emotional atmosphere, e.g. a candle-lit dinner, a BBQ, a reception event…
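As a toy example of the kind of “descriptive measures” I have in mind, here is a Python sketch computing two classic spectral descriptors over a frame of audio: the spectral centroid (roughly perceived “brightness”) and the spectral spread. These are standard textbook features, not anything specific to the Fraunhofer projects.

```python
import numpy as np

def spectral_descriptors(signal, sample_rate=8000):
    """Two classic, genre-independent descriptors over one frame:
    spectral centroid ('brightness') and spectral spread, both in Hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0, 0.0  # silence: no meaningful centre of mass
    centroid = float((freqs * power).sum() / total)
    spread = float(np.sqrt((((freqs - centroid) ** 2) * power).sum() / total))
    return centroid, spread
```

The centroid is the power-weighted mean frequency and the spread its standard deviation; a bright hi-hat-heavy track would score a high centroid, a bass-heavy one a low centroid, regardless of genre labels.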

  • SoundsLike — Sounds like it comes close to what I’m interested in; derived from AudioID fingerprinting mechanism.
  • GenreID — more the category based approach similar to acoustic fingerprinting. Still interesting, though.
  • Query by Humming — Finding music tracks by humming the characteristic melody. But what exactly is characteristic to a certain track?
  • Semantic HiFi — an interesting project; it combines multiple tools to have the stereo generate personalized playlists on demand via voice commands, and it interacts with other media devices. It reads as very promising. The project itself ran from 2003 to 2006. And what’s really interesting: a cooperation with, among others, Pompeu Fabra University, Barcelona, Spain.
    I could also imagine automated adjustment of the volume level based on the sound level in the room, provided that’s actually wise and no feedback effect takes place, e.g. at a cocktail party: conversation louder -> music louder -> conversation louder…
  • Retrieval and Recommendation Tools — just the approach I’m looking for.
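About that cocktail-party feedback worry: a volume controller need not run away if it estimates the conversation level by subtracting the music’s own (known) contribution, only moves outside a deadband, and never exceeds a hard cap. Here is a Python sketch; all the function names, offsets and thresholds are made-up illustrative values.

```python
import math

def ambient_level_db(measured_db, music_db):
    """Estimated conversation level: measured room level minus the
    music's own known contribution, subtracted in the linear power
    domain (dB values don't subtract directly)."""
    residual = 10 ** (measured_db / 10) - 10 ** (music_db / 10)
    return 10 * math.log10(max(residual, 1e-12))

def new_music_db(measured_db, music_db, target_offset_db=-6.0,
                 max_music_db=70.0, deadband_db=3.0):
    """Keep the music target_offset_db below the conversation, but only
    move outside a deadband and never above a hard cap; the deadband and
    the cap are what stop the conversation -> music -> conversation loop."""
    desired = ambient_level_db(measured_db, music_db) + target_offset_db
    if abs(desired - music_db) < deadband_db:
        return music_db  # small changes are ignored: no chasing
    return min(desired, max_music_db)
```

Because the music’s contribution is removed before the comparison, the controller reacts to the people, not to itself, and the cap bounds the loop even if the estimate is off.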

I also stumbled upon news that the mp3 format has been enhanced to encode surround sound information while enlarging the file size by only about 10%. And secondly, there’s an attempt to use P2P to ship music content legally and to utilize it to encourage people to buy the better-quality version, called Freebies.

Basics of MusicIP’s MusicDNS and MusicAnalysis

Derived from MusicBrainz, the system MusicIP created for finding similar music works, in short, in three steps:

  1. analyse the music audio signal (up to 10 min of a track) locally with the MusicIP Mixer, generating an id called a PUID (closed source!)
  2. the PUID is sent to MusicDNS, a web service by MusicIP (closed source, too!), which does fuzzy matching
  3. some magic happens by which the Mixer calculates a playlist. It would not be sufficient for the DNS (Music Digital Naming Service, not to be mistaken for the Domain Name System) server to just return a list of PUIDs, since the server (hopefully!) doesn’t know about all the other tracks in my library, i.e. those that could potentially be used to generate playlists.
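Since everything interesting here is closed source, I can only sketch the client-side shape of steps 1 and 2 in Python. The fake_puid below merely mimics the 128-bit id format with a hash, which is explicitly NOT how the real analysis works, and the lookup table stands in for the server’s fuzzy matching; both names are my own inventions.

```python
import hashlib

def fake_puid(analysis_bytes):
    """Stand-in for the closed-source analysis (step 1): produce a
    128-bit id in the PUID's hex shape. The real PUID comes out of
    MusicIP's signal analysis, NOT out of a hash of the bytes."""
    return hashlib.md5(analysis_bytes).hexdigest()

def lookup(puid, library):
    """Stand-in for steps 2-3: the real MusicDNS server does fuzzy
    matching on fingerprints; a toy can only do an exact dict lookup."""
    return library.get(puid, {"status": "unknown"})
```

The point of the sketch is the division of labour: the heavy signal analysis happens locally, and only a 128-bit id travels to the service.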


A PUID is a 128-bit Portable Unique IDentifier that represents the analysis result from the MusicIP Mixer; it is therefore not an acoustic fingerprint identifying a song in some particular version. PUIDs are just the ids used in the proprietary fingerprinting system operated by MusicIP. They provide a lightweight PUID generator called genpuid that does steps 1 and 2. PUIDs can be used to map track information such as artist, title, etc. to a fingerprint. The id itself carries no acoustic information.

Acoustic Fingerprinting

Referring, again, to MusicBrainz’s wiki, acoustic fingerprinting here is a different process, using only 2 minutes of a track. This fingerprint is then sent to a MusicDNS server, which in turn matches it against stored fingerprints. If a close enough match is made, a PUID is returned which unambiguously identifies the matching fingerprint (also see a list of fingerprinting systems; there is also a scientific review of algorithms). This is necessary since the source code to generate PUIDs or submit new ones is closed.
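For a feel of how such server-side matching could work, here is a Python sketch in the spirit of published fingerprinting schemes (one bit per frequency band per frame, matched by bit-error rate). It is my own toy, following the general Haitsma/Kalker-style idea, and has nothing to do with MusicIP’s actual algorithm.

```python
import numpy as np

def fingerprint(signal, frame=2048, bands=8):
    """Coarse fingerprint: one bit per band per frame, saying whether
    that band's energy rose since the previous frame."""
    energies = []
    for i in range(0, len(signal) - frame + 1, frame):
        mags = np.abs(np.fft.rfft(signal[i:i + frame]))
        edges = np.linspace(0, len(mags), bands + 1, dtype=int)
        energies.append([mags[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
    return (np.diff(np.array(energies), axis=0) > 0).astype(np.uint8).ravel()

def best_match(query, stored, max_ber=0.35):
    """Compare by bit-error rate against all stored fingerprints and
    return the id of the closest one if it is close enough, else None."""
    best_id, best_ber = None, 1.0
    for track_id, fp in stored.items():
        n = min(len(fp), len(query))
        ber = float(np.mean(fp[:n] != query[:n])) if n else 1.0
        if ber < best_ber:
            best_id, best_ber = track_id, ber
    return best_id if best_ber <= max_ber else None
```

The “close enough” threshold is what makes this identification fuzzy: slight encoding differences flip only a few bits, so the bit-error rate stays low for the same recording and hovers near 0.5 for unrelated ones.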

On the other hand, Wikipedia defines acoustic fingerprinting as follows:

An acoustic fingerprint is a unique code generated from an audio waveform. Depending upon the particular algorithm, acoustic fingerprints can be used to automatically categorize or identify an audio sample.

This definition is even quoted by MusicIP’s Open Fingerprint™ Architecture Whitepaper (page 3).


The web service mainly serves to match a PUID to a given acoustic fingerprint and to look up track metadata such as artist, title, album, year, etc. (aka tags), as done by the fingerprinting client library libofa, which was developed by Predixis Corporation (now MusicIP) during 2000-2005. Only the query code is public, via the MusicDNS SDK; the music analysis and PUID submission routines are closed source!

Getting the Playlist

So far I haven’t been able to figure out, or find sources on, how this is actually done by the MusicIP Mixer. I’ll keep you posted as I find out.

Other sources / Directions

I Had a Dream

… a daydream, that was. I was walking through my flat listening to music of my favourite kind. Well, tell me something new, you might think. Here it comes: each time I walked from one room to another, the speakers in the next room would be activated by the N95 (or any other UPnP-aware device) playing the music. So the music would only be played in the surroundings I was in. Of course, the music was not stored locally on the N95 but came via UPnP (or whatever) from my file storage through the ether.

That’s not as unrealistic as I thought at first. I vaguely remember reading about a sound system that can address all present speakers individually, even via remote control. That, of course, must have been centrally controlled, and presumably proprietary, i.e. only working if all the hardware comes from the same manufacturer. But how about if that worked via UPnP (or anything the like)? I guess the tricky bit would be the handover of the signals transporting the sound information, i.e. managing gapless playback, as if everything were wired, while feeding the speakers by broadcasting the sound. Of course, the speakers would most likely have to support some kind of wireless technique for the signal to be transmitted by.

Also, some mode that addresses each speaker individually should be implemented, regulating the volume for each box (or at least each room), so the toilet is not blasted away… But I guess that would be rather easy and would not even need the modulated infrastructure required for the “sound handover” scenario described above. What a cockaigne world it would be to have an abstraction layer that every manufacturer sticks to and supports, to handle the “cross-platform interaction” needed here. Well, I’m looking forward to building that cockaigne and living in it. How about you?

It would be even greater to have a stationary player, say in the living room, controlled via, e.g., UPnP, but still have the handover working, i.e. have the system notice where the sound should be played, and at what volume. Another approach could be some kind of tracker that knows where the person listening to the music currently is. I don’t think that would be the best approach, since it most likely requires complex structures. Plus, I can’t think of a way to keep such a tracker scalable across platforms for many scenarios and systems.

Blogged with Flock


Music Analysis — On The Way to Diploma Thesis Topic

To step onwards in finding a subject for my diploma thesis, I’ve googled a little and found the following:

First of all, I looked at what topics are being worked on at my university, to maybe narrow it down that way. Our Institute for Digital Media seemed the best guess, offering a seminar by Dr. Dieter Trüstedt called “Elektronische Musik in Theorie und Praxis” (“electronic music in theory and practice”). Only after a while did I notice that its emphasis is on making music, not analysing it. Nevertheless, I was pointed to a book by Miller Puckette (Dept. of Music, University of California, San Diego) called “The Theory and Technique of Electronic Music”, including some parts about wave analysis in general, digital music, etc.

The issues I’m looking into are as described before: more precisely, finding similar music, as a starting point. I also found a few (not yet reviewed) papers:

  • Music Database Retrieval Based on Spectral Similarity by Cheng Yang
  • Pattern Discovery Techniques for Music Audio by Roger B. Dannenberg and Ning Hu
  • Toward Automatic Music Audio Summary Generation from Signal Analysis by Geoffroy Peeters, Amaury La Burthe and Xavier Rodet
  • Audio Retrieval by Rhythmic Similarity by Jonathan Foote, Matthew Cooper and Unjung Nam

It also came to mind that one could take into account how humans (mammals) distinguish music (or complex sounds), and thus also learn more about the brain.

Another thought that hit me concerning the use of such an analysis: employ it in, say, meeting-recording scenarios as a kind of search. Imagine you have some 3 hours of a recorded meeting (possibly a conference call) and need a certain part of it, but cannot find the time position by any means. Maybe, with the analysis laid out above, one could search audio just as we nowadays search text: speak the word or phrase you are looking for (with a different voice, namely your own) and find its position in the audio file.
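The bare principle of that audio search can be sketched as a sliding-window match of the query’s feature sequence over the recording. A real query-by-voice system would need something like DTW over spectral features to cope with a different speaker; this toy uses only per-frame energy, and every name in it is my own.

```python
import numpy as np

def frame_energies(signal, frame=400):
    """Per-frame RMS energy: a deliberately crude 1-D feature sequence.
    A real system would use spectral features such as MFCCs."""
    return np.array([np.sqrt(np.mean(signal[i:i + frame] ** 2))
                     for i in range(0, len(signal) - frame + 1, frame)])

def find_phrase(recording, phrase, frame=400):
    """Slide the phrase's feature sequence over the recording and return
    the sample offset with the smallest mean absolute feature distance."""
    rec, q = frame_energies(recording, frame), frame_energies(phrase, frame)
    costs = [float(np.abs(rec[i:i + len(q)] - q).mean())
             for i in range(len(rec) - len(q) + 1)]
    return int(np.argmin(costs)) * frame
```

Working on feature sequences instead of raw samples is what would let the match tolerate a different voice, once the features are chosen well enough.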
