The Ultimate Lossless Compression Format with Hybrid Mode and Open Source: WavPack

Why the heck didn’t I trip over WavPack (wv) earlier? It’s been around for some time now and is astonishingly good. To make it short and obvious (see hydrogenaudio.org for a complete list):

Pros

  • Open source
  • Good efficiency (decoding speed in foobar2000 is even better than mp3’s)
  • Hybrid/lossy mode (see below)
  • Tagging support (ID3v1, APEv2 tags)
  • Replay Gain compatible (which is no big deal with fb2k anyway, but still)

Cons

  • Limited hardware player support
  • Takes long to encode with optimal settings (not a real problem, since it’s a one-time procedure)

Other features

  • Supports embedded CUE sheets
  • Includes MD5 hashes for quick integrity checking
  • Can encode in both symmetrical and asymmetrical modes
  • Supports multichannel audio and high resolutions
  • Fits the Matroska container
  • Streaming support
  • Error robustness

So it’s open source! Most distros even come with WavPack preinstalled. The only other lossless formats I know of that are open source are two: FLAC, which has bad tagging support, and Shorten (to put it short: outdated). MAC (Monkey’s Audio Codec, ape) has an open-sourced version, but it’s not developed any longer.

The next great feature is hybrid mode, which means you encode to a small lossy file plus an additional file containing “the rest” of the information. Only one other format is capable of this: OptimFROG (ofr) in DualMode. Putting both files together, you get your 100% original back; the lossy file can also be used entirely on its own. While encoding, a second file is written, called the correction file, which stores the (compressed) difference between lossy and original. So what that means is you don’t have to convert your files each time you want to shove them onto your portable. The only bad thing is you need to ensure the device can decode (read: play) them.
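The principle behind the correction file can be sketched in a few lines of Python. This is a toy illustration only, not WavPack’s actual algorithm: coarse quantization stands in for the lossy encoding, and the stored residual is the “correction” that restores the original bit-for-bit.

```python
# Toy illustration of hybrid mode: lossy + correction = lossless.
# This is NOT WavPack's real algorithm, just the underlying idea.

def hybrid_encode(samples, step=64):
    # "Lossy" part: coarsely quantized samples (cheaper to store)
    lossy = [s // step * step for s in samples]
    # "Correction" part: the residual between original and lossy
    correction = [s - l for s, l in zip(samples, lossy)]
    return lossy, correction

def hybrid_decode(lossy, correction=None):
    if correction is None:
        return lossy  # lossy playback on its own, e.g. on a portable
    # With the correction file present, reconstruction is bit-exact
    return [l + c for l, c in zip(lossy, correction)]

samples = [1023, -512, 300, 77, -1]
lossy, corr = hybrid_encode(samples)
assert hybrid_decode(lossy, corr) == samples  # 100% original back
```

The .wv file plays the role of `lossy` here and the .wvc file the role of `correction`; a player that only knows the .wv file simply ignores the rest.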

[Screenshot: WavPack properties after encoding]

To give an example: if you convert a 27.5 MB .ofr file to .wv with hybrid mode enabled and the lossy bitrate set to 192 kbps, you get one 6.31 MB .wv file and a second 22.8 MB .wvc file. It took 12 min 15 sec (playtime 4:25). The insane settings were: Compression Mode “high”, Processing Mode “6” (best encoding quality), Hybrid Lossy Mode “192kbps”. When you open only the .wv file in fb2k, it handles correction files automatically (see picture). However, moving the file with foobar2000’s dialog will only move the .wv file. I cannot speak for other software players, but this way it’s easy to handle two qualities of one file: a hi-fi one and one “to go”! Now that I have converted an entire album, here are the approximate file sizes comparing MAC, OptimFROG and WavPack, with MP3/Ogg Vorbis as lossy counterparts to the “portable” .wv (to be added to .ape/.ofr):

.ape 311 MB
.ofr 306 MB
.wv 70 MB
.wvc 255 MB
.wv+.wvc 325 MB
.mp3 (V2, ~190kbps) 74 MB
.ogg (q5, ~160kbps) 61 MB

Tagging: unlike FLAC, it uses APEv2 (or ID3v1) tags, so they can be read by most players, software and portable devices alike, without intervention.

While I ran encoding tests with foobar2000 (which has WavPack decoding built-in, by the way), I noticed that when converting from, say, OptimFROG to WavPack, fb2k went right at it. No temporary wav files as with OptimFROG to MAC, for example! But mind you, it does take a long time if you use optimization for file size and quality. It seems to be somewhere around 0.7x (slightly slower than plain play time). I don’t see why this really is an issue, because in most cases you’ll only encode once, as is true for all lossless formats anyway.

Resources:

How music similarity measures could be used

Lately I’ve been thinking about the possibilities of Songbird (see screencast below) for collecting music distributed under common licences or the like from blogs. Wouldn’t it be great to precisely search for music that way? You could say something like “search for music that sounds like what I’m listening to right now”. That means the method I might be working on should be efficient enough to calculate relevant feature vectors for different pieces of music fast. Otherwise there would have to be some central service doing fingerprinting again for all the tracks out there. But, as said before, that’s not what I’m after; rather something to get a mutual measure between two tracks, pairwise.
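One simple candidate for such a pairwise measure is cosine similarity between per-track feature vectors. The hard part, of course, is choosing the features (tempo, spectral statistics, etc.); the vectors below are made up purely for illustration.

```python
import math

def cosine_similarity(a, b):
    # Pairwise measure between two tracks' feature vectors:
    # 1.0 means identical direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors (e.g. tempo, spectral centroid, ...)
track_a = [0.8, 0.1, 0.5]
track_b = [0.7, 0.2, 0.4]
print(cosine_similarity(track_a, track_b))  # close to 1.0: "sounds similar"
```

The appeal for this use case is exactly what the paragraph asks for: no central service is needed, since any two tracks can be compared directly once their vectors are computed.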

As you can see in the screencast, Songbird can already find related music by using diverse search services. But, as far as I can judge right now, that’s user- or expert-based. That means the music has to be known well enough for there to be data on what a newly found song is like. And it’s also text-based.

One step further towards the Diplom thesis

While doing yet another search for companies or institutes, i.e. a potential partner, possibly related to what I’m looking for (automated music similarity analysis), I got one big step forward by finding projects at the Fraunhofer IDMT (Institute for Digital Media Technology) that sound really interesting. What I’m interested in is doing some sort of waveform analysis to find different characteristics, different descriptive measures, that make two pieces of music “sound similar” independent of genre, same artist or whatever, and those that make two other pieces different. Most interesting would be to derive them from how we humans would decide it, which, of course, is not always deterministic, i.e. fuzzy. The long-term dream would be to have an automaton find the right music for a given emotional atmosphere, e.g. candle-lit dinner, BBQ, reception event…

  • SoundsLike — sounds like it comes close to what I’m interested in; derived from the AudioID fingerprinting mechanism.
  • GenreID — more the category-based approach, similar to acoustic fingerprinting. Still interesting, though.
  • Query by Humming — Finding music tracks by humming the characteristic melody. But what exactly is characteristic to a certain track?
  • Semantic HiFi — an interesting project; it combines multiple tools to have the stereo generate personalized playlists on demand via voice control and interact with other media devices. It reads very promising. The project itself ran from 2003 to 2006. And what’s really interesting is a cooperation with, among others, Universitat Pompeu Fabra, Barcelona, Spain.
    I could also imagine automated adjustment of the volume level based on the sound level in the room, if that’s actually wise and no feedback effect takes place, e.g. at a cocktail party: conversation louder -> music louder -> conversation louder…
  • Retrieval and Recommendation Tools — just the attempt I’m looking for.

I also stumbled upon news that the mp3 format has been enhanced to encode surround sound information while enlarging the file size by only about 10% (see also mpegsurround.com). And secondly, an attempt to use P2P to legally ship music content and to utilize it to encourage people to buy the better-quality version, called Freebies.

Basics of MusicIP’s MusicDNS and MusicAnalysis

Going by MusicBrainz, the system MusicIP created for finding similar music works, in short, in three steps:

  1. Analyse the music audio signal (up to 10 min of a track) locally with MusicIP Mixer, generating an ID called PUID (closed source!)
  2. The PUID is sent to MusicDNS, a web service by MusicIP (closed source, too!), which does fuzzy matching
  3. Some magic happens by which the Mixer calculates a playlist. It would not be sufficient for the DNS (Music Digital Naming Service; don’t mistake it for the Domain Name System) server to just return a list of PUIDs, since the server (hopefully!) doesn’t know about all the other tracks I have in my library, i.e. those that could potentially be used to generate playlists.
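The three steps above could be sketched roughly like this. All names here are hypothetical, and since the real Mixer analysis and the MusicDNS protocol are closed source, the “analysis” below is just a deterministic stand-in hash, not a real acoustic analysis.

```python
import hashlib

# Step 1 (stand-in): the real Mixer analysis is closed source, so we fake
# a deterministic 128-bit ID from the audio bytes. A real PUID is derived
# from acoustic analysis, not a hash of the file.
def analyze_track(audio_bytes):
    return hashlib.md5(audio_bytes).hexdigest()  # 128 bits, PUID-sized

# Step 2 (stand-in): MusicDNS does fuzzy matching server-side;
# here we just look the ID up in a local table.
METADATA = {}  # puid -> {"artist": ..., "title": ...}

def lookup(puid):
    return METADATA.get(puid)

# Step 3 is the "magic" part: a real implementation would rank the
# local library's tracks by some similarity measure to build a playlist.
puid = analyze_track(b"fake audio data")
METADATA[puid] = {"artist": "Example Artist", "title": "Example Title"}
assert lookup(puid)["title"] == "Example Title"
```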

PUIDs

A PUID is a 128-bit Portable Unique IDentifier that represents the analysis result from MusicIP Mixer; it is therefore not a fingerprint identifying a song in some particular version. PUIDs are just the IDs used in the proprietary fingerprinting system operated by MusicIP. They provide a lightweight PUID generator called genpuid that does steps 1 and 2. PUIDs can be used to map track information such as artist, title, etc. to a fingerprint. The ID itself carries no acoustic information.

Acoustic Fingerprinting

Referring, again, to MusicBrainz’s wiki on acoustic fingerprinting, this is a different process, using only 2 minutes of a track. The fingerprint is then sent to a MusicDNS server, which in turn matches it against stored fingerprints. If a close enough match is made, a PUID is returned which unambiguously identifies the matching fingerprint. (Also see a list of fingerprinting systems; there is also a scientific review of algorithms.) This is necessary since the source to generate PUIDs or submit new ones is closed.
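The server-side matching step can be pictured as a nearest-neighbour search with a distance threshold. This is a sketch under assumed mechanics, since MusicDNS’s actual matching algorithm is not public; here fingerprints are plain bit strings compared by Hamming distance.

```python
# Hypothetical fuzzy fingerprint matching (MusicDNS's real algorithm is
# not public). Fingerprints are integers compared bit by bit.

STORED = {0b101100111010: "puid-aaa", 0b010011000101: "puid-bbb"}

def hamming(a, b):
    return bin(a ^ b).count("1")  # number of differing bits

def match(fingerprint, threshold=2):
    # Return the PUID of the closest stored fingerprint, if close enough
    best = min(STORED, key=lambda f: hamming(f, fingerprint))
    return STORED[best] if hamming(best, fingerprint) <= threshold else None

print(match(0b101100111011))  # one bit off the first entry -> puid-aaa
print(match(0b111111111111))  # too far from everything -> None
```

The threshold is what makes the match “close enough” rather than exact, which is why slightly different rips or encodes of the same track can still resolve to one PUID.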

On the other hand wikipedia defines acoustic fingerprinting as follows:

An acoustic fingerprint is a unique code generated from an audio waveform. Depending upon the particular algorithm, acoustic fingerprints can be used to automatically categorize or identify an audio sample.

This definition is even quoted by MusicIP’s Open Fingerprint™ Architecture Whitepaper (page 3).

MusicDNS

The web service mainly matches a PUID to a given acoustic fingerprint and looks up track metadata such as artist, title, album, year, etc. (aka tags), as done by the fingerprinting client library libofa, which was developed by Predixis Corporation (now MusicIP) during 2000-2005. Only the query code is public via the MusicDNS SDK; the music analysis and PUID submission routines are closed source!

Getting the Playlist

Up to now I couldn’t figure out or find sources on how this is actually done by the MusicIP Mixer. I’ll keep you posted as I find out.

Other sources / Directions
