Finally adding contextual data

Posted by Anita on April 20th, 2008 — in audio features, contextual features

I implemented the first set of contextual data for my test set of music. I’m grabbing moods from allmusic.com. These are descriptors like “spooky”, “lively”, or “epic”, which are usually listed per album. I’m incorporating these features into the PCA by treating each descriptor as a separate feature, giving each song tagged with that descriptor a “1” and every other song a “0”. These contextual features are added to the model that already contains the audio-based features.
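
A minimal Python sketch of that encoding. It assumes the audio features and the scraped moods are already loaded into plain dictionaries keyed by song; the function names and data layout are hypothetical, not the project's code, and the feature values in the example are made up.

```python
from typing import Dict, List, Sequence, Set, Tuple


def mood_vocabulary(song_moods: Dict[str, Set[str]]) -> List[str]:
    """Collect every distinct mood descriptor, sorted so column order is stable."""
    vocab: Set[str] = set()
    for moods in song_moods.values():
        vocab |= moods
    return sorted(vocab)


def append_mood_features(
    audio: Dict[str, Sequence[float]],
    song_moods: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Append one 0/1 column per mood descriptor to each song's audio features.

    A song gets 1.0 for every descriptor it is tagged with and 0.0 for the rest.
    Songs with no mood entry at all get all zeros (an assumption).
    """
    vocab = mood_vocabulary(song_moods)
    combined: Dict[str, List[float]] = {}
    for song, features in audio.items():
        moods = song_moods.get(song, set())
        flags = [1.0 if mood in moods else 0.0 for mood in vocab]
        combined[song] = list(features) + flags
    return vocab, combined


# Toy example with made-up feature values.
vocab, rows = append_mood_features(
    {"song_a": [0.5], "song_b": [0.1]},
    {"song_a": {"spooky"}, "song_b": {"epic", "spooky"}},
)
print(vocab)           # ['epic', 'spooky']
print(rows["song_b"])  # [0.1, 1.0, 1.0]
```

Because mood tags come per album, every song on an album gets identical values in all of these binary columns, so within an album only the audio columns vary. That is one plausible reason albums show up as the stripes described in the next paragraph.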

As expected, this stratifies the whole data set, and it is an interesting way of separating out the albums: albums with the same set of mood tags appear as a stripe across the space, and each song’s position within the stripe comes from its audio features. If I had to guess what the audio-feature axis means here, I’d say it distributes music along a spectrum from edgier, more bursty sounds (on the left in this visual representation) to smoother, softer sounds (on the right). I am guessing this purely from inspection.

Here are some examples:

  • I see a stripe for this album: Backstreet Boys, “Black & Blue”. It moves from “Shining Star” on the edgy side to “How Did I Fall In Love With You” on the smooth side.
  • I see a stripe for this album: The Beatles, “Sgt. Pepper’s Lonely Hearts Club Band”. It moves from “Getting Better” on the edgy side, to “Lovely Rita” in the middle, to “A Day in the Life” on the smooth side.
  • I see a stripe for this album: 10,000 Maniacs, “Our Time in Eden”. It moves from “Candy Everyone Wants” on the edgy side to “How You’ve Grown” on the smooth side.

Also, Busta Rhymes’ smoothest piece in the test set, “Hot Fudge”, is way less smooth than the Beatles’ edgiest piece.

screenshot with moods

What to do with missing data?

Posted by Anita on April 15th, 2008 — in audio features, feedback

My newest problem is one that I knew I’d come across eventually: what do I do with songs that have missing data? This most recently came up when I was adapting Thomas Lidy’s rhythm feature code, which couldn’t open some of the MP3s in my test set, so I have no rhythm feature data for those songs. Lacking a better idea, I just gave them the mean values of all the other songs. But this doesn’t seem right: I don’t actually know their values, so any value I assign is made up. Yet if I don’t give them any values, the PCA can’t process these tracks, and I can’t put them in the space at all.
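
One common variation on plain mean imputation is to fill in the mean *and* add a 0/1 “was missing” column, so imputed songs are marked rather than silently treated as average. Here is a minimal, dependency-free Python sketch. It assumes rows are lists of floats with None for missing values, and the function name is hypothetical.

```python
from typing import List, Optional


def impute_column_mean(
    rows: List[List[Optional[float]]], add_indicator: bool = True
) -> List[List[float]]:
    """Fill None entries with the mean of that column's observed values.

    With add_indicator=True, also append one 0/1 column for every feature
    that had at least one missing value, marking which entries were filled in.
    """
    n_cols = len(rows[0])
    means: List[float] = []
    has_missing: List[bool] = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"feature {j} is missing for every song")
        means.append(sum(observed) / len(observed))
        has_missing.append(len(observed) < len(rows))

    filled_rows: List[List[float]] = []
    for r in rows:
        filled = [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        if add_indicator:
            filled += [1.0 if r[j] is None else 0.0 for j in range(n_cols) if has_missing[j]]
        filled_rows.append(filled)
    return filled_rows


# Toy data: the second song has no value for the second (e.g. rhythm) feature.
songs = [[1.0, 2.0], [3.0, None], [5.0, 4.0]]
print(impute_column_mean(songs))  # [[1.0, 2.0, 0.0], [3.0, 3.0, 1.0], [5.0, 4.0, 0.0]]
```

The indicator column becomes a feature in its own right, so the PCA will tend to pull the imputed songs away from the rest instead of parking them at the center of that feature. Whether that helps depends on whether “unknown” should be visible in the map.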

So, not sure what to do about these songs. Anyone have any suggestions?

Rhythm features

Posted by Anita on April 15th, 2008 — in audio features

I was able to add some rhythm features to the model. I’m using Thomas Lidy and Andreas Rauber’s Rhythm Histograms (RH), summarizing each histogram (one histogram per song) with four features: mean, median, standard deviation, and (at Thomas’s recommendation) kurtosis. Here’s how the map of the same music library looks now:
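
Collapsing a histogram into those four numbers is straightforward. This sketch assumes each song’s RH arrives as a flat list of bin values and treats the bin values themselves as the sample, not the histogram as a distribution over modulation frequency. It uses population moments and reports excess kurtosis (fourth standardized moment minus 3), which may not be the exact variant used here.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_histogram(bins: Sequence[float]) -> Dict[str, float]:
    """Collapse one song's rhythm histogram into four scalar features."""
    n = len(bins)
    mean = statistics.mean(bins)
    # Population (not sample) central moments; this estimator choice is an assumption.
    m2 = sum((x - mean) ** 2 for x in bins) / n
    m4 = sum((x - mean) ** 4 for x in bins) / n
    # Excess kurtosis; a perfectly flat histogram (m2 == 0) is mapped to 0.0 by convention here.
    kurtosis = m4 / (m2 ** 2) - 3.0 if m2 > 0 else 0.0
    return {
        "mean": mean,
        "median": statistics.median(bins),
        "std": math.sqrt(m2),
        "kurtosis": kurtosis,
    }
```

Under this reading, a histogram dominated by one tall bin (one strong periodicity) gets a large positive kurtosis, while evenly spread bin values give a negative one.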

model with rhythm histogram data

You can compare with the last version, before RH features were used:

model without rhythm histogram data

Unfortunately, the Rhythm Histogram code couldn’t open my classical music, so please disregard the classical songs’ positions in this map; I gave them the mean values of the rest of the songs’ RH data.

So the main change is that the rap music popped out from the rest. And perhaps a wee bit the jazz.

2D music browser, v0.0.1

Posted by Anita on April 13th, 2008 — in milestones

Now posting the equivalent of the last post’s video, but in 2D, built with Processing. It is a 1.5 MB QuickTime video (a larger, 12 MB version is also available). This is the version I demoed for sponsors on April 1st and 2nd.

2D screenshot

Songs/circles are still located based on a PCA of timbre, tempo, duration, number of distinct sections, and time signature stability. The circles’ colors are mapped straight from genre tag (see the legend at bottom-left).

Here’s a more detailed look at the layout of this small music library:

2D screenshot

A few things to note:

  • Some of the genre colorings are fairly well separated/clustered based on the small set of audio features I’m using. Note how classical music sits almost on its own. Jazz sits neatly nearby, between Classical and everything else. Rock is a nice mess that overlaps with Pop, Country, Rap, and Dance. These are relationships that make sense if you think about how loose or tight each genre label is.
  • In a few instances, pairs of songs are actually the same track at different bit rates (I had duplicates with different bit rates in the test library). One example is the light-blue-and-white pair just south of Jazz: The Flaming Lips’ “The Observer”, at 256 and 192 kbps. (One of the two also has a bad genre tag; that’s why it’s white.) Looking at The Flaming Lips’ “Zaireeka” album (not pictured) was also very cool, since you can see how the complementary stereo tracks are either very similar or very different.
  • I listened to the leftmost Country song; it is Willie Nelson’s “Still Is Still Moving to Me”. According to this representation of the music library, this is the “country song most like rock”. You can certainly argue for or against that, but it’s still interesting to be able to start making characterizations like this.
  • This is a small library (~150 songs). As it gets bigger, and as the feature set grows (i.e. when I implement new features), hopefully these patterns will just become more meaningful… We’ll see.
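
Characterizations like “the country song most like rock” can also be computed rather than read off the map. The post picks the leftmost song by eye. A different, hypothetical rule is to take the song of one genre that sits closest to the centroid of another genre in the 2D layout. Here is a sketch under that assumption, with made-up positions in the example.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Mean position of a set of 2D points."""
    if not points:
        raise ValueError("no points to average")
    return (
        sum(x for x, _ in points) / len(points),
        sum(y for _, y in points) / len(points),
    )


def most_like(songs: Dict[str, Tuple[str, Point]], genre: str, other: str) -> str:
    """Return the `genre` song lying closest to the centroid of the `other` songs.

    `songs` maps title -> (genre tag, 2D position from the PCA layout).
    """
    target = centroid([pos for g, pos in songs.values() if g == other])
    candidates = {title: pos for title, (g, pos) in songs.items() if g == genre}
    if not candidates:
        raise ValueError(f"no songs tagged {genre!r}")
    return min(candidates, key=lambda title: math.dist(candidates[title], target))


# Toy layout with made-up positions.
layout = {
    "country_1": ("Country", (0.0, 0.0)),
    "country_2": ("Country", (4.0, 0.0)),
    "rock_1": ("Rock", (5.0, 1.0)),
    "rock_2": ("Rock", (5.0, -1.0)),
}
print(most_like(layout, "Country", "Rock"))  # country_2
```

Unlike “leftmost”, the centroid rule doesn’t depend on how the axes happen to be oriented. It does assume each genre’s cloud is roughly compact, though, and a sprawling genre like Rock may have a centroid where few rock songs actually sit.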

3D music browser

Posted by Anita on March 27th, 2008 — in milestones

Just posting a short video (a 10 MB QuickTime file) of the unfinished 3D browser. I’ve decided to focus on a 2D interface built with Processing, but wanted to show anyone who’s interested how far I got with the 3D version.

Songs/spheres are located based on a PCA of timbre, tempo, duration, number of distinct sections, and time signature stability. The spheres’ colors are mapped straight from genre tag (e.g. yellow is “Classical”).

3D screenshot

Spatial music collections… previous work?

Posted by Anita on March 15th, 2008 — in feedback, papers, background

Does anyone know of some projects/papers on spatially-based organization of digital music collections? I’m trying to gather information for writing my thesis’s “Background” section.

Here’s what I have so far:

I am going to implement a mode in my interface that is completely free-form, allowing the user to assign tracks to locations however they want. Does anyone know of projects that allow for this kind of music library organization?

Mistakes

Posted by Anita on March 13th, 2008 — in mistakes

Just a couple more silly mistakes in the 3D world. A screenshot / demo-video of the current state of the intended 3D model is coming soon, after this silly stuff…

ascii sphere boxy in stars

Adjustment to schedule

Posted by Anita on March 13th, 2008 — in Uncategorized

I’m going to take an extra month or two to finish my thesis. I’m now aiming to turn in the final draft, signed by my readers, around late June or early July.

I decided to take more time because I realized that the pressure of my current schedule was leading me to make poor design decisions and to lose sight of the big picture of my thesis work. Hopefully the extra time lets me do a better job.

Re: Evaluation requirements for the thesis

Posted by Anita on March 6th, 2008 — in definitions, feedback

The responses I got to my previous post about evaluation requirements for my thesis pretty much boil down to this: I need to clarify what I hope my contribution will be before I can decide how to test it appropriately.

To make my goals clearer, here are the kinds of questions I’m trying to ask with this new interface:

  • Does this spatial interface give a better understanding of the overall scope of your music?
  • Is the music-to-visual mapping intuitive?
  • To introduce the notion of “really fuzzy searching”… Is a spatial representation more appropriate than text-based lists for music browsing?
  • Does the interface help you see/examine your listening patterns?
  • Does the interface help you see the relationship between multiple people’s music libraries?
  • Can you more easily find music of a certain style/type, or for a particular activity, than you can with a more traditional music browser?
  • Can looking at your music and someone else’s music at the same time in the space help you find recommendations for new music?
  • Does this interface change what things are important when it comes to looking at your library? Do you find that you are looking for, or thinking about, different sorts of things than you would with a traditional music browser?
  • Does the interface make you more aware of the context for your music-browsing decisions?

Should the interface be 2D or 3D?

Posted by Anita on March 5th, 2008 — in definitions, feedback, milestones, organization

I’m at a big decision point in my thesis. I have a very primitive music browser implemented in both 2D and 3D. I want to choose the number of dimensions (2 or 3) for my main project before I go much further in developing the interface. I just don’t have time to develop both.

My biggest concern: I had been pushing for a 3D interface throughout the proposal process, but I’m worried that continuing with it will force me to spend my remaining time on elements of 3D interfaces (e.g. how to orient the user, how to show the overall cloud shape despite occlusion) rather than on elements core to my own thesis motivations (e.g. how to organize music, how to find patterns in music listening).

I think a 2D interface is currently easier to develop than a 3D one, and that perhaps I should focus on only two dimensions to have a better chance of making an interface that demonstrates all the things I had hoped to show (outlined in my proposal).

In the end, my thesis is not about interfaces; it is about the organizational model itself. That model uses audio and contextual data to organize a music collection in a fuzzy manner that I think is more appropriate for this type of data, and it gives others a framework to build on, both in terms of input features and output interfaces. This approach contrasts with what we see in most music browsers (and data browsers in general), which limit organization to non-configurable lists and, ultimately, text labels.

So, my thesis work becomes: (1) an implementation of this organizational model, (2) made publicly available, along with (3) demonstration(s) of an interface built on top of the model. An analogy for this way of thinking is the Echo Nest’s recent announcement of their AudioAnalysis API. Last year, they made this tool (1) available to others (2): it gave me numbers, and I built an interface on top of it (3). In this thesis, I am the one providing the numbers and letting others build interfaces on top.

Even though the main contribution is the model, I will demonstrate one such interface with a 2D representation of a music collection that is user-configurable and dynamically updated through RSS feeds.

Here are the main questions:

  • Am I losing something integral to the project if I move down from three dimensions to two?
  • Is this line of thinking (that my contribution is more an organizational model than an interface) too dangerous?
  • Am I contributing enough?