Taste Profiles get added to the Million Song Dataset

October 27, 2011

The Echo Nest is overjoyed to announce a major addition to the beloved Million Song Dataset for researchers and developers: a huge collection of real-world, anonymized listener data in the form of Echo Nest Taste Profiles!

We are contributing the taste profiles of 1.2 million unique (and carefully anonymized) listeners, which overlap with 400,000 songs of the Million Song Dataset. Each taste profile has at least 10 unique MSD songs' worth of activity. This is the largest music activity dataset made available to researchers and we are happy to take part in it. Scientists, developers and researchers can now study the correlation between the deep musical context in the MSD (provided by Last.fm, musixmatch, SecondHandSongs and of course The Echo Nest) and actual user preference and activity. With all of this data, we can perhaps begin to better model and understand musical behavior and go far, far beyond “people who like Coldplay also like Beyonce.” We were inspired to contribute by the 360K Last.fm dataset and of course the amazing but controversial Yahoo Music KDD Cup release.

We treat listener privacy with great care at The Echo Nest (our CTO is a bit paranoid), and this release reflects that care: the data in our Taste Profile release includes a shuffled hash of persistent session identifiers from a very small random selection of our musical universe, and only play counts associated with Echo Nest song IDs that overlap with the MSD set. There is no connection to individuals. The date added field is the date of the anonymized ingestion, not the date of the original activity. No usernames, listener details, original IDs, dates, IPs, locations or anything but [random user string, EN song ID, play count] are being released.
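
To give a feel for working with [random user string, EN song ID, play count] records, here is a minimal Python sketch that groups them into one profile per listener. It assumes one record per line with tab-separated fields, which is a guess on our part for illustration, not a statement about the released files. The function name `load_taste_profiles` and the sample user strings and song IDs are made-up placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable


def load_taste_profiles(lines: Iterable[str], sep: str = "\t") -> Dict[str, Dict[str, int]]:
    """Group (user, song, play count) records into {user: {song: plays}}.

    Assumes one record per line, fields separated by `sep` (hypothetical layout).
    """
    profiles: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, song, count = line.split(sep)
        # Sum play counts if the same (user, song) pair appears more than once.
        profiles[user][song] = profiles[user].get(song, 0) + int(count)
    return dict(profiles)


# Placeholder records; these are not real identifiers from the release.
sample = [
    "user_a\tSONG_1\t3",
    "user_a\tSONG_2\t1",
    "user_b\tSONG_1\t7",
]

profiles = load_taste_profiles(sample)
total_plays = {user: sum(songs.values()) for user, songs in profiles.items()}
```

From there, `profiles` can be joined against the MSD metadata on the song ID, and `total_plays` gives a quick per-listener activity count.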

Please go over to the MSD Taste Profile page for more details and also check out the Million Song Dataset paper, presented just yesterday at ISMIR 2011 in Miami. Thanks to Thierry and Tyler for all their help getting this data ready.

—Brian