New version of the Echo Nest Track Analyzer

August 23, 2011

This week we will be pushing out version 3.08b of the Echo Nest Track Analyzer.  We’ve made all sorts of improvements in this new version.  

Here are some highlights:

  • Faster analysis - it will take 30 to 50 percent less time to upload and analyze a track with the new version.
  • Support for longer tracks - the new analyzer can analyze tracks up to about 50 MB and 100 minutes in length.
  • Meta and Track JSON Changes - the new analyzer provides more detail in the “meta” and “track” sections of the JSON output. The “meta” section has more inclusive audio file metadata (as available) such as filename, artist, album, title, genre, bitrate, sample_rate and seconds. The “track” section includes the identity of the audio decoder used as well as information about where the analysis starts and the amount of audio analyzed.  We are also including two fingerprint codes (ENMFP and Echoprint) and a synchstring (see below for details).
  • Fingerprinting - Two types of audio fingerprints have been added to the output.  ENMFP is based on analysis segment pitch vectors and is computed on top of the analysis stack.  Echoprint is based on a new spectrogram computed independently of the analyzer at the bottom of the stack.  Both are presented in the JSON file in the form of a code string and a code version.
  • Synchstring - A new data string is introduced in the output of the analyzer. It allows clients to synchronize the analysis results to the sample level using a simple client-side algorithm regardless of the decoder used. We shall be writing more about how you can use this synchstring to synchronize the analysis output with your audio.
  • Improved data quality - we’ve improved several algorithms, which makes the overall results returned by the analysis better.  Specific improvements include:
    • Segments:  Onset detection and post-masking algorithms have improved.  The onset of the attack falls more often into the true minimal loudness location, rather than into close local minima.  The maximum loudness location detection has also improved in the same way.  The detected dynamic range has therefore occasionally increased.  The notion of a silent segment (loudness_start, loudness_max and loudness_end all equal to -60 dB) has also been introduced.  Silent segments allow for silent section detection and, occasionally, better segment duration estimation.  As a result of segment changes, all other related values may have shifted, including but not limited to segment durations, loudness values, as well as pitch and timbral vectors.  The number of segments detected may have shifted as well.
    • Tatums, Beats, and Downbeats:  Tatum and beat locations have improved in precision.  They now align more accurately to the perceived maximum loudness, rather than to the middle of an attack as was the case previously.  As a result of beat and segment changes, downbeat locations and key detection may vary as well.  Beat detection has improved as a result of fixing tatum locations, removing certain errors like out-of-phase beats. 
    • Loudness:  The equation defining overall perceived loudness for an entire track was redefined.  Overall perceived loudness for some tracks has slightly increased as a result.
    • Timbre:  Some bugs were fixed in the masking curve models, resulting in more accurate timbral vectors.
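
To make the silent segment idea above concrete, here is a minimal Python sketch of how a client might find silent sections in the analysis output.  The three loudness field names and the -60 dB value come from the description above.  The rest is an assumption made for illustration: that the parsed JSON exposes a list of segment objects, each with “start” and “duration” in seconds, and the helper names is_silent and silent_sections, which are hypothetical.

```python
SILENCE_DB = -60.0  # loudness value this post gives for silent segments

LOUDNESS_KEYS = ("loudness_start", "loudness_max", "loudness_end")


def is_silent(segment, tol=1e-6):
    """True when all three loudness fields sit at -60 dB.

    A segment missing any of the three fields is treated as not silent.
    """
    return all(
        key in segment and abs(segment[key] - SILENCE_DB) <= tol
        for key in LOUDNESS_KEYS
    )


def silent_sections(segments, gap=1e-3):
    """Merge back-to-back silent segments into (start, end) pairs in seconds.

    Assumes each segment carries "start" and "duration" (an assumption
    about the JSON layout, not something stated in the post).
    """
    sections = []
    for seg in segments:
        if not is_silent(seg):
            continue
        start = seg["start"]
        end = start + seg["duration"]
        if sections and abs(sections[-1][1] - start) <= gap:
            # This segment starts where the previous silent one ended.
            sections[-1] = (sections[-1][0], end)
        else:
            sections.append((start, end))
    return sections
```

Calling silent_sections on the segment list of a parsed analysis would return one (start, end) pair per stretch of silence, with adjacent silent segments merged into a single pair.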

The new version of the analyzer is lovingly named 3.08b.  Once we release it, all track/upload and track/analyze calls will start to use it.  You can tell that you have the new version when the “analyzer_version” field in the “meta” section is set to “3.08b”.
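
If you cache analysis results, a version check like the following sketch may help you decide which tracks to re-analyze.  It relies only on the “analyzer_version” field in the “meta” section described above.  The function names are hypothetical, and the sample JSON is a made-up fragment, not real analyzer output.

```python
import json


def analyzer_version(analysis):
    """Return meta.analyzer_version from a parsed analysis, or None if absent."""
    meta = analysis.get("meta") or {}
    return meta.get("analyzer_version")


def is_new_analyzer(analysis, expected="3.08b"):
    """True when the analysis was produced by the expected analyzer version."""
    return analyzer_version(analysis) == expected


if __name__ == "__main__":
    # Made-up fragment for illustration; real output has many more fields.
    sample = json.loads('{"meta": {"analyzer_version": "3.08b"}}')
    print(is_new_analyzer(sample))
```

An analysis with no “meta” section, or with a different version string, is reported as not coming from the new analyzer.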