Why Is Spotify Working On a Speech Recognition System?

Spotify, the world's largest music streaming service, has been awarded a patent for speech recognition technology that analyzes a user's voice to infer gender, age, and environment. Taken together with the company's other developments, it's clear that Spotify, having won our ears, is now after our voices, too.

But why might Spotify want to develop this kind of speech recognition, and what would it be used for? Let's dig into the patent and its implications.

Spotify's Speech Recognition Patent

In 2018, Spotify submitted a patent application titled "Identification of taste attributes from an audio signal." After an almost three-year wait, the patent was granted in January 2021. As the name suggests, the filing details, in principle, a system that can take recorded audio from your environment, with or without speech, run it through a set of algorithms, and use the resulting analysis to play you music suited to your demographic and current environment.

The patent lists some examples of how the algorithm might categorize data, including gender, age, accent, emotional state, physical environment, and the number of people. However, the filing goes on to note that this is not an exhaustive list, just some examples of how the company might label recorded audio. In addition to this metadata, the patent suggests Spotify may also analyze your speech.

What Could Spotify Use Speech Recognition For?

Currently, there's no indication that Spotify has developed the proposed system outlined in the patent. However, it does align with some other projects the music streaming service has been working on. Not long after the patent was granted in early 2021, Spotify rolled out a voice-control feature. Using the "Hey, Spotify" wake word, you can control music playback within the app by voice commands alone.

As Spotify is a mobile app rather than a system-level voice assistant like Siri or Google Assistant, there are some limitations. For example, the app needs to be open, Spotify must have access to your microphone, and your smartphone's display needs to be unlocked and turned on. If the streaming service is hoping to build a more comprehensive system, it would need system-level access or its own hardware.

In 2019, Spotify trialed a vehicle-based hardware device known as Car Thing. In a Spotify Newsroom post at the time, the company said that the voice-controlled device would allow some Spotify Premium users in the US to listen to music and podcasts in their cars. It also noted that it was looking to run similar tests, known as Voice Thing and Home Thing.

However, not much was known about the tests or whether Spotify had plans to roll them out more widely. In January 2021, two days after the patent was awarded, Spotify filed new listings with the FCC for a redesigned Car Thing with Bluetooth functionality. Although there's no official confirmation of a release date, it seems the company was waiting for the audio analysis patent before pushing ahead with its hardware plans.

The Problem With Machine Learning

Although increasingly commonplace, artificial intelligence systems aren't quite as smart as they initially sound. Most utilize machine learning, where the system is given a set of training data to learn from. In this case, it may have been some audio recordings, categorized by gender and location. The AI learns to spot the differences in the training data and then sorts new inputs accordingly.

However, this is where trouble sometimes arises. Everybody has a different voice, accent, and tone. In most cases, we can pick up the phone and determine whether we know the person on the other end, and if so, who it is. We do this without any visual prompt, which demonstrates how unique each voice is. A set of training data will never be able to capture that level of detail and nuance.

Consequently, there will be times the AI makes assumptions so it can output a result. If the input voice is slightly lower, it might label it as a man's voice. Likewise, it might mark a higher-pitched voice as a woman's.
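
To make the pitch assumption concrete, here is a minimal Python sketch of a deliberately naive classifier. It is not Spotify's method, and nothing in the patent confirms how its system would work. The training values, the function names, and the idea of using average pitch as the only feature are all assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical toy training data: (average pitch in Hz, label).
# These numbers are illustrative assumptions, not real measurements.
TRAINING_DATA = [
    (105.0, "man"),
    (120.0, "man"),
    (135.0, "man"),
    (190.0, "woman"),
    (210.0, "woman"),
    (225.0, "woman"),
]


def learn_threshold(samples):
    """Place the decision boundary midway between the two class averages."""
    low = mean(pitch for pitch, label in samples if label == "man")
    high = mean(pitch for pitch, label in samples if label == "woman")
    return (low + high) / 2


def classify(pitch_hz, threshold):
    """Label a voice using pitch alone, the shortcut described above."""
    return "man" if pitch_hz < threshold else "woman"


THRESHOLD = learn_threshold(TRAINING_DATA)

if __name__ == "__main__":
    # Speakers unlike anything in the training data.
    unseen = [
        ("woman with a low voice", 150.0),
        ("man with a high voice", 180.0),
    ]
    for description, pitch in unseen:
        print(f"{description}: labeled as {classify(pitch, THRESHOLD)}")
```

With this toy data, the boundary lands at roughly 164 Hz, so both unseen speakers get the wrong label. The classifier isn't broken in any obvious way. It simply never saw voices like theirs, and it has no way to say "I don't know."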

Unfortunately, this isn't only a theoretical risk, as there have been many high-profile instances where machine learning algorithms have gone wrong.

The Implications of Spotify's System

When pushed, most people would struggle to identify an unfamiliar accent accurately, and that's with a lifetime of experiences and memories from which to pull. The machine learning system will only know what was in the training data, leaving it to make even more assumptions. It's easy to see how this could lead to potentially problematic or even racist outcomes.

This isn't without precedent either. In 2015, Jacky Alciné, a software engineer, noticed that Google Photos identified his Black friends as gorillas. After an online backlash, Google claimed to have taken care of this sensitive issue. However, WIRED reported in 2018 that Google hadn't fixed the underlying image categorization issue. Instead, the company had only blocked terms related to certain primates like gorilla, monkey, and chimpanzee from its classification system.

Spotify's proposed system has potential privacy concerns, too. To function in the way the company expects, the speech recognition feature would need to be continually monitoring what you're saying and the environment you're in. The always-on capability is a personal privacy issue but could also lead to invasive law enforcement or governmental surveillance.

Some are also wary of the emotion detection feature. As described, Spotify's algorithm would identify your emotional state and play mood-appropriate music once your audio has been analyzed. However, this is underpinned by the assumption that if you're in a particular headspace, you wish to remain there through music. It's also open to abuse by tech companies.

For instance, in 2012, Facebook performed a secret experiment by showing positive or negative content in more than half a million users' feeds to see how it affected their emotional state. For these reasons, Access Now, a human rights organization, sent an open letter to Spotify asking the company to abandon the system.

The Future of Personalized Music?

Spotify was one of the first companies to create a compelling music streaming service. The interface and vast catalog make it a favorite worldwide. The service also integrates nicely with most digital assistants and smart home equipment. Over the years, the company has made it easy for you to discover new music or enjoy your favorites with algorithmically generated playlists.

In theory, the always-on speech recognition should take this customization one step further, so the streaming service can passively take in your mood and environment to play you the best music at the right time. However, the technology's always-listening nature has far-reaching privacy implications that may outweigh any convenience offered by the platform.

source https://www.makeuseof.com/spotify-speech-recognition-system/