Adelphoi Music

Smart speakers and the need for audio branding in 2021

Sonic branding in the time of smart speakers

2020 will be remembered for many years as a time of rapid, seismic changes. But aside from the obvious, one of the most significant will be that 2020 was the year that voice search became the dominant mode for internet searches, with ComScore reporting that more than half of all smartphone users engaged with voice search technology that year.

It also saw the proliferation in people’s homes of smart speaker systems operated through voice assistants like Alexa, Siri, and Google Assistant. Last year, 87.7 million US adults, or 42%, used smart speakers, an increase of 10% from the previous year. Meanwhile, in the UK, 1 in 5 homes currently has a smart speaker system installed.

With Apple, Google, and Amazon all announcing updates to their smart speaker systems in the last months of the year, signalling it as a priority area, it’s predicted that smart speakers will outnumber tablets by the end of 2021.

By 2024, smart speakers are forecast to be operating in 640 million homes worldwide.

The impact of COVID-19

The lockdowns enforced due to COVID-19 have contributed to this ascendancy and changed the way people interact with the technology. With more than half of people in the US and other richer countries remaining at home, our engagement with digital media has shifted from out-of-home mobile use to inside the home.

In the US, 36% of adult smart speaker owners say they are using their device more to listen to music and entertainment, rising to 52% of those aged 18–34, while 35% of adults, and 50% of those aged 18–34, are consuming more news and information through their smart speakers.

iProspect reported in mid-2018 that already 62% of smartphone users in China, India, Indonesia, Japan and Singapore had used voice-activated technology in the last six months.

In these countries the big US tech brands, held back by a fairly slow integration of Asian languages, face competition from local manufacturers like Baidu, Alibaba, and Xiaomi (all of which sold more smart speakers than Apple in 2018 and 2019). Use of smart speakers and voice activation there has burgeoned in the last year, thanks to young, tech-savvy populations and government directives to reduce contact with phone screens to help stop the spread of the virus.

China and South Korea have rolled out an automated calling system devised by the speech tech developer iFlytek, which collects information about the spread of Coronavirus.

Meanwhile in countries like the Philippines and India, which provide a large proportion of the world’s call centres, AI chat bots are being incorporated wholesale into this industry, while call centre workers remain at home.

Audio-first social media

Of course, the same voice assistant technologies are available through smartphones and tablets.

On the one hand, this means that audio-first interaction with our technology isn’t limited to those with integrated smart speaker systems.

But perhaps most significantly, it may signal a change in the way people interact with their mobile devices, which for the last decade have been marketers’ primary medium for advertising and visual branding.

Polls have shown that up to 71% of people would prefer to interact with their tech through voice, rather than typing, and with developments like Amazon Polly, which turns text into lifelike speech through deep learning, there’s a real possibility that social media could make the turn to audio, with massive consequences for advertising.

We’ve already seen movement in this direction with recent surges in popularity for social apps like Discord, Clubhouse, and Houseparty. Twitter has launched its first wave of live testing for Audio Spaces, its own audio-only virtual chat room feature.

With podcast ad revenues expected to surpass $1bn this year (up from $237m in 2017) and music streaming looking to reach $76.9bn by 2027, the integration of audio media into voice-led social media platforms would have as profound an impact on the way brands interact with consumers as the integration of social media into smartphones had on video advertising in the last decade.

What is audio branding?

Audio branding, or sonic branding, is the audible counterpart to traditional visual branding. While branding began in the visual realm, as marks on cattle, letters and labels on boxes, and later logos and packaging design, audio branding applies the same underlying principles to the world of sound.

The essential question of the field is: what is a brand’s sonic identity? When a brand interacts with its customers through sound, as brands do more often than we tend to acknowledge, what do its sonic choices say about its character? Do they accurately represent the brand’s ethos and values, or not?

Jamie Masters, resident sonic branding expert at Adelphoi Music, writes:

‘Audio branding establishes a ‘house sound’ for the brand: it situates the brand on an audio map, and defines that brand’s own sonic territory. It uses consistent sonic triggers — repeated chunks of audio, sound effects, characteristic voices, and musical motifs — to build up an implied brand personality that sits alongside, and fills out, the visual representation. Over time, and with sufficient repetition, these audio elements begin to be synonymous with the brand.’

Sound vs Vision

Despite the foundational similarity in purpose between sonic branding and visual branding, the differences often go under-appreciated; differences that are fundamental to the nature of sight and sound.

We develop our ability to hear sounds while in the womb and, as anybody who’s ever loved a piece of music or been inspired by a speech knows intuitively, our sense of hearing remains connected to the emotional, pre-rational brain throughout our lives.

Sight, on the other hand, begins after we’re born and is intrinsically linked to rationality and the sense-making process. We use the empirical data of vision to organise the world around us into distinct, nameable objects and categories.

According to Jamie Masters, ‘sound impacts on us more viscerally: it induces emotions; it alarms, and soothes, and elates; like smell, it is evocative, and recalls past experience. Sound completes vision, but is more self-sufficient than vision.’

It makes sense that traditional branding focused on the visual medium, sorting brands into unique, distinguishable entities and describing their distinctive values and characters. But as we continue to grow more aware of the role of emotion in buying habits it makes sense, too, that the role of sound will become increasingly important.

Audio branding isn’t just a jingle

A common misconception about audio branding is that its main offering is the jingle, a catchy sonic logo that was attached to the end of TV and radio ads through much of the 20th century, but over time came to be perceived as annoying, cheesy, and ultimately unfashionable.

However, as consumer tastes and marketing trends have changed, the field of audio branding has matured with the times.

Taking careful account of music, sound design, and voice, its primary development has been a massive expansion into every touchpoint in which brands interact with their customers. The development of technology has taken branding and advertising beyond radio and television, to mobile devices accessible on the go, and into the home.

But sonic branding has also redefined the uses of sound in more traditional spaces, determining playlisting in shops, waiting rooms, and public transport, on-hold music for call centres, user experience sounds and sound design for apps, games, and virtual environments, and the tone and character of voice-over artists in PSAs and self-service checkouts.

All brands have a sonic identity

All these are areas in which sounds colour a customer’s experience of a brand, whether they’re strategically chosen or not.

A jarring music choice in a retail store, a repetitive and overly upbeat bit of hold music on a complaints hotline, or an embarrassingly loud and conspicuous error sound from a Chip and PIN machine can be the deciding factor in whether that experience is positive or negative.

The growing popularity of audio-first technologies will lead to a proliferation of new touchpoints for brand-customer interaction. This means golden opportunities for brands willing to invest in a cohesive and comprehensive audio branding strategy, but it also means an increased risk of these sorts of sonic faux pas for those that don’t understand what sound says about them.

Smart spaces and the Internet-of-Things

COVID-19 has kept interactive voice activation technologies largely confined to smart speakers within the home in the last year, and has no doubt had a permanent impact on the way we consume media. But the lifting of lockdowns, which we’re all eagerly anticipating, will take these technologies out into the world, into a plethora of new and unprecedented spaces.

In the home, Google has already engaged TV manufacturers to start installing microphones for far-field voice recognition.

In assisted living environments and hospitals, voice assistants are being used to help patients receive information about diagnoses, set reminders to take medication, and help with accessing the internet or operating televisions.

Show and Tell, a skill available in the Alexa store, can be used by the visually impaired to identify objects, dramatically increasing their ability to perform everyday tasks like grocery shopping.

The long-term effects of COVID-19 on offices and workplaces remain unclear, but voice activation technologies have been readily implemented in construction and manufacturing environments. A 2018 report by Globant states that 73% of companies surveyed believed such technologies could be valuable to their enterprise, but hadn’t yet taken steps to implement them.

73% of drivers are predicted to have voice-assistants installed in their cars by 2022 and public transportation will no doubt follow suit.

Many of these advances have been enabled by efforts on the part of companies like Amazon to open out their platform to customisation. Companies are currently able to pay Amazon to create custom ‘Skills’ to suit their own needs, or are offered the Alexa Skills Kit, which allows developers to create their own.

In 2016, Alexa had a catalog of 130 skills. Last month the number available in the US surpassed 80,000 (with over 100,000 globally).

Voice and brand identity

The most obviously significant area of consideration for a sonic brand when we think about smart systems and spaces is the voice, though it is also the least subtle.

Accent, tone, pacing, and delivery, as well as the inherent musicality of the words spoken, all contribute to an idea of the brand as a character, or person. Audio-first copywriters and VO artists can represent a brand as playful, authoritative, empathetic, or relatable.

Adelphoi’s Jamie Masters writes:

‘Voice is more immediate and suggestive of character, but drier than music, and carries less of an emotional punch… the real branding work is being done by incidental factors like tone of voice, accent, age, and even gender, which give the consumer an implied personality to relate to.’

Voice has already emerged as a point of concern as voice assistants have been rolled out into care homes, with numerous reports of dementia patients being deeply disturbed by the assistants’ robotic tone.

SatNavs have a limited number of instructions and can therefore offer a range of real voices to choose from. Such realism is currently difficult to achieve with AI-generated voices. However, as the technology advances and synthetic voices begin to sound more human, and as new skills and applications multiply, there will be a great deal of scope to convey brand characteristics through the design of those voices.

Music and sound design

The use of smart speakers is still in its infancy and the full scope of its possibilities is still fairly obscure to us.

We’ve seen that smart systems can be used in healthcare, in the workplace, and at home for consuming news, podcasts, or music, finding recipes, or playing sports commentary from your favourite commentator while watching the game with the TV on mute.

With customisable skills, the diversity of uses could become as widespread and varied as that of mobile or tablet apps. And as with these apps, this will open up new and creative ways to use sound.

The Alexa Skills Kit currently has a vast library of preset sounds for developers to use while building customisable skills, but also allows the use of external, custom-made sounds and music, while Polly is already exploring a multitude of options for synthetic speaking voices.
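For readers curious how a custom brand sound might reach a listener in practice, here is a minimal sketch in Python. It assumes the skill replies with an SSML speech response and uses the SSML audio tag to play a short clip before the spoken copy. The function name, the example URL, and the sample wording are all hypothetical, and Alexa places its own requirements on hosted audio files that are not covered here.

```python
from xml.sax.saxutils import escape, quoteattr


def build_branded_response(spoken_text: str, audio_url: str) -> dict:
    """Build an Alexa-style response that plays a brand sound, then speaks.

    Hypothetical helper for illustration only; audio_url stands in for
    wherever a brand would host its own sonic logo.
    """
    # quoteattr wraps the URL in quotes and escapes it for use as an attribute;
    # escape makes characters such as "&" safe inside the SSML body.
    ssml = (
        f"<speak><audio src={quoteattr(audio_url)}/> "
        f"{escape(spoken_text)}</speak>"
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": True,
        },
    }


if __name__ == "__main__":
    import json

    reply = build_branded_response(
        "Welcome back. Here are today's offers.",
        "https://example.com/brand-ident.mp3",  # placeholder URL
    )
    print(json.dumps(reply, indent=2))
```

The point of the sketch is that the sonic logo and the voice copy travel together in a single reply, so the brand's sound and its tone of voice can be designed as one experience.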

Whether it’s soundscapes for mindfulness apps, gentle waking alarms for sleep apps, brand tracks in social media ads, or idents for streaming platforms, mobile media has already opened spaces for immensely creative uses of sound in conveying brand values and messages.

In a world where we interact with our tech primarily off-screen, the possibilities are endless for brands ready to invest their time and money in audio branding.