10 Easy Questions towards a sound Voice 🗣 Strategy

[Image: It's time to talk]

Voice and other Conversational Interfaces are the latest mainstream technical innovation that could have a major impact on your business. I therefore present you with 10 easy questions – which might be a bit tougher to answer – to help you come up with a sound Voice Strategy.


1. Where are you now? Getting your bearings

Voice interfaces are being adopted faster than any communication technology before them:

[Image: Smart speaker market penetration]

Beyond dedicated speakers – like the Amazon Echo and Google Home – they are in cars, mobile phones, earphones, appliances (think smart TVs), etc.

It’s estimated that there will be 8 billion digital voice assistants in use by 2023.

And as recent Microsoft research showed, voice assistance is becoming the norm: 72% of surveyed respondents reported using a digital assistant in the past 6 months.


2. Do you need a voice strategy? Or 20/20 hindsight


Did you need an Internet Strategy in 1990?

Did you need a Search Strategy in 2000?

Did you need a Mobile Strategy in 2010?

Even if you don’t have a vision of the future of voice interfaces yourself, others do. So keeping tabs on developments in voice (and knowing what they are) is a strategy as well.

Keep an eye out for remarks and terminology like this:

[Image: Voice use quotes and terminology]

3. Do you have an open mind? Possible paradigm shift

Voice is not the “faster horse”, so don’t bluntly compare it with previous interfaces. Take a look at this graphic by voice visionary Brian Roemmele:

[Image: Interface graph]

Each era had its respective winners – from IBM to Microsoft, Google and Apple. Who, and in what way, will dominate the Voice Interface era (which, according to the aforementioned Brian Roemmele, might be the last interface) is still very much undecided.


4. Does your business interact with humans? Voice is the most natural

Our brains are evolutionarily wired for voice. Voice is the human I/O.

Everything you type and read is the work product of a “silent voice” in your brain.

The brain processes involved are visualised in this graphic, again by Brian Roemmele:

[Image: Brain processing]

And ~100% of the information from this phonological loop/speech is retained for ~400 seconds and may be processed.

In comparison:

  • ~97% of the information in the visual cortex becomes exformation. This means it’s lost immediately!

  • ~75% of the information in the auditory cortex becomes exformation. Again, lost!


5. How does a user invoke your product/service? By asking!

Expect, in the near future, for a user to simply state an intent to the nearest available, trusted voice assistant and have it fulfilled according to his/her previously uttered or learned preferences.

DILBERT © Scott Adams. Used By permission of ANDREWS MCMEEL SYNDICATION. All rights reserved.


This may indeed seem awkward now, but in order to be or stay part of this flow you need to ‘deconstruct’ your users’ needs into intents. And learn what wordings (utterances) are used and in which context.

In the short term you will need to claim your voice search query; in the long run, expect users to switch assistants and/or queries only very reluctantly.


6. Is the interaction near- or far-field? They are different use cases

Near-field and far-field refer to the distance between the speaker/ear(s) and microphone/mouth(s):

From near-field (left) to far-field (right)


So for near-field use cases you should consider users with Apple AirPods earphones with built-in ‘Hey Siri’ detection, or the rumoured Amazon Echo Earbuds. A voice assistant in a car with a single occupant can also be seen as near-field. The point is that the interaction can be considered confidential, because it cannot be overheard.

In the case of headphones/earphones you should also keep in mind the future possibility of gesture control (i.e. taps and head movements) for ‘silent’ user input or feedback.

The remaining far-field use cases are those where the interaction is out in the open, where the conversation may be overheard or is intentionally shared between different users.


7. How smart is your AI? Claim your domain knowledge

Artificial General Intelligence is still a long way off, but that doesn’t mean you can’t master a specific domain already.

[Image: AI definitions]

Start by defining a narrow domain and feed your artificial intelligence, meanwhile managing user expectations.

AI improves on iterations and variations, so start generating as many as possible.


8. Are you making preparations? Start ‘dogfooding’

In order to get your voice/chat service and the underlying AI to a minimum viable level you have to start feeding and testing your system. An internal version of your service (i.e. ‘eat your own dog food’) is essential.

Kayak App with ‘Add to Siri’ button which allows you to link a dedicated spoken intent like ‘My travel plans’ to the array of commands.


The ideal place to start is with the consumer care agents (which also makes it an opportunity, not a cost).

Each customer inquiry and resolution is ‘free’ input for your system. And the agent can be the (temporary) controlling and mitigating interface between your fledgling conversational system and the end-user.

You can already take advantage of some low-hanging fruit. For example, by adding the “Add to Siri” button to your mobile app, users can start to become familiar with the possibilities of voice control. For more information on the potential, see this article on ‘Siri Shortcuts’.


9. Do you have screen/human follow-up in place? You must

It’s not voice only, but voice first. This means that in some situations the response to a spoken request needs to be visual, e.g. an overview, (long) list of options or an image on an app/watch/speaker screen. This is called multi-modal interaction.

One reason for this is that humans speak faster than they can type, but conversely we read (and visually scan/compare multiple data points) faster than we can listen. However, it should be a purposeful use of the screen and not just a fallback for limitations in the context awareness of the voice assistant.
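A multimodal setup therefore needs a rule for when to speak and when to show. The routine below is only a sketch of such a decision; the thresholds and modality names are my own assumptions.

```python
# Sketch of a multimodal response router: speak short single answers,
# send lists or long content to a screen. Thresholds are arbitrary assumptions.
def choose_modality(response_items: list[str], has_screen: bool) -> str:
    if not has_screen:
        return "voice"   # no screen available: speak, however long
    if len(response_items) > 1:
        return "screen"  # comparing multiple options is faster visually
    if len(response_items[0].split()) > 20:
        return "screen"  # long prose is slow to listen to
    return "voice"       # a short single answer: just say it
```

For example, “It’s 18 degrees and sunny” would be spoken, while a list of ten flight options would go to the phone or watch screen.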


Also, in many cases a human respondent is, for now, still more intelligent and therefore faster (with the bot doing the mundane and preparatory tasks). So the handover from bot assistant 🤖 to human support 👨‍💻 needs to be seamless.

See for example how Audible, Amazon’s audiobook company, is Offering Live Customer Support Through Amazon Echo Devices.


10. Do you have the right expertise? Voice and Conversation Design Skills

Even though Voice is yet another digital interface, the skills for designing one are quite different from those for web and mobile.


Especially the psychology and dynamics behind a good dialogue are new, not to mention the NLP and AI components.

So if you have trouble formulating the right answers to the previous questions; get in touch via and we’ll schedule some time to talk!



Creating the Digital Experience Interface

Or a ‘cinematic’ web presence with a chatbot voiceover

[Image: Sketch for Digital Experience Interface]

Digital experiences are what fascinate and drive me. That’s why I admire new media art or #netart (e.g. by Rafaël Rozendaal) and I strive to create inspirational and lasting digital presences in whatever digital medium is applicable.

It’s with this in mind that I propose the idea of a web presence with cinematic traits and embedded chatbot, and critique the existing examples of digital experiences.


Let me first state that — in my opinion — it’s impossible to actually create or design an experience; a person has an experience. And unless you’re a geneticist or neuroscientist running a mind control experiment, you don’t get to determine the actual experience (sorry UX’ers 🙁).

You can however try to influence the experience: by telling a story, compelling actions/reactions, evoking emotions and designing the interface the person interacts with. I prefer to call this the Experience Interface, rather than the User Interface. It’s a broad vs. narrow definition, where for me the User Interface is the controls the user is given and the Experience Interface is the totality of the environment and how it evolves during the interaction.

A number of factors are important here.

Story and Cinema

Our evolutionary hardwired way of sharing experiences is storytelling. According to Wikipedia “the term ‘storytelling’ can refer in a narrow sense specifically to oral storytelling and also in a looser sense to techniques used in other media to unfold or disclose the narrative of a story.” It predates writing, but rock-art (as supporting material) may have served as a form of storytelling for many ancient cultures.

So it’s not story reading, not story watching, but story telling. And (again via Wikipedia):

“Crucial elements of stories and storytelling include plot, characters and narrative point of view.”

Cinema, or motion picture, is a very compelling way of storytelling, and basically a big step up from rock-art. It also uses some techniques that greatly enhance the conveying of a story, specifically: the establishing shot, panning, tilting and zooming.

To clarify: “An establishing shot in filmmaking and television production sets up, or establishes the context for a scene by showing the relationship between its important figures and objects. It is generally a long or extreme-long shot at the beginning of a scene indicating where, and sometimes when, the remainder of the scene takes place.” (Wikipedia)

Panning is swiveling the camera horizontally from a fixed position. Tilting is the same, but rotating vertically. And finally, zooming brings the object closer (close-up) or further away (wide shot).

To actually craft a good visual story we can take some pointers from screenwriting. In the chart below you see the ‘3 Act Structure’ as defined by the late Syd Field.

3 Act Structure by Syd Field, author of Screenplay


The main elements are the plot points and working your way towards the resolution. The ‘action’ and ‘confrontation’ shouldn’t be taken literally in the ‘physical’ sense; the confrontation can also be emotional, and the actions/plot points can manifest as certain insights during the progression of the story.

Utilizing these storytelling elements and techniques has proven to be an effective way of influencing human experience.


Maybe even more impactful than a story, and certainly essential to its dissemination, is the conversation about the story. The force of a story grows exponentially with the speed and ease with which it is shared (and no, this is not achieved simply by adding a ‘share’ button).

And with conversations there’s a lot of neuroscience at work.

Judith E. Glaser, in her book Conversational Intelligence, states: “Conversations have the power to change the brain — they stimulate the production of hormones and neurotransmitters, stimulate body systems and nerve pathways, and change our body’s chemistry, not just for a moment but perhaps for a lifetime.”

And “Conversations impact different parts of the brain in different ways, because different parts of the brain are listening for different things. By understanding the way conversations impact our listening we can determine how we listen — and how we listen determines how we interpret and make sense of our world.”

“Language plays a role in the brain’s capacity to expand perspectives and create a ‘feel-good’ experience” — Judith E. Glaser

The importance of this for every product is stressed by Ross Mayfield in his article Products Are Conversations. He states “Your customers want to talk with you […]. They want to be heard and want you to understand their needs. It’s your job to enable these conversations, and figure out how to have them at scale.”

I also like to borrow his quote:

“The single biggest problem in communication is the illusion that it has taken place”

Which he attributes to George Bernard Shaw, but according to Quote Investigator 🔍, it should be William H. Whyte 😉.

So the better conversation you have or incite, the better the experience, and the better your message gets across.

This is also what makes conversational interfaces (like chatbots and voice assistants) so powerful.

Experience preparation

Like language and visuals, there are other dramatic techniques you can use to influence an experience. For this I’d like to refer to Kile Ozier’s 5 tenets of Experience Creation (he can create experiences, he’s just that good):

N.B. The explanations contain my paraphrasing

  • Exploration of Assumption; what is the audience assuming when accessing the interface? And can you circumvent or overcome, or rather leverage and enhance, those assumptions?

  • Liberation of Preconception; preconception is a conversation(!) going on inside the head of the audience, in this case the user, reassuring them that they know what’s going to happen. If you can liberate them from this, you can give them a fresh experience and renewed excitement.

  • Comfortable Disorientation; feeling safe in not knowing what’s next. Effectively executed, this technique results in an immediate, deeper level of trust on the part of the audience, in this case the user, and an intangible yet greater willingness to suspend disbelief.

  • Successive Revelation; too much, up front, can completely overload the audience early and virtually numb them to further sensation, empathy or inspiration, leaving them inured to subtlety and nuance as the Story or Experience unfolds. You should try to shape the arc of storytelling by balancing curiosity and revelation.

  • Subliminal Engagement; inviting the audience, in this case the user, to participate in the creation of their own experience. Allow for the journey or journeys to be completed in the imaginations of audience members, in this case users.

So if you truly want to create a digital (user) experience, you’d better be willing to put up a show 🎭 !

Application to web presences

The above-mentioned fundamental elements and techniques with regard to (visual) storytelling and experience preparation are — in my opinion and perception — currently lacking from most digital experiences. Even the ones that are most dependent on story, like brand websites.

But there are some notable exceptions!

For example, the Mercedes-Benz campaign site for their new, all-electric vehicle, the EQC:

Animated GIF. See website for the full experience (n.b. on mobile the intro tilt doesn’t seem to work)


Let’s break it down:

“Electric now has a Mercedes”. Now there’s a story headline! And there is a beautiful establishing shot (with tilt). The story unfolds, reveal after reveal. There are pans and tilts (in intro and outro) and parallax scrolling (the top layer scrolling faster than the bottom layer to create depth)! All in all a visual feast, with some subliminal engagement: you control the scroll and can pan/tilt around in the establishing shot a bit.

What’s still lacking is conversation, or any meaningful interaction for that matter. Which is a shame, because it would have been a great opportunity for Mercedes-Benz to get a first impression of the user’s reaction to their new product.

On to another example: Apple’s iOS12 website, introducing their latest software update for the iPhone and iPad.

Animated GIF. See website for the full experience.


Again a breakdown: the establishing shot is reminiscent of the queue for the ride at The Wizarding World of Harry Potter at Universal Studios in Orlando.

[Image: Harry Potter portrait wall]

Talking Portraits and all. Some ‘Comfortable Disorientation’, to which Kile Ozier notes: “Theme parks strive for this all the time, often with what I call the Venice Effect; bringing guests through a queue that is often labyrinthine, usually feels a bit cramped — limited sightlines, low ceilings — to then be suddenly released into a space that seems vast by comparison.”

For a fluid version with sound, see YouTube


Of course, Apple here has the benefit of the iPhones/iPads as ‘natural’ frames. There’s some ‘Successive Revelation’ in introducing the different features with auto-playing animations, but other than that there’s little more experience to be had.

Again the lack of interaction and conversation is a missed opportunity.

And to show that it is possible, see the Typeform blog post on the rise of the conversational interface:

Animated GIF. See website for the full and personal experience


I encourage you to give it a try yourself by checking out the blog post and putting in your own responses. Even though the ‘conversation’ is preformatted, by making your response choices you get the feeling you’re sharing your view and engaging on a basic level.

I first saw this on Adrian Zumbrunnen’s personal website, which does allow open-ended input. This, at least for me, immediately raises the bar for engagement, because what happens if he responds in person 😳? So it definitely triggers an emotion on my side, which is a good thing.

Chatbots of course make scaled conversations like these possible and can gather a great deal of customer insight. And the Typeform example shows the potential storytelling power and enhanced experience when a chatbot is seamlessly integrated in the total experience interface.

The ideal

So an ideal digital experience interface, to me, should contain all the above-mentioned storytelling elements and utilize all the available techniques (in moderation, of course).

What I envision is basically a merger between the aforementioned Mercedes-Benz EQC site and the Typeform blog post:

The interface commences with setting the scene in a visually attractive and seductive way. Then the chatbot comes in as a voice-over, or narrator, and engages you by giving pointers on how to further explore the interface, triggering different user controls and visual attractions; unfolding the story with you. It asks questions and gives options, actively soliciting your feedback.

To be clear, the chatbot is not ‘on top of’ the site (as is often seen with ‘live chat’, completely disengaged from the site content and story) but embedded, taking or talking you through the story, actively engaging you and asking for your feedback in the moment.

And more…

If we could have all this, there’s one more thing that would be the icing on the cake 🍰: Sound. Or better ‘sonic branding’ 🔊.

Of course, if you add video or a podcast to a web presence, you have sound already. But I see it as a stand-alone feature, supporting the overall experience.

To clarify, sonic branding, or sound/audio branding, is “the strategic use of sound … in positively differentiating a product or service, enhancing recall, creating preference, building trust, and even increasing sales. Audio branding can tell you whether the brand is romantic and sensual, family-friendly and everyday, indulgent and luxurious, without ever hearing a word or seeing a picture. And it gives a brand an additional way to break through audiences’ shortened attention spans.” (Wikipedia)

And again, I’m not thinking of jingles, but of embedding sounds that you associate with the topic or product. To give you an idea, listen to 👂 the NIKE Freestyle video based on an idea by Jimmy Smith:

Wouldn’t you feel more engaged if you visited a website selling athletic shoes and heard this in the background?

I look forward to your feedback and hope to create some of these ideals with you.



Siri: “Which wife?”

Wait! What 😳? I asked: “Hey Siri. What can you tell me about my wife?” and was expecting to get some general information, like maybe her age, her occupation and perhaps some social media updates. As I was giving my wife some examples of voice commands, I most certainly did NOT expect to get Siri’s, Big Love-inspired(?), response: “Which wife?”.

When we both quickly looked up at the screen, it turned out that I had two separate contact entries for my wife’s name on my phone, and Siri needed to know which one I was looking for. So Siri didn’t have a problem hearing me, knew who my wife is and had the information available, but still stumbled on a seemingly minor detail.

So much for live demoing 😉, but it’s a fair example of the current state of voice assistants. A recent Digital Assistant IQ test put Google ahead of Apple’s Siri, but I don’t expect Google to have handled it much better.

However, I’m growing increasingly fond of using voice commands for certain tasks, as they just give a better (i.e. more convenient) experience than gesture or typed input. In general, repetitive daily tasks are the most likely to be more convenient through a voice command.

For me it’s “Turn on/off the lights”, “Set an alarm/timer for…”, “Is it going to rain?”, and (even if the answer to the former question is affirmative) “Start an activity: outdoor walking” (because I’m already holding the dog’s leash, the door keys 🔑, possibly an umbrella 🌂, and a ball 🎾 the dog 🐶 is trying to grab).

They’re basically cues or simple questions that I know my connected devices can handle (and don’t yet handle on their own automatically).

[Intermezzo regarding connected lights 💡: voice-operating your (connected) lights (in my case Philips Hue indoor & outdoor through Apple Home) is like using a landline ☎️ versus a smartphone 📱; the experience and range of possibilities are almost incomparable.]

Today (July 26, 2018), in the Netherlands, a slew of new ‘Skills’ became available on Google Assistant as they launched their Dutch language capability.

It included, among others, two national news broadcasters (@NOS, @RTL), our two airlines (@KLM, @Transavia), one of the biggest banks (@Rabobank) and two of the biggest energy suppliers (@Essent and @Eneco), which have connected thermostats. And, finally, our biggest grocery stores (@AlbertHeijn and @Jumbo), an online retailer and our national postal service (@PostNL).

The fact that all these big players are present at launch does give an indication of the expected potential of voice assistants. I haven’t been able to test them yet, but from what I understand from the various press releases the possible actions are mostly status updates and generic commands, some of which might make more sense (to me) than others. This is a good thing, because through experimenting with all these skills you can on the one hand experience the limitations and on the other see the possibilities of voice commands and interactions.

For understanding ‘Voice’ it’s especially important to get an idea of all the implications/variations of a simple question; as you can see in the screen recording below, Siri changes its understanding of my question as I speak.

[Image: Siri setting alarm demo]

For example, if I had paused a little longer after stating ‘6:30’, it would have set an alarm for the same day. You can try it yourself with any kind of voice command and see the different variations it goes through (at least with Apple’s Siri you can see it happen).
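The effect can be illustrated with a toy simulation: feed an utterance to a naive interpreter one word at a time and watch the hypothesis change. This is purely illustrative and vastly simpler than real incremental speech recognition; the rules below are made up.

```python
# Toy illustration of an assistant revising its interpretation as words arrive.
# The interpretation rules are invented for illustration only.
def interpret(partial: str) -> str:
    words = partial.lower().split()
    if "alarm" not in words:
        return "unknown"
    if "6:30" in words and "tomorrow" in words:
        return "alarm at 6:30 tomorrow"
    if "6:30" in words:
        return "alarm at 6:30 today"
    return "alarm at ?"

hypotheses = []
spoken = []
for word in "set an alarm for 6:30 tomorrow".split():
    spoken.append(word)
    hypotheses.append(interpret(" ".join(spoken)))
# The hypothesis shifts from 'today' to 'tomorrow' only on the final word,
# just like Siri rewriting its understanding mid-sentence.
```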

In my experience comparing voice interactions with typed or gestured interactions, some routines are just faster, while others are just cumbersome. This is keeping in mind the complete interaction: question and answer, where the latter can be simply executed (e.g. lights go on/off), spoken or displayed(!).

As Ben Sauer pointed out in his #OpenVoice presentation: “Using voice input is great because it’s faster than typing. [But] listening to voice output is hard because it’s slower than reading.”

The faster voice commanded, fully completed routines will stick.

Keeping this in mind, I look forward to a lot more voice-commanded (kitchen) appliances, so I don’t have to operate them with my dirty or otherwise occupied hands. And I can imagine that setting up devices through voice commands could give a much richer and more fulfilling experience.



A chatbot for news

Last year I got the opportunity to create an update for the chatbot of the Dutch news broadcaster NOS, and we launched it right before the end of the year.

It's a Facebook chatbot that gives you daily news updates and allows you to search their archives. You can find it here:

NOS Update chatbot

For details about the bot and how it is set up, I kindly refer you to the Medium article I wrote about it.


Experimenting with Chatbots

The best way to learn about bots is to build and experience them.

For my business bot I used, a chatbot creation platform that specifically allows you to create 'conversational articles' or 'convos'.

For my personal bot I used, a WYSIWYG chatbot creation platform with all the main Facebook Messenger capabilities and even some Artificial Intelligence functions.

Please click the links below to interact with each bot, or scan directly from Facebook Messenger:

Link to VIRVIE's bot


Link to Almar's bot



An updated web presence

Today, after more than 10 (!) years, I updated It was long overdue, but I was just too fond of the original design.

The original site, or VIRVIE vintage, is still available (just follow the link), because I think it's important to also preserve our digital heritage.

It also marks the end of VIRVIE's ventures in the web and app space. One of these ventures was the creation of PeRSoN.NeL. It started out as a web-based URL shortener, which allowed you to add mini artworks to your links. It then evolved into an Augmented Reality app which allowed a user to show these same artworks, now even in 3D, at their location in reality.

The evolution of PeRSoN.NeL


After this, the app pivoted into a diary app, which is still available from PRSN.NL, but no longer maintained.
