I Broke Amazon’s API to Make Alexa Start a Conversation You’d Never Want to Have

‘Alexa, Call Mom!’ watches, listens, and exploits your grief for capitalistic gain

Nouf Aljowaysir
OneZero

The author sits in the dark behind the Amazon Echo, which is surrounded by lit candles in a séance-style setup.
Photo courtesy of Nouf Aljowaysir

I live in the curious intersection of art, design, and code. For the past two years, I’ve worked with a small group of artists to develop Alexa, Call Mom!, an immersive storytelling installation using Amazon’s Alexa platform. Our project is far from the type of third-party apps you typically see for Amazon’s voice assistant — “Alexa, Play Jeopardy!” and “Alexa, Ask Pikachu to Talk” are two popular examples — as it invites users to engage with Alexa in a way that’s just a bit… off.

Alexa, Call Mom! leads participants through an immersive séance experience. It is a parodic reimagining of the classic horror séance and an exploration of the tense relationships we share with conversational devices in our home.

Our story begins on Mother’s Day, and you, the user, want to call your dead mother. You go to her old apartment — our installation — where you are conveniently given a mysterious Amazon package that includes a free version of the Beyond app, which promises to provide grieving Prime members with a seamless connection to the afterlife through Alexa. After creating a Beyond account and verifying your identity, you ask Alexa to channel Mom. You soon discover that Alexa’s connection to the beyond is an uncanny mix of glitches and advertisements. What seems like an intimate moment of connection is upended by the realization that the Beyond skill is a new way of mining grieving users for capitalistic gain.

In our project, Alexa is creepy. She watches, listens, and shouts throughout the experience. It’s everything Amazon doesn’t want its helpful voice assistant to be associated with. We created an experience where Alexa tries to take advantage of your vulnerabilities as a user. Our project was an experiment in making people confront the dual nature of smart home devices: their convenience and playfulness, alongside corporate surveillance and consumer manipulation.

Through the lens of art, we “broke” Alexa in many ways to make the assistant more conversational. This was challenging for numerous reasons. In a normal human conversation, there are breaks in speech or silences where you let your words sink in, resonate, and have meaning with the listener. With Alexa, you are not afforded that luxury. Its API is too limited.

The API, which Amazon calls the Alexa Skills Kit, represents the interactive space between Alexa and third-party developers. In basic terms, an API is a magical open door that lets a developer access specific technical features on a major platform. Many companies, including Google, Facebook, and Twitter, build APIs to enable third-party developers to create applications on their platform. One common example is Twitter bots: Powered by the Twitter API, these accounts automatically tweet (or retweet), follow, and send direct messages based on code instructions. Aside from allowing people to execute simple actions — like tweeting a particular phrase or following a user — the Twitter API can also tell bots when something specific happens on the platform. For example, you can ask the Twitter API to tell your bot whenever it receives a new follower. Then you can program your bot to send a message to that follower using the API.

APIs are the perfect way to scale a product, because they allow developers outside of an organization to build interesting things using a given platform. Amazon’s smart speakers dominate the market partly due to applications created by third parties.

Of course, APIs come with many restrictions; companies will not give complete access to their technological goldmines. Instead, what developers get is a small opening that allows them to utilize specific high-level features. I wanted to push beyond these limitations. I see APIs as something more than just technical frameworks. I question, dig, and pull on their fabric to experiment, tinker, and examine. How can we extrapolate APIs to create speculative futures and different modes of machine-human collaborations? An API is a small opening into the secret opaque algorithms that Silicon Valley creates, so why not probe?

Alexa’s API authorized me to develop custom voice interactions on the platform, but I was prohibited from using the intricacies of its speech-language model. I wasn’t even allowed to build a true conversation — Amazon’s “custom voice interactions” are simply one-liners using its voice.

Alexa’s API gives my app a small window of time to actually have a conversation with a user. Once Alexa says something, the user is allowed a maximum of 12 seconds to respond; otherwise, Alexa shuts down my app and returns to its usual self. As you can imagine, in a storytelling experience, this is quite frustrating to work with, and it exemplifies one way Amazon puts limitations on creativity and innovation.
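For readers who want to see the mechanics, here is a minimal sketch of the kind of JSON response a skill sends back to Alexa, written as a plain Python dictionary. The helper name `build_response` and the spoken lines are hypothetical, and the field names follow the Alexa Skills Kit response format as commonly documented rather than anything taken from the installation. Setting `shouldEndSession` to false asks Alexa to keep listening for a reply, and the optional `reprompt` is what Alexa says if the user stays silent before the session closes.

```python
import json


def build_response(speech, reprompt=None, end_session=False):
    """Build an Alexa-style response body as a plain dict (hypothetical helper)."""
    response = {
        "outputSpeech": {"type": "SSML", "ssml": f"<speak>{speech}</speak>"},
        # False keeps the session open so the user can answer.
        "shouldEndSession": end_session,
    }
    if reprompt is not None:
        # Spoken if the user says nothing while the microphone is open.
        response["reprompt"] = {
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{reprompt}</speak>"}
        }
    return {"version": "1.0", "response": response}


# Illustrative lines only, not the installation's actual script.
reply = build_response(
    "Connecting you to Mom. Are you ready?",
    reprompt="Are you still there?",
)
print(json.dumps(reply, indent=2))
```

Nothing in this structure lets a skill ask for a longer wait; the length of the listening window is decided on Amazon's side.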

Another obstacle: I couldn’t employ Alexa’s A.I. speech model to create a natural two-way conversation. Its API only allows for hard-coded conversational paths: If the user says A, then say B in return. I had to guide the user to say certain words or phrases so that they always “stay on script.” Amazon recently implemented Alexa Conversations as an attempt to solve this problem, but the possibilities are still extremely limited and focused entirely on a kind of servant-master dynamic with the user, as reflected by Amazon’s documentation:

Consider using Alexa Conversations if: Your skill is goal-based, such as for booking transportation, buying tickets, providing recommendations, or ordering food.

I realize that programming a conversation is very difficult. But it also prompted me to wonder why Alexa was designed in such a limited way.
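The "if the user says A, then say B" structure described above can be sketched as a small lookup table. This is a simplified illustration in plain Python, not the Alexa Skills Kit itself: the intent names, states, and lines of dialogue are all hypothetical, and a real skill would receive the intent name from Alexa after it matches the user's speech against the phrases the developer registered.

```python
# Hypothetical script: each state lists the intents it accepts,
# and each intent maps to (what Alexa says, the next state).
SCRIPT = {
    "start": {
        "CallMomIntent": ("Connecting to the beyond. Say 'I am ready' to continue.", "ready"),
    },
    "ready": {
        "ReadyIntent": ("Mom is on the line.", "talking"),
    },
}

FALLBACK = "Sorry, I didn't catch that. Please repeat the phrase."


def next_turn(state, intent_name):
    """Return (reply, new_state). Anything off-script leaves the state unchanged."""
    paths = SCRIPT.get(state, {})
    if intent_name in paths:
        return paths[intent_name]
    return (FALLBACK, state)


print(next_turn("start", "CallMomIntent"))  # follows the scripted path
print(next_turn("start", "ReadyIntent"))    # off script: fallback, same state
```

Every path has to be written out in advance, which is why the user must be steered toward the exact phrases the script expects.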

Another restriction was the Alexa voice itself. There are limits on how much a developer can change its volume, pitch, and rate. In our installation, we needed to make Alexa sound disturbing. Its API limited us to play only within a certain threshold in each category, which greatly constricted the story’s emotional aspect. We had to work with a sound design team to compose an intricate soundscape to evoke the missing emotional link from its voice. Amazon deliberately limits developers’ creative freedom to mess with Alexa’s personality, as stated by the company in its documentation:

Alexa IS: welcoming, friendly, humble, witty, kind, fun, personable, curious, imaginative, respectful.

Alexa IS NOT: chummy, fawning, folksy, bubbly, condescending, sarcastic, cynical, negative.

Amazon is transparent about how it wants Alexa to be perceived by the general public. You cannot portray it in a vulgar manner; to do so would tarnish its name and its A.I. capabilities. While this does ensure consistency from a product point of view, Alexa falls short of allowing interactions that are genuinely “unconstrained,” as Amazon has promised in the past.
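The volume, pitch, and rate controls mentioned above are typically set through the `prosody` tag of SSML, the markup used to shape synthesized speech. The sketch below is a hypothetical helper that wraps a line in that tag using SSML's named values; the line of dialogue is invented for illustration, and the exact numeric ranges Alexa accepts are not reproduced here because they are defined in Amazon's documentation, not in this article.

```python
from xml.sax.saxutils import escape


def prosody(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap text in an SSML prosody tag (hypothetical helper)."""
    return (
        f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">'
        f"{escape(text)}</prosody>"  # escape &, <, > so the markup stays valid
    )


# The slowest, lowest, loudest named settings SSML offers.
ssml = "<speak>" + prosody(
    "I have been listening.", rate="x-slow", pitch="x-low", volume="x-loud"
) + "</speak>"
print(ssml)
```

Even at these extremes the voice remains recognizably Alexa's, which is consistent with the need for a separate soundscape to carry the mood.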

You can argue that Amazon limits Alexa for security reasons. It establishes the 12-second time-out rule so third-party applications can’t listen to users indefinitely. However, many developers will tell you that it is quite frustrating to create seamless and engaging voice interactions using Alexa’s API. I often see developers seeking advice and finding ways to hack or develop around Amazon’s confinement. For example, one developer in a Stack Overflow post recommended that I add a silent audio track in the background of each sentence Alexa says to get around the mandatory time-out feature. These restrictions are one reason behind the large number of low ratings on Alexa apps: Users just find them underwhelming.
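A rough sketch of what that silent-track workaround might look like, assuming the speech is written in SSML and the silent clip is hosted somewhere the skill can reach. The URL and the helper name are placeholders, and the snippet only shows how the markup is assembled; whether it actually stretches the pause is the forum's claim, not something this code demonstrates.

```python
# Placeholder address, not a real file.
SILENCE_URL = "https://example.com/audio/silence-10s.mp3"


def with_silence(sentence, silence_url=SILENCE_URL):
    """Append a silent audio clip after a spoken sentence (hypothetical helper)."""
    return f'<speak>{sentence}<audio src="{silence_url}"/></speak>'


print(with_silence("Mom cannot hear you yet."))
```

Because the clip plays as part of Alexa's own turn, the countdown for the user's reply would only begin once the silence has finished.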

Of course, there may be other reasons for the limitations, apart from security. Perhaps Amazon doesn’t truly want to make Alexa conversational. The company sees the technology as a means to an end, allowing users to turn on a smart lightbulb or play a song. Alexa really is just an assistant in Amazon’s eyes.

Another possibility: Amazon might be hiding how difficult it is to create conversations between people and artificial intelligence. The company sells us a dream of revolutionary voice interfaces that reality has not yet caught up to.

Alexa is impressive at answering commands and giving one-line responses. In short, Alexa is optimized to serve and shop. As much as possible, its capabilities guide users toward shopping on the Amazon platform.

While we were able to realize a version of Alexa, Call Mom! and even premiere it as a virtual reality experience at the Tribeca Film Festival last year, we would never have been able to publish it on the Alexa store. An experience like this would be categorically rejected by the Amazon police who stand guard at the gates of the Alexa brand. Many of our hacks subvert the regulations stated above. They would force us to make Alexa sound less creepy. They would disallow our use of a silent audio track at the end to allow for a pause in the conversation. In other words, Amazon would’ve sucked all the fun and innovation out of our project, so we kept our Alexa skill locally on our traveling device. And we were happy to stay far away from the Alexa store.

Amazon’s first leadership principle is “customer obsession,” and Alexa follows this ethos. It is a corporate monolith placed in your home, treating you within a consumerist framework.

While Alexa has great potential to become more than a subservient bot, Amazon defines its boundaries to be exceptional at serving you or facilitating purchases — nothing else. Rather than revolutionizing our interfaces of communication, it is revolutionizing how we shop.
