Building a voice interface for a spoken word platform




In August 2019, my team and I were commissioned to build the voice app for the podcast-streaming startup Luminary, "The Netflix of Podcasts".

We aimed to make this the most natural surface for their product, a voice interface for a voice product.

The main objective was enabling frictionless consumption of media, providing a seamless experience across any kind of device.


We created the first Alexa Skill to offer seamless subscriptions across devices and surfaces. Ubiquity, from smart headphones to the home stereo.


The app helped Luminary acquire hundreds of new subscribers within the first few hours of being live. It was subsequently picked up by media outlets and featured by Amazon.


One thing was clear in the summer of 2019: Luminary was at the top of everyone's mind in the media world. With one hundred million dollars to spend and a bold message behind their value prop, "Podcasts shouldn't have ads", they quickly polarized public opinion and captured big names within the industry.

The backlash was swift, and many top podcasts withdrew from the platform. Harsh headlines rolled in...

Stakes were at an all-time high when we were hired. Luminary needed to prove they were worth their cost.

We were asked to take them "everywhere", from smart headphones to the car to home Echo devices. Make Luminary ubiquitous.


Summer 2019, NY and Chicago

Designing the Process

Because of the size and scope of the project, we needed some serious thinking behind our tooling and design process.

As the design director in charge, I had already been considering this for a while, and I outlined a new way to approach conversational design for products, consisting of the following five steps.

Research & Alignment

We talked with several users and podcasters, and reviewed the documents that Luminary had already produced about their users.

We took a deep dive into the Alexa Skill Store and the Actions for Google Assistant, as a form of "competitive landscape research".

All of this was compiled into a presentation with the different insights and what would become our guiding POV:

"Less But Better"


Features &
Contextual Scenarios

This is a key document when designing a conversational / voice experience. Coming out of the previous step, we have a guideline for the main features.

During this step we identify all the different states between features and user types.

This is how we flesh out how each feature works, and how the user navigates between them.

This is not a very sexy document, but it's crucial to align on this early in order to move ahead.



We then plotted these scenarios on an emerging user journey chart, which helps us surface gaps and areas of opportunity.
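The feature-by-user-type exercise can be sketched in code. Below is a minimal, hypothetical sketch: the feature names, user types, and covered scenarios are illustrative placeholders, not Luminary's actual matrix, but the idea is the same: enumerate every combination and flag the ones without a written scenario yet.

```python
from itertools import product

# Hypothetical features and user types, for illustration only.
FEATURES = ["play_episode", "resume", "subscribe", "search"]
USER_TYPES = ["anonymous", "free", "premium"]

# (feature, user_type) pairs that already have a scenario written up.
covered = {
    ("play_episode", "premium"),
    ("resume", "premium"),
    ("subscribe", "anonymous"),
    ("subscribe", "free"),
}

def missing_scenarios(features, user_types, covered):
    """Return every feature/user-type pair that still lacks a scenario."""
    return [pair for pair in product(features, user_types) if pair not in covered]

gaps = missing_scenarios(FEATURES, USER_TYPES, covered)
print(f"{len(gaps)} scenarios still to flesh out")
```

Plotting those gaps against the journey chart is what makes the holes visible early, before any scripting begins.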

Milestones and Scripts

Once we've aligned with the client on the different features, and they understand and approve the scenarios, we have to create scripts for each of these situations.

These Script documents can get lengthy, confusing, boring, and hard to keep track of.
After several failures, I rallied the dev troops and got approval to build a (very minimal) tool to share scripts with the client.

What you see here is an early wireframe prototype of the script reader.

The main advantage to this tool is that you can hear the different scenarios (instead of reading), and you can easily jump from one to the next.
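The underlying data model for a tool like this can be quite small. The sketch below is a hypothetical reconstruction, not the actual tool's code: each scenario is just an ordered list of turns with a speaker, which makes it trivial to render as text, feed to a text-to-speech engine, or jump between scenarios.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str  # e.g. "user" or "alexa"
    text: str     # the line exactly as it would be spoken

@dataclass
class Scenario:
    title: str
    turns: list = field(default_factory=list)

    def transcript(self):
        """Render the script the way a reader would display it."""
        return "\n".join(f"{t.speaker.upper()}: {t.text}" for t in self.turns)

# Hypothetical sample scenario (not an actual Luminary script).
resume = Scenario("Resume playback", [
    Turn("user", "Alexa, ask Luminary to resume my podcast."),
    Turn("alexa", "Resuming where you left off."),
])
print(resume.transcript())
```

Because every line carries its speaker, the same structure drives both the readable transcript and the audio playback that made the tool useful for clients.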


Wireframes and Hi-Fi

Once all the scenarios are milestoned and scripted, we read through them and identify where the user would benefit the most from a screen.

As a rule of thumb, there's usually one screen per scenario, but each of these might contain a variety of dynamic objects.

Amazon Echo devices impose many constraints when it comes to visual design, limiting the font choices to just two and placing limits on how much information can be conveyed at each point. But we embraced those constraints.


State Machine Diagram & Voice User Interface (VUI)

The last step of the design process is producing the technical documentation that describes how everything connects, how the logic works, and what the response should be to each interaction in every state.

This is a hefty document, mostly intended for developers, but is digestible enough for Product Owners.

These two are the documents where most designers begin. In most agencies, they don't write scripts or think about states as contextual and interrelated, but rather as linear progressions, which in most cases creates experiences that resemble an IVR, like the ones we get when we call customer service for almost any company.
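The difference between a contextual state machine and a linear IVR is easy to see in code. Here is a minimal sketch, with hypothetical state and intent names: each state maps intents to next states, so the same intent ("play") can mean different things depending on where the user is, which a fixed linear flow cannot express.

```python
# Hypothetical states and intents, for illustration only.
# Each state maps the intents it understands to the next state.
TRANSITIONS = {
    "home":      {"PlayIntent": "playing", "SearchIntent": "searching"},
    "playing":   {"PauseIntent": "paused", "NextIntent": "playing"},
    "paused":    {"PlayIntent": "playing", "StopIntent": "home"},
    "searching": {"SelectIntent": "playing", "CancelIntent": "home"},
}

def next_state(state, intent):
    """Resolve an intent against the current state; unknown intents stay put."""
    return TRANSITIONS.get(state, {}).get(intent, state)

# "PlayIntent" starts fresh from "home" but resumes from "paused":
# the response depends on context, not on a fixed sequence of prompts.
state = "home"
for intent in ["PlayIntent", "PauseIntent", "PlayIntent"]:
    state = next_state(state, intent)
```

The state machine diagram in the documentation is essentially this table drawn out, one cell per interaction per state.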


Establishing Metrics

The very last step before development was aligning on and tagging all the basic events and properties, and how they funneled into KPIs.

There was a lot of back and forth with the different members of the team, all the way to our client's CFO. The key to this process was transparency and accuracy.

We used mParticle, which is very different from Google Analytics. This implied a slight learning curve for the whole team, and was an exercise in applying constraints and thinking about how to best shape more abstract KPIs.

The following image shows some of the documents we used to socialize the different metrics and KPIs.
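To make the event-to-KPI relationship concrete, here is a minimal sketch of how tagged events roll up into a funnel. The event names and funnel steps are hypothetical, not the actual tracking plan, and real collection happened through mParticle's SDK; this only illustrates the shape of the exercise.

```python
# Hypothetical tagged events, each with a name and the user who fired it.
EVENTS = [
    {"name": "skill_launched", "user": "a"},
    {"name": "episode_played", "user": "a"},
    {"name": "skill_launched", "user": "b"},
    {"name": "subscription_started", "user": "a"},
]

# An illustrative KPI funnel: launch -> listen -> subscribe.
FUNNEL = ["skill_launched", "episode_played", "subscription_started"]

def funnel_counts(events, funnel):
    """Count distinct users who reached each funnel step."""
    users_by_step = {step: set() for step in funnel}
    for event in events:
        if event["name"] in users_by_step:
            users_by_step[event["name"]].add(event["user"])
    return {step: len(users) for step, users in users_by_step.items()}
```

Agreeing on exactly which events feed which step, and naming them consistently across teams, was where most of the back and forth happened.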


The Product

Seamless from Car to Headphones


Several hundred new subscribers acquired in the first few hours of going live
User engagement with the Luminary platform in general increased 34%
5-Star Rating in the Alexa Skill Store

Featured Alexa Skill


Picked up by Media


TechCrunch






Podcast-industry-specific media

Some Learnings

- Don't be afraid of changing the process

- Think big, start small

- Cooperation over hierarchy

- Include everyone (even the client) in dailies

Our Team

- Yarden Abukasis, PM

- Paul Zumbrink, Design Director

- Lauren Madsen, Skill Designer

- Shanna Walia, Strategy Analyst

To comply with my non-disclosure agreement, I have omitted and obfuscated confidential information in this case study.
All information in this case study is my own and does not necessarily reflect the views of RAIN or Luminary.

© 2020 Paul Zumbrink. All rights reserved.