Wednesday, November 12, 2014

Listen.ai: Voice Interfaces for the Internet of Things

Technology giants and several valley startups alike are betting on one of the next big things in technology - the Internet of Things (IoT). Gartner projects that some 26 billion devices will be connected to the IoT by 2020. These devices will carry various sensors and actuators and will adorn our homes, offices, bodies and vehicles. Virtual or embodied, and connected to the internet, they will offer direct and subtle affordances that let end users perform daily activities more efficiently.

Wit.ai, a valley startup that provides voice-enabled intent APIs built on speech understanding technology, hosted the Listen.ai conference, an august gathering of technologists, at a wine tasting facility in San Francisco on Nov 6th, 2014. The conference focused on technologies supporting the voice modality for the IoT and for personal assistants. The speakers included Siri co-founder Adam Cheyer (other Siri alumni were in the audience), Oren Jacob, formerly of Pixar and CEO of the story-telling kids' animation startup ToyTalk, former Nuance Communications CEO Ron Croen, Jibo technologist Roberto Pieraccini, Stanford linguist Dan Jurafsky, Pebble founder Eric Migicovsky, ex-VP of Google Now Vishal Verma, and Wit.ai CEO Alex Lebrun, among others, across the panel sessions and keynotes.
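
Wit.ai's pitch, in essence, is that a developer sends raw user text (or a speech transcript) to a hosted endpoint and gets back a structured intent. Below is a minimal sketch of that flow in Python; the access token is a placeholder, and the exact response fields may differ from what the Wit.ai API returns.

    import requests

    WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder, not a real token

    def extract_intent(utterance):
        """Send an utterance to Wit.ai's /message endpoint, return parsed JSON."""
        resp = requests.get(
            "https://api.wit.ai/message",
            params={"q": utterance},
            headers={"Authorization": "Bearer " + WIT_TOKEN},
        )
        resp.raise_for_status()
        # At the time of writing, the response carried an 'outcomes' list
        # with the recognized intent and the extracted entities.
        return resp.json()

    print(extract_intent("turn on the living room lights"))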

Adam Cheyer gave a very insightful keynote tracing back from Siri's acquisition by Apple in 2010 to the early days at SRI, where Siri and several similar AI technologies were conceived and developed. In his 'Back to the Future' presentation, he suggested that today's Siri is a small fraction of what the technology was during the PAL/CALO days at SRI, which had support for multi-modal interfaces, deep reasoning systems, active ontologies, learning in the wild and open agent architectures. He gave a solid high-level overview of the Siri technology of that era: rather than a general Q&A system, the team had built intelligent interfaces with domain and task models driving 'Do-engines', orchestrating services across more than 40 different providers. He also mentioned his new startup project, Viv, which is working on a global brain architecture that could resolve complex queries in the wild and learn new concepts seamlessly.
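
To make the 'Do-engine' idea concrete, here is an illustrative sketch (my own, not Siri's actual design): a registry that maps a (domain, task) pair to one of many service providers, with the orchestration layer simply picking the right handler. All the names here are made up.

    # Illustrative only: a toy domain/task registry in the spirit of
    # Cheyer's description of 'Do-engines' and service orchestration.

    class DoEngine:
        def __init__(self):
            self.providers = {}  # (domain, task) -> handler function

        def register(self, domain, task, handler):
            self.providers[(domain, task)] = handler

        def execute(self, domain, task, **slots):
            handler = self.providers.get((domain, task))
            if handler is None:
                raise LookupError("no provider for %s/%s" % (domain, task))
            return handler(**slots)

    # A hypothetical provider; a production system orchestrated 40+ of these.
    def reserve_table(restaurant, time):
        return "Reserved a table at %s for %s" % (restaurant, time)

    engine = DoEngine()
    engine.register("dining", "reserve", reserve_table)
    print(engine.execute("dining", "reserve", restaurant="Quince", time="7pm"))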

The next panel session, on 'How personal assistants will shape the future of IoT', was perhaps the most interesting. The panelists debated whether a single assistant would emerge as the all-encompassing interface for every personal assistance need, or whether several systems would each do their own thing. No clear winner emerged, but the consensus leaned towards multiple specialized systems. Just as my car mechanic cannot suggest an exotic restaurant for a dinner date, one solution cannot interpret all our complex needs across so many different domains and verticals. Some things can be automated (roughly 25%), like setting a morning alarm; the rest (roughly 75%) require complex interpretation. The voice modality has significantly reduced the cognitive load on the end user, but that reduction has translated into increased complexity for the systems that turn user needs into actionable intents. Everyone agreed that expectation management is a huge problem for today's intelligent assistants (think of the endless sessions we spend asking Siri weird stuff). Adam mentioned that he fails to envision the technology behind building Samantha from the movie 'Her', who in one of her moments just wants to watch Joaquin Phoenix sleep - what could Samantha possibly learn from such an observation? There were other interesting discussions about user models, business models and context understanding.
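
As a toy illustration of the 'automatable 25%', consider a recognized set-alarm intent mapped straight to an action, with everything else falling through to a much harder interpretation pipeline. The intent dictionary below is hypothetical, loosely modeled on the kind of output an intent API returns.

    from datetime import datetime

    def handle(parsed):
        """Dispatch a parsed intent; the dict shape here is made up."""
        if parsed.get("intent") == "set_alarm":
            when = datetime.strptime(parsed["time"], "%H:%M")
            return "Alarm set for " + when.strftime("%I:%M %p")
        # The other ~75% of requests would need far deeper interpretation
        # (context, user models, multi-domain reasoning) than a lookup.
        return "Sorry, I can't handle that yet."

    print(handle({"intent": "set_alarm", "time": "07:30"}))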

Dan Jurafsky gave an amazingly appetizing talk just before lunch on 'The Language of Food', linking food and language with history, geography and sex. I received my copy of the book just a couple of days ago!

There were other very interesting talks and panel sessions on the history of speech technologies, on story-telling animated apps for kids, and on user interfaces of the future. Oren from ToyTalk described how difficult it is to understand intents from kids, and said the industry needs to do much more to build models customized for interaction with children. The topic of emotion understanding came up several times during the day, and people agreed it is a huge opportunity space for the industry. Another interesting discussion was whether personal assistants should have a character and personality - the Siri vs. 'OK Google' debate. The distinguished panelists' experiences suggested that giving systems character and personality is a very hard problem, but one that gives them a lot of mileage in the long run. You still remember R2D2, don't you?