Saurav Sahay's blog

Wednesday, November 12, 2014

Listen.ai: Voice Interfaces for the Internet of Things

Both technology giants and several startups in the valley are betting on one of the next big things in technology: the Internet of Things (IoT). Gartner projects that some 26 billion devices will be connected to the IoT by 2020. These devices will have various sensors and actuators and will adorn our homes, offices, our bodies and our vehicles. They will be virtual or embodied, connected to the internet, and will offer end users direct and subtle affordances for performing daily activities more efficiently.

Wit.ai, a valley startup providing voice-enabled intent APIs via speech understanding technology, hosted the Listen.ai conference, an august gathering of technologists at a wine tasting facility in San Francisco, on Nov 6th, 2014. The conference focussed on technologies for supporting the voice modality for IoT and personal assistants. The speakers, spread across panel sessions and keynotes, included Adam Cheyer, one of the founders of Siri (other Siri founders were in the audience); Oren Jacob, formerly of Pixar and now CEO of ToyTalk, a story-telling animation startup for kids; Ron Croen, former CEO of Nuance Communications; Jibo technologist Roberto Pieraccini; Stanford linguist Dan Jurafsky; Pebble founder Eric Migicovsky; ex-VP of Google Now Vishal Verma; and Wit.ai CEO Alex Lebrun, among others.

Adam Cheyer gave a very insightful talk tracing back from Siri's acquisition by Apple in 2010 to the early days at SRI, where Siri and several similar AI technologies were conceived and developed. In his 'Back to the Future' keynote, he suggested that Siri today is a small fraction of what it was during the PAL/CALO days at SRI, when it supported multi-modal interfaces, deep reasoning systems, active ontologies, learning in the wild and open agent architectures. He gave a solid high-level overview of the Siri technology back then, and pointed out that they had not built a general Q&A system but intelligent interfaces with domain and task models for 'Do-engines' and service orchestration across more than 40 different service providers. He also mentioned his new startup project, Viv, which is building a global brain architecture that could resolve complex queries in the wild and learn new concepts seamlessly.

The next panel session was the most interesting, with discussions on 'How personal assistants will shape the future of IoT'. There was a debate on whether one assistant would emerge as the all-encompassing interface for all personal assistance needs or whether there would be several systems each doing their own thing. No side clearly won the debate, but the ball seemed to roll towards multiple-system solutions. Just as my car mechanic cannot suggest an exotic restaurant for a dinner date, one solution cannot interpret all our complex needs across so many different domains and verticals. There are things that can be automated (the ~25%), like setting up alarms in the morning, and there are other things (~75%) that require complex interpretation. With the voice modality, the cognitive load on the end user has reduced significantly, but that has translated into increased complexity in the systems that turn user needs into actionable intents. Everyone agreed that expectation management is a huge problem for these intelligent assistants today (think of the endless sessions we spend asking Siri weird stuff). Adam mentioned how he fails to envision the part of the technology behind Samantha from the movie 'Her' that would just want to watch Joaquin Phoenix sleep in one of her moments. What could Samantha possibly learn from such an observation? There were other interesting discussions about user models, business models and context understanding.

Dan Jurafsky gave an amazingly appetizing talk just before lunch on 'The Language of Food' linking food and language with history, geography and sex. I just received my copy of the book a couple of days back!

There were other very interesting talks and panel sessions elaborating on the history of speech technologies, on story-telling animated apps for kids (ex-Pixar) and discussions on user interfaces for the future. Oren from ToyTalk described how difficult it is to understand intents from kids and that the industry needs to do much more to build various models customized for interaction with kids. At several places during the day, the topic of Emotion understanding came up and people agreed that it's a huge opportunity space for industry. Another interesting discussion was about whether Personal Assistants should have a character and personality or not - Siri vs. Okay Google debate. Experiences from the distinguished panelists suggested that it is a very hard problem to have character and personality in systems, but this gives a lot of mileage to systems in the long run. You still remember R2D2, don't you?

Tuesday, November 26, 2013

What differentiates you from others?

Monday, November 05, 2012

Markov's take on Teaching

"The alleged opinion that studies in seminars [in classes] are of the highest scientific nature, while exercises in solving problems are of the lowest [rank], is unfair. Mathematics to a considerable extent consists in solving problems, [and] together with proper discussion, [this] can be of the highest scientific nature while studies in ... seminars might be of the lowest [rank]."

Source: http://www.sciencedirect.com/science/article/pii/S0024379504000357

Sunday, July 01, 2012

NIST Bigdata workshop

Facebook generates user logs of size 130TB/day and pictures of size 300TB/day. Google generates >25PB/day of processed data. Bigdata is about storage, processing and analysis of large amounts of data.

In this NIST-organized, one-and-a-half-day bigdata workshop, many stalwarts of computing along with other industry representatives came together to present and discuss the current infrastructure, technology and solutions in the bigdata space. Several people were invited to give talks (http://www.nist.gov/itl/ssd/is/upload/BIG-DATA-Workshop-may25.pdf). The talks that I found interesting in the workshop were given by these people:

Ian Foster is a Distinguished Fellow and the Associate Division Director in the Mathematics and Computer Science Division at Argonne National Laboratory, where he leads the Distributed Systems Laboratory. He is known as the ‘father of the grid’. He described the Globus Toolkit (GT) project that has been developed since the late 1990s to support the development of service-oriented distributed computing applications and infrastructures. Core GT components address, within a common framework, basic issues relating to security, resource access, resource management, data movement, resource discovery, and so forth. These components enable a broader “Globus ecosystem” of tools and components that build on, or interoperate with, core GT functionality to provide a wide range of useful application-level functions. These tools have in turn been used to develop a wide range of both “Grid” infrastructures and distributed applications.

Michael Stonebraker (developer of the Postgres RDBMS, former CTO of Informix, founder of several database startups) is a professor of Computer Science at MIT. Stonebraker has been a strong critic of the NoSQL movement and suggests that Hadoop based systems are a ‘non-starter’ for bigdata scientific problems. In his view, Hadoop based systems are only useful for embarrassingly parallel computations (such as parallel grep).
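To make "embarrassingly parallel" concrete, here is a minimal sketch of a parallel grep. It is only an illustration of the shape of such a computation, not how Hadoop implements it: the function names, the chunk size and the sample log are all made up for this example, and threads are used purely to keep the sketch self-contained (they do not give real CPU parallelism in CPython).

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_chunk(pattern, lines):
    """Return the lines in one chunk that match the pattern (the 'map' step)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def parallel_grep(pattern, lines, workers=4, chunk_size=2):
    """Split the input into independent chunks and grep them concurrently."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so results come back in input order.
        results = list(pool.map(lambda chunk: grep_chunk(pattern, chunk), chunks))
    # The 'reduce' step is plain concatenation; no chunk depends on another.
    return [line for chunk in results for line in chunk]


# Hypothetical sample data, invented for this sketch.
log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /gone 404",
    "GET /about 200",
]
matches = parallel_grep(r" 404$", log)
print(matches)
```

The point is that each chunk can be processed without knowing anything about the others, which is why this kind of job splits so easily across machines. Workloads that need joins, iteration or shared state between chunks do not decompose this cleanly.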

Stonebraker primarily talked about his efforts and involvement with two projects, VoltDB (commercial startup) and SciDB (open-source scientific database system).

(Age of Data; source: VoltDB)

In the graphic above, time is represented on the horizontal axis. To the far left is the point at which data is created. The “things” we do with data are strongly correlated to its age and some usecases are shown above. Just after data is created, it is highly interactive. We want to perform high velocity operations on that data at this stage – how fast can we place a trade, or serve an ad, or inspect a record? Shortly after creation, we are often interested in a specific data instance relative to other data that has also arrived recently – this type of analytics is referred to as real time analytics. As data begins to age, our interest often changes from “hot” analytics to record-level storage and retrieval – store this URL, retrieve this user profile, etc.
Ultimately data becomes useful in a historical context. Organizations have found countless ways to gain valuable insights – trends, patterns, anomalies – from data over long timelines. Business intelligence, reporting, back testing are all examples of what we do to extract value from historical data.

The above graphic (source: VoltDB) shows the different technologies used today for different application use cases along the data value chain.

VoltDB (pitched as 100x faster than standard SQL databases) is a blazingly fast in-memory relational database management system (RDBMS) designed to run on modern scale-out computing infrastructures. VoltDB is aimed at a new generation of high velocity database applications that require:

· Database throughput reaching millions of operations per second

· On demand scaling

· High availability, fault tolerance and database durability

· Realtime data analytics

SciDB is a new open-source data management system intended primarily for use in application domains that involve very large (petabyte) scale array data; for example, scientific applications such as astronomy, remote sensing and climate modeling, bio-science information management, as well as commercial applications such as risk management systems in the financial services sector, and the analysis of web log data. SciDB is not optimized for online transaction processing (OLTP); it only minimally supports transactions at all. It does not provide strict atomicity, consistency, isolation, and durability (ACID) constraints. It does not have a rigidly-defined, difficult-to-modify schema. Instead, SciDB is built around analytics. Storage is write-once, read-many. Bulk loads, rather than single-row inserts, are the primary input method. "Load-free" access to minimally-structured data is provided.

Stonebraker mentioned applications such as high frequency, high volume trading, sensor tagging and real time global position assembly as the right candidates for database oriented bigdata problems. Such systems are a good fit for cases involving pattern finding in a firehose, complex event processing, real time high performance OLTP, as well as data with ‘Big Variety’ (see Daniel Bruckner and Michael Stonebraker. Curating Data at Scale: The Data Tamer System).

Michael Franklin is a Professor of Computer Science at UC Berkeley, specializing in large-scale data management infrastructure and applications. He is the director of AMPLab (See overview slides here: http://www.scribd.com/doc/58637242/ ) at UCB.

Some bigdata projects and systems he mentioned:

Bigdata processing - pregel, dryad, hadoop, M, hbase, mahout, hypertable, cassandra

Bigdata interfaces: Pig, Hive

AMPLab projects leveraging Hadoop for complex bigdata problems:

Spark – Scala based

Shark – Spark + Hive - lots of caching for performance gains for iterative machine learning algorithms on big data

He also mentioned AMPCamp, being organized on Aug 21-22 in Berkeley: a hands-on tutorial for Shark, Spark and Mesos, with machine learning and crowd sourcing overviews, apps and usecases.

Kirk Borne (George Mason University) themed his talk around the following:

- characterize the known (clustering, unsupervised)

- assign the new (classification, supervised learning)

- discover the unknown (outlier, semi-supervised learning)
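The three themes above can be sketched on toy data. This is only an illustration under loud assumptions: the points, starting centroids and distance threshold are invented, nearest-centroid labelling stands in for a real supervised classifier, and a simple distance cutoff stands in for a real outlier detector.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Characterize the known: plain k-means from given starting centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def assign(point, centroids):
    """Assign the new: label a fresh point with its nearest known cluster."""
    return nearest(point, centroids)


def is_outlier(point, centroids, threshold):
    """Discover the unknown: flag points far from every known cluster."""
    return min(math.dist(point, c) for c in centroids) > threshold


# Hypothetical 2-D data with two obvious groups.
known = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids = kmeans(known, centroids=[(0.0, 0.0), (6.0, 6.0)])

label = assign((4.9, 5.1), centroids)                       # lands in the second group
odd = is_outlier((10.0, -3.0), centroids, threshold=2.0)    # far from both groups
print(label, odd)
```

The threshold is the interesting design choice here: set it too low and ordinary new points get flagged as discoveries, too high and genuinely novel points get silently assigned to an existing cluster.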

Dennis Gannon of Microsoft Research described their 100 globally distributed data centers, their 8-9 public data centers with 1M servers each, and their International Cloud Research Engagement Project for applications such as real-time traffic analysis and democratized access to big data. He also described Daytona, Microsoft’s SaaS based solution competing with MapReduce/Hadoop, and the data market built around it.

He also mentioned their efforts to leverage bigdata and cloud support for MS Excel (there are some free plugins on the MSR site for working with bigdata directly in Excel!)

Joseph Hellerstein of the Computational Discovery Department at Google described the ‘Google Exacycle’ project for Visiting Faculty, which is giving away 1 billion core-hours to researchers (10 numbers)

Mark Ryland, Chief Solution Architect, Amazon Web Services gave some very cool demos demonstrating quick setup and processing for bigdata problems on Amazon’s cloud infrastructure. He mentioned things such as:

- Taser – by Police Dept for video storage on the cloud using AWS infrastructure

- S3, EC2, Relational Database Service (RDS) infrastructure

- DynamoDB - NoSQL 100Ks of IO/s

- Amazon Elastic MapReduce (EMR) (used by Yelp - 400 GB of log data per day)

- 1000 Genomes Project - Federal Govt initiative

- BioSense 2.0

- Coursera site running on AWS

- NYU Langone project

Charles Kaminski, Chief Architect, LexisNexis also demonstrated their cloud infrastructure for bigdata problems (http://aws.hpccsystems.com). He mentioned LexisNexis projects for scientific computing such as:

- thor - data crunching

- roxie - key-value store + complex data process

- ECL language - transformative data graph used for setting up workflows for thor and roxie

Ed Pednault, CTO, Scalable Analytics, Business Analytics and Math Sciences, IBM Research also mentioned very interesting bigdata work happening at IBM.

Wednesday, June 29, 2011

Greasemonkey, jQuery and Strophe

Greasemonkey is a Firefox add-on that allows third-party scripts to add functionality to any site. With JavaScript libraries such as jQuery, it has become really easy for a newbie to do this. In fact, there are thousands of example Greasemonkey scripts to learn from today.

I needed to connect a conversational recommendation engine that recommends relevant content, including people, for domain specific conversations. When recommending people for different conversations, there is not much transactional value unless the system can ping the users bi-directionally (the recommended user should also get notified about the conversation so that she can respond to the users in it).

While trying to implement this functionality for a third party site, I found a neat XMPP library called Strophe. With the new HTML5 CORS specification, it becomes straightforward to use this client-side IM library for Jabber notifications on external websites using scripts. I had to add just a couple of lines to my Apache server configuration to enable this feature. I will be updating the scripts online (with notifications) here.

When Information Technologies meet Communication Technologies, a new dimension is added to the existing IT space that allows realization of Vannevar Bush's 'As We May Think' ideas. For example, I used Google+ yesterday, which seems to have twined these two technologies (ICT) together nicely, with feeds from various networks and a logical integration with mail, chat, buzz and blog servers. Google Wave failed initially for many reasons, such as arriving too early and having a poor interface, but I thought it was a very cool project (now that it is becoming an Apache project, I am sure many cool applications will come out using the technology).

Thanks to open source technologies and foundations (e.g. Apache) that have made tremendous, high quality projects and resources available to the community to build upon and do creative stuff with.

Friday, June 17, 2011

Cobot: Modeling long term user activity with time

What are users getting interested in with time?

How are the interests changing/evolving?

These are two important questions Cobot tries to ask and answer. More specifically, Cobot uses domain specific dictionaries (for the Health and Education domains) to extract concepts from a user's conversations and deciphers her short term and long term interests from them.

Here, you will see a couple of LTM (Long term model) graphs for a user asking questions on a site about Math topics in March this year snapshot-ed every 3 days (or more depending on activity).

In these graphs, you will see some new terms getting added, and associations between terms (based on multiple co-occurrences in STM) developing and decaying with time.
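As a rough sketch of how associations could develop and decay between snapshots: the class below is a toy, not Cobot's actual model. The `learn_rate`, `decay` and `prune_below` parameters, and the example terms, are all hypothetical values chosen so the numbers are easy to follow.

```python
from collections import defaultdict
from itertools import combinations


class LongTermModel:
    """Toy term-association graph: edges grow on co-occurrence and decay over time."""

    def __init__(self, learn_rate=1.0, decay=0.5):
        self.learn_rate = learn_rate  # hypothetical: weight added per co-occurrence
        self.decay = decay            # hypothetical: fraction of weight kept per snapshot
        self.weights = defaultdict(float)

    def observe(self, terms):
        """Strengthen the edge between every pair of terms seen together."""
        for a, b in combinations(sorted(set(terms)), 2):
            self.weights[(a, b)] += self.learn_rate

    def snapshot(self, prune_below=0.2):
        """Age all edges by one snapshot window and drop the ones that faded out."""
        for edge in list(self.weights):
            self.weights[edge] *= self.decay
            if self.weights[edge] < prune_below:
                del self.weights[edge]
        return dict(self.weights)


ltm = LongTermModel()
ltm.observe(["algebra", "matrix", "determinant"])
ltm.observe(["matrix", "determinant"])   # repeated pair gets a stronger edge
first = ltm.snapshot()
ltm.observe(["calculus", "limit"])       # a new interest appears
second = ltm.snapshot()
third = ltm.snapshot()                   # older, unreinforced edges fade away
print(first, second, third, sep="\n")
```

With these made-up rates, the algebra edges survive two snapshots and disappear on the third, while the repeated matrix/determinant pair and the newer calculus/limit pair are still around.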

What we are trying to do is heuristically infer, for every user, parameters such as the snapshot window based on activity (related to the user's short term and long term memories) and the learning and unlearning rates (how fast users learn and unlearn things, related to semantic memory). This modeling, done well, eventually helps Cobot pick the right users to recommend in different conversations. (We do a Spreading Activation search in users' LTM graphs, mixed with other techniques, for user recommendation.)
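For readers unfamiliar with spreading activation, here is a minimal sketch of the general idea applied to picking users. It is not Cobot's implementation (which mixes in other techniques): the graph layout, the `decay`, `threshold` and `max_hops` parameters, the scoring rule and the two example users are all assumptions made for illustration.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_hops=3):
    """Propagate activation from seed terms through a weighted term graph.

    graph: {term: {neighbor: weight in [0, 1]}}
    seeds: {term: initial activation}
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = {}
        for term, energy in frontier.items():
            for neighbor, weight in graph.get(term, {}).items():
                passed = energy * weight * decay
                if passed < threshold:
                    continue  # too weak to keep spreading
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        if not next_frontier:
            break
        for term, energy in next_frontier.items():
            activation[term] = activation.get(term, 0.0) + energy
        frontier = next_frontier
    return activation


def score_user(ltm_graph, conversation_terms):
    """Total activation a conversation induces in one user's LTM graph."""
    seeds = {t: 1.0 for t in conversation_terms if t in ltm_graph}
    return sum(spread_activation(ltm_graph, seeds).values())


# Hypothetical LTM graphs for two users.
users = {
    "alice": {
        "matrix": {"determinant": 1.0, "algebra": 0.5},
        "determinant": {"matrix": 1.0},
        "algebra": {"matrix": 0.5},
    },
    "bob": {"poetry": {"rhyme": 1.0}, "rhyme": {"poetry": 1.0}},
}
conversation = ["matrix"]
ranked = sorted(users, key=lambda u: score_user(users[u], conversation), reverse=True)
print(ranked)
```

Because the toy graphs are undirected, activation can bounce back to where it came from; the decay factor, threshold and hop limit are what keep that bounded. A user whose graph has strong, well-connected edges around the conversation terms ends up with a higher score than one who only mentions the term in isolation.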

(There is a similar vector space based model for modeling user's STM (short term model) interests as well in Cobot.)

Thursday, June 02, 2011

News

What should news look like for both publishers and consumers? Here is one quick sketch of a news publishing/consumption prototype.

There are some neat startups in this arena (like zite, flipboard, gabacus and ctrl-news) that blend together amazing UI with great backend filtering and communication technologies.