The Third Era of AI is Upon Us
It’s almost impossible to scroll through social media these days without seeing stories about OpenAI, GPT-3, Large Language Models (LLMs), ChatGPT and Bing’s chat assistant. This flood of news makes it seem like AI has only just arrived. Of course, we in the collaboration industry have been using AI technologies for a long time! Broadly, AI technology for collaboration falls into two categories – speech (speech recognition and text-to-speech) and language (language understanding, text classification, and summarization being the primary use cases). These technologies have gone through different eras, and I believe the arrival of this new tech marks the third, and most important, era of all.
The first era is pretty much everything prior to the arrival of deep learning. This era arguably starts with the first use of computers back in the 1950s or 1960s. As early as 1951, Marvin Minsky and Dean Edmonds built an artificial neural network using 3000 vacuum tubes, and in 1952 Arthur.
A hallmark of both speech and natural language technology in this era is that it was bespoke. It was possible to build AI models for tasks like speech recognition and language understanding, but to get good results, the technology had to be customized for each and every use case. For speech recognition, businesses would build their own grammars that represented their own unique vocabularies. These were expensive investments, taking months of time and lots of money to produce. As a result, in this era, only the largest of businesses could really take advantage of the tech. It was largely used for voice response systems in contact centers.
The second era began in 2012 with the application of deep neural networks (DNNs). It began with a landmark paper that resulted in huge steps forward in accuracy for speech recognition, and later, for natural language understanding tasks too. This, in turn, enabled applications like Siri and Alexa in the consumer space. In the enterprise, it meant that businesses could use off-the-shelf speech recognition systems from vendors like Google, Amazon and IBM, without needing to build custom grammars or vocabularies. These generic models were good enough. Meeting vendors could now provide automated transcription and translation features. In the contact center, this era enabled many businesses to implement directed dialog IVRs. Instead of “press 1 for sales, 2 for support”, users could just say “sales” or “support”. These were also widely deployed. Language processing got better too, with tools like Google’s Dialogflow. But you still needed to define bespoke models – intents, training phrases, and entities – and iterate on those until you achieved the desired accuracy. This was possible, but still not widely deployed, because it was difficult and expensive at scale. That has held back more advanced products like natural language voicebots and chatbots, agent assistants, and automated QM from wide adoption.
This is now changing with the arrival of Large Language Models (LLMs), most notably OpenAI’s GPT-3 and ChatGPT. Most folks talk about how good these models are at generating content, and how good a chatbot ChatGPT is. This is all true, but it misses the true innovation here. Technically, this innovation is called “zero-shot learning”. What it means is that you can achieve results for a desired task without the need to collect training data, train the model, measure accuracy and iterate. Instead, you just describe your desired task in plain English, and the model does it for you. This means that natural language technologies become generic too. You just need one model, and it can be applied across many different businesses and use cases – just like speech recognition in the prior era. It is no longer necessary to collect piles of data to get great results.
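To make the zero-shot idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any vendor's actual API: the function names, the prompt wording, and the `stub_model` stand-in are all hypothetical, and a real deployment would pass in a function that calls whichever LLM you use. Notice what is missing: there are no intents, no training phrases, and no training loop, only a plain-English description of the task.

```python
from typing import Callable, Sequence


def build_zero_shot_prompt(task: str, labels: Sequence[str], text: str) -> str:
    """Describe the task in plain English. No training examples are included."""
    return (
        f"{task}\n"
        f"Answer with exactly one of: {', '.join(labels)}.\n\n"
        f"Message: {text}\n"
        "Answer:"
    )


def classify(
    text: str,
    labels: Sequence[str],
    complete: Callable[[str], str],
    task: str = "Classify the customer's message by the team that should handle it.",
) -> str:
    """Route a message with a single prompt; `complete` is any text-in, text-out model."""
    prompt = build_zero_shot_prompt(task, labels, text)
    reply = complete(prompt).strip().lower().rstrip(".")
    for label in labels:
        if reply == label.lower():
            return label
    # The model answered with something outside the allowed labels.
    return "unknown"


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, so the sketch runs offline."""
    message = prompt.split("Message:", 1)[1].lower()
    return "Sales" if "pricing" in message or "buy" in message else "Support"


if __name__ == "__main__":
    labels = ["sales", "support"]
    for utterance in ["What is the pricing for 50 seats?", "My headset stopped working"]:
        print(utterance, "->", classify(utterance, labels, stub_model))
```

Swapping the task description and the label list is all it takes to point the same model at a different business or use case, which is the sense in which the model is generic.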
The significance of a generic natural language model cannot be overstated. It will dramatically lower the cost of entry for AI applications utilizing natural language – everything from meeting summarization to contact center call summarization, from voice and chat bots to automated call scoring. Almost no area of our industry will be unaffected.
It is going to be the key that unlocks an entirely new generation of products and services in the collaboration industry. Buckle up, things are about to get really interesting.