Keynote: Perspective on the 5 Levels of Conversational AI

 

Rasa CTO and co-founder Alan Nichol revisits how we're thinking about the 5 Levels of AI assistants. In this talk from the L3-AI conference, Alan discusses how conversational AI is a huge opportunity to build the most user-friendly applications in the history of software, but to get to level 5 we "just" have to listen.

Originally aired: June 18, 2020

Transcript:

Alan Nichol:

Hi everyone, welcome to L3-AI. This is the first time we're doing this conference. My name is Alan. I'm the co-founder and CTO of Rasa. And it's great to have you all joining us today.

That brings us on to my talk today, which is about an update to the five levels of conversational AI. The five levels are something we introduced in 2018, and we felt it was time for an upgrade. Initially, they served as a way of helping people talk about AI assistants beyond just saying, well, all chat bots are dumb and there's this thing called artificial intelligence, which is coming, and then it's all going to be amazing, right? They gave us a structured model to talk about the maturity of this technology. And people have really picked it up. They've pulled it apart, they've done their own versions of it, they've iterated on it. And we felt that since it's been a couple of years, the tech has moved on, the market is much bigger, and there are far more people thinking about this now, it was probably a good time to do an update. And so that's the subject of today's talk: redefining what the five levels are and talking about how we are going to get there.

And just one idea I want you to keep in mind as we talk about the five levels, just keep this in the back of your mind as I'm talking about them: all of you who have assistants running and talking to real users right now are already gathering the data that you need to climb through level three, up to level four and level five, and we just need to listen. We need to listen to what people are saying to us. Just keep that in mind as we talk about these. But firstly, I want to talk about why any of this matters, why it even matters if we can build level five conversational AI. Personally, to me it's important because we can build technology that serves far more people than ever before. These are my parents here on the screen. They're reasonably technically savvy, but there are so many things that I experience and take for granted, services that are enabled by the internet and other technology, that they're just excluded from.

And I think it's a real shame. And I think that if we let people interact with computers the same way they interact with people, we have an opportunity to bring a lot more people along for the ride. And it's hard enough to make it possible for anyone to use conversational AI, but that's still not the extent of our ambition. We want everyone to be able to build conversational AI, because there are a million important and valuable use cases for this technology that will never be commercially viable, or certainly would never be built by any of the big tech companies, right? And so it's not enough to make the technology really good, it also has to be accessible to everyone who has an idea and wants to build something that they see a need for. And so that's, of course, why we invest so much in the community and in making tools accessible to everybody.

And the only way that this can happen is, of course, through a massive developer movement. This gives us two different perspectives on the five levels. The first is the end user's, right? How does my experience change as we go from level one to two, to three, and on? And the second is the perspective of the developer, the person who's creating the applications, right? What's their experience like? And I think both are needed and both are complementary. And I'm using developer in a slightly generic sense here. I don't strictly mean only developers, although the developer experience is core, of course; we all know that it takes more than just the developer skillset to build conversational AI, right? There are others involved, of course, and they have yet new perspectives to add.

And so both the end user and the developer experiences have to change dramatically to get to level five. And so on the horizontal axis, we've got the end user experience, which goes on a spectrum from having to be an expert in using this assistant or this particular tool, all the way to level five, where the user doesn't even really have to know what they want at the start of a conversation, and the assistant helps them figure that out. That's kind of the thread that pulls us through the end user experience. And we'll go through each of the levels in turn. On the developer experience side, we've got a real evolution in the developer's role. In the beginning we're kind of saying, well, it's all on the user, tough luck, you can read the documentation. As we want to accommodate greater numbers of users, we have to do that through a process we call conversation driven development, where we look at what people are saying to the assistant, we're listening to them, and we're kind of automating that process over time.

We'll start with the end user perspective, which is of course very important. And the theme that runs through the five levels is that the assistant feels more and more like something that's built to help a human, and less like an API endpoint. And I think that'll become clear when I show some of these examples. And so level one assistants are defined by putting all of the work on the end user. And the example I've got here is what looks like a command line app, and what it does is it calculates a mortgage quote, right? And so you could imagine an app like this, where you input some parameters, right? The amount of money that the person wants to borrow, the duration, et cetera, and it comes out with a quote. Now, to write this command correctly, you of course have to be a bit of an expert, right? I mean, I'm a huge command line fan, but it takes a bit of trial and error to get to a level of confidence where you can do this.
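To make the level one example concrete, here is a minimal sketch of what such a command-line mortgage quote tool might look like. The flag names and the standard amortization formula are illustrative only; they aren't taken from the app shown in the talk.

```python
# quote.py -- a hypothetical level one "assistant": all the work is on the
# end user to supply complete, valid input on the command line.
import argparse

def monthly_payment(principal: float, annual_rate_pct: float, years: int) -> float:
    """Standard fixed-rate amortization formula."""
    r = annual_rate_pct / 100 / 12      # monthly interest rate
    n = years * 12                      # total number of monthly payments
    if r == 0:
        return principal / n
    return principal * r * (1 + r) ** n / ((1 + r) ** n - 1)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Calculate a mortgage quote.")
    parser.add_argument("--amount", type=float, required=True, help="loan amount")
    parser.add_argument("--rate", type=float, required=True, help="annual interest rate in percent")
    parser.add_argument("--years", type=int, required=True, help="loan duration in years")
    args = parser.parse_args()
    print(f"Monthly payment: {monthly_payment(args.amount, args.rate, args.years):.2f}")
```

Running it means knowing exactly which flags exist and what units they expect, for example `python quote.py --amount 250000 --rate 3.1 --years 25`, which is precisely the expertise being pushed onto the user.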

But I think that's actually not the main point. I think a form on a website is also a level one assistant, right? Ultimately we're saying to the user, this is the information we need, provide it to us in a valid format, and we will give you an answer, right? And that's of course, much more convenient than having to calculate a mortgage on your own, but it's putting a lot of work on the end user to give kind of the valid input.

And so as we progress beyond level one through levels two, three, four, et cetera, each level lowers the burden on the end user to translate what they want into the language of, in this case, the bank: the language of a mortgage quote. When we think about level two assistants, these are the basic chat bots, right? And so they accept natural language input, right? So the user can say, I'm interested in mortgage rates, right? That's still quite a specific request, but it's in freeform text. And then they're prompted to provide this information in turn, right? And so they don't have to remember the full command, or if they fill out the form or something, they can get some help. But of course the defining quality of level two assistants is that as soon as you do something unexpected, they tend to break.

And so provided the user always complies, right? They stay on the happy path. Every time you ask them for some information, they provide it, and things work well, right? If you deviate, if you ask a contextual question, if you ask the bot to clarify something, if you change your mind about something that happened earlier, things tend to go badly very quickly. And so the end user doesn't really have to know anymore how to type out this long command, but there is still a lot of work on their part, because ultimately they're responsible for knowing how to not break the chat bot, right? They're ultimately having to provide the information in a certain way.
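A minimal sketch of why this happens: under the hood, a level two bot like this is typically a fixed list of intents plus hard-coded, context-specific rules, as described later in the talk. The intent names and dialogue structure below are invented for illustration; this isn't Rasa code.

```python
# A hypothetical hand-rolled level two bot: a fixed list of intents and
# hard-coded rules for what to do in each context. Every new feature or
# edge case means crowbarring in yet another branch.
from dataclasses import dataclass, field

INTENTS = ["request_mortgage_quote", "provide_amount", "provide_duration"]

@dataclass
class Conversation:
    context: str = "start"
    slots: dict = field(default_factory=dict)

def handle(conv: Conversation, intent: str, text: str) -> str:
    if conv.context == "start" and intent == "request_mortgage_quote":
        conv.context = "asking_amount"
        return "How much would you like to borrow?"
    if conv.context == "asking_amount" and intent == "provide_amount":
        conv.slots["amount"] = text
        conv.context = "asking_duration"
        return "Over how many years?"
    if conv.context == "asking_duration" and intent == "provide_duration":
        conv.slots["years"] = text
        return f"Here is your quote for {conv.slots['amount']} over {text} years."
    # Anything off the happy path falls through to a generic apology.
    return "Sorry, I didn't understand that. Could you rephrase?"
```

Anything the rules didn't anticipate, a clarifying question or a change of mind, falls straight through to the fallback, which is exactly the brittleness described above.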

Whereas when we move from level two to level three, we're able to handle a bit more back and forth. And we're able to say, okay, within this fixed, narrow domain of providing a mortgage quote, we can have a relatively fluid conversation, so that when a user has a contextual question, wants clarification, wants to follow up, wants to change their mind about things, or wants to compare and contrast multiple options, we can do that. And we can handle those kinds of edge cases. And when we go from level three to level four, we take it up a notch, right? Here the users are coming in, and they are not saying, I want a mortgage quote. They're saying, my kids have gone to college and I want to downsize, right? That is much more like something a human would say. And as a developer you say, well, this user's providing ambiguous input, right? What do they want? They're not telling me what they want. Do they want a mortgage quote, or what's the deal?

But I think that's actually the wrong perspective, right? The user's not saying anything ambiguous; what the person has said here is perfectly clear, and it's up to us to figure out how to help them. And if any of you have worked with conversational AI in customer service, you'll see this quite a lot. You see the first message that comes in from a user, and it's a paragraph or two that tells you all sorts of extra information, and then you have to figure out, well, what can we do? Of the things that we can do for you, which one of these is going to solve your problem, right? And so the user doesn't even have to know that a mortgage quote is the end result, right? That might be the end result, but they don't have to know that coming in.

And then level five takes it to adapting to what the user wants right now. And so when a user comes in who's clearly done a lot of reading, or it's the third or fourth time they're coming back and they're very sophisticated, and they come in with a sophisticated question like this user here, we give them that level of detail, and we go into that level of depth. And if a user comes in who just wants a quick answer, they get a quick answer, right? We adjust the way that we describe things, we adjust the language we use and the level of depth that we go into, by reading cues from how users are talking to us.

That's the end user perspective: we go from requiring the user to put in everything, to really consulting the user and helping them out. And at the same time, we can think about the developer experience, which also has to change as we go from level one, to two, to three, and four, and on. The developer's role really transforms. And again, this is the point I brought up at the start of the talk. There's a unique thing about conversational AI, which is that users are literally telling us exactly what they want. They are telling us in their own words what they want and what their expectations are, right? And I say we "just" have to listen, in quotes, because of course it's much more difficult to actually build those tools, to actually look at conversations and derive insights and learn from real conversations. But the key is that all the data that we need is there, we just have to unlock it. And the other thing which is unique about conversational AI is that unlike other kinds of software, users can drive changes, right?

It's obviously always going to be the case that if you have a successful piece of software, you want to add new features, you want to add new functionality, right? It's doing a great job of providing mortgage quotes, so let's help the user with new things. Of course the developer and the team behind it are always going to drive change, but because you're accepting natural language input, the users can drive change too, right? And so as I'm recording this, we're in the middle of a humongous public health and economic crisis caused by the coronavirus. And so anyone who has a mortgage advice assistant out today will no doubt have tons of people contacting it who are worried about making their monthly payments because they've been furloughed, because they've been laid off, right? User behavior can change dramatically overnight, and the assistant has to be able to keep up, it has to be able to evolve. Both sides of the equation drive AI assistants to get better over time, or at least to serve more users over time.

And let's think about what that means for a developer. At level one, it's actually easy. And so unlike the end user's experience, the developer's job doesn't just get steadily easier as we go up the levels. At level one we're saying, well, we're putting all the work on the end user, so it's easy for me to add new functionality, right? We can provide mortgage quotes; if I want to add something new, say a savings calculator, I just add the functionality, I cut a release, I make sure the documentation is updated, and the user can educate themselves on this new thing that they can do, right? That's really the level one mentality, and that makes the developer's life easy.

When you get to level two, you have a chat bot. It typically has a fixed list of intents that you know about and you support. And then you have a whole bunch of logic that says, in this context, when the user expresses this intent, this is what you do, right? And as you add new functionality, as you try and adjust, and refine, and split intents into smaller intents that are more specific, you keep having to crowbar in one more edge case. And you keep having to sort of adjust one more thing, to get it all to work without breaking everything that worked before, right? The developer's job gets ever harder, the longer a project lives. And at level three, we start to use a process called CDD or conversation driven development to accommodate that users don't think the same way that developers do.

If you're talking to users about savings accounts and about mortgages, those both have an interest rate and a user will happily use interest rate in both of those contexts and not get confused, right? And we have to accommodate that people think about the world differently than developers do. And the process we have for listening to users and accommodating their concepts is called conversation driven development. And it's these six actions. I won't go into them in detail. If you're interested, check out the Rasa blog, where we talk about this whole process. But the key thing is looking at what people are saying, paying attention to what they're saying, looking at where things go wrong and why things go well. And using that to improve the assistant over time.

And then at level four, we start to make the developer's life easier again by automating parts of that process, right? We can say, well, look, this conversation looks like it probably went pretty well. This conversation really went off the rails, you should probably look at that one. Here are cases where NLU probably made a mistake, it probably missed something here, you should go and look at that. And we can automate large parts of the CDD process. And then when we get to level five, technically we're capable of fully automating the process of CDD, so we can say, here's a task to help a user with, here are the specifications of what you need; you go and talk to a few users as practice, and then you can do it, you're perfectly capable. That said, I don't think we should fully automate CDD. I think a human should be involved to look out for problematic behavior, to steer things in the right direction. But in theory, the process of gaining experience by talking to people and using that to improve the assistant becomes fully automated.
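As a sketch of what automating part of that review might look like, the snippet below flags conversations that probably deserve human attention. It assumes each conversation is logged as a list of turns carrying an NLU confidence score and the name of the action the bot took; those field names and the threshold are assumptions for illustration, not an existing schema or API.

```python
# Flag conversations that likely went wrong so a human reviews them first:
# low NLU confidence suggests a possible misclassification, and repeated
# fallbacks suggest the conversation went off the rails.
CONFIDENCE_THRESHOLD = 0.6   # illustrative cut-off, tune per assistant

def needs_review(turns: list[dict]) -> bool:
    low_confidence = any(
        turn.get("intent_confidence", 1.0) < CONFIDENCE_THRESHOLD for turn in turns
    )
    fallback_count = sum(1 for turn in turns if turn.get("action") == "fallback")
    return low_confidence or fallback_count >= 2

def review_queue(conversations: dict[str, list[dict]]) -> list[str]:
    """Return the IDs of conversations a human should look at."""
    return [conv_id for conv_id, turns in conversations.items() if needs_review(turns)]
```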

And if we think about how the developer's role evolves, I think we can draw an analogy with things like application monitoring, where if you look at a modern monitoring stack for a large cloud deployment with lots of different services, it involves a high degree of automation, it involves a huge amount of data, and a single developer can handle much more complexity than they ever could by hacking on individual log files on a laptop, right? And similarly, I think as the tools evolve to build level three, four, and five conversational AI, the developer's role will also change.

And so there we have the two perspectives on the five levels of conversational AI, from the end user and also from the developer, but how are we going to get there? That's the final section of the talk. How are we going to get to level five conversational AI? Well, the first thing I'll say is, I'll tell you how we're not going to get there. We're not waiting for a press release from one of the big tech companies. We're not waiting for one more, even bigger model to get published. That's not how it's going to happen. What I believe we need is open source, community, and applied research. And we can look to the web for inspiration, right? The web looks very different from the way it did in the mid-nineties. And if we think about all the things that happened, all the web 2.0 ideas that had to be created, those were all built on top of open source code, right?

You never needed anybody's permission to go and build a new website, to try something new, to try a new idea. And that makes the field progress much faster; that's a key part. And then in terms of community: conversational AI is not confined to machine learning, and machine learning is not confined to data scientists, right? Everyone who's involved in software engineering these days, and increasingly in the future, will know at least a bit about machine learning. It's becoming a foundational piece of the engineering stack. And so we've invested in a lot of great educational content, like NLP for Developers and the Algorithm Whiteboard on our YouTube channel, which are extremely popular, because we want to invest in the community and give people the tools to understand the algorithms that they're using.

And in terms of research, well, we want to be the best place in the world to do applied research in conversational AI. You can check out some of the projects we're working on at rasa.com/research, but more importantly, I'll plug Tanja's talk, which is later today, where she talks about how we take a feature from the research team, from just an idea all the way to something we ship in production, what that looks like, and all the different ways that the community is involved. And finally, conversational AI needs you. So if you ask me what the answer is to how we're going to get to level five conversational AI, well, the real answer is: all of you. And so I'll leave you with some homework, things to go and do to help accelerate progress towards level five. Firstly, go build something great, something that you see a need for.

We've got a showcase on our website. We'd love to feature your cool projects on there. If you get stuck, and you will get stuck because it's hard, go to the forum and let us know, let us know what workarounds you found. Other people will have similar ideas; maybe they've encountered similar problems. Talk to others about conversation driven development. Talk to your teammates, not just developers, but everyone, about how you're going to have a user-centric approach to improving your assistant over time.

And maybe you disagree with the way I've defined the five levels, or you have a new perspective to add. We've only had two perspectives so far, but we know that conversation designers, product owners, and all sorts of other people are involved in bringing AI into production. So if you disagree or you have something to add, shoot me an email, I'd love to hear from you. And there's a blog post to go with this talk. It goes into a little bit more depth and has a bunch of links for you to check out. So go to blog.rasa.com to read up on it. And otherwise, thank you all very much for listening. And I look forward to seeing you in the Q&A in just a couple of minutes.