Update: A lot of things have changed since this page was written. Rasa X, the freemium companion tool to Rasa Open Source, is no longer supported or maintained, and we are currently focused on the development of the Rasa Enterprise platform. To learn more about this, you can check out this blog post.
Conversational interfaces are open-ended, which is both a source of their appeal to users and a challenge to the teams who build them. How can you anticipate what users might say when they could say...anything?
The truth is, trying to anticipate every conversation path or message is impractical or even impossible. Development teams can spend many cycles building out conversation paths they expect users to take, only to find users are inclined to take another path entirely. Successful teams avoid building speculative features and instead take their cues from users.
AI assistants offer a built-in solution to the challenge of learning what users want. The best predictor of what future users will say to your assistant is what users have said in the past. We can use conversation data to understand the user based on past interactions and even feed conversations back into the assistant as training data, allowing the assistant to learn and better recognize what users are saying over time. Users tell us what they want through the assistant, in their own words.
This principle is one key idea behind conversation-driven development (CDD): that we can build better AI assistants when we listen to users and use what they say to guide our development. At Rasa, we came together around CDD because we realized that a conversational AI framework alone wasn’t enough to build Level 3 assistants and beyond. Technology will get you part of the way, but design and development practices also play a large role. From your team structure to your workflows, the methodology around building AI assistants is just as important as the tools.
Conversation-driven development doesn’t stop with leveraging conversation data; it also encompasses the engineering practices that help teams build resilient, reliable assistants. CDD touches every part of the development process, from design, to testing, to DevOps. This set of principles captures what we’ve learned by working with thousands of chatbot developers, lessons that provide a blueprint for building better AI assistants.
CDD isn’t proprietary, and it isn’t tied to a particular framework. It’s a set of activities and design principles that help conversational AI teams build AI assistants that really help users.
Conversational interfaces give developers unique insight into user feedback because every conversation users have with an AI assistant is recorded in the assistant’s database. Instead of making educated guesses about how well users’ needs are being met, you can see exactly how each interaction went in the conversation data. Even better, these conversations can become training data that helps your model make better and more accurate decisions over time.
One of the core ideas of CDD is making full use of conversation data: conversations provide training data for your model and guide your development decisions by showing you what is and isn’t working about your assistant’s design. The other piece of CDD is rooted in engineering best practices and DevOps.
We’ve incorporated these ideas into six steps that make up CDD:
- Share - Test your prototype early in the development process with users from outside your development team.
- Review - Read the conversations that users have with your assistant, instead of focusing entirely on top-level metrics.
- Annotate - Convert the messages that users send to your assistant into training examples.
- Test - Make testing an automated step, every time you deploy new changes.
- Track - Measure meaningful metrics, so you can understand what’s working and not working.
- Fix - Reduce failures over time by making adjustments based on analyzing your assistant’s interactions.
Teams building an AI assistant will likely perform several of these steps simultaneously and move back and forth between them. The process also involves multiple roles and skill sets. Developers, content creators, and product owners all work together to make CDD happen.
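As a rough illustration of the Annotate step, the sketch below folds reviewed user messages into an existing set of training examples, skipping anything already present. It is a minimal sketch, not how Rasa X does it: the function name `merge_annotations`, the dictionary layout (intent name mapped to a list of example texts), and the sample intents are all assumptions made for this example.

```python
from collections import defaultdict


def merge_annotations(training_data, annotated_messages):
    """Return new training data with reviewed (text, intent) pairs added.

    training_data: dict mapping intent name -> list of example texts
                   (hypothetical layout, chosen for this sketch).
    annotated_messages: iterable of (text, intent) pairs that a reviewer
                        has already labeled.
    """
    # Copy the lists so the caller's training data is left untouched.
    merged = defaultdict(
        list, {intent: list(examples) for intent, examples in training_data.items()}
    )
    # Track normalized texts so the same message is never added twice.
    seen = {
        text.strip().lower()
        for examples in training_data.values()
        for text in examples
    }
    for text, intent in annotated_messages:
        key = text.strip().lower()
        if not key or key in seen:
            continue  # skip empty messages and duplicates
        seen.add(key)
        merged[intent].append(text.strip())
    return dict(merged)


# Hypothetical data: one existing example plus four reviewed messages.
existing = {"greet": ["hello"]}
reviewed = [
    ("Hello", "greet"),
    ("hi there", "greet"),
    ("what's my balance?", "check_balance"),
    ("hi there ", "greet"),
]
updated = merge_annotations(existing, reviewed)
```

Deduplicating on a normalized form of the text keeps a frequently sent message from being added many times over; whether that is desirable depends on how your team wants the training data to reflect message frequency.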
While CDD isn’t specific to any single technology, it’s easier to do with the right tooling. We developed Rasa X to be the tool that would make it simple to sift through conversation data and make it actionable. Rasa X layers on top of Rasa Open Source and provides a UI for reviewing conversation data and annotating user messages. It also includes features for testing your assistant and sharing it with other users before you go live.
We’ll show you how Rasa X can be used to support CDD throughout this playbook.
CDD at Rasa - How we know it works
The idea of conversation-driven development was born out of our work with customers and through building assistants of our own. Qualitative feedback from users showed us that we were on the right path, but we wanted to understand the tangible benefits of building assistants using these methods.
We recently conducted an experiment to compare the CDD methodology with training data generation techniques that run counter to CDD, for example, using a script to automatically generate a large number of paraphrased training phrases. We compared two assistants side-by-side, identical except for their NLU training data. Our CDD assistant was shared with a large group of testers early in development, and the user messages collected from test conversations were used to build up the assistant’s NLU training data. The other assistant’s training data was automatically generated by a script and supplemented with phrases written by the development team.
We then tested the accuracy of each assistant’s NLU model. As we expected, the CDD model generalized better than the model trained on hypothetical and autogenerated data. The autogenerated data set was larger, but the CDD data set was representative; that is, it contained the types of messages real users are likely to send. A 5-fold cross-validation of a model trained on the CDD data produced an F1 score of 90%, whereas a model trained on the autogenerated data and tested against the CDD data set scored just 81%.
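The two evaluation setups described above (k-fold cross-validation within one data set, and training on one data set while testing on another) can be sketched in plain Python. This is only an illustration under stated assumptions: the classifier is a toy word-count model standing in for a real NLU model (it is not DIET), the F1 score is macro-averaged (the averaging used in the experiment is not stated), and the function names `macro_f1`, `train`, `cross_validate`, and `evaluate` are invented for this example.

```python
from collections import Counter, defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-intent F1 scores."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def train(examples):
    """Toy stand-in for an NLU model: each word votes for the intents it was seen with."""
    word_votes = defaultdict(Counter)
    for text, intent in examples:
        for word in text.lower().split():
            word_votes[word][intent] += 1
    # Fall back to the most common intent when no word is recognized.
    fallback = Counter(intent for _, intent in examples).most_common(1)[0][0]

    def predict(text):
        votes = Counter()
        for word in text.lower().split():
            votes.update(word_votes.get(word, {}))
        return votes.most_common(1)[0][0] if votes else fallback

    return predict


def evaluate(train_examples, test_examples):
    """Train on one data set and score on another."""
    predict = train(train_examples)
    y_true = [intent for _, intent in test_examples]
    y_pred = [predict(text) for text, _ in test_examples]
    return macro_f1(y_true, y_pred)


def cross_validate(examples, k=5):
    """Average F1 over k folds; fold i holds out every k-th example starting at i.

    Folds are neither shuffled nor stratified here, to keep the sketch short.
    """
    if len(examples) < k:
        raise ValueError("need at least k examples")
    scores = []
    for fold in range(k):
        held_out = examples[fold::k]
        kept = [ex for i, ex in enumerate(examples) if i % k != fold]
        scores.append(evaluate(kept, held_out))
    return sum(scores) / k
```

With two lists of `(text, intent)` pairs, `cross_validate(cdd_examples)` corresponds to the first setup and `evaluate(autogenerated_examples, cdd_examples)` to the second; both variable names are placeholders for your own data.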
If you’re thinking it sounds like the CDD data set has an unfair advantage, you’re right, but not for the reasons you might think. It’s true that the CDD model was trained and tested on data from the same distribution, but this isn’t too far off from what the model would encounter in the real world. As you practice CDD over time, you’ll reach a point where most of the messages users send will already be in the training data, and DIET (our NLU architecture) typically fits training data messages with 100% accuracy.
When we tested the autogenerated model against its own autogenerated data, the F1 score jumped to 94%, but keep in mind that this only tests the model’s ability to classify the same type of manufactured messages it was trained on. When we test the same model against real-world messages from the CDD data set (a much better indicator of real-world performance), the F1 score is a full 13 points lower, at 81%.
But there’s more to CDD than creating data from real-world user messages. The six steps of CDD are drawn from our experience building assistants at Rasa. Whether it’s tracking success rates in Carbon bot, an assistant built by the Rasa research team, or the CI/CD workflow we use to release updates to our starter packs and demo assistant, these methods have been tried and tested by the Rasa team, and popularized by the community.
How to Use this Playbook
This playbook includes 5 guided activities to help you put conversation-driven development into practice. The plays are designed for developers, product owners, and conversation designers—anyone who builds AI assistants and wants to create a better experience for their users.
Within each play, you’ll find details on the concepts behind CDD, as well as activities to do with your team and questions to spark discussion. At the end of the playbook, you’ll find a checklist for tracking your progress and additional resources.
To get the most value out of the plays, you’ll need to have an AI assistant that’s at least in the prototype phase. If you haven’t started building your assistant yet, you can still read along to get familiar with the principles behind CDD, but you’ll need a simple assistant to follow along with the hands-on activities. Check out a few of the resources here if you’re just getting started building your assistant. You can also use one of the Rasa Starter Packs as a pre-built starting point.