CDD Playbook - Test and Fix

Update: A lot of things have changed since this page was written. Rasa X, the freemium companion tool to Rasa Open Source, is no longer supported or maintained, and we are currently focused on the development of the Rasa Enterprise platform. To learn more about this, you can check out this blog post.

Fix and Test

Reviewing conversations can uncover important problems: conversation turns that could have gone more smoothly, buggy custom actions, or phrases the assistant didn’t recognize. The first step of reviewing conversations is to identify interactions where something went wrong; the next step is to identify how to fix the problem. That brings us to the next two stages of CDD: fix and test.

The approach you take to fix a problem will depend on the issue. If the NLU or dialogue model isn’t making correct predictions, the issue likely lies in the training data. A good next step would be to check the training data for mislabeled examples, or add more training examples if the model is having trouble recognizing certain phrases. Or, you might observe that the assistant has trouble with requests that are out-of-scope or out-of-domain. If it makes sense to bring the request in-scope, you could build a new feature. If the request is something that you don’t intend to address, providing a fallback or out-of-scope response can help get the user back on the happy path.

Once you’ve identified a possible solution, you can put a fix in place. That might mean annotating or restructuring training data, adjusting custom action code, or changing response templates. Whatever kind of change you make, follow engineering best practices such as version control, automated testing, and code review so that updates don’t introduce new problems.

Conversation-driven development hinges on small, iterative updates based on user input. Engineering best practices ensure that the constant flow of updates results in real improvement, and they reduce the risk of regressions.

What do CI/CD and DevOps look like with a machine learning-based application? Full test coverage means evaluating the performance of the machine learning model in addition to the application logic. The Rasa CLI includes commands for running two important types of tests: conversation tests, which measure how well the model performs against a test set of whole conversations, and NLU tests, which measure the model’s ability to classify intents and extract entities. When run on a CI/CD server, these tests can tell you whether the changes you’ve made have improved performance or introduced a new problem, before you deploy the new model to production.

In this play, you’ll map out the pieces of your development and testing workflow and ensure that the process for moving changes from development to production is a smooth one.

Play 4: Establish a Development and Testing Workflow

Materials:
Rasa X instance, running on a server
Git repository for your assistant (Bitbucket, GitLab, GitHub, etc.)
CI/CD tool (Circle CI, Jenkins, Bitbucket, GitHub Actions, etc.)

Time:
2-3 hours

People:
2-4 team members

Step 1: Connect version control

The starting point for your workflow is version control. Whether you’re annotating data in Rasa X or pushing changes from your local IDE, your remote Git repository is the source of truth for your assistant’s code. Establish the connection between your deployment and version control by using Integrated Version Control to connect Rasa X to your Git repository.

Step 2: Create a conversation test set

Navigate to the Conversations screen in your Rasa X dashboard. Locate a conversation your assistant handled well: one where it gave the correct responses to the user’s requests. Your conversation test set should be representative. That is, it should include the kinds of conversations users typically have with your assistant, as well as any conversations you want to make sure your assistant continues to handle well.

Once you’ve located a successful conversation, click the Save test conversation button to automatically add the conversation to your test set. When you run the rasa test CLI command, the conversation test function checks the model’s prediction at each step, so you can be sure a conversation that worked in the past hasn’t broken.
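
As a rough illustration of what checking the prediction at each step means, the sketch below compares an expected sequence of intents and actions against a predicted one and reports the first step where they diverge. It is a toy model of the idea, not Rasa's implementation; the `first_mismatch` helper and the intent and action names are hypothetical.

```python
from typing import List


def first_mismatch(expected: List[str], predicted: List[str]) -> int:
    """Return the index of the first step that differs, or -1 if all steps match."""
    for index, (exp, pred) in enumerate(zip(expected, predicted)):
        if exp != pred:
            return index
    # One sequence ended early: the first missing step counts as the mismatch.
    if len(expected) != len(predicted):
        return min(len(expected), len(predicted))
    return -1


# Hypothetical saved test conversation and the model's predictions for it.
expected_steps = ["intent:greet", "utter_greet", "intent:ask_hours", "utter_hours"]
predicted_steps = ["intent:greet", "utter_greet", "intent:ask_hours", "utter_fallback"]

mismatch = first_mismatch(expected_steps, predicted_steps)
if mismatch >= 0:
    print(f"Step {mismatch}: expected {expected_steps[mismatch]!r}")
```

A conversation that used to pass and now reports a mismatch is a signal that a recent change altered the assistant's behavior at that step.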

Step 3: Set up a CI/CD build pipeline

Using the CI/CD tool of your choice, configure your build pipeline. Here’s the basic recipe for deploying a Rasa assistant:

Start with a base image that supports Python 3.6 or 3.7

Install Rasa Open Source and any dependencies needed by your custom actions

Run data validation, to check for mistakes in training data

Train the new model

Run conversation tests

Test the NLU model

Deploy the new model
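
The middle of the recipe above (validate, train, test) can be sketched as a small Python driver that a CI job might call once the base image and dependencies are in place. This is a minimal sketch under stated assumptions: `rasa data validate`, `rasa train`, `rasa test core`, and `rasa test nlu` are Rasa CLI commands, but the `tests` directory, the `PIPELINE_STEPS` list, and the `run_pipeline` helper are illustrative choices for a default project layout. The deploy step is left out because it depends on where the assistant is hosted.

```python
import subprocess
from typing import Callable, List, Sequence

# One command per recipe step. Paths and flags are assumptions for a default
# project layout; adjust them to match your repository.
PIPELINE_STEPS: List[List[str]] = [
    ["rasa", "data", "validate"],                    # check training data for mistakes
    ["rasa", "train"],                               # train the new model
    ["rasa", "test", "core", "--stories", "tests"],  # run conversation tests
    ["rasa", "test", "nlu"],                         # test the NLU model
]


def run_pipeline(
    steps: Sequence[Sequence[str]],
    run: Callable = subprocess.run,
) -> List[str]:
    """Run each command in order and stop at the first non-zero exit code."""
    completed = []
    for command in steps:
        result = run(list(command))
        if result.returncode != 0:
            raise RuntimeError(f"Step failed: {' '.join(command)}")
        completed.append(" ".join(command))
    return completed


# In a CI job (this would invoke the real Rasa CLI):
# run_pipeline(PIPELINE_STEPS)
```

Passing the runner in as the `run` argument keeps the step list easy to check without Rasa installed, which is useful when reviewing changes to the pipeline itself.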

Step 4: Test the deployment

Trigger the deployment pipeline by making an update, for example, by annotating a few user messages in the NLU inbox. After annotating a message, you’ll see the Integrated Version Control indicator turn orange; click it and push the changes to Git.

Trigger your CI/CD pipeline (you can configure which actions start the pipeline, but a common trigger is opening a pull request). Review the pull request as a team. Look through the test results produced by the CI run, and merge the pull request if the results check out.
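
One way to make "the results check out" concrete is to have the CI job compare a metric from the NLU test report against a minimum the team has agreed on. The sketch below assumes the report is a JSON file with a `weighted avg` entry in the layout of a scikit-learn classification report, for example `results/intent_report.json`; that path, the `passes_threshold` and `load_report` helpers, and the 0.8 minimum are assumptions to adapt, not fixed Rasa behavior.

```python
import json
from pathlib import Path
from typing import Dict


def load_report(path: str) -> Dict:
    """Read a JSON test report from disk."""
    return json.loads(Path(path).read_text())


def passes_threshold(report: Dict, metric: str = "f1-score", minimum: float = 0.8) -> bool:
    """Return True if the weighted-average metric meets the agreed minimum."""
    return report["weighted avg"][metric] >= minimum


# Hypothetical report contents, shaped like a classification report.
sample_report = {
    "weighted avg": {"precision": 0.90, "recall": 0.88, "f1-score": 0.89, "support": 120}
}

if not passes_threshold(sample_report):
    raise SystemExit("NLU performance is below the agreed threshold")
```

In a real pipeline, `sample_report` would be replaced with `load_report(...)` pointed at the report your test run produced, and a non-zero exit would mark the pull request check as failed.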

Discussion Questions

What is the team’s current process for making sure changes work in production?

Who is responsible for reviewing and approving updates?

What is the threshold for passing tests before a change can be merged?

How does the team approach writing unit tests for action code?

When might manual testing be required?
