How to Automate the End-to-End Lifecycle of Machine Learning Applications

Michael Knighten
June 26, 2020

Machine Learning (and deep learning) applications are quickly gaining popularity, but keeping the development process agile through continuous improvement is becoming more and more complex.

There are many reasons for this, but chief among them is that model behaviors are complex and difficult to anticipate, which makes them resistant to proper testing, harder to explain, and thus harder to improve.

To solve this, Continuous Delivery for Machine Learning (CD4ML) brings Continuous Delivery principles and practices to Machine Learning applications. This reduces complexity and increases the predictability of code behavior as well as makes testing and improving more efficient.

In this article, we’re going to explain how CD4ML automates the end-to-end lifecycle of Machine Learning to improve it and reduce complexity.

Continuous Delivery and How it Automates the Machine Learning Lifecycle

In the paper Hidden Technical Debt in Machine Learning Systems, D. Sculley et al. explain the effects of technical debt on Machine Learning applications.

Technical debt is a result of development teams taking action to expedite the delivery of functionality that later needs to be refactored. This refactoring adds to the complexity, creates additional dependencies, and makes the environment difficult to reproduce, test, and iterate on.

To combat these challenges, Continuous Delivery helps to automate processes, improve quality, and create a reliable, reproducible path to production.

Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces Machine Learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short, adaptive cycles.

The Challenges of a Continuous Delivery Model

Challenge: The organizational structure

The first challenge we find when implementing a Continuous Delivery Model in Machine Learning is organizational structure. In an organization composed of many different teams, each team might own a different piece of the development process.

For example, engineers might be constructing a deployment pipeline, while data scientists are attempting to improve the Machine Learning model.

Often in these situations, information and visibility are siloed, and, as handoff from one team to the next occurs, that visibility is lost, creating cross-functional friction and project delays.

The end result is often ML models that only work in a testing environment and never make it to a live production environment. Or if they do, they quickly become unmanageable and eventually must be retired.

Challenge: Learning how to make the process repeatable and testable

The second challenge to implementing a Continuous Delivery Model is finding a way to make the process testable and repeatable. As stated previously, with different teams utilizing different tools and workflows, it can become nearly impossible to automate the process end-to-end.

Versioning artifacts beyond the code is also not a simple process: some, such as trained models and datasets, may be quite large and require more robust tools to store and retrieve. To solve the organizational challenge, we must utilize Agile, MLOps, and DevOps strategies and build cross-functional, outcome-oriented teams.

For those just beginning to utilize continuous delivery, you can start by encouraging collaboration, frequent communication, and non-siloed workflows.

In addition to these organizational challenges, we’re now going to explore ways to tackle the technical challenges of CD4ML.

How to Resolve Technical Limitations of CD4ML

1. Make Data Easier to Find

While your core systems will be the biggest source of artificial intelligence data, there is also value in retrieving data from third-party data sources. To make data more readily available, you can utilize a data lake architecture, data warehouse, data streams, or a decentralized data mesh architecture. Whichever method you choose, it’s critical for data to be easy to locate and retrieve. The more your data scientists must work to get the data necessary to do their jobs, the longer it will take to build a production-ready, machine learning model.

Here are a few other steps you can consider to simplify your data architecture:

  • De-normalize multiple files into a single CSV file.
  • Clean up irrelevant data points.
  • Store your post-analysis output into a cloud storage system.
  • Version your dataset based on a central folder structure and naming convention.
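As a rough illustration of the last two steps, here is a minimal, stdlib-only sketch that denormalizes two files into one CSV and writes it under a date-versioned folder. The column names and the date-based naming convention are illustrative assumptions, not a standard:

```python
import csv
from datetime import date
from pathlib import Path

def denormalize(orders_path, customers_path, out_dir):
    """Join orders with their customer records into one flat CSV,
    written under a date-versioned folder (convention is illustrative)."""
    customers = {}
    with open(customers_path, newline="") as f:
        for row in csv.DictReader(f):
            customers[row["customer_id"]] = row

    # Naming convention: out_dir/v<ISO date>/orders_denormalized.csv
    version_dir = Path(out_dir) / f"v{date.today().isoformat()}"
    version_dir.mkdir(parents=True, exist_ok=True)
    out_path = version_dir / "orders_denormalized.csv"

    with open(orders_path, newline="") as f_in, \
         open(out_path, "w", newline="") as f_out:
        writer = None
        for order in csv.DictReader(f_in):
            # Merge the customer's columns into the order row.
            flat = {**order, **customers.get(order["customer_id"], {})}
            if writer is None:
                writer = csv.DictWriter(f_out, fieldnames=list(flat))
                writer.writeheader()
            writer.writerow(flat)
    return out_path
```

In a real pipeline the versioned folder would live in cloud storage, but the idea is the same: every dataset snapshot gets a predictable, retrievable address.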

2. Validate a Machine Learning Model

Once you’ve analyzed your data and made sure it’s readily available, it’s time to build your model. To begin, you’ll need a training data set and a validation data set so you can try different algorithms and tune your parameters and hyperparameters. Once you’re ready, you can evaluate your model against the validation data to ensure the quality of its predictions. This process will eventually become your ML pipeline. Keep in mind that your pipeline stages will change frequently, so you will want to utilize a data science version control tool to document your process and aid in the reproducibility of the entire pipeline once you’ve perfected it.
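The split-then-evaluate loop described above can be sketched in a few lines of plain Python. Here the "model" is any callable from features to a label; a real pipeline would train an actual estimator:

```python
import random

def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle and split rows so the model is never evaluated
    on data it was trained on. Seeding keeps the split reproducible."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def evaluate(model, validation_rows):
    """Fraction of validation examples the model predicts correctly."""
    correct = sum(1 for x, y in validation_rows if model(x) == y)
    return correct / len(validation_rows)
```

You would call `evaluate` once per candidate algorithm or hyperparameter setting, always against the held-out validation rows, and record the scores before choosing a model.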

3. Determine Your Model’s Production Use Case

Once you’ve validated your model, you’ll need to determine how it will be used in the production environment.

Here are a few potential use cases:

A. Embedded Model

This treats the artifact as a dependency that is built and packaged within the web application. The application artifact and version are a combination of the application code and the chosen model.

B. Model Deployed as a Separate Service

Here, the model is wrapped in a service that can be deployed independently of the applications that consume it. Updates can be released independently, but each prediction requires some kind of remote request, which introduces latency.
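To make the trade-off concrete, here is a minimal sketch of a model served over HTTP using only Python's standard library. The toy `predict` function and the JSON payload shape are illustrative assumptions; a production service would use a proper framework and a real model artifact:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "model": in a real service this would be a loaded artifact.
def predict(features):
    return {"score": sum(features) / len(features)}

class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Each prediction is one remote round trip -- this is the latency cost.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        payload = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the sketch quiet
        pass

def serve(port=8080):
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()
```

Because the service has its own lifecycle, the data science team can ship a new model version without redeploying the consuming application.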

C. Model Published as Data

With this method, the model is deployed independently, but the application consumes it as data. Software release methods such as Blue-Green Deployment or Canary releases are frequently used with this pattern. Whichever pattern you decide to implement, there is always an implicit contract between the model and its consumers: the model expects input data in a particular form, and if your data scientists change that form, it can cause integration issues and break the application.

4. Implement Testing and Quality Checks

There are different types of testing that can be done to validate your Machine Learning workflow. While some elements are difficult to automate, there are many automated tests that can improve its quality.

Here are some tests to consider:

  • Tests that validate data against the expected schema, or assumptions about its values.
  • Tests that validate component integration between services to ensure that the expected model is compatible with the application.
  • Tests that validate model quality, evaluate model performance, and help to optimize its parameters.
  • Tests that validate the model’s bias and fairness and how it performs against baselines for specific data.
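The first kind of test, validating data against an expected schema and assumptions about its values, might look like this minimal sketch (the schema, column names, and plausible-age range are invented for illustration):

```python
# Illustrative schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"age": int, "income": float, "label": int}

def validate_row(row, schema=EXPECTED_SCHEMA):
    """Return a list of violations: missing columns, wrong types,
    and values outside the assumed plausible range."""
    problems = []
    for column, expected_type in schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(f"{column}: expected {expected_type.__name__}")
    # Value-level assumption check (range is an illustrative choice).
    if "age" in row and isinstance(row["age"], int) and not 0 <= row["age"] <= 120:
        problems.append("age out of plausible range")
    return problems
```

Running such checks in the pipeline, before training or serving, turns silent data drift into an explicit test failure.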

There may be far more data points for one value than another, so it’s important to check that there is no inherent bias. While these tests are easy to automate, assessing a model’s quality is less cut-and-dried. If you continue to compute metrics against the same dataset, over time you may overfit your model. So it’s important to vary your dataset to ensure your model quality won’t degrade over time.

5. Use Experiment Tracking

To support your model’s governance, you’ll need to capture data that will help humans determine if the model is ready to be implemented in the production environment.

You may have multiple experiments running at once and many might not make it to production. It’s important to have a deployment process in place to ensure that bad experiments don’t make it to production. For that reason, you must define an approach to track them.
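A minimal, file-based sketch of experiment tracking might look like this. Dedicated tools such as MLflow do far more; the JSON layout and metric names here are illustrative assumptions:

```python
import json
import time
import uuid
from pathlib import Path

class ExperimentTracker:
    """Tiny file-based tracker: one JSON file per run, capturing the
    parameters and metrics a human needs to judge promotion readiness."""

    def __init__(self, root="experiments"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def log_run(self, params, metrics):
        run = {"id": uuid.uuid4().hex, "time": time.time(),
               "params": params, "metrics": metrics}
        (self.root / f"{run['id']}.json").write_text(json.dumps(run))
        return run["id"]

    def best_run(self, metric):
        """Compare all logged runs on one metric -- the basis for
        deciding which experiment, if any, goes to production."""
        runs = [json.loads(p.read_text()) for p in self.root.glob("*.json")]
        return max(runs, key=lambda r: r["metrics"][metric])
```

Even this toy version captures the essentials: every experiment leaves an auditable record, and promotion decisions are made by comparing records rather than memory.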

To support experimentation, it’s also important to have an elastic infrastructure. You might need more than one experiment available for training, and an elastic infrastructure makes it possible to run multiple models in production. It also allows you to improve your system’s reliability and scalability by spinning up more infrastructure when needed.

A cloud-based infrastructure is a great option for elastic infrastructures and many of the cloud providers are now adding solutions to support an elastic infrastructure.

6. How Will You Configure Model Deployment?

When it comes to deploying in the real world, there are many complex scenarios you can run into, and therefore, it’s important to consider how you’ll deploy your Machine Learning model. Here are a few options to choose from:

  • Multiple models: in some cases, you may need to have more than one model performing the same task for better consumption by your application. In this case, deploying separate models might be the better option to get predictions with a single REST API call.
  • Shadow models: if you’re considering replacing a model already in production, you can deploy a new model in tandem and send the same production traffic to see how it performs before retiring the old model.
  • Competing models: if you’re trying multiple model versions in production to determine which performs better, there will be added complexity. In this case, routing rules are required to make sure the right traffic is going to the right model and that you’re getting enough feedback to make data-based decisions.
  • Online learning models: this model uses learning algorithms and techniques that can continuously improve as new data is collected. However, with this model you’ll need to version both the training data and the production data to track their impact on performance.
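For competing models, a common routing rule is a deterministic hash split, so each user consistently hits the same model version across requests. A sketch, with the 10% candidate share as an illustrative default:

```python
import hashlib

def route(user_id, candidate_share=0.1):
    """Deterministically route a user to 'candidate' or 'current'.
    Hashing the ID keeps each user on one model across requests,
    which keeps the feedback for each model clean."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "candidate" if bucket < candidate_share * 100 else "current"
```

Because the split is deterministic, you can later join each prediction back to the model version that produced it and compare the two populations fairly.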

Again, an elastic infrastructure will support greater complexity and make it possible to run multiple models in production as well as improve your application reliability and scalability.

7. Tie Everything Together with your Continuous Delivery Pipeline

Now that you have all of your other critical components in place, you’ll need to tie everything together with a Continuous Delivery pipeline. In CD4ML, there are extra requirements to execute an effective pipeline including:

  • Provisioning infrastructure and an ML execution environment that trains models and captures metrics from multiple experiments;
  • Building, testing, and deploying your data pipelines;
  • Testing and validating to decide which models to promote to production; and
  • Provisioning infrastructure and deploying your models into production.
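A toy orchestrator illustrating how such a pipeline gates later stages on earlier ones. The stage names and the pass/fail convention are illustrative; real pipelines run on CI/CD tooling:

```python
def run_pipeline(stages):
    """Run pipeline stages in order, stopping at the first failure so a
    model that fails validation can never reach the deploy stage.
    Each stage is a (name, callable) pair returning True on success."""
    results = {}
    for name, stage in stages:
        ok = stage()
        results[name] = "passed" if ok else "failed"
        if not ok:
            break
    return results
```

Wiring the train, test, and deploy steps into one ordered, gated sequence is what turns a collection of scripts into a Continuous Delivery pipeline.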

Over time, the functionality of your pipeline can be enhanced to perform multiple experiments in tandem and you can define your model governance to check for any bias and make informed decisions about which model should be promoted to production.

Another important component of your Continuous Delivery orchestration is to define a process for rollbacks, in case your model performs poorly once promoted to production. Doing so will provide a safety net, should you encounter problems during the deployment process.
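One simple way to make that safety net cheap is to keep prior versions in a registry, so a rollback is a pointer swap rather than a redeploy from scratch. A minimal sketch (the version strings are illustrative):

```python
class ModelRegistry:
    """Tiny registry sketch: promotion keeps earlier versions around,
    so rolling back is just moving the 'current' pointer."""

    def __init__(self):
        self.history = []

    def promote(self, version):
        self.history.append(version)

    def current(self):
        return self.history[-1] if self.history else None

    def rollback(self):
        # Never roll back past the first promoted version.
        if len(self.history) > 1:
            self.history.pop()
        return self.current()
```

Pairing this with the monitoring described below means a poorly performing promotion can be reverted in seconds rather than rebuilt under pressure.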

8. Monitor and Observe with Deployment Tracking

While we’re on the topic of encountering problems during the deployment process, it’s also important to ensure that you have a program in place to help you track, identify, and pinpoint any errors if they occur.

Once a model is live, you need to have visibility into how it performs in production and close the feedback loop. Deployment tracking software allows you to do this by giving your distributed teams a deploy-centric view of your codebase.

The information produced by your deployment tracking software is made available in a way that all of your teams can easily digest and act on quickly. It also allows you to see what’s being shipped, when, and by whom, so you can be prepared.

Most importantly, deployment tracking measures impact in real-time so you can be sure your model is performing as expected and if not, move quickly to rectify problems before your application suffers.
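A tiny sketch of that kind of real-time check: track a rolling window of prediction outcomes and flag when live accuracy drops below a threshold. The window size and threshold are illustrative assumptions:

```python
from collections import deque

class DriftMonitor:
    """Track a rolling window of prediction outcomes and flag when
    live accuracy drops below an alert threshold."""

    def __init__(self, window=100, alert_below=0.8):
        self.outcomes = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, was_correct):
        self.outcomes.append(bool(was_correct))

    def degraded(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet to judge
        return sum(self.outcomes) / len(self.outcomes) < self.alert_below
```

In practice the `degraded` signal would page a human or trigger the rollback path, closing the feedback loop between production behavior and the delivery pipeline.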


Even as Machine Learning evolves and complexity increases, so does our ability to manage and deliver applications to production through an applied Continuous Delivery approach.

With Continuous Delivery, we can manage the risks of releasing changes to Machine Learning applications in a predictable and accurate way by iterating in small and safe increments that can be reproduced and reliably released in faster development cycles.

A critical component to improving and adapting your Machine Learning applications through Continuous Delivery is deployment tracking. Try Sleuth free and you’ll be on your way to successfully automating the end-to-end lifecycle of machine learning applications.