Why is MLOps necessary?

The field of statistics and machine learning has witnessed significant advancements, including the emergence of Artificial Intelligence (AI). Prominent AI-based products like Apple's Siri and Amazon's Alexa exemplify the practicality and longevity of AI technology.

From a Data Scientist's perspective, developing a model, even a simple one like a binary classifier, involves a considerable amount of work. However, that is just the beginning. Integrating the model into a continuous development and delivery cycle requires additional effort.

Data Scientists often struggle to grasp the systems necessary for automating tasks related to their models, such as data ETL, feature engineering, model training, inference, hyperparameter optimization, and performance monitoring. Automating all these components can be challenging.

This is where MLOps comes into play, bridging the gap between DevOps CI/CD practices and the world of data science.

Building an MLOps Infrastructure

Constructing an MLOps infrastructure is one thing; becoming proficient in its use is another, and that takes time and effort. For early-career Data Scientists, it may seem overwhelming to learn how to leverage cloud infrastructure while also developing production-ready Python code. Simply relying on a Jupyter notebook outputting predictions to a CSV file is insufficient in the current machine learning landscape.

Established companies with a history of Data Science projects typically have dedicated DevOps and Data Engineer/Machine Learning Engineer roles. These professionals work closely with Data Scientist teams to handle the various tasks involved in deploying machine learning models in production. Some companies may have even developed custom tools and infrastructure to facilitate easier model deployment. However, many Data Science teams and data-driven organizations are still navigating the complexities of MLOps implementation.

Why Choose SageMaker Pipelines?

One challenge in building an MLOps infrastructure is the multitude of approaches available for its construction and deployment. Fortunately, AWS, as the leading cloud provider, offers a comprehensive suite of tools to address these needs. AWS's commitment to Data Science is evident in their SageMaker product, which continually introduces new features.

AWS aims to address some of the technical debt associated with production machine learning. I have recently been involved in a project that built and deployed an MLOps pipeline for edge devices using SageMaker Pipelines, which provided valuable insights into its strengths and areas for improvement compared to a completely custom-built MLOps pipeline.

The SageMaker Pipelines approach is ambitious. Instead of Data Scientists needing to master complex cloud infrastructure, what if they could deploy to production by simply learning a single Python SDK? The initial stages of learning can be conducted locally without relying on the AWS cloud.

SageMaker Pipelines streamlines MLOps for Data Scientists. The entire MLOps pipeline can be defined in a Jupyter Notebook, enabling automation of the entire process. AWS offers numerous prebuilt containers for data engineering, model training, and model monitoring, specifically tailored for their platform. However, users can also bring their own containers to handle tasks not supported out of the box. Additional niche features, such as network-isolated training, protect against external interference during model training by cutting the training environment off from the internet.
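To give a feel for how compact such a notebook definition can be, here is a minimal sketch of a one-step training pipeline using the SageMaker Python SDK. The container image URI, IAM role ARN, and S3 paths are placeholders, not real resources; swapping `PipelineSession` for `LocalPipelineSession` runs the same definition locally without touching the AWS cloud.

```python
# Minimal sketch of a SageMaker pipeline: a single training step.
# All ARNs, image URIs, and S3 paths below are placeholders.
from sagemaker.estimator import Estimator
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

session = PipelineSession()  # use LocalPipelineSession() to test locally
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

estimator = Estimator(
    image_uri="<training-image-uri>",          # placeholder container
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<bucket>/model-artifacts",
    sagemaker_session=session,
)

# With a PipelineSession, .fit() does not launch a job; it returns
# deferred step arguments for the pipeline definition.
train_step = TrainingStep(
    name="TrainModel",
    step_args=estimator.fit({"train": "s3://<bucket>/train-data"}),
)

pipeline = Pipeline(
    name="demo-pipeline",
    steps=[train_step],
    sagemaker_session=session,
)
# pipeline.upsert(role_arn=role)  # registers the definition with SageMaker
# pipeline.start()                # triggers an execution
```

Additional processing, evaluation, and model-registration steps slot into the same `steps` list, which is what makes the notebook the single source of truth for the whole workflow.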

Model versioning can be easily managed through the model registry. If multiple use cases require different versions of the same model architecture, selecting the appropriate version from the SageMaker UI or Python SDK allows for seamless adaptation of the pipeline. This approach facilitates the reuse of components across different projects, leading to faster development cycles and reduced time to production.
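As a hypothetical sketch of that version selection, the snippet below pulls the newest approved version from a model package group via boto3; the group name and region are placeholders.

```python
# Sketch: pick the latest approved model version from the model registry.
# The model package group name and region are placeholders.
import boto3

sm = boto3.client("sagemaker", region_name="eu-west-1")

response = sm.list_model_packages(
    ModelPackageGroupName="churn-classifier",   # placeholder group
    ModelApprovalStatus="Approved",             # only approved versions
    SortBy="CreationTime",
    SortOrder="Descending",
)

# The first entry is the most recently approved version; its ARN can be
# handed to a deployment step or a downstream pipeline.
latest_arn = response["ModelPackageSummaryList"][0]["ModelPackageArn"]
```

Because each project's pipeline only needs the package ARN, the same registry lookup can be reused across use cases that share a model architecture.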

SageMaker Pipelines automatically logs every step of the workflow, capturing details such as training instance sizes and model hyperparameters. Deployment to the SageMaker Endpoint is seamless, and post-deployment, models can be automatically monitored for concept drift in data or API latencies. Multiple versions of models can be deployed simultaneously, enabling A/B testing to determine the most effective one.
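The drift monitoring mentioned above is fed by data capture configured at deployment time. A sketch, assuming `model` is a SageMaker `Model` object produced by earlier pipeline steps and the bucket and endpoint name are placeholders:

```python
# Sketch: deploy a model to a real-time endpoint with request/response
# capture enabled, so model-monitoring jobs can later inspect the traffic.
# 'model', the bucket, and the endpoint name are placeholders.
from sagemaker.model_monitor import DataCaptureConfig

capture = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=20,                      # capture 20% of requests
    destination_s3_uri="s3://<bucket>/data-capture",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="churn-classifier-prod",
    data_capture_config=capture,
)
```

For A/B testing, the same endpoint can host several production variants with traffic weights, and the captured data makes it possible to compare them on live requests.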

Moreover, SageMaker provides tools and seamless integration with Pipelines for deploying models to edge devices, such as Raspberry Pi 4 or similar platforms. Models can be recompiled for specific device types using SageMaker Neo compilation jobs, ensuring compatibility, and then deployed to device fleets using SageMaker Edge Manager.
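A hedged sketch of such a Neo compilation job via boto3, targeting a Raspberry Pi 4; the job name, role ARN, framework, input shape, and S3 paths are all placeholders:

```python
# Sketch: recompile a trained model for a Raspberry Pi 4 target with a
# SageMaker Neo compilation job. All names, ARNs, and paths are placeholders.
import boto3

sm = boto3.client("sagemaker", region_name="eu-west-1")

sm.create_compilation_job(
    CompilationJobName="edge-model-rasp4",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    InputConfig={
        "S3Uri": "s3://<bucket>/model-artifacts/model.tar.gz",
        "DataInputConfig": '{"input": [1, 3, 224, 224]}',  # example shape
        "Framework": "PYTORCH",
    },
    OutputConfig={
        "S3OutputLocation": "s3://<bucket>/compiled",
        "TargetDevice": "rasp4b",                # Raspberry Pi 4 target
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)
```

The compiled artifact in the output location is what then gets packaged and pushed out to a registered device fleet.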

Considerations before Choosing SageMaker Pipelines

By consolidating these features into a single service accessible through an SDK and UI, Amazon has automated a significant portion of the CI/CD work required to deploy machine learning models into production at scale, aligning with agile project development methodologies. Additionally, other SageMaker products, such as Feature Store or Amazon Forecast, can be leveraged as needed.

While SageMaker Pipelines is an excellent product to begin with, it does have limitations. It is well-suited for batch learning scenarios but lacks support for streaming/online learning tasks at present.

For Citizen Data Scientists, who may not possess advanced Python skills, SageMaker Pipelines may not be the ideal choice. Such individuals may find BI products like Tableau or Qlik, which use SageMaker Autopilot as their ML backend, more suitable. Alternatively, products like DataRobot can also be considered.

Additionally, in scenarios where software products experience high usage, the SageMaker Endpoints model API deployment may fall short. If the endpoint receives more traffic than it can handle, simply increasing the instance count behind the endpoint may not be enough. In such cases, employing a Kubernetes cluster with horizontal scaling is recommended to ensure the model can keep up with growing API traffic.

Overall, SageMaker Pipelines is a well-packaged product with numerous useful features. The challenge with MLOps on AWS has been the abundance of different methodologies for achieving the same outcome. SageMaker Pipelines represents an effort to streamline and package these methodologies for machine learning pipeline creation.

AWS MLOps is an excellent choice for working with batch learning models and swiftly creating efficient machine learning pipelines. However, if you're dealing with online learning or reinforcement learning models, a custom solution is required. Moreover, if autoscaling is a priority, API deployments need to be managed manually, as SageMaker endpoints may not meet the necessary requirements.

For a comprehensive architecture example, refer to this AWS blog:

https://aws.amazon.com/blogs/machine-learning/automate-model-retraining-with-amazon-sagemaker-pipelines-when-drift-is-detected/
