
MLOps: what is it and what you need to know

Our world moves at a rapid pace, and with every turn of our planet, the data we collect and use to keep up grows. To avoid drowning in this data quicksand, enterprises must scale their AI capabilities to unearth the insights within the data. Yet the majority of enterprises remain stuck in the experimentation phase, and with the proliferation of cloud technologies there is no room for inefficiency.1 While there has been some improvement over recent years, the latest State of AI report by Deloitte found that only 27% of surveyed businesses were 'Transformers' that had achieved substantial AI deployments at scale.2 So the question arises: how can enterprises shift towards an industrialised, repeatable AI process at scale and reap the benefits of this incredible technology?

Learning from history

As they say, necessity is the mother of invention. Travel a decade back in history and you would find the entire software industry going through a similar scaling challenge. The time taken to deploy new software features at scale was high because developers and the operations community worked in silos: the software development process was divided into multiple independent links, each executed by a different, disparate team. Fast forward to today and many high-tech enterprises churn out production code every minute, at scale and with the highest levels of reliability, all thanks to the movement called 'DevOps'.

But what exactly is DevOps?

Under a DevOps model, the development team and operations team work together towards a common software delivery goal. The team follows a set of guiding principles focused on systems thinking, amplified feedback loops and a culture of continuous experimentation and learning. The outcome is a shorter cycle time from code development to deployment in production, without compromising quality.

So how can we apply a similar model to the development and deployment of AI?

Introducing Machine Learning Operations (MLOps)

The recent and rapid innovation of cloud-based technologies has provided a plethora of tool sets to analyse, process and model data. There is also an increasingly skilled and talented workforce of machine learning and data engineers. What's missing? A framework that binds these technologies and people together so that enterprises can efficiently deliver AI at scale.

Designed with principles similar to DevOps, MLOps focuses on the standardisation, optimisation and automation of ML model deployment activities to achieve scale. The goal of MLOps is to smooth the path of an ML model from ideation to production in the shortest possible time, with minimal risk. It is a methodology that unifies ML development and ML operations, and it promotes:

  • Reusability
  • Repeatability
  • Reproducibility
  • Automation
  • Collaboration

Key components of MLOps

Some of the key components of establishing MLOps across the ML lifecycle are highlighted below.

Feature store

Deriving features from data is a critical part of the ML model development cycle, yet data scientists across an enterprise often derive the same features in a siloed fashion, because those features remain confined within individual models rather than being made available as centralised, reusable assets. MLOps prescribes building a central repository of common features, a 'feature store', which multiple data scientists can consistently reuse across many different models. Having a centralised repository as part of the model-building cycle saves time by enabling quick discovery of ready-to-use, consistently applied features from across the enterprise.
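To make the idea concrete, here is a minimal, purely illustrative sketch of a feature store as an in-memory registry of feature definitions; the `FeatureStore` class and its methods are hypothetical, not any particular product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import pandas as pd


@dataclass
class FeatureStore:
    """Hypothetical in-memory feature store: a named registry of feature definitions."""
    _features: Dict[str, Callable[[pd.DataFrame], pd.Series]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[pd.DataFrame], pd.Series]) -> None:
        # Define the feature once, centrally, so every team computes it the same way.
        self._features[name] = fn

    def get_features(self, names: List[str], df: pd.DataFrame) -> pd.DataFrame:
        # Any model can pull ready-to-use, consistently applied features by name.
        return pd.DataFrame({name: self._features[name](df) for name in names})


store = FeatureStore()
store.register("spend_per_order", lambda df: df["total_spend"] / df["order_count"])

customers = pd.DataFrame({"total_spend": [120.0, 90.0], "order_count": [4, 3]})
print(store.get_features(["spend_per_order"], customers))
```

In a real deployment the store would be backed by persistent offline and online storage, but the principle is the same: a feature is defined once, then discovered and reused everywhere.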

ML pipelines

Building a complex ML model is a multi-stage process, and breaking the work into smaller tasks is critical for development. In the experimentation phase a single data scientist can run the process end to end, but for a production-grade ML model the work should be split into smaller components. An ML pipeline breaks the entire process into individual reusable components and orchestrates each one, so that developers do not build components from scratch every time. Typical components include data validation routines, data clean-up modules, model training modules and hyperparameter tuning. Solutions assembled from prebuilt components enable higher degrees of automation and facilitate a structured, collaborative approach to ML model development (see the sketch after the roles below).

Key roles involved: Data engineer, Data scientist, ML engineer
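As an illustration, the sketch below assembles a training pipeline from prebuilt scikit-learn components; the synthetic dataset and the choice of steps are assumptions for the example, not a prescribed stack.

```python
# Illustrative ML pipeline: reusable components (clean-up, scaling, training)
# chained together, with hyperparameter tuning orchestrated over the whole chain.
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)  # stand-in dataset

pipeline = Pipeline(steps=[
    ("clean", SimpleImputer(strategy="median")),   # data clean-up component
    ("scale", StandardScaler()),                   # preprocessing component
    ("train", LogisticRegression(max_iter=1000)),  # model training component
])

# Hyperparameter tuning as just another orchestrated step over the pipeline.
search = GridSearchCV(pipeline, param_grid={"train__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(f"best C: {search.best_params_['train__C']}, CV accuracy: {search.best_score_:.3f}")
```

Because each step is a self-contained component, the same clean-up or scaling block can be reused in other pipelines without being rebuilt from scratch.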

Model registry

Model registries facilitate the governance of ML models in a central enterprise repository. The entire model management process (review, approve, release and rollback) can be executed efficiently, and the enterprise knows exactly which version of each model is running in production at any point in time. Enabling this capability ensures model quality across environments, as well as faster model discovery and diagnosis; a sketch of the idea follows the roles below.

Key roles involved: Data scientist, ML engineer
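The following is a minimal sketch, assuming a simple in-memory registry; the class and method names are hypothetical, and real registries add persistence, access control and audit trails on top of this idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str       # where the serialised model artifact lives
    stage: str = "staging"  # "staging" | "production" | "archived"


@dataclass
class ModelRegistry:
    """Hypothetical central registry tracking every version of each named model."""
    _models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str) -> ModelVersion:
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, artifact_uri=artifact_uri)
        versions.append(mv)  # in practice, review and approval would gate this step
        return mv

    def promote(self, name: str, version: int) -> None:
        # Release a version to production; the previous one is archived, so a
        # rollback is simply promoting an earlier version again.
        for mv in self._models[name]:
            if mv.stage == "production":
                mv.stage = "archived"
            if mv.version == version:
                mv.stage = "production"

    def in_production(self, name: str) -> Optional[ModelVersion]:
        # The enterprise always knows which version is live.
        return next((mv for mv in self._models.get(name, []) if mv.stage == "production"), None)


registry = ModelRegistry()
registry.register("churn-model", "s3://models/churn/v1")
registry.register("churn-model", "s3://models/churn/v2")
registry.promote("churn-model", 2)
registry.promote("churn-model", 1)  # rollback: promote the earlier version
print(registry.in_production("churn-model"))
```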

Model validation

A model needs to be tested thoroughly before production deployment to ensure that it meets all the necessary business and technical acceptance criteria. Key criteria include model performance metrics, model fairness, regulatory standards and risk compliance. MLOps practice recommends codifying all the evaluation scenarios (including regulatory scenarios, coined 'regulations as code') as components to be executed as mandatory steps during the deployment process; a sketch follows the roles below. This prevents low-grade models being deployed to production and saves the enterprise from potential financial and reputational risks.

Key roles involved: ML engineer, Software engineer, Domain SME, Regulation & Risk SME
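The sketch below shows what codifying acceptance criteria as a mandatory deployment step might look like; the specific checks and thresholds are assumptions for illustration only.

```python
# Illustrative deployment gate: every acceptance criterion is an executable
# check, and deployment is blocked unless all of them pass.
from typing import Callable, Dict, List


def check_performance(report: Dict) -> bool:
    return report["accuracy"] >= 0.85  # assumed business performance bar


def check_fairness(report: Dict) -> bool:
    # e.g. the metric gap between demographic groups must stay small
    return report["fairness_gap"] <= 0.05


def check_regulation(report: Dict) -> bool:
    # 'regulations as code': a compliance rule expressed as an executable test
    return not report["uses_prohibited_features"]


CHECKS: List[Callable[[Dict], bool]] = [check_performance, check_fairness, check_regulation]


def deployment_gate(report: Dict) -> None:
    failures = [check.__name__ for check in CHECKS if not check(report)]
    if failures:
        raise RuntimeError(f"Deployment blocked; failed checks: {failures}")


deployment_gate({"accuracy": 0.91, "fairness_gap": 0.03, "uses_prohibited_features": False})
```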

Model monitoring

Data drift can occur over time, degrading the predictive accuracy of a model, and performance degradation that goes unnoticed has the potential to significantly impact the business. Continuous monitoring of deployed ML models tracks their efficiency and effectiveness in production to ensure predictive quality and business continuity. Appropriate alert mechanisms help identify a stale model in near real time and trigger actions to retrain it, as sketched below.

Key roles involved: ML engineer, Data scientist, Data engineer
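As one sketch of continuous monitoring, the example below compares a feature's live distribution against its training-time baseline with a two-sample Kolmogorov-Smirnov test and raises an alert when they diverge; the threshold and the synthetic data are assumptions.

```python
# Illustrative drift monitor: flag a feature whose production distribution has
# drifted away from the training baseline, then trigger a retraining action.
import numpy as np
from scipy.stats import ks_2samp


def drift_detected(baseline: np.ndarray, live: np.ndarray, p_threshold: float = 0.01) -> bool:
    _, p_value = ks_2samp(baseline, live)
    return p_value < p_threshold  # a low p-value suggests the distributions differ


rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5_000)  # feature values seen at training time
live = rng.normal(0.4, 1.0, 5_000)      # shifted values observed in production

if drift_detected(baseline, live):
    print("ALERT: data drift detected - triggering model retraining")  # stand-in alert
```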

What’s the conclusion?

There is no doubt that embracing MLOps increases the productivity, speed and reliability of the ML lifecycle while reducing risk to the enterprise. For MLOps to successfully contribute to scaling AI across the business, enterprises need to remember that MLOps requires a multidisciplinary ecosystem, so the right mix of skill sets must be brought together across the enterprise to lay the foundation. Scaling AI is not just for data scientists anymore; it's a team game and everyone has a role to play.