About Cloud Data Science Platforms

In recent years we've seen machine learning model development evolve from a set of technologies used only by data science specialists into a mainstream extension of data analytics employed by a wide range of disciplines. As demand for easy-to-deploy, scalable data science tools has increased, cloud providers have stepped up their offerings.

All top-tier general-purpose public cloud providers now offer cloud data science platforms, for example Amazon SageMaker and Bedrock, Microsoft Azure Machine Learning and Azure OpenAI Service, and Google Vertex AI and Gemini.

These services primarily provide the four key elements of a Data Science platform:

  1. Data integration & storage
  2. Scalable Compute
  3. Model Development
  4. Model Deployment & Operations (MLOps)

While a completely on-premises Data Science platform can be built and used effectively, the scalability, flexibility, and integration efficiency of the cloud have made cloud DS platforms a go-to solution for both big tech and enterprise users.

Microsoft's Data Science Platforms

Microsoft's two key service offerings for Data Science model development and deployment are Azure Machine Learning and Microsoft Fabric Data Science. Synapse Analytics is a third, but new users may opt for Fabric DS (which incorporates Synapse technology), so I won't discuss it in this article.

Each platform has specific strengths, and quite a bit of overlap as well. Sorting out which Microsoft service is most appropriate for a specific scenario isn't always obvious.

Azure Machine Learning

Azure Machine Learning is a Platform-as-a-Service Machine Learning/Data Science platform squarely aimed at "traditional" Machine Learning model development teams.

Its specific strengths in my view are:

  • Flexible Managed Compute. Compute instances and Kubernetes clusters can be provisioned with a wide range of characteristics (CPU, GPU, RAM, storage). Provisioning compute in Azure ML is very similar to creating VMs to run ML jobs, but Azure ML compute is integrated with the development experience and is far easier to work with than a custom VM deployment.
  • Local/VM Compute. Azure ML also supports running DS tasks on local compute (i.e., on your own machine) and can integrate with custom VMs if custom compute profiles are needed.
  • Real-time and Batch inference deployment. As models are perfected and approved to serve production workloads, Azure ML users can deploy them directly from the Azure ML model registry to managed endpoints or Kubernetes clusters (see the sketch after this list).
  • Model Catalog. A more recent addition to Azure ML is a model catalog that serves as a source for Foundation Models. This allows Azure ML users, for example, to deploy an instance of a Foundation Model like Llama 2 from Hugging Face to an inference cluster.
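
To make that deployment path concrete, here's a minimal sketch of deploying an already-registered model to a managed online endpoint with the azure-ai-ml (v2) Python SDK. The workspace details, endpoint name, and model name are placeholders, and the exact configuration (environment, scoring script) depends on how your model was packaged; MLflow-format models typically need the least extra configuration.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Connect to the workspace (placeholder IDs; substitute your own).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Create a real-time (online) endpoint.
endpoint = ManagedOnlineEndpoint(name="churn-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy version 1 of a hypothetical model already in the registry.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="churn-endpoint",
    model="azureml:churn-classifier:1",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```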

Microsoft Fabric Data Science

Fabric Data Science could be thought of as the evolution of the Synapse Analytics ML services. In that sense Fabric itself is new, but Fabric DS, built on that Synapse lineage, is a mature service offering in its own right.

In my view, Fabric Data Science has three fundamental differences from Azure ML that provide unique strengths:

  1. It's integrated as a module of the Software-as-a-Service Fabric platform and uses Fabric compute capacities, rather than being a standalone PaaS service. It generally requires less configuration and management of compute and data resources than Azure ML.
  2. Its architecture makes it extremely well suited to integrate with data stored in a Fabric Data Lake/Data Warehouse both for training and inference use cases.
  3. The default ML processing model is based on Apache Spark ML, making it well suited for efficiently training models against the big data/large data volumes found in Data Lakes (a short sketch follows this list).
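
To illustrate that Spark-first approach, here's a minimal sketch of training a Spark ML model inside a Fabric notebook, where a `spark` session is already provided. The lakehouse table and column names are hypothetical.

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Read training data straight from a lakehouse table (hypothetical name);
# no copy or data set registration step is needed.
df = spark.read.table("sales_lakehouse.daily_sales")

# Assemble feature columns into the single vector column Spark ML expects.
assembler = VectorAssembler(inputCols=["units", "discount"], outputCol="features")
train = assembler.transform(df).select("features", "revenue")

# Fit a simple regression model distributed across the Spark cluster.
model = LinearRegression(labelCol="revenue").fit(train)
```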

Is there a Clear Decision Graph?

As with most technology selection decisions, there's no clear-cut choice between these two excellent services. For example, both Azure ML and Fabric DS have solid coverage across a range of requirements and features:

  • Both support the most common DS languages (Python and R) for development.
  • Both encourage Jupyter Notebooks as the default development tooling.
  • Both can import custom libraries from public repositories.
  • Both support Visual Studio Code for client-side IDE development.
  • Both support MLflow model registration and management (see the sketch after this list).
  • Both support Git version control.
  • Both have excellent workgroup/co-editing support.
  • Both support common ML training frameworks, such as the popular scikit-learn.
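
Because both platforms speak MLflow, the same training-and-registration code runs largely unchanged in an Azure ML or Fabric notebook, since each pre-configures the MLflow tracking URI to point at its own registry. A minimal sketch, using scikit-learn and hypothetical experiment/model names:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Hypothetical experiment name; both platforms route runs to their own store.
mlflow.set_experiment("demo-experiment")

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Register the model under a hypothetical name in the platform's registry.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="iris-classifier",
    )
```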

That said, I don't want to end this post by saying "it depends" or "they're both good, you decide". As of this writing, the most important factor in deciding which platform is "best" may be where the data for DS work lives and how deployed models will be used to score data in production.

Below are a couple of examples where Fabric or Azure ML may be the better fit. This list certainly isn't exhaustive, and the best choice, as always, depends on many variables rather than a single feature.

💡
As of this writing (February 2024), Fabric feature development is fast-moving, and Microsoft can be expected to integrate many of Azure ML's advantages into Fabric over time. I'll try to update this post as Fabric development changes.

When Fabric DS may be the best choice

  • For developing models over training data that's stored in Fabric data lakes/data warehouses (or can be accessed via a OneLake Shortcut), Fabric DS may be the best choice.  If data is in OneLake, a data scientist can use it directly without creating a data set or copying data from other sources.
  • If models will be used to score data as it's processed in a data lake (e.g. adding prediction columns to Semantic Models), Fabric DS may be the best choice, as it supports this use case directly in a few lines of notebook code (see the sketch after this list).
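
One way to do that kind of in-lake scoring is to wrap a registered model in an MLflow Spark UDF from a Fabric notebook. This sketch uses the generic MLflow API rather than any Fabric-specific helper; the model URI, table, and column names are hypothetical.

```python
import mlflow.pyfunc

# Wrap a registered MLflow model as a Spark UDF (model URI is hypothetical).
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn-classifier/1")

# Score a lakehouse table and add a prediction column.
df = spark.read.table("sales_lakehouse.customers")
scored = df.withColumn("churn_prediction", predict_udf("tenure", "monthly_spend"))

# Write the scored rows back to the lakehouse as a Delta table.
scored.write.mode("overwrite").format("delta").saveAsTable("sales_lakehouse.customers_scored")
```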

When Azure ML may be the best choice

  • Azure ML is well suited to working with ad-hoc data sets and supports data set versioning out of the box. If most data is external to the company's Fabric deployment, Azure ML may make more sense.
  • Azure ML has mature support for Real-time and Batch endpoints, for example to deploy models that will be leveraged by web applications, while Fabric is primarily designed to leverage models against Fabric OneLake data.
💡
As of this writing (February 2024), the Fabric roadmap contains a Model Endpoint feature as well, though the details of how this feature will work and what use cases it will support are not yet public.

Do I have to Choose?

Actually, you don't. Fabric and Azure ML already integrate well together. For example, data can be provisioned in Azure storage accounts so that both Azure ML and Fabric DS can access it. Also, Azure ML models are primarily consumed via inference endpoints, which can be called from Fabric notebooks. The risk of "making the wrong choice" is minimal, since an integration is usually possible to cover edge cases.

Examples where both services might be used:

  • Deploying a foundation model from the Azure ML catalog to an Azure ML endpoint, then calling that endpoint from a Fabric Notebook (see the sketch after this list).
  • Training a model in Fabric (against a large Data Lake data set), then deploying it to a Real-time inference endpoint in Azure ML for use by external web applications.
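
The first example is, from the Fabric side, just an authenticated HTTP call. Here's a minimal sketch; the endpoint URI, key, and payload shape are hypothetical and depend entirely on how the model was deployed in Azure ML.

```python
import json
import requests

# Hypothetical Azure ML online endpoint details; in practice, read the key
# from a secret store rather than hard-coding it in a notebook.
scoring_uri = "https://churn-endpoint.eastus.inference.ml.azure.com/score"
api_key = "<endpoint-key>"

# The payload schema depends on the deployed model's signature/scoring script.
payload = {"input_data": {"columns": ["tenure", "monthly_spend"], "data": [[24, 79.5]]}}

response = requests.post(
    scoring_uri,
    headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    data=json.dumps(payload),
)
response.raise_for_status()
print(response.json())
```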

Summary

I hope this article was thought-provoking and gave you a foundation of factors to consider when evaluating cloud data science platforms. I regularly use both Azure ML and Fabric DS, and they're both excellent products. In most scenarios, I could use either product to satisfy project requirements.

The advantages I've described are, in my view, at the margins and probably short-lived. As time goes on, I expect the integration between these products to make them feel more like related tools in a single toolbox than separate services.