Unleash Your Model's Potential: Step-by-Step Guide to Deploying a Fabric Machine Learning Model on an Azure ML Inference Endpoint

Models developed in Fabric can also be deployed in other environments like Azure Machine Learning (Azure ML), other cloud providers, or even on-premises deployment. In this post we'll take a look at how to export models trained in Fabric and deploy them into other compute infrastructures.

The Use Cases for Exporting Fabric Models

Fabric's Data Science workload uses the distributed Spark cluster which provides the ability to train ML models using large data sets, and trained models can be used with Synapse Transformers to efficiently score large input data efficiently at scale.

However, there may be other production ML uses for the models we trained in Fabric. Some examples:

Deploying of real-time inference endpoints to REST web servers for consumption by other applications.
Deploying batch endpoints to serve ETL pipeline use cases.
Re-using models trained in Fabric in applications running in other cloud infrastructures (AWS, Google, IBM, etc.) or in custom compute instances on-premises or on edge devices.

Because we can export Fabric models, we can leverage Fabric's scalable compute and access to OneLake data sources for training, and repurpose trained models in many other applications across the enterprise.

💡

As of this writing in February 2024, the Fabric roadmap contains a Model Endpoint feature which, when available, may enable some of these uses cases directly from Fabric.

Walk Through

💡

The walk-through and demo of model export, import and Azure ML inference endpoint deployment is covered in the embedded YouTube video. The written summary of the technique continues after the embedded video.

Export the ML Model from Fabric

The first step in deploying a Fabric ML model is to export it. In this basic walk-through, I simply export the model to my local Windows system.

💡

The model I'm exporting was created in a Fabric notebook, and registered in the Fabric workspace using MLflow. MLflow creates all the metadata needed to import models to Azure ML, avoiding the need to manually create an inference script (Python)!

Exporting a model from Fabric is easy--just tap the Download ML model version button in the UI.

Exporting a Fabric ML Model from the Web UI

Explore the contents of the downloaded file

Fabric will combine the ML Model along with a Python Object Serialization file (a/k/a pickle file) having a .pkl extension, and YAML files (.yml) that describe to other platforms the structure and interface provided by the model. All these files are packaged into a .zip file, which will be placed in your Downloads folder.

If you open the .zip file, you can review the model and metadata files before deploying them to other systems.

Contents of the Fabric downloaded .zip file

The contents of the .zip file contain YAML files that define the runtime environment a serving compute engine needs to provide for the model, and the input/output interface (data structures) needed to call the model.

Upload the Model to Azure ML

In this example, we're going to deploy an Azure ML real-time inference endpoint to publish the model to the Internet via a RESTful web service.

While it is possible to deploy an endpoint using a model from our local file system, I'm going to register the model in and Azure ML workspace, and use the Azure ML Studio web UI to deploy the endpoint.

💡

In a CI/CD environment, using the Azure ML CLI or API endpoints enables deployment automation.

To upload the model, we'll use the Register/From local files menu in Azure AI Machine Learning Studio, and then press the Register button at the bottom of the screen.

Importing the Fabric ML Model to Azure ML

After uploading the files, the new model is available in the Models page of the Azure ML Workspace.

Deploy the Model to an Endpoint

Once the model is registered in the Azure ML workspace, click on the model name, and then select Real-time endpoint from the Deploy menu.

💡

Imported models can be deployed either to Real-time or Batch endpoints. In this walk-through I've chosen to create a real-time endpoint, such as would be published for interactive user applications.

Start the Real-time endpoint deployment process

Configure the Endpoint and Deployment

A model opens up to configure the compute to use for the endpoint, and to specify the deployment name.

💡

An endpoint is a REST URL applications will connect to when using the ML Model. A Deployment is a published instance of a model. An endpoint may contain one or more Deployments, for example if a solution has different models for different purposes.

Specify the compute size desired for the endpoint, and the names for the endpoint and deployment, then tap the Deploy button.

Configuring the Azure ML Endpoint and Deployment

Test the Deployment

After the deployment completes (10-15 minutes, typically), we can make a quick "smoke test" within Azure ML Studio to ensure the deployment is functional, and we're using the correct data structures when calling it.

Testing the endpoint within the Azure ML workspace

Once the interactive test succeeds, it's time to move on to consume the model from outside the Azure ML environment.

Consuming the Deployment via the REST Endpoint

We can now send data for scoring to the model deployment via the endpoint using any application or programming language that supports RESTful POST requests. Azure ML provides boilerplate code in JavaScript, Python, C# and R.

For this post, I'll use Postman to test the endpoint from the public Internet (using the code samples provided by Azure ML as inputs to the Postman call configuration).

Calling the Endpoint from Postman

Making the API call to the public Internet endpoint verifies the endpoint and deployment are configured correctly, and the input/output interfaces are working as expected.

💡

While we used the default endpoint configuration, which allows Internet connections, Azure ML endpoints can be configured with Network Isolation and prohibit open Internet access. Refer to Azure documentation for details.

Summary

Microsoft Fabric has a rich Data Science workload, backed by Synapse ML and scalable Spark compute, which gives us excellent tools for training ML models and using them to transform data in our data lakes.

Our models are created in standard formats, and registered within Fabric workspaces using MLflow, making it easy to repurpose them in other data science environments!

In this post, I deployed a Fabric model to Azure Machine Learning, and repurposed a model trained in Fabric for use by applications via scalable Azure compute clusters. The ability to export and repurpose models opens endless opportunities!

Unleash Your Model's Potential: Step-by-Step Guide to Deploying a Fabric Machine Learning Model on an Azure ML Inference Endpoint

Fine-Tuning LLMs Using a Local GPU on Windows

How to create and install custom Python libraries in Microsoft Fabric

Using Azure AI Document Intelligence with Microsoft Fabric