Deploying and Consuming Foundation Models from the Azure Model Catalog

About the Azure Model Catalog

When using pre-trained foundation models in our applications, we have several ways to incorporate them into our solutions, including:

Using a pre-trained model-as-a-service on a "price per token" basis, such as OpenAI GPT4.
Downloading models from repositories like Hugging Face and provisioning our own server compute (e.g. a VM that runs a large language model).
Deploying a model and endpoint from a cloud provider Model Catalog, such as Azure, AWS or Google Cloud

In this post I'll walk through the process of deploying a foundation model from the Microsoft Azure Machine Learning Model Catalog, creating a REST endpoint, and then calling the REST endpoint from an external application.

💡

The walk through detailed in this post is available in video format! Click the embedded YouTube video to continue in video. The text version of this post continues after the video.

YouTube Video Walk Through

Creating an Azure Machine Learning Workspace

To use the Azure Machine Learning Model Catalog, we'll first need an Azure Machine Learning workspace. To create a workspace, we'll add a new Resource Group, and then create an Azure Machine Learning service within the resource group.

Creating an Azure Machine Learning Service from Azure Marketplace

Once the Azure ML workspace is created, we launch the Azure ML Studio. We can either use the launch button from the Azure resource, or browse directly to http://ml.azure.com.

Select a Model from the Model Catalog

Once in the Azure ML Studio, we can browse the model catalog and select a model that meets our needs. Each model has summary information, and links to the originator of the model we can follow to learn more about each one.

Deploying a Model

Once we've found the model we want to use, we can deploy the model right from Azure ML Studio.

💡

Azure Machine Learning can provision Real-time or Batch endpoints. Real-time endpoints are generally used to service interactive end-user applications, while Batch endpoints generally are used to service machine-to-machine request queues.

Deploying a model from the Model Catalog automatically provisions two resources within our Azure Resource Group:

A compute instance to host the model within our Azure account.
A compute instance to host a REST endpoint that allows external applications to send queries to the model we deployed.

💡

It's possible to deploy more than one model to the same REST endpoint. The deploy wizard creates a new endpoint and new model deployment.

Consuming the Endpoint

When the model and endpoint are provisioned and on-line, we can review the deployment. Calling the endpoint requires the following inputs:

The REST URL for the deployed endpoint
A bearer token to add to the REST request header
The name of the model to send a request to (there can be more than one model per endpoint)

Within the Azure ML Studio, sample code is provided in JavaScript, Python, C# and R to get us started.

Calling the Endpoint

We can now send prompts to the deployed model, and have responses returned to our application in JSON format.

Below is an example of calling the model we deployed using Postman:

💡

For a more detailed, step-by-step walk through of the process in this post, check out the YouTube video version.

Summary

There's a vast array of foundation models available to use for generative AI solutions, and the number and quality is growing by the day. Often the challenge is how to provision and host models so they're accessible by applications that can make use of their capabilities. Model Catalogs in cloud platforms like Azure make this process convenient and relatively easy to manage.

Deploying and Consuming Foundation Models from the Azure Model Catalog

Fine-Tuning LLMs Using a Local GPU on Windows

Unleash Your Model's Potential: Step-by-Step Guide to Deploying a Fabric Machine Learning Model on an Azure ML Inference Endpoint

How to create and install custom Python libraries in Microsoft Fabric