What is Azure AI Search?

Azure AI Search (formerly known as Azure Cognitive Search) is a highly scalable, high-performance search engine similar to Apache Lucene. AI Search indexes can be queried from virtually any front-end application and are commonly used for web applications and for indexing a wide variety of content for enterprise applications.

Azure AI Search indexes are typically updated from scripts or by reading documents into a search index from a file storage location, such as Azure Blob Storage.  In this post we'll export selected data elements from a Fabric Lakehouse into an Azure AI Search index.

Why Export Fabric Data to Azure AI Search?

Fabric is a great platform for ingesting, processing, and analyzing data. External applications can already reach Fabric data (e.g. through SQL endpoints), but for high-performance keyword or vector search, a dedicated technology like AI Search is preferable: it delivers faster, more flexible search and supports a range of client APIs (such as REST).

Fortunately, Fabric Data Science includes integration libraries to make it easy to directly write data from Fabric to AI Search!

Using SynapseML to Update an Index

We can use Python or R and the SynapseML library to update an Azure AI Search index from Fabric in just a few lines of code.

SynapseML contains all the code needed to transform each row of a Spark DataFrame into an AI Search index entry, and we can call the update index methods from any location that can import the SynapseML libraries (such as a Notebook or a Spark Job).

Using SynapseML to Update Azure AI Search

Writing a DataFrame

In this example, I have some data in a Spark DataFrame which I'd like to add to an AI Search index. By writing this data to an index, I'll enable end-user applications to use keyword search (and optionally vector search) to find images by keyword or description.

Getting an Azure AI Search Key

SynapseML needs an access key that grants write access to the search index. In this example I'll use one of the admin API keys created automatically when an Azure AI Search service is created.

Azure AI Search API Keys

Storing the key in a vault

It's always a good idea to store secret keys securely, so I've stored the key in Azure Key Vault, using the Azure CLI.

az keyvault secret set --vault-name <vault-name> \
  --name SEARCH-SERVICE-KEY \
  --value "<key-value>"
Storing a secret key in Azure Key Vault
💡 For more details about using Azure Key Vault with Microsoft Fabric, take a look at this prior post on this topic!

Fetch the Key in a Notebook

To fetch the key in a notebook, we can use the PyTridentTokenLibrary Fabric dependency.

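from trident_token_library_wrapper import PyTridentTokenLibrary as tl

key_vault_name = "designmind-fabric-ai"

# mssparkutils is preloaded in Fabric notebooks; request a token for Key Vault
access_token = mssparkutils.credentials.getToken("keyvault")

# Read the secret stored earlier under the name SEARCH-SERVICE-KEY
ai_search_key = tl.get_secret_with_token(
    f"https://{key_vault_name}.vault.azure.net/", "SEARCH-SERVICE-KEY", access_token
)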

Next we'll use SynapseML to write each row in the DataFrame to AI Search!

Call SynapseML to Update the Index

This step is really easy.  We only need to provide the following inputs for SynapseML:

  1. The DataFrame to use as the source of new index content
  2. The AI Search Service Name (the container of the index)
  3. The AI Search Index Name (the index to update)
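  4. The key SynapseML can use to write data to the index

from synapse.ml.services import *  # adds writeToAzureSearch to Spark DataFrames

df2.writeToAzureSearch(
    subscriptionKey=ai_search_key,  # key fetched from Key Vault above
    actionCol="searchAction",  # DataFrame column naming the index action for each row
    serviceName="name_of_search_service",
    indexName="name_of_search_index",
    keyCol="ObjectID",  # DataFrame column used as the document key
)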
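
To make the roles of `actionCol` and `keyCol` more concrete, here is a minimal plain-Python sketch of the kind of batch payload the Azure AI Search document indexing REST API accepts, where each document carries an `@search.action` field alongside its key. This is an illustration only, not SynapseML's actual implementation; the helper name `rows_to_index_batch` and the sample rows are hypothetical.

```python
import json


def rows_to_index_batch(rows, action_col="searchAction", key_col="ObjectID"):
    """Convert row dicts into an Azure AI Search style indexing batch.

    Hypothetical helper for illustration; SynapseML does this work for you.
    """
    documents = []
    for row in rows:
        if row.get(key_col) in (None, ""):
            raise ValueError(f"row is missing key column {key_col!r}")
        # Copy every field except the action column into the document body.
        doc = {name: value for name, value in row.items() if name != action_col}
        # The action (for example "upload") travels as the "@search.action" field.
        doc["@search.action"] = row.get(action_col, "upload")
        documents.append(doc)
    return {"value": documents}


# Hypothetical sample rows standing in for the DataFrame contents.
sample_rows = [
    {"ObjectID": "1", "Title": "Glass vase", "searchAction": "upload"},
    {"ObjectID": "2", "Title": "Bronze bowl", "searchAction": "upload"},
]
batch = rows_to_index_batch(sample_rows)
print(json.dumps(batch, indent=2))
```

Seen this way, the action column never becomes a searchable field of its own; it only tells the service what to do with each document.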

Search the Index

With the index updated, we can use many methods to search it for content. The most common is the Azure AI Search REST endpoint.

Here's an example in Python (you can even put this into a notebook cell!)

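import requests

# Build the search endpoint URL for the service and index
url = "https://{}.search.windows.net/indexes/{}/docs/search?api-version=2019-05-06".format(
    "name_of_search_service", "name_of_search_index"
)
# POST the query; the API key goes in the api-key header
requests.post(
    url, json={"search": "Glass"}, headers={"api-key": ai_search_key}
).json()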
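
If the `requests` package is not available, the same call can be sketched with only the Python standard library. The helper name `build_search_request` is hypothetical, and the service, index, and key values are placeholders; the request is built here but deliberately not sent.

```python
import json
import urllib.request


def build_search_request(service, index, query, api_key, api_version="2019-05-06"):
    """Build (but do not send) a POST request for the index's search endpoint."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={api_version}"
    )
    body = json.dumps({"search": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values; substitute your own service, index, and key.
request = build_search_request(
    "name_of_search_service", "name_of_search_index", "Glass", "<key-value>"
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would send it. Keeping construction separate from sending makes the URL and headers easy to inspect first.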

The result of the search will be a list of matching documents in JSON format.
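
As a rough sketch of what handling that response might look like: the search response is a JSON object whose `value` array holds the matching documents, each with an `@search.score` relevance field. The sample response below is made up for illustration, the helper name `summarize_hits` is hypothetical, and the field names `ObjectID` and `Title` are assumptions about the index schema.

```python
import json

# Hypothetical response body; real field names depend on your index schema.
sample_response = json.loads("""
{
  "value": [
    {"@search.score": 1.25, "ObjectID": "1", "Title": "Glass vase"},
    {"@search.score": 0.75, "ObjectID": "7", "Title": "Stained glass panel"}
  ]
}
""")


def summarize_hits(response, key_field="ObjectID"):
    """Return (key, score) pairs for each hit, highest score first."""
    hits = response.get("value", [])
    pairs = [(hit[key_field], hit["@search.score"]) for hit in hits]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for key, score in summarize_hits(sample_response):
    print(key, score)
```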

Summary

Fabric provides powerful, scalable data processing and analysis features that can easily meet big data and unstructured/semi-structured data analysis requirements. As data is prepared, it can be staged into a variety of end-user surface areas--from Power BI to any application that can query a SQL Endpoint.  

In this post we took a look at how to move finished, gold data from Fabric into an Azure AI Search index. Updating an Azure AI Search index is one of many AI features provided out of the box by the Fabric Data Science workload, making it easier than ever to integrate our cloud data platforms and AI/ML platforms into a cohesive solution.