What is Azure AI Search?
Azure AI Search (formerly known as Azure Cognitive Search) is a highly scalable, high-performance search engine similar to Apache Lucene. AI Search indexes can be queried from virtually any front-end application and are commonly used for web applications and for indexing a wide variety of content for enterprise applications.
Azure AI Search indexes are typically updated from scripts or by reading documents into a search index from a file storage location, such as Azure Blob Storage. In this post we'll export selected data elements from a Fabric Lakehouse into an Azure AI Search index.
Why Export Fabric Data to Azure AI Search?
Fabric is a great platform for ingesting, processing, and analyzing data. We could integrate with external applications directly (e.g. using SQL endpoints), but for high-performance keyword or vector searching from external applications, a technology like AI Search is preferable: it can deliver faster and more flexible search capabilities and supports a range of client APIs (such as REST).
Fortunately, Fabric Data Science includes integration libraries to make it easy to directly write data from Fabric to AI Search!
Using SynapseML to Update an Index
We can use Python or R and the SynapseML library to update an Azure AI Search index from Fabric in just a few lines of code.
SynapseML contains all the code needed to transform each row of a Spark DataFrame into an AI Search index entry, and we can call the update index methods from any location that can import the SynapseML libraries (such as a Notebook or a Spark Job).
Writing a DataFrame
In this example, I have some data in a Spark DataFrame which I'd like to add to an AI Search index. By writing this data to an index, I'll enable end-user applications to use keyword search (and optionally vector search) to find images by keyword or description.
Getting an Azure AI Search Key
SynapseML will need an access key that grants it write access to the search index. In this example I'll use one of the API keys created automatically when an Azure AI Search service is created.
Storing the key in a vault
It's always a good idea to store secret keys securely, so I've stored the key in Azure Key Vault, using the Azure CLI.
Fetch the Key in a Notebook
To fetch the key in a notebook, we can use the PyTridentTokenLibrary Fabric dependency:
from trident_token_library_wrapper import PyTridentTokenLibrary as tl

# Name of the Key Vault that holds the search service key
key_vault_name = "designmind-fabric-ai"
# Get a Key Vault token for the notebook's identity, then read the secret
access_token = mssparkutils.credentials.getToken("keyvault")
ai_search_key = tl.get_secret_with_token(
    f"https://{key_vault_name}.vault.azure.net/", "SEARCH-SERVICE-KEY", access_token
)
Next we'll use SynapseML to write each row in the DataFrame to AI Search!
Call SynapseML to Update the Index
This step is really easy. We only need to provide the following inputs for SynapseML:
- The DataFrame to use as the source of new index content
- The AI Search Service Name (the container of the index)
- The AI Search Index Name (the index to update)
- The key SynapseML can use to write data to the index
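from synapse.ml.services import *

df2.writeToAzureSearch(
    subscriptionKey=ai_search_key,        # key with write access to the index
    actionCol="searchAction",             # column holding the action for each row
    serviceName="name_of_search_service",
    indexName="name_of_search_index",
    keyCol="ObjectID",                    # column used as the document key
)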
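For orientation, the actionCol and keyCol arguments line up with the Azure AI Search REST API for indexing documents, where each document in a batch carries an "@search.action" value (upload, merge, mergeOrUpload, or delete) alongside its key field. The sketch below builds such a batch from plain Python dictionaries. It only illustrates the payload shape and is not SynapseML's actual implementation; the build_index_batch helper and the sample rows are hypothetical.

```python
import json

# Actions accepted by the Azure AI Search "index documents" REST API.
VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def build_index_batch(rows, key_col="ObjectID", action_col="searchAction"):
    """Turn row dictionaries into an indexing batch payload (hypothetical helper)."""
    documents = []
    for row in rows:
        action = row.get(action_col)
        if action not in VALID_ACTIONS:
            raise ValueError(f"Unsupported search action: {action!r}")
        if row.get(key_col) is None:
            raise ValueError(f"Row is missing the key column {key_col!r}")
        # The action travels in the reserved "@search.action" property;
        # every other column becomes a field of the index document.
        doc = {"@search.action": action}
        doc.update({k: v for k, v in row.items() if k != action_col})
        documents.append(doc)
    return {"value": documents}


# Made-up sample rows standing in for DataFrame records.
sample_rows = [
    {"ObjectID": "1001", "Title": "Glass vase", "searchAction": "upload"},
    {"ObjectID": "1002", "Title": "Bronze bowl", "searchAction": "mergeOrUpload"},
]
batch = build_index_batch(sample_rows)
print(json.dumps(batch, indent=2))
```

In the DataFrame itself, a column such as searchAction would typically hold one of these action strings for every row before writeToAzureSearch is called.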
Search the Index
With the index updated, there are many ways to search it for content. The most common is to use the Azure AI Search REST endpoint.
Here's an example in Python (you can even put this into a notebook cell!):
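import requests

# Build the search URL for the target service and index
url = "https://{}.search.windows.net/indexes/{}/docs/search?api-version=2019-05-06".format(
    "name_of_search_service", "name_of_search_index"
)

# POST the query; the api-key header authenticates the request
requests.post(
    url, json={"search": "Glass"}, headers={"api-key": ai_search_key}
).json()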
The result of the search will be a list of matching documents in JSON format.
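As a rough sketch of working with that response: the search REST API returns the matching documents in a "value" array, and each document carries an "@search.score" relevance score. The helper below pulls selected fields out of such a response. The summarize_hits name and the sample response are made up for illustration, and field names such as Title are assumptions about the index schema.

```python
def summarize_hits(response, fields):
    """Return (score, {field: value}) pairs from a search response dict (hypothetical helper)."""
    hits = []
    for doc in response.get("value", []):
        score = doc.get("@search.score")
        # Keep only the requested fields; missing fields come back as None.
        picked = {name: doc.get(name) for name in fields}
        hits.append((score, picked))
    return hits


# Made-up response with the same overall shape as the REST API output.
sample_response = {
    "value": [
        {"@search.score": 1.2, "ObjectID": "1001", "Title": "Glass vase"},
        {"@search.score": 0.7, "ObjectID": "1003", "Title": "Stained glass panel"},
    ]
}

for score, picked in summarize_hits(sample_response, ["ObjectID", "Title"]):
    print(score, picked)
```

In a notebook, the dictionary returned by requests.post(...).json() could be passed in place of sample_response.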
Summary
Fabric provides powerful, scalable data processing and analysis features that can easily meet big data and unstructured/semi-structured data analysis requirements. As data is prepared, it can be staged into a variety of end-user surface areas--from Power BI to any application that can query a SQL Endpoint.
In this post we took a look at how to move finished, gold data from Fabric into an Azure AI Search index. Updating an Azure AI Search index is one of many AI features the Fabric Data Science workload provides out of the box, making it easier than ever to integrate our cloud data platforms and AI/ML platforms into a cohesive solution.