What's the On-premises data gateway?

The on-premises data gateway has been available for quite some time and is frequently used to refresh Power BI cloud-based semantic models from on-premises data sources.

As of March 2024, the on-premises gateway supports Fabric data integration tools as gateway data consumers.  This essentially means we can connect to on-premises data via a secure connection and pull data into the Fabric OneLake environment as part of scheduled jobs.

The Microsoft gateway deployment documentation is comprehensive, so I'll skip the detailed setup instructions. Instead, I'll walk through how I tested it, some observations about how to best use it, and wrap up with some thoughts about the feature as it currently exists.

On-Premises Setup

To test the gateway, I setup a VM-based on-premises lab environment with an Active Directory controller, a single SQL Server 2022 Enterprise instance, and a dedicated VM to run the on-premises data connector.

An inside look at a virtual Windows server lab!

In this scenario, the on-premises data gateway listens for requests from Data Pipelines and Data Flows, queries data from the remote source (in this case a SQL Server default instance on server SQL1) and send query results back from the on-prem network to the Fabric environment.

Essentially on-premises gateway is a proxy server Fabric and other cloud-based services can use to incorporate on-premises data.

Configuring the On-premises Gateway

Fundamentally, the on-prem gateway is a Windows service that's installed using a Windows installer on the machine that will serve as the data gateway.

While the gateway service can be installed on virtually any machine (except domain controllers), I'd generally recommend dedicating a machine (or VM) to this process to isolate it and limit resource contention and configuration conflicts with other services running on the same host.

💡
While you can run an on-premises gateway on a single computer, in a production environment deploying multiple instances in a cluster provides for high-availability and is recommende din production.

To install the gateway, follow Microsoft's documentation script, which includes a link to the gateway installer (or download it from this link).

After installing the gateway and authenticating with Microsoft Fabric (typically using an Entra account that has appropriate permissions in the Fabric environment), the gateway is configured, and the Microsoft Fabric support should be in "Ready" status.

💡
I'm skipping some steps here, but the Microsoft Docs go through them in detail.
A Properly Configured On-premises gateway service
💡
Note that the machine hosting the gateway service may need drivers and/or middleware to connect to remote services, such as SQL Server.

Control Who Can Create Gateway Connections

Your Fabric tenant can be configured to allow all users to create their own connections to gateways, or this permission can be restricted by default and authorized to users explicitely.

💡
If you don't intend to allow any user to create a new gateway connection, this is a way to lock-down the feature. If users are restricted from creating gateway connections, they can be authorized to use connections created by others (I'll demonstrate this below).
Gateway Connection Authorization Form

Browse to the Connections Configuration Settings Panel

If you're authorized to create connections, browse to the connection list under the settings cog to get started.

Entering the Gateway Connection Configuration in Settings

Create a New Connection

From the Connection List form:

  1. Select the Connections tab
  2. Tap the New button
  3. Select On-premises in the New connection dialog
  4. Select the name of the on-premises gateway (the name you gave it when you installed the software on the gateway machine in your on-prem environment).
  5. Provide a name for the new connection (which users will use to select the gateway when connecting to data).
  6. Select the Connection type – in this example an on-prem SQL Server 2022 instance.
  7. Complete the connection information that will be passed to the gateway as it makes connections to the on-premises data.
Creating a new On-premises gateway connection
💡
Since many users of connected on-prem data won't know connection secrets, it probably makes sense much of the time for Administrators to create connections, then authorize Fabric users to use them according to security policies – see the next step.

Authorize Users to the Connection

In this scenario, I created the connection to my on-prem environment, but I want to authorize others in my company to use the connection without knowing the SQL user credential or needing to create their own connections.

  1. Select the On-premises data gateways tab
  2. From the ... menu, select Manage users
  3. Search for a user or security group to authorize to access the gateway
  4. Select the user or group just added to the ACL
  5. Select the appropriate level of authorization for them.
Accessing user authorization for an on-premises gateway
Adding users and selecting granular permissions

Using the Gateway from a Data Pipeline

Installing the gateway on prem and creating a connection to it from Fabric has some complexity, though is relatively easy.  Using the gateway is quite straightforward!

Add a Copy Data Task

  1. Add a Copy Data task using the copy assistant.
  2. Select the appropriate data source type (in this example SQL Server), then tap Next.
  3. Select existing connection.
  4. Open the drop-down to view existing connections.
  5. Choose the name of the connection to the gateway (Created in the last section), then tap Next.
  6. Tap the database drop-down.
  7. Select the desired source Database (on the remote SQL1 server in this case), then click Next.
Create a data copy task using copy assistant
Selecting SQL Server as the source
The gatway is an "existing connection"
Available remote databases on SQL1 as enumerated by the gateway

Select the Desired Tables or Create a Query

Fabric will query the remote data source's metadata (SQL Server in this case), listing available tables.  Alternately, you can write a query.

Once the desired source data is specified, tap the Next button.

Selecting data from SQL1

Select the Target Lakehouse

In this case, we'll copy the source data to a Lakehouse table, so select Lakehouse and tap the Next button.

Targeting copied data into a Lakehouse Delta Parquet table

Then select the destination for the data in the Lakehouse:

  1. Select Existing Lakehouse.
  2. Tap the Lakehouse drop-down.
  3. Select the desired destination Lakehouse from the list.
Select the specific Lakehouse taget from the drop-down list

From this point we can specify how to land the data, change destination table names, data types and column names as usual.

Modify target metadata and data types

Run the Pipeline

Finally, we can run the pipeline as usual!

Successful pipelien run

Summary

Support for On-prem data gateway in Fabric is an important step toward providing an easy-to-use, well integrated SaaS data engineering and analytics platform.  

While the underlying data gateway technology is mature and familiar to most enterprise Power BI users, providing integration with Fabric Data Engineering tooling opens up a host of new possibilities for consolidating data from many sources into a cohesive OneLake data environment serving Reporting, BI, Data Science and Streaming use cases.