In this post I'm going to walk through, step by step, how to connect a single Jupyter notebook to two Lakehouses, and then easily read and write data between them. In a medallion architecture, we can process data between the Bronze/Silver/Gold Lakehouses using Dataflows, Data Pipelines, or Notebooks. In this example I'll use a notebook.

💡
A good reference for Medallion design in Fabric is this article on Microsoft Learn. In this post I'm focusing on the "Single Workspace/Multiple Lakehouse" deployment model.

Initial Configuration

To start, I have the following artifacts in my solution:

  1. A Workspace named MedallionWorkspace, which contains the entire solution.
  2. A Lakehouse named Bronze_LH.
  3. A Lakehouse named Silver_LH.
  4. A Lakehouse named Gold_LH.
  5. A Jupyter Notebook named Bronze_to_Silver, where I'll conduct the data import (from Bronze), processing, and output (to Silver).
Initial Solution Workspace

Within the Bronze Lakehouse I have an input file named raw_data.csv. In this post, I'm going to:

  1. Read the input CSV file from Bronze/Files into a Dataframe.
  2. Transform the Dataframe by adding a current date/time column.
  3. Write the Dataframe as a CSV back to the original Bronze/Files folder.
  4. Write the Dataframe as a CSV to Silver/Files.
  5. Write the Dataframe as a Delta table to Bronze/Tables.
  6. Write the Dataframe as a Delta table to Silver/Tables.

Add Bronze and Silver Lakehouses to the Notebook

Since we'll be referencing both Lakehouses when reading the CSV and writing Tables, we'll first add both Lakehouses to the explorer navigation tree of the Notebook.

Click the Add button to get started.

Select the Bronze and Silver Lakehouses. We don't need to select Gold, because we're not going to access that data in this Notebook.

Select the Lakehouse(s) that will be sources or destinations in the notebook

Next, in the sidebar, select the Bronze Lakehouse as the current context.

Select Bronze as the "current" Lakehouse
💡
The choice of default Lakehouse only determines which data can be accessed with "local" relative paths and which needs an explicit Lakehouse reference. You could select Silver_LH as the default instead; it would just change the Python syntax somewhat.
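
To illustrate, here's a hypothetical sketch of how the Delta table writes shown later in this post would look with Silver_LH as the default: the Silver table gets a plain one-part name, while the Bronze table now needs the two-part reference.

# Hypothetical: with Silver_LH as the default Lakehouse, the roles flip
df.write.format("delta").mode("overwrite") \
    .saveAsTable("output_table_Silver")             # "local" Silver table
df.write.format("delta").mode("overwrite") \
    .saveAsTable("Bronze_LH.output_table_Bronze")   # two-part reference to Bronze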

Read the Data File from Bronze

In the first cell, let's just read the source CSV file into a Dataframe. Because the source data is in Bronze, and that's the default Lakehouse, no special syntax is needed yet.

# Read the raw CSV from the Files section of the default (Bronze) Lakehouse
df = spark.read.format("csv") \
    .option("header","true") \
    .load("Files/raw_data.csv")

Transform the Dataframe

To show that we can process the data, let's add a new column to the Dataframe. This column will simply state the date/time the data processing occurred.

from pyspark.sql.functions import current_timestamp
df = df.withColumn("processed_date_time", current_timestamp())
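
Again as an optional check, a quick preview confirms the new column was added:

# Optional: preview the new timestamp column
df.select("processed_date_time").show(3, truncate=False)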

Write the Data back to Bronze

Writing the Dataframe back to Bronze is trivial and requires no special syntax. We'll write it first as a CSV to the Files section, then as a Delta table to the Tables section.

# Write a CSV file (Spark writes a folder of part files at this path)
df.write.mode('overwrite') \
    .option('header', 'true') \
    .csv('Files/output_csv_Bronze.csv')

# Write a Delta Parquet table
df.write.format("delta").mode("overwrite") \
    .saveAsTable('output_table_Bronze')
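
If you'd like to verify the Bronze writes (an optional check, not part of the original flow), both outputs can be read straight back; spark.table resolves the one-part name against the default Lakehouse.

# Read the Delta table back from the default (Bronze) Lakehouse
spark.table("output_table_Bronze").show(5)

# Read the CSV folder back from Bronze/Files
spark.read.option("header", "true").csv("Files/output_csv_Bronze.csv").show(5)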

Write the Data to Silver

Now, let's do the same thing, but to Silver. In this case, since Silver is not the current Lakehouse context, we need a little extra syntax:

  1. For the Files section, we'll use the mssparkutils library to mount the Silver_LH Lakehouse in the context of the notebook.
  2. For the Tables section, it's even easier: we can use a two-part name to access tables in any Lakehouse that's connected to the notebook, even when it isn't the current one.

First, let's create a function that will get a mount path given a Lakehouse name.

def get_mount_path(lakehouse_name):
    # Mount the named Lakehouse at a predictable mount point,
    # then return the local file API path for that mount
    mnt_point = f'/mnt/mnt_{lakehouse_name}'
    mssparkutils.fs.mount(lakehouse_name, mnt_point)
    return f'file:{mssparkutils.fs.getMountPath(mnt_point)}'

Next, let's write the CSV file to Silver_LH/Files, using the function defined above to build the full path for the output file.

mount_path = get_mount_path('Silver_LH')

df.write.mode('overwrite') \
    .option('header', 'true') \
    .csv(f'{mount_path}/Files/output_csv_Silver.csv')

And finally, we'll write the Dataframe as a Delta Parquet table in Silver. This time the change is trivial: we just prefix the output table name with Silver_LH.

df.write.format("delta").mode("overwrite") \
    .saveAsTable('Silver_LH.output_table_Silver')
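
To wrap up, two optional steps: verify the Silver table using the same two-part name, and unmount the mount point that get_mount_path created (the path below assumes the naming convention from that function).

# Verify the Silver table via its two-part name
spark.sql("SELECT COUNT(*) AS row_count FROM Silver_LH.output_table_Silver").show()

# Optional cleanup: unmount the Silver_LH mount point created earlier
mssparkutils.fs.unmount('/mnt/mnt_Silver_LH')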

Summary

And that's it for this post. While it's not entirely obvious how to reference multiple Lakehouses in a Notebook, it's fairly straightforward once you know the syntax!

💡
To review a Jupyter notebook with the full code samples covered in this post, click on this link to GitHub.