Case Study

Databricks: Migration to Unity Catalog

After migrating several data products to Databricks Unity Catalog, here are my key findings: the things I wish I had known earlier, which would have eased the migration process.

What is Databricks’ Unity Catalog?

Unity Catalog (UC) is Databricks’ unified governance layer for all your data, AI models, and analytics assets. It centralizes discovery, access control, and compliance in a single platform, providing unified governance, built-in intelligence, and open source formats. At its core, the key benefits are what you would expect from a modern data lakehouse.

But this isn’t a marketing article; you can read more in the Databricks documentation.

Throughout my experience guiding corporations through Databricks modernization and migration initiatives, I have consistently run into the same set of challenges. Most of them come down to the few key considerations covered below.

Key Consideration 1

Network and IP Ranges

When deploying a Unity Catalog-enabled workspace in your own VPC/VNet, choose subnet CIDRs that do not overlap Databricks’ reserved ranges. Databricks reserves certain IP blocks for internal cluster connectivity, and these must be avoided in your network design. For example, both AWS and Azure documentation warn against using 127.187.216.0/24, 192.168.216.0/24, and 198.18.216.0/24 in any workspace or cluster subnet. These reservations apply “to all types of workspaces and all cluster types, including classic and serverless”. In practice, ensure your VPC/VNet CIDR ranges (and any VPN or on-prem networks) exclude those subnets.
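A quick way to catch a collision before anything is provisioned is to check every planned range against the reserved list. The snippet below is a minimal sketch using Python’s standard ipaddress module. The reserved ranges are the three quoted above; the planned ranges are made-up examples, not values from a real deployment.

```python
import ipaddress

# Reserved ranges quoted in the text above.
RESERVED_RANGES = [
    ipaddress.ip_network("127.187.216.0/24"),
    ipaddress.ip_network("192.168.216.0/24"),
    ipaddress.ip_network("198.18.216.0/24"),
]


def find_conflicts(candidate_cidrs):
    """Return (candidate, reserved) pairs whose address ranges overlap."""
    conflicts = []
    for cidr in candidate_cidrs:
        # strict=True rejects CIDRs with host bits set (e.g. 10.0.0.1/24).
        network = ipaddress.ip_network(cidr, strict=True)
        for reserved in RESERVED_RANGES:
            if network.overlaps(reserved):
                conflicts.append((str(network), str(reserved)))
    return conflicts


# Hypothetical planned ranges: a workspace network, an on-prem network, a VPN pool.
planned = ["10.20.0.0/16", "192.168.0.0/16", "198.18.216.128/25"]

for candidate, reserved in find_conflicts(planned):
    print(f"{candidate} overlaps reserved range {reserved}")
```

Note that a broad range such as 192.168.0.0/16 is flagged even though only a small slice of it is reserved, which is exactly the kind of overlap that is easy to miss when on-prem or VPN networks are added to the picture later.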

Azure specifics: When creating Azure VNets for Unity Catalog, you’ll typically provision custom subnets (e.g. a /24 range) instead of the default. As one guide notes, in the VNet creation wizard you “remove the default IP address and enter the procured IP address” for the CIDR. Also set up Private Endpoints (Azure Private Link) for the control plane and storage accounts, and make sure their IPs are reachable from your network.

AWS specifics: On AWS, you typically use AWS PrivateLink for secure connectivity. Configure VPC interface endpoints for the Databricks workspace and the Secure Cluster Connectivity (SCC) relay using the regional service names. Ensure your VPC’s IP range does not collide with the reserved list. The Databricks AWS docs provide the exact PrivateLink endpoint service names per region (e.g. com.amazonaws.vpce.us-east-1.vpce-svc-09143d1e626de2f04 for us-east-1).

By planning your network CIDRs upfront and using private connectivity (Azure Private Link or AWS PrivateLink), you can safely spin up UC-enabled clusters without IP conflicts.

Key Consideration 2

Account-Level vs Workspace-Level RBAC

Databricks’ access model has two layers:

account-level RBAC (in the Account Console)
workspace-level roles/ACLs

Unity Catalog adds its own privileges on top of these. It’s important to understand the scope and trade-offs of each layer:

Account Admins (Account RBAC): These users have global privileges across all workspaces. An Account Admin can create and link metastores, assign the Metastore Admin role, grant privileges on metastores, enable Delta Sharing, and configure storage credentials. In short, account admins manage the metastore and cross-workspace objects.

Pros: Centralized control (one place to manage multiple workspaces’ catalogs and shares).

Cons: Very high privilege (risk of over-permissioning), so it requires careful delegation. Only an account admin can create a Unity Catalog metastore and make someone a metastore admin.

Workspace Admins (Workspace RBAC): These users manage a single Databricks workspace’s compute and objects. They can add users/service principals, manage jobs, notebooks, and other workspace-specific ACLs. In UC-enabled workspaces (especially if UC was auto-enabled), workspace admins automatically get the ability to create UC assets in the attached metastore (for example, default privileges such as CREATE CATALOG and CREATE EXTERNAL LOCATION).

Pros: Familiar and fine-grained control for day-to-day tasks; less risk than giving everyone account-level rights.

Cons: Scoped to one workspace; if you have many workspaces, managing permissions across them can be more effort. Also, workspace admins cannot create or link new metastores or assign metastore-wide roles (only account admins can).

Design Tip: During migration, define which teams need account-level duties (metastore setup, global governance) vs workspace duties. For example, one team (or service principal) might be the Metastore Admin (via the Account Console), while workspace admins continue managing their local jobs and ETL. Be aware that account admins can even restrict workspace admins (using the RestrictWorkspaceAdmins setting) to tighten control.

If your organization is new to Databricks and you need a rough starting point for RBAC groups, Databricks has a straightforward proposal for creating groups in its documentation.
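To make the split between metastore-level and workspace-level duties concrete, it can help to write the intended grants down as data and generate the SQL from it. The sketch below is only an illustration: the group names, the catalog and the schema are hypothetical, and the privilege mix is an assumption about what a typical team might need, not a recommendation from Databricks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str            # group name (hypothetical)
    privileges: tuple         # Unity Catalog privilege names
    securable_type: str       # e.g. "METASTORE", "CATALOG", "SCHEMA"
    securable_name: str = ""  # left empty for METASTORE


def to_sql(grant):
    """Render one grant as a Unity Catalog GRANT statement."""
    privileges = ", ".join(grant.privileges)
    target = grant.securable_type
    if grant.securable_name:
        target = f"{target} {grant.securable_name}"
    return f"GRANT {privileges} ON {target} TO `{grant.principal}`"


# Hypothetical access plan for one catalog.
plan = [
    Grant("platform-admins", ("CREATE CATALOG",), "METASTORE"),
    Grant("data-engineers", ("USE CATALOG", "CREATE SCHEMA"), "CATALOG", "sales"),
    Grant("analysts", ("USE CATALOG",), "CATALOG", "sales"),
    Grant("analysts", ("USE SCHEMA", "SELECT"), "SCHEMA", "sales.reporting"),
]

for grant in plan:
    print(to_sql(grant))
```

Keeping the plan as data makes it easy to review with the governance team before any statement is executed, and the same list can be replayed against each workspace that shares the metastore.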

Key Consideration 3

Handling Non-Delta Data

Unity Catalog tables can be managed (Delta/Iceberg), external, or foreign. Importantly, Unity Catalog supports external tables on various formats (Parquet, ORC, CSV, JSON, etc.), but only Delta Lake (or Iceberg) tables get full ACID guarantees and performance optimizations. As the docs note, “External tables register data stored in cloud storage that you manage… Unity Catalog governs data access but doesn’t manage data lifecycle… Unity Catalog external tables support Delta Lake (recommended) and CSV, JSON, AVRO, PARQUET, ORC, and TEXT formats. Non-Delta external tables lack the transactional guarantees and performance optimizations of Delta Lake.”

Key Point: If you point Unity Catalog at existing Parquet/ORC files, the table will be external and work, but you won’t get transactionality (no time-travel, no MERGE, etc.) or automatic optimization. Queries may also be slower without Delta’s data skipping and file-level statistics.
Recommended migration: Convert legacy data to Delta when possible. Unity Catalog supports the CONVERT TO DELTA command on external Parquet/Iceberg data. For example, you can register your Parquet files as a UC external table and then run CONVERT TO DELTA against it. For Parquet, this converts the table in place by building a Delta transaction log over the existing files rather than rewriting them, unlocking all Delta features. Note that converting Iceberg tables is still in preview (per docs). If you can’t convert certain tables, at least be aware of the limitations (no ACID).
Legacy Hive tables: If you have old Hive-metastore tables (even if they are stored as Parquet), Databricks recommends migrating them into Unity Catalog. Hive tables can be federated or copied, but the ideal is to CTAS them into new UC managed Delta tables so you can use UC’s governance and time-travel. For example, use CREATE TABLE ... AS SELECT (CTAS) from the old table into a UC catalog as outlined in the “Upgrade Hive to UC” guide.
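The two migration paths above boil down to two SQL statements. The sketch below only builds the statement text so it can be reviewed or logged first; in a Databricks notebook you would pass each string to spark.sql(...). All catalog, schema and table names are hypothetical, and the helper functions are my own illustration rather than part of any Databricks library.

```python
def qualified(catalog, schema, table):
    """Build a three-level name, backtick-quoting each part."""
    parts = (catalog, schema, table)
    for part in parts:
        if "`" in part or not part:
            raise ValueError(f"unsupported identifier: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


def convert_to_delta_sql(table_name):
    """In-place conversion of an external Parquet table to Delta."""
    return f"CONVERT TO DELTA {table_name}"


def ctas_sql(source_name, target_name):
    """Copy a legacy table into a new Unity Catalog table via CTAS."""
    return f"CREATE TABLE {target_name} AS SELECT * FROM {source_name}"


# Hypothetical names: an external Parquet table already registered in UC,
# and a legacy Hive-metastore table to be copied into a UC catalog.
external_parquet = qualified("main", "raw", "events_parquet")
legacy_hive = qualified("hive_metastore", "sales", "orders")
uc_target = qualified("main", "sales", "orders")

statements = [
    convert_to_delta_sql(external_parquet),
    ctas_sql(legacy_hive, uc_target),
]

for statement in statements:
    print(statement)
    # In a notebook: spark.sql(statement)
```

The CTAS path copies the data, so budget for the extra storage and compute, whereas the in-place conversion leaves the files where they are.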

In short, treat Unity Catalog managed Delta tables as the goal. Non-Delta formats should either be converted to Delta or accepted as limited external tables. Plan ahead which existing datasets need conversion, and assign privileges for using CONVERT TO DELTA (you need CREATE EXTERNAL TABLE on the location, etc.).

Key Consideration 4

Incorporating External Lineage

Unity Catalog provides built-in lineage for Spark/SQL workloads run on Databricks, but it also lets you “bring your own lineage” for external systems. In practice, you can add external metadata objects (representing outside assets) and link them to UC tables to show end-to-end data flow. For instance, if data is ingested from Salesforce or read by a Tableau dashboard, you can create an external metadata entry for that source or BI report and connect it to your Unity Catalog table. As the docs explain: “You might have workloads that run outside of Databricks… Unity Catalog lets you add external lineage metadata… giving you an end-to-end lineage view”.

To use this feature (currently in Preview):

Privileges: You’ll need CREATE EXTERNAL METADATA on the metastore to register an external source object, plus MODIFY on that object to link it to UC assets. (Basic SELECT/WRITE privileges apply for linking directionally.)
Process: In the UC Catalog Explorer (or via REST API/Python SDK), create a new External Metadata object (specify name, system type, entity type, etc.). Then use the “Add lineage” UI or API to create an upstream or downstream link between that external object and a UC table/model. You can even map column-level relationships or attach custom metadata (like the query that moved the data).
Result: In the Unity Catalog lineage graph you’ll see nodes for these external systems connected to your tables/models. For example, a PostgreSQL table feeding into a UC-managed table, or a Looker dashboard reading from it. This gives BI teams and auditors an integrated view of data provenance.
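Before touching the UI or API, I find it useful to sketch the lineage I want to register as plain data. The model below is purely illustrative: the classes are my own and are not the Databricks SDK or REST payloads, the field names simply mirror the attributes mentioned above (name, system type, entity type), and the system type strings and asset names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalMetadata:
    """Illustrative stand-in for an external metadata object."""
    name: str
    system_type: str  # illustrative label, not a verified enum value
    entity_type: str


@dataclass(frozen=True)
class LineageLink:
    source: str  # where the data comes from
    target: str  # where the data goes


def link_upstream(external, uc_table):
    """The external asset feeds the Unity Catalog table."""
    return LineageLink(source=external.name, target=uc_table)


def link_downstream(uc_table, external):
    """The external asset reads from the Unity Catalog table."""
    return LineageLink(source=uc_table, target=external.name)


def describe(links):
    """Render links as 'source -> target' strings for review."""
    return [f"{link.source} -> {link.target}" for link in links]


# Hypothetical assets matching the example in the text.
postgres_table = ExternalMetadata("crm_db.public.accounts", "POSTGRESQL", "table")
looker_dashboard = ExternalMetadata("revenue_dashboard", "LOOKER", "dashboard")
uc_table = "main.sales.accounts"

links = [
    link_upstream(postgres_table, uc_table),
    link_downstream(uc_table, looker_dashboard),
]

for line in describe(links):
    print(line)
```

A list like this doubles as a checklist: each ExternalMetadata entry is one object to create, and each link is one upstream or downstream relationship to add.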

While still new, this capability means you don’t have to lose track of data when it crosses system boundaries. (Databricks is also exploring OpenLineage integration for automated lineage capture in future releases.) In the meantime, plan to use the External Metadata feature to import or manually define any critical lineage relationships from your pre-UC pipelines or downstream tools.

AFTERWORD

Conclusion

If you’re working on a modernization or migration project into Databricks, I hope your platform and data engineering teams can look back at these few considerations before diving deeper into the complexities that Unity Catalog has to offer.

This post is neither sponsored nor endorsed by Databricks in any way. It is merely a case study of actual work that I’ve conducted during my time working with the product. Please stay updated with the latest news from Databricks, as the product may evolve over time. Thank you.