dvc_databricks package

Submodules

dvc_databricks.filesystem module

DVC filesystem plugin for Databricks Unity Catalog Volumes.

Architecture:

DatabricksVolumesFileSystem   ← dvc_objects.FileSystem subclass
│   DVC-facing layer: config parsing,
│   checksum strategy, plugin registration
│
└── self.fs   ← _DatabricksVolumesFS (fsspec.AbstractFileSystem)
        I/O layer: upload, download, list, delete
        via Databricks SDK Files API
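The two-layer split above can be illustrated with a small, self-contained sketch. Both classes are hypothetical stand-ins rather than the plugin's code: `InMemoryIOLayer` plays the role of `_DatabricksVolumesFS` using a dict instead of the Databricks SDK, and `DvcFacingLayer` plays the role of `DatabricksVolumesFileSystem`. Only the method names on the I/O layer (`pipe_file`, `cat_file`, `ls`, `rm_file`) are borrowed from fsspec conventions.

```python
class InMemoryIOLayer:
    """Hypothetical stand-in for the I/O layer (_DatabricksVolumesFS)."""

    def __init__(self) -> None:
        self._blobs: dict = {}

    def pipe_file(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def cat_file(self, path: str) -> bytes:
        return self._blobs[path]

    def ls(self, prefix: str) -> list:
        return sorted(p for p in self._blobs if p.startswith(prefix))

    def rm_file(self, path: str) -> None:
        del self._blobs[path]


class DvcFacingLayer:
    """Hypothetical stand-in for the DVC-facing wrapper."""

    protocol = "dbvol"

    def __init__(self, **config) -> None:
        self.config = config         # config handling stays in this layer
        self.fs = InMemoryIOLayer()  # every storage call goes through self.fs

    def upload(self, path: str, data: bytes) -> None:
        self.fs.pipe_file(path, data)

    def download(self, path: str) -> bytes:
        return self.fs.cat_file(path)

    def list(self, prefix: str) -> list:
        return self.fs.ls(prefix)

    def delete(self, path: str) -> None:
        self.fs.rm_file(path)
```

The point of the split is that the outer class never touches storage directly, so the I/O layer can be swapped or mocked without changing anything DVC sees.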

When this package is installed, the dvc.plugins entry point registers DatabricksVolumesFileSystem under the dbvol protocol. DVC discovers it automatically — no imports or manual configuration required.

Users configure the remote once:

dvc remote add -d myremote dbvol:///Volumes/catalog/schema/volume/path
export DATABRICKS_CONFIG_PROFILE=<profile>

Then use standard DVC commands as usual:

dvc push / dvc pull / dvc status …

class dvc_databricks.filesystem.DatabricksVolumesFileSystem(**config)[source]

Bases: FileSystem

DVC remote filesystem backed by Databricks Unity Catalog Volumes.

Extends dvc_objects.fs.base.FileSystem.

DVC delegates all storage operations to self.fs (a _DatabricksVolumesFS instance), which communicates with the Databricks Volume via the SDK Files API — no direct S3 access.

Configuration (one-time setup per repo):

dvc remote add -d myremote dbvol:///Volumes/catalog/schema/volume/dvc_cache
export DATABRICKS_CONFIG_PROFILE=<profile>

After that, standard DVC commands work without any code changes:

dvc push / dvc pull / dvc status

Note

DATABRICKS_CONFIG_PROFILE must be set in the environment: DVC remotes do not support arbitrary config keys, so the profile cannot be stored in .dvc/config.

protocol = 'dbvol'
PARAM_CHECKSUM: ClassVar[str | None] = 'md5'
REQUIRES: ClassVar[dict[str, str]] = {'databricks-sdk': 'databricks.sdk'}
__init__(**config)[source]

Parse DVC remote config and prepare the filesystem.

Parameters:

**config

DVC remote configuration dict. Expected keys:

  • url (str): Full remote URL, e.g. dbvol:///Volumes/catalog/schema/volume/path.

  • profile (str, optional): Databricks CLI profile name. Falls back to DATABRICKS_CONFIG_PROFILE env var.
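The profile fallback described in the list above might look roughly like this. It is a minimal sketch, not the plugin's actual code: the helper name `resolve_profile` and its `environ` parameter are hypothetical, and only the `profile` key and the `DATABRICKS_CONFIG_PROFILE` variable come from this documentation.

```python
import os
from typing import Optional


def resolve_profile(config: dict, environ: Optional[dict] = None) -> Optional[str]:
    """Return the explicit ``profile`` key if set, else the environment variable.

    ``environ`` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    return config.get("profile") or env.get("DATABRICKS_CONFIG_PROFILE")
```

Passing the environment in as a parameter keeps the lookup easy to test without mutating the real process environment.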

unstrip_protocol(path)[source]

Reconstruct the full dbvol:// URL from an absolute path.

Parameters:

path (str) – Absolute Volume path, e.g. /Volumes/catalog/schema/volume/file.

Return type:

str

Returns:

Full URL string, e.g. dbvol:///Volumes/catalog/schema/volume/file.
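As an illustration of the mapping described above, a standalone version could be as small as the following. This is a sketch under assumptions, not the method's real body: the module-level function and the `PROTOCOL` constant are hypothetical, and returning an already-prefixed URL unchanged is an added assumption about edge-case behaviour.

```python
PROTOCOL = "dbvol"  # matches the documented ``protocol`` class attribute


def unstrip_protocol(path: str) -> str:
    """Prefix an absolute Volume path with the dbvol:// scheme."""
    prefix = f"{PROTOCOL}://"
    if path.startswith(prefix):
        # Assumption: a full URL is passed through untouched.
        return path
    return prefix + path
```

Because the path is absolute and starts with a slash, the result has three slashes after the scheme, as in the example URL above.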

property fs: _DatabricksVolumesFS

Return the underlying fsspec filesystem, created lazily and cached.

Thread-safe: uses an RLock to ensure only one instance is created even under concurrent access.

Returns:

A _DatabricksVolumesFS instance authenticated with the configured Databricks profile.

Module contents

dvc-databricks — DVC remote plugin for Databricks Unity Catalog Volumes.

Registers the dbvol protocol into dvc_objects.fs.known_implementations so that DVC can resolve dbvol:// remotes in any process where this package is installed.

This registration runs on import. The package uses a .pth file (installed into site-packages) to ensure this module is imported at Python startup, which makes dvc push / dvc pull work from the CLI without any manual imports.
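The import-time registration can be pictured with the sketch below. To stay self-contained it uses a local dict as a stand-in for `dvc_objects.fs.known_implementations`; the helper name `register_dbvol`, the `class`/`err` entry shape, and the error message text are assumptions made for illustration, not taken from the package source.

```python
# Stand-in for dvc_objects.fs.known_implementations (assumed to map a
# protocol name to a dict describing the implementing class).
known_implementations: dict = {}


def register_dbvol(registry: dict) -> None:
    """Add the dbvol entry once; repeated calls leave the registry unchanged."""
    registry.setdefault(
        "dbvol",
        {
            "class": "dvc_databricks.filesystem.DatabricksVolumesFileSystem",
            # Hypothetical message, shown only to fill the assumed "err" slot.
            "err": "dbvol requires the 'dvc-databricks' package",
        },
    )


# Running this at module level is what "registration runs on import" means.
register_dbvol(known_implementations)
```

Python's `site` module executes any line in a `.pth` file that begins with `import`, so a line such as `import dvc_databricks` would be enough to trigger this at interpreter startup; the exact file the package ships is not shown here.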