dvc_databricks package
Submodules
dvc_databricks.filesystem module
DVC filesystem plugin for Databricks Unity Catalog Volumes.
Architecture:
DatabricksVolumesFileSystem  ← dvc_objects.FileSystem subclass
│   DVC-facing layer: config parsing,
│   checksum strategy, plugin registration
│
└── self.fs  ← _DatabricksVolumesFS (fsspec.AbstractFileSystem)
        I/O layer: upload, download, list, delete
        via Databricks SDK Files API
When this package is installed, the dvc.plugins entry point registers
DatabricksVolumesFileSystem under the dbvol protocol. DVC discovers
it automatically — no imports or manual configuration required.
Users configure the remote once:
dvc remote add -d myremote dbvol:///Volumes/catalog/schema/volume/path
export DATABRICKS_CONFIG_PROFILE=<profile>
Then use standard DVC commands as usual:
dvc push / dvc pull / dvc status …
- class dvc_databricks.filesystem.DatabricksVolumesFileSystem(**config)
Bases: FileSystem
DVC remote filesystem backed by Databricks Unity Catalog Volumes.
Extends dvc_objects.fs.base.FileSystem. DVC delegates all storage operations to self.fs (a _DatabricksVolumesFS instance), which communicates with the Databricks Volume via the SDK Files API, with no direct S3 access.
Configuration (one-time setup per repo):
dvc remote add -d myremote dbvol:///Volumes/catalog/schema/volume/dvc_cache
export DATABRICKS_CONFIG_PROFILE=<profile>
After that, standard DVC commands work without any code changes:
dvc push / dvc pull / dvc status
Note
DATABRICKS_CONFIG_PROFILE must be set in the environment because DVC remotes do not support arbitrary config keys. The profile cannot be stored in .dvc/config.
- protocol = 'dbvol'
- __init__(**config)
Parse DVC remote config and prepare the filesystem.
- Parameters:
**config – DVC remote configuration dict. Expected keys:
- url (str): Full remote URL, e.g. dbvol:///Volumes/catalog/schema/volume/path.
- profile (str, optional): Databricks CLI profile name. Falls back to the DATABRICKS_CONFIG_PROFILE environment variable.
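The parameter description above can be illustrated with a minimal sketch of how such a config dict might be interpreted. This is not the plugin's actual parsing code, which is not shown in this documentation: `parse_remote_config` is a hypothetical helper, and the check that the path starts with `/Volumes/` is an assumption based on the example URLs.

```python
import os
from typing import Optional, Tuple
from urllib.parse import urlsplit

PROTOCOL = "dbvol"  # protocol name as documented above


def parse_remote_config(config: dict) -> Tuple[str, Optional[str]]:
    """Return (volume_path, profile) from a DVC-style remote config dict.

    Hypothetical helper for illustration only.
    """
    url = config["url"]
    parts = urlsplit(url)
    if parts.scheme != PROTOCOL:
        raise ValueError(f"expected a {PROTOCOL}:// URL, got {url!r}")

    # dbvol:///Volumes/... has an empty host part, so the volume path is
    # everything after "dbvol://".
    volume_path = parts.path.rstrip("/")
    if not volume_path.startswith("/Volumes/"):
        # Assumption: all volume paths live under /Volumes/, as in the examples.
        raise ValueError(f"expected a path under /Volumes/, got {volume_path!r}")

    # An explicit "profile" key wins; otherwise fall back to the environment.
    profile = config.get("profile") or os.environ.get("DATABRICKS_CONFIG_PROFILE")
    return volume_path, profile
```

With `DATABRICKS_CONFIG_PROFILE` unset and no `profile` key, the second element of the returned tuple is `None`, which a caller could treat as "use the default credentials".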
- property fs: _DatabricksVolumesFS
Return the underlying fsspec filesystem, created lazily and cached.
Thread-safe: uses an RLock to ensure only one instance is created even under concurrent access.
- Returns:
A _DatabricksVolumesFS instance authenticated with the configured Databricks profile.
Module contents
dvc-databricks — DVC remote plugin for Databricks Unity Catalog Volumes.
Registers the dbvol protocol into dvc_objects.fs.known_implementations
so that DVC can resolve dbvol:// remotes in any process where this
package is installed.
This registration runs on import. The package uses a .pth file (installed
into site-packages) to ensure this module is imported at Python startup,
which makes dvc push / dvc pull work from the CLI without any
manual imports.
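The .pth mechanism mentioned above relies on standard behavior of Python's site module: any line in a .pth file that starts with `import` is executed when the containing site directory is processed. The sketch below reproduces that behavior in a temporary directory. The module name, the .pth file name, and the `REGISTRY` dict are hypothetical and only stand in for the package's real registration code.

```python
import site
import sys
import tempfile
from pathlib import Path

# Hypothetical module name used only for this demonstration.
MODULE_NAME = "demo_dbvol_register"


def run_demo() -> dict:
    """Create a throwaway site directory with a .pth file and process it."""
    site_dir = Path(tempfile.mkdtemp())

    # A tiny module that "registers" a protocol as a side effect of import.
    (site_dir / f"{MODULE_NAME}.py").write_text(
        'REGISTRY = {"dbvol": "registered on import"}\n'
    )

    # Lines in a .pth file that begin with "import" are executed by site.
    (site_dir / "demo_dbvol.pth").write_text(f"import {MODULE_NAME}\n")

    # addsitedir appends the directory to sys.path, then processes its .pth
    # files, which is what happens for site-packages at interpreter startup.
    site.addsitedir(str(site_dir))

    return sys.modules[MODULE_NAME].REGISTRY
```

Calling `run_demo()` returns the dict populated by the demo module, even though nothing in the calling code imports that module directly.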