Sign up for our Databricks optimizer

Welcome!

Espresso AI uses ML models to make real-time decisions that optimize your Databricks SQL workloads.

Please run the following steps in a Databricks notebook. They set up a service principal that Espresso AI uses to access the metadata our models need and to optimize your Databricks SQL account's operations.

Note that we never access, log, or store any data from Databricks SQL. We only look at metadata.
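
For example, the system tables granted below expose query and warehouse metadata such as statement status and duration, never table contents. A quick illustration you can run later in the notebook (hypothetical query; exact column names vary by Databricks release):

# Peek at the kind of metadata involved: per-statement status and duration.
df = spark.sql("""
    SELECT statement_id, execution_status, total_duration_ms
    FROM system.query.history
    LIMIT 10
""")
df.show()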

Prerequisites: Make sure you are a Databricks workspace admin, account admin, and metastore admin.

Steps

1. Open your Databricks workspace, create a Python notebook, and attach it to Serverless compute.
2. In the first cell, install or upgrade the Databricks SDK for Python:

%pip install databricks-sdk --upgrade

3. When the install finishes, run the following in a second cell to restart the Python kernel:

%restart_python

4. In the next cell, copy-paste the code below and run it:

import json
from datetime import datetime

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog
from databricks.sdk.service.iam import (
    AccessControlRequest,
    ComplexValue,
    Patch,
    PatchOp,
    PatchSchema,
    PermissionLevel,
)
from pydantic import BaseModel, field_validator
from pyspark.sql import SparkSession


class DatabricksOAuthToken(BaseModel):
    id: str
    workspace_id: str
    oauth_secret: str
    created_at: datetime
    expires_at: datetime
    client_id: str

    @field_validator("oauth_secret", "client_id", mode="before")
    @classmethod
    def strip_whitespace(cls, v: str) -> str:
        """Strip whitespace from string fields."""
        return v.strip() if isinstance(v, str) else v


class DatabricksCredentials(BaseModel):
    oauth_token: DatabricksOAuthToken
    workspace_url: str
    service_principal_name: str = "espresso-ai-optimizer"
    service_principal_id: str
    warehouse_id: str
    warehouse_name: str

    def to_json(self) -> str:
        return json.dumps(self.model_dump(mode="json"))


spark = SparkSession.getActiveSession() or SparkSession.builder.getOrCreate()
workspace_url = f"https://{spark.conf.get('spark.databricks.workspaceUrl') or ''}"


def find_workspace_admin_group(client):
    return next(client.groups.list(filter="displayName eq 'admins'")).id


def get_or_create_service_principal(client):
    if sps := list(
        client.service_principals.list(filter="displayName eq 'espresso-ai-optimizer'")
    ):
        sp = sps[0]
        client.service_principals.patch(
            id=sp.id,
            operations=[
                Patch(
                    op=PatchOp.ADD,
                    path="entitlements",
                    value=[
                        {"value": "databricks-sql-access"},
                        {"value": "allow-cluster-create"},
                    ],
                ),
            ],
            schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
        )
        return sp

    # No existing service principal was found, so create one
    sp = client.service_principals.create(
        display_name="espresso-ai-optimizer",
        active=True,
        entitlements=[
            ComplexValue(value="allow-cluster-create"),
            ComplexValue(value="databricks-sql-access"),
        ],
    )

    return sp


def create_oauth_token(client, service_principal):
    token = client.service_principal_secrets_proxy.create(
        service_principal_id=service_principal.id,
        lifetime=f"{2 * 365 * 24 * 60 * 60}s",  # 2 years
    )
    return DatabricksOAuthToken(
        id=token.id,
        workspace_id=str(client.get_workspace_id()),
        oauth_secret=token.secret,
        created_at=token.create_time,
        expires_at=token.expire_time,
        client_id=service_principal.application_id,
    )


def allow_service_principal_to_manage_warehouses(client, service_principal):
    for warehouse in client.warehouses.list():
        client.permissions.update(
            request_object_type="warehouses",
            request_object_id=warehouse.id,
            access_control_list=[
                AccessControlRequest(
                    service_principal_name=service_principal.application_id,
                    permission_level=PermissionLevel.CAN_MANAGE,
                )
            ],
        )


def make_service_principal_workspace_admin(client, service_principal):
    try:
        client.groups.patch(
            id=find_workspace_admin_group(client),
            operations=[
                Patch(
                    op=PatchOp.ADD,
                    value={"members": [{"value": service_principal.id}]},
                )
            ],
            schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
        )
    except Exception as e:
        if "already exists" not in str(e).lower() and "duplicate" not in str(e).lower():
            raise


def get_or_create_warehouse(client):
    for warehouse in client.warehouses.list():
        if warehouse.name == "ESPRESSO_AI_WAREHOUSE":
            return warehouse.id

    created_warehouse = client.warehouses.create_and_wait(
        name="ESPRESSO_AI_WAREHOUSE",
        cluster_size="X-Small",
        auto_stop_mins=1,
        enable_serverless_compute=True,
        min_num_clusters=1,
        max_num_clusters=1,
    )
    return created_warehouse.id


def is_account_admin_error(e):
    return "account admin" in e.lower()


def is_metastore_admin_error(e):
    # The message is lowercased before matching, so the literal must be lowercase too.
    return "does not have manage on catalog" in e.lower() or "metastore" in e.lower()


# Grants read access to the Unity Catalog system tables; grant failures are
# collected so targeted guidance can be printed at the end.
def allow_service_principal_to_read_system_logs(client, service_principal):
    errors = []

    def grant(asset_name, asset_type, privilege):
        try:
            client.grants.update(
                full_name=asset_name,
                securable_type=asset_type.value,
                changes=[
                    catalog.PermissionsChange(
                        add=[privilege], principal=service_principal.application_id
                    )
                ],
            )
        except Exception as e:
            errors.append(str(e))

    grant("system", catalog.SecurableType.CATALOG, catalog.Privilege.USE_CATALOG)
    for schema in client.schemas.list(catalog_name="system"):
        schema_full_name = f"system.{schema.name}"
        grant(schema_full_name, catalog.SecurableType.SCHEMA, catalog.Privilege.USE_SCHEMA)
        grant(schema_full_name, catalog.SecurableType.SCHEMA, catalog.Privilege.SELECT)

    if any(is_account_admin_error(e) for e in errors):
        print("\n⚠️  ACCOUNT ADMIN required: Ask an account admin to grant you access.")
    if any(is_metastore_admin_error(e) for e in errors):
        print("⚠️  METASTORE ADMIN required: Visit https://accounts.cloud.databricks.com/data")


# Main flow: provision the service principal, mint its OAuth secret, create the
# dedicated warehouse, grant permissions, and print the credentials to share.
if __name__ == "__main__":
    client = WorkspaceClient()
    service_principal = get_or_create_service_principal(client)
    oauth_token = create_oauth_token(client, service_principal)
    warehouse_id = get_or_create_warehouse(client)
    allow_service_principal_to_manage_warehouses(client, service_principal)
    make_service_principal_workspace_admin(client, service_principal)
    allow_service_principal_to_read_system_logs(client, service_principal)
    credentials = DatabricksCredentials(
        oauth_token=oauth_token,
        workspace_url=workspace_url,
        service_principal_name=service_principal.display_name,
        service_principal_id=service_principal.id,
        warehouse_id=warehouse_id,
        warehouse_name="ESPRESSO_AI_WAREHOUSE",
    )

    print("\n🎉 Setup complete! Here are the Databricks credentials to send to Espresso AI:")
    print("=" * 50)
    print(credentials.to_json())
    print("=" * 50)
    

5. Copy the JSON credentials printed between the ===== lines and share them securely with Espresso AI.

You can securely upload the output here.
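
For reference, the printed blob follows the DatabricksCredentials model above and looks roughly like this (placeholder values, pretty-printed here; the script prints it on a single line):

{
  "oauth_token": {
    "id": "...",
    "workspace_id": "1234567890123456",
    "oauth_secret": "<oauth-secret>",
    "created_at": "2025-01-01T00:00:00Z",
    "expires_at": "2027-01-01T00:00:00Z",
    "client_id": "<application-id>"
  },
  "workspace_url": "https://dbc-xxxxxxxx-xxxx.cloud.databricks.com",
  "service_principal_name": "espresso-ai-optimizer",
  "service_principal_id": "...",
  "warehouse_id": "...",
  "warehouse_name": "ESPRESSO_AI_WAREHOUSE"
}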

What the Script Does:
- Creates or reuses a service principal named "espresso-ai-optimizer" for Espresso AI.
- Grants that service principal the ability to manage SQL warehouses.
- Grants SELECT access on the Databricks system tables used for usage and cost analysis (e.g. query history, warehouse events, node timeline).
- Prints a JSON blob with credentials and your workspace URL for you to share securely with Espresso AI (you can spot-check the results with the snippet below).
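
To verify these results after the script completes, a minimal sketch using the same SDK calls as the script:

from databricks.sdk import WorkspaceClient

client = WorkspaceClient()

# Confirm the service principal exists.
sps = list(
    client.service_principals.list(filter="displayName eq 'espresso-ai-optimizer'")
)
print("service principal:", sps[0].application_id if sps else "NOT FOUND")

# Confirm the dedicated warehouse exists.
names = [w.name for w in client.warehouses.list()]
print("warehouse created:", "ESPRESSO_AI_WAREHOUSE" in names)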

Troubleshooting

Permission issues surface as error messages like the following:
1. "You are not a workspace admin" - Contact one of the listed admins for access
2. "⚠️ ACCOUNT ADMIN required" - You need account admin rights to create service principals
3. "⚠️ METASTORE ADMIN required" - You need metastore admin rights to grant system table access

Quick permission checks:
- Verify that your notebook is attached to Serverless compute, not to a specific cluster.
- Verify you're a workspace admin (or run the snippet after this list):
   - Click the circle with your initials, then click "Settings".
   - You should see a "Workspace admin" section in addition to a "User" section.
- Verify you're an account admin:
   - You should be able to log in to https://accounts.cloud.databricks.com/ and see the console to manage your account.
- Verify you're a metastore admin:
   - Visit https://accounts.cloud.databricks.com/data
   - Click the name of the listed metastore, which opens a "Configuration" page.
   - Set yourself as the "Metastore Admin".
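
For the workspace-admin check, a programmatic alternative (a sketch; the admin group's display name is typically "admins", but your workspace may differ):

from databricks.sdk import WorkspaceClient

me = WorkspaceClient().current_user.me()
groups = [g.display for g in (me.groups or [])]
print(f"{me.user_name} is in groups: {groups}")
print("workspace admin:", "admins" in groups)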

Please rerun the script once you have the required permissions.

Questions?

Book a Call