Sign up for our Databricks optimizer
Welcome!
Espresso AI makes real-time decisions powered by ML models to optimize your Databricks SQL workloads.
Please run the following commands in Databricks SQL. They set up a service principal that Espresso AI uses to read the metadata our models need and to optimize your Databricks SQL account's operations.
Note that we never access, log, or store any data from Databricks SQL. We only look at metadata.
Prerequisites: Make sure you are a Databricks workspace admin, account admin, and metastore admin.
Steps
1. Open your Databricks workspace, create a Python notebook, and attach it to Serverless compute.
2. In the first cell, install/upgrade the SDK:
%pip install databricks-sdk --upgrade
3. When finished, create a second cell and restart the Python kernel:
%restart_python
4. In the next cell, copy-paste the code below and run it:
import json
from datetime import datetime
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog
from databricks.sdk.service.iam import (
    AccessControlRequest,
    ComplexValue,
    Patch,
    PatchOp,
    PatchSchema,
    PermissionLevel,
)
from pydantic import BaseModel, field_validator
from pyspark.sql import SparkSession
class DatabricksOAuthToken(BaseModel):
    id: str
    workspace_id: str
    oauth_secret: str
    created_at: datetime
    expires_at: datetime
    client_id: str

    @field_validator("oauth_secret", "client_id", mode="before")
    @classmethod
    def strip_whitespace(cls, v: str) -> str:
        """Strip whitespace from string fields."""
        return v.strip() if isinstance(v, str) else v
class DatabricksCredentials(BaseModel):
    oauth_token: DatabricksOAuthToken
    workspace_url: str
    service_principal_name: str = "espresso-ai-optimizer"
    service_principal_id: str
    warehouse_id: str
    warehouse_name: str

    def to_json(self) -> str:
        return json.dumps(self.model_dump(mode="json"))
spark = SparkSession.getActiveSession() or SparkSession.builder.getOrCreate()
workspace_url = f"https://{spark.conf.get('spark.databricks.workspaceUrl') or ''}"
def find_workspace_admin_group(client):
    return next(client.groups.list(filter="displayName eq 'admins'")).id
def get_or_create_service_principal(client):
    if sps := list(
        client.service_principals.list(filter="displayName eq 'espresso-ai-optimizer'")
    ):
        sp = sps[0]
        client.service_principals.patch(
            id=sp.id,
            operations=[
                Patch(
                    op=PatchOp.ADD,
                    path="entitlements",
                    value=[
                        {"value": "databricks-sql-access"},
                        {"value": "allow-cluster-create"},
                    ],
                ),
            ],
            schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
        )
        return sp
    # No existing service principal was found, so create one
    sp = client.service_principals.create(
        display_name="espresso-ai-optimizer",
        active=True,
        entitlements=[
            ComplexValue(value="allow-cluster-create"),
            ComplexValue(value="databricks-sql-access"),
        ],
    )
    return sp
def create_oauth_token(client, service_principal):
    token = client.service_principal_secrets_proxy.create(
        service_principal_id=service_principal.id,
        lifetime=f"{2 * 365 * 24 * 60 * 60}s",  # 2 years
    )
    return DatabricksOAuthToken(
        id=token.id,
        workspace_id=str(client.get_workspace_id()),
        oauth_secret=token.secret,
        created_at=token.create_time,
        expires_at=token.expire_time,
        client_id=service_principal.application_id,
    )
def allow_service_principal_to_manage_warehouses(client, service_principal):
    for warehouse in client.warehouses.list():
        client.permissions.update(
            request_object_type="warehouses",
            request_object_id=warehouse.id,
            access_control_list=[
                AccessControlRequest(
                    service_principal_name=service_principal.application_id,
                    permission_level=PermissionLevel.CAN_MANAGE,
                )
            ],
        )
def make_service_principal_workspace_admin(client, service_principal):
    try:
        client.groups.patch(
            id=find_workspace_admin_group(client),
            operations=[
                Patch(
                    op=PatchOp.ADD,
                    value={"members": [{"value": service_principal.id}]},
                )
            ],
            schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
        )
    except Exception as e:
        if "already exists" not in str(e).lower() and "duplicate" not in str(e).lower():
            raise
def get_or_create_warehouse(client):
    for warehouse in client.warehouses.list():
        if warehouse.name == "ESPRESSO_AI_WAREHOUSE":
            return warehouse.id
    created_warehouse = client.warehouses.create_and_wait(
        name="ESPRESSO_AI_WAREHOUSE",
        cluster_size="X-Small",
        auto_stop_mins=1,
        enable_serverless_compute=True,
        min_num_clusters=1,
        max_num_clusters=1,
    )
    return created_warehouse.id
def is_account_admin_error(e):
    return "account admin" in e.lower()


def is_metastore_admin_error(e):
    # The needle must be lowercase, since e is lowercased before the comparison
    return "does not have manage on catalog" in e.lower() or "metastore" in e.lower()
def allow_service_principal_to_read_system_logs(client, service_principal):
    errors = []

    def grant(asset_name, asset_type, privilege):
        try:
            client.grants.update(
                full_name=asset_name,
                securable_type=asset_type.value,
                changes=[
                    catalog.PermissionsChange(
                        add=[privilege], principal=service_principal.application_id
                    )
                ],
            )
        except Exception as e:
            errors.append(str(e))

    grant("system", catalog.SecurableType.CATALOG, catalog.Privilege.USE_CATALOG)
    for schema in client.schemas.list(catalog_name="system"):
        schema_full_name = f"system.{schema.name}"
        grant(schema_full_name, catalog.SecurableType.SCHEMA, catalog.Privilege.USE_SCHEMA)
        grant(schema_full_name, catalog.SecurableType.SCHEMA, catalog.Privilege.SELECT)

    if any(is_account_admin_error(e) for e in errors):
        print("\n⚠️ ACCOUNT ADMIN required: Ask an account admin to grant you access.")
    if any(is_metastore_admin_error(e) for e in errors):
        print("⚠️ METASTORE ADMIN required: Visit https://accounts.cloud.databricks.com/data")
if __name__ == "__main__":
    client = WorkspaceClient()
    service_principal = get_or_create_service_principal(client)
    oauth_token = create_oauth_token(client, service_principal)
    warehouse_id = get_or_create_warehouse(client)
    allow_service_principal_to_manage_warehouses(client, service_principal)
    make_service_principal_workspace_admin(client, service_principal)
    allow_service_principal_to_read_system_logs(client, service_principal)

    credentials = DatabricksCredentials(
        oauth_token=oauth_token,
        workspace_url=workspace_url,
        service_principal_name=service_principal.display_name,
        service_principal_id=service_principal.id,
        warehouse_id=warehouse_id,
        warehouse_name="ESPRESSO_AI_WAREHOUSE",
    )

    print("\n🎉 Setup complete! Here are the Databricks credentials to send to Espresso AI:")
    print("=" * 50)
    print(credentials.to_json())
    print("=" * 50)
5. Copy the JSON credentials printed between the ===== lines and share it securely with Espresso AI.
You can securely upload the output here.
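For reference, the printed blob follows the DatabricksCredentials model in the script above. A sketch with placeholder values only (nothing below is a real credential), plus a quick completeness check before you send it:

```python
import json

# Illustrative only: the field names mirror the DatabricksCredentials model
# in the setup script; every value below is a placeholder.
example = {
    "oauth_token": {
        "id": "token-id-placeholder",
        "workspace_id": "1234567890",
        "oauth_secret": "oauth-secret-placeholder",
        "created_at": "2025-01-01T00:00:00Z",
        "expires_at": "2027-01-01T00:00:00Z",
        "client_id": "application-id-placeholder",
    },
    "workspace_url": "https://dbc-example.cloud.databricks.com",
    "service_principal_name": "espresso-ai-optimizer",
    "service_principal_id": "sp-id-placeholder",
    "warehouse_id": "warehouse-id-placeholder",
    "warehouse_name": "ESPRESSO_AI_WAREHOUSE",
}

# Sanity check before sharing: all top-level fields are present.
required = {
    "oauth_token", "workspace_url", "service_principal_name",
    "service_principal_id", "warehouse_id", "warehouse_name",
}
blob = json.dumps(example)
assert required <= set(json.loads(blob)), "credentials JSON is missing fields"
print("credentials JSON looks complete")
```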
What the Script Does:
- Creates or reuses a service principal named "espresso-ai-optimizer" for Espresso AI.
- Creates or reuses a serverless SQL warehouse named ESPRESSO_AI_WAREHOUSE and grants the service principal the ability to manage SQL warehouses.
- Grants SELECT access on the Databricks system tables used for usage and cost analysis (e.g. query history, warehouse events, node timeline).
- Prints a JSON blob with credentials and your workspace URL for you to share securely with Espresso AI.
Troubleshooting
The script prints error messages for permission issues:
1. "You are not a workspace admin" - Contact one of the listed admins for access
2. "ACCOUNT ADMIN required" - You need account admin rights to create service principals
3. "METASTORE ADMIN required" - You need metastore admin rights to grant system table access
Quick permission checks:
- Verify that you started a serverless notebook, not a notebook on a specific cluster
- Verify you're a workspace admin:
- Click on the circle with your initials
- Click on "Settings"
- You should see a "Workspace Admin" section in addition to a "User" section.
- Verify you're an account admin:
- You should be able to login to https://accounts.cloud.databricks.com/ and see the console to manage your account.
- Verify you're a metastore admin:
- Visit https://accounts.cloud.databricks.com/data
- Click on the name of the metastore listed, which will open up a "Configuration" page.
- Set yourself as the "Metastore Admin".
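The workspace admin check can also be scripted. A sketch assuming the same databricks-sdk client the setup script uses; check_workspace_admin is a hypothetical helper (it needs workspace credentials, so it is defined but not run here), while the membership test itself is plain Python:

```python
def is_member(members, user_id: str) -> bool:
    """True if user_id appears in a SCIM group's member list.

    `members` is a list of objects (or dicts) carrying a `value`
    attribute/key, matching the shape of Group.members in databricks-sdk.
    """
    for m in members or []:
        value = getattr(m, "value", None) or (m.get("value") if isinstance(m, dict) else None)
        if value == user_id:
            return True
    return False


def check_workspace_admin():
    # Not executed here: requires notebook/workspace credentials.
    from databricks.sdk import WorkspaceClient
    client = WorkspaceClient()
    me = client.current_user.me()
    admins = next(client.groups.list(filter="displayName eq 'admins'"))
    return is_member(admins.members, me.id)


# Pure-logic example:
print(is_member([{"value": "user-123"}], "user-123"))  # True
```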
Please rerun the script once you have the required permissions.
Questions?