
Launching our Databricks SQL Optimizer

Published on 29 October 2025

Turning the Lakehouse into an Intelligent, Self-Optimizing System

At Espresso AI, we’ve helped Snowflake users cut their data warehouse costs by up to 70% using LLM-driven automation. Today, we’re excited to announce the next step in making data infrastructure more intelligent: Espresso AI for Databricks SQL.

From Data Warehouse to Agentic Lakehouse

Databricks has become the platform of choice for companies building AI-native data architectures. More than 20,000 organizations worldwide - including Block, Comcast, Condé Nast, Rivian, Shell, and over 60% of the Fortune 500 - rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI.

Customers are rapidly expanding their usage - in fact, Databricks is growing roughly twice as fast as Snowflake. But as these systems grow in scale and sophistication, managing performance and controlling cost becomes increasingly complex.

Espresso AI now brings the same intelligence that transformed Snowflake cost efficiency to the Databricks ecosystem, turning it into what we call the agentic lakehouse: a self-optimizing system that continuously learns, predicts, and takes action to eliminate waste and improve performance.

Our new Databricks Agents are powered by models trained on each customer’s unique metadata logs, enabling automatic, context-aware optimization across the lakehouse.

Our Development Process

We worked closely with design partners in an extensive beta program, collaborating with enterprise engineering teams to fine-tune the platform in real-world conditions.

Hundreds of organizations, including major enterprises like Booz Allen Hamilton and Comcast, joined our waitlist. Customers in the beta reported cost reductions of up to 50% and performance improvements, all without manual tuning.

Meet the Three Agents Powering It All

  • Autoscaling Agent: Predicts usage spikes and adjusts compute in real time to prevent over-provisioning.
  • Scheduling Agent: Maximizes utilization by routing workloads to existing resources, eliminating idle clusters.
  • Query Agent: Optimizes every SQL query before it hits the lakehouse, improving performance and reducing cost across the board.

Together, these agents act as a 24/7 optimization layer for Databricks, continuously learning, adapting, and saving.
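
To give a feel for how a predictive approach differs from reactive rules, here is a deliberately simplified sketch - not our production models, and every name and number below is invented for illustration - of the idea behind the Autoscaling Agent: forecast near-term demand from recent workload metadata and size compute before the spike arrives.

```python
# Hypothetical sketch of predictive autoscaling; not Espresso AI's actual model.
import math
from statistics import mean

def forecast_demand(recent_qpm: list[float]) -> float:
    """Naive forecast: recent average plus the latest trend, standing in for a learned model."""
    if len(recent_qpm) < 2:
        return recent_qpm[-1] if recent_qpm else 0.0
    trend = recent_qpm[-1] - recent_qpm[-2]
    return mean(recent_qpm[-3:]) + trend

def clusters_needed(expected_qpm: float, qpm_per_cluster: float = 50.0) -> int:
    """Size compute for the forecast load, before the spike arrives rather than after."""
    return max(1, math.ceil(expected_qpm / qpm_per_cluster))

history = [40, 45, 60, 110]   # queries per minute, climbing toward a spike
expected = forecast_demand(history)
print(f"forecast={expected:.0f} qpm -> clusters={clusters_needed(expected)}")
```

In production, the forecast comes from models trained on each customer's metadata logs rather than a moving average, but the shape of the decision is the same: predict, then provision.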

The result? Databricks users can cut their bill in half and dramatically improve efficiency with zero manual tuning.

Espresso AI for Databricks is available starting today.


Make your Databricks environment autonomous and efficient today - book a demo!

Frequently Asked Questions

Why are Databricks costs so hard to control?

Cloud costs are hard to control because Databricks workloads exhibit highly variable performance profiles, often involving unpredictable ETL spikes, schema-dependent SQL plans, and underutilized clusters created for isolation or SLA guarantees. Traditional rule-based autoscaling cannot model these dynamics. Cost overruns emerge from cluster over-provisioning and suboptimal query plans - problems that require continuous, model-driven inference rather than static configurations.
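
For contrast, here is a toy example of the static, rule-based approach described above; the function and thresholds are invented for illustration. Because it only reacts after utilization has crossed a limit, a sudden spike queues work before capacity catches up, and idle clusters linger until the scale-down threshold is hit.

```python
# Purely illustrative: a reactive, rule-based autoscaler.
# It scales only after utilization crosses a static threshold,
# so a sudden spike is already queuing queries before new capacity arrives.

def reactive_scaler(utilization: float, clusters: int,
                    scale_up_at: float = 0.8, scale_down_at: float = 0.3,
                    max_clusters: int = 10, min_clusters: int = 1) -> int:
    """Return the new cluster count based only on the current utilization sample."""
    if utilization > scale_up_at and clusters < max_clusters:
        return clusters + 1          # reacts one interval too late for a spike
    if utilization < scale_down_at and clusters > min_clusters:
        return clusters - 1          # keeps idle clusters until the threshold is crossed
    return clusters

# A spiky workload: utilization jumps from 20% to 95% in one interval.
clusters = 2
for utilization in [0.2, 0.2, 0.95, 0.95, 0.95, 0.3]:
    clusters = reactive_scaler(utilization, clusters)
    print(f"utilization={utilization:.0%} -> clusters={clusters}")
```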

How does Espresso AI optimize Databricks?

Espresso AI applies model-based optimization using agents trained on each customer’s metadata logs. These models predict workload intensity, resource requirements, and query performance characteristics in real time. The Autoscaling Agent infers upcoming compute demand, the Scheduling Agent allocates jobs based on resource availability, and the Query Agent rewrites SQL before execution. Together, they form an optimization system that continuously adjusts the lakehouse environment to minimize idle time and maximize cluster- and query-level efficiency.
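
The answer above does not spell out the Query Agent's internals, so the following is a purely hypothetical illustration of what a pre-execution rewrite can look like: narrowing a wasteful SELECT * and adding a partition filter so the engine scans fewer files. The table, columns, and helper function are invented for the example.

```python
# Hypothetical illustration only: the kind of rewrite a pre-execution query
# optimizer might apply. Table and column names are invented for the example.

def rewrite_query(sql: str, needed_columns: list[str], partition_filter: str) -> str:
    """Narrow a wasteful SELECT * and append a partition predicate before execution."""
    projected = ", ".join(needed_columns)
    rewritten = sql.replace("SELECT *", f"SELECT {projected}", 1)
    # Push a partition filter so the engine scans fewer files.
    if "WHERE" in rewritten.upper():
        rewritten += f" AND {partition_filter}"
    else:
        rewritten += f" WHERE {partition_filter}"
    return rewritten

original = "SELECT * FROM sales.orders"
print(rewrite_query(original, ["order_id", "amount"], "order_date >= '2025-10-01'"))
# -> SELECT order_id, amount FROM sales.orders WHERE order_date >= '2025-10-01'
```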

How does the Databricks SQL lakehouse differ from Snowflake?

The Databricks SQL lakehouse model differs from Snowflake’s architecture in three fundamental ways: openness, execution flexibility, and metadata design. The lakehouse stores all data in open object storage (e.g., Parquet/Delta on S3, ADLS, GCS), allowing multiple compute engines—including Databricks SQL, Spark, Photon, and ML runtimes—to operate directly on the same files. Snowflake, by contrast, uses a proprietary storage layer accessible only through Snowflake’s own virtual warehouses, which tightly couples storage access to Snowflake-managed compute. The lakehouse’s open transaction and metadata layer (Delta Lake + Unity Catalog) enables cross-engine governance, adaptive file optimization, and interoperability with external systems, while Snowflake relies on a fully closed, vertically integrated metadata subsystem.
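
To make the openness point concrete, here is a minimal sketch of two engines working against the same Delta files; the storage path and table name are placeholders for your own environment.

```python
# Sketch only: the storage path and table name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("open-lakehouse-demo").getOrCreate()

# Engine 1: a Spark job reads the Delta files directly from object storage.
orders = spark.read.format("delta").load("s3://example-bucket/lakehouse/orders")
orders.groupBy("order_date").count().show()

# Engine 2: Databricks SQL (or any Unity Catalog-aware engine) queries the
# same underlying files through the catalog, not a copy of the data.
spark.sql(
    "SELECT order_date, COUNT(*) FROM main.sales.orders GROUP BY order_date"
).show()
```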

What is the difference between Databricks and Databricks SQL?

Databricks is the full Data Intelligence Platform that powers the lakehouse - providing unified storage, governance, Spark compute, ML runtimes, orchestration, and Delta Lake transactions - while Databricks SQL is the specialized SQL analytics engine within that platform. Databricks supports all workload types (ETL, ML, streaming, batch), whereas Databricks SQL provides a warehouse-like experience optimized for high-performance SQL using the Photon engine. Both operate on the same Delta Lake data, but Databricks SQL focuses specifically on interactive analytics and BI workloads, while the broader Databricks platform handles end-to-end data and AI operations.
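
As a small, practical illustration of where Databricks SQL sits, here is a minimal sketch using the open-source databricks-sql-connector package to run a query against a SQL warehouse; the hostname, HTTP path, and access token are placeholders for your own workspace.

```python
# Sketch only: hostname, HTTP path, and token are placeholders for your workspace.
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",
    access_token="dapiXXXXXXXXXXXXXXXX",
) as connection:
    with connection.cursor() as cursor:
        # The same Delta tables are visible to BI-style SQL here and to Spark jobs elsewhere.
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchall())
```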

Ben Lerner
Co-Founder and CEO