All posts

A Script to Find and Delete Unused Snowflake Tables without Enterprise Access History

Published on
21 Feb 25

If you're managing a Snowflake data warehouse, knowing which tables are gathering digital dust can save you significant storage costs and help maintain a cleaner data environment. While Snowflake's Enterprise Edition offers access detection features out of the box via the ACCESS_HISTORY table, there's a way to get the same data using just the Standard Edition and your query history.

One Snowflake user we shared this with was able to find and delete 150TB of stale data - that's over $40,000/year in saved storage costs.

The Code

To cut to the chase, here’s a query to find tables that have not been accessed in the last 3 months, or the last accessed date of the table:

WITH tokens AS (
  SELECT
    start_time,
    f.value as token 
  FROM snowflake.account_usage.query_history,
  LATERAL FLATTEN(REGEXP_SUBSTR_ALL(UPPER(query_text), '\\b\\w+\\b')) f
  WHERE start_time >= DATEADD(month, -3, CURRENT_TIMESTAMP())
)
SELECT 
  m.table_catalog,
  m.table_schema,
  m.table_name,
  sum(m.ACTIVE_BYTES + m.TIME_TRAVEL_BYTES) as TOTAL_BYTES,
  MAX(t.start_time) as last_query_time
FROM tokens t
RIGHT JOIN snowflake.account_usage.table_storage_metrics m 
  ON t.token = UPPER(m.table_name)
WHERE m.table_name IS NOT NULL 
  AND m.TABLE_DROPPED IS NULL
  AND m.ACTIVE_BYTES + m.TIME_TRAVEL_BYTES > 0
GROUP BY ALL
ORDER BY last_query_time DESC

How It Works

Let's break down this query step by step:

  1. Query History Tokenization
    • The tokens CTE extracts individual tokens from all SQL queries run in the last 3 months
    • It uses REGEXP_SUBSTR_ALL to split queries into tokens
    • The LATERAL FLATTEN converts the array of tokens into rows
  2. Storage Metrics Join
    • We RIGHT JOIN with table_storage_metrics to ensure we get ALL tables
    • Tables that don't appear in any queries will have NULL last_query_time
    • The join matches table names against our extracted tokens
  3. Results Analysis
    • TOTAL_BYTES shows storage cost (active + time travel)
    • last_query_time shows when the table was last referenced in a query
    • Tables with NULL last_query_time haven't been queried in 3 months

Limitations to Consider

This approach is effective, but it isn’t perfect. In particular, it has the following issues:

  1. False Positives: Tables mentioned in comments or unused CTEs might appear as "used"
  2. Indirect References: Tables accessed through views or stored procedures will not be detected

Storage vs. Compute

Most Snowflake users spend 90% of their bill on compute. If you're looking to cut Snowflake costs, Espresso AI can help you automatically bring down your compute bill by up to 70% - you can get a savings estimate with the script here

Frequently Asked Question

No items found.
Alex Kouzemtchenko
Founder
Share this post

Never miss an update

Subscribe to our newsletter. Get exclusive insights delivered straight to your inbox.