data:image/s3,"s3://crabby-images/f3881/f3881ce495ed29d3111892d139050e551e5ff62d" alt="".webp)
A Script to Find and Delete Unused Snowflake Tables without Enterprise Access History
data:image/s3,"s3://crabby-images/04a62/04a6225bab21828a2587cf1531b795d10cc230c3" alt=""
If you're managing a Snowflake data warehouse, knowing which tables are gathering digital dust can save you significant storage costs and help maintain a cleaner data environment. While Snowflake's Enterprise Edition offers access detection features out of the box via the ACCESS_HISTORY table, there's a way to get the same data using just the Standard Edition and your query history.
One Snowflake user we shared this with was able to find and delete 150TB of stale data - that's over $40,000/year in saved storage costs.
The Code
To cut to the chase, here’s a query to find tables that have not been accessed in the last 3 months, or the last accessed date of the table:
WITH tokens AS (
SELECT
start_time,
f.value as token
FROM snowflake.account_usage.query_history,
LATERAL FLATTEN(REGEXP_SUBSTR_ALL(UPPER(query_text), '\\b\\w+\\b')) f
WHERE start_time >= DATEADD(month, -3, CURRENT_TIMESTAMP())
)
SELECT
m.table_catalog,
m.table_schema,
m.table_name,
sum(m.ACTIVE_BYTES + m.TIME_TRAVEL_BYTES) as TOTAL_BYTES,
MAX(t.start_time) as last_query_time
FROM tokens t
RIGHT JOIN snowflake.account_usage.table_storage_metrics m
ON t.token = UPPER(m.table_name)
WHERE m.table_name IS NOT NULL
AND m.TABLE_DROPPED IS NULL
AND m.ACTIVE_BYTES + m.TIME_TRAVEL_BYTES > 0
GROUP BY ALL
ORDER BY last_query_time DESC
How It Works
Let's break down this query step by step:
- Query History Tokenization
- The
tokens
CTE extracts individual tokens from all SQL queries run in the last 3 months - It uses
REGEXP_SUBSTR_ALL
to split queries into tokens - The
LATERAL FLATTEN
converts the array of tokens into rows
- The
- Storage Metrics Join
- We RIGHT JOIN with
table_storage_metrics
to ensure we get ALL tables - Tables that don't appear in any queries will have NULL
last_query_time
- The join matches table names against our extracted tokens
- We RIGHT JOIN with
- Results Analysis
TOTAL_BYTES
shows storage cost (active + time travel)last_query_time
shows when the table was last referenced in a query- Tables with NULL
last_query_time
haven't been queried in 3 months
Limitations to Consider
This approach is effective, but it isn’t perfect. In particular, it has the following issues:
- False Positives: Tables mentioned in comments or unused CTEs might appear as "used"
- Indirect References: Tables accessed through views or stored procedures will not be detected
Storage vs. Compute
Most Snowflake users spend 90% of their bill on compute. If you're looking to cut Snowflake costs, Espresso AI can help you automatically bring down your compute bill by up to 70% - you can get a savings estimate with the script here.
Discover your Savings
What would your Snowflake bill look like with Espresso?
Our AI can tell you.
data:image/s3,"s3://crabby-images/04a62/04a6225bab21828a2587cf1531b795d10cc230c3" alt=""
A Script to Find and Delete Unused Snowflake Tables without Enterprise Access History
data:image/s3,"s3://crabby-images/ce7f6/ce7f6ec3a3daedfdde1566222632ce9159ede068" alt=""
Stop Paying for Snowflake Multicluster
data:image/s3,"s3://crabby-images/a585e/a585e0fbbe8720c7ee0461e02bb11d28d58d8f3e" alt=""