If you're managing a Snowflake data warehouse, knowing which tables are gathering digital dust can save you significant storage costs and help maintain a cleaner data environment. While Snowflake's Enterprise Edition offers access tracking out of the box via the ACCESS_HISTORY view, there's a way to get the same data using just the Standard Edition and your query history.
One Snowflake user we shared this with was able to find and delete 150TB of stale data - that's over $40,000/year in saved storage costs.
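For intuition on that figure, here's a quick back-of-the-envelope check in Python. The rate below is an assumption: roughly $23 per TB per month is a typical capacity-storage price for Snowflake on AWS US regions, but your contract and region will vary.

```python
# Rough annual storage cost for 150 TB of stale data.
# $23/TB/month is an assumed capacity rate; adjust for your region/contract.
stale_tb = 150
price_per_tb_month = 23
annual_cost = stale_tb * price_per_tb_month * 12
print(f"${annual_cost:,}/year")  # $41,400/year
```

At that rate, 150 TB lands just over the $40,000/year mark quoted above.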
To cut to the chase, here's a query that lists every table along with the last date it was accessed, surfacing tables that have not been accessed in the last 3 months:
WITH tokens AS (
    SELECT
        start_time,
        f.value AS token
    FROM snowflake.account_usage.query_history,
        LATERAL FLATTEN(REGEXP_SUBSTR_ALL(UPPER(query_text), '\\b\\w+\\b')) f
    WHERE start_time >= DATEADD(month, -3, CURRENT_TIMESTAMP())
)
SELECT
    m.table_catalog,
    m.table_schema,
    m.table_name,
    SUM(m.active_bytes + m.time_travel_bytes) AS total_bytes,
    MAX(t.start_time) AS last_query_time
FROM tokens t
RIGHT JOIN snowflake.account_usage.table_storage_metrics m
    ON t.token = UPPER(m.table_name)
WHERE m.table_name IS NOT NULL
    AND m.table_dropped IS NULL
    AND m.active_bytes + m.time_travel_bytes > 0
GROUP BY ALL
ORDER BY last_query_time DESC;
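Before walking through the SQL, it may help to see what the tokenization step produces. Here's a rough Python equivalent of REGEXP_SUBSTR_ALL(UPPER(query_text), '\\b\\w+\\b') — the function name and sample query are illustrative, and it assumes Python's word-boundary semantics are close enough to Snowflake's for this purpose:

```python
import re

def tokenize(query_text):
    # Mimic REGEXP_SUBSTR_ALL(UPPER(query_text), '\\b\\w+\\b'):
    # upper-case the query, then pull out every word token.
    return set(re.findall(r"\b\w+\b", query_text.upper()))

tokens = tokenize(
    "SELECT id, name FROM analytics.public.orders o "
    "JOIN customers c ON c.id = o.cust_id"
)
print("ORDERS" in tokens)      # True  - table name appears as a token
print("OLD_EVENTS" in tokens)  # False - table never mentioned in the query
```

A table counts as "accessed" whenever its name shows up anywhere in a query's token set, which is what the JOIN condition in the SQL exploits.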
Let's break down this query step by step:
- The tokens CTE extracts individual tokens from all SQL queries run in the last 3 months, using REGEXP_SUBSTR_ALL to split each query's text into word tokens
- LATERAL FLATTEN converts the resulting array of tokens into rows
- A RIGHT JOIN against table_storage_metrics ensures we get ALL tables, including ones that never match a token
- TOTAL_BYTES shows storage cost (active + time travel)
- last_query_time shows when the table was last referenced in a query
- Tables with a NULL last_query_time haven't been queried in 3 months

This approach is effective, but it isn't perfect: because it matches raw text tokens, a table name that appears only in a comment or string literal still counts as an access, and tables that are read exclusively through views never appear in query text, so they can be flagged as unused even though they're still in use.
Most Snowflake users spend 90% of their bill on compute. If you're looking to cut Snowflake costs, Espresso AI can help you automatically bring down your compute bill by up to 70% - you can get a savings estimate with the script here.
The script is designed specifically for Snowflake Standard Edition users, who do not have access to Snowflake’s native ACCESS_HISTORY feature. Instead, it uses query history analysis to approximate table access patterns, allowing Standard Edition users to identify unused Snowflake tables without requiring Enterprise features.
The script helps with Snowflake cost optimization by identifying unused or rarely accessed Snowflake tables that consume storage space. By deleting these idle tables, organizations can reduce storage costs and maintain a leaner data environment, contributing to overall cloud cost efficiency.
The script compares the list of Snowflake tables in the table_storage_metrics view to the tokenized query history from the past 3 months. If a table does not appear in any recent queries, it is flagged as a potentially unused Snowflake table. This enables data teams to find and remove stale data.
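The comparison described above can be sketched in Python. This is an illustrative model, not the script itself — the function name, sample log, and table names are made up for the example — mirroring the RIGHT JOIN: every table appears in the output, and tables with no matching token get None:

```python
import re
from datetime import datetime

def find_stale_tables(query_log, tables):
    # query_log: list of (start_time, query_text) pairs.
    # Returns {table_name: last_query_time or None}; None means the
    # table never appeared as a token in recent query text (i.e. stale).
    last_seen = {}
    for start_time, text in query_log:
        for tok in re.findall(r"\b\w+\b", text.upper()):
            if tok not in last_seen or start_time > last_seen[tok]:
                last_seen[tok] = start_time
    return {t: last_seen.get(t.upper()) for t in tables}

log = [
    (datetime(2024, 5, 1), "SELECT * FROM orders"),
    (datetime(2024, 6, 2), "DELETE FROM orders WHERE id = 1"),
]
result = find_stale_tables(log, ["ORDERS", "OLD_EVENTS"])
# ORDERS maps to its most recent query timestamp;
# OLD_EVENTS maps to None, so it would be flagged as potentially unused.
```

The SQL version does the same thing with a RIGHT JOIN and MAX(start_time) instead of a dictionary.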