Postgres
Last updated at
Views and Materialized Views
Cache an expensive query with a materialized view with historical data (except for the latest two days), then union it with the live data. This way, we get the best of both worlds: fast query times and up-to-date data.
-- create the materialized viewCREATE MATERIALIZED VIEW historical_data ASSELECT id, avg(duration) as avg_durationFROM logsWHERE timestamp < NOW() - INTERVAL '2 days';-- combine hisotical data with live dataSELECT id, avg(duration) as avg_durationFROM logsWHERE timestamp >= NOW() - INTERVAL '2 days'UNION ALLSELECT id, avg_durationFROM historical_data;-- refresh the materialized view every some timeREFRESH MATERIALIZED VIEW historical_data;
Working with Data Duplication
Pick the latest record per group with DISTINCT ON
and ORDER BY
-- get the highest score for each student and the associated subjectSELECT DISTINCT ON (name) name, subject, scoreFROM student_scoresORDER BY name, score DESC;-- get the latest request_duration for each URLSELECT DISTINCT ON (url) url, request_durationFROM logsORDER BY url, timestamp DESC
Add a is_duplicate
flag with window functions
CREATE VIEW table_with_duplicates AS (SELECT id,ROW_NUMBER() OVER (PARTITION BY column1, column2, ...ORDER BY created_at DESC) > 1 AS is_duplicatefrom table_name)-- then delete the duplicatesDELETE FROM table_with_duplicatesWHERE id in (SELECT idFROM table_with_duplicatesWHERE is_duplicate)