We're sharing our follow up post on constraint programming in action, a recording of a talk at North Bay Python, as well as the best of "5mins of Postgres" episodes in this year so far.
Welcome to the July 2024 edition of the pganalyze newsletter!
Continuing our series on constraint programming, we put CP into practice based on our experience optimizing index selection in Postgres. We deep dive into what makes Postgres Index Selection a difficult optimization problem and how utilizing a CP model with multi-objective optimization enables developer-driven selection of goals, such as prioritizing lower Index Write Overhead over the best possible read performance.
At pganalyze, we build tools to enable all engineers across different teams to solve Postgres performance issues. An important building block to using pganalyze effectively is to understand how Postgres works, and, in particular, how it plans its queries. I recently gave a talk at North Bay Python aimed at Python backend engineers looking to understand query optimization better. Using Ruby on Rails? I also did a similar talk at the SF Ruby Meetup in July if you're interested.
5mins of Postgres took a short break in July, but we'll be back in August. In the meantime, enjoy a roundup of our most popular episodes so far this year. And, as always, you'll find the product updates at the end.
Depending on the workload, there could be hundreds of possible index combinations on a given table, which makes manual testing an impossible task. In a follow up to this blog post, Philippe Olivier deep dives into thePostgresindex selection problem and how constraint programming can be applied by using a solver (CP-SAT) and Python programming.
The CP Model shared in the blog post is a simplified version of the model available with thepganalyze Indexing Engine, which you can utilize as part of the Index Advisor available in a free 14-day trial.
I gave a 25min talk with slides at North Bay Python last month that broke down the challenges around query optimization and why it matters. In this talk I cover:
We discuss partitioning Postgres tables by timestamp based UUIDs. We also talk about the status of UUIDv7 in the current Postgres development branch. We have a look at Chris O'Brien's great write-up on how to do time-based partitioning with ULID, and are quite excited about being able to do the same with the upcoming UUIDv7.
We discuss how Figma scaled out their Postgres installation by 100x over 4 years, and recently switched to horizontal sharding using their DBProxy query proxy. We also compare their approach to Notion's sharding setup, as well as the Citus extension for Postgres.
We discuss how to go about finding the source of a locking problem in Postgres. Specifically, we look at how to debug a lock problem, how to end a process that’s holding a lock in Postgres, and show general helpful settings for handling Postgres locking.
We discuss a commonly encountered Postgres planner quirk, which is how Postgres behaves when you have a LIMIT and an ORDER BY clause, and it picks the wrong index.
We discuss how to optimize the row size in Postgres to reduce the overall table size. Specifically, we take a look at how column ordering affects storage size, how you can play column tetris to optimize it, and the big benefits that array-based storage can yield for some workloads.
Log Insights: Fix handling of syntax error events when STATEMENT is missing
Log Insights: Support multi-line logs with Google AlloyDB
Test run: Improve handling of interrupts via CTRL+C (SIGINT)
Avoid collector hanging and don't print summary
Allow HTTP clients to be cancelled to avoid shutdown delays
Add option to avoid collecting distributed index stats for Citus
This allows setting the DISABLE_CITUS_SCHEMA_STATS / disable_citus_schema_stats setting to the "index" value, which will cause the collector to skip collecting index statistics for Citus distributed tables (which can time out when there is a significant count of indexes)
Install script: Avoid deprecated usage of apt-key command