Hi there,

Welcome to the July 2024 edition of the pganalyze newsletter!

Continuing our series on constraint programming, we put CP into practice based on our experience optimizing index selection in Postgres. We deep dive into what makes Postgres Index Selection a difficult optimization problem and how utilizing a CP model with multi-objective optimization enables developer-driven selection of goals, such as prioritizing lower Index Write Overhead over the best possible read performance.

At pganalyze, we build tools to enable all engineers across different teams to solve Postgres performance issues. An important building block to using pganalyze effectively is to understand how Postgres works, and, in particular, how it plans its queries. I recently gave a talk at North Bay Python aimed at Python backend engineers looking to understand query optimization better. Using Ruby on Rails? I also did a similar talk at the SF Ruby Meetup in July if you're interested.

5mins of Postgres took a short break in July, but we'll be back in August. In the meantime, enjoy a roundup of our most popular episodes so far this year. And, as always, you'll find the product updates at the end.

All the best,

Lukas

Blog post: Constraint Programming in Action - Optimizing Postgres Index Selection

constraint_programming_in_action_optimizing_postgres_index_selection_pganalyze

Depending on the workload, there could be hundreds of possible index combinations on a given table, which makes manual testing an impossible task. In a follow up to this blog post, Philippe Olivier deep dives into the Postgres index selection problem and how constraint programming can be applied by using a solver (CP-SAT) and Python programming.

The CP Model shared in the blog post is a simplified version of the model available with the pganalyze Indexing Engine, which you can utilize as part of the Index Advisor available in a free 14-day trial.

Read the blog post

Tech Talk Recording: Explaining Postgres Query Optimization to Developers

optimizing_postgres_for_python_developers-1

I gave a 25min talk with slides at North Bay Python last month that broke down the challenges around query optimization and why it matters. In this talk I cover:

The iterative process of optimizing a query
How to interpret EXPLAIN (ANALYZE, BUFFERS)
Cost estimation and selectivity
JOIN order and Parameterized Index Scans
5 Actionable Tips!

View the tech talk recording

This Year's Best 5mins of Postgres

E98: How to partition Postgres tables by timestamp based UUIDs→

We discuss partitioning Postgres tables by timestamp based UUIDs. We also talk about the status of UUIDv7 in the current Postgres development branch. We have a look at Chris O'Brien's great write-up on how to do time-based partitioning with ULID, and are quite excited about being able to do the same with the upcoming UUIDv7.

E106: How Figma built DBProxy for sharding Postgres→

We discuss how Figma scaled out their Postgres installation by 100x over 4 years, and recently switched to horizontal sharding using their DBProxy query proxy. We also compare their approach to Notion's sharding setup, as well as the Citus extension for Postgres.

E99: Finding the root cause of locking problems in Postgres→

We discuss how to go about finding the source of a locking problem in Postgres. Specifically, we look at how to debug a lock problem, how to end a process that’s holding a lock in Postgres, and show general helpful settings for handling Postgres locking.

E113: Postgres Planner Quirks: The impact of ORDER BY + LIMIT on index usage→

We discuss a commonly encountered Postgres planner quirk, which is how Postgres behaves when you have a LIMIT and an ORDER BY clause, and it picks the wrong index.

E100: Reducing table size with optimal column ordering and array-based storage→

We discuss how to optimize the row size in Postgres to reduce the overall table size. Specifically, we take a look at how column ordering affects storage size, how you can play column tetris to optimize it, and the big benefits that array-based storage can yield for some workloads.

Subscribe on YouTube

We publish weekly videos on Postgres, our conference talks, tutorials, product walk-throughs, and more.

Subscribe to pganalyze on YouTube

Product updates

pganalyze-collector v0.57.0

Log Insights: Fix handling of syntax error events when STATEMENT is missing
Log Insights: Support multi-line logs with Google AlloyDB
Test run: Improve handling of interrupts via CTRL+C (SIGINT)

Avoid collector hanging and don't print summary
Allow HTTP clients to be cancelled to avoid shutdown delays

Add option to avoid collecting distributed index stats for Citus

This allows setting the DISABLE_CITUS_SCHEMA_STATS / disable_citus_schema_stats setting to the "index" value, which will cause the collector to skip collecting index statistics for Citus distributed tables (which can time out when there is a significant count of indexes)

Install script: Avoid deprecated usage of apt-key command
Read the full release notes

Other improvements and bugfixes

Add more predefined time periods to date picker
Index Advisor: Resolve pgvector extension not being detected correctly
Remove support for collector versions below 0.40.0

If your collector is too old, you will now see a notice to remind you to upgrade
See documentation for steps to upgrade

Schema Statistics:

Increase width of schema name column in table listing
Clarify VALID state of indexes on index listing to make it easier to notice indexes that are invalid (e.g. due to a cancelled REINDEX)

Review us on G2

Please leave your feedback (good or bad!) in this survey - we appreciate it!

Fill out the G2 survey