Welcome to the April 2024 edition of the pganalyze newsletter!
Over the last month we have focused on improving the first-time user experience for pganalyze, especially around the initial setup.
When you're configuring pganalyze, you will often run the "--test" command of the pganalyze collector, which now features an updated test summary. We now have a dedicated setup guide for Amazon EKS, and have improved the starting page for AWS. Plus: We now have a published Helm chart!
At our recent talk at PGDay Chicago we presented our new technical whitepaper, "A Constraint Programming Approach for Index Selection in Postgres". This is a math-heavy deep dive into how our constraint programming model for pganalyze Index Advisor works.
And last but not least - we published two patch releases for pganalyze Enterprise Server, to fix some bugs in the recent release that introduced a new snapshot API server. Upgrading to v2024.04.2 is recommended.
As always, this newsletter also has our latest "5mins of Postgres" episodes, where we just kicked off our new series, Postgres Planner Quirks. Check out the latest episodes below!
All the best,
Lukas
Improved Collector Test Output
The pganalyze collector is the component that integrates your database server with pganalyze. With the new collector test output, introduced in release 0.56.0, you can now quickly see whether all required steps for integration are completed.
This summary is shown at the end, after the regular log output from the test run. You can use it to identify potential problems, for example when the integration with Postgres error logs does not work as expected, or when the statistics helpers are not set up. If you have many database servers, you may find it useful to limit the test to a specific configuration section with the "--test-section=server1" option.
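For example, on a typical package install the invocation could look like this (the section name "server1" is illustrative, and the exact command may vary depending on how you installed the collector):

```bash
# Run the collector test and review the new summary at the end of the output
pganalyze-collector --test

# Limit the test to a single configuration section when monitoring multiple servers
pganalyze-collector --test --test-section=server1
```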
We've spoken at length about our new constraint programming model, be it in our documentation or our webinar. Today, we're happy to introduce an additional resource that you can dive into: Our new technical whitepaper, first introduced at the recent PGDay Chicago, goes one step further by describing the foundation of the constraint programming model in depth.
You will most likely find this beneficial if you're interested in the Computer Science that makes things work behind the scenes. But even if you're not a CS major (or it's been quite a while!), you may find this useful to gain a better understanding of the mechanism that achieves a configurable trade-off between Index Write Overhead and query performance.
We discuss the pg_buffercache_evict function recently committed to Postgres 17, that lets you remove individual pages from the Postgres shared buffer cache. We show how this can be used for testing query performance on a test system, and the impact of double buffering.
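As a rough sketch, evicting all cached pages of a single table on a Postgres 17 test system could look like this (the table name "mytable" is hypothetical; the function is part of the pg_buffercache extension):

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Evict every shared buffer that currently caches a page of "mytable"
SELECT count(*) FILTER (WHERE pg_buffercache_evict(bufferid)) AS evicted
  FROM pg_buffercache
 WHERE relfilenode = pg_relation_filenode('mytable')
   AND reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database());
```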
We discuss faster B-tree index scans in Postgres 17 for queries that involve IN lists or other cases where multiple array values are being passed to Postgres. We show how even simple cases now avoid repeated page access, and how turning filters into index conditions and processing like an Index Skip Scan can yield significant speedups for certain queries.
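For illustration, a hypothetical query of the kind that benefits: in Postgres 17 the IN list values can be handled within a single scan of the index below, rather than descending the index once per value:

```sql
CREATE INDEX ON orders (customer_id, status);

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
  FROM orders
 WHERE customer_id = 42
   AND status IN ('pending', 'shipped', 'delivered');
```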
We discuss streaming and vectored I/O in Postgres 17. This is an important step towards implementing asynchronous I/O in Postgres. Postgres 17 does not yet have asynchronous I/O, but we do see a performance benefit from what is essentially a refactoring of the internal APIs.
To kick off our new "Postgres Planner Quirks" series, we discuss how Postgres behaves when you have a LIMIT and an ORDER BY clause, and it picks the wrong index. This is something that we've actually seen many times on production systems, and in this episode we walk through a few examples of how this can happen and what you can do to resolve issues like this.
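A simplified, hypothetical example of the pattern: with one index on created_at and another on customer_id, the planner may walk the created_at index (to satisfy the ORDER BY and LIMIT without a sort) and filter rows as it goes, which can be very slow when only a few rows match the filter:

```sql
SELECT *
  FROM events
 WHERE customer_id = 42
 ORDER BY created_at DESC
 LIMIT 10;
```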
Subscribe on YouTube
We publish weekly videos on Postgres, our conference talks, tutorials, product walk-throughs, and more.
Besides minor product updates, these two patch releases contain important patches that resolve issues we discovered when rolling out the new snapshot API mechanism introduced in v2024.04.0
If you haven't upgraded yet, we recommend going directly to v2024.04.2
If you are on v2024.04.0 or v2024.04.1, we recommend upgrading, especially if you're seeing any problems with SSL/TLS-related web requests to the pganalyze application
In addition to the existing test output, a new summary is now shown at the end, providing a consolidated view of the state of the collector setup
Add more verbose output for the "--test-explain" flag
Amazon RDS/Aurora: Use 5432 as a default DB port
Previously IAM authentication would fail with "PAM authentication failed" when the port was not explicitly set in the collector configuration
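As an illustration, a configuration section for IAM authentication can now omit the port entirely (the hostname and identifiers below are placeholders, and the settings shown follow the collector's documented configuration format to the best of our knowledge):

```ini
[server1]
db_host = mydb.abcdefg.us-east-1.rds.amazonaws.com
# db_port is no longer required; 5432 is now assumed by default
db_name = postgres
db_username = pganalyze
db_use_iam_auth = true
aws_region = us-east-1
aws_db_instance_id = mydb
```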
Publish Helm Chart package
The Helm Chart repository can be accessed via https://charts.pganalyze.com/ (see instructions)
The collector chart is available at `pganalyze/pganalyze-collector`
The oldest available package version is 0.55.0
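For example, adding the repository and installing the collector chart could look like this (the release name is arbitrary; configuration values are set as described in the chart's instructions):

```bash
helm repo add pganalyze https://charts.pganalyze.com/
helm repo update

# Install the collector chart under an arbitrary release name
helm install my-pganalyze-collector pganalyze/pganalyze-collector
```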
Docker image: Support taking additional arguments for `test`, `test-explain`, `collector`
Previously, adding the verbose flag like "test -v" wasn't working. With this update, the additional arguments are now correctly passed to the process and "test -v" will run the test in verbose mode
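A minimal sketch of the fixed behavior, assuming the usual Quay-hosted collector image and API key environment variable:

```bash
# Extra arguments after the subcommand are now forwarded to the collector,
# so this runs the test in verbose mode
docker run --rm -e PGA_API_KEY=your_api_key \
  quay.io/pganalyze/collector:latest test -v
```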
Docker image: Update the internal collector config file location
When the Docker container is passed the "CONFIG_CONTENTS" environment variable, the file used to be written to the "/home/pganalyze/.pganalyze_collector.conf" location and then read by the collector
Instead, this file is now written to the "/config/pganalyze-collector.conf" location - this fixes problems when having a read-only root filesystem
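A sketch of passing the configuration through the environment (image tag and config file are placeholders); the container now writes the contents to /config/pganalyze-collector.conf before starting:

```bash
docker run --rm \
  -e CONFIG_CONTENTS="$(cat pganalyze-collector.conf)" \
  quay.io/pganalyze/collector:latest test
```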
Add `--generate-stats-helper-sql` helper command
This command generates a SQL script that can be passed to `psql` to install stats helpers (e.g. for collecting column stats) on all configured databases for the specified server
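For example (the server section name and psql connection parameters are placeholders; check the collector's --help output for the exact argument form):

```bash
# Generate the helper SQL for the configured server "server1" and apply it as a superuser
pganalyze-collector --generate-stats-helper-sql=server1 | psql -h mydbhost -U postgres -d postgres
```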
Update pg_stat_statements logic: Support updated fields in Postgres 17
Autovacuum: Add support for updated log format (frozen:) in Postgres 16+
Index Advisor
Add support for "ltree", "pg_bigm", "cube" and "earthdistance" extensions
Previously, queries depending on these extensions (e.g. by using the "cube" data type) showed an error on the Index Advisor tab
These queries are now processed correctly and considered by Index Advisor
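For instance, a hypothetical query using the cube-based earthdistance functions, which previously produced an error in Index Advisor, is now analyzed like any other query:

```sql
SELECT id, name
  FROM stores
 WHERE earth_box(ll_to_earth(37.77, -122.42), 5000) @> ll_to_earth(lat, lon);
```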
Avoid errors when unsupported index types/data types are referenced by a table
There are certain index types that Index Advisor does not support when processing a table definition, like the RUM index access method.
Previously Index Advisor would fail on any table that included such unsupported index types or unsupported data types. Going forward, these will be ignored, so that query analysis can succeed (turning queries into scans), but such indexes will not be considered by Index Advisor.
Similarly, when encountering known unsupported data types (such as the `vector` data type), columns referencing such types will be skipped by Index Advisor.
This may lead to a slightly different error when queries directly reference these columns, but will allow other queries not referencing that column to succeed.
Add ability to filter out acknowledged state insights
Avoid Internal Error for tables that had no scans/query activity in the last 7 days
Other improvements
Automated EXPLAIN
View EXPLAIN Source: Hide "Async Capable" field when converting from JSON to text format
Improve support for handling new lines with the text log format
Rename memory stats that belong to the OS to avoid confusion with Postgres `shared_buffers`
Install Docs: Keep same generated password when moving install between steps
Fix handling of single-line SQL comments in formatted queries
Duboce Labs, Inc., 1501 Mariposa St, Unit 408, San Francisco, CA 94107