Welcome to the September 2023 edition of the pganalyze newsletter!
Last month we had the honor of joining some of our favorite podcasts (Stack Overflow Podcast, O11ycast & others!) to talk about Postgres performance optimization, database observability best practices, the role of data platform teams, how to scale Postgres expertise, and more. You can find direct links to all podcast episodes below.
We've also released the first version of a pganalyze feature we're very excited about: The new OpenTelemetry tracing integration for combining application-side tracing info ("which of my app requests was slow, and why") with Postgres EXPLAIN plans. This adds a direct link to pganalyze from within your application observability tools, enabling easier performance optimization work for the whole team. Read more below, or give it a try!
And of course, as always, you can find our 5mins of Postgres episodes (including two on the recently released Postgres 16), as well as recent pganalyze product improvements, in this newsletter.
All the best,
Lukas
Exporting Postgres EXPLAIN plans as traces to OpenTelemetry pipelines with pganalyze
Over the last couple of years, the OpenTelemetry standard has emerged as a unified, vendor-agnostic way of collecting traces, metrics and logs from applications. Its tracing component, the most mature part of the standard, works especially well for connecting data points across different services.
What if we treated the database as another service that produces tracing data for slow query plans? This would give us a unified view of application performance, including specifics on database slowdowns.
There are a few steps to make this work:
1. Sending traceparent information to the database. We can rely on the "traceparent" method of propagating context across services by having applications add it to queries as a query tag in a SQL comment. You can utilize existing libraries such as sqlcommenter to enable this in your application (see the example query after this list).
2. Tracking detailed query plans for slow queries with the auto_explain extension (supported by all released Postgres versions), logging them to the Postgres error log, and preserving the traceparent as a query comment (see the configuration sketch after this list).
3. Parsing the Postgres logs using the pganalyze collector, and having it construct a span for each collected EXPLAIN plan, with the appropriate parent span configured, and then sending it to an OTLP endpoint.
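To illustrate step 1, here is roughly what a tagged query looks like. This is a minimal sketch with a hypothetical orders table, and the trace and span IDs are placeholder values:

    -- Query tagged with W3C trace context, as produced by libraries like sqlcommenter
    SELECT *
      FROM orders
     WHERE customer_id = 42
    /*traceparent='00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'*/;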
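And for step 2, a minimal sketch of trying out auto_explain in a single session (for a production setup you would load auto_explain via shared_preload_libraries and put these parameters into postgresql.conf; the one second threshold is just an example):

    -- Load auto_explain into the current session only, for a quick test
    LOAD 'auto_explain';

    -- Log the execution plan of any query that runs longer than 1 second
    SET auto_explain.log_min_duration = '1s';

    -- JSON-formatted plans are easiest for log-parsing tools to work with
    SET auto_explain.log_format = 'json';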
The good news: We have a first version of this ready for testing! You can try this out today with the new pganalyze-collector 0.52.0 release, and give application teams a direct link to the full query plan and pganalyze's tuning recommendations, right within existing application tracing tools such as Honeycomb or New Relic!
For more background on why this matters, you can listen to the recent O11ycast episode, where Lukas spoke with Charity and Jess from Honeycomb on the benefits of combining database and application observability.
We'll have more details on how this works in the next couple of weeks, with specific instructions for different APM tools - in the meantime, don't hesitate to reach out with questions or feedback!
Ben and Ryan talk with Lukas about why he considers himself a founder-CEO by title but an engineer by trade, what's important to understand about optimizing slow SQL queries, and the role of AI in database optimization.
Charity and Jess from Honeycomb speak with Lukas about database observability. They explore the challenges of making it intuitive to understand why databases are slow, the open source library sqlcommenter, query plans, and how to best support application teams.
Lukas joined the Tech ONTAP Podcast to discuss the ins and outs of Postgres, how to improve performance, the impact of storage latency, and how pganalyze makes life better for data platform engineers.
SE Radio host Philip Winston speaks with Lukas about database indexing, queries, maintenance, scaling, and stored procedures. They also discuss pganalyze features such as the Index Advisor and VACUUM Advisor.
We take a close look at using EXPLAIN (ANALYZE, BUFFERS) to optimize Postgres queries, and what to know when interpreting the shared hits counter for Nested Loops.
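If you want to follow along, this is the kind of invocation the episode discusses (the tables in this sketch are made up):

    -- Show the executed plan together with buffer usage (shared hit/read counts)
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT o.*
      FROM orders o
      JOIN customers c ON c.id = o.customer_id
     WHERE c.country = 'US';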
We highlight four new features in Postgres 16 that you might not know about yet. You’ll learn about pg_input_is_valid, pg_input_error_info, improvements to \watch, number separators from the SQL:2023 standard, and more!
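For a quick taste, here is what the new validation functions and the number separators look like in practice:

    -- New in Postgres 16: check whether a value is valid input for a type, without raising an error
    SELECT pg_input_is_valid('42', 'integer');            -- true
    SELECT pg_input_is_valid('not a number', 'integer');  -- false

    -- Get details on why the input is invalid
    SELECT * FROM pg_input_error_info('not a number', 'integer');

    -- Underscores as digit separators in numeric literals (SQL:2023)
    SELECT 1_000_000 AS one_million;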
We talk about understanding Postgres IOPS, why looking at I/O performance matters even when your data fits into the cache, and how Postgres lays out tables on disk - including how we can query the "ctid" field to find the page a row is physically stored on.
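If you want to follow along, the hidden ctid column can be selected directly (the table name here is just a placeholder):

    -- ctid identifies a row's physical location as (page number, item offset within that page)
    SELECT ctid, * FROM orders LIMIT 5;
    -- A ctid of (0,1) means the row is the first item on page 0 of the table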
Subscribe on YouTube
We publish weekly videos on Postgres, our conference talks, tutorials, product walk-throughs, and more.
VACUUM Blocked by Xmin Horizon check: Fix an edge case where pganalyze did not correctly trigger an alert when the Xmin Horizon was held back for a very long time
Increase density of events shown on VACUUM Timeline for multi-day time selections
VACUUM Activity: Update lane label for manual VACUUMs in VACUUM Timeline for better distinction from autovacuums
OpenTelemetry integration
Optionally configure pganalyze to export EXPLAIN plans as OpenTelemetry tracing spans (requires pganalyze-collector 0.52.0)