Using bpftrace for postgres query execution tracing

Being able to understand where the query time is spent in PostgreSQL is a bit confusing. To understand what I mean by 'confusing', a fundamental concept that must be understood is the different statement execution phases of the PostgreSQL query engine. These are:

Parse
Rewrite
Plan
Execute

These phases must[1] be executed in this order before the final phase of Execute can perform the actual request that was submitted.

pg_stat_statements

In lots of cases, pg_stat_statements is used to see the time a query took, using the total_time column (up to PostgreSQL version 12) or the total_exec_time (PostgreSQL version 13 or higher). Both columns show the time spent in the Execution phase only.

With pg_stat_statement with PostgreSQL version 13 this (the fact that total_time is not the full picture) becomes somewhat more evident because of the addition of the total_plan_time column, which makes it more obvious that total_exec_time needs more detail. However, this still omits the Parse and Rewrite phases.

log_min_duration_statement

As far as I see, most of the other PostgreSQL database utilities do omit the total execution time that includes all phases, with the sole exception of log_min_duration_statement(!). When log_min_duration_statement is set to a value higher than 0 and an execution exceeds this time, it logs the query text and duration (time) for the actual total time for the simple query protocol or divided into parse (which contains the server-side parse and rewrite phases), bind (which includes the server-side plan phase) and execute (which obviously includes the execute phase) for the extended query protocol.

The logging takes place in the central PostgreSQL log file, and the measurements do try to give an impression of the time as seen by the database client, but from the PostgreSQL backend, it is not a close registration of the actual server-side execution phases.

bpftrace: `query-analyzer.bt`

What if you want to closely track what is going on from the viewpoint of the server-side execution phases? This is what query-analyzer.bt does: it tracks the fundamental phases of parse, rewrite, plan and execute and times these. Currently, it is intended to closely trace or debug one or more sessions. Available via: github.com/FritsHoogland/postgres-bpftrace...

Note: the postgres executables that are available on Docker and in the deb and rpm formats seem to have the probes in the PostgreSQL source that allow this functionality to be enabled, which is what `query-analyzer.bt` uses.

This is what that looks like:

root@d3afc158acbe:/# ./postgres-bpftrace/query-analyzer.bt
Attaching 19 probes...
PostgreSQL statement execution analyzer.
Time in microseconds (us).
pid   :Phase      :time to phase :time in phase : query
------|-----------|--------------|--------------|------
[13432]Query start:              :              : select '1';
[13432] parse     : (       667) :            66: select '1';
[13432] rewrite   : (        58) :            51: select '1';
[13432] plan      : (        43) :           175: select '1';
[13432] execute   : (       430) :          1230: select '1';
[13432]Query done : (      1791) :          4515: select '1';

simple query protocol execution

This is the query select '1'; executed from psql. The psql utility/client currently always uses the simple query protocol. The simple query protocol performs the query steps, but these are sent to the database backend with a single token 'Q'. The start of executing everything in a single server-side step is shown with Query start, and is done once Query done indicates the single server-side execution is done.

Once the query is started, it has to be syntactically parsed, which is what in the parse phase is performed. The time in time to phase indicates how much wall clock time is spent to get to the phase from the previous phase. For parse, this means the time between the PostgreSQL backend starting with the simple query execution until it reaches the parse phase.

The next step is the rewrite phase. The rewrite phase performs two tasks: a semantical parse (such as: does the table mentioned exist, and is the current user allowed to access it), and the actual rewrite step if that is needed (views require rewriting into the actual query as an example).

Both phases of parse and rewrite work on a parse tree and that parse tree is used by the planner, which creates an execution tree for the last step, the executor. The task of the planner is to calculate the most efficient way of execution given the options that the database objects involved allow (such as using an index or performing a sequential scan).

This all leads up to the execution phase, which takes the execution tree and performs the actual work of producing a result given the execution tree that was produced from the provided statement.

extended query protocol execution

If the same statement is taken and performed using the extended query protocol it looks like this:

[13444] parse     : (         0) :           173: select '1'
[13444] rewrite   : (        77) :           171: select '1'
[13444] plan      : (       943) :           198: select '1'
[13444] execute   : (       103) :            36: select '1'

These are the exact same steps, but we lose the total query time. This has a very good reason: using the extended query protocol, the client requests individual steps to be executed independently instead of using a single step that includes all these steps which we saw above with the simple query protocol.

The extended query protocol allows submitting multiple phases in a single request sent to the database backend, or single phases. The most common extended query protocol calls are:

Parse: which contains both the server-side parse and rewrite phases.
Bind: which contains the server-side plan phase.
Execute: which contains the server-side execute phase.

The probes that query-analyzer.bt uses are inside the individual parse, rewrite, plan and execute phases, which is why the analyzer can show these phases for both the simple query protocol as well as the extended query protocol, because, despite the different protocols, these are performing the same basic processing phases.

not-so-strict phases

By using the query-analyzer.bt you can see what is going on for query processing. This also allows validation of advice that I performed in the past of using prepared statements, and whether PostgreSQL must always perform these phases, no matter what.

Let's look at a very simple example:

[13432]Query start:              :              : set log_min_duration_statement=-1;
[13432] parse     : (       273) :           366: set log_min_duration_statement=-1;
[13432] rewrite   : (       117) :            89: set log_min_duration_statement=-1;
[13432] execute   : (       698) :           246: set log_min_duration_statement=-1;
[13432]Query done : (       181) :          1973: set log_min_duration_statement=-1;

Do you spot the oddity? The plan phase is gone! There is a very good reason for this: the backend is performing work that is not SQL, and thus it doesn't need planning; it can perform the requested task of setting a (session) variable directly; the execute therefore is not executing using an execution tree, but performing the change of the variable directly.

Let's look at another example:

[13432]Query start:              :              : prepare t as select now();
[13432] parse     : (       770) :           787: prepare t as select now();
[13432] rewrite   : (       396) :           112: prepare t as select now();
[13432] rewrite   : (      1194) :          1897: prepare t as select now();
[13432] execute   : (       851) :          2405: prepare t as select now();
[13432]Query done : (       117) :          5441: prepare t as select now();

Here there are two rewrites, quite probably because of the actual execution first, and the second one for the preparation of the statement that is set to become a prepared statement. Also, the plan phase is missing here.

As you might suspect, the execution of a prepared statement in psql will result in a non-standard sequence too:

[13501]Query start:              :              : execute t;
[13501] parse     : (        47) :            27: execute t;
[13501] rewrite   : (        44) :             9: execute t;
[13501] plan      : (       173) :            42: execute t;
[13501] execute   : (         0) :            50: execute t;
[13501] execute   : (       124) :           444: execute t;
[13501]Query done : (        86) :           783: execute t;

The query processing goes through the expected phases first and then needs an additional execute for performing the execute of t.

However, I assume using a prepared statement in psql in most situations will be an exception. This is how prepared statement execution looks like when done via the extended query protocol:

[13523] parse     : (         0) :           146: select $1
[13523] rewrite   : (        46) :           103: select $1
[13523] plan      : (       957) :           122: select $1
[13523] execute   : (       102) :            41: select $1
[13523] plan      : (       470) :            12: select $1
[13523] execute   : (        82) :             4: select $1

This is completely (as I) expected: the statement execution goes through the parse, rewrite, plan and execute phases first when the prepared statement is introduced, and a second execution that explicitly uses the prepared statement can skip the parse and rewrite phases. The prepared statement stores the result of the parse and rewrite steps, which makes it obvious these do not have to be performed again.

When I tried to come up with an as simple as possible example, I performed the above without a variable, and the used phases looked like this:

[13522] parse     : (         0) :           290: select '1'
[13522] rewrite   : (       144) :           260: select '1'
[13522] plan      : (      1173) :           364: select '1'
[13522] execute   : (       161) :           209: select '1'
[13522] execute   : (       831) :             4: select '1'

If a statement has no filters, there are no variables that can alter the plan, and thus the execution of a prepared statement can skip the plan phase and only perform the execute phase. This requires that logic to be present in the client too, which ultimately is the one requesting the steps.

Another construction that performs something that breaks with the common pattern: insert into select:

[13501]Query start:              :              : insert into t select id, repeat('x',10) from generate_series(1,
[13501] parse     : (        72) :            57: insert into t select id, repeat('x',10) from generate_series(1,
[13501] rewrite   : (        44) :           693: insert into t select id, repeat('x',10) from generate_series(1,
[13501] plan      : (       418) :           165: insert into t select id, repeat('x',10) from generate_series(1,
[13501] execute   : (         0) :           814: insert into t select id, repeat('x',10) from generate_series(1,
[13501] execute   : (        62) :           892: insert into t select id, repeat('x',10) from generate_series(1,
[13501]Query done : (      1908) :          4314: insert into t select id, repeat('x',10) from generate_series(1,

I am pretty sure this works with multiple portals in the execution phase, where the first portal is the insert, and it finds that it needs to execute the select in the second portal to get the results that are then used to perform the insert

Finally, something that quite drastically can alter the sequence of phases is execution pipelining:

[13530] parse     : (         0) :           207: select 'one'
[13530] rewrite   : (        81) :           161: select 'one'
[13530] parse     : (         0) :            15: select 'two'
[13530] rewrite   : (        11) :             4: select 'two'
[13530] parse     : (         0) :             3: select 'three'
[13530] rewrite   : (        63) :             5: select 'three'
[13530] plan      : (       856) :           214: select 'one'
[13530] execute   : (       215) :            55: select 'one'
[13530] plan      : (       149) :             9: select 'two'
[13530] execute   : (        30) :             4: select 'two'
[13530] plan      : (        36) :             4: select 'three'
[13530] execute   : (        19) :             3: select 'three'

Three statements are each parsed, and then executed.

Query analyzer: github.com/FritsHoogland/postgres-bpftrace...

[1] These phases must be executed: I do hope the article makes it clear that in principle this is true, but that in certain specific cases, steps can be skipped.