Test drive PostgreSQL with wait event probes on Docker

This is a short blog introducing a docker build file that I created to quickly build a container for test driving PostgreSQL with wait event probes; see databaseperformance.hashnode.dev/timed-wait.. about that.

To create a container, you should have docker setup, which can be validated using the docker info command:

➜ docker info
Client: Docker Engine - Community
 Version:    24.0.6
 Context:    colima
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /Users/fritshoogland/.docker/cli-plugins/docker-buildx

Server:
...

If docker is not setup correctly you see:

➜ docker info
Client: Docker Engine - Community
 Version:    24.0.6
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /Users/fritshoogland/.docker/cli-plugins/docker-buildx

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info

If docker is setup, git clone the postges-bpftrace repository:

git clone https://github.com/FritsHoogland/postgres-bpftrace.git

And cd to postgres-bpftrace/docker. Now run the script build, which is a simple wrapper script for the docker build command:

➜ ./build

This will create the container for you.

All that is needed after the build is to run the container:

➜ ./run
postgres@10fc6773997c:~$

Now start the database:

postgres@10fc6773997c:~$ ./start_database
waiting for server to start.... done
server started
postgres@10fc6773997c:~$

And run psql:

postgres@10fc6773997c:~$ psql
psql (17devel)
Type "help" for help.

postgres=#

To use bpftrace, use another shell and execute into the container using the exec script, which also can be found in the postgres-pbftrace/docker directory:

➜ ./exec
postgres@10fc6773997c:~$

Now you can get histograms from wait events per pid for example:

postgres@10fc6773997c:~$ sudo ./postgres-bpftrace/wait-histograms-pid.bt
Attaching 183 probes...
Wait event times are in microseconds(us).

Perform an action in the psql session, such as:

postgres=# select count(*) from generate_series(1,1000000);
  count
---------
 1000000
(1 row)

postgres=#

And interrupt the bpftrace session:

^C



@wait_event_hist[3745, io, DATA_FILE_READ]:
[4, 8)                 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
[8, 16)               12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)               2 |@@@@@@@@                                            |

@wait_event_hist[3745, io, BUFFILE_TRUNCATE]:
[4, 8)              1058 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16)              647 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[16, 32)               3 |                                                    |
[32, 64)               1 |                                                    |

@wait_event_hist[3745, io, BUFFILE_READ]:
[2, 4)              1679 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)                29 |                                                    |
[8, 16)                2 |                                                    |




postgres@10fc6773997c:~$

What you can see is there are 3 distinct waits, all from the io class, and all from pid 3745. The times are in microseconds, and the latency times are incredibly low. That is because these calls all do perform a system call for IO, but the requests are fully serviced from the operating system file cache.

(word of warning: BUFFILE_TRUNCATE should be BUFFILE_WRITE, but these are switched from the file that lists the events (?))