New implementation of CREATE DATABASE
The CREATE DATABASE command was rewritten to WAL-log all the writes it does when it makes a new database as a copy of the template database.
It does much more WAL writing than the old version, but because it avoids the checkpoints at the start and end of the command, it is in most cases faster and has less impact on concurrent workloads.
This can be slower than the old version for a very large template database (for example, in a multi-tenant cluster where the template has a lot of schemas, tables, and initial data), so the old behavior is still available and can be selected by specifying STRATEGY = FILE_COPY in the CREATE DATABASE command. The default, STRATEGY = WAL_LOG, is the better choice in most cases.
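A minimal example of falling back to the old strategy (the database and template names here are hypothetical):

CREATE DATABASE tenant_42 TEMPLATE tenant_template STRATEGY = FILE_COPY;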
Performance
There are more performance improvements than the ones covered here, but these are the most interesting new features.
Faster sorting
First, the handling of cases where the data being sorted does not fit in work_mem has been improved by switching to disk-based sorting with more sort streams.
More cases where sorting can be avoided
The second sorting improvement is the ability to use ordered scans of partitions to avoid sorting in more cases than before, so an explicit sort can be replaced by already-ordered index scans.
Previously, a partitioned table with a DEFAULT partition or a LIST partition containing multiple values could not be used for ordered partition scans. Now they can be used if such partitions are pruned during planning.
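A minimal sketch of the newly supported case (table, partition, and column names are hypothetical): because the WHERE clause prunes the DEFAULT partition during planning, the ORDER BY can be satisfied by ordered index scans instead of a sort.

CREATE TABLE t (region text, id int) PARTITION BY LIST (region);
CREATE TABLE t_eu PARTITION OF t FOR VALUES IN ('eu');
CREATE TABLE t_us PARTITION OF t FOR VALUES IN ('us');
CREATE TABLE t_other PARTITION OF t DEFAULT;
CREATE INDEX ON t (region, id);

-- t_other is pruned at plan time, so PostgreSQL 15 can return
-- rows in order directly from the per-partition index scans
SELECT * FROM t WHERE region IN ('eu', 'us') ORDER BY region, id;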
Smarter postgres_fdw
postgres_fdw is a “foreign data wrapper” that allows exposing tables from other PostgreSQL databases as local tables.
In PostgreSQL 15 there are a few new options:
First, the query optimizer can now send CASE expressions to be executed in the foreign database, lowering the need to fetch more data, or even more rows, for local processing.
There was already support for pushing down simpler filters and joins when the wrapper could prove they could be processed fully on the remote side. Together with the ability to have foreign tables as partitions of local partitioned tables, this opens up more ways to use PostgreSQL with distributed data.
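As an illustration, assuming a foreign table orders_remote (a hypothetical name), the CASE expression in a query like the following can now be evaluated on the remote server rather than locally:

-- the bucketing logic ships to the remote side instead of
-- pulling every row across the wire for local evaluation
SELECT CASE WHEN amount >= 1000 THEN 'large' ELSE 'small' END AS bucket,
       count(*)
FROM orders_remote
GROUP BY 1;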
Another new feature related to the above is the ability to commit on all foreign servers involved in a transaction in parallel. This is really helpful with large numbers of foreign tables, which can easily happen with partitioned tables that have foreign partitions. It is enabled with the CREATE SERVER option parallel_commit.
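For example, the option can also be added to an existing server definition (the server name here is hypothetical):

ALTER SERVER shard_a OPTIONS (ADD parallel_commit 'true');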
Yet another new option for foreign tables, this time not performance-related, is postgres_fdw.application_name, which allows setting the application_name used when establishing connections to foreign servers. This lets DBAs and users easily see which connections were opened by postgres_fdw. There are even escape sequences available for customizing the application_name. Previously, the remote session’s application_name could only be set on the remote server or via a postgres_fdw connection specification.
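A session-level example (%d and %u are escape sequences that expand to the remote database and user names; the prefix is made up):

SET postgres_fdw.application_name TO 'myapp_%d_%u';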
New options in logical replication
Native logical replication has been improved in multiple ways.
First, it now has support for row filtering and column lists.
While row filtering has a set of rules you have to follow depending on the replication strategy, at a high level the filter is specified the same way as a WHERE clause in a query:
CREATE PUBLICATION pub2 FOR TABLE table1 WHERE (name LIKE 'TX%');
Only rows whose name starts with TX will be replicated.
Column lists work in a similar way, allowing one to specify a subset of table columns that are replicated:
CREATE PUBLICATION pub1 FOR TABLE table1 (id, a, c);
Also new is the option FOR ALL TABLES IN SCHEMA, which publishes all current and future tables in a specified schema. Earlier, the ALL TABLES option was available only database-wide.
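For example (the publication and schema names are hypothetical):

CREATE PUBLICATION pub3 FOR ALL TABLES IN SCHEMA sales;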
And we now have support for proper two-phase commit. For this, the replication slot needs to be created with the TWO_PHASE option.
One sample user of this is pg_recvlogical, which has added a --two-phase option to be used during slot creation.
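The same can be done from SQL; the fourth argument of pg_create_logical_replication_slot() enables two-phase decoding (the slot name here is made up):

-- arguments: slot name, output plugin, temporary, two-phase
SELECT pg_create_logical_replication_slot('sub1_slot', 'pgoutput', false, true);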
Logical replication also no longer sends empty transactions. When it finds that there are no DML statements in a decoded transaction for a certain slot, it sends nothing and moves directly on to the next transaction.
It also now detects the case where a partially streamed transaction crashed on the source, and sends information about this to the subscriber. Previously, this caused the subscriber to keep such transactions open until it was restarted.
There are now functions to list the contents of the directories used by logical replication:
pg_ls_logicalsnapdir(), pg_ls_logicalmapdir(), and pg_ls_replslotdir().
They can be run by members of the predefined pg_monitor role.
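For example, to inspect the files belonging to a particular replication slot (the slot name is hypothetical):

SELECT * FROM pg_ls_replslotdir('sub1_slot');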
And although partitioned tables can have foreign tables as partitions, replicating into such a partition isn’t currently supported. The logical replication worker used to crash if it was attempted. Now, an error is thrown.
Comparison with the pglogical extension
While there have been lots of improvements, there are still cases where the pglogical extension is needed.
Native replication has no support for filtering by replication origin, which means that setting up bi-directional replication will fail. An UPDATE results in an infinite loop; an INSERT into a table with a primary key stops replication with a key violation when the same insert is replicated back; and with an insert-only publication, replication keeps inserting the new row over and over again, resulting in unlimited table growth.
(See “Setting bi-directional replication for Cloud SQL for PostgreSQL” on the Google Cloud Blog for how you can do this in Cloud SQL with pglogical.)
You can’t define the primary to have multiple IP addresses (pglogical has the concept of “interfaces” for this).
And of course, if you need some of the new options when replicating *into* PostgreSQL 15 from an older version, you also still need pglogical, as PostgreSQL core only gets new features in the latest major version. This is different from extensions, where you can often use the latest extension version on many PostgreSQL versions.
Bi-directional replication support is the most useful of these three. The others are for really rare use cases, but perhaps worth mentioning in case you happen to have one of them.
Tooling
Improvements to pgbench
The bundled performance testing tool pgbench can now retry transactions that fail with serialization errors, including deadlocks. This is good news if you want to test workloads that occasionally deadlock or hit other serialization violations that can be fixed by re-running the transaction.
For example, the standard TPC-C benchmark specifies that about 1% of its New-Order transactions are deliberately rolled back.
This should now be testable using pgbench with custom scripts.
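A sketch of such a run (the script and database names are made up): the new --max-tries option tells pgbench how many times to retry a transaction after a serialization failure or deadlock before counting it as failed.

pgbench -c 8 -T 60 --max-tries=10 -f my_script.sql mydb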
Improved psql experience
While psql is already quite amazing, PostgreSQL 15 managed to add even more features for advanced users.
Multi-statement commands
Now psql will return results for all statements in a multi-statement query string.
In normal interactive use psql parses the input it is given and sends each statement separately, but a multi-statement string can still reach the server in one piece, for example via psql -c. Pre-15 versions returned only the result of the last statement in that case. Now results for each individual statement are returned. To get the old behavior, set the SHOW_ALL_RESULTS psql variable to off.
(The only way to ask psql to send “select 1; select 2; select 3;” as a single string is to escape the semicolons, so “select 1\; select 2\; select 3;” will be sent as a single string.)
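For example, in PostgreSQL 15 a single string now produces both results (prompt and output abridged):

postgres=# SELECT 1\; SELECT 2;
 ?column?
----------
        1
(1 row)

 ?column?
----------
        2
(1 row)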
Faster copy
The \copy command in psql now sends data to the server in larger chunks, improving the speed of the copy.
Easier way to show a set of server variables
A new command, \dconfig, has been added to show server variables.
It can also handle wildcards, so \dconfig *log* shows all variables with ‘log’ in their names.
Earlier you had to manually run
SELECT name, setting, unit FROM pg_settings WHERE name LIKE '%log%';
to get this.
Observability
New statistics collection subsystem
The Cumulative Statistics System was rewritten to use shared memory.
In earlier versions, a special statistics collector process received the stats from individual backends via UDP packets, and the collected stats only became available to backends after being transferred via the file system.
The new system should:
be faster
need less configuration
not randomly lose some collected statistics under high workloads (UDP is lossy by design), so counts in the pg_stat_* views should be more trustworthy
Monitoring and new monitoring roles
A new statistics view pg_stat_subscription_stats is added for monitoring subscriptions.
There is also a new view, pg_stat_recovery_prefetch, which tracks prefetching during WAL recovery.
pg_stat_statements now has new columns for temporary file I/O and JIT counters.
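For example, a quick check for subscriptions that have hit errors, using columns from the new pg_stat_subscription_stats view:

-- errors while applying changes and during initial table sync
SELECT subname, apply_error_count, sync_error_count
FROM pg_stat_subscription_stats;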
And lastly there are two new server variables:
shared_memory_size to check the size of allocated shared memory
shared_memory_size_in_huge_pages for the number of huge memory pages required
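Both are read-only and can be checked with SHOW, which is handy when sizing huge pages (the values depend on your configuration):

SHOW shared_memory_size;
SHOW shared_memory_size_in_huge_pages;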
Preparing for larger data volumes
As an interesting feature, the functions pg_size_pretty() and pg_size_bytes() were updated to be able to convert to petabytes. Before version 15, the largest unit they knew about was terabytes.
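A quick sanity check (expected results shown in comments):

SELECT pg_size_pretty(10 * 1024::numeric ^ 5);  -- 10 PB
SELECT pg_size_bytes('10 PB');                  -- 11258999068426240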