Cloud

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. Today, we are announcing the general availability of real time document translation feature that allows customers to translate HTML and Text documents in real time. Until today, to translate documents in real time, customers had to extract the text, translate the text into the target language, and post-process the translated text to convert it into the original file format. With the new real time document translation feature there is no need for any pre-processing or post-processing steps. Customers can use APIs or the AWS console to submit a translation request and receive documents back with source formatting intact.

Read More for the details.

2023 05 24

AWS – Amazon EC2 M1 Mac instances now support beta macOS versions

Starting today, customers can run beta macOS versions on their EC2 M1 Mac instances by registering their instances with the Apple Developer Program, and upgrading to their desired preview, seed, or release candidate macOS versions from within their guest environments. With this capability, Apple developers can now integrate latest macOS features into their applications and test existing applications for compatibility before public macOS releases.

Read More for the details.

2023 05 24

Azure – Public preview: Soft delete of Recovery Points for Azure Backup

Azure Backup’s soft delete of recovery points allows you to recover any recovery points that may have been accidentally or maliciously deleted owing to operations that could lead to deletion of one or more recovery points.

Read More for the details.

2023 05 24

Azure – Virtual Network Integration Support (Public Preview) for Azure Stream Analytics

ASA is thrilled to announce public preview of Virtual Network Integration support for Azure Stream Analytics!

Read More for the details.

2023 05 24

Azure – Azure Stream Analytics is Launching a New Competitive Pricing Model!

Up to 80% price reduction in Azure Stream Analytics

Read More for the details.

2023 05 24

Azure – Announcing Azure AI Content Safety

Azure AI Content Safety is a new Azure AI service that identifies harmful online content, categorizes it, and assigns severity scores, supporting human content moderator workloads.

Read More for the details.

2023 05 24

Azure – Private Preview: Kafka Input and Output with Azure Stream Analytics

You can connect directly to Kafka to ingest from Kafka clusters or output data to Kafka clusters in your Azure Stream Analytics (ASA) job.

Read More for the details.

2023 05 24

Azure – Generally Available: Exactly Once Delivery to ADLS Gen2 Output

Azure Stream Analytics supports for end-to-end exactly once semantics when writing to Azure Data Lake Storage Gen2.

Read More for the details.

2023 05 24

AWS – Amazon MWAA is now SOC Compliant

You can now use Amazon Managed Workflows for Apache Airflow (MWAA) for use cases that are subject to Service Organization Control (SOC) requirements. MWAA is now SOC 1, 2 and 3 compliant, allowing you to get deep insight into the security processes and controls that protect customer data. AWS maintains SOC compliance through extensive third-party audits of AWS controls. These audits ensure that the appropriate safeguards and procedures are in place to protect against security risks that may affect the confidentiality, integrity, and availability of customer and company data. AWS SOC reports are independent third-party examination reports that you can download in AWS Artifact.

Read More for the details.

2023 05 24

AWS – AWS Elemental MediaPackage now supports Low-Latency HLS

Starting today you can package your media streams in Low-Latency HTTP Live Streaming (LL-HLS) format with AWS Elemental MediaPackage. LL-HLS functionality can be used with AWS Elemental MediaLive or AWS Elemental Live v2.25.0, in combination with SCTE-35 for ad insertion workflows, and with SPEKE v2 for multi-key digital rights management (DRM), including key rotation.

Read More for the details.

2023 05 24

AWS – AWS announces new AWS Direct Connect location in Toronto

Today, AWS announced the opening of a new AWS Direct Connect location within the Equinix TR2 data center in Toronto, Canada. By connecting your network to AWS at the new location, you gain private, direct access to all public AWS Regions (except those in China), AWS GovCloud Regions, and AWS Local Zones.

Read More for the details.

2023 05 24

GCP – Benchmarking for AlloyDB and how to do it

AlloyDB Transactional (OLTP) Benchmarking Guide

From time to time, I read and participate in conversations about performance tests comparing AlloyDB with different Postgres deployments on premises and in other cloud environments. In some cases, it sounds similar to the “Sensible Fiesta Test” by Jeremy Clarson in the 12th season of Top Gear. When asked the question “Can I afford it?” He stated that if you had £11,000, you certainly could, but if you had only 40p, then probably not. All other tests and results in that challenge (maybe all of them) were equally useless.

Why did that episode pop up in my mind?

Every test should have some goals and metrics to quantify the result and lead to conclusions and decisions. Let’s imagine a farmer testing the usability of a car by checking if it can carry a sack of potatoes for 40 miles. A small car like Fiat 500 will do the same job as a Ford F150 track and be much cheaper. Does that mean we should ditch all trucks and replace them with small city cars? Of course not.

But why are we testing the performance of a beefy 4 CPU 32GB Postgres using pgbench with a default scale 1 dataset, which is about 24 MB in size? Then we do the same test on a comparable AlloyDB and make a conclusion – both can handle a 24 MB dataset without any noticeable difference in performance. Is it the right conclusion? Absolutely. Is it helpful and valuable? Probably not too much, except of course in the case when you want to figure out if you can use a 4 CPU 32 GB server for a 24 MB dataset.

Is there a right approach for benchmark testing?

So what is the right testing approach? Great question and the answer, as you can guess, of course – “It depends …”. The best possible performance test is when you use real data and real workload. But it is not always achievable and it is possible on the initial stage you want to compare the performance of one database deployment to another. There is no universal solution but some approaches make sense. Let’s take one of them. Here are the high level stages:

Define the goals. Do you want to know if the target environment is able to sustain the same or bigger max load than the current implementation? Or your goal is to achieve the certain workload parameters on a system with minimum CPU and memory?

Prepare requirements for the benchmark tests based on your goal and workload on your current system. The workload can be OLTP, Analytical or both. It will help you with future evaluations.

Identify key metrics to measure output and assess test success. For example it can include transactions per second and conditions like max latency.

Choose the right tool for benchmarks. It might not be the ideal but it has to serve the purpose and provide objective results. Even standard pgbench might do the job if used right. But it depends on the goals and what you want to measure.

Choose the baseline system parameters such as CPU, memory and storage. It can be your database server on premises or a managed service platform. Let’s call it a system to not limit it to a single server installation.

Make sure the client machine(s) with the benchmark tool has the same configuration and capacity on the baseline and the target environments and the network topology is similar or it will be hard to compare the results since the network overhead can be significant for some types of tests.

Create the dataset with appropriate size depending on the system parameters.

Run the tests gradually increasing the load on the baseline system until you reach your max key metrics. For example max throughput or max throughput with latency no more than defined.

The duration of the test should be sufficient to get reliable and repeatable results. Having a test running for 5 min is not enough since some short term factors can skew the results.

Repeat the benchmark several times to get the reliable results.

Move to the system you want to compare and run the same sequence there.

The condition for the testing should be close to the original for the baseline and to the planned implementations for proper comparison. I will talk about conditions and impact later but I would emphasize again the importance of the right approach to the testing.

Inside a database instance

Before going forward let’s have a bird view to data processing in a database. Without going too deep in general the life of data in a page or block for a database starts and ends in the memory. It is a bit simplified approach but in general you need to copy the block to the memory to make any changes. And in relational databases with enabled transaction log all changes are written to the log before they are fixed in the data block. Then the block version with the new data eventually makes its way back to the disk where it replaces the old version of the block. But it doesn’t mean that all the block versions disappeared from the memory. The database instance has a cache where the recently used blocks are stored to reduce unnecessary IO. The size and configuration of the cache depends on the database engine but if we talk about standard postgres we have the buffer cache in the shared memory and filesystem cache behind it.

Database storage and cache.

The AlloyDB cache is a bit more complex due to different concepts behind the storage layer but if a block is in the shared buffer it is going to be accessible by all the processes.

So how is your client session working with the data when it updates a record? The server process for your session gets a copy of an actual version of the block in the buffer, logs the changes to the transaction log, makes necessary changes in the block and marks changes completed. And if we look outside to the work with the cache it is primarily CPU activity.

Life of a database block.

I am not going to sink to a deep dive about internals for AlloyDB database engine, it is not the scope of the blog.If you want to know more about it you can read an excellent blog written by Ravi Murthy about the storage layer for AlloyDB. This chapter aims to show how the settings of a benchmark test can influence the results and potentially make them not fully relevant to the goals of the benchmark test.

Factors and settings during the tests

Let’s start with the dataset size used for testing. The standard pgbench with scale factor 1 produces a dataset roughly 24 MB in size. If your buffer is GB and the filesystem cache is the same size then your data will be almost all the time in the cache. So, basically you test the CPU and kernel on one system versus the other. Will it be useful? Does it represent the planned dataset on your database? It is quite possible then if you have only a 24 MB database you might not need AlloyDB. I would recommend using a scale factor producing a dataset comparable with your current database or projected to the future and it should probably be bigger than effective cache on the instance. For example if you have 16GB memory on the system the scale factor 2000 (parameter “-s 2000”) will create a 34GB dataset which might be sufficient for comprehensive testing not only the CPU but also the storage subsystem.

Speaking about characteristics of your baseline system, you probably need to make it similar to the existing or planned production system in terms of CPU, memory and storage layer. Then you will be able to understand the max load it can sustain and benchmark AlloyDB against it. But I would not try to test minimal basic configuration with 2 vCPU. If we return to the car analogy we know that a heavy truck or semi consumes more fuel just to carry itself because they are more complex and have to have a big heavy frame for example to be able to handle tons of load. The same might be applied to our case. AlloyDB internal services such as multi-tiered cache service, observability agents, health checks and others use CPU cycles and memory to run. So, in a small 2 vCPU instance that overhead might be noticeable. Also if your workload fits to a 2 CPU 16Gb instance you might consider other services like Cloud SQL but keep in mind AlloyDB other features such as vacuum management and intelligent buffering. Network is important too. If your plans are to place the application in the same region connected using AlloyDB Auth Proxy then use the same configuration for the test. If your client VM with the benchmark tool is in another region the network overhead can have a significant impact on the test results. Of course such configuration makes total sense when your future architecture represents such a layout. For example, if you plan to use an application on-prem working with AlloyDB in the cloud.

Speaking about read pools, we might also run some tests for read only load using for example parameter -S for pgbench on the primary instance or on the replica(s) on standalone system and read pools on AlloyDB. It might give you a better picture for mixed workloads.Don’t forget to enable columnar engine on AlloyDB for analytical workload tests. Keep in mind that it is disabled by default. You can test it in automated mode or manually execute tasks to populate the columnar storage in the memory.

And you probably would like to use the index advisor, so it makes sense to enable it too. And a blog about the index advisor and how to work with it is coming, keep your eyes on the cloud blogs.

Summary

Let’s summarize here what I’ve noted and maybe make life easier to those who don’t have time to read through the entire article. To properly run a benchmark testing on the AlloyDB follow these steps.

Define the tests goals

Use right configuration for the baseline system, not minimal available

The client (benchmark) machine(s) should have similar configuration on baseline and target systems

Use sufficiently large dataset or scale factor

Maximize workload on the baseline system

Run each test sufficiently long to get reliable results

Repeat the tests at least 3 times

Use right topology in the cloud, network is important

Enable columnar engine for analytical or hybrid workloads

Enable index advisor if you plan to use it for optimization and use your real workloads for the tests

This is just one of the possible approaches to benchmarking AlloyDB, and depending on your requirements, it may only be the first step in performance evaluation. If you need more guidance on the approach or technical details of benchmark testing, I recommend reading two published guides produced by our AlloyDB team:

AlloyDB Analytical (OLAP) Benchmarking Guide

As I’ve mentioned earlier the best tests are those that use real workload and real data. Benchmarks cannot replace real testing, but they can give you some idea of what you can expect from the system in certain conditions. Happy testing, and don’t hesitate to reach your Google customer representative if you have any questions.

Read More for the details.

2023 05 24

GCP – Four steps to managing your Cloud Logging costs on a budget

As part of our ongoing series on cost management for observability data in Google Cloud, we’re going to share four steps for getting the most out of your logs while on a budget. While we’ll focus on optimizing your costs within Google Cloud, we’ve found that this works with customers with infrastructure and logs on prem and in other clouds as well.

Step 1: Analyze your current spending on logging tools

To get started, create an itemized list of what volume of data is going where and what it costs. We’ll start with the billing report and the obvious line items including those under Operations Tools/Cloud Logging:

Log Volume – the cost to write log data to disk once (see our previous blog post for an explanation)

Log Storage Volume – the cost to retain logs for more than 30 days

If you’re using tools outside Cloud Logging, you’ll also need to include any costs related to these solutions. Here’s a list to get you started:

Log vendor and hardware costs — what are you paying to observability vendors? If you’re running your own logging solution, you’ll want to include the cost of compute and disk.

If you export logs within Google Cloud, include Cloud Storage and BigQuery costs

Processing costs — consider the costs for Kafka, Pub/Sub or Dataflow to process logs. Network egress charges may apply if you’re moving logs outside Google Cloud.

Engineering resources dedicated to managing your logging tools across your enterprise often are significant too!

Step 2: Eliminate waste — don’t pay for logs you don’t need

While not all costs scale directly with volume, optimizing your log volume is often the best way to reduce spend. Even if you are using a vendor with a contract that locks you into a fixed price for a period of time, you may still have costs in your pipeline that can be reduced by avoiding wasteful logs such as Kafka, Pub/Sub or Dataflow costs.

Finding chatty logs in Google Cloud

The easiest way to understand which sources are generating the highest volume of logs within Google Cloud is to start with our pre-built dashboards in Cloud Monitoring. To access the available dashboards:

Go to Monitoring -> Dashboards

Select “Sample Library” -> “Logging”

This blog post has some specific recommendations for optimizing logs for GKE and GCE using prebuilt dashboards.

As a second option, you can use Metrics Explorer and system metrics to analyze the volume of logs. For example, type “log bytes ingested” into the filter. This specific metric corresponds to the Cloud Logging “Log Volume” charge. There are many ways to filter this data. To get a big picture, we often start with grouping by both “resource_type” and “project_id”.

To narrow down the resource type in a particular project, add a “project_id” filter. Select “sum” under the Advanced Options -> Click on Aligner and select “sum”. Sort by volume to see the resources with the highest log volume.

While these rich metrics are great for understanding volumes, you’ll probably want to eventually look at the logs to see whether they’re critical to your observability strategy. In Logs Explorer, the log fields on the left side help you understand volumes and filter logs from a resource type.

Reducing log volume with the Logs Router

Now that we understand what types of logs are expensive, we can use the Log Router and our sink definitions to reduce these volumes. Your strategy will depend on your observability goals, but here are some general tools we’ve found to work well.

The most obvious way to reduce your log volume is not to send the same logs to multiple storage destinations. One common example of this is when a central security team uses an aggregated log sink to centralize their audit logs but individual projects still ingest these logs. Instead, use exclusion filters on the _Default log sink and any other log sinks in each project to avoid these logs. Exclusion filters also work on log sinks to BigQuery, Pub/Sub, or Cloud Storage.

Similarly, if you’re paying to store logs in an external log management tool, you don’t have to save these same logs to Cloud Logging. We recommend keeping a small set of system logs from GCP services such as GKE in Cloud Logging in case you need assistance from GCP support but what you store is up to you, and you can still export them to the destination of your choice!

Another powerful tool to reduce log volume is to sample a percentage of chatty logs. This can be particularly useful with 2XX log balancer logs, for example. This can be a powerful tool, but we recommend you design a sampling strategy based on your usage, security and compliance requirements and document it clearly.

Step 3: Optimize costs over the lifecycle of your logs

Another option to reduce costs is to avoid storing logs for more time than you need them. Cloud Logging charges based on the monthly log volume retained per month. There’s no need to switch between hot and cold storage in Cloud Logging; doubling the default amount of retention only increases the cost by 2%. You can change your custom log retention at any time.

If you are storing your logs outside of Cloud Logging, it is a good idea to compare the cost to retain logs and make a decision.

Step 4: Setup alerts to avoid surprise bills

Once you are confident that the volume of logs being routed through log sinks fit in your budget, set up alerts so that you can detect any spikes before you get a large bill. To alert based on the volume of logs ingested into Cloud Logging:

Go to the Logs-based metrics page. Scroll down to the bottom of the page and click the three dots on “billing/bytes_ingested” under System-defined metrics.

Click “ Create alert from metric”

Add filters (For example: use resource_id or project_id. This is optional).

Select the logs based metric for the alert policy.

You can also set up similar alerts on the volume for log sinks to Pub/Sub, BigQuery or Cloud Storage.

Conclusion

One final way to stretch your observability budget is to use more Cloud Operations. We’re always working to bring our customers the most value possible for their budget such as our latest feature, Log Analytics, which adds querying capabilities but also makes the same data available for analytics, reducing the need for data silos. Many small customers can operate entirely on our free tier. Larger customers have expressed their appreciation for the scalable Log Router functionality available at no extra charge that would otherwise require an expensive event store to process data. So it’s no surprise that a 2022 IDC report showed that more than half of respondents surveyed stated that managing and monitoring tools from public cloud platforms provide more value compared to third-party tools. Get started with Cloud Logging and Monitoring today.

Read More for the details.

2023 05 24

GCP – What’s new in PostgreSQL 15

PostgreSQL 15.2 release notes

Cloud SQL is excited to announce support for PostgreSQL 15. PostgreSQL 15 has many new and valuable enhancements and we thought it would be great to deep dive into these for our users. One can broadly categorize the improvements into five areas:

Security

Developer experience

Performance

Tooling

Observability

Let’s deep dive.

Security

Schema `public` is now private by default

While historically PostgreSQL has had good security defaults when it comes to connecting to the database, the default for new databases was to have the schema ‘public’ open to everybody. Also the PostgreSQL superuser who created the cluster was the owner of the schema ‘public’, so if the database owner was not a superuser they were unable to change the schema access grants or drop the schema.

Starting PostgreSQL 15 this default access is set to “database owner only” and also the schema is owned by the database owner. This is done using the role pg_database_owner which always resolves to the owner of the current database and not through changing the owner of the schema “public” during database creation.

If the database is upgraded from the older version via pg_upgrade or loaded from a pg_dump archive, the access rights stay as they were in the previous version of PostgreSQL.

Also, nothing has changed for the pseudo-role ‘public’ which still means “all users”.

SECURITY INVOKER Views

Another security-related improvement is a new option for a view to *not* act as a security provider where any user who has SELECT rights on the view will automatically have the rights of the owner of the view for any data accessed through this view.

This option is activated using option ‘security_invoker’ when creating the view.

CREATE VIEW … WITH ( security_invoker ) AS <query>;

Conceptually this is similar to SECURITY INVOKER functions, except that the default for functions in PostgreSQL has always been SECURITY INVOKER and SECURITY DEFINER had to be explicitly specified. For views the default is reversed.

Another difference from functions is that in the function call of a SECURITY DEFINER function, the system fully switches to the security context of the definer / owner of the function and anything accessed inside the function will have privileges of the definer.

For the views the behavior is different — even if the main view is not defined as ‘security_invoker’ any sub-views used by that view can still be and so any object access in such views will be checked against callers privileges.

More flexible security in logical replication

Logical replication can now also run as the subscription owner, in which case some extra checks are performed against this user’s privileges. When running this way the replication user must be either a superuser, table owner, or have bypassrls set to be able to replicate into a table with row-level security activated. The user also must have the SELECT privilege for a table in which it tries to replicate UPDATE or DELETE events.

New role to run CHECKPOINT manually

Before PostgreSQL 15, only superusers could run the CHECKPOINT; command. Now any user granted the pre-defined role pg_checkpoint can too.

This is part of an on-going push for more fine-grained control of who can do what.

Other similar predefined roles added over last few versions granting previously superuser-only abilities are pg_read_all_settings, pg_read_all_stats, pg_stat_scan_tables and pg_signal_backend.

Developer experience

MERGE

PostgreSQL now provides the SQL standard command MERGE for delegating to the database the decision of whether to INSERT a new or UPDATE an existing row.

As an example let’s do the famous FizzBuzz exercise using MERGE.

1. Create a Table with all numbers from 0 to 15 which are multiples of 3 with line number in field “i” and string ‘fizz’ as value of field “say”

code_block[StructValue([(u’code’, u”CREATE TABLE fizzbuzz ASrnSELECT i, (CASE WHEN i % 3 = 0 THEN ‘fizz’ ELSE i::text END) as say rn FROM generate_series(0,15,3) f(i);rnrnrnSELECT * FROM fizzbuzz ORDER BY 1;rni | say rn—-+——rn 0 | fizzrn 3 | fizzrn 6 | fizzrn 9 | fizzrn12 | fizzrn15 | fizzrn(6 rows)”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e70d0baedd0>)])]

2. Next let’s MERGE in new lines with line number and ‘buzz’ in the “say” field

And let’s remove any lines where i <= 0

code_block[StructValue([(u’code’, u”WITH buzzes AS (rn SELECT i,rn (CASE WHEN i % 5 = 0 THEN ‘buzz’ ELSE i::text END) as sayrn FROM generate_series(0,15) f(i)rn )rn MERGE INTO fizzbuzz frn USING buzzes brn ON f.i = b.irn WHEN NOT MATCHED THENrn INSERT (i, say)rn VALUES(i, say)rn WHEN MATCHED AND f.i <= 0 THENrn DELETErn WHEN MATCHED AND b.say = ‘buzz’ THENrn UPDATE SET say = f.say || b.sayrn ;rnrnrnSELECT * FROM fizzbuzz ORDER BY 1;rn i | say rn—+———-rn 1 | 1rn 2 | 2rn 3 | fizzrn 4 | 4rn 5 | buzzrn 6 | fizzrn 7 | 7rn 8 | 8rn 9 | fizzrn10 | buzzrn11 | 11rn12 | fizzrn13 | 13rn14 | 14rn15 | fizzbuzzrn(15 rows)”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e70d0bae1d0>)])]

PostgreSQL already has a way to do some of this in the form of INSERT … ON CONFLICT DO INSTEAD … but this is a non-standard PostgreSQL-specific extension and it also has some “interesting” handling of transaction isolation levels where the behavior there does not follow exactly any of the SQL-standard isolation levels.

Also it depends on Unique Key violations, and it can not do multiple conditional actions.

So even after adding the PRIMARY KEY to the table, we still can’t remove the row 0 :

code_block[StructValue([(u’code’, u”CREATE TABLE oldfizzbuzz ASrnSELECT i, (CASE WHEN i % 3 = 0 THEN ‘fizz’ ELSE i::text END) as say FROM generate_series(0,15,3) f(i);rnrnrnALTER TABLE oldfizzbuzz ADD PRIMARY KEY (i);rnrnrnWITH buzzes AS (rn SELECT i,rn (CASE WHEN i % 5 = 0 THEN ‘buzz’ ELSE i::text END) as sayrn FROM generate_series(0,15) f(i)rn)rnINSERT INTO oldfizzbuzzrnSELECT * FROM buzzesrnON CONFLICT (i) DOrnUPDATE SET say = ‘fizzbuzz’rnWHERE EXCLUDED.say = ‘buzz’;rnSELECT * FROM oldfizzbuzz ORDER BY i;rn i | say rn—-+———-rn 0 | fizzbuzzrn 1 | 1rn 2 | 2rn 3 | fizzrn 4 | 4rn 5 | buzzrn 6 | fizzrn 7 | 7rn 8 | 8rn 9 | fizzrn 10 | buzzrn 11 | 11rn 12 | fizzrn 13 | 13rn 14 | 14rn 15 | fizzbuzzrn(16 rows)”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e70d0baee50>)])]

Conclusion: MERGE is much more versatile and powerful than the old way!

Multirange improvements

One of the big improvements in PostgreSQL 14 was support for multirange types. Multirange is a set of non-overlapping ranges and they were needed to make support for range types complete.

For example before multiranges you could not add range(1,3) + range(4,7) as the result was not a single range. With multirange you can. And you can then add range(2,5) to the previous range to get back to a single range(1,7).

When multiranges were added to PostgreSQL 14, somehow the support for the range union aggregator function rang_agg() was left out.

This was fixed in PostgreSQL 15, so now you can:

code_block[StructValue([(u’code’, u”WITH data(name, rval) AS rn(VALUES (‘Bob’,'[1,3)’::int4range)rn , (‘Bob’,'[4,7)’)rn , (‘Jim’,'[4,7)’)rn , (‘Jim’,'[1,6)’)rn , (‘Tom’,'[1,3)’)rn , (‘Tom’,'[3,5)’)rn , (‘Tom’,'[5,7)’)rn)rnSELECT name, range_agg(rval)rn FROM datarn GROUP BY 1;rnrnrnname | range_agg rn——+—————rnTom | {[1,7)}rnBob | {[1,3),[4,7)}rnJim | {[1,7)}rn(3 rows)”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e70d0baecd0>)])]

ICU collations can be set as the default for clusters and databases

Previously, only libc-based collations could be selected at the cluster and database levels. ICU collations could only be used via explicit COLLATE clauses.

Now you can do the following to have new database use a specified ICU locale:

code_block[StructValue([(u’code’, u”CREATE DATABASE test_icu_collationrn LOCALE_PROVIDER ‘icu’rn ICU_LOCALE ‘fr-LU-x-icu’rn TEMPLATE template0rn;rnrnrn\l test_icu_collationrnrnrnList of databasesrn-[ RECORD 1 ]—–+——————-rnName | test_icu_collationrnOwner | postgresrnEncoding | UTF8rnCollate | en_US.UTF8rnCtype | en_US.UTF8rnICU Locale | fr-LU-x-icurnLocale Provider | icurnAccess privileges |”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e70c70f1950>)])]

New implementation of CREATE DATABASE

The CREATE DATABASE command was rewritten to WAL-log all the writes it does when it makes a new database as a copy of the template database.

It does much more WAL writing than the old version but as it avoids the CHECKPOINT at the start and end of the command, it is in most cases faster and has less impact on concurrent workloads.

This can be slower than the old version in case of a very large template database — for example in a multi-tenant cluster where the template has lot of schemas, tables and initial data — so the old way of doing it is still available and can be selected by specifying STRATEGY = FILE_COPY in the CREATE DATABASE command. The default of STRATEGY = WAL_LOG is the better one to use in most cases.

Performance

There are more but these are the most interesting new features.

Faster sorting

First, the handling of cases where the sorted data did not fit in work_mem is improved by switching to disk-based sorting with more sort streams.

More cases where sorting can be avoided

Second, improvement for sorting is the ability to allow ordered scans of partitions to avoid sorting in more cases than before so sorting can be replaced by already pre-ordered index scans.

Previously, a partitioned table with a DEFAULT partition or a LIST partition containing multiple values could not be used for ordered partition scans. Now they can be used if such partitions are pruned during planning.

Smarter postgres_fdw

Postgres_fdw is a “foreign data wrapper” which allows exposing tables from other PostgreSQL databases as local tables.

In PostgreSQL 15 there are a few new options:

First, now the query optimizer can send CASE expressions to be executed in the foreign database, lowering the need to fetch more data or even more rows for local processing.

There already was support for pushdown of simpler filters and joins when the wrapper could prove that it was possible to process them fully on the remote side. This, together with the ability to have foreign tables as partitions of local partitioned tables, opens up more ways to use PostgreSQL with distributed data.

Another new feature related to above is the ability to do commits in all foreign servers involved in a transaction in parallel. This will be really helpful in cases of large numbers of foreign tables, which can easily happen in the case of partitioned tables with foreign partitions. This is enabled with the CREATE SERVER option parallel_commit.

Yet another new option, this time not performance related, for foreign tables is postgres_fdw.application_name, which allows setting the application_name used when establishing connections to foreign servers. This lets DBAs and users easily see which connections are opened by postgres_fdw. There are even escape sequences available for customization of the application_name used. Previously the remote session’s application_name could only be set on the remote server or via a postgres_fdw connection specification.

New options in logical replication

Native logical replication has been improved in multiple ways.

First, it now has support for row filtering and column lists.

While row filtering has a set of rules you have to follow for different replication strategies, at a high level, it is specified the same way as one would do for a query:

CREATE PUBLICATION pub2 FOR TABLE table1 WHERE (name like ‘TX%’);

And just rows who have TX in their name will be replicated.

Column lists work in a similar way, allowing one to specify a subset of table columns that are replicated:

CREATE PUBLICATION pub1 FOR TABLE table1 (id, a, c);

Also new is the option FOR TABLES IN SCHEMA, which publishes all current and future tables in a specified schema. Earlier the ALL option was available only database-wide.

And we now have support for proper two-phase commits. For this the replication slot needs to be created with an option called TWO_PHASE.

One sample user of this is pg_recvlogical, which has added a –two-phase option to be used during slot creation.

Logical replication also no longer sends empty transactions. When it finds that there are no DML statements in a decoded transaction for a certain slot, it sends nothing and moves directly on to the next transaction.

It also now detects the case of a partially streamed transaction which has crashed on source and sends info about this to the subscriber. Before, this case caused subscriber to keep such transactions open until the subscriber restarted.

There are now functions to monitor the directory contents of logical replication slots:

pg_ls_logicalsnapdir(), pg_ls_logicalmapdir(), and pg_ls_replslotdir().

They can be run by members of the predefined pg_monitor role.

And although partitioned tables can have foreign tables as partitions, replicating into such a partition isn’t currently supported. The logical replication worker used to crash if it was attempted. Now, an error is thrown.

Comparison with the pglogical extension

While there have been lots of improvements, there are still cases where the pglogical extension is needed.

Native replication has no support for filtering by replication origin, meaning that setting up a bi-directional replication will fail, either resulting in an infinite loop for UPDATE, or in case of INSERT into a table with Primary Key, the replication stops with key violation when trying to replicate the same insert back. In case of insert-only publication, replication will keep inserting the new row over and over again, resulting in unlimited table growth.

(See Setting bi-directional replication for Cloud SQL for PostgreSQL | Google Cloud Blog for how you can do this in Cloud SQL with pglogical)

You can’t define primary to have multiple IP addresses (pg_logical has the concept of “interfaces” for this).

Bi-directional replication support is the most useful of the three. The others are for really rare use cases but perhaps worth mentioning in case you happen to have one of them.

And of course if you need some of the new options when replicating *into* PostgreSQL 15 from an older version, you also still need to use pg_logical, as PostgreSQL core only gets a new feature in latest version.This is different from extensions, where you can often use the latest extension version on many PostgreSQL versions.

Tooling

Improvements to pgbench

The bundled performance testing tool pgbench can now retry serializability errors, including deadlocks. This is good news if you want to test workloads, which occasionally do deadlock or have other serialization violations that could be fixed by re-running the transaction.

For example the standard TPC-C tests define that 10% of transactions are aborted.

Now this should be possible to be tested using pgbench with custom scripts.

Improved psql experience

While psql is already quite amazing, PostgreSQL 15 managed to add even more features for advanced users.

Multi-statement commands

Now psql will return results for all statements in a multi-statement query string.

Pre-15 versions of psql emulated the behavior of sending the whole string to the server and returned only the result of the last statement, even though psql does parse the strings given to it and sends them as separate statements. Now results for each individual statement are returned. To get old behavior, set SHOW_ALL_RESULTS psql variable to off.

(The only way to ask psql to send “select 1; select 2; select 3;” as a single string is to escape the ;, so “select 1; select 2; select 3;” will be sent as a single string)

Faster copy

Now the copy command in psql uses larger chunks to send data thus improving the speed of the copy.

Easier way to show a set of server variables

A new command dconfig is added to show server variables.

This can also handle wildcards, so now dconfig *log* shows all variables with ‘log’ in their names.

Earlier you had to manually run

SELECT name, settings unit FROM pg_settings WHERE name like ‘%log%’

to get this.

Observability

New statistics collection subsystem

The Cumulative Statistics System was rewritten to use shared memory.

In earlier versions there was a special statistics collector process that got the stats from individual back-ends via UDP packets. And the collected stats became available to back-ends after transferring them via file system.

The new system should:

be faster

need less configuration

not randomly lose some collected statistics in case of high workloads (UDP is by design lossy), so counts in pg_stats_* views should be more trustworthy

Monitoring and new monitoring roles

A new statistics view pg_stat_subscription_stats is added for monitoring subscriptions.

Also a view pg_stat_recovery_prefetch which tracks pre-fetching in recovery.

Now pg_stat_statements has new fields for temporary file I/O and JIT counters.

And lastly there are two new server variables:

shared_memory_size to check the size of allocated shared memory

shared_memory_size_in_huge_pages for the number of huge memory pages required

Preparing for larger data volumes

As an interesting feature, the functions to pg_size_pretty() and pg_size_bytes() were updated to be able to convert to Petabytes. Before version 15 the largest unit they knew about was Terabytes :

code_block[StructValue([(u’code’, u’select pg_size_pretty((2^54)::bigint);rnpg_size_prettyrn—————-rn16 PB’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e70c70f1490>)])]

Summary

PostgreSQL continues to innovate and deliver much needed features for the most demanding applications. PostgreSQL 15 is no different and we’re very excited to support it in Cloud SQL.

If you want to learn more about Cloud SQL go to Cloud SQL For PostgreSQL Managed Database.

Links

PostgreSQL 15 release notes – this has the main info for new functionality.

PostgreSQL 15.1 release notes – this and the next link for 15.2 list bug fixes and smaller improvements, like allowing more than 100 columns in column list for logical replications.

Read More for the details.

2023 05 24

GCP – Introducing reCAPTCHA Enterprise Fraud Prevention

Today, we are putting the power of Google’s insights and intelligence in the hands of risk, fraud, and security teams everywhere. We are pleased to announce the general availability of reCAPTCHA Enterprise Fraud Prevention, a new product that uses Google’s own fraud models, machine learning, and intelligence from protecting more than 6 million websites to help stop payment fraud.

reCAPTCHA Enterprise Fraud Prevention can help protect payment transactions by identifying targeted manual attacks and large-scale fraud attempts. It automatically trains fraud models based on behavior and transaction data to identify events that are likely fraudulent and could cause a dispute or chargeback if accepted.

When threat actors are identified and blocked on any site in the reCAPTCHA network, the intelligence is made available to help protect other organizations from those same attackers. Organizations can then use the scores to send the transaction for manual review or directly block suspicious transactions, drive increased trust in their legitimate transactions, reduce the amount of friction for good users, and reduce erroneous rejection rates.

“Being a safe and trusted place to give and receive help is our top priority. We are constantly innovating, refining and enhancing our fraud defenses and Google’s reCAPTCHA Enterprise Fraud Prevention adds another layer of industry-leading protection for our users. Combining Google’s rich security expertise with GoFundMe’s focus on fraud prevention is already showing promising results as we strive to keep our platform the safest place to give online,” said Matthew Murray, director of risk, GoFundMe.

According to data obtained from our customers using reCAPTCHA Enterprise Fraud Prevention in private Preview, the new solution can help drive comprehensive value for organizations by:

Reducing fraud losses: Get behavior modeling (not just mouse movements), stop financial damage created by account takeovers (ATOs), and connect fraud and abuse intelligence across your site and application.

Increasing legitimate sales: Fraud Prevention can help businesses increase their revenue by providing insights to understand who good users are, who may otherwise be caught in an overly aggressive, customer restrictive fraud model.*

These benefits are enabled for organizations by:

Preventing advanced attacks: Stopping sophisticated attacks in an environment where attackers use cloud fraud infrastructure, and data leaks, virtual cards and VPN usage are on the rise

Increasing trust in good transactions: Focus on good approval rates through a unique combination of supervised and unsupervised models

Improving customer experience: By preventing fraudulent transactions with zero user friction, Fraud Prevention can help businesses protect legitimate users and increase customer retention

Our customers have used these new capabilities in preview in customer experiences that include online checkouts and payment transactions. Fraud Prevention has been used by merchants and payment processors to block millions of dollars in fraudulent transactions in a frictionless way.

reCAPTCHA Enterprise provides a comprehensive online fraud detection platform that helps prevent fraudulent, spammy, and abusive digital client activity across your web and application footprint. This platform includes Account Defender, which protects users from account takeovers (ATO), frictionless bot management capabilities, and password leak detection to identify compromised accounts. With an additional API call for Fraud Prevention, organizations can leverage a comprehensive solution to secure financial transactions along with core reCAPTCHA Enterprise capabilities.

To get started, organizations can install a score-based site key on each part of the payment user flow front end and send transaction data when a purchase occurs. This site key helps train site-specific fraud models and start returning fraud scores for each transaction.

Ready to put reCAPTCHA Enterprise Fraud Prevention to work? Contact our fraud specialists or your customer success CSM team and sign up with the free tier. You learn more about these new capabilities in reCAPTCHA Enterprise in our product documentation. You can also register for our webinar where our experts will be discussing this capability in great detail on June 1, 2023.

Coming up at our Security Summit on June 13-14, you can hear how reCAPTCHA Enterprise helped GoFundMe secure their donors from fraud. Sign up here for free Security Summit registration.

Read More for the details.

2023 05 24

GCP – How an open data cloud is enabling Airports of Thailand and EVme to reshape the future of travel

Aviation and accommodation play a big role in impacting the tourism economy, but analysis of recent data also highlights tourism’s impact on other sectors, from financial services to healthcare, to retail and transportation.

With travel recovery in full swing post pandemic, Google search queries related to “travel insurance” and “medical tourism” in Thailand have increased by more than 900% and 500% respectively. Financial institutions and healthcare providers must therefore find ways to deliver tailored offerings to travelers who are seeking peace of mind from unexpected changes or visiting the country to receive specialized medical treatment.

Interest in visiting Thailand for “gastronomy tourism” is also growing, with online searches increasing by more than 110% year-on-year. Players in the food and beverage industry should therefore be looking at ways to better engage tourists keen on authentic Thai cuisine.

Most importantly, digital services will play an integral role in travel recovery. More than one in two consumers in Thailand are already using online travel services, with this category expected to grow 22% year-on-year and contribute US$9 billion to Thailand’s digital economy by 2025. To seize growth opportunities amidst the country’s tourism rebound, businesses cannot afford to overlook the importance of offering always-on, simple, personalized, and secure digital services.

That is why Airports of Thailand (AOT), SKY ICT (SKY) and EVME PLUS (EVme) are adopting Google Cloud’s open data cloud to deliver sustainable, digital-first travel experiences.

Improving the passenger experience in the cloud

With Thailand reopening its borders, there has been an upturn in both inbound and outbound air travel. To accommodate these spikes in passenger traffic across its six international airports, AOT migrated its entire IT footprint to Google Cloud, which offers an open, scalable, and secure data platform, with implementation support from its partner SKY, an aviation technology solutions provider.

Tapping on Google Cloud’s dynamic autoscaling capabilities, the IT systems underpinning AOT’s ground aviation services and the SAWASDEE by AOT app can now accommodate up to 10 times their usual workloads. AOT can also automatically scale down its resources to reduce costs when they are no longer in use. Using the database management services of Google Cloud to eliminate data silos, the organization is able to enhance its capacity to deliver real-time airport and flight information to millions of passengers. As a result, travelers enjoy a smoother passenger experience, from check-in to baggage collection.

At the same time, SKY uses Google Kubernetes Engine (GKE) to transform SAWASDEE by AOT into an essential, all-in-one travel app that offers a full range of tourism-related services. GKE allows AOT to automate application deployment and upgrades without causing downtime. This frees up time for the tech team to accelerate the launch of new in-app features, such as a baggage tracker service, airport loyalty programs, curated travel recommendations, an e-payment system, and more.

EVme drives sustainable travel with data

Being able to travel more efficiently is only one part of the future of travel. More than ever, sustainability is becoming a priority for consumers when they plan their travel itineraries. For instance, search queries related to “sustainable tourism” in Thailand have increased by more than 200% in the past year, with close to four in 10 consumers sharing that they are willing to pay more for a sustainable product or service.

To meet this increasing demand and support Thailand’s national efforts to become a low-carbon society, EVme, a subsidiary of PTT Group, is building its electric vehicle lifestyle app on Google Cloud, the industry’s cleanest cloud. It has also deployed the advanced analytics and business intelligence tools of Google Cloud to offer its employees improved access to data-driven insights, which helps them better understand customer needs and deliver personalized interactions. These insights have helped EVme determine the range of electric vehicle models it offers for rental via its app, so as to cater to different preferences. At the same time, the app can also share crucial information, such as the availability of public electric vehicle charging stations, while providing timely support and 24-hour emergency assistance to customers.

As we empower organizations across industries with intelligent, data-driven capabilities to make smarter business decisions and be part of an integrated ecosystem that delivers world-class visitor experiences, our collaborations with AOT, SKY, and EVme will enhance their ability to serve travelers with personalized, digital-first offerings powered by our secure and scalable open data cloud.

Read More for the details.

2023 05 24

GCP – Making Chrome better for your enterprise users with Chrome Browser Cloud Management

If you’re an IT or security professional, you’re probably very familiar with the advantages of the browser when it comes to your tech stack. It allows your workforce to get more done, making it easier than ever to be productive. But cloud computing, an increase in hybrid work and re-emerging BYOD scenarios can introduce unique challenges for IT and security teams. So how do you keep your workforce productive while keeping your corporate data secure? The browser is a key piece of the puzzle, and the management of that browser is imperative.

Chrome Browser Cloud Management allows you to configure and manage browser policies, settings, apps and extensions across your Chrome browsers, and do it all from a single console—even if your workforce uses multiple operating systems and devices. It also gives you better visibility into the browser and browser versioning, so you can better enable and protect your end users. Using Chrome Browser Cloud Management not only helps you keep your organization more secure, but it also enables you to customize Chrome to meet the needs of your business.

We have a few new capabilities that just launched in Chrome Browser Cloud Management that you should know about:

Get visibility into which browsers need attention

The first new feature is the Chrome Insights report, which alerts admins to browser attributes that may need further inspection. This includes browsers that have not had recent activity, are newly enrolled, and active browsers that have a pending install. This is extremely helpful for troubleshooting, compliance and keeping your browsers up to date.

More insights into the extensions in your environment

We also recently launched extension risk assessments. While extensions can be extremely helpful in powering end user productivity and adding customizations to Chrome, they can also be risky to organizations. We partnered with Spin.AI and CRXcavator to surface the risk level of specific extensions directly through Chrome Browser Cloud Management.

For additional insights into extensions, you can also use the Security Investigation Tool in the Google Admin console. IT and security teams can get alerts when an extension is installed in their environment. We’re also working on notifications for extension permissions changes and ownership changes (coming later this year), so teams can ensure the extensions in their environment are secure.

Finally, we have an exciting improvement to our Extension Workflow. This capability is a favorite among customers and allows end users to request specific extensions, which then go to an approval queue for their admin. From there, the admin can approve or deny the request. Now, we’ve added a prompt for business justification for why users are requesting the extension. This gives admins more information as to why the extension may or may not be helpful for their workforce, and can help speed approvals for business users.

Easier onboarding with Chrome Guides

Chrome Browser Cloud Management is in the Google admin console – you can now find us right on the home page for easier navigation! And there’s a ton of helpful guides available directly in the console to give admins the information they need to complete important steps, like enrolling their browsers, setting policies and viewing reports.

Our team is always working on ways to make Chrome Browser Cloud Management even more helpful for admins. Here’s a few things we have in the works:

Increased visibility into soon-to-be deprecated web features

We prioritized this next report based on customer feedback – the Legacy Technologies Report. It’s coming soon and will proactively report websites (both internal and external) that are using technology that will be deprecated, for example Samesite cookie changes, or older security protocols like TLS 1.1/1.1. This gives admins the ability to work with developers to plan required tech migrations before the deprecation goes into effect. If you’re interested in helping us test this feature, you can sign up for our Trusted Tester program here.

Deleting Chrome browsers at scale

To provide enterprises with more granular control around data retention, we’re working on a new policy that will automatically delete inactive browser information from Google servers. If you’re interested in helping us test this feature, you can sign up for our Trusted Tester program here.

Stay tuned to see what other cool new capabilities we’ll be launching later this year! To get started with Chrome Browser Cloud Management at no cost, sign up here. If you’re already using Chrome Browser Cloud Management, but want to see how you can get the most out of it, check out our newly released Beyond Browsing demo series where Chrome experts share demos on some of our frequently asked management questions. Happy browsing!

Read More for the details.

2023 05 23

AWS – Amazon EC2 X2idn and X2iedn instances now available in (Hyderabad) region

Starting today, memory optimized Amazon Compute Cloud (Amazon EC2) X2idn and X2iedn instances are available in Asia Pacific South (Hyderabad) region. These instances, powered by 3rd generation Intel Xeon Scalable Processors and built with AWS Nitro System, are designed for memory-intensive workloads. They deliver improvements in performance, price performance, and cost per GiB of memory compared to previous generation X1 instances. These instances are SAP-certified for running Business Suite on HANA, SAP S/4HANA, Data Mart Solutions on HANA, Business Warehouse on HANA, SAP BW/4HANA, and SAP NetWeaver workloads on any database.

Read More for the details.

2023 05 23

Azure – Preview: Red Hat Enterprise Linux (RHEL) 9.2 support for AMD confidential VMs