AWS – AWS Transit Gateway Connect is now available in two additional AWS Regions
Starting today, AWS Transit Gateway Connect is available in Asia Pacific (Osaka) and Asia Pacific (Jakarta) Regions.
Read More for the details.
Starting today, AWS Transit Gateway supports Internet Group Management Protocol (IGMP) multicast in the Asia Pacific (Osaka) and Asia Pacific (Jakarta) AWS Regions.
Read More for the details.
AWS Backup now provides you a way to centrally view your Amazon CloudWatch metrics for your data protection jobs, directly in the AWS Backup console. With this launch, you can monitor your data protection metrics (of backup, copy, and restore jobs) for all the AWS Backup supported services, spanning compute, storage, databases, and third-party applications. To drill down to a custom view, you can add your tracked metrics to a custom CloudWatch Dashboard using the “Add to dashboard” capability.
Read More for the details.
AWS CloudFormation announces the general availability of a new transform supporting extensions to the CloudFormation template language. AWS CloudFormation is an Infrastructure as Code (IaC) service that lets you model, provision, and manage AWS and third-party resources by authoring templates which are formatted text files in JSON or YAML. This release introduces a language transform called ‘AWS::LanguageExtensions.’ When declared in a template, the transform enables extensions to the template language. At launch, these include: new intrinsic functions for length (Fn::Length) and JSON string conversion (Fn::ToJsonString), and support for intrinsic functions and pseudo-parameter references in update and deletion policies.
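As a rough illustration of what the transform enables – a minimal sketch with placeholder resource names, not taken from the announcement – a YAML template that declares the transform can then use Fn::ToJsonString for a property that expects a JSON string, and reference a parameter in a deletion policy:

    Transform: AWS::LanguageExtensions   # enables the new language extensions
    Parameters:
      BucketDeletionPolicy:
        Type: String
        AllowedValues: [Delete, Retain]
        Default: Retain
    Resources:
      AppConfig:
        Type: AWS::SSM::Parameter
        Properties:
          Type: String
          # Fn::ToJsonString converts this object into the JSON string the
          # parameter value expects.
          Value:
            Fn::ToJsonString:
              environment: production
              replicas: 3
      DataBucket:
        Type: AWS::S3::Bucket
        # With the transform declared, DeletionPolicy can reference a parameter.
        DeletionPolicy: !Ref BucketDeletionPolicy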
Read More for the details.
Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning that enables data scientists and developers to perform every step of the machine learning workflow, from preparing data to building, training, tuning, and deploying models. SageMaker Studio is integrated with AWS CloudTrail to enable administrators to monitor and audit user activity and API calls from Studio notebooks, SageMaker Data Wrangler and SageMaker Canvas. Starting today, you can configure SageMaker Studio to also record the user identity (specifically, user profile name) in CloudTrail events thereby enabling administrators to attribute those events to specific users, thus improving their organization’s security and governance posture.
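For example, an administrator could turn this on from the AWS SDK. The following is a minimal boto3 sketch, not taken from the announcement; the domain ID is a placeholder, and the DomainSettingsForUpdate / ExecutionRoleIdentityConfig field names are our assumption of the relevant SageMaker API parameters, so verify them against the current API reference:

    import boto3

    sagemaker = boto3.client("sagemaker")

    # Assumed parameter names: propagate the Studio user profile name as the
    # sourceIdentity recorded in CloudTrail events for this domain.
    sagemaker.update_domain(
        DomainId="d-xxxxxxxxxxxx",  # placeholder Studio domain ID
        DomainSettingsForUpdate={
            "ExecutionRoleIdentityConfig": "USER_PROFILE_NAME"  # or "DISABLED"
        },
    )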
Read More for the details.
Amazon QuickSight Q is now available in four new Regions – Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), and Canada (Central) – in addition to the six existing Regions. AWS customers can sign up for QuickSight Q in these four new Regions as well as the existing ones; details can also be found at AWS QuickSight Q regions. Get started with a free trial of Amazon QuickSight Q.
Read More for the details.
Starting today, Amazon EC2 C6id, M6id and R6id instances are available in the AWS Asia Pacific (Tokyo) Region. C6id, M6id and R6id instances are powered by 3rd generation Intel Xeon Scalable processors (Ice Lake) with an all-core turbo frequency of 3.5 GHz, up to 7.6 TB of local NVMe-based SSD block-level storage, and up to 15% better price performance than their fifth-generation counterpart instances.
Read More for the details.
Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning practitioners get started on training and deploying machine learning models quickly. These algorithms and models can be used for both supervised and unsupervised learning. They can process various types of input data including tabular, image, and text.
Read More for the details.
As we talk to customers large and small, we are seeing more and more “data-rich” workloads moving to the cloud. Customers are collecting more valuable data than ever before, and they want that data from different sources to be centralized and normalized before running analytics on it. Storage is becoming the common substrate for enabling higher-value services like data lakes, modeling and simulation, big data, and AI and machine learning. These applications demand the flexibility of object storage, the manageability of file storage, and the performance of block storage — all on one platform.
As your needs evolve, we’ve committed to delivering products that deliver enterprise-ready performance and scale, that support data-driven applications, that enable business insights while being easy to manage, and that protect your data from loss or disaster.
Last year, we made continental-scale application availability and data centralization easier by expanding the number of unique Cloud Storage Dual Regions, and adding the Turbo Replication feature, which is available across nine regions and three continents. This feature gives you a single, continent-sized bucket, effectively delivering an RTO of zero and an optional RPO of less than 15 minutes. This also makes app design easier, with high availability and a single set of APIs regardless of where data is stored.
At today’s digital customer event, A Spotlight on Storage, we announced a number of storage innovations — here are a few that highlight our commitments to you.
Advancing our enterprise-readiness, we announced Google Cloud Hyperdisk, the next generation of Persistent Disk, bringing you the ability to easily and dynamically tune the performance of your block storage to your workload. With Hyperdisk, you can provision IOPS and throughput independently for applications and adapt to changing application performance needs over time.
We also launched Filestore Enterprise multishare for Google Kubernetes Engine (GKE). This new service enables administrators to seamlessly create a Filestore instance and carve out portions of the storage to be used simultaneously across one or thousands of GKE clusters. It also offers nondisruptive storage upgrades in the background while GKE is running, and a 99.99% regional storage availability SLA. This, combined with Backup for GKE, truly enables enterprises to modernize by bringing their stateful workloads into GKE.
Based on your input, we continue to evolve our storage to support your data-driven applications. To make it easier to manage storage and help you optimize your costs, we’ve developed a new Cloud Storage feature called Autoclass, which automatically moves objects based on last access time, by policy, to colder or warmer storage classes. We have seen many of you do this manually, and are excited to offer this easier and automated policy-based option to optimize Cloud Storage costs.
“Not only would it cost valuable engineering resources to build cost-optimization ourselves, but it would open us up to potentially costly mistakes in which we incur retrieval charges for prematurely archived data. Autoclass helps us reduce storage costs and achieve price predictability in a simple and automated way.” —Ian Mathews, co-founder, Redivis
We’re focused on delivering you more business insights from your storage choices, making it easier to manage and optimize your stored data. With the release of the new Storage Insights, you gain actionable insights about the objects stored in Cloud Storage. Whether you’re managing millions or trillions of objects, you have the information you need to make informed storage management decisions, and easily answer questions like, “How many objects are there? Which bucket are they located in?” Then, when paired with products like BigQuery, you can imagine organizations building unique dashboards to visualize insights about their stored data. The possibilities are truly exciting.
Lastly, to help you protect your most valuable applications and data we announced Google Cloud Backup and DR. This service is a fully integrated data-protection solution for critical applications and databases (e.g., Google Cloud VMware Engine, Compute Engine, and databases like SAP HANA) that lets you centrally manage data protection and disaster recovery policies directly within the Google Cloud console, and fully protect databases and applications with a few mouse clicks.
Choosing to build your business on Google Cloud is choosing the same foundation that Google uses for planet-scale applications like Photos, YouTube, and Gmail. This approach, built over the last 20 years, allows us to deliver exabyte-scale, performant services to enterprises and digital-first organizations. This storage infrastructure is based on Colossus, a cluster-level global file system that stores and manages your data while providing the availability, performance, and durability for Google Cloud storage services such as Cloud Storage, Persistent Disk, Hyperdisk, and Filestore.
Bring in our state-of-the-art dedicated Google Cloud backbone network (which has nearly 3x the throughput of AWS and Azure) and 173 network edge locations, and you start to see how our infrastructure is fundamentally different: it’s our global network, paired with disaggregated compute and storage built on Colossus, that brings the benefits of speed and resilience to your applications.
To learn more about our latest product innovations, watch our 75-minute Spotlight on Storage, and visit our storage pages to learn more about our products.
Read More for the details.
Backup is a fundamental aspect of application protection. As such, the need for a seamlessly integrated, centralized backup service is vital when seeking to ensure resilience and recoverability for data generated by Google Cloud services or on-premises infrastructure.
Regardless of whether the need to restore data is triggered by a user error, malicious activity, or some other reason, the ability to execute reliable, fast recovery from backups is a critical aspect of a resilient infrastructure. A comprehensive backup capability should have the following characteristics: 1) centralized backup management across workloads, 2) efficient use of storage to minimize costs, and 3) minimal recovery times.
To effectively address these requirements, backup service providers must deliver efficiency at the workload level while also supporting a diverse spectrum of customer environments, applications, and use cases. Consequently, implementing a truly effective, user-friendly backup experience is no small feat.
And that’s why, today, we’re excited to announce the availability of Google Cloud Backup and DR, enabling centralized backup management directly from the Google Cloud console.
At Google Cloud we have a unique opportunity to solve backup challenges in ways that fully maximize the value you achieve. By building a product with our customers firmly in mind, we’ve made sure that Google Cloud Backup and DR makes it easy to set up, manage, and restore backups.
As an example, we placed a high priority on delivering an intuitive, centralized backup management experience. With Google Cloud Backup and DR, administrators can effectively manage backups spanning multiple workloads. Admins can generate application- and crash-consistent backups for VMs on Compute Engine, VMware Engine or on-premises VMware, databases (such as SAP, MySQL and SQL Server), and file systems. Having a holistic view of your backups across multiple workloads means you spend less time on management and can be sure you have consistency and completeness in your data protection coverage.
Even better, Google Cloud Backup and DR stores backup data in its original, application-readable format. As a result, backup data for many workloads can be made available directly from long-term backup storage (e.g., leveraging cost-effective Cloud Storage), with no need for time-consuming data movement or translation. This accelerates recovery of critical files and supports rapid resumption of critical business operations.
Similarly, we also took care to help you minimize total cost of ownership (TCO) of your backups. With this objective in mind, we designed Google Cloud Backup and DR to implement space-efficient “incremental forever” storage technology to ensure that you pay only for what you truly need. With “incremental forever” backup, after Google Cloud Backup and DR takes an initial backup, subsequent backups only store data associated with changes relative to the prior backup. This allows backups to be captured more quickly and reduces the network bandwidth required to transmit the associated data. It also minimizes the amount of storage consumed by the backups, which benefits you via reduced storage consumption costs.
In addition, there is flexibility built in to allow you to strike your desired balance between storage cost and data retention time. For example, when choosing to store backups on Google Cloud Storage, you can select an appropriate Cloud Storage class in alignment with your needs.
The introduction of Google Cloud Backup and DR is a reflection of our broader commitment to make cloud infrastructure easier to manage, faster, and less expensive, while also helping you build a more resilient business. By centralizing backup administration and applying cutting-edge storage and data management technologies, we’ve eliminated much of the complexity, time, and cost traditionally associated with enterprise data protection.
But don’t take our word for it. See for yourself in Google Cloud Console. Take advantage of $300 in free Google Cloud credits, give Google Cloud Backup and DR a try starting in late September 2022, and enjoy the benefits of cloud-integrated backup and recovery.
Read More for the details.
While businesses can’t expect the unexpected, they can prepare for it. Events of the last few years have proven that circumstances can change at any time, with little warning. But flexible, dynamic culture and tools are one way that organizations can stay on course through unsteady waters. And one way to achieve dynamic flexibility is through composable business practices.
First identified by Gartner, a composable business practice merges accessible, easy-to-use digital technology with a tech-savvy, flexible work culture. Picture your business as a collection of different components, like interlocking structures made of small building blocks. Every time a new challenge arises, you can move these components around and snap them into place to reconfigure how your company responds. These practices are made possible through cloud-based technology that allows organizations to change direction quickly and efficiently.
Critical to composable business is a concept known as digital dexterity—the ability for organizations and employees to pivot quickly using digital tools. Digitally dexterous organizations are collaborative, think analytically, and make creative use of technology that can open up new possibilities for how work gets done. For example, organizations with high digital dexterity were able to pivot quickly from in-person onboarding of employees to remote onboarding during the pandemic. They used systems already in place, like video conferencing, virtual chat spaces, and online learning tools and adapted them to the new normal. And this is essential for a future of work where unpredictability is the rule, not the exception. In fact, Gartner finds that employees and organizations with high levels of digital dexterity are about three times more likely to launch, complete, and succeed in digital initiatives.
Digitally dexterous companies typically adopt simple, intuitive, people-first collaboration tools that empower employees to work quickly and seamlessly, whether they’re collaborating across teams or working independently. They also have workplace cultures that are highly responsive to new demands and changes.
Here are two ways to promote digital dexterity in your organization.
Composable businesses require two key components: a digitally enabled environment and a flexible workplace culture that consistently adapts to the changes within it. Digital dexterity creates agility from the inside out, empowering every employee to make smart use of technology to solve problems that impact them directly in service of the larger goals of the organization. It might be launching a no-code application that streamlines the budget approval process, setting up connected spreadsheets to access back-end data without specialized programming, or launching a web intake form to prioritize team work requests. The more intuitive the technology, the faster employees can build the muscle of digital dexterity. And the more flexible the technology, the easier it is to harness it to drive composable business practices that adapt to change.
Digital dexterity also enables innovation. During the early pandemic, millions of businesses experienced a crash course in digital dexterity out of necessity. They adopted new technologies and solutions to overcome the challenges of the moment. In addition to the onboarding example above, some retailers were able to adapt to changing consumer behavior and make the pivot to curbside pickup and delivery services. Real estate agents leaned into virtual showings, 3D tours, and electronic contract signing. When stay-at-home orders lifted, restaurants mitigated crowding by using online waitlists and reservations. As manufacturers experienced supply chain issues, they adjusted their purchasing to have materials on hand for production, instead of relying on expedited delivery from vendors. And in staying adaptable, professionals gained stability within a highly uncertain environment.
Creating a flexible workplace culture takes time. When evangelizing the benefits of digital dexterity within your organization, it’s important to emphasize that it’s a journey, not a single training initiative. Even with the right tools and mindset, it’s an ongoing process that involves education, outreach, experimentation, and an open dialogue across your organization. But when employees feel like they can easily navigate the technology around them to solve top business challenges, digital dexterity becomes a major strategic advantage.
Reducing digital friction happens in countless small ways, from giving employees secure, one-time login access to all the apps and workflows they need to be productive, to reducing context shifts for teams that are already collaborating together. Achieving digital dexterity is a lot easier when we ask technology to simply get out of the way.
Inflexible or outdated technology often gets in the way of further development. A new employee might be adept at navigating various digital tools in their own personal life, only to find a maze of workplace tools that introduce digital friction when they want to collaborate or get something done. Whether it’s the need to continuously re-authenticate while working away from the office, or being forced to change apps all day long, digital friction is a drain on productivity and collaboration.
Intuitive, cloud-based tools that are location agnostic help employees collaborate more seamlessly. The goal of technology should be to keep people productive and collaborating in the places they’re already working together. For example, the ability to pivot directly from a brainstorming doc into a virtual meeting eliminates the need for teammates to switch tabs/apps or schedule a separate meeting. And getting AI-based recommendations for people, files, and events to include in a meeting invite or spreadsheet means that people spend less time searching for the right way forward and more time collaborating with the right ingredients and people at hand.
Digital dexterity is also enhanced when employees have access to automated templates and workflows that streamline their workday. In Google Workspace, for example, we’re working on a series of auto-generated document and communication summaries — powered by AI — that provide a digest of long documents or chat conversations so employees aren’t weighed down by information overload. And in Calendar, meeting organizers can quickly add a meeting notes template directly to the meeting invite, saving time when it’s time to bring everyone together.
Composable businesses thrive at the intersection of technology and culture. There are two key ingredients — flexible technology that can adapt to new challenges and a culture that empowers employees to build digital dexterity. If a composable business is a collection of interlocking components that can be moved around and snapped back together, then it’s agility in both the tools and the culture that enable the new configuration. As the events of the last few years have reminded us, the future will always be highly uncertain and organizations will need to shift from planning for the most probable scenario, to planning for a multitude of possibilities. Only when flexibility is a core part of the foundation — in the tools and the culture — can an organization navigate the next bend in the road.
Read More for the details.
Editor’s note: Learn how mobile app experience company Airship uses Cloud Bigtable to deliver high performance throughput and speed, while freeing app developers to focus on high-value feature development.
Airship is a company that helps marketers, product owners, and developers create and adapt powerful mobile app experiences.
The mobile app experience is becoming the digital center of the customer experience, the place where the exchange of value between customers and brands is most respected and rewarded. Yet some brands treat their mobile apps as just another promotional channel. They’re driving customers to the app but not holding onto them. The truth is, keeping customers is hard. It calls for in-app experiences tailored to the individual, so you can build loyalty and generate revenue.
Leading brands understand there’s a new practice they need to master. We call it mobile app experience—MAX for short. And MAX is transforming how businesses manage relationships with customers in every conceivable way. We deliver MAX through our Airship App Experience Platform (AXP), the only enterprise SaaS platform focused 100% on helping brands master mobile app experiences. AXP enables marketers and product owners to create, edit, and manage full-screen, interactive walkthroughs that showcase an app’s latest features, how the app will make customers’ lives better, and how to get started. They can also take advantage of no-code solutions and analytics to activate, retain, and monetize app audiences. As a result, your development teams are free to focus on the more innovative aspects of building apps.
To deliver benefits and capabilities to our customers, AXP relies on everything from operational data to machine learning to real-time data streaming. With the help of Cloud Bigtable, Google’s scalable, fully managed NoSQL database, we’re able to deliver high performance throughput and speed to our customers. For everything from data storage and retrieval, to segmentation of large audiences, to reading multiple tables at the same time without locking, Bigtable has helped transform our customers’ experiences. The result is much lower latency and faster data access, which means our customers can each deliver a smooth and seamless mobile app experience to their users.
With the kind of data we capture and deliver, we take a NoSQL approach to data storage and management. As we’ve grown, we’ve evolved our architecture to grow with us. We started with Cassandra and later pivoted to Apache HBase. However, we found ourselves spending untold hours on HBase operations — at every step of the software lifecycle, operating HBase was a labor-intensive process. We needed people to allocate and configure storage nodes; people to monitor ZooKeeper, NameNode, DataNode, and RegionServer processes; and people to balance regions. We’d find ourselves spending a whole day reading I/O utilization graphs because read latency was up 200% on one node. Additionally, we regularly experienced hard drive failures, had to rebalance our hardware nodes, and once even lost a NameNode and had to recover it from backup journals. As a result, we began considering moving to the fully managed database service Bigtable.
Because Bigtable has an HBase client wrapper, it became our number one database choice. In addition, Bigtable’s seamless scaling makes it easy for us to add capacity with a push of a button and without any downtime.
Because of the HBase client wrappers, migrating to Bigtable was very smooth. We found that when spinning up databases in Google Cloud, it was very easy to write a generic job to load a SequenceFile of rows snapshotted from HBase into Bigtable. All of the work was in application code, enabling consistent writes in both locations during the move. We had to load the snapshot while continuing to write live data, and at the same time ensure that results were correct. As a result, we had to manage the ‘version’ of a cell manually in application code and in the backfiller, so that a write generated by a backfill would not overwrite a newer write from live events.
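Airship’s backfiller runs against the HBase-compatible client, but the cell-versioning idea is easy to illustrate. Below is a minimal Python sketch (not Airship’s code; project, instance, table, and column names are placeholders) using the google-cloud-bigtable client, where the cell timestamp is set from the event’s own time so a backfilled write can never shadow a newer live write:

    from datetime import datetime, timezone
    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")           # placeholder project
    table = client.instance("my-instance").table("events")   # placeholder IDs

    def write_event(row_key: bytes, payload: bytes, event_time: datetime):
        # The cell "version" is the event's own timestamp, so replaying old
        # events during a backfill cannot overwrite newer live writes.
        row = table.direct_row(row_key)
        row.set_cell("data", b"payload", payload, timestamp=event_time)
        row.commit()

    # A live write now, then a backfilled write carrying its original timestamp:
    write_event(b"user#123", b"live-event", datetime.now(timezone.utc))
    write_event(b"user#123", b"old-event", datetime(2022, 1, 1, tzinfo=timezone.utc))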
Since moving to Bigtable, the days spent configuring nodes, balancing regions, and troubleshooting latency issues are gone, and we can use that time for revenue-generating and customer growth initiatives. We’ve also been able to free our application developers from time-consuming operations tasks to focus on CI/CD and tooling, creating high-value features our customers love.
With Bigtable, we have the dynamic, responsive scaling we need to meet our customers’ performance expectations. One of our clusters is writing one million row operations per second, while reading another 700,000 rows per second.
As Airship grows, we plan to scale all Google Cloud services to keep our system both performant and cost effective. We consider Google Cloud a key partner in technology that helps brands master mobile app experiences.
To learn more about HBase to Bigtable Live Migrations and how to get started, please visit this documentation page.
Read More for the details.
An essential aspect of operating any application is the ability to observe the health and performance of that application and of the underlying infrastructure to quickly resolve issues as they arise. Google Kubernetes Engine (GKE) already provides audit logs, operational logs, and metrics along with out-of-the-box dashboards and automatic error reporting to facilitate running reliable applications at scale. Using these logs and metrics, Cloud Operations provides the alerts, monitoring dashboards and a Logs Explorer to quickly detect, troubleshoot and resolve issues.
In addition to these existing sources of telemetry data, we are excited to announce that Kubernetes control plane metrics are now generally available. With GKE, Google fully manages the Kubernetes control plane; however, when troubleshooting issues it can be helpful to have access to certain metrics emitted by the Kubernetes control plane.
As part of our vision to make Kubernetes easier to use and easier to operate, these control plane metrics are directly integrated with Cloud Monitoring, so you don’t need to manage any metric collection or scrape config.
For example, to understand the health of the API server, you can use metrics like apiserver_request_total and apiserver_request_duration_seconds to track the load the API server is experiencing, the fraction of API server requests that return errors, and the response latency for requests received by the API server. Also, apiserver_storage_objects can be very useful for monitoring the saturation of the API server, especially if you’re using custom controllers. Break down this metric by the resource label to find out which Kubernetes custom resource or controller is problematic.
When a pod is created, it is initially placed in a “pending” state, indicating it hasn’t yet been scheduled on a node. In a healthy cluster, pending pods are relatively quickly scheduled on a node, providing the workload the resources it needs to run. However, a sustained increase in the number of pending pods may indicate a problem scheduling those pods, which may be caused by insufficient resources or inappropriate configuration. Metrics like scheduler_pending_pods, scheduler_schedule_attempts_total, scheduler_preemption_attempts_total, scheduler_preemption_victims, and scheduler_scheduling_attempt_duration_seconds can alert you to potential scheduling issues, so you can act quickly to ensure sufficient resources are available for your pods. Using these metrics in combination will help you better understand the health of your cluster. For instance, if scheduler_preemption_attempts_total goes up, it means that there are higher-priority pods available to be scheduled and the scheduler is preempting some running pods. However, if the value of scheduler_pending_pods is also increasing, this may indicate that you don’t have enough resources to allocate the higher-priority pods.
If the Kubernetes scheduler is still unable to find a suitable node for a pod, the pod will eventually be marked as unschedulable. Kubernetes control plane metrics give you visibility into pod scheduling errors and unschedulable pods. A spike in either means that the Kubernetes scheduler isn’t able to find an appropriate node on which to run many of your pods, which may ultimately impair the performance of your application. In many cases, a high rate of unschedulable pods will not resolve itself until you take action to address the underlying cause. A good place to start troubleshooting is to look for recent FailedScheduling events. (If you have GKE system logs enabled, all Kubernetes events are available in Cloud Logging.) These FailedScheduling events include a message (for instance, “0/6 nodes are available: 6 Insufficient cpu.”) that describes exactly why the pod couldn’t be scheduled on any node, giving you guidance on how to address the problem.
A final example: if job scheduling seems very slow, one possible cause is that a third-party webhook is introducing significant latency, causing the API server to take a long time to admit a job. Kubernetes control plane metrics such as apiserver_admission_webhook_admission_duration_seconds can expose the admission webhook latency, helping you identify the root cause of slow job scheduling and mitigate the issue.
Not only are we making these additional Kubernetes control plane metrics available, we’re also excited to announce that all of these metrics are displayed in the Kubernetes Engine section of the Cloud Console, making it easy to identify and investigate issues in-context as you’re managing your GKE clusters.
To view these control plane metrics, go to the Kubernetes clusters section of the Cloud Console, select the “Observability” tab, and select “Control plane”:
Since all Kubernetes control plane metrics are ingested into Cloud Monitoring, you can create alerting policies in Cloud Alerting so you’re notified as soon as something needs your attention.
When you enable Kubernetes control plane metrics for your GKE clusters, all metrics are collected using Google Cloud Managed Service for Prometheus. This means the metrics are sent to Cloud Monitoring in the same GCP project as your Kubernetes cluster and can be queried using PromQL via the Cloud Monitoring API and Metrics explorer.
For example, you can monitor any spikes in the 99th percentile API server response latency using this PromQL query:
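The query itself is not reproduced in this digest; one common formulation, assuming the standard apiserver_request_duration_seconds histogram mentioned above, looks like this:

    histogram_quantile(0.99,
      sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))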
If you monitor your GKE cluster with a popular third-party observability tool, that tool can ingest these Kubernetes control plane metrics using the Cloud Monitoring API.
For example, if you’re a Datadog customer and you’ve enabled Kubernetes control plane metrics for your GKE cluster, then Datadog provides enhanced visualizations that include Kubernetes control plane metrics from the API server, scheduler, and controller manager.
All Kubernetes control plane metrics are charged at the standard price for metrics ingested from Google Cloud Managed Service for Prometheus.
GKE clusters running control plane version 1.23.6 or later can now access metrics from the Kubernetes API server, Scheduler, and Controller Manager. Kubernetes control plane metrics are not available for GKE Autopilot clusters.
The following gcloud command will update a cluster to enable the collection of metrics from the API server, scheduler, and controller manager:
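The command itself is not reproduced in this digest; at the time of the announcement it was roughly the following, where CLUSTER_NAME and COMPUTE_LOCATION are placeholders (check the linked documentation for the current flag values):

    gcloud container clusters update CLUSTER_NAME \
        --location=COMPUTE_LOCATION \
        --monitoring=SYSTEM,API_SERVER,SCHEDULER,CONTROLLER_MANAGER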
Kubernetes control plane metrics can also be configured using Terraform.
Learn more about configuring the collection of control plane metrics.
Read More for the details.
Amazon Managed Blockchain (AMB) Hyperledger Fabric is now generally available in the AWS GovCloud (US-West) Region, allowing customers in both the public and commercial sectors to create and manage production-grade blockchain infrastructure with just a few clicks.
Read More for the details.
Starting today, Amazon MemoryDB for Redis is generally available in two additional AWS Regions: Europe (Paris) and Europe (Milan).
Read More for the details.
AWS Snowball devices are now available in the AWS Asia Pacific (Jakarta) Region.
Read More for the details.
We are excited to announce that you can now add up to 10 measures and 10 dimensions when setting up your detector for Amazon Lookout for Metrics. With this launch, you can include more measures and dimensions in a single detector, which allows you to get insights on root causes and causality across all the measures and dimensions you have selected.
Read More for the details.
On October 1, 2022, Microsoft will implement significant upgrades to our outsourcing and hosting terms that will benefit customers worldwide.
Read More for the details.
Google Cloud allows you to move your PostgreSQL databases to Cloud SQL with Database Migration Service (DMS). DMS gives you the ability to replicate data continuously to the destination database, while the source is live in production, enabling you to migrate with minimum downtime.
However, terabyte-scale migrations can be complex. For instance, if your PostgreSQL database has Large Objects, you will need some downtime to migrate them manually, as that is a limitation of DMS. There are a few more such limitations – check out the known limitations of DMS. If not handled carefully, these steps can extend the downtime during cutover, degrade performance on the source instance, or even delay the project delivery date. All of this can mean significant business impact.
Searce is a technology consulting company, specializing in modernizing application and database infrastructure by leveraging cloud, data and AI. We empower our clients to accelerate towards the future of their business. In our journey, we have helped dozens of clients migrate to Cloud SQL, and have found terabyte-scale migrations to be the toughest for the reasons mentioned earlier.
This blog centers on our work supporting an enterprise client whose objective was to migrate dozens of terabyte-scale, mission-critical PostgreSQL databases to Cloud SQL with minimum downtime. Their largest database was 20 TB in size, all the databases had tables with large objects, and some tables did not have primary keys. Note that at the time of this project, DMS did not support migrating tables without a primary key; in June 2022, DMS released an enhancement that supports the migration of tables without a primary key.
In this blog, we share with you our learnings about how we simplified and optimized this migration, so that you can incorporate our best practices into your own migrations. We explore mechanisms to reduce the downtime required for operations not handled by DMS by ~98% with the use of automation scripts. We also explore database flags in PostgreSQL to optimize DMS performance and minimize the overall migration time by ~15%.
Once the customer made the decision to migrate PostgreSQL databases to Google Cloud SQL, we considered two key factors that would decide business impact – migration effort and migration time. To minimize effort for the migration of PostgreSQL databases, we leveraged Google Cloud’s DMS (Database Migration Service) as it is very easy to use and it does the heavy lifting by continuously replicating data from the source database to the destination Cloud SQL instance, while the source database is live in production.
How about migration time? For a terabyte-scale database, depending on the database structure, migration time can be considerably longer. Historically, we observed that DMS took around 3 hours to migrate a 1 TB database. In other cases, where the customer database structure was more complex, migration took longer. Thankfully, DMS takes care of this replication while the source database is live in production, so no downtime is required during this time. Nevertheless, our client would have to bear the cost of both the source and destination databases which for large databases, might be substantial. Meanwhile, if the database size increased, then replication could take even longer, increasing the risk of missing the customer’s maintenance window for the downtime incurred during cutover operations. Since the customer’s maintenance window was monthly, we would have to wait for 30 more days for the next maintenance window, requiring the customer to bear the cost of both the databases for another 30 days. Furthermore, from a risk management standpoint, the longer the migration timeframe, the greater the risk that something could go wrong. Hence, we started exploring options to reduce the migration time. Even the slightest reduction in migration time could significantly reduce the cost and risk.
We explored options around tuning PostgreSQL’s database flags on the source database. While DMS has its own set of prerequisite flags for the source instance and database, we also found that flags like shared_buffers, wal_buffers and maintenance_work_mem helped accelerate the replication process through DMS. These flags needed to be set to specific values to get the maximum benefit out of each of them. Once set, their cumulative impact was to reduce the time DMS needed to replicate a 1 TB database by 4 hours – that is, a reduction of roughly 3.5 days for a 20 TB database. Let’s dive into each of them.
PostgreSQL uses two layers of caching – its own internal buffer cache and the kernel’s buffered I/O – so data can be held in memory twice. The internal buffer is controlled by shared_buffers, which determines the amount of memory the database dedicates to its own buffer cache, on top of the operating system’s page cache. By default this value is set conservatively low. However, increasing this value on the source database to fit our use case helped increase the performance of read-heavy operations, which is exactly what DMS does once a job has been initialized.
After multiple iterations, we found that if the value was set to 55% of the database instance RAM, it boosted the replication performance (a read heavy operation) by a considerable amount and in turn reduced the time required to replicate the data.
PostgreSQL relies on Write-Ahead Logging (WAL) to ensure data integrity. WAL records are written to buffers and then flushed to disk. The flag wal_buffers determines the amount of shared memory used for WAL data that has not yet been written to disk – records that are yet to be flushed. We found that increasing the value of wal_buffers from the default of 16 MB to about 3% of the database instance’s RAM significantly improved write performance by writing fewer but larger files to disk at each transaction commit.
PostgreSQL maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY, consume their own specific memory. This memory is referred to as maintenance_work_mem. Unlike other operations, PostgreSQL maintenance operations can only be performed sequentially by the database. Setting a value significantly higher than the default value of 64 MB meant that no maintenance operation would block the DMS job. We found that maintenance_work_mem worked best at the value of 1 GB.
Each of these three flags tunes how PostgreSQL utilizes memory resources. Hence, before setting these flags, we needed to upsize the source database instance to accommodate them. Without upsizing the database instances, we could have caused application performance degradation, as more than half of the total database memory would be allocated to the processes managed by these flags.
We calculated the memory required by the flags mentioned above, and found that each flag needed to be set to a specific percentage of the source instance’s memory, irrespective of the existing values that might be set for the flags:
shared_buffers: 55% of source instance’s memory
wal_buffers: 3% of source instance’s memory
maintenance_work_mem: 1 GB
Adding up the individual memory requirements of these flags, we found that at least 58% of the RAM would be taken up by them. For example, if a source instance had 100 GB of memory, 58 GB would be taken up by shared_buffers and wal_buffers, and an additional 1 GB by maintenance_work_mem. As the original values of these flags were very low (~200 MB), we upsized the RAM of the source database instance by 60% to ensure that the migration did not impact the performance of the application running live in production on the source.
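As a quick sanity check of that arithmetic, here is a small Python sketch (illustrative only; substitute your own instance’s RAM) that computes the memory claimed by these settings:

    def recommended_flag_memory(instance_ram_gb: float) -> dict:
        return {
            "shared_buffers_gb": instance_ram_gb * 0.55,      # 55% of RAM
            "wal_buffers_gb": instance_ram_gb * 0.03,         # 3% of RAM
            "maintenance_work_mem_gb": 1.0,                   # fixed 1 GB
        }

    print(recommended_flag_memory(100))
    # {'shared_buffers_gb': 55.0, 'wal_buffers_gb': 3.0, 'maintenance_work_mem_gb': 1.0}
    # ~59 GB in total for a 100 GB instance, which is why the source was upsized by ~60%.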
While using Google Cloud’s DMS, if the connection is terminated between DMS and the Cloud SQL instance during the ‘Full Dump in Progress’ phase of the DMS job, the DMS job fails and needs to be reinitiated. Encountering timeouts, especially while migrating a terabyte-scale database, would mean multiple days’ worth of migration being lost and a delay in the cutover plan. For example, if the connection of the DMS job for a 20TB database migration is lost after 10 days, the DMS job will have to be restarted from the beginning, leading to 10 days’ worth of migration effort being lost.
Adjusting the WAL sender timeout flag (wal_sender_timeout) helped us avoid terminating replication connections that were inactive for a long time during the full dump phase. The default value for this flag is 60 seconds. To keep these connections from terminating, and to avoid such high-impact failures, we set the value of this flag to 0 for the duration of the database migration. This prevented connections from being terminated and allowed for smoother replication through the DMS jobs.
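If your source is a self-managed PostgreSQL instance where you have superuser access, this change can be applied without a restart (wal_sender_timeout is reloadable); a minimal psycopg2 sketch with placeholder connection details:

    import psycopg2

    conn = psycopg2.connect("postgresql://admin@source-db.example.com/postgres")
    conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
    with conn.cursor() as cur:
        cur.execute("ALTER SYSTEM SET wal_sender_timeout = 0;")
        cur.execute("SELECT pg_reload_conf();")  # picked up without a restart
    conn.close()

On a managed source (for example Amazon RDS), the equivalent change would be made through the provider’s parameter or flag mechanism instead.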
Generally, for all the database flags we talked about here, we advised our customer to restore the default flag values once the migration completed.
While DMS does the majority of database migration through continuous replication when the source database instance is live in production, DMS has certain migration limitations that cannot be addressed when the database is live. For PostgreSQL, the known limitations of DMS include:
Any new tables created on the source PostgreSQL database after the DMS job has been initialized are not replicated to the destination PostgreSQL database.
Tables without primary keys on the source PostgreSQL database are not migrated; for those tables, DMS migrates only the schema. (This is no longer a limitation after the June 2022 product update.)
The large object (LOB) data type is not supported by DMS.
Only the schema for Materialized Views is migrated; the data is not migrated.
All data migrated is created under the ownership of cloudsqlexternalsync.
We had to address these aspects of the database migration manually. Since our client’s database had data with the large object data type, tables without primary keys, and frequently changing table structures (new tables created after the DMS job was initialized) – none of which DMS can migrate – we had to manually export and import that data after DMS did the rest of the data migration. This part of the migration required downtime to avoid data loss. For a terabyte-scale database, this data can run to hundreds of GBs, which means longer migration time and hence longer downtime. Furthermore, when you have dozens of databases to migrate, it can be stressful and error-prone for a human to perform these operations while on the clock during the cutover window!
This is where automation helped save the day! Automating the migration operations during the downtime period not only reduced the manual effort and error risk, but also provided a scalable solution that could be leveraged for the migration of 100s of PostgreSQL database instances to Cloud SQL. Furthermore, by leveraging multiprocessing and multithreading, we were able to reduce the total migration downtime for 100s of GBs of data by 98%, thereby reducing the business impact for our client.
We laid out all the steps that need to be executed during the downtime – that is, after the DMS job has completed its replication from source to destination and before cutting over the application to the migrated database. You can see a chart mapping out the sequence of operations that are performed during the downtime period in Fig 1.
By automating all the downtime operations in this sequential approach, we observed that it took 13 hours for the entire downtime flow to execute for a 1 TB database. This included the migration of 250 MB in new tables, 60 GB in tables without primary keys and 150 GB in large objects.
One key observation we made was that three steps accounted for most of the time: migrating new tables, migrating tables without primary keys, and migrating large objects. These took the longest because they all required dump and restore operations for their respective tables. However, these three steps did not have a hard dependency on each other, as they individually targeted different tables, so we tried to run them in parallel, as you can see in Fig 2. But the steps following them – ‘Refresh Materialized View’ and ‘Recover Ownership’ – had to be performed sequentially, as they targeted the entire database.
However, running these three steps in parallel required upsizing the Cloud SQL instances, as we wanted to have sufficient resources available for each step. This led us to increase the Cloud SQL instances’ vCPU by 50% and memory by 40%, since the export and import operations depended heavily on vCPU consumption as opposed to memory consumption.
Migrating the new tables (created after the DMS job was initiated) and tables without primary keys was straightforward, as we were able to leverage the native utilities offered by PostgreSQL – pg_dump and pg_restore. Both utilities can process tables with multiple parallel workers – the higher the table count, the more workers can run in parallel, allowing faster migration. With this revised approach, for the same 1 TB database, it still took 12.5 hours for the entire downtime flow to execute.
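A simplified sketch of that step, assuming the schema for these tables already exists on the destination (DMS migrates the schema of tables without primary keys) and using placeholder connection strings and table names:

    import subprocess

    SRC = "postgresql://admin@source-db.example.com/appdb"   # placeholder source
    DST = "postgresql://admin@10.0.0.5/appdb"                # placeholder Cloud SQL IP
    TABLES_WITHOUT_PK = ["audit_log", "click_events"]        # placeholder tables

    # Directory format (-Fd) is required for parallel dump/restore jobs.
    dump_cmd = ["pg_dump", "--dbname", SRC, "--format=directory",
                "--jobs=4", "--data-only", "--file=/tmp/no_pk_dump"]
    for table in TABLES_WITHOUT_PK:
        dump_cmd += ["--table", table]
    subprocess.run(dump_cmd, check=True)

    # Restore into the Cloud SQL destination with the same degree of parallelism.
    subprocess.run(["pg_restore", "--dbname", DST, "--jobs=4",
                    "/tmp/no_pk_dump"], check=True)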
This improvement reduced the cutover downtime, but we still needed a 12.5-hour window to complete all the steps. We then discovered that 99% of the downtime was taken up by just one step: exporting and importing 150 GB of large objects. It turned out that multiple threads could not be used to accelerate the dump and restore of large objects in PostgreSQL. Hence, migrating the large objects single-handedly extended the downtime for migration by hours. Fortunately, we were able to come up with a workaround.
PostgreSQL contains a large objects facility that provides stream-style access to data stored in a special large-object structure. When large objects are stored, they are broken down into multiple chunks and stored in different rows of the database, but are connected under a single Object Identifier (OID). This OID can thus be used to access any stored large object. Although users can reference large objects from any table in the database, under the hood, PostgreSQL physically stores all large objects within a database in a single table called pg_largeobject.
While leveraging pg_dump and pg_restore for export and import of large objects, this single table – pg_largeobject – becomes a bottleneck, as the PostgreSQL utilities cannot use multiple threads for parallel processing of a single table. Typically, the order of operations for these utilities looks something like this:
1. pg_dump reads the data to be exported from the source database
2. pg_dump writes that data into the memory of the client where pg_dump is being executed
3. pg_dump writes from memory to the disk of the client (a second write operation)
4. pg_restore reads the data from the client’s disk
5. pg_restore writes the data to the destination database
Normally, these utilities would need to be executed sequentially to avoid data loss or data corruption due to conflicting processes. This leads to further increase in migration time for large objects.
Our workaround for this single-threaded process involved two elements. First, we eliminated the second write operation – the write from memory to disk (step #3). Instead, once the data was read into memory, our program would begin the import process and write the data to the destination database. Second, since pg_dump and pg_restore could not use multiple threads to process the large objects in the single pg_largeobject table, we developed our own solution that could. The thread count was based on the number of OIDs in pg_largeobject, and we broke that single table into smaller chunks for parallel execution.
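Searce’s production tooling is not public, but a minimal Python sketch of the chunk-by-OID technique might look like the following (psycopg2, placeholder DSNs, and an empty large-object catalog on the destination are assumed; very large objects would need streaming via lo_open/loread instead of lo_get):

    import concurrent.futures
    import psycopg2

    SRC_DSN = "postgresql://admin@source-db.example.com/appdb"  # placeholder source
    DST_DSN = "postgresql://admin@10.0.0.5/appdb"               # placeholder destination
    CHUNK_SIZE = 500   # large objects handled per worker task
    THREADS = 8        # tune to the vCPUs provisioned for the cutover window

    def copy_chunk(oids):
        # Each worker opens its own connections so chunks are copied independently.
        src = psycopg2.connect(SRC_DSN)
        dst = psycopg2.connect(DST_DSN)
        try:
            with src.cursor() as s, dst.cursor() as d:
                for oid in oids:
                    s.execute("SELECT lo_get(%s)", (oid,))
                    data = bytes(s.fetchone()[0])              # whole object read into memory
                    d.execute("SELECT lo_create(%s)", (oid,))  # preserve the original OID
                    d.execute("SELECT lo_put(%s, 0, %s)", (oid, psycopg2.Binary(data)))
            dst.commit()
        finally:
            src.close()
            dst.close()

    def main():
        with psycopg2.connect(SRC_DSN) as src, src.cursor() as cur:
            cur.execute("SELECT oid FROM pg_largeobject_metadata ORDER BY oid")
            oids = [row[0] for row in cur.fetchall()]
        chunks = [oids[i:i + CHUNK_SIZE] for i in range(0, len(oids), CHUNK_SIZE)]
        with concurrent.futures.ThreadPoolExecutor(max_workers=THREADS) as pool:
            list(pool.map(copy_chunk, chunks))

    if __name__ == "__main__":
        main()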
This approach brought the large object migration down from hours to minutes, reducing the downtime needed to complete all the operations that DMS cannot handle – for the same 1 TB database – from 13 hours to just 18 minutes: a reduction of ~98% in the required downtime.
After multiple optimizations and dry runs, we were able to develop a procedure for our client to migrate dozens of terabyte-scale PostgreSQL databases to Google Cloud SQL with a minimal business impact. We developed practices to optimize DMS-based migration by 15% using database flags and reduce downtime by 98% with the help of automation and innovation. These practices can be leveraged for any terabyte-scale migration of PostgreSQL databases to Google Cloud SQL to accelerate migration, minimize downtime and avoid performance impact on mission critical applications.
Read More for the details.
If you work in compliance, privacy, or risk, you know that regulatory developments have continued to accelerate this year. As part of our commitment to be the most trusted cloud, we continue to pursue global industry standards, frameworks, and codes of conduct that tackle our customers’ foundational need for a documented baseline of addressable requirements.
We have seen key updates across all regions and have worked to help organizations address these new and evolving requirements. Let’s look at the significant updates from around the world, hot topics, and the requirements we’ve recently addressed.
Google Cloud meets or surpasses the standards for a number of frameworks, including ISO/IEC 22301 for business continuity management and the Minimum Viable Secure Product (MVSP), developed with industry partners such as Salesforce, Okta, and Slack. Globally, we continue to address the areas of focus we know are most critical to organizations, including operational resiliency, DPIA support, and international data transfers.
Consistent with what we have observed historically, EMEA remains a region with ample developments that expand the regulatory landscape.
Digital Operational Resilience Act (DORA) adopted for financial services organizations: One of our most recent critical announcements was our preparations for addressing DORA, which will harmonize how EU financial entities must report cybersecurity incidents, test their digital operational resilience, manage Information and Communications Technology (ICT) third-party risk, and allow financial regulators to directly oversee critical ICT providers.
Second annual declaration of adherence to SWIPO: As presented in our SWIPO Transparency Statement, Google Cloud continues to demonstrate our commitment to enabling data portability and interoperability. Our customers always fully control their own data – including when they need to view, delete, download, and transfer their content.
Supporting our EU education customers’ privacy assessments: The recent Datatilsynet (the Danish Data Protection Authority) ruling on proper due diligence of cloud services is a helpful reminder for customers to conduct thorough risk assessments of third parties. Our latest blog reaffirms Google Cloud’s commitment to helping Education customers and the rest of our current and potential customer base conduct due diligence, including supporting privacy assessments and independent third-party attestations.
We continue to monitor the rapidly evolving regulatory landscape in Asia Pacific that has been rich with new developments and the introduction of several laws so far this year.
Addressed compliance for Australia’s DTA HCF: To help support Australian government customers with data residency and local customer support capabilities, Google Cloud is now ‘certified strategic’ under the Hosting Certification Framework (HCF) administered by Australia’s Digital Transformation Agency.
Privacy requirements in Japan, New Zealand, and Taiwan: Meeting privacy obligations remain a top priority for many organizations. To help, we’ve built compliance support for Japan’s Act on the Protection of Personal Information (APPI) along with New Zealand’s Privacy Act and Taiwan’s Personal Data Protection Act (PDPA).
In the United States, we continue to seek effective and efficient mechanisms to help our customers address their privacy and security needs. As with every region, customers can view our compliance offerings and mapping in our filterable Compliance Resource Center.
Welcoming the Trans-Atlantic Data Privacy Framework: Following the framework implementation, Google Cloud reaffirmed our commitment to helping customers meet stringent data protection requirements. This includes making the protections offered by the E.U.-U.S. data transfer framework available to customers once the framework is in place.
New U.S. industry compliance mappings: From public sector (DISA), to health care (MARS-E), energy (NERC) and criminal justice (CJIS), we have reviewed U.S. industry requirements and released new materials outlining how we can help customers address compliance.
Latin America remains a focus this year, with Google’s June announcement committing $1.2 billion USD over 5 years to projects in the region. Later in July, Google Cloud built on these initiatives by announcing that a new Google Cloud region is coming to Mexico.
For those in one of the most heavily regulated industries like financial services, we remain focused on demonstrating our commitment to regulations in that sector.
Meeting outsourcing requirements in financial services: We have new and updated compliance mappings for banking requirements in Brazil, Peru, and Colombia. Each new mapping is designed to support risk and compliance leaders’ need for compliance and reporting documentation.
We know developments are impactful not only for organizations that seek to meet requirements, but also for those team members tasked with ensuring their service providers adapt their approaches in response to critical industry developments. Many Google Cloud customers are already using our trust and compliance resources to facilitate internal and external conversations with their key customers, business partners, and regulators. Visit our Compliance Resource Center or continue the conversation with our sales team by visiting our Sales Center today.
Read More for the details.