Imagine a code review process that doesn’t slow you down. Instead of a queue of pending pull requests, you have an intelligent assistant that provides a near-instant, comprehensive summary of every change. It flags potential bugs, suggests improvements based on best practices, and frees up your human reviewers to focus on the complex architectural decisions that truly require their expertise.
This isn’t a future-state prediction; it’s what’s possible today with Gemini Code Assist, integrated directly into your GitHub workflow at no charge. By embedding a powerful AI partner into every pull request, we’re transforming code reviews from a frustrating bottleneck into a fast, painless path to consistent, high-quality code and happier developers.
The challenge: Why code reviews are a bottleneck
Code reviews are a non-negotiable part of building quality software, but they are often a major bottleneck in the development lifecycle. This friction slows down delivery velocity, leads to inconsistent code quality, and makes it difficult to enforce best practices. Research from DORA’s Impact of Generative AI in Software Development report advises that organizations “Double-down on fast high-quality feedback, like code reviews and automated testing, using gen AI as appropriate.” DORA research has found that teams with shorter code review times have 50% better software delivery performance.1 AI adoption has been shown to increase code review speed by an estimated 3.1% for every 25% increase in AI adoption, and to improve reported code quality by 3.4%.2
The solution: An AI-powered partner in your PRs
Gemini Code Assist integrates into your GitHub workflow as an intelligent partner, conducting code reviews to solve these challenges. When a pull request is created, Gemini is automatically assigned as a reviewer and gets to work immediately. Here’s how it helps:
Near-instant PR summaries: Provides a comprehensive summary of the changes to help human reviewers get up to speed almost instantly.
In-depth automated reviews: Identifies stylistic issues, deviations from best practices, and potential bugs, freeing human reviewers to focus on more critical issues.
Interactive assistance and learning: Allows anyone in the PR to have a conversation with the AI using /gemini commands to ask for alternative implementations or get a fresh review, as in the examples below.
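For example, commenting `/gemini summary` on a pull request regenerates the summary, `/gemini review` requests a fresh, full review of the latest changes, and `/gemini help` lists the commands the app supports. These commands come from the app’s documentation, so check the current docs for the complete list.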
The power of Gemini 2.5: A leap in code review intelligence
With our recent general availability release, Gemini Code Assist has been upgraded to run on our latest model, Gemini 2.5. This isn’t just an incremental update—it’s a major leap forward in the quality, accuracy, and intelligence of AI-powered code reviews.
What does this mean for your daily pull requests?
Deeper insights: You’ll see more insightful suggestions that go beyond simple style fixes. Gemini 2.5 is better at understanding the logic and intent behind your code, helping to identify potential bugs and suggest more efficient implementations.
More actionable code suggestions: Through customer-specific style guides and configurations, the AI-generated code is now more accurate and context-aware. You’ll find that suggestions are often so reliable that you can commit them with confidence after a quick review, speeding up your iteration cycles.
Improved relevance: We have significantly increased the relevance and accuracy of suggestions. This means you can trust the feedback you receive, spend less time second-guessing the AI, and spend more time moving your project forward.
These improvements are a direct result of our rigorous internal testing and deliver more comprehensive and actionable feedback to help developers be more productive.
Customer story: How Delivery Hero enhances code quality with Gemini Code Assist
Delivery Hero is the world’s leading local delivery platform, operating in around 70 countries. Strengthening the code review process is a key part of their commitment to delivering high-quality, reliable features for their customers, which prompted them to become an early adopter of the Gemini Code Assist GitHub app. The initial feedback from their developers was overwhelmingly positive, and the app’s recent upgrade to the Gemini 2.5 model brought an even more significant impact to the app development process, particularly on code quality. Their engineers found the AI-generated comments to be highly relevant and valuable, directly contributing to a higher standard of code. The latest developer experience survey after the adoption of the Gemini Code Assist GitHub app showed a clear rise in developer satisfaction with the quality and speed of code reviews, marking a positive turn for this important metric.
“By integrating the Gemini Code Assist GitHub app into our workflow, we are not just adopting a tool; we are fostering a more robust, efficient, and collaborative engineering culture. Our close partnership with Google as an early adopter, built on an agile and effective feedback loop, has been a key part of this success.” – N. Mert Aydin, Principal Software Engineer, Delivery Hero
Get started
Try it today: Get started with Gemini Code Assist on the GitHub Marketplace.
Learn more: Dive deeper by reading the official documentation.
AI is evolving beyond single, task-specific agents into an interconnected ecosystem, where autonomous agents collaborate to solve complex problems, regardless of their underlying platform. To make this transition easier for developers, we are announcing a comprehensive suite of tools that will empower developers to build, deploy, evaluate, and sell Agent2Agent (A2A) agents with Google Cloud.
Today, we’re excited to announce the release of version 0.3 of the A2A protocol, which brings a more stable interface to build against and is critical to accelerating enterprise adoption. This version introduces several key capabilities, including gRPC support, the ability to sign security cards, and extended client-side support in the Python SDK, which together provide more flexible usage, better security, and easier integration.
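To give a flavor of that client-side support, here is a minimal sketch using the open-source Python SDK (`pip install a2a-sdk`). Class and field names follow recent SDK releases and may shift as the protocol evolves, so treat the snippet as illustrative rather than canonical:

```python
# Minimal A2A client sketch (assumes an A2A server running on localhost:9999).
import asyncio
from uuid import uuid4

import httpx
from a2a.client import A2ACardResolver, A2AClient
from a2a.types import MessageSendParams, SendMessageRequest


async def main() -> None:
    async with httpx.AsyncClient() as http:
        # Discover the remote agent through its published Agent Card.
        resolver = A2ACardResolver(httpx_client=http, base_url="http://localhost:9999")
        card = await resolver.get_agent_card()

        client = A2AClient(httpx_client=http, agent_card=card)

        # Send one user message and print the JSON-RPC response.
        request = SendMessageRequest(
            id=str(uuid4()),
            params=MessageSendParams(
                message={
                    "role": "user",
                    "parts": [{"kind": "text", "text": "Hello over A2A!"}],
                    "messageId": uuid4().hex,
                }
            ),
        )
        response = await client.send_message(request)
        print(response.model_dump_json(indent=2, exclude_none=True))


asyncio.run(main())
```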
The A2A protocol is quickly gaining momentum, with support from a growing ecosystem of over 150 organizations that spans every major hyperscaler, leading technology providers, and multinational customers using Google Cloud. Businesses are already building powerful capabilities for their organizations. For example, Tyson Foods and Gordon Food Service are pioneering collaborative A2A systems to drive sales and reduce supply chain friction, creating a real-time channel for their agents to share product data and leads that enhance the food supply chain.
Build: Native support for A2A in the Agent Development Kit (ADK)
We’re releasing native support for A2A in Agent Development Kit (ADK), a powerful open-source agent framework released by Google. Built on our previously released A2A SDKs, this makes it easy to create A2A agents if you are already using ADK. With a simple “Hello, World!”-style code snippet, developers can now use ADK to:
Use an A2A agent with an Agent Card and use it as a sub-agent.
Expose an existing ADK agent to make it discoverable as an A2A agent.
Developers can start building collaborative agents with ADK today; a minimal sketch of both directions follows.
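The sketch below assumes the `google-adk` Python package with its A2A extra (`pip install "google-adk[a2a]"`). Module paths and helper names reflect recent adk-python releases and may differ in yours:

```python
# Minimal sketch of ADK's native A2A support; names are illustrative.
from google.adk.agents import Agent
from google.adk.agents.remote_a2a_agent import RemoteA2aAgent
from google.adk.a2a.utils.agent_to_a2a import to_a2a

# 1) Consume a remote A2A agent (discovered via its Agent Card) as a sub-agent.
remote_greeter = RemoteA2aAgent(
    name="remote_greeter",
    description="A greeting agent served elsewhere over A2A.",
    agent_card="http://localhost:8001/.well-known/agent.json",  # card path may vary by spec version
)

coordinator = Agent(
    name="coordinator",
    model="gemini-2.5-flash",
    instruction="Delegate any greeting request to the remote greeter.",
    sub_agents=[remote_greeter],
)

# 2) Expose an ordinary ADK agent as an A2A server so other agents can
#    discover and call it, e.g. run with: uvicorn my_module:a2a_app --port 8001
a2a_app = to_a2a(coordinator, port=8001)
```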
Deploy: Flexible deployment options with Agent Engine, Cloud Run, and GKE
Once agents are built, they need a robust and scalable home. We are providing three powerful deployment paths for customers to scale agents into production:
Deploy to Agent Engine: For a managed, agent-optimized environment, Agent Engine is the choice for many agent builders. We’re adding support for A2A to Agent Engine in the coming weeks so that you can easily deploy an agent written in any framework to Agent Engine and get a production-ready, Google-scale A2A agent.
Deploy to Cloud Run: For increased flexibility, you can containerize and deploy your A2A agents to Cloud Run, leveraging Google’s serverless infrastructure for massive scale and reliability. Follow the published guide.
Deploy to Google Kubernetes Engine (GKE): For maximum control, you can deploy agents to GKE, providing the full power of Kubernetes to manage A2A systems at scale.
With support for A2A arriving in the coming weeks, developers will be able to use the `agent-starter-pack` CLI tool to complete CI/CD setup in just one line: `uvx agent-starter-pack create my-agent -a adk@gemini-fullstack`
Integrate: Bring your A2A agents to users with Agentspace
Agents need safe and accessible environments to be useful. That’s why we built Agentspace, the destination where agents meet end users. In the coming weeks, partners will be able to make any A2A agent available in Agentspace, transforming it from a standalone tool into a valuable service that people can consume. This includes agents built on partner platforms, giving customers the flexibility to access these A2A agents in multiple locations.
More than just a hub, Agentspace provides the critical governance, safety, and control features needed for an enterprise-ready agent platform, ensuring that interactions are secure and reliable.
Evaluate and commercialize your A2A systems
Building and deploying agents is just the beginning. To create truly enterprise-grade systems, you need robust evaluation capabilities, which is why we’re extending the Vertex GenAI Evaluation Service to support A2A agent evaluations. See our hands-on guidance.
Discover and sell partner-built A2A agents in AI Agent Marketplace
Partners can now sell their A2A agents directly to customers in the AI Agent Marketplace. This will allow Google Cloud customers to discover and purchase agents published by ISVs, GSIs, and other technology providers. The AI Agent Marketplace provides an important path to market for partners looking to monetize their AI Agents.
We announced the A2A protocol in April to lead the industry toward interoperable agent systems, and in June, we advanced that commitment by contributing it to the Linux Foundation. The industry’s response continues to grow, reflecting a shared belief in vendor-neutral, community-driven standards. Many of Google Cloud’s partners have previously offered agents to joint customers, and they are now enabling these agents with A2A to help future-proof investments for customers.
Adobe: A leader in generative AI, Adobe is leveraging the A2A protocol to make its rapidly-growing number of distributed agents interoperable with agents in Google Cloud’s ecosystem. The A2A protocol enables Adobe agents to collaborate in the enterprise to create powerful new digital experiences, streamline workflows that optimize the content creation process, and automate multi-system processes and data integrations.
S&P Global Market Intelligence: S&P, a provider of information services and solutions to global markets, has adopted A2A as a protocol for inter-agent communication. This strategic alignment enhances interoperability, scalability, and future-readiness across the organization’s agent ecosystem.
ServiceNow: As a founding partner of A2A, ServiceNow empowers customers with its AI Agent Fabric, a multi-agent communication layer that connects ServiceNow, customer, and partner-built agents. This provides enterprises with the greater choice and flexibility needed to unlock the full potential of agentic AI, resulting in faster decisions, fewer handoffs, and more scalable solutions.
Twilio: Twilio is using the A2A protocol to implement latency-aware agent selection. By extending the A2A protocol, individual agents now broadcast their latency, enabling the system to intelligently route tasks to the most responsive agent available and to adapt gracefully – for example, playing a filler prompt or adding typing sounds if a high-latency agent is the only option.
Developers can read more about past releases in the release notes, learn about what’s coming in our roadmap, and join the community to help evolve the protocol. The community has also released great tooling around A2A, including the A2A Inspector and the Technology Compatibility Kit.
Get started
We’re excited to partner across the industry to build the future of artificial intelligence. Explore the resources above to start building today.
We’re thrilled to announce a significant expansion of our C4 virtual machine series, with the general availability of 28 powerful new shapes. This expansion introduces C4 shapes with Google’s next-gen Titanium Local SSD, C4 bare metal instances, and new extra-large shapes, all powered by the latest Intel Xeon 6 processors, Granite Rapids. We’re excited to be the first leading hyperscaler to bring Xeon 6 to customers.
C4 VMs with Xeon 6 deliver performance gains of up to 30% for general compute and up to 60% for ML recommendation workloads, and up to 35% lower access latency on Titanium Local SSD shapes. If you already use the C4 machine series, this means an easy and powerful path to the latest hardware without needing to migrate to a new machine series. You can take advantage of your existing committed use discounts (CUDs) and enjoy capabilities like managed instance groups and Google Kubernetes Engine (GKE) custom compute classes.
C4 VMs on Intel Xeon 6
Initially launched with Intel 5th generation Xeon processors (Emerald Rapids), C4 VMs provided advancements in performance and control for enterprise and mission-critical workloads, especially AI, gaming, databases, and data analytics. Now with Xeon 6 (Granite Rapids), C4 VMs power even more demanding workloads, delivering the highest frequency of any Google Compute Engine VM (up to 4.2 GHz), the most vCPUs and RAM of any comparable Intel-based product, a larger L3 cache, and 1.35x higher maximum memory bandwidth. The C4 machine series also offers enhanced maintenance controls with a 30-day uptime window between planned maintenance events, and scalable Hyperdisk storage with up to 500k IOPS and 10 GB/s of throughput, including features like Hyperdisk Storage Pools.
Inference workloads on C4 with Xeon 6 demonstrate up to 60% better performance per core compared to the prior-generation C3. C4’s new shapes support FP16-trained models with Intel AMX-FP16, making it a great choice for accelerating machine learning inference. Also, new larger half- and full-host C4 shapes (144 and 288 vCPUs) guarantee memory isolation for highly predictable performance. The largest C4 shape enables up to 30% more performance per core on the estimated SPECrate®2017_int_base benchmark versus C3.
“SAS® Viya®, our cloud-based data and AI platform, is optimized for productivity with Intel hardware. We’re eager to scale production on C4 with Granite Rapids, having measured up to 20% performance improvement in areas such as deep learning and synthetic data generation. C4 with Granite Rapids leveraging Intel AMX delivers up to 6x speed-up, achieving GPU-comparable performance for a wide range of generative AI use cases involving inference with small to mid-size models. This empowers customers to use SAS Viya in Google Cloud in a more cost-effective way.” – Craig Rubendall, Vice President, Applied Architecture and Technology, SAS
“As a next-generation visual effects studio, beloFX demands peak performance for complex renders. When testing C4 on Granite Rapids, we observed an impressive 50% speedup against our n2d-standard-128 and n2-standard-128 nodes. This significant boost lets us produce more groundbreaking visual effects, faster, truly supercharging our ability to innovate.” – Christoph Ammann, Global Technology Supervisor, beloFX
New C4 standard, highmem, and highcpu shapes with 144 and 288 vCPUs are powered exclusively by Xeon 6:
| highcpu | standard | highmem |
| --- | --- | --- |
| C4-highcpu-144 | C4-standard-144 | C4-highmem-144 |
| C4-highcpu-288 | C4-standard-288 | C4-highmem-288 |
Turbocharge your storage with Titanium Local SSD
We’re also excited to announce Local SSD support for C4 VMs, available with new -lssd machine types, exclusively on Xeon 6 processors. These new local SSD shapes leverage the latest Titanium SSDs, delivering impressive I/O performance for workloads like high-performance databases (e.g., Cassandra, MongoDB, SQL Server tempDB and pagefile workloads), big data processing (e.g., Spark, Hadoop), media rendering and transcoding, and caching layers.
C4-lssd offers up to 7.2M max read IOPS, more than three times higher than comparable options from other leading hyperscalers. This translates to dramatic improvements in read/write latency. Titanium SSDs on C4 enable up to 35% lower access latency compared to previous-generation SSDs.
“For financial market infrastructure, every microsecond of latency matters. With C4-LSSD, we have seen an impressive 70% improvement in write latency compared to previous generations. This reduction with Titanium SSDs on C4 is crucial for our demanding, high-throughput applications, ensuring the stability and responsiveness essential for our business.” – Christian Hellmann, DevOps Engineer, Deutsche Börse Group
C4 with Local SSD is available in standard and highmem configurations starting from 4 vCPUs, exclusively on Xeon 6:
| Machine type (standard) | Machine type (highmem) | Local SSD capacity (GiB) |
| --- | --- | --- |
| C4-standard-4-lssd | C4-highmem-4-lssd | 375 |
| C4-standard-8-lssd | C4-highmem-8-lssd | 375 |
| C4-standard-16-lssd | C4-highmem-16-lssd | 750 |
| C4-standard-24-lssd | C4-highmem-24-lssd | 1,500 |
| C4-standard-32-lssd | C4-highmem-32-lssd | 1,875 |
| C4-standard-48-lssd | C4-highmem-48-lssd | 3,000 |
| C4-standard-96-lssd | C4-highmem-96-lssd | 6,000 |
| C4-standard-144-lssd | C4-highmem-144-lssd | 9,000 |
| C4-standard-192-lssd | C4-highmem-192-lssd | 12,000 |
| C4-standard-288-lssd | C4-highmem-288-lssd | 18,000 |
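As a hedged sketch of how you might provision one of these shapes programmatically, the snippet below uses the google-cloud-compute Python client; the project, zone, and image values are placeholders, and the Hyperdisk boot disk reflects C4’s storage pairing described above:

```python
# Hedged sketch: provisioning a C4 Titanium Local SSD shape with the
# google-cloud-compute client (`pip install google-cloud-compute`).
# Project, zone, and image values are placeholders.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"  # assumption: a zone offering C4 on Xeon 6

instance = compute_v1.Instance(
    name="c4-lssd-demo",
    # The Titanium Local SSD capacity is bundled with the -lssd shape (see table above).
    machine_type=f"zones/{zone}/machineTypes/c4-standard-16-lssd",
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12",
                # C4 pairs with Hyperdisk for durable storage.
                disk_type=f"zones/{zone}/diskTypes/hyperdisk-balanced",
            ),
        )
    ],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # block until the VM is created
```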
Unlock direct access with C4 bare metal shapes
For workloads that demand direct access to CPU and memory resources, we’re introducing C4 bare metal shapes. These instances are ideal for commercial and custom hypervisors, applications not traditionally supported in virtual machines, and those with special performance monitoring or licensing requirements. C4 bare metal is SAP-certified, delivering an impressive 132,600 SAPS — the highest of any comparable machine — providing peak performance and compliance for your critical SAP application servers.
Customers in financial services and SaaS/PaaS security, and those running dev/test environments or private cloud platforms, will find C4 bare metal types essential for meeting their stringent performance goals. Compared to the previous-generation C3 bare metal instances, C4 bare metal offers up to a 35% performance improvement.
New C4 bare metal instances are available in standard-metal and highmem-metal, powered exclusively by Xeon 6:
| standard | highmem |
| --- | --- |
| C4-standard-288-metal | C4-highmem-288-metal |
Get started with C4 on Intel Xeon 6 today
The expanded C4 machine series with Intel Xeon 6 is available today in 19 zones, with more expansion to come. Check the latest regional availability on our regions and zones page. Experience the leading performance, predictability, and control that C4 VMs deliver today! To learn more about C4 and its new capabilities, visit the C4 documentation.
“Intel and Google Cloud are shaping the future of cloud computing with the expansion of the C4 machine series powered by Intel Xeon 6 processors. With enhanced performance, built-in AI acceleration, and greater deployment flexibility with new extra-large shapes, we’re enabling customers to innovate faster and run workloads more efficiently than ever before.” – Ronak Singhal, Intel Senior Fellow, Intel
Managing complex SAP landscapes can feel like navigating a maze. You’re juggling application performance, database health, infrastructure stability, and more, all while striving to keep business applications running seamlessly. Today, we’re excited to announce new observability and monitoring functionality in Workload Manager that provides SAP customers with purpose-built tools to bridge gaps in data and deliver a unified view of SAP environments across all layers of their cloud environments.
Workload Manager is not intended to replace your existing SAP monitoring tools, but rather complement them by providing a broader view of your SAP systems and enabling you to more easily correlate application-level insights with infrastructure performance and events.
Expanding visibility in complex SAP environments
Traditional SAP monitoring excels at providing detailed information at the application and database level. Metrics about transaction performance, user activity, and application logs are readily available. However, understanding the relationship between this application data and the underlying infrastructure can sometimes require additional effort. For example, a slowdown in transaction processing might be due to a database issue, a network bottleneck, or even a problem with the underlying virtual machine. While existing tools provide valuable insights, correlating application performance with infrastructure health can be a time-consuming process or may require toggling between different monitoring platforms and manual efforts.
Creating a system-centric, unified view with Workload Manager
After enabling the necessary functionality in Google Cloud’s Agent for SAP, you can navigate to the main Observability dashboard to see an overview of your SAP systems and the overall health status of each system and sub-layer. The health status is calculated using a variety of metrics from SAP Netweaver instances, SAP HANA databases, pacemaker clusters, and the availability of the underlying infrastructure hosting them.
Clicking on a system will take you to the system Overview page, where you can visualize the components of the system and see the current health status for each instance included. The Health Insights table will summarize any health issues that are currently detected, and the Maintenance Events table displays any upcoming maintenance for supported machine types.
The “Applications” and “Databases” tabs behave similarly, and help you correlate key performance indicators from your SAP applications with metrics from your VMs, storage, network, and other Google Cloud services such as the following:
Availability: GCE Instance Availability, SAP NetWeaver and SAP HANA instance status, SAP NetWeaver and SAP HANA process status and Pacemaker Cluster nodes and resource status.
Performance: GCE Instance CPU and memory utilization, operating system processes by CPU and memory and information about swap I/O operations on the operating system.
Storage: SAP related file systems usage, disk space utilization and usage, disk throughput and disk IOPS.
Networking: Information about network traffic and network packages.
NetWeaver: CPU and memory utilization by SAP NetWeaver process, response time by process in the SAP NetWeaver instance, SAP NetWeaver sessions and RFC connections by type, SAP NetWeaver process utilization per work process type, and information for each SAP NetWeaver process type’s respective queue in the system.
HANA: SAP HANA System Replication status, SAP HANA System Replication latency, memory usage in the SAP HANA Database by type, information about idle and running connections in the SAP HANA database and top schemas by record count in the SAP HANA database.
Backups: Information related to Backint backups including Backup/Recovery success rate, Backup/Recovery average MBps for all operations and historical data about both Backup/Recovery status and throughput.
New Event Annotations built specifically for SAP allow you to surface system events directly on monitoring dashboards to help empower you to quickly identify the root cause or see the impact of events on your system’s performance and health. These events are also now available outside of Workload Manager in Cloud Monitoring dashboards.
SAP Availability:
Pacemaker cluster:
Pacemaker cluster node status
Pacemaker cluster resource status
SAP HANA Database:
SAP HANA instance status
SAP HANA service status
SAP NetWeaver:
SAP NetWeaver instance status
SAP NetWeaver service status
SAP Operations:
SAP HANA System Replication status
SAP HANA Backup status
SAP HANA Data backups
SAP HANA Log backups
For example, in the following screenshot we can see how a SAP HANA database failover impacted the database availability and see the series of events as the Pacemaker cluster moved the database operations to the secondary node.
Other key features and benefits:
Comprehensive metrics: The solution collects a wide range of metrics from your SAP applications and Google Cloud infrastructure, but if you are looking for more, you can create your own custom queries and add the metrics to your custom dashboards inside Workload Manager Observability.
Customizable dashboards: Modify the dashboards to create alternative views, helping you visualize what matters most to your business and identify critical issues faster.
Alerting and notifications: The dashboards are natively integrated with Cloud Monitoring and alerting. You can set up alerts for specific metrics and receive notifications when thresholds are breached, as in the sketch after this list.
Automated system discovery: The Agent for SAP automatically identifies and reconciles your SAP instances and resources when enabled, saving you valuable time and avoiding manual tagging or labeling.
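For example, here is a hedged sketch of creating a threshold alert on an Agent for SAP metric with the Cloud Monitoring Python client. The exact metric type shown is an assumption; look up the real names in the Workload Manager metrics documentation:

```python
# Hedged sketch: a threshold alert on an Agent for SAP metric using the
# Cloud Monitoring client (`pip install google-cloud-monitoring`). The metric
# type below is an assumption; check the Workload Manager metrics list.
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

policy = monitoring_v3.AlertPolicy(
    display_name="SAP HANA replication unhealthy",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="HANA HA replication metric below 1 for 5 minutes",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type = "workload.googleapis.com/sap/hana/ha/replication"'
                    ' AND resource.type = "gce_instance"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_LT,
                threshold_value=1,
                duration=duration_pb2.Duration(seconds=300),
            ),
        )
    ],
)

client = monitoring_v3.AlertPolicyServiceClient()
created = client.create_alert_policy(
    name="projects/my-project",  # placeholder project
    alert_policy=policy,
)
print("Created alert policy:", created.name)
```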
Getting started
Ready to experience the benefits of a single-pane-of-glass for SAP observability on Google Cloud? You can view detailed documentation and instructions on how to get started and the supported architectures here. The observability service in Workload Manager is available at no additional cost, but the underlying metrics and logs are subject to costs associated with Cloud Monitoring and Cloud Logging. You can view an estimated cost for the required features that must be enabled here.
If you would like assistance onboarding or would like a demo or more information, please contact your account representative or Google Cloud support and we will be in touch.
Looking ahead
We are committed to continuously improving SAP observability on Google Cloud, and plan to add new features, more correlation capabilities, and more insightful visualizations.
Six months into 2025, we’ve already published hundreds of posts here on the Google Cloud blog. We asked ourselves, why wait until the busy end of the year to review your favorites? With everything from new AI models, product launches, emerging cyber threats, company news, certifications and customer stories, here is a mid-year recap that will get you up to speed on the latest from Google Cloud and the rapidly emerging cloud and AI landscape.
25. How Google Does It: Making threat detection high-quality, scalable, and modern
Published January 7, 2025
Google and Alphabet run the largest Linux fleet in the world, with nearly every flavor of operating system available, and see a steady stream of malicious system and network activity. Learn how our threat detection and response team detects, analyzes, and responds to threats on a vast scale.
24. NVIDIA GPUs are now generally available on Cloud Run
More and more organizations are turning to Cloud Run, Google Cloud’s serverless runtime, for its simplicity, flexibility, and scalability. And now, with the general availability of NVIDIA GPUs on the platform, developers can choose Cloud Run for applications that require powerful graphics processing, like machine learning models.
23. BigQuery emerges as autonomous data-to-AI platform
Published April 10, 2025
This is not your grandfather’s data warehouse. BigQuery is now an AI-native, multimodal, and agentic data-to-AI platform. The blog post provides an overview of the many new features and capabilities that went into this new designation, including new data preparation, data analysis, code generation, and management and troubleshooting capabilities.
22. Announcing Gen AI Toolbox for Databases. Get started today
Published February 6, 2025
Tired of building custom plumbing to connect your AI apps to your databases? This article announces the public beta of the Gen AI Toolbox for Databases, an open-source server built with LangChain that provides a secure, scalable, and manageable way to connect your generative AI applications to your data.
21. Ghost in the router: China-nexus espionage actor UNC3886 targets Juniper Networks
Published March 11, 2025
After discovering in 2024 that threat actors deployed custom backdoors to Juniper Networks’ Junos OS routers, Mandiant worked with Juniper to investigate this activity and observed that the affected routers were running end-of-life hardware and software. Learn more about the threat and how to remediate it in your environment.
20. AI Hypercomputer, Google Cloud’s supercomputing system for AI and HPC
It’s a platform, it’s a system, it’s AI Hypercomputer, Google Cloud’s fully managed supercomputing system for running AI and HPC workloads. As discussed at Google Cloud Next 2025, AI Hypercomputer supports all the latest and greatest compute, networking and storage infrastructure, and its software layer helps AI practitioners and engineers move faster with open and popular ML frameworks. Finally, there’s a full suite of workload management and observability tools to help you manage the thing.
19. Ipsos research shows why cloud certification matters — get certified with Google Cloud
Published February 25, 2025
Google Cloud partnered with Ipsos, the global research firm, to study the impact of cloud certifications on career advancement and achievement. For example, 8 out of 10 survey respondents said earning a recognized certificate helped them land a job faster and 75% believe they secured a higher salary through their certification.
18. Connect globally with Cloud WAN for the AI Era
Published April 9, 2025
With 202 points of presence (PoPs), powered by over 2 million miles of fiber, 33 subsea cables, and backed by a 99.99% reliability SLA, Google’s backbone network is, how do we put it? Vast. And with Cloud WAN, enterprises can now use it for their own wide area network (WAN) architectures.
17. Expanding generative media for enterprise on Vertex AI
Published April 9, 2025
At Google Cloud Next 25, we announced powerful new creative controls for our generative media models on Vertex AI. Now you can edit video with in-painting and out-painting, use camera controls for dynamic shots, and even create custom voices for AI-powered narration with as little as 10 seconds of audio.
16. China-nexus espionage actors continue to target edge devices
Threat actors continue to target edge devices globally, leveraging deep device knowledge and using both zero-day and now n-day flaws. This activity aligns with the broader strategy that the Google Threat Intelligence Group has observed among suspected China-nexus espionage groups, who invest significantly in exploits and custom malware for critical edge infrastructure.
15. Defending against UNC3944: Cybercrime hardening guidance from the frontlines
Published May 6, 2025
Who is UNC3944? A financially-motivated threat actor characterized by its persistent use of social engineering and brazen communications with victims. Mandiant provides guidance and strategies for hardening systems and defenses against the cybercrime group, offering practical steps to protect against their specific attack methods.
14. MCP Toolbox for Databases (formerly Gen AI Toolbox for Databases)
Published April 22, 2025
Ready to build AI agents that can actually use your data? This article announces that our MCP Toolbox for Databases now supports the Model Context Protocol (MCP), making it easier than ever to connect your generative AI agents to enterprise data. With new support for the Agent Development Kit (ADK) and LangGraph, you can build powerful, stateful agents with intuitive code and connect them to your databases securely.
13. Formula E’s AI equation: A new Driver Agent for the next era of racing
Published March 25, 2025
As motorsport has grown in popularity, the ability of fans from diverse backgrounds to enter the cockpit has not always kept up. Formula E sought to level the course for aspiring drivers by creating an AI-powered Driver Agent; connected to a Formula E simulator, the agent provides drivers and coaches with real-time feedback on technique and tactics, helping them improve faster than a flying lap.
12. Google Agentspace enables the agent-driven enterprise
Published April 9, 2025
Do you want to search all your company’s information in a few clicks, or generate ideas with built-in agents that already know your company’s style? Google Agentspace now includes a no-code agent designer, a gallery for discovering agents, and two new expert agents for deep research and idea generation, all integrated directly into Chrome.
11. Announcing Veo 3, Imagen 4, and Lyria 2 on Vertex AI
Published May 20, 2025
The next generation of creating for enterprise is here. We expanded Vertex AI to include our most powerful generative AI media models: Imagen 4 for stunningly realistic images with crisp text, Veo 3 for breathtaking video with synchronized audio, and Lyria 2 for composing high-fidelity, original music.
10. How large language models are changing the security landscape
In the security realm, large language models (LLMs) open a world of new possibilities, from sifting through complex telemetry to secure coding, vulnerability discovery, and streamlining operations. However, some of these same AI capabilities are also available to attackers, leading to understandable anxieties about the potential for AI to be misused for malicious purposes.
9. Ivanti Connect Secure VPN targeted in new zero-day exploitation
Published January 8, 2025
Ivanti kicked off the year by disclosing two new vulnerabilities impacting its Ivanti Connect Secure (ICS) VPN appliances. Mandiant identified UNC5221, a suspected China-nexus espionage actor that previously exploited two other Ivanti vulnerabilities as early as December 2023, as the threat actor targeting the new zero-days. Successfully exploiting one of the vulnerabilities could result in downstream compromise of a victim network.
8. Google announces agreement to acquire Wiz
Google Cloud shares a vision with Wiz to improve security by making it easier and faster for organizations of all types and sizes to protect themselves, end-to-end, across all major clouds, and this post announces Google’s agreement to acquire the cloud security startup.
7. Veo 3 available for everyone in preview on Vertex AI
Published June 26, 2025
You dream it, Veo creates it. This post announces Veo 3, our most powerful text-to-video model yet, is now open for everyone to try in public preview on Vertex AI. Create stunning, near-cinematic videos with synchronized sound, and join the next wave of creative storytelling, now available to Google Cloud customers and partners.
6. Vertex AI offers new ways to build and manage multi-agent systems
Published April 9, 2025
This article announces ways to build multi-agentic systems, an evolution of traditional AI agents. To get there, we launched a new suite of tools in Vertex AI to help developers build and deploy them, including an open-source Agent Development Kit (ADK) and a managed Agent Engine. We also introduce the Agent2Agent (A2A) protocol, a new open standard to allow agents built by different companies to communicate and collaborate.
5. Techniques for improving text-to-SQL accuracy and performance
Even though it’s been around for a long time, not all developers speak fluent SQL. English, on the other hand, is pretty well-known. In this technical deep dive for developers working with natural language processing and databases, get the insights and techniques you need to enhance the accuracy and performance of your text-to-SQL conversions.
4. Firebase Studio lets you build full-stack AI apps with Gemini
Published April 9, 2025
For over a decade, developers the world over have relied on Firebase’s backend cloud computing services and application development platforms to power their web applications. And with the new Firebase Studio, they can now use it to develop full-stack AI applications, integrating with the Gemini AI model.
3. Multiple Russia-aligned threat actors targeting Signal Messenger
Published February 19, 2025
As part of the ongoing Russian-Ukrainian conflict, Signal Messenger accounts are of great interest to Russia’s intelligence services for their potential to deliver sensitive government and military communications. Google Threat Intelligence Group has observed increasing efforts from several Russia state-aligned threat actors to compromise Signal Messenger accounts used by individuals of interest to Russia’s intelligence services.
2. New Google Cloud certification in generative AI
One of the top questions we hear is, “How do I get ahead?” This isn’t just another certification in a sea of technical qualifications. The Generative AI Leader certification is specifically focused on generative AI, and designed for visionary professionals like you — the managers, administrators, strategic leaders and more who understand that AI’s impact stretches far beyond code.
1. 601 real-world gen AI use cases from the world’s leading organizations
Published April 9, 2025
Since Next 2024, we’ve been gathering examples of how our customers are putting generative AI to use every day across their operations and offerings. We nearly doubled the number of entries for Next 2025, and clearly they’re still resonating, as this has been our most popular story of the year. Which use cases excite you most? Pop over to our LinkedIn page and let us know.
Thank you for being a part of the Google Cloud blog community! We look forward to bringing you many more posts to devour in the second half of the year.
Agentspace provides an Agent Development Kit (ADK) for building specialized agents and an A2A (Agent-to-Agent) communication protocol for agent collaboration. These tools facilitate a shift from static workflows to dynamic, adaptive business systems.
Pluto7’s Planning in a Box Pi Agent is designed to complement these capabilities, offering an intelligent AI layer for autonomous planning. It integrates with enterprise systems like SAP, Oracle, and Salesforce, consolidating structured and unstructured data into a Master Ledger via Google Cloud’s Cortex Framework. This effectively creates a real-time digital twin and control tower, providing a unified view and synchronized decision-making across the supply chain.
Pi Agent functions as a real-time supply chain planning assistant, mirroring a ride-share system for inventory. It senses real-time signals, reallocates resources, and makes proactive decisions. Examples include agents for demand analysis (Ron), inventory optimization (Kassy), financial balancing (Alex), and defect flagging (Bob), all collaborating through the A2A protocol. This allows for significantly faster decision-making and a transition from reactive problem-solving to proactive planning.
Practical impact
A case study featuring a LatAm CPG manufacturer demonstrates the practical impact. By implementing Planning in a Box + Pi Agent (now powered by Google Agentspace), the manufacturer gained real-time inventory visibility, improved order fulfillment, and is on track to reduce excess inventory by 15% while cutting manual reporting by 70%. It also has natural language access to inventory data: “How much inventory do I have for Faucet at Location Warehouse?”
The combination of Agentspace and Planning in a Box Pi Agent creates an agentic command center for enterprises, enabling autonomous workflows, seamless ERP (SAP, Oracle, NetSuite) integration, and deployment within the user’s Google Cloud tenant, delivered as service-as-a-software. This setup offers rapid time to value, with initial use cases deployable in weeks.
Over the years, Pluto7 has seen customers on this platform adoption journey achieve significant business outcomes, including a 10–20% improvement in forecast accuracy, up to a 50% reduction in inventory carrying costs, over 50% safety stock reduction, faster planning cycles, and a 10%+ increase in margin, all of which is articulated in Pluto7’s 2:10 rule of revenue growth with planning error reduction.
Looking ahead
Agentic AI isn’t the future – it’s already here, redefining supply chain planning and enabling planners to become superheroes. Businesses can explore how Google Cloud’s Agentspace and Pluto7’s Planning in a Box can enable intelligent orchestration and real-time responsiveness in their supply chains.
Welcome to the second Cloud CISO Perspectives for July 2025. Today, Andy Wen, director, product management, Workspace Security, discusses new efforts we’re making to defend against identity-based cyberattacks.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
The evolving threat landscape: Beyond traditional 2FA
By Andy Wen, director, product management, Workspace Security
Threat actors relentlessly pursue identity-based attacks, understanding that compromised credentials are a direct path to their objectives. To counter those attacks, we’re constantly innovating at Google, and we have some good news involving two critical innovations developed in close partnership with the wider security community.
Stolen credentials, email phishing, brute-force, and other identity-based vectors comprised 37% of successful breaches in 2024, in large part because of the rise of infostealers, a method threat actors rely on to scale identity attacks, according to Mandiant’s most recent M-Trends report. These initial breaches can frequently escalate into costly ransomware incidents and data exfiltration.
Google has long been a leader in security, and last year we announced that we are making two-factor authentication (2FA) mandatory for Google Cloud customers. We’re now taking security a step further and introducing new capabilities to keep customers secure.
How passkeys and Device Bound Session Credentials can help
To empower users and customers against identity-based attacks, we’ve introduced two critical innovations developed in close partnership with the wider security community: passkeys and Device Bound Session Credentials (DBSC). These advancements are designed to significantly strengthen account security and prevent account takeovers.
We highly recommend that all Workspace customers, especially those with high-value users such as IT administrators and business leaders, implement these controls.
Use passkeys for a simpler, more secure sign-in
We have made passkeys generally available to all 11 million Workspace organizations and billions of Google consumer users. Passkeys represent a fundamental shift away from passwords, offering a simpler and inherently more secure sign-in experience.
Unlike traditional passwords that can be guessed, stolen, and forgotten, passkeys are unique digital credentials cryptographically tied to your device. They use the robust FIDO2 technology, the same underlying standard used in hardware security keys like our Titan Security Key, and the added convenience of using a device you already own, such as an Android phone or a Windows laptop.
While absolute security remains an elusive goal, passkeys and security keys virtually eliminate password-based account takeover and phishing threats. As a founding member and steadfast supporter of the FIDO Alliance, we are encouraged by the growing industry adoption of FIDO technology.
Disrupt cookie theft with Device Bound Session Credentials
We are also addressing the use of infostealers to exfiltrate session cookies, allowing attackers to bypass password and 2FA controls and access victim accounts from their own devices.
In addition to Mandiant’s M-Trends 2025 report, IBM’s 2025 X-Force Threat Intelligence Index observed an 84% increase in emails delivering infostealers in 2024 compared to the prior year.
In close collaboration with the Chrome team, we are adding a powerful addition to our security arsenal, now in beta: Device Bound Session Credentials (DBSC). DBSC are designed to disrupt cookie theft by creating an authenticated session that is cryptographically bound to a specific device. This innovative approach can significantly mitigate the risk of exfiltrated cookies being used to access accounts from an unauthorized device.
DBSC introduces a new API that enables servers to establish an authenticated session bound to a device. When a session is initiated, the browser generates a unique public-private key pair. The private key is securely stored using hardware-backed storage, such as a Trusted Platform Module (TPM), when available.
The server then issues a regular session cookie. Throughout the session’s lifetime, the browser periodically proves possession of the private key, and the session cookie is refreshed.
This mechanism allows the cookie’s lifetime to be set short enough to render stolen cookies largely useless to attackers. While DBSC currently operates with Chrome and Workspace, numerous server providers, identity providers (IdPs) like Okta, and other browsers such as Microsoft Edge, have expressed strong interest in adopting DBSC to protect their users from cookie theft.
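To make the flow concrete, here is a heavily hedged, server-side sketch in Python. The header names and value syntax follow the public DBSC explainer drafts; the protocol is in beta, so expect details to change:

```python
# Heavily hedged, server-side sketch of a DBSC handshake using Flask.
# Header names and value syntax follow the public DBSC explainer drafts and
# may change while the feature is in beta.
from flask import Flask, make_response, request

app = Flask(__name__)

@app.post("/login")
def login():
    # After normal authentication succeeds, ask the browser to start a
    # device-bound session. The browser generates a key pair (private key in
    # hardware-backed storage such as a TPM, where available) and POSTs a
    # signed registration JWT to the endpoint named below.
    resp = make_response("signed in")
    resp.headers["Sec-Session-Registration"] = (
        '(ES256 RS256); path="/dbsc/register"; challenge="nonce-123"'
    )
    # The session cookie can now be short-lived: a stolen copy expires quickly.
    resp.set_cookie("auth_cookie", "opaque-session-value", max_age=600)
    return resp

@app.post("/dbsc/register")
def register():
    registration_jwt = request.get_data(as_text=True)
    # Verify the JWT's signature, then persist the session's public key so
    # later refreshes can require proof that the browser still holds the
    # matching private key before the cookie is reissued.
    return ("", 200)
```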
A combined approach for enhanced security
Combined, passkeys and DBSC can empower organizations to significantly strengthen account security and prevent account takeovers. Both of these security controls are readily available to all Workspace customers, and we strongly advocate for their implementation, particularly for your most critical users such as IT administrators and business leaders.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
Secure cloud. Insecure use. (And what you can do about it): If the cloud is secure, why are there still cloud security breaches? Too many organizations don’t use it securely. Here’s how to change that. Read more.
Tabletopping the tabletop: New perspectives from cybersecurity’s favorite role-playing game: A group of bio-cybersecurity experts did a TTX with us to practice and share ideas on how to respond to real-world challenges — without the real-world risk. Read more.
How to enable Secure Boot for your AI workloads: Secure Boot can help protect AI from the moment GPU-accelerated workloads power up. Here’s how to use it on Google Cloud. Read more.
Too many threats, too much data: new survey. Here’s how to fix that: Operationalizing threat intelligence remains a major challenge, say security and IT leaders in a new survey. Here are the survey results, and four steps security teams can take to get more out of their threat intelligence data. Read more.
Your guide to Google Cloud Security at Black Hat USA 2025: We’re excited to bring our commitment to cybersecurity innovation and simplification to Black Hat. Here’s where to find us, and what we’ll be talking about. Read more.
How SUSE and Google Cloud collaborate on Confidential Computing: Secure sensitive data on Google Cloud using SUSE Linux Enterprise Server (SLES) and Confidential VMs with AMD SEV, AMD SEV-SNP, and Intel TDX. Read more.
Innovate with Confidential Computing: Attestation, Live Migration on Google Cloud: Confidential Computing has evolved rapidly since we first made it available. See what’s new with two key pillars: robust attestation and live migration. Read more.
Introducing OSS Rebuild: Open source, rebuilt to last: OSS Rebuild is a new project to strengthen trust in open-source package ecosystems that can give security teams powerful data to avoid compromise without burden on upstream maintainers. Read more.
We’re taking legal action against the BadBox 2.0 botnet: Recently, our researchers partnered with HUMAN Security and Trend Micro to uncover BadBox 2.0, the largest known botnet of internet-connected TVs. Building on our previous actions to stop these cybercriminals, we filed a lawsuit in New York federal court against the botnet’s perpetrators. Read more.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
Exposing the risks of VMware vSphere Active Directory integration: The common practice of directly integrating vSphere with Microsoft Active Directory can simplify administration tasks, but also creates an attack path frequently underestimated due to misunderstanding the inherent risks. Read more.
Defending your VMware vSphere estate from UNC3944: Take a deep dive into the anatomy of UNC3944’s vSphere-centered attacks, and study our fortified, multi-pillar defense strategy for risk mitigation. Read more.
Ongoing SonicWall SMA exploitation campaign using the OVERSTEP backdoor: Google Threat Intelligence Group (GTIG) has identified an ongoing campaign by a suspected financially-motivated threat actor we track as UNC6148, targeting fully patched end-of-life SonicWall Secure Mobile Access (SMA) 100 series appliances. Read more.
Update on creative phishing attack on prominent academics and critics of Russia: In June, we detailed two distinct campaigns in which a Russia state-sponsored cyber threat actor targeted prominent academics and critics of Russia while impersonating the U.S. State Department. The threat actor is continuing the initial wave of their campaign with changed ASP names, while also trying a new tactic: sending calendar invites in an attempt to convince targets to link an attacker-controlled device to their Microsoft Office 365 account through Microsoft’s device code authentication flow. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Podcasts from Google Cloud
How to accelerate your SIEM journey: Manija Poulatova, director, Security Engineering and Operations, Lloyd’s Banking Group, joins hosts Anton Chuvakin and Tim Peacock for a lively chat on all things SIEM, from migration challenges to AI integration. Listen here.
Governing AI agents, from code to courtroom: The autonomous decision-making and learning capability promise of agentic AI and AI agents presents a unique set of risks across various domains. Anna Gressel, partner at Paul, Weiss, discusses her key areas of concern with Anton and guest host Marina Kaganovich. Listen here.
Cyber-Savvy Boardroom: Harnessing innovation while mastering compliance: Grant Waterfall, partner, PwC, joins Office of the CISO’s Alicja Cade and David Homovich with a deep-dive chat on using compliance to drive innovation. Listen here.
Behind the Binary: A reverse engineer’s journey: Reverse-engineering pioneer Danny Quist talks with host Josh Stroschein about the evolving landscape of binary analysis tools, the constant battle with malware obfuscation, and building one of the first malware repositories for research. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.
Veo 3 has seen massive global adoption with over 70 million videos created since May, and we’ve seen tremendous momentum with our enterprise customers as well. Since its preview launch on Vertex AI in June, enterprise customers have already generated over 6 million videos, showcasing the incredible demand for professional-grade, scalable AI video creation.
Today, we’re building on this momentum with some exciting updates to Veo on Vertex AI.
Veo 3, our most advanced video generation model, is now generally available to everyone on Vertex AI.
Veo 3 Fast, a model designed for speed and rapid iteration, is now generally available to everyone on Vertex AI. It’s a faster way to turn text into video, from narrated product demos to short films.
Coming to public preview on Vertex AI in August, Veo 3 and Veo 3 Fast will also offer image-to-video capabilities to make it possible for you to bring static visuals and images to life. All you have to do is provide the source image along with a text prompt that describes what kind of video you want to create.
How businesses are building with Veo 3 on Vertex AI
Google Cloud customers around the world are using Veo 3 and Veo 3 Fast on Vertex AI to create professional-quality video content with unparalleled efficiency and creative freedom. Let’s look at some examples.
Canva – the design platform used by millions of people worldwide – uses Veo to make it easy for users to create videos for marketing, social media, and more.
“Enabling anyone to bring their ideas to life – especially their most creative ones – has been core to Canva’s mission ever since we set out to empower the world to design. By democratising access to a powerful technology like Google’s Veo 3 inside Canva AI, your big ideas can now be brought to life in the highest quality video and sound, all from within your existing Canva subscription. In true Canva fashion, we’ve built this with an intuitive interface and simple editing tools in place, all backed by Canva Shield.” – Cameron Adams, co-founder and Chief Product Officer, Canva
But the momentum extends beyond design. The team at BarkleyOKRP, a leading ad agency, is using Veo 3 to speed up video production timelines.
“The rapid advancements from Veo 2 to Veo 3 within such a short time frame on this project have been nothing short of remarkable. Our team undertook the task of re-creating numerous music videos initially produced with Veo 2 once Veo 3 was released, primarily due to the significantly improved synchronization between voice and mouth movements. The continuous daily progress we are witnessing is truly extraordinary.” – Julie Ray Barr, Senior Vice President Client Experience, BarkleyOKRP
At global investing platform eToro, the team is making marketing iterations a breeze with Veo 3.
“At eToro, innovation is in our DNA. As a global investing platform serving clients in 75 countries, local storytelling isn’t optional – it’s essential. With Veo 3, we produced 15 fully AI‑generated versions of our ad, each in the native language of its market, all while capturing real emotion at scale. Ironically, AI didn’t reduce humanity – it amplified it. Veo 3 lets us tell more stories, in more tongues, with more impact.” – Shay Chikotay, Head of Creative & Content, eToro
Razorfish, an interactive agency and part of the Publicis Groupe, is using Veo to bring creative to life.
“For The Morelandos, our campaign with Visit Orlando and Google, we used the full Vertex AI stack—Gemini to mine real reviews, Imagen to bring the characters to life, and Veo to give them motion. Veo let us go from story to near-cinematic video in a fraction of the usual time—which meant more room to explore, iterate, and push the idea further.” – Anthony Yell, Chief Creative Officer, Razorfish
Synthesia, a leading synthetic media generation company, is using Veo to contextually adapt visuals to its hyper-realistic AI avatars and voices.
“Veo 3 represents a leap forward in generative AI, and its integration into Synthesia’s platform will redefine how businesses create video content. By combining our hyper-realistic AI avatars and voices with Veo-powered fully contextual visuals that adapt to each unique story, we’re giving enterprise teams the creative power to communicate with unrivalled clarity and impact.” – Bill Leaver, Product Manager, Synthesia
How enterprises can use Veo 3 Fast for speed and creativity
Veo 3 Fast is a great fit for work that requires rapid iteration and speed. It strikes an ideal balance between processing time and visual quality, making it especially helpful for:
Quickly generating and testing variations of ad concepts to respond to market trends.
Efficiently creating video demonstrations for entire product catalogs from still images.
Developing engaging animated explainers and training modules in less time.
Veo 3 and Veo 3 Fast on Vertex AI mean even more capabilities for enterprise storytelling
Veo 3 and Veo 3 Fast are designed to give creators the control and quality needed to move beyond short clips and produce complete, compelling narratives. Here are some of the core features now generally available on Vertex AI.
Create scenes with native audio: Veo 3 generates video and audio in a single step. This means you can create scenes with characters that speak with accurate lip-syncing, and sound effects that fit the mood.
Prompt: Talking to the barista from across the counter, a woman in a coffee shop places an order for a cup of coffee with cream and sugar, and a chocolate croissant. The barista listens to the order, responds sure 🙂 and then turns to the commercial espresso machine that is behind him. The woman patiently waits across the counter as her order is being prepared by the barista.
Deliver professional quality at enterprise scale: Veo 3 produces high-definition (1080p) video, suitable for professional marketing campaigns, product demonstrations, and internal communications. You can create content that meets brand standards, saving time and money.
Prompt: An eye-level shot, zooming in on a photorealistic scene of a person sculpting a pot on a pottery wheel in a well lit pottery studio.
Simplify content localization for global audiences: Veo 3’s native dialogue generation helps businesses connect with an international audience by producing a video once and localizing the dialogue for dozens of languages.
Prompt: An eye-level shot of a confident young woman in a dark floral halter-neck dress standing on a stone bridge with the Eiffel Tower softly blurred in the background. Her dark hair and the fabric of her dress flutter gently in the wind as the light subtly changes around her. The distant hum of city traffic fills the air. She says out loud: La confiance est mon accessoire préféré. Cette robe vient juste après.
Image-to-video (coming to public preview on Vertex AI in August): Veo 3 and Veo 3 Fast can also take a single image – a photo you upload or an AI-generated image – and animate it into an 8-second video clip. This feature is particularly powerful for content creators, marketers, and businesses looking to animate existing visual assets, create engaging social media content, or generate compelling product demonstrations from high-quality images (a minimal API sketch follows the prompt below).
Prompt: The artist continues to work as the camera pans around showing a shop full of stained glass creations.
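To make this concrete, here’s a minimal sketch of what calling Veo from Python could look like using the google-genai SDK. Treat it as illustrative: the project ID and model ID are placeholders, and the exact model name and response fields may differ by SDK version, so check the Vertex AI documentation before relying on it.

import time

from google import genai

# Placeholders: substitute your own project; the Veo model ID shown is illustrative.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

# Start an asynchronous video-generation job from a text prompt.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # hypothetical model ID
    prompt="A potter shapes a vase on a wheel in a sunlit studio, camera slowly circling.",
)

# Video generation is long-running, so poll until the operation completes.
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Depending on SDK version, results live on operation.response or operation.result.
for generated in operation.response.generated_videos:
    print(generated.video.uri)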
Enterprise-grade safety and security
Veo 3 and Veo 3 Fast on Vertex AI are built for scalable and responsible enterprise use. We embed digital watermarks into every frame with SynthID, helping combat misinformation and misattribution. Veo 3 and Veo 3 Fast are also covered by our indemnity for generative AI services.
Get started with Veo 3 and Veo 3 Fast today
To get started, go here to learn more about Veo 3 and Veo 3 Fast on Vertex AI, and try it on Vertex AI Media Studio.
Organizations need ML compute resources that can accommodate bursty peaks and periodic troughs. That means the consumption models for AI infrastructure need to evolve to be more cost-efficient, provide term flexibility, and support rapid development on the latest GPU and TPU accelerators.
Calendar mode is currently available in preview as the newest feature of Dynamic Workload Scheduler. This mode provides short-term ML capacity — up to 90 days of reserved capacity — without requiring long-term commitments.
Calendar mode extends the capabilities of Compute Engine future reservations to provide co-located GPU and TPU capacity that’s a good fit for model training, fine-tuning, experimentation and inference workloads.
Similar to a flight or hotel booking experience, Calendar mode makes it easy to search for and reserve ML capacity. Simply define your resource type, number of instances, expected start date and duration, and in a few seconds, you’ll be able to see the available capacity and reserve it. Once the capacity reservation is confirmed and delivered to your project, you can consume it via Compute Engine, Google Kubernetes Engine (GKE), Vertex AI custom training, and Google Batch.
What customers are saying
Over the past year, early access customers have used Calendar mode to reserve ML compute resources for a variety of use cases, from drug discovery to training new models.
“To accelerate drug discovery, Schrödinger relies on large-scale simulations to identify promising, high-quality molecules. Reserving GPUs through Google Cloud’s DWS Calendar Mode provides us the crucial flexibility and assurance needed to cost-effectively scale our compute environment for critical, time-sensitive projects.” – Shane Brauner, EVP/CIO, Schrödinger
“For Vilya, Dynamic Workload Scheduler has delivered on two key fronts: affordability and performance. The cost efficiency received was a significant benefit, and the reliable access to GPUs has empowered our teams to complete projects much faster, and it’s been invaluable for our computationally intensive tasks. It’s allowed us to be more efficient and productive without breaking the budget.” – Patrick Salveson, co-founder and CTO, Vilya
“Databricks simplifies the deployment and management of machine learning models, enabling fine tuning and real-time inference for scalable production environments. DWS Calendar Mode alleviated the burden of GPU capacity planning and provided seamless access to the latest generation GPU hardware for dynamic demand for testing and ongoing training.” – Ravi Gadde, Sr. Director, Serverless Platform
Using Calendar mode
With these concepts and use cases under our belts, let’s take a look at how to find and reserve capacity via the Google Cloud console. Navigate to Cloud console -> Compute Engine -> Reservation. Then, on the Future Reservation tab, click Create a Future Reservation. Selecting a supported GPU or TPU will expose the Search for capacity section as shown below.
Proceed to the Advanced Settings to determine whether the reservation should be shared across multiple projects. The final step is to name the reservation upon creation.
The reservation is approved within minutes and can be consumed once it is in the Fulfilled status at the specified start time.
Get started today
Calendar mode with AI Hypercomputer makes finding, reserving, consuming, and managing capacity easy for ML workloads. Get started today with Calendar mode for TPUs, and contact your account team for GPU access in Compute Engine, GKE, or Slurm. To learn more, see the Calendar mode documentation and Dynamic Workload Scheduler pricing.
As the excitement around AI agents reaches enterprise customers, a critical question emerges: How can we empower these agents to securely and intelligently interact with enterprise data systems like Google Cloud BigQuery?
Until now, developers building agentic applications have been forced to build and maintain their own custom tools, a slow, risky process that distracts them from building innovative applications. It also introduces considerable development overhead, as they become responsible for everything from authentication and error handling to keeping pace with BigQuery’s evolving capabilities.
To solve this, we are introducing a new, first-party toolset for BigQuery that includes tools to fetch metadata and execute queries (and we have more on the way):
list_dataset_ids: Fetches BigQuery dataset ids present in a GCP project.
get_dataset_info: Fetches metadata about a BigQuery dataset.
list_table_ids: Fetches table ids present in a BigQuery dataset.
get_table_info: Fetches metadata about a BigQuery table.
execute_sql: Runs a SQL query in BigQuery and fetches the result.
These official, Google-maintained tools provide a secure and reliable bridge to your data, and you can use them in two powerful ways: as a built-in toolset in Google’s Agent Development Kit (ADK), or through the flexible, open-source MCP Toolbox for Databases. This frees you to focus on creating value, not on building foundational plumbing.
In this post, we’ll explore these first-party tools for BigQuery and walk you through how they can be used to build a conversational analytics agent in ADK that can answer natural language questions.
Tutorial: Build a Conversational Analytics Agent using BigQuery’s first-party tools
Our agent will query BigQuery’s public dataset: thelook_ecommerce, a synthetic e-commerce dataset that includes customer details, product inventories, and order histories. The agent’s primary role will be to generate SQL queries and provide meaningful responses to common business questions, such as: What are the top-selling products? Which products are frequently ordered together? And how many customers do we have in Colombia?
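To get a feel for what the agent will do under the hood, here is the kind of SQL its execute_sql tool would run for the last of those questions, shown as a minimal sketch with the BigQuery Python client; the dataset is BigQuery’s public thelook_ecommerce dataset, and the table and column names are assumptions based on its published schema.

from google.cloud import bigquery

client = bigquery.Client()

# The kind of query the agent's execute_sql tool would generate for:
# "How many customers do we have in Colombia?"
query = """
    SELECT COUNT(*) AS customer_count
    FROM `bigquery-public-data.thelook_ecommerce.users`
    WHERE country = 'Colombia'
"""

for row in client.query(query).result():
    print(f"Customers in Colombia: {row.customer_count}")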
If you’re new to ADK, this page provides an overview of its core concepts and components; otherwise, let’s dive in!
Choose your model, select Vertex AI as the backend, and confirm your project ID and region:
You should now have a new folder named bq-agent-app. Navigate to agent.py and update the root LLM-Agent to reflect our conversational analytics agent:
root_agent = Agent(
    model="gemini-2.0-flash",
    name="bigquery_agent",
    description=(
        "Agent that answers questions about BigQuery data by executing SQL queries"
    ),
    instruction="""You are a data analysis agent with access to several BigQuery tools.
    Make use of those tools to answer the user's questions.""",
    tools=[bigquery_toolset],
)
When defining your agent, you provide a unique name, specify the underlying LLM model, and can optionally include a description that helps other agents understand its purpose. The agent’s core task or goal is defined in the instructions.
Finally, to enable the agent to interact with your data, it must be equipped with tools that let it work with BigQuery, so it can discover the available datasets and tables and, of course, execute queries. Let’s consider our options for using BigQuery’s first-party toolset.
Option 1: Use ADK’s new built-in toolset for BigQuery
This first-party toolset is owned and maintained by Google. To assign these tools to your agent, import the BigQueryToolset from the google.adk.tools.bigquery module and then initialize the toolset:
import os

import google.auth
from google.adk.auth import AuthCredentialTypes
from google.adk.tools.bigquery import BigQueryCredentialsConfig, BigQueryToolset
from google.adk.tools.bigquery.config import BigQueryToolConfig, WriteMode

# Define an appropriate credential type
CREDENTIALS_TYPE = AuthCredentialTypes.OAUTH2

# Write modes define the agent's BigQuery access control:
# ALLOWED: tools have full write capabilities.
# BLOCKED: default mode; effectively makes the tools read-only.
# PROTECTED: only allows writes on temporary data for a given BigQuery session.
tool_config = BigQueryToolConfig(write_mode=WriteMode.ALLOWED)

if CREDENTIALS_TYPE == AuthCredentialTypes.OAUTH2:
    # Initialize the tools to do interactive OAuth.
    credentials_config = BigQueryCredentialsConfig(
        client_id=os.getenv("OAUTH_CLIENT_ID"),
        client_secret=os.getenv("OAUTH_CLIENT_SECRET"),
    )
elif CREDENTIALS_TYPE == AuthCredentialTypes.SERVICE_ACCOUNT:
    # Initialize the tools to use the credentials in the service account key.
    creds, _ = google.auth.load_credentials_from_file("service_account_key.json")
    credentials_config = BigQueryCredentialsConfig(credentials=creds)
else:
    # Initialize the tools to use the application default credentials.
    application_default_credentials, _ = google.auth.default()
    credentials_config = BigQueryCredentialsConfig(
        credentials=application_default_credentials
    )

bigquery_toolset = BigQueryToolset(
    credentials_config=credentials_config,
    bigquery_tool_config=tool_config,
    tool_filter=[
        "list_dataset_ids",
        "get_dataset_info",
        "list_table_ids",
        "get_table_info",
        "execute_sql",
    ],
)
You can use the tool_filter parameter to filter the tools you’d like to expose to the agent.
Provide an OAuth 2.0 client_id and secret. This approach is typically used when an application needs a user to grant it permission to access their BigQuery data.
For more granular control over your interaction with BigQuery, you can of course create your own custom function tools, which are implemented as Python functions that you expose to your agent.
When tools are implemented directly within an agent, even with built-in toolsets, the agent or application is responsible for managing its authentication to BigQuery, as well as the logic and implementation for each tool. This tight coupling creates challenges: updates to a tool or changes in its BigQuery connection method require manual modification and redeployment for every agent, which can lead to inconsistencies and maintenance overhead.
Option 2: Use BigQuery’s pre-built tools in MCP Toolbox for Databases
The MCP (Model Context Protocol) Toolbox for Databases is an open-source server that centralizes the hosting and management of toolsets, decoupling agentic applications from direct BigQuery interaction. Instead of managing tool logic and authentication themselves, agents act as MCP clients, requesting tools from the Toolbox. The MCP Toolbox handles all the underlying complexities, including secure connections to BigQuery, authentication and query execution.
This centralized approach simplifies tool reuse across multiple agents, streamlines updates (tool logic can be modified and deployed on the Toolbox without requiring changes to every agent), and provides a single point for enforcing security policies.
Want to host your own custom tools in MCP Toolbox for Databases?
You can define your own custom tools in SQL within a tools.yaml configuration file and provide the --tools-file option when starting your server. You cannot, however, use the --prebuilt and --tools-file options together. If you want to use custom tools alongside prebuilt tools, you must use the --tools-file option and manually specify the prebuilt tools you want to include in the configuration file.
To connect your ADK application to the MCP Toolbox for Databases, you need to install toolbox-core:
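A minimal sketch of that wiring might look like the following; the server URL and toolset name are assumptions for illustration, and the exact client API may vary by toolbox-core version.

# pip install toolbox-core
from toolbox_core import ToolboxSyncClient

# Connect to a running MCP Toolbox server (URL is a placeholder; use your own).
toolbox = ToolboxSyncClient("http://127.0.0.1:5000")

# Load the BigQuery tools by the toolset name registered in the Toolbox config.
bigquery_toolset = toolbox.load_toolset("my-bigquery-toolset")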
Assign either the built-in ADK toolset or the MCP toolset to your agent, and you’re ready to go!
root_agent = Agent(
    model="gemini-2.0-flash",
    name="bigquery_agent",
    description=(
        "Agent that answers questions about BigQuery data by executing SQL queries"
    ),
    instruction="""You are a data analysis agent with access to several BigQuery tools.
    Make use of those tools to answer the user's questions.""",
    tools=[bigquery_toolset],
)
You can now run your agent using the adk run or adk web command and start asking questions about your data!
Your agent will leverage the pre-built tools to extract dataset metadata, and then generate and execute a SQL query in BigQuery to retrieve your result.
Get started
Dive into these tutorials and start building your conversational analytics agent today:
Anthropic’s Claude models on Vertex AI now have improved overall availability with the new global endpoint. Now generally available, the global endpoint can dynamically route your requests to any region with available capacity that supports the Claude model you’re using. This helps you deploy Claude-powered applications and agents with more uptime and dependability.
During the preview period, customers like Replicate experienced the benefits of the global endpoint firsthand. Zeke Sikelianos, founding designer at Replicate, noted: “people use Replicate because they want to deploy AI models at scale. Claude on Vertex AI fits perfectly with that — we get one of the best language models available, with Google’s solid infrastructure and the global endpoint that delivers fast responses worldwide. It just works.”
The global endpoint is launching with support for pay-as-you-go traffic for the following Claude models:
Claude Opus 4
Claude Sonnet 4
Claude Sonnet 3.7
Claude Sonnet 3.5 v2
What are global endpoints and when should you use them?
When you send a request to Anthropic’s Claude models on Vertex AI, you typically specify a region (e.g., us-central1). This is a regional endpoint, which keeps your data and processing within that geographical boundary—ideal for applications with strict data residency requirements.
The global endpoint, by contrast, does not tie your request to a single region. Instead, it directs traffic to a global entry point that dynamically routes your request to a region with available capacity. This multi-region approach is designed to maximize availability and reduce errors that can arise from high traffic in a given region.
So, when is the global endpoint the right choice?
If your application requires the highest possible availability and your data is not subject to residency restrictions, the global endpoint is an excellent fit.
It is also a strong choice if your services face regional capacity limits, or if you are architecting for maximum resilience against regional disruptions.
However, if you have data residency requirements (specifically for ML processing), you should continue to use regional endpoints, as the global endpoint does not guarantee that requests will be processed in any specific location. Here’s a simple breakdown of global versus regional endpoints:
Global versus regional endpoints

| | Global endpoint | Regional endpoint |
| --- | --- | --- |
| Availability | Maximized by leveraging multi-region resources | Dependent on single-region capacity and quota |
| Latency | May be higher in some cases due to dynamic global routing | Optimized for low latency within the specified region |
| Quota | Uses a separate, independent global quota | Uses the quota assigned to the specific region |
| Use case | High-availability applications without data residency needs | Applications with strict data residency requirements |
| Traffic type | Pay-as-you-go | Pay-as-you-go and Provisioned Throughput (PT) |
By giving you the choice between global and regional endpoints, Vertex AI empowers you to build more sophisticated, resilient, and scalable generative AI applications and agents that meet your specific architectural and business needs.
Prompt caching and pay-as-you-go pricing
As part of this launch, prompt caching is fully supported with global endpoints. When a prompt is cached, subsequent identical requests will be routed to the region holding the cache for the lowest latency. If that region is at capacity, the system will automatically try the next available region to serve the request. This integration ensures that users of global endpoints still receive the benefits of prompt caching (lower latency and lower costs).
Note that at this point, the global endpoint for Claude models only supports pay-as-you-go traffic. Provisioned Throughput is available on regional endpoints only.
Global endpoint requests are charged the same price as regional endpoint requests.
Best practices
To get the most out of this new feature, we recommend routing your primary traffic to the global endpoint. Use regional endpoints as a secondary option, specifically for workloads that must adhere to data residency rules. To ensure the best performance and avoid unnecessary cost, please do not submit the same request to both a global and a regional endpoint simultaneously.
A new, separate global quota is available for this feature. You can view and manage this quota on the “Quotas & System Limits” page in the Google Cloud console and request an increase if needed.
How to get started
Getting started with the global endpoint for Anthropic’s Claude models on Vertex AI takes only two steps:
Step 1: Select and enable a global endpoint supported Claude model on Vertex AI (Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.7, Claude Sonnet 3.5 v2).
Step 2: In your configuration, set “global” as the location variable value and use the global endpoint URL:
https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/PUBLISHER_NAME/models/MODEL_NAME
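For reference, a request through the global endpoint with Anthropic’s Vertex SDK might look like this minimal sketch; the project ID and model version string are placeholders, so substitute the values for your environment.

from anthropic import AnthropicVertex

# region="global" routes the request through the global endpoint.
client = AnthropicVertex(project_id="my-project", region="global")

message = client.messages.create(
    model="claude-sonnet-4@20250514",  # placeholder model version string
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize the benefits of global endpoints."}],
)
print(message.content[0].text)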
An overwhelming volume of threats and data combined with the shortage of skilled threat analysts has left many security and IT leaders believing that their organizations are vulnerable to cyberattacks and stuck in a reactive state.
That’s according to the new Threat Intelligence Benchmark, a commissioned study conducted by Forrester Consulting on behalf of Google Cloud, on the threat intelligence practices of more than 1,500 IT and cybersecurity leaders from eight countries and across 12 industries.
Operationalizing threat intelligence remains a major challenge, said a majority of the survey’s respondents.
“Rather than aiding efficiency, myriad [threat intelligence] feeds inundate security teams with data, making it hard to extract useful insights or prioritize and respond to threats. Security teams need visibility into relevant threats, AI-powered correlation at scale, and skilled defenders to use actionable insights, enabling a shift from a reactive to a proactive security posture,” said the study.
Data and analytical challenges organizations face in improving their threat intelligence capabilities.
Organizations today face a multifaceted, compound problem: they have too few analysts who can effectively interpret and act on threat intelligence, and those analysts face too many data feeds supplying raw intelligence. This has led many security and IT leaders to worry that they are missing critical needles in the haystack, ultimately making it harder to act against legitimate cyberattacks.
82% of respondents worry about missing threats due to the volume of alerts and data they face.
We believe the key is to embed threat intelligence directly into security workflows and tools, so it can be accessed and analyzed quickly and effectively. AI has a vital role in this integration: it helps synthesize raw data, manage repetitive tasks, and reduce toil, freeing human analysts to focus on critical decision-making.
Key takeaways from the survey
Organizations value threat intelligence: More than 80% of organizations already use threat intelligence, or plan to, across eight major use cases.
Improving threat intelligence capabilities is challenging: Too many feeds (61%), too few analysts (60%), hard to derive clear action from threat intelligence data (59%), and difficulty determining which threats are valid (59%) were cited as the top challenges to actioning threat intelligence. All told, 82% are concerned about missing threats due to the volume of alerts and data.
Organizational blind spots: 80% of respondents said their senior leadership team underestimates threats to the organization, and 66% said they struggle to share threat intelligence with relevant teams.
Stuck in reactive mode: Too much data leaves security teams struggling to prioritize threats, creating significant security gaps. As a result, 86% of respondents said that their organization needs to improve its understanding of the threat landscape, 85% of respondents say that their organization could focus more time and energy on emerging critical threats, and 72% of respondents said they are mostly reactive to threats.
Helping defenders with AI: 86% of respondents agreed that they “must” use AI to improve their ability to operationalize threat intelligence. When asked about the benefits of using AI in threat intelligence, improving efficiency by generating easy-to-read summaries was cited most frequently (69%).
Organizations are using AI to help in a number of ways, including summarization, prioritization, and communication.
The Threat Intelligence Benchmark study underscores how complex the problem is, but we also see a path forward for even under-resourced organizations to get the most out of their threat intelligence. Through our engagements with customers and the broader threat intelligence community, we’ve developed suggestions on how organizations can maximize the resources they’ve already dedicated to threat intelligence.
How to operationalize threat intelligence more effectively
At Google Cloud, we’re strong advocates for security and IT leaders to integrate threat intelligence into their security environments as part of a comprehensive layered defense. The raw data of threat intelligence can be used to prevent, detect, and respond to attacks — as well as to inform broader strategic decision-making across the organization.
Here are four tactical steps to help you get started.
Step 1: Identify high-stakes intelligence needs
Security teams should use threat intelligence as a strategic tool to focus on the threats that are most relevant to their organization. It can be crucial in shaping the organization’s cyber threat profile (a structured way to identify, analyze, and prioritize potential cyber threats) and can help protect against the threats that matter most.
Define your crown jewels: Identify your most critical assets, data, and business functions, and calculate the impact if they’re compromised. This directly informs your Priority Intelligence Requirements (PIRs).
Know your adversaries: Pinpoint the threat actors most likely to target your IT environment and the industry that your organization operates in. Study their common tactics, techniques, and procedures (TTPs). Focus on intelligence related to these groups and their methods of intrusion.
Establish a feedback loop: Regularly ask your incident response (IR) and security operations center (SOC) teams about the threat intelligence that could have helped them prevent, detect, and respond faster to recent incidents. Their answers can be used to refine PIRs.
Understand how security enables the organization: Developing robust threat intelligence analysis is all about supporting smarter, faster decisions. Security should be a close partner to leadership and other teams, focused on enabling the organization to achieve its goals while minimizing risk.
Step 2: Build a tactical threat intelligence pipeline
In cybersecurity, efficiency is key. The goal is to get threat intelligence from source to action as quickly as possible.
Centralized aggregation: Implement a Threat Intelligence Platform (TIP) and use existing security information and event management (SIEM) capabilities to ingest, normalize, and de-duplicate threat intelligence from all sources (OSINT, commercial feeds, ISACs, dark web monitoring).
Automated enrichment: Automatically enrich incoming indicators (IPs, domains, hashes) with context such as geolocation, reputation scores, and associated threat actors. Tools should do the heavy lifting.
Prioritization engine: Instead of letting analysts manually triage thousands of alerts, develop rules in your TIP and SIEM to automatically score and prioritize intelligence based on its relevance to your PIRs and its severity (a minimal sketch of this kind of scoring follows this list).
Direct integration with controls: Push relevant, high-fidelity indicators and detection rules directly to firewalls and proxies, endpoint and extended detection and response (EDR and XDR) tools, intrusion detection and prevention systems (IDS and IPS), and SIEM systems.
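As an illustration of the scoring logic such a prioritization engine encodes, here is a minimal, tool-agnostic sketch in Python; the fields, weights, and PIR tags are hypothetical and would come from your TIP or SIEM in practice.

from dataclasses import dataclass

# Hypothetical PIR tags defined by the organization.
PRIORITY_TAGS = {"ransomware", "credential-theft", "supply-chain"}

@dataclass
class Indicator:
    value: str       # e.g., an IP, domain, or file hash
    severity: int    # 0-100, from the feed or enrichment
    reputation: int  # 0-100, lower means worse reputation
    tags: set        # threat categories attached during enrichment

def priority_score(ind: Indicator) -> float:
    """Weight severity and poor reputation, then boost PIR-relevant indicators."""
    pir_boost = 25.0 if ind.tags & PRIORITY_TAGS else 0.0
    return 0.6 * ind.severity + 0.4 * (100 - ind.reputation) + pir_boost

indicators = [
    Indicator("198.51.100.7", severity=80, reputation=10, tags={"ransomware"}),
    Indicator("example-cdn.net", severity=30, reputation=70, tags={"adware"}),
]

# Triage queue: highest-priority indicators first.
for ind in sorted(indicators, key=priority_score, reverse=True):
    print(f"{ind.value}: {priority_score(ind):.1f}")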
Step 3: Empower security teams
Two important ways to make threat intelligence genuinely helpful to IT and security professionals are freeing analysts from toil and focusing on training and tooling.
Analyst focus: Free up your SOC and IR analysts from data ingestion and basic correlation. Their time is better spent on proactive threat hunting, contextualizing alerts, developing custom detections, and augmenting incident response.
Training and expertise: 79% of survey respondents said that external threat intelligence providers should help “uplevel junior staff or embed a threat intelligence (CTI) analyst” into their team. Give analysts focused training on shifting to a more intelligence-led approach, and provide threat intelligence expertise tailored to your organization.
Step 4: Measure and adapt continuously
Threat intelligence operationalization is an ongoing cycle, not a one-time project.
Key metrics: Track these key threat intelligence metrics and ask the following questions for each:
Mean time to detect (MTTD) and mean time to respond (MTTR) reduction: Does threat intelligence help us detect and respond to threats faster?
Alert fidelity: Are we seeing fewer false positives due to better-contextualized alerts from threat intelligence?
Blocked threats: How many threats were proactively blocked by systems fed with threat intelligence?
Hunting success: How many new threats were identified through intelligence-led hunting?
Regular reviews: Monthly or quarterly review of PIRs, threat intelligence sources, and the effectiveness of integrations can help keep your threat intelligence strategy current.
Incident-driven refinement: After every significant incident, conduct a lessons-learned session specifically on the contributions that threat intelligence made to the incident response.
How Google Threat Intelligence can help
Despite concerns about data overload, 80% of survey respondents said that threat intelligence providers should offer information sources that are both broad and deep. Security teams should feel confident that they have a holistic view of available intelligence.
Augmented by advanced AI, Google Threat Intelligence provides unparalleled visibility into threats, enabling us to deliver detailed and timely threat intelligence to security teams around the world. It combines Mandiant frontline expertise, the global reach of the VirusTotal community, and the breadth of visibility only Google can deliver.
Our Advanced Intelligence Access (AIA) and Essential Intelligence Access (EIA) programs provide organizations with access to embedded and targeted intelligence experts, as well as early access to threat data. Mandiant Academy offers training courses for security professionals, including many focused on how to best consume and apply threat intelligence to improve tactical defenses and overall security posture.
At Google Cloud Security, our mission is to empower organizations to strengthen their defenses with innovative security capabilities, all while simplifying and modernizing their cybersecurity. In a world of evolving threats and increasing complexity, we believe true security comes from clarity, not more noise.
We’re excited to bring this commitment to innovation and simplification to Black Hat USA 2025, where you can discover how Google Cloud Security and Mandiant can help you navigate the complex threat landscape, adopt agentic security, and make Google an extension of your security team.
From connecting with our security experts to witnessing innovative cloud security technology in action, we’re offering Black Hat attendees a packed schedule of booth activities, insightful sessions, and exclusive events.
Visit our booth and connect with experts
Booth #2240 is where you can meet the Google Cloud Security team. Discover our latest innovations and learn directly from Mandiant experts about the techniques and tactics from their most recent investigations. See firsthand how agentic security can help you detect and remove threats more effectively and make your security team more productive.
Experience our expanded demo landscape
Catch our on-demand product and service demos during Business Hall/Expo hours to learn how Google Cloud Security can protect your organization. Plus, connect with our security experts and partners to discuss your specific needs.
Google Threat Intelligence: Experience how you can get ahead of the latest threats with Google Threat Intelligence. Know who’s targeting you and focus on the most relevant threats to your organization.
Google Security Operations: Discover how our intelligence-driven, AI-powered security operations platform combines Google’s hyper-scale infrastructure with unparalleled visibility into and understanding of cyber adversaries, enabling security teams to uncover the latest cyber threats in near real-time.
AI for Defenders: Learn how AI agents in Google Cloud Security products can autonomously investigate threats, triage alerts, and resolve misconfigurations. Join us as we demo how AI agents can automate manual and repetitive tasks to help you move from insight to action faster.
Cloud Security: Explore how Google Cloud provides built-in, secure controls to help you maintain a strong cloud security posture. See in action how the products recommended in Google Cloud’s Security Foundation help address the most common cloud adoption use cases.
Mandiant Incident Response: Learn how Mandiant uses frontline experience with threat intelligence and incident response to help organizations like yours tackle top cloud security challenges.
Chrome Enterprise: Stop by to find out why Chrome is the most trusted enterprise browser, meeting the secure enterprise browsing needs of today’s workforce.
Join us at Google Cloud Security Hub
Beyond the main expo hall, make your way to the Google Cloud Security Hub, located conveniently in The Cove next to Libertine Social at Mandalay Bay. From the expo hall, head past the Starbucks, and our Customer Hub will be on your right. Here’s a detailed map for easy way-finding:
How to find Google Cloud at the conference.
The Hub is home to several exclusive events and spaces:
Enjoy the exclusive Customer Lounge
Looking for a place to recharge and connect in a more relaxed setting? If you schedule a meeting with our team, you’ll gain exclusive access to our Customer Lounge at the Google Hub. We’ll have snacks, beverages, and a comfortable space for you to take a break from the conference floor. Reach out to your sales representative to schedule your meeting and get on the guest list.
Unwind at the Google Cloud Security Happy Hour
Join us for the Google Cloud Security Happy Hour on Wednesday, Aug. 6, from 5:00 p.m. to 7:00 p.m., at the Google Hub for a relaxed evening of networking. It’s the perfect opportunity to unwind after a day of briefings and connect with our team and your peers.
Attend the Threat Briefing and dinner
Customers are invited to join us for an exclusive Threat Briefing and Dinner on Tuesday, Aug. 5, from 6:00 p.m. to 9:00 p.m., at the Google Hub. You’ll gain deep insights from Mandiant Intelligence, with a special briefing from Luke McNamara, deputy chief analyst.
Enhance your skills with Mandiant Academy training
Improve your expertise with hands-on training directly from Mandiant’s frontline cybersecurity experts. Mandiant Academy is offering the following courses during Black Hat (requires prior registration):
With your Briefing conference pass, you can attend these sessions where Google Cloud Security and Mandiant experts will share their insights:
Bridging the AI reality gap: Join Vijay Ganti (director, product management, Google Cloud Security) and Spencer Lichtenstein (group product manager, Google Security Operations) as they pull back the curtain on AI in security. In the session, they’ll dive deep into how Google is integrating AI into its security products. You’ll learn about the rigorous data science processes we use to measure every task of the end-to-end system, and why this meticulous approach is crucial for giving you an edge against threat actors. We’ll also share the latest, most impactful agent demos.
Participate in an OT Incident Response: Join Tim Gallo (head of global solution architects, Google Cloud Security) and Paul Shaver (global OT security lead, Google Cloud Security) for a unique, interactive session where you can experience what it’s truly like to navigate a critical operational technology (OT) incident. In this live session, you’ll step into the shoes of a Mandiant Incident Responder as we guide you through a simulated OT incident. You’ll see firsthand the crucial decision points, compare your choices with those of our experts, and gain invaluable insights into the complexities of real-world OT incident response.
Autonomous Timeline Analysis and Threat Hunting: An AI Agent for Timesketch: In this talk, we will present the first AI-powered agent capable of autonomously performing digital forensic analysis on the large and varied log volumes typically encountered in real-world incidents. We will demonstrate the agent’s proficiency in threat hunting and evaluate our technique on a dataset of 100 diverse, real-world compromised systems.
The Ransomware Response Playbook: Join this session where security experts will discuss how best to prepare for and handle a ransomware extortion attack against your business. This panel discussion will explore critical questions such as: Where is the malicious payload and how is it spreading? How do you interact and barter with your attacker (or not)? Who do you call? Are your backups protected?
At its core, FACADE is a novel self-supervised ML system that detects suspicious actions by analyzing the context of corporate logs, leveraging a unique contrastive learning strategy. This, combined with an innovative clustering approach, leads to unparalleled accuracy: a false positive rate under 0.01%, and as low as 0.0003% for single rogue actions. This session will not only present the underlying technology but also demonstrate how to use the recently released FACADE open-source version to protect your own organization.
Threat Space Workshop: Join Nadean Tanner for this hands-on experience with Harbinger, an AI-powered red teaming platform for streamlined operations and enhanced decision-making.
Learn about open-source solutions at Arsenal
Harbinger: An AI-Powered Red Teaming Platform for Streamlined Operations and Enhanced Decision-Making: Harbinger is an AI-powered platform that streamlines your workflow by integrating essential components, automating tasks, and providing intelligent insights. It consolidates data from various sources, automates playbook execution, and uses AI to suggest your next moves, making red teaming more efficient and effective. With Harbinger, you can focus on what matters most – achieving your objectives and maximizing the impact of your assessments.
Timesketch: AI-Powered Super Timeline Analysis: Timesketch is a leading free, open-source tool (licensed under Apache-2.0) for collaborative forensic-timeline analysis, with more than 2.6k stars on GitHub. At Arsenal, we will announce and showcase the Timesketch AI extension, designed to drastically speed up human analysts, identify the root cause of compromises, and improve incident reaction time. This demo will showcase AI-driven investigations in Timesketch, highlighting its ability to:
Autonomously analyze timelines, answer investigative questions, identify key events, and find the root cause of compromises.
Provide interactive review, empowering analysts to verify, edit, and refine AI-generated findings with clear links to supporting facts, emphasizing human validation.
Facilitate collaborative timeline analysis by integrating with Timesketch’s collaborative environment, enabling teamwork on AI-powered investigations.
Meet you there
Black Hat USA 2025 promises to be an impactful week, and Google Cloud Security is ready to share valuable knowledge and innovative solutions. We encourage you to make the most of your time by visiting our booth, attending our sessions, re-energizing at the Google Cloud Security Hub, and connecting with our team.
We’re eager to discuss your security challenges and demonstrate how Google can be your strategic security partner in the face of evolving threats.
Developers building with gen AI are increasingly drawn to open models for their power and flexibility. But customizing and deploying them can be a huge challenge. You’re often left wrestling with complex dependencies, managing infrastructure, and fighting for expensive GPU access.
Don’t let that complexity slow you down.
In this guide, we’ll walk you through the end-to-end lifecycle of taking an open model from discovery to a production-ready endpoint on Vertex AI. We’ll use fine-tuning and deploying Qwen3 as our example, showing you how Vertex AI handles the heavy lifting so you can focus on innovation.
Part 1: Quickly choose the right base model
So you’ve decided to use an open model for your project. But which model, on what hardware, and with which serving framework? The open model universe is vast, and the “old way” of finding the right model is time-consuming. You could spend days setting up environments, downloading weights, and wrestling with requirements.txt files just to run a single test.
This is a common place for projects to stall. But with Vertex AI, your journey starts in a much better place: the Vertex AI Model Garden, a curated hub that simplifies the discovery, fine-tuning, and deployment of cutting-edge open models. It offers more than 200 validated options (and growing!), including popular choices like Gemma, Qwen, DeepSeek, and Llama. Comprehensive model cards offer crucial information, including details on recommended hardware (such as GPU types and sizes) for optimal performance. Additionally, Vertex AI has default quotas for dedicated on-demand capacity of the latest Google Cloud accelerators to make it easier to get started.
Qwen 3 Model card on Vertex AI Model Garden
Importantly, Vertex AI conducts security scans on these models and their containers, adding a layer of trust and mitigating potential vulnerabilities from the outset. Once you’ve found a model for your use case, like Qwen3, Model Garden provides one-click deployment options and pre-configured notebooks, making it easy to deploy the model as an endpoint using the Vertex AI inference service, ready to be integrated into your application.
Qwen3 Deployment options from Model Garden
Additionally, Model Garden provides optimized serving containers (often leveraging vLLM, SGLang, or Hex-LLM) specifically designed for high-throughput, performant model serving. Once your model is deployed (via a quick-deploy endpoint or a notebook), you can start experimenting and establishing a baseline for your use case. This baseline lets you benchmark your fine-tuned model later on.
Model Inference framework options
Qwen3 quick deployment on Endpoint
It’s important to incorporate evaluation early in the process. You can leverage Vertex AI’s Gen AI evaluation service to assess the model against your own data and criteria, or integrate open-source frameworks. This essential early validation ensures you confidently select the right base model.
By the end of this experimentation and research phase, you’ll have efficiently navigated from model discovery to initial evaluation ready for the next step.
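Establishing that baseline can be as simple as sending a few representative prompts to the deployed endpoint. Below is a minimal sketch using the Vertex AI SDK; the endpoint ID is a placeholder, and the instance schema assumes a vLLM-style serving container, so adjust both to match your deployment.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder endpoint ID; copy yours from the Model Garden deployment.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# The instance schema below assumes a vLLM-style serving container.
response = endpoint.predict(instances=[{
    "prompt": "Explain the difference between LoRA and QLoRA in two sentences.",
    "max_tokens": 128,
    "temperature": 0.2,
}])
print(response.predictions[0])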
Part 2: Start parameter efficient fine-tuning (PEFT) with your data
You’ve found your base model – in this case, Qwen3. Now for the magic: making it yours by fine-tuning it on your specific data. This is where you can give the model a unique personality, teach it a specialized skill, or adapt it to your domain.
Step 1: Get your data ready. Data loading can often be a bottleneck, but Vertex AI makes it simple. You can seamlessly pull your datasets directly from Google Cloud Storage (GCS) and BigQuery (BQ). For more complex data-cleaning and preparation tasks, you can build an automated Vertex AI Pipeline to orchestrate the preprocessing work for you.
Step 2: Fine-tune hands-on in the notebook. Now you can start fine-tuning your Qwen3 model. Model Garden provides a pre-configured notebook for Qwen3 that uses Axolotl, a popular fine-tuning framework. This notebook already includes optimized settings for techniques like:
QLoRA: A highly memory-efficient tuning method, perfect for running experiments without needing massive GPUs.
FSDP (Fully Sharded Data Parallelism): A technique for distributing a large model across multiple GPUs for larger-scale training.
You can run the Qwen3 fine-tuning process directly inside the notebook. This is the perfect “lab environment” for quick experiments to discover the right configuration for the fine-tuning job.
Step 3: Scale up with Vertex AI Training. Experimenting in a notebook is a great start, but you may need more GPU resources and flexibility for customization. This is when you graduate from the notebook to a formal Vertex AI Training job.
Instead of being limited by a single notebook instance, you submit your training configuration (using the same container) to Vertex AI’s managed training service, which offers more scalability, flexibility, and control. Here’s what that gives you:
On-demand accelerators: Access an on-demand pool of the latest accelerators (like H100s) when you need them, or choose DWS Flex Start, Spot VMs, or bring-your-own-reservation options for more flexibility or stability.
Managed infrastructure: No need to provision or manage servers or containers. Vertex AI handles it all. You just define your job, and it runs.
Reproducibility: Your training job is a repeatable artifact, making it easier to use in an MLOps workflow.
Once your job is running, you can monitor its progress in real-time with TensorBoard to watch your model’s loss and accuracy improve. You can also check in on your tuning pipeline.
Beyond Vertex AI Training jobs, you can also use Ray on Vertex AI, or take a DIY approach on GKE or Compute Engine, depending on the flexibility and control you need.
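As a rough sketch of what submitting such a job can look like with the Vertex AI SDK, consider the following; the container URI, config path, and machine shape are placeholders rather than the notebook’s exact settings.

from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-bucket",
)

# Reuse the same fine-tuning container from the notebook (URI is a placeholder).
job = aiplatform.CustomContainerTrainingJob(
    display_name="qwen3-qlora-finetune",
    container_uri="us-docker.pkg.dev/my-project/training/axolotl-finetune:latest",
)

job.run(
    args=["--config", "gs://my-bucket/configs/qwen3-qlora.yaml"],
    replica_count=1,
    machine_type="a2-highgpu-1g",           # placeholder; size per Model Garden guidance
    accelerator_type="NVIDIA_TESLA_A100",
    accelerator_count=1,
)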
Part 3: Evaluate your fine-tuned model
After fine-tuning your Qwen3 model on Vertex AI, robust evaluation is crucial to assess its readiness. Compare the evaluation results to the baseline you created during experimentation.
For complex generative AI tasks, Vertex AI’s Gen AI Evaluation Service uses a ‘judge’ model to assess nuanced qualities (coherence, relevance, groundedness) and task-specific criteria, supporting side-by-side (SxS) human reviews. Using the GenAI SDK, you can programmatically evaluate and compare your models. This service provides deep, actionable insights into model performance—going far beyond simple metrics like perplexity by also incorporating automated side-by-side comparisons and human review.
In the evaluation notebook, we evaluated our fine-tuned Qwen3 model against the base model using the GenAI Evaluation Service. For each query, we provided responses from both models and used the pairwise_summarization_quality metric to let the judge model determine which performed better.
For evaluation of other popular models, refer to this notebook.
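Under stated assumptions (the vertexai evaluation SDK, and a small pandas DataFrame holding each prompt plus both models’ responses), a pairwise comparison might look like the sketch below; the column names follow the service’s documented schema, but treat the details as illustrative.

import pandas as pd
from vertexai.evaluation import EvalTask

# Each row: a prompt, the fine-tuned model's response, and the base model's response.
eval_df = pd.DataFrame({
    "prompt": ["Summarize this support ticket: ..."],
    "response": ["<fine-tuned Qwen3 output>"],
    "baseline_model_response": ["<base Qwen3 output>"],
})

# The judge model decides which response is better for each prompt.
task = EvalTask(dataset=eval_df, metrics=["pairwise_summarization_quality"])
result = task.evaluate()
print(result.summary_metrics)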
Part 4: Deploy to a production endpoint
Your model has been fine-tuned and validated. It’s time for the final, most rewarding step: deploying it as an endpoint. This is where many projects hit a wall of complexity, but with Vertex AI Inference it’s a streamlined process. When you deploy to a Vertex AI Endpoint, you’re not just getting a server; you’re getting a fully managed, production-grade serving stack optimized for two key things:
1. Fast performance
Optimized serving: Your model is served using a container built with cutting-edge frameworks like vLLM, ensuring high throughput and low latency.
Rapid start-up: Techniques like fast VM startup, container image streaming, model weight streaming, and prefix caching mean your model can start up quickly.
2. Cost-effective and flexible scaling
You have full control over your GPU budget. You can:
Use on-demand GPUs for standard workloads.
Apply existing Committed Use Discounts (CUDs) and reservations to lower your costs.
Use Dynamic Workload Scheduler (DWS) Flex Start to acquire capacity for up to 7 days at a discount.
Leverage Spot VMs for fault-tolerant workloads to get access to compute at a steep discount.
In short, Vertex AI Inference handles the scaling, the infrastructure, and the performance optimization. You just focus on your application.
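As a final sketch of what that step can look like in code (uploading the tuned model with a serving container and deploying it to a managed endpoint via the Vertex AI SDK; the URIs and machine shape are placeholders):

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Register the fine-tuned artifacts with a serving container (URIs are placeholders).
model = aiplatform.Model.upload(
    display_name="qwen3-finetuned",
    artifact_uri="gs://my-bucket/qwen3-finetuned/",
    serving_container_image_uri="us-docker.pkg.dev/my-project/serving/vllm:latest",
)

# Deploy to a managed endpoint that autoscales between one and two replicas.
endpoint = model.deploy(
    machine_type="a2-highgpu-1g",           # placeholder; size to your model
    accelerator_type="NVIDIA_TESLA_A100",
    accelerator_count=1,
    min_replica_count=1,
    max_replica_count=2,
)
print(endpoint.resource_name)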
Get started
Successfully navigating the lifecycle of an open model like Qwen on Vertex AI, from initial idea to production-ready endpoint, is a significant achievement. You’ve seen how the platform provides robust support for experimentation, fine-tuning, evaluation, and deployment.
Want to explore your own open model workload? The Vertex AI Model Garden is a great place to start.
In April, we released Cluster Director, a unified management plane that makes deploying and managing large-scale AI infrastructure simpler and more intuitive than ever, putting the power of an AI supercomputer at your fingertips. Today, we’re excited to release new features in preview, including an intuitive interface, a managed Slurm experience, and an observability dashboard that catches performance anomalies.
From complex configuration to easy creation
AI infrastructure users can spend weeks wrestling with complex configurations for compute, networking, and storage. Because distributed training workloads are highly synchronized jobs across thousands of nodes and are highly sensitive to network latency, performance bottlenecks can be difficult to diagnose and resolve. Cluster Director solves these challenges with a single, unified interface that automates the complex setup of AI and HPC clusters, integrating Google Cloud’s optimized compute, networking, and storage into a cohesive, performant, and easily managed environment.
LG Research uses Google Cloud to train their large language models, most recently Exaone 3.5. They have significantly reduced the time it takes to have a cluster running with their code — from over a week to less than one day. That’s hundreds of GPU hours saved for real workloads.
“Thanks to Cluster Director, we’re able to deploy and operate large-scale, high-performance GPU clusters flexibly and efficiently, even with minimal human resources.” – Jiyeon Jung, AI Infra Sr Engineer, LG AI Research
Biomatter uses Google Cloud to scale its in silico design processes. Cluster Director has made cluster deployment and management smooth, enabling the team to dedicate more focus to the scientific challenges at the core of their work.
“Cluster Director on Google Cloud has significantly simplified the way we create, configure, and manage Slurm-based AI and HPC clusters. With an intuitive UI and easy access to GPU-accelerated instances, we’ve reduced the time and effort spent on infrastructure.” – Irmantas Rokaitis, Chief Technology Officer, Biomatter
Read on for what’s new in the latest version of Cluster Director.
Simplified cluster management across compute, network, and storage
Use a new, intuitive view in the Google Cloud console to easily create, update, and delete clusters. Instead of a blank slate, you start with a choice of validated, optimized reference architectures. You can add one or more machine configurations from a range of VM families (including A3 and A4 GPU VMs) and specify the machine type, the number of GPUs, and the number of instances. You can also choose your consumption model: on-demand capacity (where supported), DWS Calendar mode or Flex Start, Spot VMs for cost savings, or a specific reservation for capacity assurance.
Cluster Director also simplifies networking by allowing you to deploy the cluster on a new, purpose-built VPC network or an existing one. If you create a new network, the firewall rules required for internal communication and SSH access are configured automatically, removing a common pain point. For storage, you can create and attach a new Filestore or Google Cloud Managed Lustre instance, or connect to an existing Cloud Storage bucket. These integrations help ensure that your high-performance file system is correctly mounted and available to all nodes in the cluster from the moment they launch.
Powerful job scheduling with Managed Slurm
Cluster Director provides fault-tolerant and highly scalable job scheduling out of the box with a managed, pre-configured Slurm environment. The controller node is managed for you, and you can easily configure the login nodes, including machine type, source image, and boot-disk size. Partitions and nodesets are pre-configured based on your compute selections, but you retain the flexibility to customize them, now or in the future.
Topology-aware placement
To maximize performance, Cluster Director is deeply integrated with Google’s network topology. This begins when clusters are created, when VMs are placed in close physical proximity. Crucially, this intelligence is also built directly into the managed Slurm environment. The Slurm scheduler is natively topology-aware, meaning it understands the underlying physical network and automatically co-locates your job’s tasks on nodes with the lowest-latency paths between them. This integration of initial placement and ongoing job scheduling is a key performance enhancer, dramatically reducing network contention during large, distributed training jobs.
Comprehensive visibility and insights
Cluster Director’s integrated observability dashboard provides a clear view of your cluster’s health, utilization, and performance, so you can quickly understand your system’s behavior and diagnose issues in a single place. The dashboard is designed to easily scale to tens of thousands of VMs.
Advanced diagnostics to detect performance anomalies
In distributed ML training, stragglers refer to small numbers of faulty or slow nodes that eventually slow down the entire workload. Cluster Director makes it easy to quickly find and replace stragglers to avoid performance degradation and wasted spend.
Try out Cluster Director today!
We are excited to invite you to be among the first to experience Cluster Director. To learn more and express your interest in joining the preview, talk to your Google Cloud account team or sign up here. We can’t wait to see what you will build.
Building applications is sometimes messy, it’s always iterative, and it often works best when it’s collaborative. As a developer, you regularly experience the frustration of a cryptic error message and the quiet triumph of finding a clever workaround. Either way, finding help or sharing success is best facilitated by a community of builders.
That’s why we are excited to launch the Google Developer Program forums at discuss.google.dev. The new forums are designed to help people build with Google technology. You will find discussion groups to engage with other developers and Google experts; how-to articles, reference architectures and use cases; and a community of users looking to help.
We’re also migrating the existing Google Cloud, Workspace Developer, AppSheet, and Looker communities, channels and content from googlecloudcommunity.com over to discuss.google.dev. So, existing knowledge isn’t lost – it’s just moving to a new home. And by migrating the community we’re able to focus on two core principles in the new design: high trust and high utility.
Signal over noise
Your Google Developer Program profile is how you will access the forums. By unifying our sign-in and connecting forum profiles directly to Google Developer Program profiles, we can programmatically display the credentials and reputation you’ve earned through learning, events, and meetups across the Google ecosystem.
We’re starting with the Google Developer Expert flair icon next to a user’s name and we plan to extend this to other programs in the near future. Additionally, if you are part of a private product beta or Early Access Program (EAP), your forum account is automatically granted access to the corresponding private discussion groups. No more filling out forms or waiting for permissions. Your Developer Program profile is your passport.
Why we chose Discourse for our new forums
While we were tempted to build a custom solution from scratch, we chose Discourse for a few key reasons:
Built by and for developers: Discourse is an open-source platform that prioritizes function over flash with markdown, code formatting, keyboard navigation, and structured conversations.
Extensibility: Its robust API and plugin architecture allow us to integrate our own Google technologies—like Gemini-powered spam filtering and the Google Developer Program—without reinventing the wheel.
This is your invitation!
This new community is a space for all of us. Come say hello! Ask a question, or answer one. Share what you’re working on, or get help with what you’re stuck on. This is where the real work happens, and we want to be a part of it with you.
In the coming months, you’ll see more of our engineers, product managers, and developer advocates join the conversation to not only help answer questions, but also ask them, share their own ideas, and engage with the same passion as you do. They won’t always have a perfect solution to a tricky question, but they’re committed to listening, engaging, and working with the community to find the best path forward.
How to Get Started
Explore Now: Visit https://discuss.google.dev. Browse the categories, read ongoing discussions, and find your community.
Join the Conversation: If you’re a Google Developer Program member, sign in and dive in! Ask those tough questions, share your solutions, and contribute your expertise. Not a member yet? Visit developers.google.com/program to learn more and join at no cost.
For googlecloudcommunity.com users: We’re working to make the transition as smooth as possible. You’ll find familiar topics and a wealth of historical discussions here. We encourage you to explore and continue your conversations on this new, unified platform.
The evolution of the cloud has been tremendous over the past decade. Every step of the way, Google Kubernetes Engine (GKE) has been there to meet new challenges. From giving DevOps more scalable foundations to supporting the rise of cloud-native AI, we took Kubernetes’ brilliance and gave it the fully managed service it deserved to thrive.
GKE turns 10 this year, and to celebrate, we’ve launched 10 years of GKE, an ebook that explores this incredible decade and how customers have built global businesses powered on this managed platform. We released Kubernetes as open source in 2014, and one million contributions later, we couldn’t be prouder of what Kubernetes has become, its history, and its future with GKE.
GKE’s leading lights
One of the earliest GKE customers was Signify, a global leader in lighting for professionals and the company behind Philips Hue. Ten years on, it continues to thrive on the service. Growing from 200 million to 3.5 billion daily transactions, Signify scaled from one GKE cluster to seven, and is looking to leverage GKE for new workloads, including platform engineering and AI for multi-cluster supervision.
“The constant improvements made by GKE over the past 10 years profoundly changed the way we design, deploy, and evolve our services,” says Leon Bouwmeester, Director of Engineering and Head of Hue Platform at Signify. “We spend less time on infrastructure management and can focus our efforts on what really matters: the quality of the user experience and the speed of innovation.”
However, what put GKE on the map was Pokémon GO, Niantic’s ground-breaking geolocation game. As millions took to the streets to catch ‘em all, GKE brought the game to life and kept up with its explosive launch. “Never have I taken part in anything close to the growth that Google Cloud customer Niantic experienced with the launch of Pokémon GO,” says Luke Stone, director of customer reliability engineering at Google Cloud.
Target vs. worst case vs. actual traffic to GKE during Niantic’s launch of Pokémon Go.
AI for tomorrow on GKE today
Today, GKE supports brand new businesses in the rapidly evolving world of AI. Customers report how their AI initiatives are made more powerful on GKE, helping them manage the complex demands of their deployments. This means flexibility and scale for AI workloads and cost-efficient inference — so you can focus on training, not managing.
With GKE Autopilot, AI can also help you optimize your configurations and workloads. In the ebook, learn more about how GKE Autopilot mode frees up teams to focus on innovation, with businesses sharing how they automatically improved performance and cost savings — with the stability and security they expect from Google Cloud.
Join the celebration by exploring 10 years of GKE for yourself. We‘ve distilled a decade of insights into what makes GKE so effective, thoughts from customers on how GKE is supporting their work at scale, and why we’re ready for everything AI has in store for the decade ahead. It’s been an amazing ride, and with AI reshaping the future of application development, we’re just getting started.
Securing sensitive data is a crucial part of moving workloads to the cloud. While encrypting data at rest and in transit are standard security practices, safeguarding data in use — while it’s actively being processed in memory — can present unique security and privacy challenges.
To make sure that data in use is also protected, we developed Confidential Computing with our hardware partners to use hardware-based Trusted Execution Environments (TEEs) to isolate and safeguard data in use, even from the cloud provider hosting the data.
To help build a secure and reliable cloud environment, we’ve partnered with SUSE, a global leader in open source and secure enterprise solutions. Together, we’ve developed targeted solutions that can enable organizations to run their sensitive workloads in the cloud, combining the hardware-based security of Google Cloud Confidential Virtual Machines (Confidential VMs) with the security of SUSE Linux Enterprise Server (SLES).
Today, we are excited to announce that SUSE Linux Enterprise Server now supports Google Cloud Confidential VMs that have Confidential Computing technologies AMD SEV, AMD SEV-SNP, or Intel TDX enabled. Previously, SLES was only generally available on AMD SEV and AMD SEV-SNP-based Confidential VMs, but now SLES is also generally available on Intel TDX-based Confidential VMs which run on the performant C3 machine series. This new offering provides customers more choice and flexibility in securing sensitive workloads, while expanding Confidential VM support for guest operating system images.
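For readers who want to try this, here is a minimal sketch of launching a TDX-based Confidential VM on C3 with the google-cloud-compute Python client. The project, zone, machine type, and SLES image path are illustrative assumptions (consult SUSE’s published image catalog for real names), and the confidential_instance_type field assumes a recent client library version:

```python
# Minimal sketch: create an Intel TDX Confidential VM on a C3 machine type.
# The project, zone, and SLES image path below are illustrative placeholders.
from google.cloud import compute_v1

def create_tdx_confidential_vm(project: str, zone: str, name: str) -> str:
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/c3-standard-4",
        # Request the Intel TDX Trusted Execution Environment.
        confidential_instance_config=compute_v1.ConfidentialInstanceConfig(
            confidential_instance_type="TDX",
        ),
        # Confidential VMs terminate (rather than live-migrate) on maintenance.
        scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    # Hypothetical image family; consult SUSE's image listings.
                    source_image="projects/suse-cloud/global/images/family/sles-15",
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    op = compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    )
    op.result()  # Block until the create operation completes.
    return name
```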
At Google Cloud, we strongly advocate for a layered approach to security. Here, SUSE Linux Enterprise Server (SLES) strengthens the guest OS layer, while Confidential VMs strengthen the infrastructure layer. Additionally, the comprehensive SLES security portfolio can help support compliance, risk mitigation, and cybersecurity best practices:
Meeting compliance requirements: SLES is designed to help organizations meet regulatory requirements through its security features. SLES comes with Federal Information Processing Standards (FIPS) 140-3 certified cryptographic modules.
Reducing evaluation effort: Utilizing SLES with supplier certifications can help customers streamline their evaluation processes by referencing existing certifications.
Hardening automatically: SLES includes an automated hardening process that can help with Security Technical Implementation Guide (STIG)-compliant hardening during setup with YAST or AutoYAST, which can be adjusted as needed.
The combination of SLES within Google Cloud Confidential VMs can offer several benefits:
Complementing encryption with a secure OS: With its security focus and certifications, SLES can provide a hardened operating system in a trusted environment, making both applications and the OS less susceptible to vulnerabilities.
Supporting integrity and trustworthiness: Customers can have greater confidence that both the hardware and the operating system are working as expected. Confidential VMs offer remote attestation, allowing verification of the VM identity and state. Running a secure OS, such as SLES, on an attested Confidential VM can support overall data and code integrity.
Supporting Confidential Computing technologies: By providing a consistent and secure operating system across all Google Cloud Confidential Computing types (AMD SEV, AMD SEV-SNP, and Intel TDX), SLES can help simplify the deployment and management of sensitive cloud workloads.
Enhancing compliance in sensitive environments: For workloads that require a notable level of data protection due to compliance regulations, this joint security solution of SLES on Confidential VMs can help alleviate cloud migration concerns from internal auditors.
Addressing internal and external threats: While Confidential Computing primarily can help protect against external threats like hypervisors, running a security-focused OS like SLES in a Confidential VM can offer an additional layer of protection against potential internal vulnerabilities in the guest OS itself.
Reinforcing data ownership and control: Confidential Computing can help provide technical assurances that you have retained control and effective ownership of your data, even when your data is processed in the cloud. By encrypting data in use and limiting access to only your authorized workloads within a TEE, you can gain stronger assurances for your digital sovereignty.
Extending Zero Trust to execution: By encrypting data in memory on the CPU, this solution extends the Zero Trust principle of “never trust, always verify” to data even when it’s actively being processed. This ensures data remains secure and encrypted throughout its lifecycle, including during execution, helping to enforce a real Zero Trust environment.
Establishing a secure foundation for cloud-native workloads: With SLES providing a secure base and Google Cloud Confidential VMs offering hardware-level protection, this environment together with SUSE Cloud Native solutions can deliver a robust foundation for your most sensitive cloud-native applications. By securing the underlying compute resources, you can extend data-in-use protection to higher level containerized and cloud-native workloads.
Organizations can confidently move regulated and confidential applications to Google Cloud, knowing their data is protected throughout its lifecycle, including while in use, and backed by a secure guest OS, bolstering their digital sovereignty.
Broadcom’s VMware vSphere product remains a popular choice for private cloud virtualization, underpinning critical infrastructure. Far from fading, organizations continue to rely heavily on vSphere for stability and control. We’re also seeing a distinct trend where critical workloads are being repatriated from public cloud services to these on-premises vSphere environments, influenced by strategies like bimodal IT and demands for more operational oversight.
The common practice of directly integrating vSphere with Microsoft Active Directory (AD), while simplifying administration tasks, creates an attack path whose risks are frequently underestimated. This configuration extends the AD attack surface directly to the hypervisor. From a threat actor’s perspective, the integration is a high-value opportunity: it turns the relatively common task of compromising AD credentials into privileged administrative control over ESXi hosts and vCenter, and ultimately complete command of the virtualized infrastructure.
Ransomware aimed at vSphere infrastructure, including both ESXi hosts and vCenter Server, poses a uniquely severe risk due to its capacity for immediate and widespread infrastructure paralysis. With the end of general support for vSphere 7.x approaching in October 2025—the version Mandiant has observed running in the large majority of organizations—the threat of targeted ransomware has become urgent. As recovering from such an attack requires substantial time and resources, proactive defense is paramount. It is therefore critical for organizations to understand the specific threats against these core components and implement effective, unified countermeasures to prevent their compromise, especially before support deadlines introduce additional risk.
This blog post breaks down the inherent risks of, and misunderstandings about, integrating vSphere with Microsoft AD. Drawing on Mandiant’s deep experience with vSphere ransomware incidents and proactive assessments of both AD and vSphere, we provide direction for understanding the risk and strengthening security posture against the threats facing enterprise vSphere management today.
To understand the security risks in a vSphere environment, it’s essential to understand its architecture. A compromise at one layer can have cascading effects throughout the entire virtualized environment.
At its core, vSphere is a platform that pools physical datacenter resources like compute, storage, and networking into a flexible layer of virtual infrastructure, a task primarily accomplished by two key components, ESXi and vCenter, as shown in the following diagram:
ESXi (The Hypervisor): This is the foundational layer of vSphere. ESXi is a bare metal hypervisor, meaning it installs directly onto the physical server hardware without requiring an underlying operating system. Its core job is to partition that server into multiple, isolated virtual machines (VMs). Each VM, which is essentially just a collection of files, runs its own operating system and applications, acting like an independent computer. The hypervisor’s minimal design is intentional, aiming to reduce its own attack surface while efficiently managing the server’s resources.
vCenter (The Control Plane): If ESXi hosts are the workers, the vCenter Server is the “brain” or control plane for the entire environment. It provides a single web-based interface to manage all connected ESXi hosts and the VMs they run. ESXi hosts are registered with vCenter, which uses agents on each host to manage operations and enable advanced features like automatic workload balancing and high availability for failover protection.
Integrating vSphere with AD creates a flexible environment that simplifies identity management, yet it introduces profound security risks. This direct link can turn an AD compromise into a significant threat against the entire vSphere deployment.
An Outdated Blueprint: Re-examining Foundational vSphere Security
Virtualization has been a cornerstone of enterprise IT for nearly two decades, solving server sprawl and delivering transformative operational agility. Alongside it, AD remains a pillar of enterprise IT. This has led to a long-standing directive that all enterprise technology, including critical infrastructure like vSphere, must integrate with AD for centralized authentication. The result is a risky dependency—the security of foundational infrastructure is now directly tied to the security of AD, meaning any compromise within AD becomes a direct threat to the entire virtualization environment.
In the past, vSphere security was often approached in distinct, siloed layers. Perimeter security was stringent, and threats were typically viewed as internal, such as configuration errors, rather than from external threat actors. This, combined with the newfound ease of image-based backups, often led to security efforts becoming primarily focused on robust business continuity and disaster recovery capabilities over proactive defense. As environments expanded, managing local user accounts created significant administrative overhead, so support for AD integration was introduced for centralized identity management.
Mandiant’s observation, based on extensive incident response engagements, is that many vSphere environments today still operate on this foundational architecture, carrying forward security assumptions that haven’t kept pace with the evolving threat landscape. As Mandiant’s assessments frequently identify, these architectures often prioritize functionality and stability over a security design grounded in today’s threats.
So what’s changed? Reliance solely on perimeter defenses is an outdated security strategy. The modern security boundary focuses on the user and device, typically protected by agent-based EDR solutions. But here lies the critical gap: the ESXi hypervisor is a purpose-built appliance and, contrary to what many people believe, not a standard Linux distribution. This specialized architecture inherently prevents the installation of external software, including security tools like EDR agents. vSphere documentation explicitly addresses this, stating:
“The ESXi hypervisor is a specialized, purpose-built solution, similar to a network router’s firmware. While this approach has several advantages, it also makes ESXi unable to run “off-the-shelf” software, including security tools, designed for general-purpose operating systems as the ESXi runtime environment is dissimilar to other operating systems.
The use of Endpoint Detection and Response (EDR) and other security practices inside third-party guest operating systems is supported and recommended.”
Consequently, most organizations focus their security efforts and EDR deployment inside the guest operating systems. This leaves the underlying ESXi hypervisor—the foundation of the entire virtualization environment—as a significant blind spot for security teams.
The vSphere Threat Landscape
The security gap at the hypervisor layer, which we detailed in the previous section, has not gone unnoticed by threat actors. As security for Windows-based operating systems matured with advanced EDR solutions, threat actors have pivoted to a softer, higher-value target—the ESXi hypervisor itself.
This pivot is amplified by common operational realities. The critical role of ESXi hosts often leads to a hesitancy to apply patches promptly for fear of disruption. Many organizations face a rapidly closing window to mitigate risks; however, threat actors aren’t just relying on unpatched vulnerabilities. They frequently leverage compromised credentials, a lack of MFA, and simple misconfigurations to gain access.
The Rise of Hypervisor-Aware Ransomware
Ransomware targeting vSphere is fundamentally more devastating than its traditional Windows counterpart. Instead of encrypting files on servers or end user compute, these attacks aim to cripple the entire infrastructure by encrypting virtual disk files (VMDKs), disabling dozens of VMs at once.
This is not a theoretical threat. According to Google Threat Intelligence Group (GTIG), the focus on vSphere is rapidly increasing. Of the new ransomware families observed, the proportion specifically tailored for vSphere ESXi systems grew from ~2% in 2022 to over 10% in 2024. This demonstrates a clear and accelerating trend that threat actors are actively dedicating resources to build tooling that specifically targets the hypervisor. In incidents investigated by GTIG, threat actors most frequently deployed REDBIKE, RANSOMHUB, and LOCKBIT.BLACK variants.
GTIG analysts have also noted a recent trend of threat actors gaining persistence in vSphere environments via reverse shells deployed on vCenter. This provides a foothold within the vSphere control plane and thus complete control over all infrastructure. It typically manifests in a two-pronged approach: tactical data exfiltration, such as the AD database (NTDS.dit), followed by the deployment of ransomware and mass encryption of all VMs.
Understanding the Active Directory Integration in vSphere
The decision to integrate vSphere with AD often overlooks the specifics of how this connection actually works. To properly assess the risk, we must look beneath the surface at the technical components that enable this functionality. This analysis will deconstruct those key pieces: the legacy agent responsible for authentication, its inherent inability to support modern security controls like multi-factor authentication (MFA), and the insecure default trust relationships it establishes. By examining these foundational mechanisms, we can expose the direct line from a credential compromise to an infrastructure takeover.
vSphere’s Likewise Agent
When discussing vSphere’s integration with AD, it’s essential to distinguish between two separate components: vCenter Server and the ESXi hosts. Their respective AD integration options are independent and possess different capabilities. This connection is entirely facilitated by the Likewise agent.
The Likewise agent was originally developed by Likewise Software to allow Linux and Unix-based systems to join AD environments, enabling centralized identity management using standard protocols like Kerberos, NTLM, and LDAP/(S). The open-source edition, Likewise Open, included tools such as domainjoin-cli and system daemons like lsassd, which are still found under the hood in ESXi and the vCenter Server Appliance (VCSA). vSphere embedded this agent starting with ESX 4.1 (released in 2010) to facilitate Integrated Windows Authentication (IWA). However, its function differs:
In ESXi, the Likewise agent actively handles AD user authentication when configured.
In vCenter, it is only used for the initial domain join when Integrated Windows Authentication (IWA) is selected as the identity source—all actual authentication is then handled by the vCenter Single Sign-On (SSO) subsystem.
The original Likewise Software was eventually absorbed by BeyondTrust, and the open-source edition of the agent is no longer actively maintained publicly. The Likewise OSS project is now archived and marked as inactive. It is understood the codebase is only maintained internally. Note: The agent’s build version remains identical at Likewise Version 6.2.0 across both ESXi 7 and 8.
Figure 1: ESXi Likewise Agent versions
The following table compares the native AD connection methods for vCenter and ESXi.

| Feature / Capability | ESXi Host | vCenter Server (VCSA) |
| --- | --- | --- |
| AD integration method | Integrated Windows Authentication (IWA) only | IWA, LDAP/LDAPS, and federated identity (SAML, OIDC) |
| Likewise agent used | Yes – exclusively for IWA domain join and authentication | Yes – used for IWA domain join only |
| Authentication protocols supported | Kerberos (via IWA only) | Kerberos (IWA), LDAP(S), SAML, OIDC |
| Modern auth support (OIDC, SAML, FIDO2) | Not supported | Not supported via AD; supported only when using federated IdPs |
| MFA support | Not supported | Not supported via AD DS; supported via identity federation (ADFS, Azure AD, etc.) |
| Granular role-based access control (RBAC) | Limited (via host profiles or CLI only) | Advanced RBAC with vCenter SSO |
Why Not to Use Likewise-Based AD Integration (ESXi/vCenter)
The following list contains considerations when using AD-based connections managed by the vSphere Likewise agent:
Deprecated software: Likewise is legacy software, no longer maintained or supported upstream.
No support for modern authentication: Likewise only supports Integrated Windows Authentication (Kerberos) and offers no support for SAML, OIDC, or FIDO2.
No MFA: Likewise cannot enforce contextual policies such as MFA, geolocation restrictions, or time-based access.
Credential material stored locally: Kerberos keytabs and cached credentials are stored unencrypted on disk.
VMware recommends leveraging identity federation with modern identity providers, bypassing the limitations of the legacy Likewise-based stack. Broadcom announced on March 25 that IWA will be removed in the next major release.
The MFA Gap
While AD integration offers administrative convenience, it introduces significant security limitations, particularly regarding MFA. Traditional AD authentication methods, including Kerberos and NTLM, are inherently single-factor. These protocols do not natively support MFA, and the vCenter Likewise integration does not extend AD MFA enforcement to vCenter or ESXi.
Critically, ESXi does not support MFA in any form, nor does it support identity federation, SAML, or modern protocols such as OIDC or FIDO2. Even for vCenter, MFA can only be applied to users within the vSphere.local domain (using mechanisms like RSA SecurID or RADIUS), but not to AD-joined users authenticated through IWA or LDAP/S.
Ancillary solutions can offer proxy-based MFA that integrates with AD to enforce MFA for vSphere. AuthLite extends the native AD login process by requiring a second factor during Windows authentication, which can indirectly secure vCenter access when Integrated Windows Authentication is used. Silverfort operates at the domain controller level, enforcing MFA on authentication flows in real time without requiring agents on endpoints or changes to vCenter. Both solutions can help enforce MFA in vSphere environments that lack native support for it, but they also introduce caveats: added complexity, potential authorization loops if AD becomes dependent on the same infrastructure they protect, and the need to treat their control planes or virtual appliances as Tier 0 systems within the vSphere environment.
As a result, in organizations that integrate vSphere with traditional Active Directory, all access to critical vSphere infrastructure (ESXi and vCenter) remains protected by a password alone, with no MFA.
While it is technically possible to enforce MFA in vSphere through Active Directory Federation Services (ADFS), this approach requires careful consideration. It is important to note that ADFS is still a feature included in Windows Server 2025 and is not on any official deprecation list with an end-of-life date. However, the lack of significant new feature development compared to the rapid innovation in Microsoft Entra ID speaks to its status as a legacy technology. This is underscored by the extensive migration resources Microsoft now provides to move applications away from AD FS and into Entra ID.
Therefore, while ADFS remains a supported feature, for the purposes of securing vSphere it is a complex workaround that doesn’t apply to direct ESXi access and runs contrary to Microsoft’s clear strategic direction toward modern, cloud-based identity solutions.
Another common approach involves Privileged Access Management (PAM). While a PAM-centric strategy offers benefits like centralized control and session auditing, several caveats warrant consideration. PAM systems add operational complexity, and the vCenter session itself is typically not directly federated with the primary enterprise identity provider (like Entra ID or Okta). Consequently, context-aware conditional access policies are generally applied only at the initial PAM logon, not within the vCenter session itself.
Ultimately, these workarounds do not address the core issue: vSphere’s reliance on the Likewise agent and traditional AD protocols prevents native MFA enforcement for AD users, leaving the environment vulnerable.
There is a reliance on a delegated logon based on AD password complexity, and any MFA would have to be at the network access layer or workstation login, not at the vCenter login prompt for those users.
The ‘ESX Admins’ Problem Is Not an ESXi Issue, It’s a Trust Issue
In July 2024, Microsoft published a blog post on CVE-2024-37085, an “ESXi vulnerability” that was considered a critical issue, and one that vSphere promptly addressed in a patch release. The CVE, present in vSphere ESXi for many years, involved several ESXi advanced settings utilizing insecure default configurations. Upon joining an ESXi host to an AD domain, the “ESX Admins” AD group is automatically granted an ESXi Admin role, potentially expanding the scope of administrative access beyond the intended users.
These settings are configured by the following ESXi advanced controls:
Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd
What it does: Controls whether users from a designated administrators group are automatically added to the host’s local administrative group.
Config.HostAgent.plugins.vimsvc.authValidateInterval
What it does: Defines the time interval at which the host’s management services validate the authentication credentials (or tickets) of connected clients.
Config.HostAgent.plugins.hostsvc.esxAdminsGroup
What it does: Specifies the name (or identifier) of the group whose members are automatically considered for host administrative privileges (when auto-add is enabled by the first setting).
On earlier releases, the insecure defaults can be hardened manually by changing:
Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd from true to false
Config.HostAgent.plugins.vimsvc.authValidateInterval from 1440 to 90
Config.HostAgent.plugins.hostsvc.esxAdminsGroup from “ESX Admins” to “”
vSphere ESXi 8.0 Update 3 changed these default settings as follows:
Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd from true to false
Config.HostAgent.plugins.vimsvc.authValidateInterval from 1440 to 90
Config.HostAgent.plugins.hostsvc.esxAdminsGroup no change (“ESX Admins”)
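As a rough, unofficial illustration of auditing these values at scale, the following pyVmomi sketch queries each host’s advanced options. The vCenter address and credentials are placeholders, and the hardened values mirror the list above:

```python
# Minimal sketch: audit the 'ESX Admins'-related settings on every ESXi host.
# The vCenter hostname and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

HARDENED = {
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd": False,
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroup": "",
    "Config.HostAgent.plugins.vimsvc.authValidateInterval": 90,
}

si = SmartConnect(host="vcenter.example.com", user="audit@vsphere.local",
                  pwd="...", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    # Enumerate every ESXi host managed by this vCenter.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        adv = host.configManager.advancedOption
        for key, want in HARDENED.items():
            have = adv.QueryOptions(key)[0].value
            status = "OK" if have == want else "REVIEW"
            print(f"{host.name}: {key} = {have!r} [{status}]")
    view.Destroy()
finally:
    Disconnect(si)
```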
Integrating an ESXi host with Microsoft AD introduces a fundamental security issue that is often overlooked—the IdP’s administrators effectively gain administrative control over the ESXi host and any other system relying on that trust. While a common perception, sometimes reinforced by narratives focusing on the endpoint, suggests the ESXi host itself is the primary vulnerability, the more critical security concern is the implicit, far-reaching administrative power wielded by the administrators of the trusted IdP, particularly when using AD authentication with ESXi.
Administrators of Active Directory implicitly become administrators of any ESXi host that trusts it.
Consequently, neither workarounds nor configuration fixes, which only adjust default settings, resolve this core problem when an ESXi host is joined to AD. The issue transcends specific CVEs; it stems from the inherent security implications of the implicit trust model itself, particularly when it involves systems like ESXi and AD, which already possess their own security vulnerabilities and are frequent targets for threat actors.
For ESXi specifically, consider the following:
Automatic full administrative access: When ESXi hosts are joined to AD, a default (or custom configured) AD group (e.g., “ESX Admins”) is granted full root-level administrative privileges on the ESXi hosts. Any member of this AD group instantly gains unrestricted control of the ESXi host.
Group name: If AD is compromised, threat actors can manipulate any group name configured via the Config.HostAgent.plugins.hostsvc.esxAdminsGroup advanced setting; this is not limited to the group name “ESX Admins.”
Lack of security identifier (SID) tracking: AD group names (not limited to “ESX Admins”) added to ESXi are not tracked by their SIDs. A threat actor could therefore delete and recreate (or rename) an AD group such as “ESX Admins”, keep the same name referenced in ESXi via Config.HostAgent.plugins.hostsvc.esxAdminsGroup, and retain the elevated privileges. This is a limitation of the Likewise ESXi agent.
Active Directory group management: A threat actor looking to access a domain-joined ESXi host simply requires sufficient permissions to add themselves to the AD group defined via Config.HostAgent.plugins.hostsvc.esxAdminsGroup.
Recent discussions around vulnerabilities like CVE-2024-37085 have brought this security issue to the forefront: the inherent dangers of joining vSphere ESXi hosts directly to an AD domain. While such integration offers perceived management convenience, it establishes a level of trust that can be easily exploited.
Why Your ESXi Hosts Should Never Be Active Directory Domain Joined
Based on the previous discussion, we can confidently establish that joining an ESXi host to AD carries substantial risk. The risk is compounded in the absence of comprehensive ESXi security controls such as Secure Boot, TPM, execInstalledOnly, vCenter integration, comprehensive logging, and SIEM integration. Compromised AD credentials tied to an ESXi-joined group allow remote threat actors to readily exploit the elevated privileges, executing actions such as virtual machine shutdown and ransomware deployment via SSH. These risks can be summarized as follows:
No MFA support: ESXi does not support MFA for AD users. Domain joining exposes critical hypervisor access to single-factor password-based authentication.
Legacy authentication protocols: ESXi relies on IWA and Kerberos / NTLM / Windows Session Authentication (SSPI)—outdated protocols vulnerable to various attacks, including pass-the-hash and credential relay.
Likewise agent is deprecated: The underlying Likewise agent is a discontinued open-source project. Continued reliance on it introduces maintenance and security risks.
No modern authentication integration: ESXi does not support federated identity, SAML, OIDC, FIDO2, or conditional access.
AD policy enforcement is absent: Group Policy Objects (GPOs), conditional access, and login time restrictions do not extend to ESXi via AD join, undermining centralized security controls.
Complexity without benefit: Domain joining adds administrative overhead without offering meaningful security gains — especially when using vCenter as the primary access point.
Limited role mapping granularity: Group-based role mappings on ESXi are basic and cannot match the RBAC precision available in vCenter, reducing access control fidelity.
To securely remove ESXi hosts from AD, a multistep process is required to shift access management explicitly to vCenter. This involves assessing current AD usage, designing granular vCenter roles, configuring vCenter’s RBAC, removing hosts from the domain via PowerCLI, and preventing future AD re-integration. All management then moves to vCenter, with direct ESXi access minimized. This comprehensive approach prioritizes security and efficiency by moving away from AD reliance for ESXi authentication and authorization towards a vCenter-centric, granular RBAC model. vSphere explicitly discourages joining ESXi hosts to AD:
“ESXi can be joined to an Active Directory domain as well, and that functionality continues to be supported. We recommend directing all configuration & usage through the Role-Based Access Controls (RBAC) present in vCenter Server, though.”
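As a hedged sketch of the assessment step (reusing the pyVmomi connection and container-view pattern from the earlier example), the following reports which hosts remain domain joined:

```python
# Minimal sketch: flag ESXi hosts that are still joined to an AD domain.
# Reuses the pyVmomi connection/view pattern from the earlier sketch.
from pyVmomi import vim

def report_domain_joined_hosts(content):
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        auth = host.configManager.authenticationManager
        for store in auth.info.authConfig:
            # ActiveDirectoryInfo entries describe the host's AD membership.
            if isinstance(store, vim.host.ActiveDirectoryInfo) and store.enabled:
                print(f"{host.name}: joined to {store.joinedDomain} "
                      f"({store.domainMembershipStatus})")
    view.Destroy()
```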
vSphere vCenter Server represents a strategic objective for threat actors due to its authoritative role as the centralized management for virtualized infrastructure. A compromised vCenter instance effectively cedes comprehensive administrative control over the entire virtual estate, encompassing all connected ESXi hypervisors, virtual machines, datastores, and virtual network configurations.
Through its extensive Application Programming Interfaces (APIs), adversaries can programmatically manipulate all managed ESXi hosts and their resident virtual machines, enabling actions such as mass ransomware deployment, large-scale data exfiltration, the provisioning of rogue virtual assets, or the alteration of security postures to evade detection and induce widespread operational disruption.
Furthermore, the vCenter Server appliance itself can be subverted by implanting persistent backdoors, thereby establishing covert command-and-control (C2) channels that allow for entrenched persistence and continued malicious operations. Consequently, its critical function renders vCenter a high-value target. The following should be considered:
Coupled security dependency (compromise amplification risk): Directly linking vCenter to AD makes vSphere security dependent on AD’s integrity. As AD is a prime target, compromising privileged AD accounts mapped to vCenter grants immediate, potentially unrestricted administrative access to the virtual infrastructure, bypassing vSphere-specific security layers. Insufficient application of least privilege for AD accounts in vSphere magnifies this risk.
Single-factor authentication weakness (credential compromise risk): Relying solely on AD password validation makes vCenter highly vulnerable to common credential compromise methods (phishing, brute-force, spraying, stuffing, malware). Without mandatory MFA, a single stolen password for a privileged AD account allows complete authentication bypass, enabling unauthorized access, data breaches, ransomware, or major disruptions.
Lack of native MFA: The direct vsphere.local-to-AD integration offers no built-in enforcement of strong authentication like phishing-resistant FIDO2. While compatibility exists for external systems (Smart Cards, RSA SecurID), these require separate, dedicated infrastructure and are not inherent features, leaving a significant authentication assurance gap if unimplemented.
Facilitation of lateral movement and privilege escalation: Compromised AD credentials, even non-administrative ones with minimal vSphere rights, allow threat actors initial vCenter access. vCenter can then be exploited as a pivot point for further network infiltration, privilege escalation within the virtual environment, or attacks on guest systems via console/API access, all stemming from the initial single-factor credential compromise.
Integrating vSphere vCenter directly with AD for identity management, while common, inherently introduces significant security vulnerabilities stemming from coupled dependencies, reliance on single-factor authentication, a lack of native strong MFA, and facilitated attack pathways. These not only critically expose the virtual infrastructure but also provide avenues to exploit the VCSA appliance’s attack surface, such as its underlying Linux shell and the lack of comprehensive endpoint detection and response (EDR) capabilities.
Securing vSphere: The Tier 0 Challenge
The widespread practice of running Tier 0 services—most critically, AD domain controllers (often used for direct Identity integration)—directly on vSphere hypervisors introduces a significant and often overlooked security risk. By placing Active Directory Domain Controllers on vSphere, any successful attack against the hypervisor effectively hands threat actors the keys to the entire AD environment, enabling complete domain takeover. Mandiant observes that a general lack of awareness and proactive mitigation persists.
The danger is significant and present even for vSphere permissions that appear low-risk or are operationally common. For example, the privilege to snapshot an AD virtual machine can be weaponized for complete AD takeover. This specific vSphere capability, often assigned for backup routines, enables offline NTDS.dit (AD database) exfiltration. This vSphere-level action renders many in-guest Windows Server security controls ineffective, bypassing not only traditional measures like strong passwords and MFA, but also advanced protections such as LSASS credential guard and EDR, which primarily monitor activity within the operating system. This effectively paves a direct route to full domain compromise for a threat actor possessing this specific permission.
Mandiant has observed these tactics, techniques, and procedures (TTPs) attributed to various ransomware groups across multiple incidents. The absence of VM encryption and logging makes obtaining the AD database a relatively simple task that can go undetected.
The following table contains a list of sample threats matched to related permissions:

| Threat | Risk | Minimum vSphere Permission Required |
| --- | --- | --- |
| Unencrypted vMotion | Memory-in-transit (e.g., LSASS, krbtgt hashes) can be captured during migration. | Role: Virtual Machine Power User or higher. Permission: Host > Inventory > Migrate powered on virtual machine |
| Unencrypted VM disks | AD database (NTDS.dit), registry hives, and password hashes can be stolen from VMDKs. | Role: Datastore Consumer, VM Admin or higher. Permission: Datastore > Browse, Datastore > Low level file operations |
| Snapshot creation | Snapshots preserve memory and disk state; can be used to extract in-memory credentials. | Role: Virtual Machine Power User or higher. Permission: Virtual Machine > State > Create Snapshot |
| Mounting a VMDK to another VM | Enables offline extraction of AD secrets (e.g., NTDS.dit, registry, SYSVOL). | Role: VM Admin or custom with disk-level access. Permission: Virtual Machine > Configuration > Add existing disk, Datastore > Browse |
| Exporting / cloning a VM | Enables offline AD analysis, allowing credential extraction or rollback attacks. | |
Delegation of trust from vSphere vCenter to AD grants implicit administrator privileges on the trusted systems to any AD domain administrator. This elevates the risk profile of AD compromise, impacting the entire infrastructure. To mitigate this, implement a two-pronged strategy: first, create a separate, dedicated vSphere environment specifically for the most critical Tier 0 assets, including AD. This isolated environment should be physically or logically separated from other systems and highly secured with robust network segmentation. Second, implement a zero-trust security model for the control plane of this environment, verifying every access request regardless of source. Within this isolated environment, deploy a dedicated “infrastructure-only” IdP (on-premises or cloud). Implementing the principle of least privilege is paramount.
A dedicated, isolated vSphere environment for Tier 0 assets (e.g., Active Directory) should have strictly limited administrative access (via a privileged access workstation, or PAW), granting permissions only to those directly managing the infrastructure. This significantly reduces the impact of a breach by preventing lateral movement and minimizing damage. Unnecessary integrations should be avoided to maintain the environment’s security and adhere to the least-privilege model.
To effectively safeguard critical Tier 0 assets operating within the vSphere environment–specifically systems like Privileged Access Management (PAM), Security Information and Event Management (SIEM) virtual appliances, and any associated AD tools deployed as virtual appliances–a multilayered security approach is essential. These assets must be treated as independent, self-sufficient environments. This means not only isolating their network traffic and operational dependencies but also, critically, implementing a dedicated and entirely separate identity provider (IdP) for their authentication and authorization processes. For the highest level of assurance, these Tier 0 virtual machines should be hosted directly on dedicated physical servers. This practice of physical and logical segregation provides a far greater degree of separation than shared virtualized environments.
The core objective here is to break the authorization dependency chain, ensuring that credentials or permissions compromised elsewhere in the network cannot be leveraged to gain access to these Tier 0 systems. This design creates defense in depth security barriers, fundamentally reducing the likelihood and impact of a complete system compromise.
Conclusion
Mandiant has observed that threat actors are increasingly targeting vSphere, not just for ransomware deployment, but also as a key avenue for data exploitation and exfiltration. This shift is demonstrated by recent threat actor activity observed by GTIG, where adversaries have leveraged compromised vSphere environments to exfiltrate sensitive data such as AD databases before or alongside ransomware execution.
As this document has detailed, the widespread reliance on vSphere, coupled with often underestimated risks inherent in its integration with AD and the persistence of insecure default configurations, creates a dangerously vulnerable landscape. Threat actors are not only aware of these weaknesses but are actively exploiting them with sophisticated attacks increasingly targeting ESXi and vCenter to achieve maximum impact.
The usability and stability that make vSphere a foundational standard for on-premises and private clouds can be misleading; they do not equate to inherent security. The evolution of the threat landscape, particularly the direct targeting of the hypervisor layer, which bypasses traditional endpoint defenses, necessitates a fundamental shift in how vSphere security is approached. Relying on outdated practices, backups, perimeter defenses alone, or assuming EDR on guest VMs provides sufficient protection for the underlying infrastructure creates significant security gaps and exposes an organization to severe risks.
Identity integration vulnerabilities will be exploited; therefore, organizations are strongly urged to immediately assess their vSphere environment’s AD integration status and decisively prioritize the implementation of the mitigation strategies outlined in this document. This proactive stance is crucial to effectively counter modern threats and includes:
Decoupling critical dependencies: Severing direct ESXi host integration with AD is paramount to shrinking the AD attack surface.
Modernizing authentication: Implementing robust, phishing-resistant MFA for vCenter, preferably via identity federation with modern IdPs, is no longer optional but essential.
Systematic hardening: Proactively addressing the insecure defaults for ESXi and vCenter, enabling features like execInstalledOnly, Secure Boot, TPM, Lockdown Mode, and configuring stringent firewall rules (a verification sketch follows this list).
Enhanced visibility: Implementing comprehensive remote logging for both ESXi and vCenter, feeding into a SIEM with use cases specifically designed to detect hypervisor-level attacks.
Protecting Tier 0 assets: Strategically isolating critical workloads like Active Directory Domain Controllers in dedicated, highly secured vSphere environments with strict, minimized access controls and encrypted VMs and vMotion.
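The verification sketch referenced above, following the same pyVmomi pattern as the earlier examples, might spot-check two of these controls per host. VMkernel.Boot.execInstalledOnly is a standard ESXi advanced option, but verify the key name against your release:

```python
# Minimal sketch: spot-check execInstalledOnly and Lockdown Mode per host.
# Follows the pyVmomi connection/view pattern from the earlier sketches.
from pyVmomi import vim

def hardening_report(host: vim.HostSystem) -> None:
    adv = host.configManager.advancedOption
    exec_only = adv.QueryOptions("VMkernel.Boot.execInstalledOnly")[0].value
    # lockdownMode is e.g. 'lockdownDisabled', 'lockdownNormal', 'lockdownStrict'.
    lockdown = host.config.lockdownMode
    print(f"{host.name}: execInstalledOnly={exec_only}, lockdownMode={lockdown}")
```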
The upcoming end-of-life for vSphere 7 in October 2025 means that vast numbers of organizations will no longer receive product support, security patches, or updates for a product that underpins their infrastructure. This presents a critical juncture for organizations and a perfect storm for threat actors. The transition away from vSphere 7 should be viewed as a key opportunity to re-architect for security, not merely a routine upgrade to implement new features and obtain support. Failure to proactively address these interconnected risks by implementing these recommended mitigations will leave organizations exposed to targeted attacks that can swiftly cripple their entire virtualized infrastructure, leading to operational disruption and financial loss. The time to adopt a resilient, defense-in-depth security posture to protect these critical vSphere environments is unequivocally now.
In mid-2025, Google Threat Intelligence Group (GTIG) identified a sophisticated and aggressive cyber campaign targeting multiple industries, including retail, airline, and insurance. This was the work of UNC3944, a financially motivated threat group that has exhibited overlaps with public reporting of “0ktapus,” “Octo Tempest,” and “Scattered Spider.” Following public alerts from the Federal Bureau of Investigation (FBI), the group’s targeting became clear: GTIG observed that the group was suspected of turning its ransomware and extortion operations to the U.S. retail sector. The campaign soon broadened further, with airline and transportation organizations in North America also becoming targets.
The group’s core tactics have remained consistent and do not rely on software exploits. Instead, they use a proven playbook centered on phone calls to an IT help desk. The actors are aggressive, creative, and particularly skilled at using social engineering to bypass even mature security programs. Their attacks are not opportunistic but are precise, campaign-driven operations aimed at an organization’s most critical systems and data.
Their strategy is rooted in a “living-off-the-land” (LotL) approach. After using social engineering to compromise one or more user accounts, they manipulate trusted administrative systems and use their control of Active Directory as a launchpad to pivot to the vSphere environment, providing an avenue to exfiltrate data and deploy ransomware directly from the hypervisor. This method is highly effective because it generates few traditional indicators of compromise (IoCs) and bypasses security tools like endpoint detection and response (EDR), which often have limited or no visibility into the ESXi hypervisor and vCenter Server Appliance (VCSA).
Before discussing key detection signals and hardening strategies related to UNC3944’s vSphere-related operations, it’s important to understand vSphere logging and the distinction between vCenter Events and ESXi host logs. When forwarded to a central syslog server, vCenter Server events and ESXi host logs represent two distinct yet complementary sources of data. Their fundamental difference lies in their scope, origin, and the structured, event-driven nature of vCenter logs versus the verbose, file-based output of ESXi.
1. vCenter Server (VC Events)
vCenter events operate at the management plane, providing a structured audit trail of administrative actions and automated processes across the entire virtual environment. Each event is a discrete, well-defined object identified by a unique eventTypeId, such as VmPoweredOnEvent or UserLoginSessionEvent. This programmatic identification makes them ideal for ingestion into Security Information and Event Management (SIEM) platforms like Splunk or Google Chronicle for automated parsing, alerting, and security analysis.
Figure 1: VC Event log structure
Native storage & syslog forwarding: These events are generated by vCenter Server and stored within its internal VCSA database (PostgreSQL). When forwarded, vCenter streams a real-time copy of these structured events to the syslog server. The resulting log message typically contains the formal eventTypeId along with its human-readable description, allowing for precise analysis.
Primary use cases:
Security auditing & forensics: Tracking user actions, permission changes, and authentication
Change management: Providing a definitive record of all configuration changes to clusters, hosts, and virtual machines (VMs)
Automated alerting: Triggering alerts in a SIEM or monitoring tool based on specific eventTypeIds (e.g., HostCnxFailedEvent)
Examples of vCenter Events
As documented in resources like the vCenter Event Mapping repository, each event has a specific programmatic identifier.
UserLoginSessionEvent
Description: “User {userName}@{ipAddress} logged in as {locale}”
Significance: A critical security event for tracking all user access to the vCenter management plane
VmCreatedEvent
Description: “Created virtual machine {vm.name} on {host.name} in {datacenter.name}”
Significance: Logs the creation of new inventory objects, essential for asset management and change control
VmPoweredOffEvent
Description: “Virtual machine {vm.name} on {host.name} in {datacenter.name} is powered off”
Significance: Tracks the operational state and availability of workloads. An unexpected power-off event is a key indicator for troubleshooting.
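To illustrate the automated-alerting use case, here is a rough, unofficial sketch of a SIEM-side filter over a forwarded syslog stream. The regex and watchlist are assumptions, since message layout varies by vCenter version and syslog configuration:

```python
# Minimal sketch: flag security-relevant vCenter events in forwarded syslog.
# The regex and watchlist are illustrative; real message formats vary.
import re
import sys

WATCHLIST = {
    "UserLoginSessionEvent",   # all management-plane logins
    "HostCnxFailedEvent",      # host connectivity failures
    "VmCreatedEvent",          # new inventory objects
    "VmPoweredOffEvent",       # unexpected workload power-offs
}

EVENT_RE = re.compile(r"(?P<id>\b[A-Za-z][A-Za-z.]*Event\b)")

for line in sys.stdin:
    match = EVENT_RE.search(line)
    if match and match.group("id") in WATCHLIST:
        print(f"[ALERT] {match.group('id')}: {line.rstrip()}")
```

Piping the central syslog stream into this script (for example, via tail -f) would emit one alert line per matching event.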
Note on VCSA Logging Limitations: The VCSA does not, out-of-the-box, support forwarding critical security logs for denied network connections or shell command activity. To enable this non-default capability, a custom configuration at the native Photon OS level is required. This is an agentless approach that leverages only built-in Linux tools (like iptables and logger) and does not install any third-party software. This configuration pipes firewall and shell events into the VCSA’s standard rsyslog service, allowing the built-in remote logging mechanism to forward them to a central SIEM.
2. ESXi Host Logs
ESXi logs operate at the hypervisor level, providing granular, host-specific operational data. They contain detailed diagnostic information about the kernel, hardware, storage, networking, and services running directly on the ESXi host.
Native storage: These logs are enabled by default and stored as a collection of plain text files on the ESXi host itself, primarily within the /var/log/ directory. This storage is often a local disk or a persistent scratch partition. If a persistent location is not configured, these logs are ephemeral and will be lost upon reboot, making syslog forwarding essential for forensics.
Figure 2: ESXi standard log structure
Primary use cases:
Deep-dive troubleshooting of performance issues
Diagnosing hardware failures or driver issues
Analyzing storage and network connectivity problems
Examples of ESXi log entries sent to syslog:
(from vmkernel.log): Detailed logs about storage device latency
(from hostd.log): Logs from the host agent, including API calls, VM state changes initiated on the host, and host service activity
(from auth.log): Records of successful or failed login attempts directly to the host via SSH or the DCUI
3. ESXi Host Audit Logs
ESXi audit records provide a high-fidelity, security-focused log of actions performed directly on an ESXi host. The following analysis of the provided example demonstrates why this log source is forensically superior to standard logs for security investigations. These logs are not enabled by default.
Native storage & persistence: These records are written to audit.*.log on the host’s local filesystem, governed by the Syslog.global.auditRecord.storageEnable = TRUE parameter. Persistent storage configuration is critical to ensure this audit trail survives a reboot.
Figure 3: ESXi audit log structure
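As a brief sketch using the same OptionManager pattern as the earlier examples (only the storage key named above is used; applying it is per host), enabling persistent audit records could look like:

```python
# Minimal sketch: enable persistent ESXi audit records on a single host.
# Assumes the key is exposed via the host's advanced options, as on
# recent ESXi releases; verify against your version's documentation.
from pyVmomi import vim

def enable_audit_records(host: vim.HostSystem) -> None:
    host.configManager.advancedOption.UpdateOptions(changedValue=[
        vim.option.OptionValue(
            key="Syslog.global.auditRecord.storageEnable", value=True),
    ])
```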
Forensic analysis: standard vs. audit log: In the provided scenario, a threat actor logs into an ESXi host, attempts to run malware, and disables the execInstalledOnly security setting. Here is how each log type captures this event:
Standard syslog shell.log analysis: The standard log provides a simple, chronological history of commands typed into the shell.
Figure 4: ESXi standard log output
Limitations:
No login context: It does not show the threat actor’s source IP address or that the initial SSH login was successful.
No outcome: It shows the command ./malware was typed but provides no information on whether it succeeded or failed.
Incomplete narrative: It is merely a command history, lacking the essential context needed for a full security investigation.
ESXi audit log analysis: The ESXi audit log provides a rich, structured, and verifiable record of the entire session, from connection to termination, including the outcome of each command.
Figure 5: ESXi audit log output
Successful login: It explicitly records the successful authentication, including the source IP.
Failed malware execution: This is the most critical distinction. The audit log shows that the malware execution failed with an exit status of 126.
Successful security disablement: It then confirms that the command to disable a key security feature was successful.
This side-by-side comparison proves that while standard ESXi logs show a threat actor’s intent, the ESXi audit log reveals the actual outcome, providing actionable intelligence and a definitive forensic trail. A comprehensive logging strategy for a vSphere environment requires the collection and analysis of three distinct yet complementary data sources. When forwarded to a central syslog server, vCenter Server events, ESXi host audit records, and standard ESXi operational logs provide a multilayered view of the environment’s security, administrative changes, and operational health.
| Characteristic | vCenter Server Events | ESXi Audit Logs | ESXi Standard Logs |
| --- | --- | --- | --- |
| Scope | vCenter, ESXi | ESXi | ESXi |
| Enabled by Default | Yes | No | Yes |
| Format | Structured Objects (eventTypeId) | Verbose, Structured Audit Entries | Unstructured/Semi-structured Text |
| Type | Administrative, Management, Audit | Security Audit, Kernel-level Actions | Management, System-Level State |
| Primary Storage | VCSA Internal Database | Local Filesystem (audit.*.log) | Local Filesystem (/var/log/) |
| Primary Use Case | Central Auditing, Full Cluster Management, Forensics | Direct Host Forensics, Compliance | Deep Troubleshooting, Diagnostics |
Table 1: Comparison of ESXi Logs and vCenter Events
Anatomy of an Attack: The Playbook
UNC3944’s attack unfolds across five distinct phases, moving methodically from a low-level foothold to complete hypervisor control.
Figure 6: Typical UNC3944 attack chain
Phase 1: Initial Compromise, Recon, and Escalation
This initial phase hinges on exploiting the human element.
The tactic: The threat actor initiates contact by calling the IT help desk, impersonating a regular employee. Using readily available personal information from previous data breaches and employing persuasive or intimidating social engineering techniques, they build rapport and convince an agent to reset the employee’s Active Directory password. Once they have this initial foothold, they begin a two-pronged internal reconnaissance mission:
Path A (information stores): They use their new access to scan internal SharePoint sites, network drives, and wikis. They hunt for IT documentation, support guides, org charts, and project plans that reveal high-value targets. This includes not only the names of individual Domain or vSphere administrators, but also the discovery of powerful, clearly named Active Directory security groups like “vSphere Admins” or “ESX Admins” that grant administrative rights over the virtual environment.
Path B (secrets stores): Simultaneously, they scan for access to password managers like HashiCorp Vault or other Privileged Access Management (PAM) solutions. If they find one with weak access controls, they will attempt to enumerate it for credentials.
Armed with the name of a specific, high-value administrator, they make additional calls to the help desk. This time, they impersonate the privileged user and request a password reset, allowing them to seize control of a privileged account.
Why it’s effective: This two-step process bypasses the need for technical hacking like Kerberoasting for the initial escalation. The core vulnerability is a help desk process that lacks robust, non-transferable identity verification for password resets. The threat actor is more confident and informed on the second call, making their impersonation much more likely to succeed.
Key detection signals:
[LOGS] Monitor for command-line and process execution: Implement robust command-line logging (e.g., via Audit Process Creation, Sysmon Event ID 1 or EDR). Create alerts for suspicious remote process execution, such as wsmprovhost.exe (WinRM) launching native tools like net.exe to query or modify sensitive groups (e.g., net group "ESX Admins" /add).
[LOGS] Monitor for group membership changes: Create high-priority alerts for AD Event ID 4728 (A member was added to a security-enabled global group) or 4732 (local group) for any changes to groups named “vSphere Admins,” “ESX Admins,” or similar.
[LOGS] Correlate AD password resets with help desk activity: Correlate AD Event ID 4724 (Password Reset) and the subsequent addition of a new multi-factor authentication (MFA) device with help desk ticket logs and call records (see the correlation sketch after this list).
[BEHAVIOR] Alert on anomalous file access: Alert on a single user accessing an unusually high volume of disparate files or SharePoint sites, which is a strong indicator of the reconnaissance seen during UNC3944 activity.
[CRITICAL BEHAVIOR] Monitor Tier 0 account activity: Any password reset on a Tier 0 account (Domain Admin, Enterprise Admin, vSphere) must be treated as a critical incident until proven otherwise.
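The correlation sketch referenced above might look like the following. It is a minimal illustration only: ad_resets.csv and helpdesk_tickets.csv are hypothetical exports with epoch,account columns, and the one-hour window is an arbitrary starting point.
```
# Minimal correlation sketch, not a production detection. ad_resets.csv and
# helpdesk_tickets.csv are hypothetical "epoch,account" exports; adapt the
# field layout to your SIEM and ticketing system.
while IFS=, read -r reset_epoch account; do
    # A reset is suspicious when no help desk ticket exists for the same
    # account in the hour preceding the reset (AD Event ID 4724).
    if ! awk -F, -v acct="$account" -v lo=$((reset_epoch - 3600)) -v hi="$reset_epoch" \
            '$2 == acct && $1 >= lo && $1 <= hi { found = 1 } END { exit !found }' \
            helpdesk_tickets.csv; then
        echo "ALERT: unticketed password reset for account $account"
    fi
done < ad_resets.csv
```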
Critical hardening and mitigation:
[CRITICAL] Prohibit phone-based resets for privileged accounts: For all Tier 0 accounts, enforce a strict “no password resets over the phone” policy. These actions must require an in-person, multipart, or high-assurance identity verification process.
Protect and monitor privileged AD groups: Treat these groups as Tier 0 assets: tightly control who can modify their membership and implement high-fidelity alerting for any membership change (AD Event ID 4728/4732). This is critical because threat actors use native tools like net.exe, often via remote protocols like WinRM, to perform this manipulation. Avoid obvious, non-obfuscated names like “vSphere Admins” for security groups that grant high-level privileges.
Harden information stores: Implement data loss prevention (DLP) and data classification to identify and lock down sensitive IT documentation that could reveal high-value targets. Treat secrets vaults as Tier 0 assets with strict, least-privilege access policies.
Restrict or monitor remote management tools: Limit the use of remote management protocols like WinRM and vSphere management APIs to authorized administrative subnets and dedicated PAWs. Log all remote commands for review and anomaly detection.
Table 2 displays threat actor actions in support of Active Directory escalation, along with process and command-line data that an organization may use to detect this activity.
| Process Name | Command Line | Tactic | Threat Actor’s Goal |
| --- | --- | --- | --- |
| explorer.EXE | "C:\Program Files\…\WORDPAD.EXE" "\\10.100.20.55\c$\Users\j.doe\…\ACME Power Division\Documents\Procedure for Deploying ESXi…docx" | Reconnaissance | Threat actor, using a compromised user account, opens IT procedure documents to understand the vSphere environment and find target names. |
| | | Reconnaissance | Threat actor continues recon, opening files on a management server that likely contain names of systems, groups, or administrators. |
| wsmprovhost.exe | "C:\…\net.exe" group "ESX Admins" | Enumeration | Having found the group name, the threat actors use WinRM to remotely query the membership of the “ESX Admins” group to identify targets. |
| wsmprovhost.exe | "C:\…\net.exe" group "ESX Admins" ACME-CORP\temp-adm-bkdr /add | Manipulation | This is the key attack: the threat actor adds their controlled account (temp-adm-bkdr) to the “ESX Admins” group, granting it full admin rights to vSphere. |
| wsmprovhost.exe | "C:\…\net.exe" group "ESX Admins" | Verification | The threat actor queries the group again immediately after the modification to confirm that their malicious user was successfully added. |
Table 2: Active Directory user escalation
Phase 2: The Pivot to vCenter — The Control Plane Compromise
Having mapped Active Directory credentials to vSphere access, the threat actors turn their sights on the heart of the virtual environment.
The tactic: They use the compromised credentials to log into the vSphere vCenter Server GUI. From there, they leverage their vCenter Admin rights to gain what amounts to “virtual physical access” to the VCSA itself. They open a remote console, reboot the appliance, and edit the GRUB bootloader to start with a root shell (init=/bin/bash), giving them passwordless root access. They then change the root password to enable SSH access upon reboot. To maintain their foothold, they upload and execute teleport, a legitimate open source remote access tool, to create a persistent and encrypted reverse shell (C2 channel) that bypasses most firewall egress rules.
Why it’s effective: vCenter’s delegation of trust to Active Directory, often via LDAP(S), means the initial login isn’t protected by MFA. The VCSA takeover abuses a fundamental privilege of a virtual environment administrator: the ability to interact with a VM’s console pre-boot.
Key detection signals:
[LOGS] Monitor vCenter events for logins (com.vSphere.vc.UserLoginSessionLoginSuccessEvent, com.vSphere.vc.UserLoginSessionLoginFailureEvent) and appliance reboots (com.vSphere.vc.appliance.ApplianceRebootEvent).
[LOGS] Monitor for log entries with prefixes like “SSH” in remote VCSA syslog to detect dropped SSH attempts or other blocked traffic via iptables.
[LOGS] On the VCSA, monitor journald and implement VCSA remote forwarding of logs to a SIEM to detect unauthorized shell access and the enablement of the SSH and Shell services.
Figure 7: Remote syslog events for enablement of VCSA SSH service
[NETWORK] Use Network Flow Logs to spot anomalous outbound connections from the VCSA’s IP address.
[NETWORK] Unusual DNS requests from vCenter: This detection identifies when a vSphere vCenter server makes DNS requests for domains that are not on the explicit allow list of known, trusted sites (e.g., vSphere.com, ntp.org, or internal domains).
[LOGS] Use of cURL or Wget to download tools: This detection can identify the use of command-line utilities like cURL or Wget on a critical server (such as a vCenter, Domain Controller, or database server) to download a file from an external URL.
Critical hardening and mitigation:
[CRITICAL] Enable VCSA remote logging: Implement remote syslog forwarding on the VCSA appliance.
[CRITICAL] Enforce phishing-resistant MFA on vCenter: Implement a phishing-resistant MFA solution, such as FIDO2/WebAuthn, for all vCenter logins by federating authentication with a supported identity provider. This is a critical control that directly neutralizes the threat of credential theft, rendering phishing attacks against vCenter users ineffective.
[CRITICAL] Enforce least privilege in vCenter: Strictly limit the use of the Administrator role, reserving it for dedicated “break glass” accounts only, such as administrator@vsphere.local. Instead, create granular, custom roles for specific job functions to ensure users and groups have only the minimum permissions necessary, breaking the link between a compromised AD account and a full vCenter takeover.
[CRITICAL] Use the VCSA firewall and block shell access: Block all unnecessary outbound internet traffic from the VCSA using egress filtering and its built-in firewall. Disable the SSH and BASH shells by default. This thwarts the teleport backdoor and makes the VCSA takeover significantly more difficult.
[CRITICAL] Configure the VCSA’s underlying iptables firewall: Enforce a Zero Trust allow-list for all management interfaces (443, 5480, 22) and enable logging for all denied connections. The default VCSA GUI firewall can be disabled by an attacker with a compromised web session and, crucially, it does not log blocked connection attempts. By configuring iptables at the OS level, the rules become immune to GUI tampering, and every denied connection is logged and forwarded to your SIEM.
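A minimal sketch of such a policy is shown below; the administrative subnet 10.20.0.0/24 is a placeholder and the rule set is deliberately simplified, so adapt and test it on a non-production appliance first.
```
# Sketch of a Zero Trust allow-list for VCSA management interfaces.
# 10.20.0.0/24 is a placeholder admin subnet.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Permit 443 (HTTPS), 5480 (VAMI), and 22 (SSH) only from the admin subnet.
for port in 443 5480 22; do
    iptables -A INPUT -p tcp -s 10.20.0.0/24 --dport "$port" -j ACCEPT
done

# Log, then drop, everything else so every denied connection reaches the SIEM.
iptables -A INPUT -j LOG --log-prefix "VCSA-FW-DENY: " --log-level 4
iptables -A INPUT -j DROP
```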
Table 3 displays threat actor actions in support of the Teleport installation, along with key evidence that an organization may use to detect this activity.
| Tactic | Key Script Evidence | Threat Actor’s Goal |
| --- | --- | --- |
| Verify Root Privileges | | The threat actor executes the installer via sudo. The script’s first action is to confirm it has the root permissions required for system-wide installation. |
| Define Installation Parameters | SCRIPT_NAME="teleport-installer"; TELEPORT_BINARY_DIR="/usr/local/bin"; TELEPORT_CONFIG_PATH="/etc/teleport.yaml" | The script defines its core parameters, including where the backdoor’s binaries and configuration files will be placed on the compromised VCSA’s filesystem. |
| Hardcode C2 & Authentication Details | TARGET_HOSTNAME='c2.attacker.net'; JOIN_TOKEN='[REDACTED_JOIN_TOKEN]'; CA_PIN_HASHES='sha256:[REDACTED_CA_PIN_HASH]' | The threat actor embeds the unique, pre-generated credentials required for the agent to connect and authenticate to their external command-and-control (C2) server. |
| Detect OS & Select Package Type | if [[ ${f} != "tarball" && ${f} != "deb" ... | The script contains logic to detect the underlying operating system (e.g., Debian, RHEL, or a generic Linux like the VCSA) to ensure it uses the correct installation package (.deb, .rpm, or .tar.gz). |
| Download & Install Binaries | Script logic proceeds to download the ‘tarball’ package and unpacks binaries to /usr/local/bin | Based on the OS detection, the script downloads the appropriate Teleport package from a threat actor-controlled source and installs the binaries (teleport, tsh, tctl) into the predefined directory. |
| Establish Persistence | [Implied action] Script creates and enables a systemd unit file | To ensure the backdoor survives reboots, the script creates a systemd service file using the defined path. It then enables and starts the teleport service, which initiates the final, persistent connection to the C2 server. |
Table 3: VCSA Teleport installation
Phase 3: The Hypervisor Heist — Offline Credential Theft and Exfiltration
This is where the threat actor leverages their vSphere control to operate beneath the notice of in-guest security and EDR.
The tactic: From vCenter, the threat actor enables SSH on the ESXi hosts and resets their root passwords. They then execute an offline attack by identifying a Domain Controller VM, powering it off, and detaching its virtual disk (.vmdk). This disk is then attached as a secondary drive to a forgotten or “orphaned” VM they control. From this unmonitored machine, they copy the NTDS.dit Active Directory database. The process is then reversed, and the DC is powered back on as if nothing happened. The stolen data is then moved in a two-stage process: first, an internal transfer from the orphaned VM to the compromised VCSA using sftp, and second, an external exfiltration from the VCSA through the already-established teleport C2 channel to a threat actor-controlled cloud service.
Why it’s effective: This entire operation occurs at the hypervisor layer, making it invisible to any EDR or security agent running inside the Domain Controller’s operating system. The use of the VCSA as a data funnel bypasses any network segmentation rules.
Table 4 displays threat actor actions in support of VM data exfiltration along with key evidence that an organization may use to detect this activity.
| Tactic | Evidence Source | Key Evidence | Threat Actor’s Goal |
| --- | --- | --- | --- |
| Identify Target VM | Browser History | URL: https://vcsa-prod-01.acme.local/ui/...; Page Title: vSphere – ACME-DC01 – Datastores | The threat actor, logged in as a compromised user, browses the vSphere UI to locate the virtual machine for the target Domain Controller (ACME-DC01). |
| Identify Staging VM | Browser History | URL: https://vcsa-prod-01.acme.local/ui/...; Page Title: vSphere – OLD-APPSRV-01 – Networks | The threat actor identifies a seemingly abandoned server (OLD-APPSRV-01) to use as their staging VM, onto which they will mount the DC’s disk. |
| Execute Disk Swap | vCenter Event Log | Event: [vim.event.VmReconfiguredEvent]; User: ACME\threat.actor; Action: Reconfigured OLD-APPSRV-01 on esxi-prod-02.acme.local | The threat actor triggers a VM reconfiguration on the staging VM. This is the start of the disk attachment process. |
| Execute Disk Swap | vCenter Event Log | | The log shows a disk device being modified on the staging VM. The source file path clearly shows that the virtual disk (.vmdk) belonging to the Domain Controller (ACME-DC01) is being attached. |
| Execute Disk Swap | ESXi Host Log | | Simultaneously, the ESXi host logs the ReconfigVM_Task being executed against the staging VM, confirming the action was carried out at the hypervisor level. |
Table 4: Virtual machine data exfiltration
Key detection signals:
[BEHAVIOR] Monitor for reconnaissance in the vSphere UI: Before the attack, the threat actor browses the vSphere client to identify their target (e.g., a VM named ACME-DC01) and their staging VM (OLD-APPSRV-01). If UI access logs or browser histories from admin workstations are available, correlating suspicious browsing patterns with a subsequent reconfiguration event can provide very early warning.
[LOGS] Scrutinize VM reconfiguration events: As shown in this anonymized real-world example, a vim.event.VmReconfiguredEvent for the staging VM (e.g., OLD-APPSRV-01) will contain the file path of the target VM’s disk in its configuration details (e.g., backing = (fileName = '.../ACME-DC01/ACME-DC01_4.vmdk' ...)). Create high-priority alerts that parse this event data to detect when a disk from a critical asset is attached to any non-standard machine.
[LOGS] Correlate the full event sequence: A high-fidelity alert can be built by correlating the full sequence of vCenter events on a critical asset: VmPoweredOffEvent, followed by the VmReconfiguredEvent detailed above, and eventually a VmPoweredOnEvent combined with a restart initiated by vmtoolsd.exe, along with supporting Windows guest events like 6006 (shutdown) and 6005 (startup). A minimal correlation sketch follows this list.
[BEHAVIOR] Create CRITICAL alerts for any power-on or reconfiguration event on VMs located in a “Decommissioned” or “Orphaned” folder, as these are prime candidates for the threat actor’s staging VM.
[LOGS] Monitor ESXi host changes: Alert on SSH service start via vCenter events (vim.event.HostServiceStartedEvent), firewall configuration changes (vim.event.HostFirewallInfoChangedEvent), and direct root shell access logs on the ESXi hosts themselves.
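The sketch below illustrates the disk-swap correlation described in this list. It is deliberately simplified: vcenter_events.jsonl is a hypothetical JSON-lines export, and the epoch, eventTypeId, and fullFormattedMessage field names are assumptions to map onto your SIEM’s real schema.
```
# Simplified correlation sketch; the export file and field names are
# assumptions, not a vCenter API.
TARGET_VM="ACME-DC01"   # critical asset to watch

# Keep events that mention the target VM: its own power events plus the
# staging VM's VmReconfiguredEvent, which embeds the target's .vmdk path.
jq -r --arg vm "$TARGET_VM" \
   'select(.fullFormattedMessage | contains($vm)) | "\(.epoch) \(.eventTypeId)"' \
   vcenter_events.jsonl | sort -n |
awk '/VmPoweredOffEvent/   { off = $1 }
     /VmReconfiguredEvent/ { if (off && $1 - off <= 1800) reconf = $1 }
     /VmPoweredOnEvent/    { if (reconf && $1 - reconf <= 1800)
                               print "ALERT: possible offline disk swap at epoch " $1 }'
```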
Critical hardening and mitigation:
[CRITICAL] Use vSphere VM encryption: Encrypt all Tier 0 virtualized assets. This is the definitive technical block for the offline “Disk Swap” attack as the stolen .vmdk file would be unreadable.
[CRITICAL] Implement a strict VM decommissioning process: Formally decommission old VMs by deleting their disks. Do not leave powered-off, “orphaned” VMs on your datastores as these are the ideal workbenches for threat actors.
[CRITICAL] Harden ESXi accounts: Disable the default ESXi root account in favor of a named “break glass” account with a highly complex password. On ESXi 8.0+, run esxcli system account set -i vpxuser -s false to prevent a compromised vCenter user from changing ESXi root passwords.
[CRITICAL] Enable ESXi remote audit logging: Enable remote ESXi audit logging (vpxa.log, hostd.log, audit_records) to a SIEM to provide verbose, centralized details of security-focused events on the hosts themselves.
Figure 8: Remote syslog events for SSH access to ESXi
Phase 4: Backup Sabotage — Removing the Safety Net
Before deploying ransomware, the actor ensures their target cannot recover.
The tactic: Leveraging their full control over Active Directory, the threat actor targets the backup infrastructure (e.g., a virtualized backup server). They either reuse the compromised Domain Admin credentials to log in via RDP or, more stealthily, add a user they control to the “Veeam Administrators” security group in AD. Once in, they delete all backup jobs, snapshots, and repositories.
Why it’s effective: This works due to a lack of administrative tiering (where the same powerful accounts manage both virtualization and backups) and insufficient monitoring of changes to critical AD security groups.
Key detection signals:
[Detecting Path A] Monitor for interactive logons (Windows Event ID 4624) on the backup server by high-privilege accounts.
[Detecting Path B] Trigger a CRITICAL alert from AD logs on Event ID 4728 (“A member was added to a security-enabled global group”) for any change to the “Veeam Administrators” group.
[LOGS] Monitor the backup application’s own audit logs for mass deletion events.
Critical hardening and mitigation:
[CRITICAL] Isolate backup infrastructure: The Veeam server and its repositories must sit in a separate, MFA-protected, highly restricted security domain or use dedicated, non-AD-joined credentials. This severs the AD trust relationship the threat actor exploits.
[CRITICAL] Utilize immutable repositories: This is the technical backstop against backup deletion. It makes the backup data undeletable for a set period, even if a threat actor gains full administrative access to the backup console.
Phase 5: Encryption — Ransomware from the Hypervisor
With the target blinded and their safety net gone, the final stage commences.
The tactic: The threat actor uses their SSH access to the ESXi hosts to push their custom ransomware binary via SCP/SFTP into a writable directory like /tmp. They then execute a script that uses the native ESXi command-line tool, vim-cmd, to forcibly power off every VM on the host. Finally, they launch the ransomware binary (often with nohup to ensure it continues after they log out), which scans the datastores and encrypts all VM files (.vmdk, .vmx, etc.).
Table 5 displays threat actor actions in support of ESXi ransomware execution along with key evidence that an organization may use to detect this activity.
| Tactic | Source Log File | Key Evidence | Threat Actor’s Goal |
| --- | --- | --- | --- |
| SSH Login | /var/log/auth.log | SSH session was opened for 'root@10.10.10.5' | The threat actor logs in as root to the compromised ESXi host via an interactive SSH session. |
| Prepare Payload | /var/log/shell.log | chmod 0777 encrypt.out; cp encrypt.out encrypt_.out | The threat actor’s commands to make the ransomware payload executable are captured by the ESXi shell log. |
| Create Exclusion List | /var/log/shell.log | echo VCSA-01-PROD >> list.txt; echo DC-01-PASSIVE >> list.txt | The shell log records the creation of the list.txt file, revealing the threat actor’s intent to selectively encrypt systems. |
| Launch Ransomware | /var/log/shell.log | | The exact command to launch the time-delayed ransomware, including the key and exclusion list, is logged. The nohup command ensures it runs after they log out. |
| Clean Up & Exit | /var/log/shell.log | ls nohup.out; exit | The threat actor’s final commands and session termination are recorded before they exit, leaving the payload to run. |
Table 5: ESXi ransomware execution
Why it’s effective: Root access on the ESXi shell is the highest level of privilege in a virtual environment. By encrypting at the hypervisor level, the threat actor bypasses all in-guest security and compromises every hosted server in a single action.
Key detection signals:
[NETWORK] Monitor Network Flow Logs for large file transfers to and from ESXi hosts over SSH/SCP.
[BEHAVIOR] A SIEM alert for a high volume of VM power-off commands originating from a single ESXi host is a high-fidelity indicator of an ongoing attack.
[LOGS] Monitor ESXi host logs for the execution of esxcli system settings kernel set -s execInstalledOnly -v FALSE (a threat actor attempting to disable a key defense) and mass vmsvc/power.off commands. Since this setting only applies after a reboot, correlate this alert with a subsequent host reboot within a short time window.
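As a minimal illustration of both signals, the sketch below sweeps a forwarded ESXi shell log; the log path and the alert threshold are assumptions to tune for your collector.
```
# Sketch only: sweep a forwarded ESXi shell log for the two signals above.
# The path and the threshold of 5 are assumptions.
LOG=/var/log/remote/esxi/shell.log

# 1. An attempt to disable the execInstalledOnly protection.
grep -q "settings kernel set.*execInstalledOnly.*FALSE" "$LOG" \
    && echo "ALERT: execInstalledOnly disable attempt recorded in $LOG"

# 2. A burst of forced VM power-offs from the host shell.
count=$(grep -c "vim-cmd vmsvc/power.off" "$LOG")
[ "$count" -ge 5 ] && echo "ALERT: $count power-off commands -- possible mass shutdown"
```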
Critical hardening and mitigation:
[CRITICAL] Enable vSphere lockdown mode: This is a primary prevention for this phase as it blocks the interactive SSH access needed to push and execute the payload.
[CRITICAL] Enforce the execInstalledOnly execution policy: This ESXi kernel setting is the definitive technical prevention. It blocks any unsigned binary from running, causing the threat actor’s custom ransomware execution to fail. Enable the hardware-based TPM 2.0 chip with Secure Boot to lock this setting so it cannot be disabled.
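For reference, enabling the control uses the same esxcli namespace as the disable attempt shown in the detection signals above:
```
# Enforce execInstalledOnly on an ESXi host (the counterpart of the disable
# command attackers run). As noted above, it fully applies only after a
# reboot, and TPM 2.0 with Secure Boot keeps it from being flipped back.
esxcli system settings kernel set -s execInstalledOnly -v TRUE

# Verify the configured and runtime values.
esxcli system settings kernel list -o execInstalledOnly
```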
The Three-Pillar Defense: A Fortified Strategy
Pillar 1: Proactive Hardening (Your Most Reliable Defense)
Architect for centralized access: Do not join ESXi hosts directly to Active Directory. Manage all host access exclusively through vCenter roles and permissions. This drastically reduces the attack surface.
Enable vSphere lockdown mode: This is a critical control that restricts ESXi management, blocking direct shell access via SSH and preventing changes from being made outside of vCenter.
Enforce execInstalledOnly: This powerful ESXi kernel setting prevents the execution of any binary that wasn’t installed as part of a signed, packaged vSphere Installation Bundle (VIB). It would have directly blocked the threat actor’s custom ransomware from running.
Use vSphere VM encryption: Encrypt your Tier 0 virtualized assets (DCs, PKI, etc.). This is the definitive technical block for the offline disk-swap attack, rendering any stolen disk files unreadable.
Practice strict infrastructure hygiene: Don’t just power off old VMs. Implement a strict decommissioning process that deletes their disks from the datastore or moves them to segregated archival storage to eliminate potential “staging” machines.
Posture management: It is vital to implement continuous posture management (CPM) because hardening is not a one-time task but a security state that must be constantly maintained against “configuration drift.” The UNC3944 playbook fundamentally relies on creating these policy deviations, such as enabling SSH or altering firewall rules. Continuous auditing can be achieved either through dedicated hybrid cloud security posture management (CSPM) tools, such as the vSphere Aria Operations Compliance Pack or Wiz, or through custom in-house scripts that leverage the vSphere API via PowerShell/PowerCLI to regularly audit your environment (a minimal example of the scripted approach follows this list).
Harden the help desk: For privileged accounts, mandate that MFA enrollment or password resets require an in-person, multipart, or high-assurance multi-factor verification process.
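As one example of the scripted posture audit mentioned above, the sketch below uses the open source govc CLI from the govmomi project rather than PowerCLI. govc, its subcommand names, and the output parsing are assumptions to validate against your own tooling and versions.
```
# Drift-audit sketch using the open source govc CLI (an assumption; verify
# subcommands and output columns against your govc version).
export GOVC_URL='https://vcsa-prod-01.acme.local'
export GOVC_USERNAME='audit-ro'   # placeholder read-only account
export GOVC_PASSWORD='...'        # placeholder

# Flag any host where the SSH service (TSM-SSH) is running -- a hallmark of
# the UNC3944 playbook and a clear policy deviation.
for host in $(govc find / -type h); do
    if govc host.service.ls -host "$host" | grep -Eq 'TSM-SSH.*(true|Running)'; then
        echo "DRIFT: SSH enabled on $host"
    fi
done
```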
Pillar 2: Identity and Architectural Integrity (Breaking the Attack Chain)
Enforce phishing-resistant MFA everywhere: This must be applied to VPN, vCenter logins, and all privileged AD accounts. Use hardened PAWs with exclusive, firewalled access to vCenter.
Isolate critical identity infrastructure: Run your Tier 0 assets (Domain Controllers, PAM, Veeam, etc.) in a dedicated, highly secured “identity cluster” with its own stringent access policies, segregated from general-purpose workloads.
Avoid authentication loops: A critical architectural flaw is hosting identity providers (AD), recovery systems (Veeam), or privileged access management (PAM) solutions on the very virtualization platform they secure and authenticate. A compromise of the underlying ESXi hosts results in a correlated failure of both the dependent services and the means to restore them, a scenario that significantly complicates or prevents disaster recovery.
Consider alternate identity providers (IdPs): To break the “AD-to-everything” chain, consider using a separate, cloud-native IdP such as Microsoft Entra ID for authenticating to infrastructure.
Pillar 3: Advanced Detection and Recovery (Your Safety Net)
Build detections after hardening: The most effective alerts are those that detect the attempted manipulation of the hardening controls you’ve put in place. Harden first, then build your detection logic.
Centralize and monitor key logs: Forward all logs from AD, vCenter, ESXi, networking infrastructure, firewalls, and backups to a SIEM. Correlate logs from these disparate sources to create high-fidelity detection scenarios that can spot the threat actors’ methodical movements.
Focus on high-fidelity alerts: Prioritize alerting on events in phases 1-3. Detecting the enablement of SSH on a host, a VCSA takeover, or membership changes to your “Veeam Admins” group will enable you to act before data exfiltration and ransomware deployment.
Architect for survival: Assume the worst-case scenario. Your immutable and air-gapped backups are your last line of defense. They must be isolated from your production AD and inaccessible to a compromised administrator. Test your recovery plan against this specific threat model to ensure it works.
Conclusion: The Defender’s Mandate — Harden and Alert
UNC3944’s playbook requires a fundamental shift in defensive strategy, moving from EDR-based threat hunting to proactive, infrastructure-centric defense. This threat differs from traditional Windows ransomware in two ways: speed and stealth. While traditional actors may have a dwell time of days or even weeks for reconnaissance, UNC3944 operates with extreme velocity; the entire attack chain from initial access to data exfiltration and final ransomware deployment can occur in mere hours. This combination of speed and minimal forensic evidence makes it essential to not just identify but to immediately intercept suspicious behavioral patterns before they can escalate into a full-blown compromise.
This living-off-the-land (LotL) approach is so effective because the vCenter Server Appliance and ESXi hypervisor cannot run traditional EDR agents, leaving a significant visibility gap at the virtualization layer. Consequently, sophisticated detection engineering within your SIEM becomes the primary and most essential method for active defense.
This reality presents the most vital lesson for defenders: the ability to detect and act on early alerting is paramount. An alert generated during the final ransomware execution is merely a notification of a successful takeover. In contrast, an alert that triggers when the threat actor first compromises a help desk account or accesses vCenter from an unusual location is an actionable starting point for an investigation—a crucial window of opportunity to evict the threat before they achieve complete administrative control.
A resilient defense, therefore, cannot rely on sifting through a sea of broad, noisy alerts. This reactive approach is particularly ineffective when, as is often the case, many vSphere environments are built upon a foundation of insecure defaults—such as overly permissive roles or enabled SSH—and suffer from a lack of centralized logging visibility from ESXi hosts and vCenter. Without the proper context from these systems, a security team is left blind to the threat actors’ methodical, LotL movements until it is far too late.
Instead, the strategy must be twofold. First, it requires proactive, defense-in-depth technical hardening to systematically correct these foundational gaps and reduce the attack surface. Second, this must be complemented by a deep analysis of the threat actor’s tactics, techniques, and procedures (TTPs) to build the high-fidelity correlation rules and logging infrastructure needed to spot their earliest movements. This means moving beyond single-event alerts and creating rules that connect the dots between a help desk ticket, a password reset in Active Directory, and a subsequent anomalous login to vCenter.
These two strategies are symbiotic, creating a system where defense enables detection. Robust hardening is not just a barrier; it also creates friction for the threat actor, forcing them to attempt actions that are inherently suspicious. For example, when Lockdown Mode is enabled (hardening), a threat actor’s attempt to open an SSH session to an ESXi host will fail, but it will also generate a specific, high-priority event. The control itself creates the clean signal that a properly configured SIEM is built to catch.
For any organization with a critical dependency on vSphere, this is not a theoretical exercise. What makes this threat exceptionally dangerous is its ability to render entire security strategies irrelevant. It circumvents traditional tiering models by attacking the underlying hypervisor that hosts all of your virtualized Tier 0 assets—including Domain Controllers, Certificate Authorities, and PAM solutions—rendering the logical separation of tiering completely ineffective. Simultaneously, by manipulating virtual disks while the VMs are offline, it subverts in-guest security solutions—such as EDR, antivirus (AV), DLP, and host-based intrusion prevention systems (HIPS)—as their agents cannot monitor for direct ESXi-level changes.
The threat is immediate, and the attack chain is proven. Mandiant has observed that the successful hypervisor-level tactics leveraged by groups like UNC3944 are no longer exclusive; these same TTPs are now being actively adopted by other ransomware groups. This proliferation turns a specialized threat into a mainstream attack vector; the time to act is now.