Amazon EventBridge now supports Internet Protocol version 6 (IPv6) through new dual-stack endpoints. You can now connect to EventBridge Event Bus, EventBridge Scheduler, EventBridge Pipes, and EventBridge schema registries using IPv6, IPv4, or dual-stack clients. The existing Amazon EventBridge endpoints that support only IPv4 will remain available for backwards compatibility.
Amazon EventBridge is a serverless service that uses events to connect application components together, making it easier for you to build scalable event-driven applications. Through Event Bus for many-to-many routing, Scheduler for scheduled tasks, Pipes for point-to-point integrations, and schema registries for event schema discovery and management, EventBridge provides reliable and efficient ways to ingest, filter, transform, and deliver events. With dual-stack endpoints now available for these EventBridge services, you can use IPv6 to future-proof your event-driven architectures, maintain compatibility with existing IPv4 systems, and eliminate the need for complex IP address translation infrastructure.
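As a minimal sketch of what "dual-stack" means in practice: AWS dual-stack endpoints generally live under the `.api.aws` domain, while the legacy IPv4-only endpoints use `.amazonaws.com`. The exact EventBridge hostnames below are an assumption based on that convention, so verify them against the AWS service endpoints reference before relying on them.

```python
# Illustrative sketch only: the hostname patterns below follow AWS's general
# dual-stack naming convention and are an assumption for EventBridge
# specifically -- check the AWS service endpoints reference to confirm.

def eventbridge_endpoint(region: str, dualstack: bool = False) -> str:
    """Return the assumed EventBridge endpoint hostname for a region."""
    domain = "api.aws" if dualstack else "amazonaws.com"
    return f"events.{region}.{domain}"

# With boto3 you normally don't build hostnames yourself; the SDK can select
# the dual-stack endpoint via configuration, e.g.:
#   boto3.client("events", config=Config(use_dualstack_endpoint=True))

print(eventbridge_endpoint("eu-west-1", dualstack=True))  # -> events.eu-west-1.api.aws
```

Existing IPv4 clients need no change; they keep resolving the `.amazonaws.com` endpoint, which is why the launch is backward compatible.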
AWS Batch now supports scheduling for SageMaker Training jobs. With this integration, data scientists can submit training jobs to configurable queues powered by AWS Batch. Jobs are scheduled based on priority and resource availability, eliminating manual retries and coordination. Additionally, system administrators can set up fair-share scheduling policies to optimize resource utilization across teams. The system automatically retries failed jobs and provides visibility into queue status.
You can also procure SageMaker Flexible Training Plans (FTP) to guarantee the capacity you need during the time you need it. With a Flexible Training Plan in place, Batch’s queuing capabilities allow you to maximize utilization for the duration of your plan. Data scientists can submit experiments with confidence directly from the SageMaker Python SDK, knowing that infrastructure complexities are handled automatically.
You can start using AWS Batch for SageMaker Training jobs immediately through the AWS Management Console, AWS Command Line Interface (CLI), or AWS SDKs. There are no additional charges for AWS Batch itself – you only pay for the AWS resources used to run your applications. AWS Batch for SageMaker Training jobs is now generally available in all commercial AWS Regions where AWS Batch and SageMaker AI are available. To get started, see the AWS Batch for SageMaker Training jobs documentation and our blog post.
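To make the queueing model concrete, here is a hypothetical sketch of assembling a queue submission. The request shape (queue name, share identifier, scheduling priority, training payload) and all names are illustrative assumptions, not the real API; consult the AWS Batch for SageMaker Training jobs documentation for the actual call.

```python
# Hypothetical sketch of queueing a SageMaker Training job through AWS Batch.
# The field names below are illustrative assumptions, not the real API shape.

def build_queue_submission(job_name: str, queue: str, share_id: str,
                           priority: int, training_spec: dict) -> dict:
    """Assemble an illustrative Batch submission for a training job."""
    if not 0 <= priority <= 99:
        raise ValueError("priority must be between 0 and 99")
    return {
        "jobName": job_name,
        "jobQueue": queue,            # the Batch queue the job waits in
        "shareIdentifier": share_id,  # used by fair-share scheduling policies
        "schedulingPriority": priority,
        "trainingPayload": training_spec,
    }

req = build_queue_submission(
    "bert-finetune", "ml-team-queue", "research", 50,
    {"TrainingJobName": "bert-finetune-001",
     "ResourceConfig": {"InstanceType": "ml.p4d.24xlarge"}},
)
```

The share identifier is what lets a fair-share policy balance capacity across teams: jobs tagged "research" and "production" draw from the same queue but are weighted by the policy.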
AWS Database Migration Service (DMS) Schema Conversion now supports Virtual Mode for Data providers, enabling you to perform schema assessment and conversion without connecting to target database instances. This feature helps you begin conversion planning immediately while reducing infrastructure costs.
Virtual Mode enables you to evaluate database compatibility, review and convert schema code, generate assessment reports, and plan resource requirements. All of this happens before provisioning actual database infrastructure. When you are ready for migration, you can switch from virtual to real Data providers seamlessly. Virtual Mode works with all AWS DMS Schema Conversion-supported target databases, including Amazon RDS and Amazon Aurora for PostgreSQL and MySQL, Amazon RDS for Db2, and Amazon Redshift.
Virtual Mode is available in all AWS Regions where AWS DMS Schema Conversion is supported, at no additional charge. To learn more, visit the Virtual Data provider page.
Amazon Connect Cases now displays email content, including the message body, images, and attachment details, directly within the case activity feed, enabling case workers to understand email conversations more efficiently and resolve cases faster.
Amazon Connect Cases is available in the following AWS Regions: US East (N. Virginia), US West (Oregon), Canada (Central), Europe (Frankfurt), Europe (London), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Africa (Cape Town). To learn more and get started, visit the Amazon Connect Cases webpage and documentation.
Amazon Connect Cases is now available in the Africa (Cape Town) AWS region. Amazon Connect Cases provides built-in case management capabilities that make it easy for your contact center agents to create, collaborate on, and quickly resolve customer issues that require multiple customer conversations and follow-up tasks.
With this launch, Amazon Connect Cases is available in the following AWS Regions: US East (N. Virginia), US West (Oregon), Canada (Central), Europe (Frankfurt), Europe (London), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Africa (Cape Town). To learn more and get started, visit the Amazon Connect Cases webpage and documentation.
Google Cloud is committed to being a trusted partner for customers who are navigating AI regulations in Europe. We have long understood that this requires a proactive and collaborative approach in order to ensure Europeans can access secure, first-rate AI tools as they become available.
This week, Google announced that we intend to sign the European Union AI Act Code of Practice (the Code). Google Cloud supports the AI Office’s stated vision of offering a simple, transparent, and streamlined way to demonstrate compliance with the AI Act, with enforcement focused on monitoring signatories’ adherence to the Code. We believe that this approach can result in greater predictability and a reduced administrative burden.
We believe participating will benefit our customers, who will be able to compare the best cloud services and derive their own compliance benefits.
Looking ahead, customers should become familiar with these three compliance documents as they seek to develop and deploy AI in the EU.
The EU AI Act is a legal and regulatory framework that establishes obligations for certain AI systems based on their potential risks and levels of impact.
The General-Purpose AI Code of Practice is a voluntary tool, prepared by independent experts in a multi-stakeholder process, designed to help industry comply with the AI Act’s obligations for providers of general-purpose AI models.
Separately, the GPAI guidelines focus on the scope of the obligations for providers of general-purpose AI models laid down in the AI Act, in light of their imminent entry into application on August 2, 2025.
Google Cloud’s approach to helping customers with the Act
A core pillar of our approach to trust in AI, and a critical component of AI Act compliance, is data governance and privacy. Customers control how and where their data is used. We embed privacy-by-design principles throughout our product lifecycle, including in AI development, ensuring architectures include privacy safeguards like data encryption and providing meaningful transparency and control over data use.
We have delivered on the commitment to European customers we made back in 2020 to help them transform their businesses and address their strict data security and privacy requirements. To date, we’ve invested billions of euros to expand access to secure, high-performance computing capacity with seven data centers in Europe in addition to 13 cloud regions in Poland, Finland, Germany, Italy, Spain, France, Belgium, Sweden, the Netherlands, Switzerland, and more under development.
Our Sensitive Data Protection service and VPC Service Controls further assist customers in protecting sensitive data and meeting data residency requirements. We are already working to add new features to support data governance in line with AI Act compliance requirements.
Supporting your compliance
We were among the first organizations to publish AI principles in 2018, and have published an annual transparency report since 2019. We consistently review our policies, practices, and frameworks, incorporating robust safety and security practices, privacy-by-design, and risk assessments.
We are committed to providing and regularly updating documentation about our AI tools and services. The Cloud Compliance Center remains the most up-to-date resource for all customer compliance artifacts, including Google Cloud’s ISO 42001 AI Management System certification and EU AI Act-related documentation. As we prepare new models for global launch, including in the EU, we’ll update all artifacts so that customers can integrate them into their compliance processes in a timely way.
Our continuously updated Secure AI Framework (SAIF) provides a conceptual framework for securing AI systems across data, infrastructure, application, and model dimensions, emphasizing defense-in-depth and secure-by-design foundations. This ensures early inclusion of prevention and detection controls, adapted to specific product and user risks.
Of course, operationalizing any industry framework requires close collaboration with others — and above all a forum to make that happen. That’s why last year we worked with industry partners to launch the Coalition for Secure AI (CoSAI) to advance comprehensive security measures for addressing the unique risks that come with AI, for both issues that arise in real time and those over the horizon.
What customers can do to prepare
Customers should work closely with the EU AI Office to understand their legal and regulatory obligations when seeking to modify a foundation model or integrate one into a large system. It will be important to track new guidance and developments released by the AI Office.
We will continue to follow all legal obligations under the Act and demonstrate how we are fulfilling and supporting compliance requirements, including for forthcoming models that will be subject to the Code.
We remain committed to providing our enterprise customers with cutting-edge AI solutions that are both innovative and compliant. We have the capabilities and experience, and we will continue to partner with policymakers and customers as new regulations, frameworks, and standards are developed.
Imagine a code review process that doesn’t slow you down. Instead of a queue of pending pull requests, you have an intelligent assistant that provides a near-instant, comprehensive summary of every change. It flags potential bugs, suggests improvements based on best practices, and frees up your human reviewers to focus on the complex architectural decisions that truly require their expertise.
This isn’t a future-state prediction; it’s what’s possible today with Gemini Code Assist, integrated directly into your GitHub workflow at no charge. By embedding a powerful AI partner into every pull request, we’re transforming code reviews from a frustrating bottleneck into a fast, painless process, leading to higher code quality and happier developers.
The challenge: Why code reviews are a bottleneck
Code reviews are a non-negotiable part of building quality software, but they are often a major bottleneck in the development lifecycle. This friction slows down delivery velocity, leads to inconsistent code quality, and makes it difficult to enforce best practices. Research from DORA’s Impact of Generative AI in Software Development report advises that organizations, “Double-down on fast high-quality feedback, like code reviews and automated testing, using gen AI as appropriate.” DORA research has found that teams with shorter code review times have 50% better software delivery performance.1 AI adoption has been shown to increase code review speed by an estimated 3.1% for every 25% increase in AI adoption, and improve reported code quality by 3.4%.2
The solution: An AI-powered partner in your PRs
Gemini Code Assist integrates into your GitHub workflow as an intelligent partner, conducting code reviews to solve these challenges. When a pull request is created, Gemini is automatically assigned as a reviewer and gets to work immediately. Here’s how it helps:
Near-instant PR Summaries: Provides a comprehensive summary of the changes to help human reviewers get up to speed almost instantly.
In-depth automated reviews: Identifies stylistic issues, deviations from best practices, and potential bugs, freeing human reviewers to focus on more critical issues.
Interactive assistance and learning: Allows anyone in the PR to have a conversation with the AI using /gemini commands to ask for alternative implementations or get a fresh review.
The power of Gemini 2.5: A leap in code review intelligence
With our recent general availability release, Gemini Code Assist has been upgraded to run on our latest model, Gemini 2.5. This isn’t just an incremental update—it’s a major leap forward in the quality, accuracy, and intelligence of AI-powered code reviews.
What does this mean for your daily pull requests?
Deeper insights: You’ll see more insightful suggestions that go beyond simple style fixes. Gemini 2.5 is better at understanding the logic and intent behind your code, helping to identify potential bugs and suggest more efficient implementations.
More actionable code suggestions: Through customer-specific style guides and configurations, the AI-generated code is now more accurate and context-aware. You’ll find that suggestions are often so reliable that you can commit them with confidence after a quick review, speeding up your iteration cycles.
Improved relevancy: We have significantly increased the relevancy and accuracy of suggestions. This means you can trust the feedback you receive and spend less time second-guessing the AI and more time moving your project forward.
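As a sketch of what repository-level customization can look like: the Gemini Code Assist GitHub app can read a style guide from the repository itself. The `.gemini/styleguide.md` path follows the app's documented customization convention, and the contents below are purely illustrative.

```markdown
# Example repository style guide (illustrative)

- Prefer dependency injection over global singletons.
- Every public function needs a docstring and explicit return types.
- Avoid bare `except:` clauses; catch specific exceptions.
```

Rules written here steer the automated review comments toward your team's conventions rather than generic best practices.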
These improvements are a direct result of our rigorous internal testing and deliver more comprehensive and actionable feedback to help developers be more productive.
Customer story: How Delivery Hero enhances code quality with Gemini Code Assist
Delivery Hero is the world’s leading local delivery platform, operating in around 70 countries. Strengthening the code review process is a key part of their commitment to delivering high-quality, reliable features for their customers, which prompted them to become an early adopter of the Gemini Code Assist GitHub app. The initial feedback from their developers was overwhelmingly positive, and the app’s recent upgrade to the Gemini 2.5 model had an even more significant impact on the app development process, particularly on code quality. Their engineers found the AI-generated comments to be highly relevant and valuable, directly contributing to a higher standard of code. The latest developer experience survey after the adoption of the Gemini Code Assist GitHub app showed a clear rise in developer satisfaction with the quality and speed of code reviews—marking a positive turn for this important metric.
“By integrating the Gemini Code Assist GitHub app into our workflow, we are not just adopting a tool; we are fostering a more robust, efficient, and collaborative engineering culture. Our close partnership with Google as an early adopter, built on an agile and effective feedback loop, has been a key part of this success.” N. Mert Aydin, Principal Software Engineer, Delivery Hero
Get started
Try it today: Get started with Gemini Code Assist on the GitHub Marketplace.
Learn more: Dive deeper by reading the official documentation.
AI is evolving beyond single, task-specific agents into an interconnected ecosystem, where autonomous agents collaborate to solve complex problems, regardless of their underlying platform. To make this transition easier for developers, we are announcing a comprehensive suite of tools that will empower developers to build, deploy, evaluate, and sell Agent2Agent (A2A) agents with Google Cloud.
Today, we’re excited to announce version 0.3 of the A2A protocol, which brings a more stable interface to build against and is critical to accelerating enterprise adoption. This version introduces several key capabilities, including gRPC support, the ability to sign security cards, and extended client-side support in the Python SDK, providing more flexibility, better security, and easier integration.
The A2A protocol is quickly gaining momentum, with support from a growing ecosystem of over 150 organizations that spans every major hyperscaler, leading technology providers, and multinational customers using Google Cloud. Businesses are already building powerful capabilities for their organizations. For example, Tyson Foods and Gordon Food Service are pioneering collaborative A2A systems to drive sales and reduce supply chain friction, creating a real-time channel for their agents to share product data and leads that enhance the food supply chain.
Build: Native support for A2A in the Agent Development Kit (ADK)
We’re releasing native support for A2A in Agent Development Kit (ADK), a powerful open source agent framework released by Google. This makes it easy to build A2A agents if you are already using ADK and is built upon our previously released A2A SDKs. For example, with a simple “Hello, World!” style code snippet, developers can now use ADK to:
Consume a remote A2A agent via its Agent Card and use it as a sub-agent.
Expose an existing ADK agent to make it discoverable as an A2A agent.
Developers can start building collaborative agents with ADK today.
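Central to the discovery story above is the Agent Card: the document an A2A server publishes (conventionally at `/.well-known/agent.json`) so other agents can find and call it. The sketch below builds one as plain JSON; the field names follow our reading of the A2A specification and should be treated as assumptions, so check the protocol docs for the exact schema.

```python
import json

# Illustrative A2A Agent Card. Field names are our reading of the A2A spec
# (an assumption, not authoritative); the endpoint URL is hypothetical.
agent_card = {
    "name": "currency_helper",
    "description": "Answers currency-conversion questions.",
    "url": "https://agents.example.com/currency",  # hypothetical endpoint
    "version": "0.3.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "defaultInputModes": ["text/plain"],
    "defaultOutputModes": ["text/plain"],
    "skills": [
        {
            "id": "convert",
            "name": "Convert currency",
            "description": "Convert an amount between two currencies.",
            "tags": ["finance"],
        }
    ],
}

card_json = json.dumps(agent_card, indent=2)
```

A framework with native A2A support, such as ADK, can consume a card like this to wire the remote agent in as a sub-agent, and can generate one when exposing an existing agent.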
Deploy: Flexible deployment options with Agent Engine, Cloud Run, and GKE
Once agents are built, they need a robust and scalable home. We are providing three powerful deployment paths for customers to scale agents into production:
Deploy to Agent Engine: For a managed, agent-optimized environment, Agent Engine is the choice for many agent builders. We’re adding support for A2A to Agent Engine in the coming weeks so that you can easily deploy an agent written in any framework to Agent Engine and get a production-ready, Google-scale A2A agent.
Deploy to Cloud Run: For increased flexibility, you can containerize and deploy your A2A agents to Cloud Run, leveraging Google’s serverless infrastructure for massive scale and reliability. Follow the published guide.
Deploy to Google Kubernetes Engine (GKE): For maximum control, you can deploy agents to GKE, providing the full power of Kubernetes to manage A2A systems at scale.
With support for A2A arriving in the coming weeks, developers will be able to use the `agent-starter-pack` CLI tool to complete CI/CD setup in just one line: `uvx agent-starter-pack create my-agent -a adk@gemini-fullstack`
Integrate: Bring your A2A agents to users with Agentspace
Agents need safe and accessible environments to be useful. That’s why we built Agentspace, the destination where agents meet end users. In the coming weeks, partners will be able to make any A2A agent available in Agentspace, transforming it from a standalone tool into a valuable service that people can consume. This includes agents built on partner platforms, giving customers the flexibility to access these A2A agents in multiple locations.
More than just a hub, Agentspace provides the critical governance, safety, and control features needed for an enterprise-ready agent platform, ensuring that interactions are secure and reliable.
Evaluate and commercialize your A2A systems
Building and deploying agents is just the beginning. To create truly enterprise-grade systems, you need robust evaluation capabilities, which is why we’re extending the Vertex GenAI Evaluation Service to support A2A agent evaluations. See our hands-on guidance.
Discover and sell partner-built A2A agents in AI Agent Marketplace
Partners can now sell their A2A agents directly to customers in the AI Agent Marketplace. This will allow Google Cloud customers to discover and purchase agents published by ISVs, GSIs, and other technology providers. The AI Agent Marketplace provides an important path to market for partners looking to monetize their AI Agents.
We announced the A2A protocol in April to lead the industry toward interoperable agent systems, and in June, we advanced that commitment by contributing it to the Linux Foundation. The industry’s response continues to grow, reflecting a shared belief in vendor-neutral, community-driven standards. Many of Google Cloud’s partners have previously offered agents to joint customers, and they are now enabling these agents with A2A to help future-proof investments for customers.
Adobe: A leader in generative AI, Adobe is leveraging the A2A protocol to make its rapidly-growing number of distributed agents interoperable with agents in Google Cloud’s ecosystem. The A2A protocol enables Adobe agents to collaborate in the enterprise to create powerful new digital experiences, streamline workflows that optimize the content creation process, and automate multi-system processes and data integrations.
S&P Global Market Intelligence: S&P, a provider of information services and solutions to global markets, has adopted A2A as a protocol for inter-agent communication. This strategic alignment enhances interoperability, scalability, and future-readiness across the organization’s agent ecosystem.
ServiceNow: As a founding partner of A2A, ServiceNow empowers customers with its AI Agent Fabric, a multi-agent communication layer that connects ServiceNow, customer, and partner-built agents. This provides enterprises with the greater choice and flexibility needed to unlock the full potential of agentic AI, resulting in faster decisions, fewer handoffs, and more scalable solutions.
Twilio: Twilio is using A2A protocol for implementing Latency Aware Agent Selection. By extending the A2A protocol, individual agents now broadcast their latency, enabling the system to intelligently route tasks to the most responsive agent available and also adapt gracefully – for example, playing a filler prompt or adding typing sounds, if a high-latency agent is the only option.
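The latency-aware selection Twilio describes can be sketched in a few lines: each agent broadcasts a recent latency figure, the router picks the most responsive one, and a fallback behaviour (the filler prompt) kicks in when only a high-latency agent remains. The names and the threshold below are made up for illustration.

```python
# Illustrative sketch of latency-aware agent selection. The threshold and
# all names are assumptions, not Twilio's implementation.

FILLER_THRESHOLD_MS = 800  # assumed cutoff for "play a filler prompt"

def select_agent(agents: dict[str, float]) -> tuple[str, bool]:
    """Return (agent_name, needs_filler) for the lowest-latency agent.

    `agents` maps agent name -> most recently broadcast latency in ms.
    """
    if not agents:
        raise ValueError("no agents available")
    name = min(agents, key=agents.get)          # most responsive agent wins
    return name, agents[name] > FILLER_THRESHOLD_MS

choice, filler = select_agent({"billing": 120.0, "support": 950.0})
# choice == "billing"; filler stays False because 120 ms is under threshold
```

Because the latency figures travel as an extension of the A2A protocol itself, the router needs no out-of-band monitoring channel to make this decision.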
Developers can review more about past releases in the release notes, learn about what’s coming in the future in our roadmap, and join the community to help evolve the protocol moving forward. The community has also released great tooling around A2A with the launch of A2A Inspector and Technology Compatibility Kit.
Get started
We’re excited to partner across the industry to build the future of artificial intelligence. Here’s how you can start:
We’re thrilled to announce a significant expansion of our C4 virtual machine series, with the general availability of 28 powerful new shapes. This expansion introduces C4 shapes with Google’s next-gen Titanium Local SSD, C4 bare metal instances, and new extra-large shapes, all powered by the latest Intel Xeon 6 processors, Granite Rapids. We’re excited to be the first leading hyperscaler to bring Xeon 6 to customers.
C4 VMs with Xeon 6 deliver performance gains of up to 30% for general compute and up to 60% for ML recommendation workloads, and up to 35% lower access latency on Titanium Local SSD shapes. If you already use the C4 machine series, this means an easy and powerful path to the latest hardware without needing to migrate to a new machine series. You can take advantage of your existing committed use discounts (CUDs) and enjoy capabilities like managed instance groups and Google Kubernetes Engine (GKE) custom compute classes.
C4 VMs on Intel Xeon 6
Initially launched with Intel 5th generation Xeon processors (Emerald Rapids), C4 VMs provided advancements in performance and control for enterprise and mission-critical workloads, especially AI, gaming, databases, and data analytics. And now with Xeon 6 (Granite Rapids), C4 VMs power even more demanding workloads, delivering the highest frequency of any Google Compute Engine VM — up to 4.2 GHz, the most vCPUs and RAM of any comparable Intel-based product, larger L3 cache size, and 1.35x higher maximum memory bandwidth. The C4 machine series offers enhanced maintenance controls with a 30-day uptime window between planned maintenance events, and scalable Hyperdisk storage with up to 500k IOPS and 10 GB/s, including features like Hyperdisk Storage Pools.
Inference workloads on C4 with Xeon 6 demonstrate up to 60% better performance per core compared to prior-generation C3. C4’s new shapes support FP16-trained models with Intel AMX-FP16, making it a great choice for accelerating machine learning inference. Also, new larger half- and full-host C4 shapes (144 and 288 vCPUs) guarantee memory isolation, for highly predictable performance. The largest C4 shape enables up to 30% more performance per core on the estimated SPECrate®2017_int_base benchmark versus C3.
“SAS® Viya®, our cloud-based data and AI platform, is optimized for productivity with Intel hardware. We’re eager to scale production on C4 with Granite Rapids, having measured up to 20% performance improvement in areas such as deep learning and synthetic data generation. C4 with Granite Rapids leveraging Intel AMX delivers up to 6x speed-up, achieving GPU-comparable performance for a wide range of generative AI use cases involving inference with small to mid-size models. This empowers customers to use SAS Viya in Google Cloud in a more cost-effective way.” – Craig Rubendall, Vice President, Applied Architecture and Technology, SAS
“As a next-generation visual effects studio, beloFX demands peak performance for complex renders. When testing C4 on Granite Rapids, we observed an impressive 50% speedup against our n2d-standard-128 and n2-standard-128 nodes. This significant boost lets us produce more groundbreaking visual effects, faster, truly supercharging our ability to innovate.” – Christoph Ammann, Global Technology Supervisor, beloFX
New C4 standard, highmem, and highcpu shapes with 144 and 288 vCPUs are powered exclusively by Xeon 6:

| highcpu | standard | highmem |
| --- | --- | --- |
| C4-highcpu-144 | C4-standard-144 | C4-highmem-144 |
| C4-highcpu-288 | C4-standard-288 | C4-highmem-288 |
Turbocharge your storage with Titanium Local SSD
We’re also excited to announce Local SSD support for C4 VMs, available with new -lssd machine types, exclusively on Xeon 6 processors. These new local SSD shapes leverage the latest Titanium SSDs, delivering impressive I/O performance for workloads like high-performance databases (e.g., Cassandra, MongoDB, SQL Server tempDB and pagefile workloads), big data processing (e.g., Spark, Hadoop), media rendering and transcoding, and caching layers.
C4-lssd offers up to 7.2M max read IOPS, more than three times higher than comparable options from other leading hyperscalers. This translates to dramatic improvements in read/write latency. Titanium SSDs on C4 enable up to 35% lower access latency compared to previous generation SSDs.
“For financial market infrastructure, every microsecond of latency matters. With C4-lssd, we have seen an impressive 70% improvement in write latency compared to previous generations. This reduction with Titanium SSDs on C4 is crucial for our demanding, high-throughput applications, ensuring the stability and responsiveness essential for our business.” – Christian Hellmann, DevOps Engineer, Deutsche Börse Group
C4 with Local SSD is available in standard and highmem configurations starting from 4 vCPUs, exclusively on Xeon 6:
| standard | highmem | Local SSD capacity (GiB) |
| --- | --- | --- |
| C4-standard-4-lssd | C4-highmem-4-lssd | 375 |
| C4-standard-8-lssd | C4-highmem-8-lssd | 375 |
| C4-standard-16-lssd | C4-highmem-16-lssd | 750 |
| C4-standard-24-lssd | C4-highmem-24-lssd | 1,500 |
| C4-standard-32-lssd | C4-highmem-32-lssd | 1,875 |
| C4-standard-48-lssd | C4-highmem-48-lssd | 3,000 |
| C4-standard-96-lssd | C4-highmem-96-lssd | 6,000 |
| C4-standard-192-lssd | C4-highmem-192-lssd | 9,000 |
| C4-standard-144-lssd | C4-highmem-144-lssd | 12,000 |
| C4-standard-288-lssd | C4-highmem-288-lssd | 18,000 |
Unlock direct access with C4 bare metal shapes
For workloads that demand direct access to CPU and memory resources, we’re introducing C4 bare metal shapes. These instances are ideal for commercial and custom hypervisors, applications not traditionally supported in virtual machines, and those with special performance monitoring or licensing requirements. C4 bare metal is SAP-certified, delivering an impressive 132,600 aSAPs — the highest of any comparable machine, providing peak performance and compliance for your critical SAP application servers.
Customers in financial services and SaaS/PaaS security, as well as those running dev/test environments and private cloud platforms, will find C4 bare metal types essential for meeting their stringent performance goals. Compared to the previous-generation C3 bare metal instances, C4 bare metal offers up to a 35% performance improvement.
New C4 bare metal instances are available in standard-metal and highmem-metal, powered exclusively by Xeon 6:
| standard-metal | highmem-metal |
| --- | --- |
| C4-standard-288-metal | C4-highmem-288-metal |
Get started with C4 on Intel Xeon 6 today
The expanded C4 machine series with Intel Xeon 6 is available today in 19 zones, with more expansion to come. Check the latest regional availability on our regions and zones page. Experience the leading performance, predictability, and control that C4 VMs deliver today! To learn more about C4 and its new capabilities, visit the C4 documentation.
“Intel and Google Cloud are shaping the future of cloud computing with the expansion of the C4 Machine Series powered by Intel Xeon 6 processors. With enhanced performance, built-in AI acceleration, and greater deployment flexibility with new and extra-large shapes, we’re enabling customers to innovate faster and run workloads more efficiently than ever before.” – Ronak Singhal, Intel Senior Fellow, Intel
Managing complex SAP landscapes can feel like navigating a maze. You’re juggling application performance, database health, infrastructure stability, and more, all while striving to keep business applications running seamlessly. Today, we’re excited to announce new observability and monitoring functionality in Workload Manager to provide SAP customers with purpose-built tools that bridge gaps in data and provide a unified view of SAP environments across all layers of their cloud environments.
Workload Manager is not intended to replace your existing SAP monitoring tools, but rather complement them by providing a broader view of your SAP systems and enabling you to more easily correlate application-level insights with infrastructure performance and events.
Expanding visibility in complex SAP environments
Traditional SAP monitoring excels at providing detailed information at the application and database level. Metrics about transaction performance, user activity, and application logs are readily available. However, understanding the relationship between this application data and the underlying infrastructure can sometimes require additional effort. For example, a slowdown in transaction processing might be due to a database issue, a network bottleneck, or even a problem with the underlying virtual machine. While existing tools provide valuable insights, correlating application performance with infrastructure health can be a time-consuming process or may require toggling between different monitoring platforms and manual efforts.
Creating a system-centric, unified view with Workload Manager
After enabling the necessary functionality in Google Cloud’s Agent for SAP, you can navigate to the main Observability dashboard to see an overview of your SAP systems and the overall health status of each system and sub-layer. The health status is calculated using a variety of metrics from SAP Netweaver instances, SAP HANA databases, pacemaker clusters, and the availability of the underlying infrastructure hosting them.
Clicking on a system will take you to the system Overview page, where you can visualize the components of the system and see the current health status for each instance included. The Health Insights table will summarize any health issues that are currently detected, and the Maintenance Events table displays any upcoming maintenance for supported machine types.
The “Applications” and “Databases” tabs behave similarly, and help you correlate key performance indicators from your SAP applications with metrics from your VMs, storage, network, and other Google Cloud services such as the following:
Availability: GCE Instance Availability, SAP NetWeaver and SAP HANA instance status, SAP NetWeaver and SAP HANA process status, and Pacemaker cluster node and resource status.
Performance: GCE Instance CPU and memory utilization, operating system processes by CPU and memory and information about swap I/O operations on the operating system.
Storage: SAP related file systems usage, disk space utilization and usage, disk throughput and disk IOPS.
Networking: Information about network traffic and network packets.
Netweaver: CPU and memory utilization by SAP NetWeaver process, response time by processes in the SAP NetWeaver instance, SAP NetWeaver sessions and RFC connections by type, SAP NetWeaver process utilization per work process type and information for each SAP NetWeaver process type for each respective queue in the system.
HANA: SAP HANA System Replication status, SAP HANA System Replication latency, memory usage in the SAP HANA Database by type, information about idle and running connections in the SAP HANA database and top schemas by record count in the SAP HANA database.
Backups: Information related to Backint backups including Backup/Recovery success rate, Backup/Recovery average MBps for all operations and historical data about both Backup/Recovery status and throughput.
New Event Annotations built specifically for SAP allow you to surface system events directly on monitoring dashboards to help empower you to quickly identify the root cause or see the impact of events on your system’s performance and health. These events are also now available outside of Workload Manager in Cloud Monitoring dashboards.
SAP Availability:
Pacemaker cluster:
Pacemaker cluster node status
Pacemaker cluster resource status
SAP HANA Database:
SAP HANA instance status
SAP HANA service status
SAP NetWeaver:
SAP NetWeaver instance status
SAP NetWeaver service status
SAP Operations:
SAP HANA System Replication status
SAP HANA Backup status
SAP HANA Data backups
SAP HANA Log backups
For example, in the following screenshot we can see how a SAP HANA database failover impacted the database availability and see the series of events as the Pacemaker cluster moved the database operations to the secondary node.
Other key features and benefits:
Comprehensive metrics: The solution collects a wide range of metrics from your SAP applications and Google Cloud infrastructure, but if you are looking for more, you can create your own custom queries and add the metrics to your custom dashboards inside Workload Manager Observability.
Customizable dashboards: Modify the dashboards to create alternative views to help visualize what matters most to your business and allow for faster identification of critical issues
Alerting and notifications: The dashboards are natively integrated with Cloud Monitoring and Alerting. You can set up alerts for specific metrics and receive notifications when thresholds are breached
Automated system discovery: The Agent for SAP automatically identifies and reconciles your SAP instances and resources when enabled, saving you valuable time and avoiding manual tagging or labeling
Getting started
Ready to experience the benefits of a single-pane-of-glass for SAP observability on Google Cloud? You can view detailed documentation and instructions on how to get started and the supported architectures here. The observability service in Workload Manager is available at no additional cost, but the underlying metrics and logs are subject to costs associated with Cloud Monitoring and Cloud Logging. You can view an estimated cost for the required features that must be enabled here.
If you would like assistance onboarding or would like a demo or more information, please contact your account representative or Google Cloud support and we will be in touch.
Looking ahead
We are committed to continuously improving SAP observability on Google Cloud and plan to add new features, deeper correlation capabilities, and more insightful visualizations.
Amazon Relational Database Service (Amazon RDS) for Oracle has expanded its instance offerings in AWS GovCloud (US) Regions. The M7i and R7i instances, which support a maximum instance size of 48xlarge, are Intel-based offerings delivering 50% more vCPU and memory compared to M6i and R6i instance types. The X2idn instances are optimized for memory-intensive workloads with up to 2TiB of memory.
M7i, R7i and X2idn instances are available for Amazon RDS for Oracle under the Bring Your Own License model for both Oracle Database Enterprise Edition (EE) and Oracle Database Standard Edition 2 (SE2). You can launch the new instances in the Amazon RDS Management Console or using the AWS CLI or SDK. Refer to the Amazon RDS for Oracle Pricing page for pricing details and regional availability.
Amazon Managed Service for Prometheus now offers a higher default limit of 50M active time series per workspace, up from 10M. This increase eliminates the need for limit increase requests up to 50M series. Customers can still request limit increases for up to 1 billion active series per workspace. An Amazon Managed Service for Prometheus workspace is a logical space dedicated to the storage and querying of Prometheus metrics.
The new limit increase is already applied to your current workspace, and is available in all AWS regions where Amazon Managed Service for Prometheus is generally available.
Amazon Aurora MySQL-Compatible Edition now supports a maximum storage limit of 256 TiB, doubling the previous limit of 128 TiB. This enhancement allows customers to store and manage even larger datasets within a single Aurora database cluster, simplifying data management for large-scale applications and supporting the growing data needs of modern applications. Customers only pay for the storage they use, with no need for upfront provisioning of the full 256 TiB.
To access the increased storage limit, upgrade your cluster to a supported database version. Once upgraded, Aurora storage will automatically scale up to 256 TiB capacity based on the amount of data in the cluster volume. Visit the technical documentation to learn more about supported versions. This new storage volume capacity is available in all AWS regions where Aurora MySQL and Aurora PostgreSQL are available.
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Today, we’re announcing support for custom slash commands in Gemini CLI! This highly requested feature lets you define reusable prompts for streamlining interactions with Gemini CLI and helps improve efficiency across workflows. Slash commands can be defined in local .toml files or through Model Context Protocol (MCP) prompts. Get ready to transform how you leverage Gemini CLI with the new power of slash commands!
To use slash commands, make sure that you update to the latest version of Gemini CLI.
Powerful and extensible foundation with .toml files
The foundation of custom slash commands is rooted in .toml files.
The .toml file provides a powerful and structured base on which to build extensive support for complex commands. To support a wide range of users, we kept the required keys minimal (just prompt), and we support easy-to-use arguments via {{args}} and shell command execution via !{...} directly in the prompt.
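To make the mechanics concrete, here is a minimal, stdlib-only sketch of how these two substitutions could compose: {{args}} is replaced with the text the user typed after the command, and each !{...} span is replaced with the output of the shell command inside it. This is an illustration, not Gemini CLI’s actual implementation.

```python
import re
import subprocess

def expand_prompt(template: str, args: str) -> str:
    """Illustrative expansion of a custom-command prompt:
    {{args}} -> the text the user typed after the command,
    !{cmd}   -> the stdout of running cmd in a shell."""
    # Substitute the user-supplied arguments first.
    expanded = template.replace("{{args}}", args)

    # Then replace every !{...} span with its shell output.
    def run(match: re.Match) -> str:
        result = subprocess.run(
            match.group(1), shell=True, capture_output=True, text=True
        )
        return result.stdout.strip()

    return re.sub(r"!\{([^}]*)\}", run, expanded)

print(expand_prompt("Review PR {{args}}. Branch: !{echo main}", "1234"))
```

Note the sketch expands arguments before shell spans, so an !{...} command can itself reference {{args}}, which mirrors the review example below.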
Here is an example .toml file that is invoked using /review <issue_number> from Gemini CLI to review a GitHub PR. Notice that the file name defines the command name and is case-sensitive. For more information about custom slash commands, see the Custom Commands section of the Gemini CLI documentation.
```toml
description = "Reviews a pull request based on issue number."
prompt = """
Please provide a detailed pull request review on GitHub issue: {{args}}.

Follow these steps:

1. Use `gh pr view {{args}}` to pull the information of the PR.
2. Use `gh pr diff {{args}}` to view the diff of the PR.
3. Understand the intent of the PR using the PR description.
4. If PR description is not detailed enough to understand the intent of the PR,
   make sure to note it in your review.
5. Make sure the PR title follows Conventional Commits, here are the last five
   commits to the repo as examples: !{git log --pretty=format:"%s" -n 5}
6. Search the codebase if required.
7. Write a concise review of the PR, keeping in mind to encourage strong code
   quality and best practices.
8. Use `gh pr comment {{args}} --body {{review}}` to post the review to the PR.

Remember to use the GitHub CLI (`gh`) with the Shell tool for all
GitHub-related tasks.
"""
```
Namespacing
The name of a command is determined by its file path relative to the commands directory. Sub-directories are used to create namespaced commands, with the path separator (/ or \) converted to a colon (:).
A file at <project>/.gemini/commands/test.toml becomes the command /test.
A file at <project>/.gemini/commands/git/commit.toml becomes the namespaced command /git:commit.
This allows grouping related commands under a single namespace.
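The path-to-name mapping above can be sketched in a few lines (an illustrative helper, not Gemini CLI’s internal code):

```python
from pathlib import PurePosixPath

def command_name(commands_dir: str, toml_path: str) -> str:
    """Derive a slash-command name from a .toml file's path relative
    to the commands directory: sub-directories become colon-separated
    namespaces (illustrative sketch only)."""
    rel = PurePosixPath(toml_path).relative_to(commands_dir)
    parts = list(rel.parts)
    parts[-1] = parts[-1].removesuffix(".toml")  # drop the extension
    return "/" + ":".join(parts)

print(command_name(".gemini/commands", ".gemini/commands/test.toml"))        # /test
print(command_name(".gemini/commands", ".gemini/commands/git/commit.toml"))  # /git:commit
```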
Building a slash command
The next few sections show you how to build a slash command for Gemini CLI.
1 – Create the command file
First, create a file named plan.toml inside the ~/.gemini/commands/ directory. This creates a /plan command that tells Gemini CLI to only plan changes, producing a step-by-step plan without starting implementation. This approach lets you provide feedback and iterate on the plan before any code is written.
Custom slash commands can be scoped to an individual user or project by defining the .toml files in designated directories.
User-scoped commands are available across all Gemini CLI projects for a user and are stored in ~/.gemini/commands/ (note the ~).
Project-scoped commands are only available from sessions within a given project and are stored in .gemini/commands/.
Hint: To streamline project workflows, check these into Git repositories!
2 – Define the prompt

```toml
# ~/.gemini/commands/plan.toml

description = "Investigates and creates a strategic plan to accomplish a task."
prompt = """
Your primary role is that of a strategist, not an implementer.
Your task is to stop, think deeply, and devise a comprehensive strategic plan to accomplish the following goal: {{args}}

You MUST NOT write, modify, or execute any code. Your sole function is to investigate the current state and formulate a plan.

Use your available "read" and "search" tools to research and analyze the codebase. Gather all necessary context before presenting your strategy.

Present your strategic plan in markdown. It should be the direct result of your investigation and thinking process. Structure your response with the following sections:

1. **Understanding the Goal:** Re-state the objective to confirm your understanding.
2. **Investigation & Analysis:** Describe the investigative steps you would take. What files would you need to read? What would you search for? What critical questions need to be answered before any work begins?
3. **Proposed Strategic Approach:** Outline the high-level strategy. Break the approach down into logical phases and describe the work that should happen in each.
4. **Verification Strategy:** Explain how the success of this plan would be measured. What should be tested to ensure the goal is met without introducing regressions?
5. **Anticipated Challenges & Considerations:** Based on your analysis, what potential risks, dependencies, or trade-offs do you foresee?

Your final output should be ONLY this strategic plan.
"""
```
3 – Use the command
Now you can use this command within Gemini CLI:
```
/plan How can I make the project more performant?
```
Gemini will plan out the changes and output a detailed step-by-step execution plan!
Enriched integration with MCP Prompts
Gemini CLI now offers a more integrated experience with MCP by supporting MCP Prompts as slash commands! MCP provides a standardized way for servers to expose prompt templates to clients. Gemini CLI utilizes this to expose available prompts for configured MCP servers and make the prompts available as slash commands.
The name and description of the MCP prompt are used as the slash command name and description. MCP prompt arguments are also supported in slash commands, either by name with /mycommand --<argument_name>="<argument_value>" or positionally with /mycommand <argument1> <argument2>.
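As a rough sketch of how both invocation styles could map onto a prompt’s declared arguments (illustrative only, not Gemini CLI’s actual parser), named --key="value" pairs are collected first and any remaining declared arguments are then filled from positional tokens in order:

```python
import shlex

def parse_prompt_args(arg_string: str, declared: list[str]) -> dict[str, str]:
    """Map slash-command input onto an MCP prompt's declared arguments.
    Supports --name="value" pairs and bare positional values
    (illustrative sketch, not Gemini CLI's actual parser)."""
    values: dict[str, str] = {}
    positional: list[str] = []
    for token in shlex.split(arg_string):
        if token.startswith("--") and "=" in token:
            name, _, value = token[2:].partition("=")
            values[name] = value
        else:
            positional.append(token)
    # Fill any remaining declared arguments from positional tokens, in order.
    remaining = [name for name in declared if name not in values]
    values.update(zip(remaining, positional))
    return values

print(parse_prompt_args('--topic="quantum" deep', ["topic", "depth"]))
print(parse_prompt_args("quantum deep", ["topic", "depth"]))
```

Both calls yield the same mapping, which is the behavior described above: named and positional forms are interchangeable ways to bind the prompt’s arguments.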
For example, a /research command can be backed by a FastMCP Python server.
Easy to get started
So what are you waiting for? Upgrade your terminal experience with Gemini CLI today and try out custom slash commands to streamline your workflows. To learn more, check out the Custom Commands documentation for the Gemini CLI.
Six months into 2025, we’ve already published hundreds of posts here on the Google Cloud blog. We asked ourselves, why wait until the busy end of the year to review your favorites? With everything from new AI models, product launches, emerging cyber threats, company news, certifications and customer stories, here is a mid-year recap that will get you up to speed on the latest from Google Cloud and the rapidly emerging cloud and AI landscape.
25. How Google Does It: Making threat detection high-quality, scalable, and modern
Published January 7, 2025
Google and Alphabet run the largest Linux fleet in the world, with nearly every flavor of operating system available, and see a steady stream of malicious system and network activity. Learn how our threat detection and response team detects, analyzes, and responds to threats on a vast scale.
More and more organizations are turning to Cloud Run, Google Cloud’s serverless runtime, for its simplicity, flexibility, and scalability. And now, with the general availability of NVIDIA GPUs on the platform, developers can choose Cloud Run for applications that require powerful graphics processing, like machine learning models.
23. BigQuery emerges as autonomous data-to-AI platform
Published April 10, 2025
This is not your grandfather’s data warehouse. BigQuery is now an AI-native, multimodal, and agentic data-to-AI platform. The blog post provides an overview of the many new features and capabilities that went into this new designation, including new data preparation, data analysis, code generation and management and troubleshooting capabilities.
22. Announcing Gen AI Toolbox for Databases. Get started today
Published February 6, 2025
Tired of building custom plumbing to connect your AI apps to your databases? This article announces the public beta of the Gen AI Toolbox for Databases, an open-source server built with LangChain that provides a secure, scalable, and manageable way to connect your generative AI applications to your data.
21. Ghost in the router: China-nexus espionage actor UNC3886 targets Juniper Networks
Published March 11, 2025
After discovering in 2024 that threat actors deployed custom backdoors to Juniper Networks’ Junos OS routers, Mandiant worked with Juniper to investigate this activity and observed that the affected routers were running end-of-life hardware and software. Learn more about the threat and how to remediate it in your environment.
It’s a platform, it’s a system, it’s AI Hypercomputer, Google Cloud’s fully managed supercomputing system for running AI and HPC workloads. As discussed at Google Cloud Next 2025, AI Hypercomputer supports all the latest and greatest compute, networking and storage infrastructure, and its software layer helps AI practitioners and engineers move faster with open and popular ML frameworks. Finally, there’s a full suite of workload management and observability tools to help you manage the thing.
19. Ipsos research shows why cloud certification matters — get certified with Google Cloud
Published February 25, 2025
Google Cloud partnered with Ipsos, the global research firm, to study the impact of cloud certifications on career advancement and achievement. For example, 8 out of 10 survey respondents said earning a recognized certificate helped them land a job faster and 75% believe they secured a higher salary through their certification.
18. Connect globally with Cloud WAN for the AI Era
Published April 9, 2025
With 202 points of presence (PoPs), powered by over 2 million miles of fiber, 33 subsea cables, and backed by a 99.99% reliability SLA, Google’s backbone network is, how do we put it? Vast. And with Cloud WAN, enterprises can now use it for their own wide area network (WAN) architectures.
17. Expanding generative media for enterprise on Vertex AI
Published April 9, 2025
At Google Cloud Next 25, we announced powerful new creative controls for our generative media models on Vertex AI. Now you can edit video with in-painting and out-painting, use camera controls for dynamic shots, and even create custom voices for AI-powered narration with as little as 10 seconds of audio.
Threat actors continue to target edge devices globally, leveraging deep device knowledge and using both zero-day and now n-day flaws. This activity aligns with the broader strategy that the Google Threat Intelligence Group has observed among suspected China-nexus espionage groups, who invest significantly in exploits and custom malware for critical edge infrastructure.
15. Defending against UNC3944: Cybercrime hardening guidance from the frontlines
Published May 6, 2025
Who is UNC3944? A financially-motivated threat actor characterized by its persistent use of social engineering and brazen communications with victims. Mandiant provides guidance and strategies for hardening systems and defenses against the cybercrime group, offering practical steps to protect against their specific attack methods.
14. MCP Toolbox for Databases (formerly Gen AI Toolbox for Databases)
Published April 22, 2025
Ready to build AI agents that can actually use your data? This article announces that our MCP Toolbox for Databases now supports the Model Context Protocol (MCP), making it easier than ever to connect your generative AI agents to enterprise data. With new support for the Agent Development Kit (ADK) and LangGraph, you can build powerful, stateful agents with intuitive code and connect them to your databases securely.
13. Formula E’s AI equation: A new Driver Agent for the next era of racing
Published March 25, 2025
As motorsport has grown in popularity, the ability of fans from diverse backgrounds to enter the cockpit has not always kept up. Formula E sought to level the course for aspiring drivers by creating an AI-powered Driver Agent; connected to a Formula E simulator, the agent provides drivers and coaches with real-time feedback on technique and tactics, helping them improve faster than a flying lap.
12. Google Agentspace enables the agent-driven enterprise
Published April 9, 2025
Do you want to search all your company’s information in a few clicks, or generate ideas with built-in agents that already know your company’s style? Google Agentspace now includes a no-code agent designer, a gallery for discovering agents, and two new expert agents for deep research and idea generation, all integrated directly into Chrome.
11. Announcing Veo 3, Imagen 4, and Lyria 2 on Vertex AI
Published May 20, 2025
The next generation of creating for enterprise is here. We expanded Vertex AI to include our most powerful generative AI media models: Imagen 4 for stunningly realistic images with crisp text, Veo 3 for breathtaking video with synchronized audio, and Lyria 2 for composing high-fidelity, original music.
In the security realm, large language models (LLMs) open a world of new possibilities, from sifting through complex telemetry to secure coding, vulnerability discovery, and streamlining operations. However, some of these same AI capabilities are also available to attackers, leading to understandable anxieties about the potential for AI to be misused for malicious purposes.
9. Ivanti Connect Secure VPN targeted in new zero-day exploitation
Published January 8, 2025
Ivanti kicked off the year by disclosing two new vulnerabilities impacting its Ivanti Connect Secure (ICS) VPN appliances. Mandiant identified UNC5221, a suspected China-nexus espionage actor that previously exploited two other Ivanti vulnerabilities as early as December 2023, as the threat actor targeting the new zero-days. Successfully exploiting one of the vulnerabilities could result in downstream compromise of a victim network.
Google Cloud shares a vision with Wiz to improve security by making it easier and faster for organizations of all types and sizes to protect themselves, end-to-end, across all major clouds, and this post announces Google’s agreement to acquire the cloud security startup.
7. Veo 3 available for everyone in preview on Vertex AI
Published June 26, 2025
You dream it, Veo creates it. This post announces that Veo 3, our most powerful text-to-video model yet, is now open for everyone to try in public preview on Vertex AI. Create stunning, near-cinematic videos with synchronized sound, and join the next wave of creative storytelling, now available to Google Cloud customers and partners.
6. Vertex AI offers new ways to build and manage multi-agent systems
Published April 9, 2025
This article announces ways to build multi-agentic systems, an evolution of traditional AI agents. To get there, we launched a new suite of tools in Vertex AI to help developers build and deploy them, including an open-source Agent Development Kit (ADK) and a managed Agent Engine. We also introduce the Agent2Agent (A2A) protocol, a new open standard to allow agents built by different companies to communicate and collaborate.
Even though it’s been around for a long time, not all developers speak fluent SQL. English, on the other hand, is pretty well-known. In this technical deep dive for developers working with natural language processing and databases, get the insights and techniques you need to enhance the accuracy and performance of your text-to-SQL conversions.
4. Firebase Studio lets you build full-stack AI apps with Gemini
Published April 9, 2025
For over a decade, developers the world over have relied on Firebase’s backend cloud computing services and application development platforms to power their web applications. And with the new Firebase Studio, they can now use it to develop full-stack AI applications, integrating with the Gemini AI model.
3. Multiple Russia-aligned threat actors targeting Signal Messenger
Published February 19, 2025
As part of the ongoing Russian-Ukrainian conflict, Signal Messenger accounts are of great interest to Russia’s intelligence services for their potential to deliver sensitive government and military communications. Google Threat Intelligence Group has observed increasing efforts from several Russia state-aligned threat actors to compromise Signal Messenger accounts used by individuals of interest to Russia’s intelligence services.
2. New Google Cloud certification in generative AI
One of the top questions we hear is “How do I get ahead?” This isn’t just another certification in a sea of technical qualifications. The Generative AI Leader certification is specifically focused on generative AI, and designed for visionary professionals like you — the managers, administrators, strategic leaders and more who understand that AI’s impact stretches far beyond code.
1. 601 real-world gen AI use cases from the world’s leading organizations
Published April 9, 2025
Since Next 2024, we’ve been gathering examples of how our customers are putting generative AI to use every day across their operations and offerings. We nearly doubled the number of entries for Next 2025, and clearly they’re still resonating, as this has been our most popular story of the year. Which use cases excite you most? Pop over to our LinkedIn page and let us know.
Thank you for being a part of the Google Cloud blog community! We look forward to bringing you many more posts to devour in the second half of the year.
Agentspace provides an Agent Development Kit (ADK) for building specialized agents and an A2A (Agent-to-Agent) communication protocol for agent collaboration. These tools facilitate a shift from static workflows to dynamic, adaptive business systems.
Pluto7’s Planning in a Box Pi Agent is designed to complement and offer an intelligent AI layer for autonomous planning. It integrates with various enterprise systems like SAP, Oracle, and Salesforce, consolidating structured and unstructured data into a Master Ledger via Google Cloud’s Cortex Framework. This effectively creates a real-time digital twin and control tower, providing a unified view and synchronized decision-making across the supply chain.
Pi Agent functions as a real-time supply chain planning assistant, mirroring a ride-share system for inventory. It senses real-time signals, reallocates resources, and makes proactive decisions. Examples include agents for demand analysis (Ron), inventory optimization (Kassy), financial balancing (Alex), and defect flagging (Bob), all collaborating through the A2A protocol. This allows for significantly faster decision-making and a transition from reactive problem-solving to proactive planning.
Practical impact
A case study featuring a LatAm CPG manufacturer demonstrates the practical impact. By implementing Planning in a Box + Pi Agent (now powered by Google Agentspace), the manufacturer gained real-time inventory visibility, improved order fulfillment, and is on track to reduce excess inventory by 15% while cutting manual reporting by 70%. It also has natural language access to inventory data: “How much inventory do I have for Faucet at Location Warehouse?”
The combination of Agentspace and Planning in a Box Pi Agent creates an agentic command center for enterprises, enabling autonomous workflows, seamless ERP integration (SAP, Oracle, NetSuite), and deployment within the user’s Google Cloud tenant, delivered as a service-as-a-software offering. This setup offers rapid time to value, with initial use cases deployable in weeks.
Over the years, Pluto7 has seen customers on this platform adoption journey achieve significant business outcomes, including a 10–20% improvement in forecast accuracy, up to a 50% reduction in inventory carrying costs, over 50% safety stock reduction, faster planning cycles, and a 10%+ increase in margin, all of which is articulated in Pluto7’s 2:10 rule of revenue growth with planning error reduction.
Looking ahead
Agentic AI isn’t the future; it’s already here, redefining supply chain planning and enabling planners to become superheroes. Businesses can explore how Google Cloud’s Agentspace and Pluto7’s Planning in a Box can enable intelligent orchestration and real-time responsiveness in their supply chains.
Welcome to the second Cloud CISO Perspectives for July 2025. Today, Andy Wen, director, product management, Workspace Security, discusses new efforts we’re making to defend against identity-based cyberattacks.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
The evolving threat landscape: Beyond traditional 2FA
By Andy Wen, director, product management, Workspace Security
Threat actors relentlessly pursue identity-based attacks, understanding that compromised credentials are a direct path to their objectives. To counter those attacks, we’re constantly innovating at Google, and we have some good news involving two critical innovations developed in close partnership with the wider security community.
Stolen credentials, email phishing, brute-force, and other identity-based vectors comprised 37% of successful breaches in 2024, in large part because of the rise of infostealers, a method threat actors rely on to scale identity attacks, according to Mandiant’s most recent M-Trends report. These initial breaches can frequently escalate into costly ransomware incidents and data exfiltration.
Google has long been a leader in security, and last year we announced that we are making two-factor authentication (2FA) mandatory for Google Cloud customers. We’re now taking security a step further and introducing new capabilities to keep customers secure.
How passkeys and Device Bound Session Credentials can help
To empower users and customers against identity-based attacks, we’ve introduced two critical innovations developed in close partnership with the wider security community: passkeys and Device Bound Session Credentials (DBSC). These advancements are designed to significantly strengthen account security and prevent account takeovers.
We highly recommend that all Workspace customers, especially those with high-value users such as IT administrators and business leaders, implement these controls.
Use passkeys for a simpler, more secure sign-in
We have made passkeys generally available to all 11 million Workspace organizations and billions of Google consumer users. Passkeys represent a fundamental shift away from passwords, offering a simpler and inherently more secure sign-in experience.
Unlike traditional passwords that can be guessed, stolen, and forgotten, passkeys are unique digital credentials cryptographically tied to your device. They use robust FIDO2 technology, the same underlying standard used in hardware security keys such as our Titan Security Key, with the added convenience of using a device you already own, such as an Android phone or a Windows laptop.
While absolute security remains an elusive goal, passkeys and security keys virtually eliminate password-based threats such as account takeover and phishing. As a founding member and steadfast supporter of the FIDO Alliance, we are encouraged by the growing industry adoption of FIDO technology.
Disrupt cookie theft with Device Bound Session Credentials
We are also addressing the use of infostealers to exfiltrate session cookies, which allows attackers to bypass password and 2FA controls and access victim accounts from their own devices.
Echoing Mandiant’s M-Trends 2025 report, IBM’s 2025 X-Force Threat Intelligence Index observed an 84% increase in emails delivering infostealers in 2024 compared to the prior year.
In close collaboration with the Chrome team, we are adding a powerful addition to our security arsenal, now in beta: Device Bound Session Credentials (DBSC). DBSC are designed to disrupt cookie theft by creating an authenticated session that is cryptographically bound to a specific device. This innovative approach can significantly mitigate the risk of exfiltrated cookies being used to access accounts from an unauthorized device.
DBSC introduces a new API that enables servers to establish an authenticated session bound to a device. When a session is initiated, the browser generates a unique public-private key pair. The private key is securely stored using hardware-backed storage, such as a Trusted Platform Module (TPM), when available.
The browser then issues a regular session cookie. Throughout the session’s lifetime, the browser periodically proves possession of the private key and refreshes the session cookie.
This mechanism allows the cookie’s lifetime to be set short enough to render stolen cookies largely useless to attackers. While DBSC currently operates with Chrome and Workspace, numerous server providers, identity providers (IdPs) like Okta, and other browsers such as Microsoft Edge, have expressed strong interest in adopting DBSC to protect their users from cookie theft.
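The challenge–response flow described above can be sketched in a few lines of Python. This is an illustrative model only: real DBSC uses an asymmetric key pair whose private key is held in hardware such as a TPM, while this toy version uses a shared HMAC secret as a stdlib-only stand-in, and the cookie lifetime is an arbitrary placeholder.

```python
import hashlib
import hmac
import secrets
import time


class DeviceBoundSession:
    """Toy model of a DBSC-style session (illustrative, not the real protocol).

    An HMAC secret held by the 'device' stands in for the hardware-backed
    private key; the server registers a verification copy at session start.
    """

    COOKIE_TTL = 2.0  # short-lived cookie, in seconds (real values differ)

    def __init__(self):
        self._device_key = secrets.token_bytes(32)  # never leaves the 'device'
        self._server_copy = self._device_key        # server-side verification material
        self.cookie = None
        self.expires = 0.0

    def refresh(self) -> str:
        # Server issues a challenge; the device proves possession of the
        # bound key; the server verifies and mints a fresh short-lived cookie.
        challenge = secrets.token_bytes(16)
        proof = hmac.new(self._device_key, challenge, hashlib.sha256).digest()
        expected = hmac.new(self._server_copy, challenge, hashlib.sha256).digest()
        if not hmac.compare_digest(proof, expected):
            raise PermissionError("device cannot prove key possession")
        self.cookie = secrets.token_hex(16)
        self.expires = time.monotonic() + self.COOKIE_TTL
        return self.cookie

    def is_valid(self, cookie: str) -> bool:
        # A stolen cookie goes stale quickly; only the device holding the
        # bound key can complete a refresh and obtain a new one.
        return cookie == self.cookie and time.monotonic() < self.expires
```

The point of the design is visible even in this sketch: the cookie itself carries no long-term value, because its renewal requires a proof that only the enrolled device can produce.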
A combined approach for enhanced security
Combined, passkeys and DBSC can empower organizations to significantly strengthen account security and prevent account takeovers. Both of these security controls are readily available to all Workspace customers, and we strongly advocate for their implementation, particularly for your most critical users such as IT administrators and business leaders.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
Secure cloud. Insecure use. (And what you can do about it): If the cloud is secure, why are there still cloud security breaches? Too many organizations don’t use it securely. Here’s how to change that. Read more.
Tabletopping the tabletop: New perspectives from cybersecurity’s favorite role-playing game: A group of bio-cybersecurity experts did a TTX with us to practice and share ideas on how to respond to real-world challenges — without the real-world risk. Read more.
How to enable Secure Boot for your AI workloads: Secure Boot can help protect AI from the moment GPU-accelerated workloads power up. Here’s how to use it on Google Cloud. Read more.
Too many threats, too much data: new survey. Here’s how to fix that: Operationalizing threat intelligence remains a major challenge, say security and IT leaders in a new survey. Here are the survey results, and four steps security teams can take to get more out of their threat intelligence data. Read more.
Your guide to Google Cloud Security at Black Hat USA 2025: We’re excited to bring our commitment to cybersecurity innovation and simplification to Black Hat. Here’s where to find us, and what we’ll be talking about. Read more.
How SUSE and Google Cloud collaborate on Confidential Computing: Secure sensitive data on Google Cloud using SUSE Linux Enterprise Server (SLES) and Confidential VMs with AMD SEV, AMD SEV-SNP, and Intel TDX. Read more.
Innovate with Confidential Computing: Attestation, Live Migration on Google Cloud: Confidential Computing has evolved rapidly since we first made it available. See what’s new with two key pillars: robust attestation and live migration. Read more.
Introducing OSS Rebuild: Open source, rebuilt to last: OSS Rebuild is a new project to strengthen trust in open-source package ecosystems that can give security teams powerful data to avoid compromise without burden on upstream maintainers. Read more.
We’re taking legal action against the BadBox 2.0 botnet: Recently, our researchers partnered with HUMAN Security and Trend Micro to uncover BadBox 2.0, the largest known botnet of internet-connected TVs. Building on our previous actions to stop these cybercriminals, we filed a lawsuit in New York federal court against the botnet’s perpetrators. Read more.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
Exposing the risks of VMware vSphere Active Directory integration: The common practice of directly integrating vSphere with Microsoft Active Directory can simplify administration tasks, but also creates an attack path frequently underestimated due to misunderstanding the inherent risks. Read more.
Defending your VMware vSphere estate from UNC3944: Take a deep dive into the anatomy of UNC3944’s vSphere-centered attacks, and study our fortified, multi-pillar defense strategy for risk mitigation. Read more.
Ongoing SonicWall SMA exploitation campaign using the OVERSTEP backdoor: Google Threat Intelligence Group (GTIG) has identified an ongoing campaign by a suspected financially-motivated threat actor we track as UNC6148, targeting fully patched end-of-life SonicWall Secure Mobile Access (SMA) 100 series appliances. Read more.
Update on creative phishing attack on prominent academics and critics of Russia: In June, we detailed two distinct campaigns in which a Russia state-sponsored cyber threat actor targeted prominent academics and critics of Russia while impersonating the U.S. State Department. The threat actor is continuing the initial wave of its campaign with changed ASP names, and is also trying a new tactic: sending calendar invites in an attempt to convince targets to link an attacker-controlled device to their Microsoft Office 365 account through Microsoft’s device code authentication flow. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Podcasts from Google Cloud
How to accelerate your SIEM journey: Manija Poulatova, director, Security Engineering and Operations, Lloyd’s Banking Group, joins hosts Anton Chuvakin and Tim Peacock for a lively chat on all things SIEM, from migration challenges to AI integration. Listen here.
Governing AI agents, from code to courtroom: The autonomous decision-making and learning capability promise of agentic AI and AI agents presents a unique set of risks across various domains. Anna Gressel, partner at Paul, Weiss, discusses her key areas of concern with Anton and guest host Marina Kaganovich. Listen here.
Cyber-Savvy Boardroom: Harnessing innovation while mastering compliance: Grant Waterfall, partner, PwC, joins Office of the CISO’s Alicja Cade and David Homovich with a deep-dive chat on using compliance to drive innovation. Listen here.
Behind the Binary: A reverse engineer’s journey: Reverse-engineering pioneer Danny Quist talks with host Josh Stroschein about the evolving landscape of binary analysis tools, the constant battle with malware obfuscation, and building one of the first malware repositories for research. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.
AWS announces a new AWS Network Firewall dashboard that enhances the integrated analytics experience between Amazon CloudWatch and Amazon OpenSearch Service. This expands the existing set of pre-built dashboards that include Amazon VPC, AWS CloudTrail, and AWS WAF logs. The dashboard delivers insights into network metrics, including traffic patterns, protocols, AWS PrivateLink endpoints, and TLS Server Name Indication (SNI) traffic.
The AWS Network Firewall dashboard helps security teams strengthen their troubleshooting capabilities by enabling them to monitor and analyze AWS Network Firewall logs alongside other AWS service logs in CloudWatch. Security administrators can quickly identify unusual network activity, monitor private network connections, and evaluate TLS policy effectiveness to optimize their network security configurations. Customers can create the AWS Network Firewall dashboard in CloudWatch Logs Insights or in the OpenSearch Service console.
Today, AWS Entity Resolution announces advanced rule-based fuzzy matching using Levenshtein Distance, Cosine Similarity, and Soundex algorithms to help organizations resolve consumer records across fragmented, inconsistent, and often incomplete datasets. This feature introduces tolerance for variations and typos, enabling potentially more accurate and flexible entity resolution without requiring the manual pre-processing of records. Advanced rule-based fuzzy matching in AWS Entity Resolution helps customers improve match rates, enhance personalization, and unify consumer views, critical for effective cross-channel targeting, retargeting, and measurement.
AWS Entity Resolution advanced rule-based fuzzy matching bridges the gap between traditional rule-based and machine learning-based matching techniques. Customers can use fuzzy algorithms to set similarity, distance, and phonetic thresholds on string fields to match records, offering the configurability of deterministic matching with the flexibility of probabilistic matching. This feature can be applied across multiple industries, including advertising and marketing, retail and consumer goods, and financial services, where accurately resolving consumer records is critical for customer verification, fraud detection, and marketing.
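To illustrate the kinds of comparisons these three algorithms perform, here is a minimal, stdlib-only Python sketch of Levenshtein distance, a character-bigram cosine similarity, and classic Soundex. These are illustrative implementations, not AWS Entity Resolution’s own; the service applies configurable thresholds over comparisons of this kind.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (rolling rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity over character-bigram count vectors."""
    va, vb = (Counter(zip(s, s[1:])) for s in (a, b))
    dot = sum(va[k] * vb[k] for k in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def soundex(word: str) -> str:
    """Classic Soundex phonetic code: first letter plus three digits."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    out, last = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != last:
            out += code
        if ch not in "hw":  # h and w do not separate doubled codes
            last = code
    return (out + "000")[:4]
```

For example, `levenshtein("kitten", "sitting")` is 3, and `soundex("Robert")` and `soundex("Rupert")` both yield `R163`, which is how phonetic matching tolerates spelling variations that exact rules would miss.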
AWS Entity Resolution helps organizations match, link, and enhance related customer, product, business, or healthcare records stored across multiple applications, channels, and data stores. You can get started in minutes using matching workflows that are flexible, scalable, and can seamlessly connect to your existing applications, without requiring any expertise in entity resolution or ML. AWS Entity Resolution is generally available in these AWS Regions. To learn more, visit AWS Entity Resolution.
Starting today, Amazon Aurora MySQL-Compatible Edition 3 (with MySQL 8.0 compatibility) supports MySQL 8.0.42 through Aurora MySQL v3.10. In addition to several security enhancements and bug fixes, MySQL 8.0.42 contains performance improvements for parallel replication using writeset dependency tracking, as well as enhanced debugging capabilities within the InnoDB storage engine.
Aurora MySQL 3.10 includes an increase in maximum storage capacity from 128 TiB to 256 TiB, allowing customers to manage larger database workloads within a single database cluster. Aurora MySQL 3.10 also introduces in-memory relay log optimization that improves binary log replication performance by caching relay log content in memory, reducing commit latency and minimizing storage I/O operations on binlog replicas. For more details, refer to the Aurora MySQL 3.10 and MySQL 8.0.42 release notes.
To upgrade to Aurora MySQL 3.10, you can initiate a minor version upgrade manually by modifying your DB cluster, or you can enable the “Auto minor version upgrade” option when creating or modifying a DB cluster. This release is available in all AWS Regions where Aurora MySQL is available.
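As a sketch of the manual upgrade path, the minor version upgrade can be initiated with the AWS SDK for Python (boto3) by modifying the cluster’s engine version. The cluster identifier and exact engine version string below are placeholders to replace with your own values; check the Aurora release notes for the precise version string in your Region.

```python
def upgrade_params(cluster_id: str, engine_version: str) -> dict:
    """Build the modify-db-cluster request for a minor version upgrade."""
    return {
        "DBClusterIdentifier": cluster_id,    # placeholder cluster name
        "EngineVersion": engine_version,      # e.g. the Aurora MySQL 3.10 version string
        "ApplyImmediately": True,             # apply now rather than in the maintenance window
    }


if __name__ == "__main__":
    import boto3  # AWS SDK for Python

    rds = boto3.client("rds")
    rds.modify_db_cluster(
        **upgrade_params("my-aurora-cluster", "8.0.mysql_aurora.3.10.0")
    )
```

Setting `ApplyImmediately` to `False` instead defers the change to the cluster’s next maintenance window, which is often preferable for production workloads.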
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other Amazon Web Services services. To get started with Amazon Aurora, take a look at our getting started page.