GCP – How Domino’s delivers pizza with the drop of a pin to almost anywhere
Read More for the details.
Spanner is a fully managed database service for both relational and non-relational workloads that offers strong consistency at global scale, high performance at virtually unlimited scale, and high availability with an up to five 9s SLA.
While Spanner is renowned for its relational capabilities, it is also a versatile key-value database that can be used to store and retrieve non-relational data via read and write APIs. In fact, a significant portion of internal Spanner usage at Google is non-relational.
Spanner was developed by Google to address the challenge of achieving synchronous replication and consistency while enabling virtually unlimited scaling. While relational workloads benefit from strong reads that guarantee the latest version of the data, non-relational workloads often utilize stale reads that can be served by local read replicas.
Recently, we announced a 50% increase in throughput and 2.5x more storage per node (up to 10TB) for Spanner, in addition to reduced latencies. These enhancements make Spanner an even more compelling option for NoSQL workloads.
Spanner is used extensively at Google by numerous projects, such as Google Photos, Google Ads, and Gmail. In total, Spanner serves over 3 billion read/write requests per second at peak. While some projects employ complex SQL and transactions, the vast majority of workloads are primarily key lookups. For such workloads, Spanner is chosen due to its high performance, customizability, and scalability.
Spanner can be used as a feature-rich RDBMS with relational semantics. However, these relational features are built on top of an equally powerful non-relational platform. For example, Spanner’s splits architecture demonstrates how it can provide write-scaling. Furthermore, Spanner’s support for JSON allows for more versatile use cases typically provided by document databases.
A question that frequently arises is why today’s data requires strong consistency. In reality, when workloads migrated from legacy databases to NoSQL databases, the requirement for consistency never went away. Instead, applications are now expected to handle inconsistencies in data, a task that was previously performed by databases. Customers had no choice but to accept additional application complexity for the sake of mandatory scalability requirements driven by data size. With Spanner, customers no longer have to make a trade-off between scalability and consistency — they can have both.
Cost is a primary concern for all customers. Spanner offers a cost-effective starting point of $65 per month (varying by region), which can be further reduced to $45 per month with Committed Use Discounts (CUDs). While this is higher than the free entry point offered by some non-relational databases, the reality is that most enterprise workloads demand significantly more resources. In such scenarios, the price-performance ratio of the database becomes more important than the minimum entry cost. Spanner also provides a free tier for those who simply want to try it out, along with a local emulator that allows customers to develop directly without incurring additional costs.
Many Spanner customers are running non-relational workloads on Spanner today. Here’s a sampling:
Uber
Uber’s previous infrastructure, based on Cassandra and Ringpop, presented various challenges, including low developer productivity and leaky abstractions between the database and application layers. Notably, Cassandra’s lack of consistency forced developers to “think about compensating actions especially when the writes fail due to system failures. In some cases, the failure of [a system] might also result in an inconsistent state of entities that often required manual intervention,” increasing costs and operational overhead. Read more about their engineering journey from the Uber Engineering blog.
ShareChat
ShareChat, a leading social media platform in India, migrated their non-relational workload from a NoSQL database to Spanner. They were particularly impressed with the similarity between Spanner’s schema system and their previous NoSQL database, which enabled them to perform a no-downtime migration. Additionally, they were able to achieve significant cost savings by moving to Spanner.
“Unlike our legacy NoSQL database, we could scale without having to rethink existing tables or schema definitions and keep our data systems in sync across multiple locations. It’s also cost-effective for us — moving over 120 tables with 17 indexes into Cloud Spanner reduced our costs by 30%.” – Bhanu Singh, Co-founder and CTO at ShareChat.
Read more about ShareChat’s migration story.
Niantic
Niantic, the creators of the popular mobile game Pokémon GO, migrated their non-relational workload from Cloud Datastore (a non-relational datastore) to Cloud Spanner. “As the game matured, we decided we needed more control over the size and scale of the database,” said James Prompanya, Senior Staff Software Engineer at Niantic. “We also like the consistent indexing that Cloud Spanner provides.” Spanner’s consistent secondary indexes provided Niantic with a significant performance boost. Secondary indexes are used to improve the performance of queries that involve filtering or sorting on data. In some non-relational databases, secondary indexes are eventually consistent, which means that they may not be immediately up-to-date. This can lead to latency issues for applications.
Read more about Niantic’s journey.
Let’s take a closer look at how non-relational concepts translate to Spanner concepts:
Non-relational → Spanner
Table → Table
Items → Rows
Attributes → Can be modeled in a number of ways:
- Defining columns in the schema is the most idiomatic approach in Spanner
- A JSON column is the closest equivalent
- An interleaved table with key-value pairs
Primary Key → Primary Key, which determines both the partitioning of the data across nodes and the sort order of the data within each node
Secondary Indexes → Non-relational systems usually provide two types of indexes:
- Local secondary indexes, which can be modeled as interleaved secondary indexes
- Global secondary indexes, which are equivalent to secondary indexes in Spanner
Spanner also lets users store non-key columns in the index, speeding up common read requests. It is important to note that all indexes are always transactionally consistent in Spanner.
Sparse Index → NULL-filtered indexes allow users to omit NULL rows from the index. When used in conjunction with generated columns, this offers a robust method for excluding a specific set of rows from the index.
Streams → Change streams
Non-relational → Spanner
Control plane APIs like CreateTable → Schema changes
Custom query languages → SQL (ANSI SQL or PG) with hints like FORCE_INDEX to force the use of a particular index
PutItem, BatchWriteItem, UpdateItem → Read-write transactions. In Spanner, all writes provide ACID guarantees.
GetItem, BatchGetItem, Query, Scan → Read API. Stale reads can be used to improve performance where the most recent data is not required; stale reads are still consistent as of some prior timestamp.
Data types
Spanner provides scalar types that are largely the same as other storage systems across the industry. In addition, Spanner natively supports dynamic arrays of any supported type. Document-like scenarios can use Spanner’s JSON type.
Read more about Spanner data types.
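To make the mapping above concrete, here is a minimal GoogleSQL DDL sketch (the table and index names are hypothetical, not taken from any real workload) showing a table with a JSON column, an interleaved child table, a secondary index that stores a non-key column, and a NULL-filtered index:

-- Hypothetical product catalog schema (illustrative names only)
CREATE TABLE Products (
  ProductId      STRING(64) NOT NULL,
  Name           STRING(MAX),
  Attributes     JSON,            -- document-style data
  DiscontinuedAt TIMESTAMP,       -- NULL for active products
) PRIMARY KEY (ProductId);

-- Interleaved child table: rows are physically co-located with their parent Product
CREATE TABLE ProductReviews (
  ProductId STRING(64) NOT NULL,
  ReviewId  STRING(64) NOT NULL,
  Rating    INT64,
) PRIMARY KEY (ProductId, ReviewId),
  INTERLEAVE IN PARENT Products ON DELETE CASCADE;

-- Secondary index that stores a non-key column so common reads are served from the index
CREATE INDEX ProductsByName ON Products(Name) STORING (Attributes);

-- NULL-filtered ("sparse") index: omits rows where DiscontinuedAt is NULL
CREATE NULL_FILTERED INDEX DiscontinuedProducts ON Products(DiscontinuedAt);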
In addition, Spanner offers a number of advantages over typical non-relational databases:
Stronger consistency guarantees: Spanner provides strong consistency for all reads and writes. Data consistency is maintained across all replicas. In the event of network latency, a replica may become stale. In this case, another replica can provide the most recent data. The stale replica still provides a consistent view of the data as of an earlier point in time. In contrast, most non-relational databases offer eventual consistency, which means that data may not be immediately consistent across all replicas. This can be a problem for applications that require strong consistency, such as those that involve financial transactions or real-time updates.
Global secondary indexes: Spanner supports global secondary indexes. This means that secondary indexes can be used to query data across all replicas. This can significantly improve the performance of queries that involve filtering or sorting on indexed data.
Support for transactions: Spanner supports ACID transactions. This means that multiple writes are either committed atomically or aborted. This can be essential for applications that require data integrity, such as those that involve multiple concurrent updates to the same data.
SQL as a query language: In addition to a read/write API, Spanner also supports SQL, a well-known and widely used query language. This makes it easier for developers to learn and use Spanner. In addition, some queries are much more straightforward and efficient when expressed using SQL compared to complex proprietary APIs.
Simplified application logic: Spanner’s strong consistency guarantees and support for transactions can help to simplify application logic. For example, applications that use Spanner do not need to implement their own mechanisms for ensuring data consistency, such as reconciliation pipelines.
Spanner is a versatile database that can be used for both relational and non-relational workloads. Spanner’s core concepts are similar to those of non-relational databases, but Spanner offers a number of advantages, such as stronger consistency guarantees, global secondary indexes, support for transactions, and SQL as a query language. These advantages make Spanner a good choice for a wide range of applications.
Interested in trying out Spanner for your non-relational workload? Get started for free!
Read More for the details.
As you orchestrate more services with Workflows, the workflow gets more complicated, with more steps, jumps, iterations, and parallel branches. When the workflow execution inevitably fails at some point, you need to debug and figure out which step failed and why. So far, you only had an execution summary with inputs/outputs, plus logs, to rely on when debugging an execution. While this was good enough for basic workflows, it didn’t provide step-level debugging information.
The newly released execution steps history solves this problem. You can now view step level debugging information for each execution from the Google Cloud console or the REST API. This is especially useful for complicated workflows with lots of steps and parallel branches.
In this blog we will take a closer look at a concrete example of Workflows execution steps history.
In an earlier blog post, Introducing Parallel Steps for Workflows, and its associated tutorial, we showed a workflow (workflow-parallel.yaml) that queries 5 BigQuery tables using parallel branches of Workflows to speed up processing.
The workflow outline is as follows:
runQueries step is a combination of 5 parallel branches where each table query happens in parallel.
Let’s assume you made a mistake with the name of one of the tables and you point to a non-existing table. Can the new execution steps history help us to debug that mistake?
First, as the execution is running, you’ll notice a new Steps tab under execution details in the Google Cloud console:
Under the Steps tab, you can see which steps have succeeded, which are running, and which have failed:
This is already very useful in visualizing and understanding what happens under the hood!
Once the execution is finished, you will see that the top-level runQueries step failed. You can filter to see its child steps:
You see that one of the iterations (i.e. tables) of runQueries failed:
Further filtering on runQueries.3 step, you realize that runQuery.3 failed:
Finally, further zooming into runQuery.3 reveals that the step received HTTP status 404 which hints that the table in question might not exist:
At this point, you can look into logs and get to the exact reason for HTTP 404, the non-existing table.
We went from a failed execution to the exact failed step in a parallel branch and the actual HTTP 404 error pretty quickly!
The new execution steps history makes it easier to understand what happens under the hood with an execution. While most developers will use the Google Cloud console to view the steps history, you can access the same information from the REST API as well. We will keep improving the developer experience with 2 more features scheduled for Workflows execution. You will be able to see progress for loops, parallel branches, retries and also see a detailed view of inputs/outputs and user variables. Stay tuned!
To learn more, check out the view history of execution steps documentation and the visualize and Inspect Workflows Executions blog post from Guillaume Laforge. You can also utilize the official Workflows samples and experiment with how the execution steps history helps. As always, connect with me on Twitter @meteatamel if you have any questions or feedback.
Read More for the details.
If you’ve joined a new team recently, like I have, you’ve probably had a lot of questions. Answers to those questions may or may not be easy to find, and might rely heavily on the generosity and spare time of your teammates. Let’s say you’re a DevRel engineer, working with Google Kubernetes Engine (GKE) to build and run microservices. Despite how willing your teammates are to help you get started, wouldn’t it be nice if there were a way to get questions about Google Cloud answered without bothering your coworkers?
Well, fortunately for you, and those supportive hypothetical teammates, Duet AI is available to answer your questions as you start in a new job, and will still be there as you explore more advanced options. You can see all those details in this narrative tutorial, but let’s walk through a quick example here.
Suppose you have a service or app already running, and you’d like to get to know it a bit better. A bit more personal, if you will. Logging helps tremendously when you’re exploring a new system. But finding the right options and flags to set can be a bit of a challenge. But you know what’s not a challenge? Asking Duet AI for the logs you need, with natural language.
How can I write a Cloud Logging query to search for logs from Pods in a namespace called “default” in a GKE cluster named “test”?
You should get a response something like the following, allowing for variation due to the built-in creativity of LLMs, of course.
To write a Cloud Logging query to search for logs from Pods in a namespace called “default” in GKE cluster called “test”, you can use the following query: [1]
resource.type="k8s_container"
resource.labels.namespace_name="default"
resource.labels.cluster_name="test"
Generated code may be subject to license restrictions not shown here. Use code with care. Learn more
This query will search for all logs from Pods in the “default” namespace in the “test” GKE cluster. You can also use the logName field to specify the type of logs you want to search for. For example, the following query will search for all stdout logs from Pods in the “default” namespace in the “test” GKE cluster:
resource.type="k8s_container"
resource.labels.namespace_name="default"
resource.labels.cluster_name="test"
logName="projects/PROJECT_ID/logs/stdout"
Generated code may be subject to license restrictions not shown here. Use code with care. Learn more
Where PROJECT_ID is the ID of your Google Cloud project.
For more information on writing Cloud Logging queries, see the Cloud Logging documentation.
And once you’ve run the command, you can click on “Explain this query” in each of the logs to get help understanding each of the log entries.
If you’re trying out the Vision API, you might notice a log entry like the one below:
There’s a lot here, but what does it mean? The “Explain this log entry” button in the upper left-hand corner will open a discussion with Duet AI, as shown below:
You can then follow up from this request. Suppose you want to know more about how to adjust the batch size for Vision API. You could continue the discussion:
You even have the ability to check the documentation if you want to know more by showing the related content:
(This example comes from the Jump Start Solution Generative AI document summarization if you’d like to explore Gen AI more.)
Once you understand the current situation, you may be in a more informed position to decide to make changes, like creating a private build environment. Duet AI is there to help throughout your entire process: from describing the right tools, to telling you how to use them effectively, to answering questions as you go. For even more on how this all works, and how you can get started, check out the narrative tutorial. So take a look, and impress those hypothetical teammates with how quickly you’re able to contribute to your new team.
Read More for the details.
Here at Wayfair, our data scientists rely on multiple sources of data to obtain features for model training. An ad hoc approach to feature engineering led to multiple versions of feature definitions, making it challenging to share features between different models. Most of the features were stored and used with minimal oversight on freshness, schema, and data guarantees. As a result, our data scientists frequently encountered discrepancies in model performance between development and production environments, making the feedback loop for retraining cumbersome. The whole process of curating new stable features and developing new model versions often took several months.
To address these issues, the Service Intelligence team at Wayfair decided to create a centralized feature engineering system. Our goal was to standardize feature definitions, automate ingestion processes, and simplify maintenance. We worked with Google to adopt different Vertex AI offerings, especially Vertex AI Feature Store and Vertex AI Pipelines. The former provides a centralized repository for organizing, storing, and serving ML features, and the latter helps to automate, monitor, and manage ML workflows. These offerings became the two main components of our feature engineering architecture.
On the data side, we developed workflows to streamline the flow of raw features data into BigQuery tables. We created a centralized repository of feature definitions that specify how each feature should be pulled, processed, and stored in the feature store. Using the Vertex AI Feature Store’s API, we automatically create features based on the given definitions. We use GitHub’s PR approval process to enforce governance and track changes.
Sample feature definition
We set up Vertex AI Pipelines to transform raw data in BigQuery into features in the feature store. These pipelines run SQL queries to extract the data, transform it, and then ingest it into the feature store. The pipelines run on different cadences depending on how frequently the features change, and what level of recency is required by the models that consume them. The pipelines are triggered by Cloud Functions that listen for Pub/Sub messages. These messages are generated both on a static schedule from Cloud Scheduler, and dynamically from other pipelines and processes.
Feature Engineering System Diagram
The Vertex AI Feature Store enables both training and inference. For training it allows data scientists to export historical feature values via point-in-time lookup to retrain their models. For inference it serves features at low latency to production models that make their predictions in real-time. Furthermore, it ensures consistency between our development and production environments, avoiding training-serving skew. Data scientists are able to confidently iterate on new model versions without worrying about data-related issues.
Our new feature engineering system makes it easy for data scientists to share and reuse features, while helping to provide guarantees around offline-online consistency and feature freshness. We are looking forward to adopting the new version of Vertex AI Feature Store that is now in public preview, as it will provide more transparent access to the underlying data and should reduce our cloud costs by allowing us to use BigQuery resources dedicated to our project.
The authors would like to thank Duncan Renfrow-Symon and Sandeep Kandekar from Wayfair for their technical contributions and Neela Chaudhari, Kieran Kavanagh, and Brij Dhanda from Google for their support with Google Cloud.
Read More for the details.
Google Public Sector is pleased to introduce the Public Sector Partner Learning Center – now available in the Google Partner Advantage Portal. Inside the learning center, partners can access powerful Go-to-Market (GTM) Kits, unlocking a wealth of resources designed to elevate impact in the market. Discover compelling value propositions, engage with customer success stories, embark on insightful learning journeys, and navigate opportunity registration seamlessly – all within a user-friendly interface. There are thirty-two GTM kits currently available to partners. Seven of the GTM kits were developed specifically for partners to execute with public sector customers:
Activate your public sector data with AI
Generative AI for the public sector
Continuity of Operations Plan (COOP) & Disaster Recovery (DR) for the public sector
Google Workspace for the public sector
Security Foundation for the public sector
Security Operations for the public sector
Zero Trust for Government
The Partner Learning Center empowers our partners and their go-to-market (GTM) teams with a curated selection of resources, including sales playbooks, marketing campaigns, technical assistance, and case studies. GTM Kits foster a collaborative environment for strategically navigating priority Google Cloud solutions. Additionally, dedicated Google Cloud experts stand ready to provide guidance on sales positioning and technical enablement, ensuring your success at every turn. Dive deeper and explore the transformative potential of the Partner Learning Center by visiting the website accessible via your Partner Advantage Portal login.
Fueling Success and Enabling Public Sector Transformation
The Partner Learning Center unlocks tangible outcomes and fosters meaningful contributions to the public sector:
Fast-track your GTM efforts: Optimize your go-to-market strategies and efficiently reach government customers with impactful solutions.
Supercharge sales productivity: Empower your teams with the knowledge and resources necessary to excel and achieve substantial results.
Deepen support for critical missions: Leverage Google Cloud solutions to drive positive outcomes in crucial government areas like healthcare, education, and national security.
Elevate your brand visibility: Establish your organization as a trusted leader in delivering cutting-edge technology solutions for government agencies.
Simply log in to the Partner Advantage Portal, locate the Training Overview page, and select Google Cloud Partner Learning Center. After creating a free account, you’ll find a gateway to a wealth of knowledge and practical tools.
We remain dedicated to continuously enriching the Partner Learning Center with new content and resources.
Contact Us
If you have any questions about the Partner Learning Center, please contact Partner Support by filling out a ticket here.
Read More for the details.
Organizations often use a variety of third-party tools to monitor the health and performance of their applications. But in the event of a service degradation, the source of the issue isn’t always clear — is it a disruption with your cloud provider, or a problem in your application environment? Recently, we announced the general availability of Personalized Service Health, which includes a new capability, emerging incidents, that provides speedy notification of Cloud Networking incidents to customers.
Emerging incidents are machine-driven alerts that are communicated simultaneously to you and internal Google SRE teams, significantly reducing the time to the first meaningful post about an incident. This means customers are notified as soon as Google Cloud incidents occur, even as our teams are still investigating the issues and assessing their impact. You can start receiving emerging incident notifications by enabling Personalized Service Health for supported Cloud Networking products and setting up alerts for them.
Emerging incident communications are sent in real time and personalized to your project, helping you address disruptions to operations and implement measures to mitigate the impact to your business. Of course, Personalized Service Health also sends timely updates on active incidents, making it a go-to resource for all incident information. The sooner you take action, the more you can reduce your mean time-to-resolution (MTTR), and improve application reliability. These real-time and personalized communications are shown below.
The health of Google Cloud networking products is continuously monitored using various probes. If the system detects a degradation or service interruption, it automatically generates internal alerts that are communicated to customers based on assessed impact, and sent out through all Personalized Service Health channels, including the dashboard, logs, alerts, and APIs.
Subsequently, if an event is confirmed, the emerging incident is closed out and linked to a confirmed incident. Alternatively, if the event was short-lived, e.g., a network re-route mitigated the impact to the customer, it may be closed before a confirmed event is even generated. These early alerts provide customers with clear information on the root of the issue. Now, as long as incidents are active, customers receive timely updates about both emerging and confirmed incidents from Google Cloud’s incident response process.
Emerging incidents for supported products are available by default for any projects that use them, as long as Personalized Service Health is enabled. You can learn more about managing emerging incidents here. Emerging incidents alert policies can be configured from within the Service Health dashboard.
For more information, follow the Personalized Service Health documentation and getting started guide. To get started, enable Personalized Service Health for a project or across your organization.
Read More for the details.
We’re taking our solar data and expanding coverage–far and wide. After launching the Solar API as part of our new suite of Environment APIs, we’ve continued to expand our coverage. We’re now able to provide valuable information to businesses in the solar industry for more than 472 million buildings in over 40 countries. This includes newly expanded coverage to over 95% of all buildings in the United States–nearly double our previous coverage.
Historically, our solar insights were computed using elevation maps and imagery captured by low-flying airplanes in limited regions. With new advancements in machine learning, we’re now using a larger set of Google Maps aerial imagery to produce detailed elevation maps and accurate solar projections for millions of buildings that previously had no data available. Our AI-enhanced height maps were internally evaluated based on geometric accuracy and predicted energy outputs, and developed closely with direct feedback from solar industry leaders from around the world. These advances help expand our comprehensive building data, solar potential insights, and detailed rooftop imagery broadly throughout North America, Europe, and Oceania. These advancements set the stage for future coverage expansions within our currently covered countries as well as expansions to new countries where data is not readily available.
Given the impact that access to reliable solar data can have on deploying renewable energy, we’re making it a top priority to roll out coverage across geographies where there is significant demand for this data. Since launch, solar companies have requested expanded coverage so they can unlock new markets, grow their business, and increase the amount of solar.
“Google has been adding more data, which has been great. Whenever that happens, we’re happy, because we’re paying less for higher quality imagery,” explains Walid Halty, CEO at Mona Lee. “Google’s Solar API has proven time and time again, where it’s available, it’s the best.”
Benefits of integrating the Solar API
Our Solar API is being used to optimize solar panel arrays, make solar assessments and proposals more accurate and efficient, and to educate the public about transitioning to solar energy by showing homeowners the feasibility for their individual properties. Here are two examples of how our customers are using the Solar API:
Demand IQ AI chatbot for solar assessments
Demand IQ uses the Solar API to help solar companies provide online, accurate, real-time rooftop assessments to homeowners considering a transition to solar energy. By digitizing the solar shopping experience, companies can increase transparency, realize more conversions, and cut costs–while providing homeowners with useful, engaging information so they can make an informed decision.
“To support the transition to solar energy, we need to help customers make informed decisions, and we need to help solar providers to answer their questions with up-to-date, accurate data,” explains Austin Rosenbaum, CEO, Demand IQ. “With Demand IQ and the power of data from the Solar API, we now do that in real-time.”
To support energy efficiency at scale, MyHEAT uses solar data, insights, and imagery to educate residents, utility companies, and cities on the solar potential of their homes and buildings. The Solar API significantly reduces the time needed to deliver solutions, while also improving efficiency, accuracy, and the quality of their 3D map imagery.
We’ll be at the Intersolar North America and Energy Storage North America conference at the San Diego Convention Center from January 17-19. Stop by booth #649, where we’ll have live presentations at 11 am and 2 pm each day to share more about how our Solar API can enhance your solar offering.
For more information on Google Maps Platform, visit our website.
Read More for the details.
Editor’s note: Since its founding in 2019, Linear has been enhancing global product development workflows for businesses through its project and issue-tracking system. Leveraging the power of Cloud SQL for PostgreSQL, Linear was able to keep pace with its expanding customer base–improving the efficiency, scalability, and reliability of data management, scaling up into the tens of terabytes without increasing engineering effort.
Linear’s mission is to empower product teams to ship great software. We’ve spent the last few years building a comprehensive project and issue tracking system to help users streamline workflows throughout the product development process. While we started as an issue tracker, we’ve grown our application into a powerful project management platform for cross-functional teams and users around the world.
For instance, Linear Asks allows organizations to manage request workflows like bug and feature requests via Slack, streamlining collaboration for individuals without Linear accounts who regularly work with our platform. Additionally, we introduced Similar Issues, a feature that prevents duplicate or overlapping tickets and ensures cleaner and more accurate data representation for growing organizations.
As our customers grow their businesses, they have more users on the platform and issues to track, which means more need for workflow and product management software. We’re focused on supporting this growth while continuing to deliver on stability, quality, performance, and the features that support complex technical configurations alongside a great user experience.
In our initial development phase, we had a PostgreSQL database with pgvector extension hosted on a PaaS that wasn’t indexed or used for production workloads. For production workloads we needed to upgrade our databases and find a solution with strong vector search support, since it’s the best way to identify and group similar issues based on shared characteristics or patterns. By representing issues as vectors and finding similarities, we can quickly identify duplicate or related issues. This functionality streamlines bug tracking and helps our customers address issues more effectively, saving them time and resources while improving their overall workflows.
We explored several new entrants in the database market that focus on storing vectors and ended up trialing a few of them. However, we faced challenges with speed of indexing and unacceptable downtime while scaling, not to mention the relatively high cost for a feature that wasn’t the core of the product. Given Linear’s existing data volume and our goals for finding a cost-efficient solution, we opted for Cloud SQL for PostgreSQL once support for pgvector was added. We were impressed by its scalability and reliability. This choice was also compatible with our existing database usage, models, ORM, etc., which meant the learning curve was non-existent for our team.
Our migration process from development to production was challenging at first due to the sheer size and volume of vectors we had to work with for the production dataset. However, after partitioning the issues table into 300 segments, we were able to successfully index each partition. The migration process followed a standard approach of creating a follower from the existing PostgreSQL database and proceeded smoothly.
Today, our primary operational database uses Cloud SQL for PostgreSQL. Since Cloud SQL for PostgreSQL includes the pgvector extension, we were able to set up an additional database to store vectors for our similarity-search features. This is achieved by encoding the semantic meaning of issues into a vector using OpenAI ada embeddings, then combining it with other filters to help us identify similar relevant entities.
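As a rough illustration only (this is not Linear’s actual schema; the table, columns, and filter below are hypothetical), a pgvector-backed similarity search combines an embedding column with ordinary relational filters:

-- Requires the pgvector extension available in Cloud SQL for PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

-- OpenAI ada embeddings are 1536-dimensional
CREATE TABLE issue_embeddings (
  issue_id  text PRIMARY KEY,
  team_id   text NOT NULL,
  embedding vector(1536) NOT NULL
);

-- Find the 10 most similar issues within a team, using cosine distance
-- combined with an ordinary relational filter; $1 is the query embedding
SELECT issue_id
FROM issue_embeddings
WHERE team_id = 'TEAM-123'
ORDER BY embedding <=> $1
LIMIT 10;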
A simplified diagram of Linear’s architecture
In terms of our architecture design, Linear’s web and desktop clients seamlessly sync with our backend through real-time connections. On Google Cloud, we operate synchronized WebSocket servers, both public and private GraphQL APIs, and task runners for background jobs.
Each of these functions as a Kubernetes workload that can scale independently. Our technology stack is fully built with Node.js and TypeScript, and our primary database solution is Cloud SQL for PostgreSQL, a choice we’re confident in. Additionally, we use Google’s managed Memorystore for Redis as an event bus and cache.
Cloud SQL for PostgreSQL has proven invaluable for Linear. Because we do not have a dedicated operations team, relying on managed services is crucial. It allows us to scale our database smoothly into tens of terabytes of data without requiring extensive engineering efforts, which is fantastic for our operations and enables engineering to spend more time building user-facing features.
Furthermore, our customers have provided us with great feedback, specifically regarding Linear’s ability to identify duplicate issues when they report a bug. Now, when a user creates a new issue, the application first suggests potential duplicates. Additionally, when handling customer tickets through customer support application integrations like Zendesk, Linear displays possible related bugs that have already been logged.
Looking ahead, we envision integrating machine learning (ML) into Linear to enhance the user experience, automate tasks, and offer intelligent suggestions within the product. We’re also committed to further developing our similarity search features, expanding beyond vector similarity to incorporate additional signals into our calculations. We firmly believe that Google Cloud will be instrumental in helping us realize this vision.
Get started:
Discover how Cloud SQL for PostgreSQL can help you run your business. Learn more about Memorystore for Redis.
Start a free trial today! New Google Cloud customers get $300 in free credits.
Read More for the details.
In machine learning, transforming raw data into meaningful features, a preprocessing step known as feature engineering, is a critical step. BigQuery ML has made significant strides in this area, empowering data scientists and ML engineers with a versatile set of preprocessing functions for feature engineering (see our previous blog). These transformations can even be seamlessly embedded within models, ensuring their portability beyond BigQuery to serving environments like Vertex AI. Now we are taking this a step further in BigQuery ML, introducing a unique approach to feature engineering: modularity. This allows for easy reuse of feature pipelines within BigQuery, while also enabling direct portability to Vertex AI.
A companion tutorial is provided with this blog — try the new features out today!
When creating a model in BigQuery ML, the CREATE MODEL statement has the option to include a TRANSFORM statement. This allows for custom specifications for converting columns from the SELECT statement into features of the model by using preprocessing functions. This is a great advantage because the statistics used for transformation are based on the data used at model creation. This provides consistency of preprocessing similar to other frameworks — like the Transform component of the TFX framework, which helps eliminate training/serving skew. Even without a TRANSFORM statement, automatic transformations are applied based on the model type and data type.
In the following example, an excerpt from the accompanying tutorial, there are preprocessing steps applied prior to input for imputing missing values. There is also embedded preprocessing with the TRANSFORM statement for scaling the columns. This scaling gets embedded with the model and applies to the input data, which is already imputed prior to input here. The advantage of the embedded scaling functions is that the model remembers the calculated parameters used in scaling to apply later on when using the model for inference.
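The tutorial’s exact statement isn’t reproduced here, but a minimal sketch of the pattern might look like the following (the dataset, model name, and label column are assumptions; the feature columns come from the public penguins dataset referenced later in this post):

-- Hedged sketch, not the tutorial's exact code: scaling is embedded in the model
-- via TRANSFORM, so the scaling parameters learned at training time are
-- reapplied automatically at prediction time.
CREATE OR REPLACE MODEL `my_project.my_dataset.penguin_classifier`
  TRANSFORM (
    ML.STANDARD_SCALER(body_mass_g)       OVER () AS body_mass_scaled,
    ML.STANDARD_SCALER(flipper_length_mm) OVER () AS flipper_length_scaled,
    species                                        -- label passes through untouched
  )
  OPTIONS (
    model_type = 'logistic_reg',
    input_label_cols = ['species']
  ) AS
SELECT body_mass_g, flipper_length_mm, species
FROM `my_project.my_dataset.penguins_imputed`;     -- already-imputed input table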
With the new ML.TRANSFORM table function, the feature engineering part of the model can be called directly. This enables several helpful workflows, including:
Process a table to review preprocessed featuresUse the transformations of one model to transform the inputs of another model
In the example below (from the tutorial), the ML.TRANSFORM function is applied directly to the input data without having to recalculate the scaling parameters using the original training data. This allows for efficient reuse of the transformations for future models, further data review, and for model monitoring calculations detecting skew and drift.
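A hedged sketch of such a call, reusing the hypothetical names from the sketch above:

-- Reapply the model's embedded TRANSFORM to new rows without retraining or
-- recomputing scaling statistics (useful for data review and skew/drift checks)
SELECT *
FROM ML.TRANSFORM(
  MODEL `my_project.my_dataset.penguin_classifier`,
  TABLE `my_project.my_dataset.penguins_new`);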
Take reusability to a completely modular state by creating transformation only models. This works like other models by using CREATE MODEL with a TRANSFORM statement and using the value model_type = TRANSFORM_ONLY. In other words, it creates a model object of just the feature engineering part of the pipeline. That means the transform model can be reused to transform inputs of any CREATE MODEL statement as well, even registering the model to the Vertex AI Model Registry for use in ML pipelines outside of BigQuery. You can even EXPORT the model to GCS for complete portability.
The following excerpt from the tutorial shows a regular CREATE MODEL statement being used to compile the TRANSFORM statement as a model. In this case, all the imputation steps are being stored together in a single model object that will remember the mean/median values from the training data and be able to apply them for imputation on future records — even at inference time.
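The tutorial’s statement isn’t reproduced here; a minimal sketch of a transform-only model along these lines (with hypothetical project, dataset, and model names) might be:

-- Hedged sketch: a transform-only "model" that captures the imputation steps.
-- The mean/median values are computed from this training data and stored with
-- the model object, so the same values are reused at inference time.
CREATE OR REPLACE MODEL `my_project.my_dataset.penguins_imputer`
  TRANSFORM (
    ML.IMPUTER(body_mass_g,       'mean')   OVER () AS body_mass_g,
    ML.IMPUTER(culmen_length_mm,  'mean')   OVER () AS culmen_length_mm,
    ML.IMPUTER(culmen_depth_mm,   'mean')   OVER () AS culmen_depth_mm,
    ML.IMPUTER(flipper_length_mm, 'median') OVER () AS flipper_length_mm
  )
  OPTIONS (
    model_type = 'transform_only',
    vertex_ai_model_id = 'penguins_imputer'  -- also registers it in the Vertex AI Model Registry
  ) AS
SELECT body_mass_g, culmen_length_mm, culmen_depth_mm, flipper_length_mm
FROM `bigquery-public-data.ml_datasets.penguins`;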
The TRANSFORM_ONLY model can be used like any other model with the same ML.TRANSFORM function we covered above.
With the modularity of TRANSFORM_ONLY models, it is possible to use more than one in a feature pipeline. The BigQuery SQL WITH clause (CTEs) makes the feature pipeline highly readable. This makes feature-level transformation models, much like a feature store, easy to use in a modular way.
As an example of this idea, first create a TRANSFORM_ONLY model for each individual feature: body_mass_g, culmen_length_mm, culmen_depth_mm, flipper_length_mm. Here, they are used to scale columns into features – just like the full model we created at the beginning.
For example, for body_mass_g (the statements for culmen_length_mm, culmen_depth_mm, and flipper_length_mm follow the same pattern):
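A hedged sketch of what such a per-feature statement might look like (hypothetical names, not the tutorial’s exact code):

-- Single-feature, transform-only scaler for body_mass_g. The scaling statistics
-- are computed once from the training data and stored with the model; the
-- equivalent statements for the other features differ only in the column name.
CREATE OR REPLACE MODEL `my_project.my_dataset.scale_body_mass_g`
  TRANSFORM (
    ML.STANDARD_SCALER(body_mass_g) OVER () AS body_mass_g
  )
  OPTIONS (
    model_type = 'transform_only',
    vertex_ai_model_id = 'scale_body_mass_g'
  ) AS
SELECT body_mass_g
FROM `bigquery-public-data.ml_datasets.penguins`;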
Now, with CTEs, a feature pipeline can be as easy as the following and even packaged as a view:
And creating the original model from above using this modular feature pipeline will look like the following which selects directly from the feature preprocessing pipeline created as a view above:
This level of modularity and reusability brings the activities of MLOps into the familiar syntax and flow of SQL.
But there are times when models need to be used outside of the data warehouse, for example online predictions or edge applications. Notice how the models above were created with the parameter VERTEX_AI_MODEL_ID. This means they have automatically been registered in the Vertex AI Model Registry where they are just a step away from being deployed to a Vertex AI Prediction Endpoint. Also, like other BigQuery ML models, these models can be exported to Cloud Storage by using the EXPORT MODEL statement for complete portability.
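As an illustration, a hedged sketch of exporting one of the hypothetical transform models above to Cloud Storage (bucket and path are made up):

-- Export the transform-only model to Cloud Storage so it can be served
-- outside of BigQuery, for example behind a Vertex AI Prediction endpoint.
EXPORT MODEL `my_project.my_dataset.penguins_imputer`
  OPTIONS (URI = 'gs://my-bucket/models/penguins_imputer/');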
BigQuery ML’s new reusable and modular feature engineering capabilities are powerful tools that can make it easier to build and maintain machine learning pipelines and power MLOps. With modular preprocessing, you can create transformation-only models that can be reused in other models or even exported to Vertex AI. This modularity even enables feature pipelines directly in SQL. This can save you time, improve accuracy, and prevent training/serving skew, all while simplifying maintenance. To learn more about feature engineering with BigQuery, try out the tutorial and read more about feature engineering with BigQuery ML.
Read More for the details.
Using Cloud SQL, our fully managed relational database service, is a powerful way to streamline your database operations and focus on innovation. Cloud SQL handles the complexities of database administration for you, delivering a robust and secure relational database platform that’s scalable and highly available, all the while simplifying management tasks and reducing operational costs.
As an open and fully managed database service, Cloud SQL supports multiple versions of the database engines it offers, allowing you to choose the version of MySQL, PostgreSQL, or Microsoft SQL Server that best suits your needs. While Cloud SQL offers this flexibility to maintain older versions of database engines, there are substantial advantages to staying current with the latest releases. Newer versions often bring performance enhancements, security upgrades, and expanded feature sets, empowering you to optimize your applications and safeguard your data. To maximize the benefits of Cloud SQL and ensure the long-term stability and security of your applications, it’s essential to move away from database engine versions that have reached their end of life (EOL).
In this blog, we will discuss key advantages as well as best practices for transitioning to a newer version of MySQL and PostgreSQL by leveraging Cloud SQL’s in-place major version upgrade feature. We will also discuss strategies to successfully perform a major version upgrade on your primary and replica instances.
Cloud SQL’s in-place major version upgrade feature is a built-in functionality that allows you to upgrade your MySQL or PostgreSQL database instance to a newer major version directly (a.k.a. in-place upgrade) within the Cloud SQL platform. This removes the need for manual data migration, complex configuration changes, and the associated lengthy downtime. Further, one of the biggest advantages of this approach is that you can retain the name, IP address, and other settings of your current instance after the upgrade.
We recommend you plan and test major version upgrades thoroughly. One of the strategies to test includes cloning the current primary instance and performing a major version upgrade on the clone. This will help iron out issues upfront, and give you the confidence to perform a production upgrade.
Cloud SQL’s major version upgrade feature varies slightly between MySQL and PostgreSQL. Please see the dedicated sections below for detailed information.
MySQL community version 5.7 reached end of life in October 2023. If you are still running MySQL 5.6 and 5.7, we recommend upgrading to MySQL 8.0, which offers next-generation query capabilities, improved performance, and enhanced security. For example:
MySQL 8.0’s instant DDL drastically speeds up table alterations while allowing concurrent DML changes.
InnoDB received optimizations for various workloads, including read-write, IO-bound, and high-contention scenarios.
SKIP LOCKED and NOWAIT options prevent lock waits.
Window functions in MySQL 8.0 simplify query logic, and CTEs enable reusable temporary result sets.
MySQL 8.0 enhances JSON functionality and adds robust security features.
Replication performance is significantly improved, leading to faster data synchronization. Parallel replication is enabled by default.
New features like descending indexes and invisible indexes contribute to further performance enhancements.
Please click here for more details.
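To make a few of these concrete, here is a hedged sketch using made-up table and column names:

-- Instant DDL: adding a column avoids a full table rebuild
ALTER TABLE orders ADD COLUMN promo_code VARCHAR(32), ALGORITHM=INSTANT;

-- SKIP LOCKED: claim the next available job without waiting on row locks
SELECT id FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- CTE plus window function: running revenue total without self-joins
WITH daily AS (
  SELECT order_date, SUM(total) AS revenue
  FROM orders
  GROUP BY order_date
)
SELECT order_date,
       revenue,
       SUM(revenue) OVER (ORDER BY order_date) AS running_total
FROM daily;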
You can leverage Cloud SQL’s major version upgrade feature to upgrade to 8.0. Pre-check has already been incorporated into the workflow but you have the option to run it separately as well. You can use the Upgrade Checker Utility in the MySQL shell to run a pre-check. Before upgrading, review your current primary/replica topology and devise a plan accordingly.
Upgrade using major version upgrade: If you have a primary instance with no read replicas, you can upgrade the instance in-place with Cloud SQL’s major version upgrade feature. MySQL allows replication from lower to higher major versions. This is beneficial if you have read replicas, as you can upgrade your read replicas prior to upgrading the primary instance.
The diagram below shows the stages of a major version upgrade.
Note: In this scenario, IP addresses will be maintained.
Upgrade using cascading replicas: You can leverage cascading replicas along with major version upgrades for the scenarios below. This approach allows you to:
fall back to the old primary with its full topology intact
set up an entire new stack in a new zone or a new region in addition to the current deployment
For example, Everflow, a Google Cloud customer that makes a partner marketing platform, leveraged cascading replicas and in-place major version upgrade to orchestrate a smooth MySQL upgrade to 8.0, with minimal downtime or disruption for their users.
To perform the major version upgrade using cascading replicas, please refer to the diagram and perform the following steps.
1. Create a read replica from the current 5.7 primary instance either to an existing or new zone/region.
2. Upgrade the replica to 8.0 via the major version upgrade feature.
3. Enable replication and create replicas as needed under the new 8.0 read replica.
4. Prep the application for IP address changes in advance to minimize downtime.
5. Route traffic and prep the application for switching over to the new master. Cloud Load Balancing can help do this efficiently.
Note: Consider this a transition period and try to keep the time for version mismatch short.
6. When you’re ready, promote the 8.0 read replica.
7. Delete the old primary MySQL 5.7 instance.
Note: As mentioned earlier, the above process requires IP address changes to the application. Ideally, IP address changes should be done before promoting the new read replica to minimize disruption when the cutover is performed.
PostgreSQL updates major versions yearly with a five-year support window. PostgreSQL 11 reached end-of-life in November, 2023. While you can upgrade to PostgreSQL 12 or 13, considering PostgreSQL’s end-of-life policy, we recommend upgrading to PostgreSQL 14 or later versions. PostgreSQL 14 and subsequent versions introduce several new features and enhancements that provide significant benefits. Here are some of the highlights:
Performance improvements including parallel query execution for GROUP BY and JOIN operations, and faster VACUUM and REINDEX operations.
Enhancements to logical replication with support for filtering, row-level replay, and replication to multiple destinations, making it more flexible and scalable for various use cases.
Security enhancements and advanced features like improved JSON functionality and enhanced table partitioning.
For additional details click here.
If your database is on an older version, we recommend upgrading to a newer version. There are different strategies that can be used to accomplish this. Since PostgreSQL does not support cross-version replication, upgrading the primary instance while the instance is replicating to the read replicas is not possible. In addition, upgrading read replicas prior to the primary instance may not be feasible. Hence, the upgrade flow involves upgrading primary instances first. Before proceeding, replication needs to be disabled for existing replicas. After the primary has been upgraded, read replicas can be upgraded one by one and replication can be re-enabled. Alternatively, you can drop read replicas and recreate them after the primary instance has been upgraded.
Upgrade via MVU: We recommend leveraging Cloud SQL’s major version upgrade feature for upgrading to newer versions of PostgreSQL (14.0+). With in-place upgrades, you can retain the name, IP address, and other settings of your current instance after the upgrade. The Cloud SQL for PostgreSQL in-place upgrade operation uses the pg_upgrade utility. Please make sure to test upgrades on beta or staging environments first, or clone the instance as mentioned above before you proceed. Cloud SQL for PostgreSQL major version upgrade performs pre-validation steps and backups on your behalf.
Upgrading to MySQL 8.0 or PostgreSQL 14 or 15 unlocks the ability to perform a quick in-place upgrade to Cloud SQL Enterprise Plus Edition, which is a powerhouse of advanced functionality. Cloud SQL Enterprise Plus Edition offers:
99.99% availability SLA inclusive of maintenance
Near-zero downtime planned maintenance with <10s instance downtime
Up to 3x faster throughput with the optional Data Cache and faster hardware for larger scale and optimal performance
Support for larger machine configurations with up to 128 vCPU and 864 GB of memory, compared to 96 vCPU and 624 GB in Enterprise edition
Support for up to 35 days of Point In Time Recovery (PITR) compared to seven days in Enterprise edition
In-place upgrade to Cloud SQL Enterprise Plus edition takes just a few minutes with a downtime of less than 60 seconds. To learn more, click here.
Let’s revisit why upgrading makes sense: It’s an investment in the security, performance, and capabilities of your database infrastructure. By embracing the latest advancements, you can safeguard your data, optimize your applications, and empower your organization. With Cloud SQL’s in-place major version upgrade feature, you can perform a streamlined and efficient upgrade of your databases, ensuring a smooth transition to the latest version. Click here to get started.
Read More for the details.
When troubleshooting distributed applications that are made up of numerous services, traces can help with pinpointing the source of the problem, so you can implement quick mitigating measures like rollbacks. However, not all application issues can be mitigated with rollbacks, and you need to undertake a root-cause analysis. Application logs often provide the level of detail necessary to understand code paths taken during abnormal execution of a service call. As a developer, the challenge is finding the right logs.
Let’s take a look at how you can use Cloud Trace, Google Cloud’s distributed tracing tool, and Cloud Logging together to help you perform root-cause analysis.
Imagine you’re a developer working on the Customer Relationship Management service (CRM) that is part of a retail webstore app. You were paged because there’s an ongoing incident for the webstore app and the error rate for the CRM service was spiking. You take a look at the CRM service’s error rate dashboard and notice a trace exemplar that you can view in Cloud Trace:
The Trace details view in Cloud Trace shows two spans with errors: update_user and update_product. This leads you to suspect that one of these calls is part of the problem. You notice that the update_product call is part of your CRM service and check to see if these errors started happening after a recent update to this service. If there’s a correlation between the errors and an update to the service, rolling back the service might be a potential mitigation.
Let’s assume that there is no correlation between updates to the CRM service and these errors. In this case, a rollback may not be helpful and further diagnosis is needed to understand the problem. A next possible step is to look at logs from this service.
The Trace details view in Cloud Trace allows users to select different views for displaying logs within the trace — selecting “Show expanded” displays all related logs under their respective spans.
In this example, you can see that there are three database-related logs under the update_product span. After retrying a few times, the attempts to connect to the database from the CRM service have failed.
Behind the scenes, Cloud Trace is querying Cloud Logging to retrieve logs that are both in the same timeframe as the trace and reference the traceID and the spanID. Once retrieved, Cloud Trace presents these logs as child nodes under the associated span, which makes the correlation between the service call and the logs emitted during the execution of that service very clear.
You know that other services are connecting to the same database successfully, so this is likely a configuration error. You check to see if there were any config updates to the database connection from the CRM service and notice that there was one recently. Reviewing the pull request for this config update leads you to believe that an error in this config was the source of the issue. You quickly update the config and deploy it to production to address the issue.
In the above example, Cloud Trace and Cloud Logging work together to combine traces and logs into a powerful way to perform root cause analysis when mitigating measures like rollbacks are not enough.
If you’re curious about how to instrument properly for logs and trace correlation to work, here are some examples:
You can also get started by trying out OpenTelemetry instrumentation with Cloud Trace in this codelab or by watching this webinar.
Read More for the details.
Neo4j provides a graph database that offers capabilities for handling complex relationships and traversing vast amounts of interconnected data. Google Cloud complements this with robust infrastructure for hosting and managing data-intensive workloads. Together, Neo4j and Google Cloud have developed a new Dataflow template, Google Cloud to Neo4j (docs, guide), that you can try from the Google Cloud console.
In this blog post, we discuss how the Google Cloud to Neo4j template can help data engineers and data scientists who need to streamline the movement of data from Google Cloud to Neo4j database, to enable enhanced data exploration and analysis with the Neo4j database.
Many customers leverage BigQuery, Google Cloud’s fully managed and serverless data warehouse, and Cloud Storage to centralize and analyze diverse data from various source systems, regardless of formats. This integrated approach simplifies the complex task of managing data from different sources while maintaining stringent security measures. With the ability to store and process data efficiently in one location, organizations can analyze, forecast, and predict trends, yielding valuable insights for informed decision-making. BigQuery is the linchpin for aggregating and analyzing data. Read on to see how the Google Cloud to Neo4j Dataflow template streamlines the movement of data from BigQuery and Cloud Storage to Neo4j’s Aura DB, a fully managed cloud graph database service running on Google Cloud.
Unlike typical data integration methods like Python-based notebooks and Spark environments, Dataflow simplifies the process entirely, and doesn’t require any coding. It’s also free during idle periods, and leverages Google Cloud’s security framework for enhanced trust and reliability of your data workflows.
Dataflow is a strong solution for orchestrating data movement across diverse systems. As a managed service, Dataflow caters to an extensive array of data processing patterns, enabling customers to easily deploy batch and streaming data processing pipelines. And to simplify data integration, Dataflow offers an array of templates tailored to various source systems.
Fig 1: Architecture Diagram of Dataflow from Google Cloud to Neo4j
With the Google Cloud to Neo4j template, you can opt for the flex or classic template. For this illustration, we employ the flex template, which leverages just two configuration files: the Neo4j connection metadata file, and the Job Description file.
The Neo4j partner GitHub repository provides a wealth of resources that show how to use this template. The repository houses sample configurations, screenshots and all the instructions required to set up the data pipeline. Additionally, there are step-by-step instructions that guide you through the process of transferring data from BigQuery to a Neo4j database.
Once you have these two configuration files (the Neo4j connection metadata file and the job description file), you are ready to use the Dataflow template to move data from Google Cloud to Neo4j; the job can be configured directly from the Dataflow configuration page in the Google Cloud console.
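If you prefer to launch the job programmatically rather than from the console page, here is a minimal sketch using the Dataflow API. The containerSpecGcsPath and the parameter names (jobSpecUri, neo4jConnectionUri) are assumptions here; check the template documentation for the exact values.

```python
# Sketch: launch the Google Cloud to Neo4j flex template via the Dataflow API.
# The containerSpecGcsPath and parameter names (jobSpecUri, neo4jConnectionUri)
# are assumptions; verify them against the template documentation.
from googleapiclient.discovery import build

project, region = "my-project", "us-central1"  # placeholders
dataflow = build("dataflow", "v1b3")

response = dataflow.projects().locations().flexTemplates().launch(
    projectId=project,
    location=region,
    body={
        "launchParameter": {
            "jobName": "bigquery-to-neo4j",
            "containerSpecGcsPath": (
                f"gs://dataflow-templates-{region}/latest/flex/Google_Cloud_to_Neo4j"
            ),
            "parameters": {
                "jobSpecUri": "gs://my-bucket/job-spec.json",                  # job description file
                "neo4jConnectionUri": "gs://my-bucket/neo4j-connection.json",  # connection metadata file
            },
        }
    },
).execute()

print(response["job"]["id"])
```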
You can find the detailed documentation on this Dataflow template on the Neo4j documentation portal. Please refer to the following links: Dataflow Flex Template for BigQuery to Neo4j and Dataflow Flex Template for Google Cloud to Neo4j.
The Google Cloud to Neo4j Dataflow template makes it easier to use Neo4j’s graph database with Google Cloud’s data processing suite. To get started, check out the following resources:
- Explore Neo4j within the Google Cloud Marketplace.
- Review the Google Cloud documentation on the Dataflow template.
- Walk through the step-by-step guide for setting up your pipeline and creating Neo4j config files that can be passed into the pipeline.
- Jump to the Cloud console to create your first job now!
Read More for the details.
Deutsche Bank is the leading German bank with strong European roots and a global network. The bank provides financial services to companies, governments, institutional investors, small and medium-sized businesses and private individuals.
For its German retail banking business, the bank recently completed the consolidation of two separate IT systems — Deutsche Bank and Postbank — to create one modern IT platform. This migration of roughly 19 million Postbank product contracts alongside the data of 12 million customers into the IT systems of Deutsche Bank was one of the largest and most complex technology migration projects in the history of the European banking industry.
As part of this modernization, the bank opted to design an entirely new online banking platform, partnering with Google Cloud for its migration from traditional on-premises servers to the cloud. An integral capability enabling this migration, already apparent in the first production rollout for 5 million Postbank customers, is Spanner, Google Cloud's fully managed database service. Spanner's high availability, external consistency, and virtually unlimited horizontal scalability made it the ideal choice for this business-critical application. Read on to learn about the benefits that Deutsche Bank achieved from migrating to Spanner, and some best practices it developed to reliably and efficiently scale the platform.
Scaling in high-availability environments can be challenging, but Spanner does all the heavy lifting for Deutsche Bank. Spanner scales horizontally to virtually any size, allowing Deutsche Bank to start small and easily scale up and down as needed.
In a traditional on-prem project, fixed resources would have been assigned to the online banking databases, provisioned generously enough to respond to customer requests quickly even during peaks. In such a setup, the resources remain unused most of the time, as the online banking load profile varies over the course of the day (more specifically, with the number of online users at a given time). Traffic is low overnight, increases sharply in the morning to a high load throughout the day, and drops again in the evening hours. Spanner supports elasticity with horizontal scaling based on nodes that can be added and removed at any time, without disrupting any active workloads.
The number of nodes can be changed via the Google Cloud console, gcloud, or the REST API. For automation, Google Cloud provides an open-source Autoscaler that runs entirely on Google Cloud. The bank used the Autoscaler in all environments (including non-production environments) to maximize cost-efficiency while still ensuring sufficient Spanner capacity for a seamless user experience.
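For illustration, the resize itself is a single call to the instance admin API. Here is a minimal Python sketch (with placeholder project and instance IDs) of the kind of operation the Autoscaler performs on your behalf:

```python
# Sketch: resize a Spanner instance with the instance admin API, the same kind
# of call the Autoscaler issues. Project and instance IDs are placeholders.
from google.cloud import spanner_admin_instance_v1
from google.protobuf import field_mask_pb2

client = spanner_admin_instance_v1.InstanceAdminClient()
name = "projects/my-project/instances/online-banking"  # placeholder

operation = client.update_instance(
    instance=spanner_admin_instance_v1.Instance(name=name, node_count=5),
    field_mask=field_mask_pb2.FieldMask(paths=["node_count"]),
)
operation.result(timeout=300)  # long-running operation; existing workloads keep running
```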
For any component subject to high-availability requirements, the autoscaler that manages it must be highly available, too. Below are some of the bank's experiences: lessons learned from running the Autoscaler, and contributions that will soon be given back to the open-source community.
By default, the Autoscaler checks Spanner instances once per minute. To scale out as early as possible, this interval can be shortened, which increases the frequency at which the Autoscaler queries the Cloud Monitoring API. This change, along with choosing the right scaling methods, helped the bank fulfill its latency service level objectives.
Projects running a high-availability GKE cluster should consider deploying the Spanner Autoscaler on GKE rather than on Cloud Functions, because it can then be deployed to multiple regions, which mitigates issues potentially caused by a regional outage. To avoid race conditions between the poller pods, simple semaphore logic can be added so that only one pod manages the Spanner resources at any given time. This is straightforward, since the Autoscaler already persists its state in either Firestore or Spanner.
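A minimal sketch of such a semaphore, assuming the Autoscaler state lives in Firestore (the document path and field names below are illustrative, not part of the Autoscaler itself), could look like this:

```python
# Sketch of a lease-based semaphore so that only one poller pod acts at a time.
# Assumes the Autoscaler state lives in Firestore; the document path and field
# names are illustrative, not part of the Autoscaler itself.
import datetime
from google.cloud import firestore

LEASE_DOC = "spannerAutoscaler/leader-lease"  # hypothetical document path
LEASE_TTL = datetime.timedelta(minutes=2)

db = firestore.Client()

@firestore.transactional
def try_acquire_lease(transaction, pod_name: str) -> bool:
    """Return True if this pod holds (or just acquired) the lease."""
    ref = db.document(LEASE_DOC)
    snapshot = ref.get(transaction=transaction)
    now = datetime.datetime.now(datetime.timezone.utc)
    data = snapshot.to_dict() if snapshot.exists else {}
    expires = data.get("expiresAt")
    if expires is None or expires < now or data.get("holder") == pod_name:
        transaction.set(ref, {"holder": pod_name, "expiresAt": now + LEASE_TTL})
        return True
    return False  # another pod currently holds the lease

# if try_acquire_lease(db.transaction(), "poller-pod-0"):
#     run_polling_cycle()
```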
Customizing the Spanner Autoscaler does not require rocket science. All changes can be made without touching the Autoscaler's poller-core or scaler-core. Semaphore handling and monitoring integration can be implemented in custom wrappers, like the wrappers provided by Google Cloud in the respective poller and scaler folders. For a multi-cluster deployment, you can adapt the example kpt files or add custom Helm charts, selecting the option that best suits your needs.
When multiple teams are working with Spanner instances, it can be inconvenient to redeploy the Autoscaler each time the scaling configuration changes. To avoid this, Deutsche Bank fetches the instance configuration from sources external to the image and deployment.
There are two ways to do this:
- Store the configuration separately from the instance, e.g., in Cloud Storage
- Add the configuration to the instance itself, e.g., by setting appropriate Spanner instance labels via Terraform
To read the instance configuration and build the poller's internal instances configuration on the fly, the Google Cloud client libraries provide convenient methods for listing and accessing either files in buckets or Spanner instances, along with their metadata such as labels.
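As a rough sketch of the label-based variant (the label keys such as autoscaler-min-pu and the configuration field names are illustrative assumptions, not part of the Autoscaler), a wrapper could build the poller's instance list like this:

```python
# Sketch: build the poller's instance configuration from Spanner instance labels.
# The label keys and configuration field names are illustrative assumptions;
# align them with the Autoscaler's actual configuration schema.
from google.cloud import spanner_admin_instance_v1

client = spanner_admin_instance_v1.InstanceAdminClient()
parent = "projects/my-project"  # placeholder

config = []
for inst in client.list_instances(parent=parent):
    labels = dict(inst.labels)
    if labels.get("autoscaled") != "true":
        continue  # only instances explicitly opted in are autoscaled
    config.append({
        "projectId": "my-project",
        "instanceId": inst.name.split("/")[-1],
        "minSize": int(labels.get("autoscaler-min-pu", "100")),
        "maxSize": int(labels.get("autoscaler-max-pu", "2000")),
        "scalingMethod": labels.get("autoscaler-method", "LINEAR"),
    })
```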
If you are using Terraform, it's a good idea to exclude the Spanner instance's processing units from Terraform management after the instance has been created. Otherwise, any terraform apply run would reset the autoscaled processing units to the fixed value recorded in the Terraform state. Terraform's lifecycle ignore_changes meta-argument does the trick.
The Autoscaler default metrics work well for most use cases. In special cases where scaling needs to be based on different parameters, custom metrics can be configured on an instance level.
A decoupled configuration makes it easy to create custom metrics and test them upfront. By making the custom metric part of the compiled image, using it on an instance level becomes less error prone. By following this approach, scaling a particular instance won’t accidentally stop because of a typo made in a metric definition during a configuration change.
By default, the Autoscaler bases scaling decisions on current storage utilization, 24-hour rolling CPU load, and current high-priority CPU load. In cases where scaling should be based on different parameters, e.g., medium-priority CPU load, custom metrics can be set up in less than a minute.
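As an illustration only (the field names mirror the Autoscaler's JSON configuration but should be verified against the current documentation, and the project and instance IDs are placeholders), an instance-level configuration with a medium-priority CPU custom metric could be sketched like this:

```python
# Illustration only: an instance-level Autoscaler configuration with a custom
# metric for medium-priority CPU. Field names mirror the Autoscaler's JSON
# configuration but should be verified against the current documentation.
import json

instance_config = {
    "projectId": "my-project",       # placeholder
    "instanceId": "online-banking",  # placeholder
    "units": "PROCESSING_UNITS",
    "minSize": 1000,
    "maxSize": 10000,
    "scalingMethod": "LINEAR",
    "metrics": [
        {
            "name": "medium_priority_cpu",
            "filter": (
                'metric.type="spanner.googleapis.com/instance/cpu/utilization_by_priority" '
                'AND metric.label.priority="medium"'
            ),
            "regional_threshold": 60,
            "multi_regional_threshold": 45,
        }
    ],
}

with open("autoscaler-config.json", "w") as f:
    json.dump([instance_config], f, indent=2)
```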
One minor shortcoming of the Autoscaler is its inability to compensate for sudden load peaks in real time. To prepare for expected peaks, it is advisable to temporarily raise the configured minimum number of processing units. This is easy to implement once the Autoscaler's instance configuration is decoupled from the Autoscaler image.
If changing the configuration isn't an option, you can either send a POST request to the scaler's metric endpoint or script gcloud commands to update the timestamps for the last scaling operation in the Autoscaler's state database and set the instance processing units directly. The first approach may cause concurrent scaling operations, in which case you should be aware of the Autoscaler's internal cooldown settings: by default, the Autoscaler waits 30 minutes after a scale-in event and 5 minutes after a scale-out event before scaling again. The second approach fixes the processing units at any value of your choice for n minutes by manipulating the state database timestamps.
The open-source Autoscaler is a valuable tool for balancing cost control and performance needs when using Spanner. Autoscaler automatically scales your database instances up and down based on load to avoid over-provisioning, increasing cost savings.
The Autoscaler is easy to set up and runs on Google Cloud. Google provides the Autoscaler as open source, which allows full customization of the scaling logic. The core project team at Deutsche Bank worked closely with Google to further improve the tool’s stability and is excited to contribute its enhancements back to the open source community in the near future.
To learn more about the open-source Autoscaler for Spanner, follow the official documentation. You can read more about the Deutsche Bank and Google Cloud partnership in the official Deutsche Bank press release.
Read More for the details.
Imagine that you’re an engineer at the company Acme Corp and you’ve been tasked with some big projects: integrating and delivering software using CI/CD and automation, as well as implementing data-driven metrics and observability tools. But many of your fellow engineers are struggling because there’s too much cognitive load — think deploying and automating Kubernetes clusters, configuring CI/CD pipelines, and worrying about security. You realize that to support the scale and growth of your company, you have to think differently about solving these challenges. This is where platform engineering might help you.
Platform engineering is “the practice of planning and providing such computing platforms to developers and users and encompasses all parts of platforms and their capabilities — their people, processes, policies and technologies; as well as the desired business outcomes that drive them,” writes the Cloud Native Computing Foundation (CNCF). This emerging discipline incorporates lessons learned from the DevOps revolution, recent Cloud Native developments in Kubernetes and serverless, as well as advances in observability and SRE.
A career in platform engineering, meanwhile, means becoming part of a product team focused on delivering software, tools, and services. Whether you’re just starting your IT career as a young graduate or you’re already a highly experienced developer or engineer, platform engineering offers growth opportunities and the ability to gain new technical skills.
Read on for an overview of the platform engineering field, including an introduction to what platform engineers do and the skills required. We also discuss the importance of user-centricity and having a product mindset, and provide some tips for setting goals and avoiding common pitfalls.
So, what are some of the things that are expected of a platform engineer? Generally speaking, the role requires a mix of technical and people skills: job-related competencies that are necessary to complete the work, as well as personal qualities and traits that shape how the role is approached. You can learn some of them to get started down the platform engineering career path; however, there is no expectation that you need to know all of them to be successful, as these skill sets are often distributed across your team. Here are some of the different attributes of a platform engineer:
- Takes a customer-centric approach: being a reliable partner for engineering groups, sharing knowledge, and working with other teams, including software developers, SREs, and Product Managers
- Familiar with DevSecOps practices
- Avid learner, problem solver, detail-oriented, and able to communicate effectively across teams
- Able to articulate the benefits of the platform engineering approach to fellow colleagues and engineers
- Applies a product mindset for the platform, e.g., using customer user journeys and friction logs
Given its particular significance in the Platform Engineering realm, let’s delve into the customer-centric approach from the list above.
If platforms are first and foremost a product, as the CNCF Platforms White Paper suggests, the focus is on its users. From the Google DORA Research 2023 we know that user focus is key: “Teams that focus on the user have 40% higher organizational performance than teams that don’t.”
At Google we believe that if we focus on the user everything else will follow — a key part of our philosophy. Having an empathetic user mindset requires a deep understanding of the needs and demands of your users, which is achieved through interviews, systematic statistics, metrics and data. You gather the data by focusing on both quantitative and qualitative metrics.
For example, you might decide to adopt Google’s HEART (Happiness, Engagement, Adoption, Retention, Task Success) framework, covered in detail in this whitepaper. As a platform engineer, you might be especially interested in the perceived “happiness” of your users with the offered platform services; you probably also want to measure and track platform adoption as well as (potential) retention of the offerings. Why are users coming to you or leaving? What is missing and could be improved in the next platform design sprint? Or perhaps you might want to create a friction log that documents the hurdles your users face when using your platform services. Ideally you can also become your own customer and use your own platform offerings, engaging with the friction log and users’ journeys through the platform.
The platform engineering design loop
We believe that an effective way to start thinking about platform engineering is to imagine a platform engineering design loop with you as a platform engineer at the center of it. You improve your customer focus by conducting user research that helps you understand their priorities better. You build empathy for them by documenting friction logs and other types of experiments. The platform backlog is where your team makes decisions on the engineering product portfolio, focusing on the platform's contribution to your company's value streams. A product mindset helps you understand users' needs, maintain a clear vision and roadmap, prioritize user features and documentation, and stay open to product enhancements. Finally, once you have delivered the initial release of your platform, you continue iterating in this loop, making the platform better with every iteration.
All this being said, a platform engineer performs a variety of tasks within a larger platform engineering group. Of course, nobody can do everything and you will require specialization, but here are some of the topics you might want to focus on:
Google Cloud services
- Container runtimes: Google Kubernetes Engine, Cloud Run
- Compute runtimes: Compute Engine, Google Cloud VMware Engine
- Databases: Spanner, Bigtable, Cloud SQL
- Build and maintain the internal developer portal
- Support developer tooling: Cloud Workstations
- Maintain CI/CD: Cloud Build, Cloud Deploy, and Artifact Registry
- Implement compliance as code for selected supported golden paths using Infrastructure Manager and Policy Controller, helping to reduce cognitive load on developers and allowing for faster time to deployment
Architecture
- Gain a deep understanding of infrastructure and application architecture
- Co-write with developers and support the golden paths through the use of Infrastructure as Code
- Create fantastic documentation, as explained, for example, in our courses on technical writing. Don't forget that architecture decision records are a key part of your engineering documentation.
Operations and reliability
- Site Reliability Engineering – Adopt best practices for reliable operations of your platform
- Security Engineering – Compliance, horizontal controls, and guardrails for your platform
Engineering backlog
- Use a backlog to list outstanding tasks and prioritize a portfolio of engineering work. The bulk of the focus should be on resolving the backlog of requests, with some additional time set aside for both continuous improvement and experimentation.
- Experimenting and innovating with new technology – This is an essential task for platform engineers, for example learning new services and features to better improve your platform.
Our industry has been focusing a lot on shifting complexity “left” to allow for better tested, integrated and secure code. In fact, here at Google we strongly believe that in addition to that, a platform effort can help you “shift down” this complexity. Of course, nobody can do everything (think about “cognitive load”), not even a superstar platform engineer like you!
In addition to all the things that newly minted platform engineers should be doing, here are some things not to do:
These are just some of the common pitfalls that we’ve seen so far.
Platform engineers are essential to the success of a modern enterprise software strategy, responsible for creating and maintaining the platforms that developers use to build and deploy applications. In today's world, where software is constantly evolving, platform engineers are a key force in providing scalable software services while keeping users as their primary focus. They carefully study the demands and needs of their internal customers, combining their technology expertise with knowledge of the latest developments in the industry.
Finally, here are some further resources to aid in your platform engineering learning journey.
- The book Software Engineering at Google covers creating a sustainable software ecosystem by diving into culture, processes, and tools
- Google SRE Books and workshops
- DORA.dev – research into the capabilities that drive software delivery and operations performance
- Google Cloud certifications: Cloud Architect, Cloud DevOps Engineer, Cloud Developer, Cloud Security Engineer, Cloud Network Engineer
Read More for the details.
At Google Cloud, we work to support a thriving cloud ecosystem that is open, secure, and interoperable. When customers’ business needs evolve, the cloud should be flexible enough to accommodate those changes.
Starting today, Google Cloud customers who wish to stop using Google Cloud and migrate their data to another cloud provider and/or on-premises infrastructure can take advantage of free network data transfer to migrate their data out of Google Cloud. This applies to all customers globally. You can learn more here.
Eliminating data transfer fees for switching cloud providers will make it easier for customers to change their cloud provider; however, it does not solve the fundamental issue that prevents many customers from working with their preferred cloud provider in the first place: restrictive and unfair licensing practices.
Certain legacy providers leverage their on-premises software monopolies to create cloud monopolies, using restrictive licensing practices that lock in customers and warp competition.
The complex web of licensing restrictions includes picking and choosing who their customers can work with and how; charging 5x the cost if customers decide to use certain competitors' clouds; and limiting interoperability of must-have software with competitors' cloud infrastructure. These and other restrictions have no technical basis and may impose a 300% cost increase on customers. In contrast, the cost for customers to migrate data out of a cloud provider is minimal.
Making it easier for customers to move from one provider to another does little to improve choice if customers remain locked in with restrictive licenses. Customers should choose a cloud provider because it makes sense for their business, not because their legacy provider has locked them in with overly restrictive contracting terms or punitive licensing practices.
The promise of the cloud is to allow businesses and governments to seamlessly scale their technology use. Today’s announcement builds on the multiple measures in recent months to provide more value and improve data transfer for large and small organizations running workloads on Google Cloud.
We will continue to be vocal in our efforts to advocate on behalf of our cloud customers — many of whom raise concerns about legacy providers’ licensing restrictions directly with us. Much more must be done to end the restrictive licensing practices that are the true barrier to customer choice and competition in the cloud market.
Read More for the details.