Cloud

Secure your static web apps with Private Endpoints by limiting exposure to the public internet.
Read More for the details.

2021 08 18

Azure – General availability: Public DNS support for private Azure Kubernetes Service clusters

Simplify name resolution to your AKS clusters without compromising your Kubernetes API server security.
Read More for the details.

2021 08 18

GCP – Scalable tech support via AI-augmented chat

As Googlers transitioned to working from home during the pandemic, more and more turned to chat-based support to help them fix technical problems. Google’s IT support team looked at many options to help us meet the increased demand for tech support quickly and efficiently.

More staff? Not easy during a pandemic.

Let service levels drop? Definitely not.

Outsource? Not possible with our IT requirements.

Automation? Maybe, just maybe…

How could we use AI to scale up our support operations, making our team more efficient?

The answer: Smart Reply, a technology developed by a Google Research team with expertise in machine learning, natural language understanding, and conversation modeling. This product provided us with an opportunity to improve our agents’ ability to respond to queries from Googlers by using our corpus of chat data. Smart Reply trains a model that provides suggestions to techs in real time. This reduces the cognitive load when multi-chatting and helps a tech drive sessions towards resolution.

In the solution detailed below, our hope is that IT teams in a similar situation can find best practices and a few shortcuts to implementing the same kind of time saving solutions. Let’s get into it!

Challenges in preparing our data

Our tech support service for Google employees—Techstop—provides a complex service, offering support for a range of products and technology stacks through chat, email, and other channels.

Techstop has a lot of data. We receive hundreds of thousands of requests for help per year. As Google has evolved we’ve used a single database for all internal support data, storing it as text, rather than as protocol buffers. Not so good for model training. To protect user privacy, we want to ensure no PII (personal identifiable information – e.g. usernames, real names, addresses, or phone numbers) makes it into the model.

To address these challenges we built a FlumeJava pipeline that takes our text and splits each message sent by agent and requester into individual lines, stored as repeated fields in a protocol buffer. As our pipe is executing this task, it also sends text to the Google Cloud DLP API, removing personal information from the session text, replacing it with a redaction that we can later use on our frontend.

With the data prepared in the correct format, we are able to begin our model training. The model provides next message suggestions for techs based on the overall context of the conversation. To train the model we implemented tokenization, encoding, and dialogue attributes.

Splitting it up

The messages between the agent and customer are tokenized: broken up into discrete chunks for easier use. This splitting of text into tokens must be carefully considered for several reasons:

Tokenization determines the size of the vocabulary needed to cover the text.
Tokens should attempt to split along logical boundaries, aiming to extract the meaning of the text.
Tradeoffs can be made between the size of each token, with smaller tokens increasing processing requirements but enabling easier correlation between different spans of text.

There are many ways to tokenize text (SAFT, splitting on white spaces, etc.), here we chose sentence piece tokenization, with each token referring to a word segment.

Prediction with encoders

Training the neural network with tokenized values has gone through several iterations. The team used an Encoder-Decoder architecture that took a given vector along with a token and used a softmax function to predict the probability that the token was likely to be the next token in the sentence/conversation. Below, a diagram represents this method using LSTM-based recurrent networks. The power of this type of encoding comes from the ability of the encoder to effectively predict not just the next token, but the next series of tokens.

This has proven very useful for Smart Reply. In order to find the optimal sequence, an exponential search over each tree of possible future tokens is required. For this we opted to use beam search over a fixed-size list of best candidates, aiming to avoid increasing the overall memory use and run time for returning a list of suggestions. To do this we arranged tokens in a trie, and used a number of post processing techniques, as well as calculating a heuristic max score for a given candidate, to reduce the time it takes to iterate through the entire token list. While this improves the run time, the model tends to prefer shorter sequences.

In order to help reduce latency and improve control we decided to move to an Encoder-Encoder architecture. Instead of predicting a single next token and decoding a sequence of following predictions with multiple calls to the model, it instead encodes a candidate sequence with the neural network.

In practice, the two vectors – the context encoding and the encoding of a single candidate output – are combined with dot product to arrive at a score for the given candidate. The goal of this network is to maximize the score for true candidates – e.g. candidates that did appear in the training set – and minimize false candidates.

Choosing how to sample negatives affects the model training greatly. Below are some strategies that can be employed:

Using positive labels from other training examples in the batch.
Drawing randomly from a set of common messages. This assumes that the empirical probability of each message is sampled correctly.
Using messages from context.
Generating negatives from another model.

As this encoding generates a fixed list of candidates that can be precomputed and stored, each time a prediction is needed, only the context encoding needs to be computed, then multiplied by the matrix of candidate embeddings. This reduces both the time from the beam search method and the inherent bias towards shorter responses.

Dialogue Attributes

Conversations are more than simple text modeling. The overall flow of the conversation between participants provides important information, changing the attributes of each message. The context, such as who said what to whom and when, offers useful bits of input for the model when making a prediction. To that end the model uses the following attributes during its prediction:

Local User ID’s – we set a finite number of participants for a given conversation to represent the turn taking between messages, assigning values to those participants. In most cases for support sessions there are 2 participants, requiring ID 0, and 1.
Replies vs continuations – initially modeling focused only on replies. However, in practice conversations also include instances where participants are following up on the previously sent message. Given this, the model is trained for both same-user suggestions and “other” user suggestions.
Timestamps – gaps in conversation can indicate a number of different things. From a support perspective, gaps may indicate that the user has disconnected. The model takes this information and focuses on the time elapsed between messages, providing different predictions based on the values.

Post processing

Suggestions can then be manipulated to get a more desirable final ranking. Such post-processing includes:

Preferring longer suggestions by adding a token factor, generated by multiplying the number of tokens in the current candidate.
Demoting suggestions with a high level of overlap with previously sent messages.
Promoting more diverse suggestions based on embedding distance similarities.

To help us tune and focus on the best responses the team created a priority list. This gives us the opportunity to influence the model’s output, ensuring that responses that are incorrect can be de-prioritized. Abstractly it can be thought of as a filter that can be calibrated to best suit the client’s needs.

Getting suggestions to agents

With our model ready we now needed to get it in the hands of our techs. We wanted our solution to be as agnostic to our chat platform as possible, allowing us to be agile when facing tooling changes and speeding up our ability to deploy other efficiency features. To this end we wanted an API that we could query either via gRPC or via HTTPs. We designed a Google Cloud API, responsible for logging usage as well as acting as a bridge between our model and a Chrome Extension we would be using as a frontend.

The hidden step, measurement

Once we had our model, infrastructure, and extension in place we were left with the big question for any IT project. What was our impact? One of the great things about working in IT at Google is that it’s never dull. We have constant changes, be it planned or unplanned. However, this does complicate measuring the success of a deployment like this. Did we improve our service or was it just a quiet month?

In order to be satisfied with our results we conducted an A/B experiment, with some of our techs using our extension, and the others not. The groups were chosen at random with a distribution of techs across our global team, including a mix of techs with varying levels of experience ranging from 3 to 26 months.

Our primary goal was to measure tech support efficiency when using the tool. We looked at two key metrics as proxies for tech efficiency:

The overall length of the chat.
The number of messages sent by the tech.

Evaluating our experiment

To evaluate our data we used a two-sample permutation test. We had a null hypothesis that techs using the extension would not have a lower time-to-resolution, or be able to send more messages, than those without the extension. The alternative hypothesis was that techs using the extension would be able to resolve sessions quicker or send more messages in approximately the same time.

We took the mid mean of our data, using pandas to trim outliers greater than 3 standard deviations away. As the distribution of our chat lengths is not normal, with significant right skew caused by a long tail of longer issues, we opted to measure the difference in means, relying on central limit theorem (CLT) to provide us with our significance values. Any result with a p-value between 1.0 and 9.0 would be rejected.

Across the entire pool we saw a decrease in chat lengths of 36 seconds.

In reference to the number of chat messages we saw techs on average being able to send 5-6 messages more in less time.

In short, we saw techs were able to send more messages in a shorter period of time. Our results also showed that these improvements increased with support agent tenure, and our more senior techs were able to save an average of ~4 minutes per support interaction.

Overall we were pleased with the results. While things weren’t perfect, it looked like we were onto a good thing.

So what’s next for us?

Like any ML project, the better the data the better the result. We’ll be spending time looking into how to provide canonical suggestions to our support agents by clustering results coming from our allow list. We also want to investigate ways of making improvements to the support articles provided by the model, as anything that helps our techs, particularly the junior ones, with discoverability will be a huge win for us.

How can you do this?

A successful applied AI project always starts with data. Begin by gathering the information you have, segmenting it up, and then starting to process it. The interaction data you feed in will determine the quality of the suggestions you get, so make sure you select for the patterns you want to reinforce.

Our Contact Center AI allows tokenization, encoding and reporting, without you needing to design or train your own model, or create your own measurements. It handles all the training for you, once your data is formatted properly.

You’ll still need to determine how best to integrate its suggestions to your support system’s front-end. We also recommend doing statistical modeling to find out if the suggestions are making your support experience better.

As we gave our technicians ready-made replies to chat interactions, we saved time for our support team. We hope you’ll try using these methods to help your support team scale.

Read More for the details.

2021 08 18

Azure – HB-series Azure Virtual Machines will be retired by 31 August 2024

Transition to new HPC virtual machines by 31 August 2024.
Read More for the details.

2021 08 18

Azure – NC-series Azure Virtual Machines will be retired by 31 August 2022

Transition to new HPC virtual machines by 31 August 2022.
Read More for the details.

2021 08 18

GCP – Five do’s and don’ts of multicloud, according to the experts

Do you want to fire up a bunch of techies? Talk about multicloud! There is no shortage of opinions. I figured we should tackle this hot topic head-on, so I recently talked to four smart folks—Corey Quinn of Duckbill Group, Armon Dadgar of Hashicorp, Tammy Bryant Butowof Gremlin, and James Watters of VMware—about what multicloud is all about, key considerations, and why you should (or shouldn’t!) do it.

Five important insights came out of these discussions. If you’re on a multicloud journey or considering one, keep reading.

Do: Choose to do multicloud for the right reasons

Don’t do multicloud because Gartner says so, implores Corey Quinn. Before embarking on a multicloud, define a “why” focused on business value journey, says Armon Dadger. For example, you might want to use services from each public cloud because of their differentiated services, according to Tammy Bryant Butow. Armon also calls out regulatory reasons, existing business relationships, and accommodating mergers and acquisitions. On the topic of M&A, Corey points out that if you acquire a company that uses another cloud, it’s usually expensive and difficult to consolidate. It can be smarter to stay put.

Don’t: Over-engineer for workload or data portability

Thinking that you’ll build a system that moves seamlessly among the various cloud providers? Hold up, says our group of experts. Armon points out that aspects of your toolchain or architecture may be multicloud—think of some of your workflows or global network routing—but that shifting workloads or data is far from simple. Corey says that trying to engineer for “write once, run anywhere” can slow you down, and ignores the inherent uniqueness that’s part of each platform. Specifically, Corey calls out the per-cloud stickiness of identity management, security features, and even network functionality. And data gravity is still a thing, says James, that causes some to dismiss multicloud outright.

If you’re using multiple public clouds, you take advantage of the distinct value each offers, Armon says. Use native cloud services where possible so that you see the benefits from useful innovations, built-in resilience, and baked-in best practices. The value from that cloud-infused workload may outweigh the benefits of seamless portability.

Do: Recognize different stakeholder interests and needs

James smartly points out that many multicloud debates happen because people are arguing from different perspectives. Context matters. If you’re an infrastructure engineer who invests heavily in a given cloud’s identity and access management model, multicloud looks tricky. Or if you’re a data engineer with petabytes of data homed in a particular cloud, multicloud may look unrealistic. James highlights that many developers default to multicloud because their local tools—where all the work happens—are multicloud. A developer’s IDE and preferred code framework(s) aren’t tied to any given cloud. Be aware that groups within your organization will come at multicloud from distinct directions. And this may impact your approach!

Don’t: Go it alone

Corey talks about the importance of asking others what worked, and what didn’t. Tammy offers her best practices around sharing results from experiments. It’s about sharing knowledge and tapping into it for community benefit. Others have probably tried what you’re trying, and can help you avoid common pitfalls. If you’ve just made an architectural choice that didn’t work out, share it, and help others avoid the pain.

Read research from analysts, go to conferences or watch videos to observe case studies, and join online communities that offer a safe place to share mistakes and learn from others.

Do: Experiment first using techniques like multi-region deployments

If you think you can operate systems across clouds, how about you first try doing it across regions in a specific cloud, suggests Corey. Getting a system to properly work across cloud regions isn’t trivial, he says, and that experience can help you uncover where you have architectural or operational constraints that will be even worse across cloud providers.

This is great guidance if your multicloud aspirations involve using multiple clouds to power one application—versus the more standard definition of multicloud where you use different clouds for different applications—but can also surface issues in your support process or toolchain that fail when faced with distributed systems. Start with muti-region deployments and chaos engineering experiments before aggressively jumping into multicloud architectures.

The Google Cloud take

Do the things above. It’s great advice. I’ll add three more things that we’ve learned from our customers.

Don’t fear multicloud. You’re already doing it. You don’t single-source everything. As Corey mentioned, you probably already have one cloud for productivity tools, another for source code, another for cloud infrastructure. You’ll use software and application services from a mix of providers for a single app. You have that experience in your team and have been doing that for decades. What people do rightly worry about is using more than one infrastructure service beneath an application, as that can introduce latency, security, and logistical hurdles. Make sure you know which model your team is considering.

Embrace the right foundational components, including Kubernetes. Will everything run on Kubernetes? Of course not. Don’t try to do that. But it also represents the closest thing we have to a multicloud API. Companies are using Kubernetes to stripe a consistent experience across clouds. And this isn’t just to orchestrate containers, but also to manage infrastructure and cloud-native services. Also, consider where you need other fundamental consistency across clouds, including areas like provisioning and identity federation.

Use Google Cloud as your anchor. Here’s a fundamental question you have to decide for yourself: Are you going to bring your on-premises technology and practices to the cloud, or bring cloud technology and practices on-prem? We sincerely believe in the latter. Anchor to where you’re trying to get to. We offer Anthos as a way to build and run distributed Kubernetes fleets in Google Cloud and across clouds. By using a cloud-based backplane instead of an on-prem one, you’re offloading toil, leveraging managed services for scale and security, and introducing modern practices to the rest of your team.

We learned a lot about multicloud through these discussions, and it seems like others did too. That’s why we’re going to do a second round of interviews with a new crop of experts so that we can keep digging deeper into this topic. Stay tuned!

Read More for the details.

2021 08 18

GCP – Use Process Metrics for troubleshooting and resource attribution

When you are experiencing an issue with your application or service, having deep visibility into both the infrastructure and the software powering your apps and services is critical. Most monitoring services provide insights at the Virtual Machine (VM) level, but few go further. To get a full picture of the state of your application or service, you need to know what processes are running on your infrastructure. That visibility into the processes running on your VMs is provided out of the box by the new Ops Agent and made available by default in Cloud Monitoring. Today we will cover how to access process metrics and why you should start monitoring them.

Better visibility with process metrics

The data gathered by process metrics include CPU, memory, I/O, number of threads, and more, for any running processes and services on your VMs. When the Ops Agent or the Cloud Monitoring agent is installed, these metrics are captured at 60-second intervals and sent to Cloud Monitoring so you can visualize, analyze, track, and alert on them. A single VM may run tens or hundreds of processes, while you may have tens of thousands running across your fleet of VMs.

As a developer, you may only care about seeing inside a single VM to troubleshoot and identify memory leaks or the source of performance issues.

As an operator or IT Admin, you may be interested in aggregate resource consumption, building baseline views of compute, storage, and networking usage across your VM fleet. Then, when those baseline consumption levels break normal behaviors, you will know when to investigate your systems.

Built for scale and ease of use

Cloud Monitoring is built on the same advanced backend that powers metrics across Google. This proven scalability means your metrics ingestion will be supported despite the extremely high cardinality. Additionally, our agents do not require any config file changes to turn on process metric monitoring.

Lastly, our goal is to provide you the observability and telemetry data where, and when, you need it. So, like the rest of the operations suite, we deliver process metrics in the context of your infrastructure, directly in the VM admin console.

Navigating to a single VM’s in-context process monitoring in GCE

The navigation is simple. Once you have the Ops Agent or the Cloud Monitoring agent installed in your VMs:

Go to the Compute Engine console page and click on VM Instances

Select the VM that you want to investigate

In the navigation menu on the top, click Observability

Click on Metrics

Lastly, click on Processes

In the window on the right you will see a chart and a table with all of the processes in your VM. You can also filter by time frame and sort by name or value. You do not need to do anything, other than have the agent installed, for the process to be detected and displayed.

Fleet-wide metrics monitoring

Cloud Monitoring gives you a look across your fleet of VMs so you can identify the aggregated usage of resources by processes. This level of broad, yet granular, insight can drive your decisions around which software to run or how many VMs you need to optimally power your apps and services. Admins can perform a cost-savings analysis if they determine that certain processes are slowing down the work of a large number of VMs. The larger numbers of less powerful VMs can be replaced by fewer, more capable VMs.

To get this fleet-wide view:

Navigate to Cloud Monitoring

Click Dashboards in the left menu

In the All Dashboards list, click on VM Instances

Towards the top of the window, click on Processes

This provides many charts detailing the processes running across your fleet of VMs.

The new Cloud Monitoring VM Fleet-wide Process view in the VM Instance Dashboard

Get started today

To start identifying and monitoring your process metrics, you must first install the Ops Agent, or have installed the legacy Cloud Monitoring agent. Once that is complete, the process metrics data will automatically be ingested into Cloud Monitoring and the VM admin console.

If you have any questions, or to join the conversation with other developers, operators, DevOps, and SREs, visit the Cloud Operations page in the Google Cloud Community.

Read More for the details.

2021 08 18

GCP – How to conduct live network forensics in GCP

Forensics is the application of science to criminal and civil laws. It is a proven approach for gathering and processing evidence at a crime scene. An integral step in the forensics process is the isolation of the scene without contaminating or modifying the evidence. The isolation step prevents any further contamination or tampering with possible evidence. The same philosophy can be applied to the investigation of digital events.

In this post we will review methods, tactics and architecture designs to isolate an infected VM while still making it accessible to forensic tools. The goal is to allow access so that data and evidence can be captured while protecting other assets. There are many forensic tools for networking that can be used to analyze the captured traffic. This post does not cover these tools but rather how to configure GCP to capture live traffic in the most efficient and secured way. Once traffic is captured, customers can use whatever tools they prefer to run the analysis. More details about these tools and required agents can be found here and details about open source tooling that Google and others are developing are available here.

In cloud security context, when a VM shows signs of compromise, the most common immediate reaction is to take a snapshot, shut down the instance and relocate the image snapshot to an isolated environment, a method known as “dead analysis”. However, shutting down the instance will impede an important step in the investigation and digital forensics, as some important information in a buffer or the RAM may be lost.

The other forensic approach is “live analysis”, in which the VM is kept on and evidence is gathered from the VM directly. Live forensics enables the imaging of RAM, bypasses most hard drives and software encryption, determines the cause of abnormal traffic, and is extremely useful when dealing with active network intrusions. This process is usually performed by forensic analysts. For example, if there is a good chance the malware resides only in memory then live forensics is, in some cases, the only way to capture and analyze the malware. In this method, in addition to disk and memory evidence, a forensic analysis can also capture live-network from data sent over the compromised VM network interfaces. Some of the benefits of collecting live networks are reconstruction and visualizing traffic flow in real-time, in particular during active network intrusions or attacks.

In the cloud, a VM must be isolated when it becomes apparent that an incident has happened, in order to protect other VMs from being infected. Our Cloud Forensics 101 session covers the process and required artifacts, such as logs, that need to be collected for cloud forensics.

What happens when your image is compromised

Let’s now assume that one of the VMs in your infrastructure has been compromised and alarms are coming from products such as GCP’s Cloud Security and Command Center, Chronicle backstory or your SIEM.

An incident response plan consists of 3 phases: preparation (actions taken before an attack), detection (actions taken during an attack) and response (actions taken after an attack). During the detection phase, the Computer Security Incident Response Team (CSIRT) or threat analysts decide whether live acquisition analysis is required. If live forensics is required, for example when it is vital to acquire a VM’s RAM, then one of the first courses of action is to isolate and contain the VM from the rest of the world and connect the Forensics VPC to the VM for investigation. The forensics VPC resides in a forensics GCP project, it includes digital forensics tools to capture evidence from the VM such as SANS Investigative Forensics Toolkit – SIFT, The Sleuth Kit, Autopsy, Encase, FTK and alike. These tools are already installed, configured, tested and ready to use. The forensics project will also save and preserve evidence such as disk and memory images for forensic review.

We’ll cover two scenarios in this post, the first scenario is to isolate the image and connect the forensics VPC to the image for live acquisition.

In the second scenario we will also capture live traffic from the isolated image for live network digital forensics. To capture live traffic from the infected VM, we will leverage the GCP Packet Mirroring service to duplicate all traffic going in and out of the VM and send it to a Forensics VPC for analysis. Network forensics analysis tools such as Palo Alto VM-Series for IDS, ExtraHop Reveal(x), CheckPoint CloudGuard, Arkime (formerly Moloch), Corelight are installed, configured and ready for deployment in the Forensics VPC, these tools will be used to analyze the duplicate network traffic.

Isolating the infected VM from other resources and connecting the forensics VPC

As part of the Incident Response plan preparation phase, the CSIRT created a Google Cloud Forensics Project. Since the Forensics project will be used only when needed, it’s better to automate the creation of the project and its resources with a tool such as Terraform. It is important to grant access to this project only to individuals and groups who deal with incident response and forensics, such as CSIRT. As shown in figure 1, the Forensics project on the right includes its own VPC, non-overlapped subnet and VM images with pre-installed and pre-configured forensics tools. Internal load-balancer and instance-groups are also configured, we will use these resources to capture live traffic, as described later in this post.

Click to enlarge

In order to contain the spread of any malware or network activity, such as data exfiltration, we’ll isolate the VM with VPC firewall rules. The GCP VPC firewall is a distributed firewall that always enforces its rules, protecting the instances regardless of their configuration and operating systems. In other words, the compromised VM cannot override the firewall enforcement if its policies follow the principle of least privilege . Rules can be applied to all instances in the network, target network tags or service accounts.

Step 1 in the diagram above shows how an infected VM is isolated from the rest of the network by firewall rules that deny any ingress and egress traffic from any CIDR beside the forensics subnet CIDR. The infected VM is tagged with a unique network tag, for example “<image-name>_InfectedVM”, then firewalls rules are applied on the network tag. This ensures that the infected VM is isolated from the project and the Internet while enabling access to the VM via VPC peering which we’ll configure in step-2. You can learn more about VPC firewalls rules here.

In step 2, the VPC from the forensics project is peered with the VPC in the production project. When VPC peering is established routes are exchanged between the VPCs. By default, VPC peering exchanges all subnet routes, however, custom routes can also be filtered if required. At this point, the VM from the forensics project can communicate with the infected VM and start the live forensics analysis job using the pre-installed and pre-configured forensics tools.

Shared VPC is a network construct that allows you to connect resources from multiple projects, called service-projects, to a common VPC in a host-project. VPCs from different projects can securely communicate with each other via the hosted project network while centralizing the network administration. Figure 2 depicts Shared VPC topology, rather than using VPC peering, during step 2 the Forensics project is simply attached to the host project. After the attachment, the Shared VPC allows the forensics tools to communicate with the infected VMs.

Click to enlarge

Capturing live network traffic with Google Traffic Mirroring

If live network forensics is required, for example during active network intrusions, then the incoming and outgoing traffic needs to be duplicated and captured. While VPC Flow logs capture the networking metadata telemetry, this is not enough for live network forensics analysis. GCP Packet Mirroring clones the traffic of a specified instance in a VPC and forwards it to a specified internal load balancer which collects the mirrored traffic and sends it to an attached instance group. Packet mirroring captures all the traffic from the specified subnet, network tags, or instance name.

Figure 3 depicts the steps that allow the compromised VM to communicate with the rest of the world (for example beaconing with C&C) while capturing all traffic for investigation in a peered VPC deployment.

Click to enlarge

Figure 4 depicts the steps that allow the compromised VM to communicate with the rest of the world while capturing all traffic for investigation in a shared VPC deployment.

Click to enlarge

We will use the Forensics’ project internal load balancer and the instance group VMs which include packet capture and analysis tools. Note that production and forensics networks must be in the same region. Detailed steps to configure packet mirroring are available on this page.

If you are using a Shared VPC then check thePacket Mirroring configuration for Shared VPC for configuration details. Figure 4 depicts the packet mirroring flow in a shared VPC topology.

It is recommended to automate and periodically test the process to make sure that in case of an incident, the entire setup and Forensics toolchain can be quickly deployed. If after initial investigation a suspicious destination, such as a Command and Control [C&C] Server, has been identified, then the Packet Mirroring policy can be adjusted with a policy filterthat only mirrors traffic from that C&C server IP address.

An incident management plan must be in place for companies using cloud services, and this plan should also include the option of using live acquisition when necessary. design and preparation for forensics acquisition allows the company to build the infrastructure that can be deployed and connected to the appropriate VM automatically. The architectures described in this post can help the process of collecting and preserving vital evidence for the forensic process, while the incident response team resolves the incident.

Read More for the details.

2021 08 18

GCP – Using Compute Engine: Users’ top questions answered

Compute Engine lets you create and run virtual machines (VMs) on Google’s infrastructure, allowing you to launch large compute clusters with ease. When it comes to getting started with Compute Engine, our customers have lots of questions—but some questions come up more often than others.

We looked at an internal list of the most popular Compute Engine documentation pages over a 30-day period to find out what topics were explored by users again and again. Here are the top four questions users have about Compute Engine, in order.

1. What are the different machine families

Compute Engine lets you select the right machine for your needs. You can choose from a curated set of predefined virtual machine (VM) configurations optimized for specific workloads, ranging from small-level purpose to large-scale use cases or create a machine type customized to your needs with our custom machine type feature.

Compute Engine machines are categorized by machine family, including:

General-purpose: Best price-performance ratio for a variety of standard and cloud-native workloads

Compute-optimized: Highest performance per core for compute-intensive workloads, such as ad serving or media transcoding

Memory-optimized: More compute and memory per core than any other family for memory-intensive workloads, such as SAP HANA or in-memory data analytics

Accelerator-optimized: Designed for your most demanding workloads, such as machine learning (ML) or high performance computing (HPC)

Read the documentation to learn more about each machine family category.

2. How to connect to VMs using advanced methods

In general, we recommend using the Google Cloud Console and the gcloud command-line tool to connect to Linux VM instances. However, some of our customers want to use third-party tools, or require alternative connection configurations.

In these cases, there are several methods that might fit your needs better than the standard connection options:

Connecting to instances using third-party tools (e.g. Windows PuTTY, Chrome OS Secure Shell app), or MacOS or Linux local terminal)

Connecting to instances without external IP addresses

Connecting to instances as the root user Manually connecting between instances and running commands as a service account

Read the documentation to learn about advanced methods for connecting Linux VMs.

3. How to set up OS Login

OS Login lets you use IAM roles and permissions to manage access and permissions to VMs.

OS Login is the recommended way to manage users across multiple instances or projects. OS Login provides:

Automatic Linux account lifecycle management

Fine-grained authorization using Google IAM without having to grant broader privileges

Automatic permissions updates to prevent unwanted access

Ability to import existing Linux accounts from Active Directory (AD) and Lightweight Directory Access Protocol (LDAP)

You can also add an extra layer of security by setting up OS Login with two-factor authentication or manage organization access by setting up organization policies.

Read the documentation to learn how to configure OS login and connect to your instances.

4. How to manage SSH keys in metadata

Compute Engine allows you to manually manage SSH keys and local user accounts by editing public SSH key metadata.

You can add public SSH keys to instance and project metadata using:

The Google Cloud Console The gcloud command-line tool

API methods from the Google Cloud Client Libraries

Read the documentation to learn how to manually manage SSH keys and local user accounts in metadata.

Don’t see your question here? Check out the Compute Engine documentation for all of our recommended guides, tutorials, and resources.

Read More for the details.

2021 08 18

Azure – Azure Blob storage – Inventory generally availability

Administrators can enable a daily or weekly inventory to be created to gain understanding of their blobs and containers.
Read More for the details.

2021 08 17

AWS – Amazon EC2 customers can now use ED25519 keys for authentication during instance connectivity operations

Starting today, AWS customers can use ED25519 keys to prove their identity when connecting to EC2 instances. ED25519 is an elliptic curve based public-key system commonly used for SSH authentication.

Read More for the details.

2021 08 17

AWS – AWS Transfer Family expands compatibility for FTPS/FTP clients and increases limit for number of servers

AWS Transfer Family now supports configuring a client side externally accessible IP address on an FTPS/FTP server, allowing clients behind a firewall or a NAT router to connect to the server. Additionally, customers can now easily scale up their workloads by creating up to 50 servers within AWS Transfer Family in a single AWS account and region, a fivefold increase in the previously supported limit of 10 servers.

Read More for the details.

2021 08 17

GCP – Unlocking Application Modernization with Microservices and APIs

If you build apps and services that your customers consume, two things are certain:

You’re exposing APIs in some form or the other.

Your apps are made by multiple functions working together to deliver products and services.

As you scale up and grow, your enterprise architecture can benefit from a sound strategy for both API management and service management, both of which impact your customer and developer experience. In this article, we’ll explore how these two technologies fit into your application modernization strategy, including how we’re seeing our customers use Anthos Service Mesh and Apigee API Management together.

How APIs, microservices, and a service mesh are related

APIs accelerate your modernization journey by unlocking and allowing legacy data and applications to be consumed by new cloud services. As a result, organizations can launch new mobile, web, and voice experiences for customers.

The API layer acts as a buffer between legacy services and front-end systems and keeps the front-end systems up and running by routing requests as the legacy services are migrated or transformed into modern architectures. In addition, an API management platform, like Apigee, manages the lifecycle of those APIs with design, publish, analyze, and governance capabilities.

Once microservices architectures become prevalent in an organization, technical complexity increases and organizations find a need for deeper and more granular visibility into their applications and services. This is where a service mesh comes into play.

A service mesh is not only an architecture that empowers managed, observable, and secure communication across an organization’s services, but also the tool that enables it. Anthos Service Mesh lets organizations build platform-scale microservices with requirements around standardized security, policies, and controls, and it provides teams with in-depth telemetry, consistent monitoring, and policies for properly setting and adhering to SLOs.

How API management and a service mesh compliment one another

Many organizations ask themselves, “Do I really need both an API management platform and a service mesh? How do I manage them together?”

The answer to the first question is yes. These two technologies focus on different aspects of the technology stack and are complementary to each other. A service mesh modernizes your application networking stack by standardizing how you deal with network security, observability, and traffic management. An API management layer focuses on managing the lifecycle of APIs, including publishing, governance, and usage analytics.

Most organizations draw a logical boundary at business units or technology groups. Sharing these microservices outside that boundary with other business units or with partners is where Apigee plays a significant role. You can drive and manage the consumption of those services through developer portals, monitoring API usage, providing authentication, and more, with Apigee.

Google Cloud offers Anthos Service Mesh for service management and Apigee for API management. These two products work together to provide IT teams with a seamless experience throughout the application modernization journey. The Apigee Adapter for Envoy enables organizations that use Anthos Service Mesh to reap the benefits of Apigee by enforcing API management policies within a service mesh.

Accelerate your application modernization journey

Though the journey to application modernization doesn’t always follow a clear-cut path, by adopting API management and a service mesh as part of a modernization journey, your organization can be better equipped to rapidly respond to changing markets securely and at scale.

Wherever you are on your application modernization journey, Google Cloud can help. To learn more about how service management and API management can be part of your application modernization journey, read this whitepaper.

Read More for the details.

2021 08 17

AWS – Fully customizable action space now available in AWS DeepRacer Console

Today, we are excited to announce AWS DeepRacer model action space is now fully customizable. AWS DeepRacer is the fastest way to get started with machine learning (ML) through a fully autonomous 1/18th scale race car driven by reinforcement learning, a 3D racing simulator, and a global racing league. Until now, AWS DeepRacer customers were modifying the steering angle and speed beyond the 20 options available in console by downloading, editing offline, and importing back into the console to improve their model’s performance. With a fully customizable action space, AWS DeepRacer console users can now specify both provided actions and add actions of their own to push their models performance to its limits!

Read More for the details.

2021 08 17

GCP – Build a reinforcement learning recommendation application using Vertex AI

Reinforcement learning (RL) is a form of machine learning whereby an agent takes actions in an environment to maximize a given objective (a reward) over this sequence of steps. Applications of RL include learning-based robotics, autonomous vehicles and content serving. The fundamental RL system includes many states, corresponding actions, and rewards to those actions. Translate that into a movie recommender system: The ‘state’ is the user, the ‘action’ is the movie to recommend to the user, and the ‘reward’ is the user rating of the movie. RL is a great framework for optimizing ML models, as mentioned by Spotify in the keynote in the Applied ML Summit 2021.

In this article, we’ll demonstrate an RL-based movie recommender system executed in Vertex AI and built with TF-Agents,a library for RL in TensorFlow. This demo has two parts: (1) a step-by-step guide leveraging Vertex Training, Hyperparameter Tuning, and Prediction services; (2) a MLOps guide to build end-to-end pipelines using Vertex Pipelines and other Vertex services.

TF-Agents meets Vertex AI

In reinforcement learning (RL), an agent takes a sequence of actions in a given environment according to some policy, with the goal of maximizing a given reward over this sequence of actions. TF-Agents is a powerful and flexible library enabling you to easily design, implement and test RL applications. It provides you with a comprehensive set of logical modules that support easy customization:

Policy: A mapping from an environment observation to an action or a distribution over actions. It is the artifact produced from training, and the equivalent of a “Model” in a supervised learning setup.

Action: A move or behavior that is outputted by some policy, and chosen and taken by an agent.

Agent: An entity that encapsulates an algorithm to use one or more policies to choose and take actions, and trains the policy.

Observations: A characterization of the environment state.

Environment: Definition of the RL problem to solve. At each time step, the environment generates an observation, bears the effect of the agent action, and then given the action taken and the observation, the environment responds with a reward as feedback.

A typical RL training loop looks like the following:

A typical process to build, evaluate, and deploy RL applications would be:

Frame the problem: While this blog post introduces a movie recommendation system, you can use RL to solve a wide range of problems. For instance, you can easily solve a typical classification problem with RL, where you can frame predicted classes as actions. One example would be digit classification: observations are digit images, actions are 0-9 predictions and rewards indicate whether the predictions match the ground truth digits.
Design and implement RL simulated experiments: We will go into detail on simulated training data and prediction requests in the end-to-end pipeline demo.
Evaluate performance of the offline experiments.
Launch end-to-end production pipeline by replacing the simulation constituents with real-world interactions.

Now that you know how we’ll build a movie recommendation system with RL, let’s look at how we can use Vertex AI to run our RL application in the cloud. We’ll use the following Vertex AI products:

Vertex AI training to train a RL policy (the counterpart of a model in supervised learning) at scale
Vertex AI hyperparameter tuning to find the best hyperparameters
Vertex AI prediction to serve trained policies at endpoints

Vertex Pipelines to automate, monitor, and govern your RL systems by orchestrating your workflow in a serverless manner, and storing your workflow’s artifacts using Vertex ML Metadata.

Step-by-step RL demo

This step-by-step demo showcases how to build the MovieLens recommendation system using TF-Agents and Vertex AI services, primarily custom training and hyperparameter tuning, custom prediction and endpoint deployment. This demo is available on Github, including a step-by-step notebook and Python modules.

The demo first walks through the TF-Agents on-policy (which is covered in detail in the demo) training code of the RL system locally in the notebook environment. It then shows how to integrate the TF-Agents implementation with Vertex AI services: It packages the training (and hyperparameter tuning) logic in a custom training/hyperparameter tuning container and builds the container with Cloud Build. With this container, it executes remote training and hyperparameter tuning jobs using Vertex AI. It also illustrates how to utilize the best hyperparameters learned from the hyperparameter tuning job during training, as an optimization.

The demo also defines the prediction logic, which takes in observations (user vectors) from prediction requests and outputs predicted actions (movie items to recommend), in a custom prediction container and builds the container with Cloud Build. It deploys the trained policy to a Vertex AI endpoint, and uses the prediction container as the serving container for the policy at the Vertex AI endpoint.

End-to-end workflow with a closed feedback loop: Pipeline demo

Pipeline architecture

Building upon our RL demo, we’ll now show you how to scale this workflow using Vertex Pipelines. This pipeline demo showcases how to build an end-to-end MLOps pipeline for the MovieLens recommendation system, using Kubeflow Pipelines (KFP) for authoring and Vertex Pipelines for orchestration.

Highlights of this end-to-end demo include:

RL-specific implementation that handles RL modules, training logic and trained policies as opposed to models
Simulated training data, simulated environment for predictions and re-training
Closing of the feedback loop from prediction results back to training
Customizable and reproducible KFP components

An illustration of the pipeline structure is shown in the figure below.

The pipeline consists of the following components:

Generator: to generate MovieLens simulation data as the initial training dataset using a random data-collecting policy, and store in BigQuery [executed only once]
Ingester: to ingest training data in BigQuery and output TFRecord files
Trainer: to perform off-policy (which is covered in detail in the demo) training using the training dataset and output a trained RL policy
Deployer: to upload the trained policy, create a Vertex AI endpoint and deploy the trained policy to the endpoint

In addition to the above pipeline, there are three components which utilize other GCP services (Cloud Functions, Cloud Scheduler, Pub/Sub):

Simulator: to send recurring simulated MovieLens prediction requests to the endpoint
Logger: to asynchronously log prediction inputs and results as new training data back to BigQuery, per prediction requests
Trigger: to recurrently execute re-training on new training data

Pipeline construction with Kubeflow Pipelines (KFP)

You can author the pipeline using the individual components mentioned above:

The execution graph of the pipeline looks like the following:

Refer to the GitHub repo for detailed instructions on how to implement and test KFP components, and how to run the pipeline with Vertex Pipelines.

Applying this demo to your own RL projects and production

You can replace the MovieLens simulation environment with a real-world environment where RL quantities like observations, actions and rewards capture relevant aspects of said real-world environment. Based on whether you can interact with the real world in real-time, you may choose either on-policy (showcased by the step-by-step demo) or off-policy (showcased by the pipeline demo) training and evaluation.

If you were to implement a real-world recommendation system, here’s what you’d do:

You would represent users as some user vectors. The individual entries in the user vectors may have actual meanings like age. Alternatively, they may be generated through a neural network as user embeddings. Similarly, you would define what an action is and what actions are possible, likely all items available on your platform; you would also define what the reward is, such as whether the user has tried the item, how long/much the user has spent on the item, user rating of the item, and so on. Again, you have the flexibility to decide on representations for framing the problem that maximize performance. During training or data pre-collection, you may randomly sample users (and build the corresponding user vectors) from the real world, use those vectors as observations to query some policy for items to recommend, and then apply that recommendation to users and obtain their feedback as rewards.

This RL demo can also be extended to ML applications other than recommendation system. For instance, if your use case is to build an image classification system, then you can frame an environment, where observations are the image pixels or embeddings, actions are the predicted classes, and rewards are feedback on the predictions’ correctness.

Conclusion

Congratulations! You have learned how to build reinforcement learning solutions using Vertex AI in a fully managed, modularized and reproducible way. There is so much you can achieve with RL, and you now have many Vertex AI as well as Google Cloud services in your toolbox to support you in your RL endeavors, be it production systems, research or cool personal projects.

Additional resources

[Recap] step-by-step demo link: GitHub link

[Recap] end-to-end pipeline demo: GitHub link

TF-Agents tutorial on bandits: Introduction to Multi-Armed Bandits

Vertex Pipelines tutorial: Intro to Vertex Pipelines

Read More for the details.

2021 08 17

AWS – Amazon CodeGuru Profiler extends visualizations capability with a new compare option for application profile

Amazon CodeGuru is a developer tool powered by machine learning that provides intelligent recommendations for improving code quality and identifying an application’s most expensive lines of code. Developers can use Amazon CodeGuru Profiler to understand the runtime behavior of their applications, identify and remove code inefficiencies, improve performance, and significantly decrease compute costs.

Read More for the details.

2021 08 17

AWS – Amazon WorkSpaces Renews Windows Desktop Experience with Windows Server 2019 bundles and 64-bit Microsoft Office 2019

Amazon WorkSpaces now offers new bundles powered by Windows Server 2019, providing a Windows 10 desktop experience along with a 64-bit Microsoft Office 2019 Professional Plus bundle option. The feature brings a refreshed Windows 10 desktop experience, and enables customers to run applications that require recent Windows versions.

Read More for the details.

2021 08 17

AWS – AWS CDK releases v1.111.0 – v1.116.0 with updates for unit testing and CDK Pipelines support

During July, 2021, 6 new versions of the AWS Cloud Development Kit (CDK) for JavaScript, TypeScript, Java, Python, .NET and Go were released (v1.111.0 through v1.116.0). With these releases, CDK Pipelines is generally available with a new API which makes it easier to define multi-environment deployment pipelines for cloud applications in all shapes and sizes. Additionally, the new assertions library, which is available in all CDK languages, now includes some new cloud testing primitives such as findResources(). These releases also include 24 bugs fixes and 50 new features that span 30 different modules across the library. Many of these changes were contributed by the developer community.

Read More for the details.

2021 08 17

AWS – AWS Systems Manager Fleet Manager now offers report generation for Managed Instances