AWS – Amazon MWAA now supports Apache Airflow version 2.4 with Python 3.10
You can now create Apache Airflow version 2.4 environments on Amazon Managed Workflows for Apache Airflow (MWAA) with Python 3.10 support.
Read More for the details.
Azure Data Explorer now supports ingestion of data from Apache Log4j 2.
Read More for the details.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C5n instances are available in AWS Region Asia Pacific (Osaka). C5n instances, powered by 3.0 GHz Intel® Xeon® Scalable processors (Skylake) and based on the AWS Nitro System, offer customers up to 100Gbps networking for network-bound workloads, while continuing to take advantage of the security, scalability and reliability of Amazon’s Virtual Private Cloud (VPC). Customers can also take advantage of C5n instances network performance to accelerate data transfer to and from S3, reducing the data ingestion wait time for applications and speeding up delivery of results. They are built for workloads such as High Performance Computing (HPC), analytics, machine learning, Big Data and data lake applications.
Read More for the details.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) R6i instances are available in AWS Region Asia Pacific (Osaka), Middle East (Bahrain), and Africa (Cape Town). R6i instances are powered by 3rd generation Intel Xeon Scalable processors, with an all-core turbo frequency of 3.5 GHz. These instances come with always-on memory encryption using Intel Total Memory Encryption (TME) and are built on AWS Nitro System. The AWS Nitro System is a collection of AWS designed hardware and software innovations that enables the delivery of efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage. R6i instances are SAP Certified and are built for workloads such as SQL and NoSQL databases, distributed web scale in-memory caches (Memcached and Redis), in-memory databases (SAP HANA), and real time big data analytics (Apache Hadoop and Apache Spark clusters).
Read More for the details.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) R6g instances are available in AWS Region Asia Pacific (Osaka). R6gd instances are available in AWS Regions Asia Pacific (Osaka), Asia Pacific (Seoul) and Europe (Stockholm). M6gd instances are available in AWS Regions Asia Pacific (Osaka) and Africa (Cape Town). These instances are powered by AWS Graviton2 processors, and they are built on AWS Nitro System. The Nitro System is a collection of AWS designed hardware and software innovations that enables the delivery of efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage. R6g instances are built for running memory-intensive workloads such as open-source databases, in-memory caches, and real time big data analytics. R6gd instances provide local SSD storage and are ideal for memory-intensive workloads that need access to high-speed, low latency storage. M6gd offer a balance of compute, memory, networking, and local SSD resources for a broad set of workloads. They are built for applications such as application servers, microservices, gaming servers, mid-size data stores, and caching fleets that also need access to high-speed, low latency storage.
Read More for the details.
Azure Sphere Security Service EU data processing and storage is now generally available.
Read More for the details.
While cloud security skeptics might believe that data in the cloud is just one access configuration mistake away from a breach, the reality is that a well-designed set of defense in depth controls can help minimize the risk of configuration mistakes and other security issues. Our Virtual Private Cloud (VPC) Service Controls can play a vital role in creating an additional layer of security while also making it easier to manage your data in a way that most cloud services can’t do today.
Organizations across industries and business models use cloud services for activities such as processing their data, performing analytics, and deploying systems. VPC Service Controls can empower an organization when deciding how users and data can cross the perimeter of the supported cloud services, if at all. While VPC Service Controls are designed to help stop attackers, they can also enable contextual trusted data sharing (similar to how Zero Trust allows contextual access).
VPC Service Controls help administrators define a security perimeter around Google-managed services, which can control communication to and between those services. The Service Controls isolate your Google Cloud resources from unauthorized networks, including the internet. For example, this can help you keep a clear separation between services that are allowed to run in production and services that are not.
VPC Service Controls can help you prevent mistakes that lead to costly data breaches because they control access to your data at a granular level. They add context-aware access controls on these services, and can help you achieve your organization’s Zero Trust access goals.
Like wearing two layers of clothing made from different fabrics to protect you from winter weather, VPC Service Controls may appear similar to Identity and Access Management (IAM) but they come from a different approach to implementing security. IAM enables granular identity based access control; VPC Service Controls create a security perimeter that protects your cloud resources and sets up private connectivity to Google Cloud’s APIs and services. While it’s recommended to use both, VPC Service Controls have an added bonus: They can support blocking data theft during a breach.
The additional layer of security that VPC Service Controls offer customers is challenging to achieve with on-premises systems or even with other cloud providers. You can think of it as a firewall for APIs that also adds a logical security control around three paths that data can take:
From the public internet to your resources
Inside your VPC and the cloud service perimeter
For service-to-service communication (for example, denying access to someone who wants to load data to BigQuery or exfiltrate data from a BigQuery instance.)
VPC Service Controls are used to enforce a security perimeter. They can help isolate resources of multi-tenant Google Cloud services, which can help reduce the risk of data exfiltration or a data breach.
For example, a bank that migrated financial data processing to Google Cloud can use VPC Service Controls to isolate their processing pipeline from public access (or any unauthorized access) by defining a trusted service perimeter.
VPC Service Controls are used to securely share data across service perimeters with full control over what resource can connect to other resources, or outside the perimeter. This can help mitigate data exfiltration risks stemming from stolen identities, IAM policy misconfigurations, some insider threats, and compromised virtual machines.
Returning to our bank example, that same bank using VPC Service Controls may securely share or access data across Service Perimeters and Organizations. They may allow access to specific partners and for specific operations.
VPC Service Controls deliver Zero Trust access to multi-tenant Google Cloud services. Clients can restrict access to authorized IPs, client context, user identity, and device parameters while connecting to multi-tenant services from the internet and other services.
A bank can use moving its services to the public cloud as an opportunity to abandon outdated access management approaches and adopt Zero Trust access. VPC Service Controls let them create granular access control policies in Access Context Manager based on attributes such as user location and IP address. For example, a policy could allow an analyst to access BigQuery only from a corporate device on the corporate network during business hours. These policies can help ensure the appropriate security controls are in place when granting access to cloud resources from the internet.
Check out these pages to learn more about VPC Service Controls for your sensitive cloud deployments, especially for regulated workloads. This blog is the third in our Best Kept Security Secrets series, which includes how to tap into the power of Organization Policy Service and how Cloud EKM can help resolve the cloud trust paradox.
Read More for the details.
Cloud is a great place to grow your career in 2023. Opportunity abounds, with cloud roles offering strong salaries and scope for growth as a constantly evolving field.1 Some positions do not require a technical background, like project managers, product owners and business analysts. For others, like solutions architects, developers and administrators, coding and technical expertise are a must.
Either way, cloud knowledge and experience are required to land that dream job. But where do you start? And how do you keep up with the fast pace of ever-changing cloud technology? Check out the tips below, along with suggested training opportunities to help support your growth, including no-cost options!
Your experience can be a great way to get into cloud, even if it seems non-traditional. Think creatively about transferable skills and opportunities. Here are a few scenarios where you might find yourself today:
You already work in IT, but in legacy systems or the data center. Forrest Brazeal, Head of Content Marketing at Google Cloud, talks about that in detail in this video.
Use your sales experience to become a sales engineer, or your communications experience to become a developer advocate. Stephanie Wong, Developer Advocate at Google Cloud, discusses that here.
You don’t have that college degree that is included in the job requirements. I’ve talked about that in a recent video here.
Your company has a cloud segment, but your focus is in another area. Go talk to people! Access your colleagues who do what you want to do. Get their advice for skilling up.
If you are looking at a technical position, you will need to show cloud applicable experience, so learn about the cloud and build a portfolio of work. Here are a few key skills we recommend everyone have to start1:
Code is non-negotiable. People who come from software development backgrounds typically find it easier to get into and maneuver through the cloud environment because of their coding experience. Automation, basic data manipulation, and scaling are daily requirements. If you don't already know a language, learning Python is a great place to begin.
Understand Linux. You’ll need to know the Linux filesystem, basic Linux commands and fundamentals of containerization.
Learn core networking concepts like the IP Protocol and the others that layer on top of it, DNS, and subnets.
Make sure you understand the cloud itself, and in particular the specifics about Google Cloud for a role at Google.
Get familiar with open source tooling. Terraform for automation and Kubernetes for containers are portable between clouds and worth taking the time to learn.
Check out Google Cloud Skills Boost for a comprehensive collection of training to help you upskill into a cloud role, including hands-on labs that get you real-world experience in Google Cloud. New users can start off with a 30 day no-cost trial2. Take a look at these recommendations:
No-cost labs and courses
A Tour of Google Cloud Hands-on Labs – 45 minutes
A Tour of Google Cloud Sustainability – 60 minutes
Introduction to SQL for BigQuery and Cloud SQL – 60 minutes
Infrastructure and Application Modernization with Google Cloud – Introductory course with three modules
Preparing for Google Cloud certification – Courses to help you prepare for Google Cloud certification exams
This part is critical for the interview portion. Take the cloud skills you have learned and create something tangible that you can use as a story during an interview. Consider building a project on Github so others can see it working live, and document it well. Be sure to include your decision making process. Here is an example:
Build an API or a web application
Develop the code for the application
Pick the infrastructure to deploy that application in the cloud, choose your storage option, and a database with which it will interact
For tech-adjacent roles, like those in business, sales or administration, having a solid knowledge of cloud principles is critical. We recommend completing the Cloud Digital Leader training courses, at no cost. Or go the extra mile and consider taking the Google Cloud Digital Leader Certification exam once you complete the training:
No-cost course
Cloud Digital Leader Learning Path – understand cloud capabilities, products and services and how they benefit organizations
$99 registration fee
Google Cloud Digital Leader Certification – validate your cloud expertise by earning a certification
Another resource we have is the Google Cloud Innovators Program, which will help you grow on Google Cloud and connect you with other community members. There is no cost to join, and it gives you access to resources to build your skills and help shape the future of cloud! Join today.
Start your new year strong by exploring Google Cloud Data, DevOps, or Networking certifications and completing Arcade games each week. This January, play to win in The Arcade while you learn new skills and earn prizes on Google Cloud Skills Boost. Each week we will feature a new game to help you show and grow your cloud skills while sampling certification-based learning paths.
Make 2023 the year to build your cloud career and commit to learning all year with our $299 annual subscription to Google Cloud Skills Boost. The subscription includes access to the entire training catalog, live-learning events, quarterly technical briefings with executives, $500 of Google Cloud credits (plus a bonus $500 of credits after you successfully certify), and a $200 certification voucher.
1. Starting your career in cloud from IT – Forrest Brazeal, Head of Content Marketing, Google Cloud
2. Credit card required to activate a 30 day no-cost trial for new users.
Read More for the details.
Paper currency — which started gaining prominence in the 1600s — changed the face of global economics and ushered in a new era of international monetary regulation. The primary reason currency created such disruption was its ability to standardize the “medium of exchange”. APIs created a similar effect in the world of technology and digitalization by creating a standardized, reusable, and secure way to exchange information.
Modern web APIs took shape in the early 2000s and played a key role in “.com”mercializing every business. APIs started as a connective tissue primarily relegated to a technical context and quickly evolved into a gateway to new business models, revenue streams, and ecosystems. In 2017, McKinsey estimated a total of $1 Trillion in profit could be up for grabs in the API economy. And in 2022, GGV Capital created an index of API-first startups — a generation of stylistically divergent SaaS companies with leaner operating cost structures and organic usage growth. Just as currency is going through an evolution from banknotes to digital wallets, the world of API management is also on the brink of change.
With more than 15 years of experience managing APIs at Google-scale, we’ve got a unique vantage point from which to observe that change. In today’s post, we will spotlight seven API management use cases that we see growing in prominence — and how you can take advantage of these trends to future-proof your architecture.
As a gateway to a wealth of information, APIs have also quickly become the primary attack vector in security incidents. When we surveyed 500 technology leaders, we learned that more than 50% of organizations experienced an API security incident in the last 12 months. Adding to the growing magnitude of attacks, there is an increasing number of vectors for potential API security incidents, like misconfigurations, outdated APIs/data/components, and bots/spam/abuse.
These security issues aren’t just in production APIs, but at every stage in the API lifecycle. Notably, we found that 67% of the issues are discovered during testing as part of the release management process. This trend ushers in the need for forward-thinking organizations to “shift left with security” — moving controls earlier into the production workflow — by bringing security teams and API teams closer. To stay ahead of security threats, many organizations are actively looking for solutions that allow them to be proactive while minimizing the burden on their security teams. According to our research, integrating capabilities that proactively identify security threats (60%) is top of mind for most IT leaders for the next year.
It comes as no surprise that every organization is relying on APIs to expand and even ground their digital ecosystem — a network of partners, developers, and customers facilitated by modern, cloud-first technologies. There is a growing magnitude and variety of middleware assets, contributing to the growth of IT complexity.
As the number of APIs continues to increase, there is a need to simplify consumption for internal and external developers. Even the most objectively useful APIs remain unseen by most of the organization. In turn, this results in redundant code, reduced developer productivity, or worse, a potential security attack vector. This complexity is shifting focus toward consolidating all middleware assets, growing adoption, and improving education (see below) to improve developer efficiency and demystify the IT complexity.
This sprawl is a growing problem in the world of APIs, but it has a lot in common with an age-old phenomenon in the world of web pages and content—search. Google was born out of this problem to help organize the world’s information. Similar to Google’s knowledge graph for web pages, there is a need to index, organize, and instantly present API information for developers that need it. Although it is an emerging practice, we see an increasing number of digital leaders and security teams in larger organizations with mature API programs invest in solutions that help consolidate all APIs, organize their information, and manage their lifecycle.
APIs have taken on such a vital role in the modern application stack that they have slowly become the neural links across the entire enterprise architecture — bridging legacy and modern applications, shifting architectures towards microservices, and enabling operations across heterogeneous environments. To support all these technological decisions without sacrificing speed, organizations adopted multiple API gateways and fragmented API management solutions. However, this led to a lack of universal visibility, consistent governance, comprehensive security, and meaningful analytics across ALL the enterprise APIs (not just the ones within the confines of a given API management solution). And it increases the maintenance costs — fundamentally undercutting the value of APIs. With this evolution there is a growing need for an omni control plane — analogous to the brain in a human body — across all enterprise APIs.
Despite the clear need for governance, there is still no unified understanding on a good (or right) approach to API governance. With the rapid adoption of APIs without appropriate standardization or quality standards, API governance is top of mind for IT leaders, again.
According to our research, 45% of IT leaders identified API governance as a critical component of their API program. The top three components of API security, performance analytics, and governance demonstrate the critical need for visibility, quality, and security across all APIs.
As digital consumers, we have seen this phenomenon across many industries and digital products. For example AirBnB disrupted the short-term rental market by providing standardized listings, detailed information, and high-resolution photos. In fact, the same governance phenomenon is ubiquitous in the world of e-commerce where there is a clear correlation between a high-quality website or product listing and increasing sales.
The same analogy holds true in the world of APIs, as ~90% of developers use APIs in their work there is a direct correlation between the use of APIs and developer productivity. Digital officers and CIOs need to add appropriate governance controls to standardize API design and improve reuse without adding friction to development timelines.
Adoption of new API architectural styles and microservices has increased the complexity of the modern application stack. Our research found that 54% of organizations use a service mesh and API management in conjunction today to support the API gateway design pattern. In parallel, there is broad adoption of new protocols like GraphQL or AsyncAPI, outpacing the innovation in API gateways. For example, a recent survey from DZone found that GraphQL accounted for 22.7% of application integrations.
In response to this challenge, IT teams are adopting multiple API gateways by design, which creates the need for complex communication patterns for future scalability. The existing design patterns were mostly sufficient when client applications used homogeneous API protocols (for example, REST). Although patterns like Backend For Frontend (BFF) are intended to provide specific API interactions that are relevant on a per-client basis, they still do not account for the complexities of multiple gateways and protocols. With the adoption of new protocols, there is a need to evolve the existing BFF pattern to account for multiple API gateways and protocols.
A digital twin is an effectively indistinguishable virtual representation of a physical object, system, or a process. For example the digital twin of a wind turbine (the object being studied) might be used to capture data like performance, rpm (revolutions per minute), or output captured by various sensors outfitted on the turbine. Digital twin adoption is growing and McKinsey estimates investments in digital twins will reach $58 billion by 2026 with a 58% CAGR. Every digital twin uses APIs to monitor, engage, and possibly control the physical asset. For example, Google created the Digital Buildings project — an open source, Apache-licensed effort to manage applications and analyses between a large heterogeneous portfolio of buildings.
Sustainability is one of the driving forces behind the increased use of digital twins. As the need to reach net zero emissions accelerates, many organizations are tying performance (and in some cases even executive pay) to environmental, social, and governance goals. APIs help connect the dots between digital twins and sustainability. For example, an organization operating a manufacturing process could build a digital twin with APIs to collect behavioral data from sensors, monitoring systems, or other sources — which can eventually be integrated into the organization’s digital platform or applications. These digital twins could be used to analyze and optimize the use of materials and energy, to minimize waste and emissions. Additionally, digital twins could be used to monitor and analyze the performance of systems over time, to identify opportunities for continuous improvement.
Overall, APIs play a valuable role in supporting sustainability efforts by enabling digital twins, effectively driving more efficient operation of systems, and providing insights to improve environmental impact. For further examples, check out this video about driving a green value chain with APIs.
The growing use of data-rich services (like IoT, ML models, remote access services, and web scraping, etc.) coupled with massive ingestion of data everyday is creating massive growth in data delivery paradigms like data lakehouses, data marketplaces, and data streaming systems (global data marketplaces alone are poised to reach $3.5 billion by 2028). Unfortunately, most of these systems are fragmented with almost no relationship or interoperability.
APIs are filling this critical gap for organizations in two critical ways. First, APIs are providing standard and easy access to systems like data lakehouses or analytics hubs. Second, APIs are a key enabler of data products (digital products or services built using data as a core value proposition), a core component of any data sharing system. APIs provide a standardized way for different applications to interact with the data product. For example, an API could be used to allow a mobile app to access data from a weather forecast or a recommendation engine data product. Beyond data products, APIs also provide easy and standardized access to various data management platforms.
APIs continue to play a critical role in every application, experience, and ecosystem. Robust API strategies help organizations adapt to any architecture, business model, or environment in the face of changing technology landscape. Learn more about how Apigee is driving innovation and helping companies future proof their architectures to stay ahead of the top API trends.
Read More for the details.
While many organizations are driving digital transformation by migrating to the cloud, there are some industries, geographies, and use cases that require a different approach to cloud modernization. Regulated industries such as healthcare, insurance, pharmaceutical, energy, telecommunication, and banking have stringent data residency and sovereignty requirements. Other industries need to meet local data processing requirements, while others require real-time data processing with sub-millisecond latencies, for example to detect defects on manufacturing lines. These use cases demand a combination of edge, on-premises and cloud services for their infrastructure.
With these requirements in mind, Google Cloud launched Google Distributed Cloud powered by Anthos to extend the power of Google Cloud infrastructure and services to the edge (or closer). The underlying infrastructure for this service comes in two variants: a 42U rack filled with compute, storage, and networking devices called Google Distributed Cloud Edge Rack and a 1U appliance called Google Distributed Cloud Edge Appliance.
In this blog post, we discuss the Google Distributed Cloud Edge Appliance and how manufacturing, retail, and automotive industry verticals can use it to address common use cases.
But first, let’s talk about the Google Distributed Cloud Edge Appliance itself.
The Google Distributed Cloud Edge Appliance comprises two components: (1) Distributed Cloud Edge infrastructure and (2) the Distributed Cloud Edge service.
The Google Distributed Cloud Edge service runs on Google Cloud and serves as a control plane for the nodes and clusters running on your appliance. In order to perform remote management of the appliance and to collect metrics, the Distributed Cloud Edge service must be connected to Google Cloud at all times, allowing you to manage your workloads on the edge hardware through the Google Cloud Console. For customers who can’t be connected at all times for data residency or sovereignty reasons, we highly recommend that the appliance be connected to the cloud at least once a month to allow for needed security patches and updates.
Google Distributed Cloud Edge Appliances come with built-in network ports that provide connectivity to the control plane via the internet, Cloud VPN, or Dedicated Interconnect, and to your on-prem network. Each Google Distributed Cloud Edge Appliance is homed to a specific Google Cloud region but it is designed to also use any public Google Cloud endpoint to communicate with the control plane in Google Cloud, allowing you to move these appliances between different geographic locations.
There are two NFS shares on each appliance: one is offline, meaning it does not transfer data to Google Cloud, and the other is online, meaning data saved to that share is synced to Cloud Storage on Google Cloud for further processing. The appliance supports the Server Message Block (SMB) and Secure File Transfer Protocol (SFTP) protocols for communication.
Each Google Distributed Cloud Edge Appliance runs Google Distributed Cloud Virtual, enabling you to build a single-node Kubernetes cluster with access to the underlying file system of the appliance. This allows you to build containerized applications on the underlying appliance hardware to address use cases in the following verticals.
Now that you understand how the Google Distributed Cloud Edge Appliance is configured, let's consider some of the industry use cases where it can provide unique value.
In the manufacturing industry, quality control and safety are crucial factors. Businesses need to ensure products are manufactured to the highest standards to remain competitive in their markets, to retain customers, and to keep factory workers safe. To do this, manufacturers need real-time data about the products being manufactured on the production lines, ensuring quality control and gaining a real-time view of where people are on the factory floor.
In manufacturing environments, Google Distributed Cloud Edge Appliance can be used to detect hazards or manufacturing defects in real-time. Figure 1 is a reference architecture for a hazard detection solution running off a Google Distributed Cloud Edge Appliance on a factory floor.
In this architecture, cameras on the factory floor stream live video into the Google Distributed Cloud Edge Appliance. Depending on the number of cameras and appliances, cameras could be split or mapped to different appliances. This architecture makes it possible to initially transfer video data to Google Cloud using an online NFS share. Once in Google Cloud, you can use AutoML to train and build models that can be used as part of the hazard detection solution.
With these trained models, the cameras can stream video data into the appliance using the real-time streaming protocol (RTSP). You can then use AutoML inference to analyze the real-time video streaming data.
For example, in this reference architecture, if an individual comes too close to a forklift, the microservices running on the edge appliance trigger a function that pushes a notification either to a messaging service or to an enterprise resource planning tool. This alerts factory managers to factory floor hazards in real time so they can take corrective action.
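To make the event flow concrete, here is a heavily simplified sketch of the kind of loop the microservices on the appliance might run. It is not the reference architecture's actual implementation; the RTSP URL, detection model, labels, proximity threshold, and webhook endpoint are all placeholders.

```python
# Hypothetical hazard-detection loop; all names and thresholds are placeholders.
import cv2
import requests

RTSP_URL = "rtsp://camera-01.factory.local/stream"   # placeholder camera feed
ALERT_WEBHOOK = "https://example.com/hazard-alerts"  # placeholder messaging/ERP endpoint
PROXIMITY_THRESHOLD_PX = 150                          # assumed pixel-distance threshold

def detect_objects(frame):
    # Placeholder: in practice this would call the locally deployed model
    # (for example, an AutoML edge model) and return (label, x, y) detections.
    return []

def too_close(a, b, threshold=PROXIMITY_THRESHOLD_PX):
    # Simple pixel-space distance check between two detections.
    return ((a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5 < threshold

cap = cv2.VideoCapture(RTSP_URL)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    detections = detect_objects(frame)
    people = [d for d in detections if d[0] == "person"]
    forklifts = [d for d in detections if d[0] == "forklift"]
    if any(too_close(p, f) for p in people for f in forklifts):
        # Notify a messaging service or ERP tool so managers can act in real time.
        requests.post(ALERT_WEBHOOK, json={"event": "person_near_forklift"})
cap.release()
```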
You can also review messages and videos later on for preventive planning purposes, or push streamed videos to Cloud Storage for archive, to use the appliance’s storage space more efficiently.
Data transfers to Google Cloud can be done over Google Cloud Dedicated Interconnect, or VPN between the region and your site. This connectivity also allows you to send the appliance’s control-plane network traffic to the region.
You could also use the reference architecture in figure 2 for a product anomaly detection solution running off a Google Distributed Cloud Edge Appliance on a factory floor or manufacturing line. In this instance, machine learning models are trained to detect anomalies on finished products before final packaging.
In the retail industry, the Google Distributed Cloud Edge Appliance reference architecture in Figure 2 enables a number of transformative capabilities for retail operations, including:
contactless checkout
product scans
mobile-scan-bag
cashierless checkout
unattended retail shops
visual check-out monitoring
It does all this within a retailer’s facilities with the low latency and high throughput you need to process data locally, so you can obtain actionable insights from your data.
Or, you could use Google Distributed Cloud Edge Appliance at the edge to overhaul store management operations, for example, monitoring store occupancy, queue depth and wait times, detecting slips and falls and out-of-stock items, or monitoring inventory compliance.
Advanced Driver Assistance Systems (ADAS) are becoming standard in modern automobiles. To successfully build and roll out continued improvements around ADAS, the automotive industry continues to run extensive tests on ADAS systems that are built into the vehicles they manufacture. Automotive companies can use Google Distributed Cloud Edge Appliance to modernize and transform how they collect data for the ADAS systems they’re developing. For example, test vehicles contain several different sensors that generate data, which can be quickly offloaded to an in-vehicle edge appliance.
Then, within the appliance, you can deploy containerized workloads to transform sensor data, infer videos and images and detect events. This alleviates the need for operators to label all events and allows development teams to quickly gather insights from the tests.
If you want to focus on a subset of information, you can transfer specific data or the entire data payload into Transfer Appliances when vehicles return to the development center. All these systems, i.e., transfer appliances and edge appliances, work in tandem to reduce local system administration and operational costs through a cloud-based control plane.
This approach allows you to deploy, track, monitor and configure services that are running in data centers or at edge locations from the cloud. From the factories, the data can be moved offline or online into Google Cloud where you can use different storage classes and processing capabilities to further process or store the data. You can also deploy newly trained models and business rules back to the edge appliances. In all this, data transfers between the cloud and the appliance are performed using end-to-end encryption, to give you control over your data.
The reference architecture in Figure 3 shows an ADAS implementation where Google Distributed Cloud Edge Appliance is being used to gather, process and transform data at the edge in the automotive industry. It could also be applied to data capture and processing use cases in manned and unmanned vehicles. Notice how the Distributed Edge Appliance extends to the cloud by sending data there, or using other cloud-based services.
These are just a few of the use cases where organizations in the manufacturing, retail and automotive industries are using Google Distributed Cloud Edge Appliance with modern and containerized applications that are powered by Google Cloud. If you’re interested in bringing the power of Google Cloud to the edge using Google Distributed Cloud Edge Appliances to transform your business, reach out to us or any of our accredited partners.
Read More for the details.
After the momentum of COP27 and the increasing commitments to climate solutions from nations and corporations throughout 2022, we’re excited to enter a more sustainable 2023. As more organizations navigate the necessary, sustainability-driven business transformations on the horizon, they’re looking for new tools and technology to help them. We believe the global startup ecosystem plays a pivotal role in accelerating their progress, which is why we’re excited to kick-off 2023 with Google for Startups Accelerator: Climate Change programs, and encourage startups to apply to North America or Europe.
These programs will focus on identifying, supporting and scaling startups that are building technologies to combat climate change. This is the first time we’re running the program in Europe, and we’re looking forward to welcoming participants in our region at a critical time when technology can unblock and accelerate decarbonization of the economy.
Europe has committed to a climate neutral economy by 2050, but to achieve that goal, technology and increasing digitalization will be key. According to a recent study we commissioned from ICG, digital technology has the potential to positively impact all sectors of the economy in achieving decarbonization targets. European companies will be looking to startups for new technologies and solutions to accelerate their transformation.
Startups accepted into the cohort will be matched with our network of experts for advice, mentorship, and support across a range of domains and subject matters. The program will have an increased focus on cloud technology, artificial intelligence and machine learning to help early-stage participants advance their business and technology. Participants will hear talks from climate, sustainability and technology experts, get access to technical mentorship and support, and receive training on the Google Cloud tools that can help accelerate their progress.
This year promises to be a boom not only for climate tech, but also for technology that enables sustainable innovation across the value chain. There’s a huge opportunity for startups to address these needs, and only two weeks left to apply for the program, so don’t miss out!
Read More for the details.
As one of the world’s largest home improvement retailers, The Home Depot (THD) needs to seamlessly manage the flow of work across a wide variety of IT systems to provide first-class retail experiences for our customers both online and in-store. We began our cloud migration journey back in 2017, with early successes leading us to move more of our workloads to the cloud. This shift drove engineering and technical innovation across the company, but has also increased the need for greater governance to ensure compliance and strong cybersecurity.
Our “Cloud Enablement” team needed a workflow management system that could streamline cloud resource creation projects across the company’s engineering teams. Our main challenge was to create a frictionless developer experience while still securing workflows and enforcing best practices. Most importantly, we were searching for a fully managed solution to workflow automation and governance needs. This would help cloud project creation become more efficient and cost-effective, with developers able to focus on writing and coding the resources themselves, rather than on managing infrastructure. We also wanted our engineers to be able to quickly track and trace their workflows at all stages of the development process — especially when they failed to complete — to increase internal efficiencies related to troubleshooting and resource downtime.
After rigorous testing of potential workflow systems, The Home Depot determined Google Workflows was the ideal solution to meet our needs. Workflows is a fully managed Google Cloud service that enables service integration and orchestration, lightweight data and machine learning pipeline orchestration, and cloud platform automation. This low-code, serverless solution allows developers to focus on their core business needs, rather than on their infrastructure management, which enables them to facilitate the creation of more resilient cloud solutions. Not only can Workflows call any internal or external HTTP endpoint, its workflows are also durable — a workflow supports retries and can wait up to one year for operations to complete without incurring any charges. Other key features include support for human-in-the-middle approval flows and for parallel iteration and branching using atomic variables. Workflow executions can be triggered by an event, allowing developers to build applications and pipelines with event-driven architectures and allowing IT and cloud platform teams to trigger automation workflows based on an event, such as a request from a developer for cloud resources.
The Home Depot successfully launched our Galaxy system, which builds upon two previous generations of testing and development, by leveraging the powerful automation and orchestration functionalities of Workflows. One of the biggest benefits of Workflows’ processes is their composability. We can now deploy a microservices approach that enables our IT teams to use this new self-service system to quickly compose new cloud services by drawing from a series of pre-built and custom workflow tasks.
Google Workflows forms the backbone of THD’s Galaxy system, which orchestrates approvals in ServiceNow and cloud resource creation through a GitOps workflow that uses Terraform Enterprise. Galaxy’s user-friendly interface supports the speedy provisioning of cloud resources and makes valuable use of Workflows’ subworkflows feature to ensure authentication and authorization requirements are met and that common error handling issues can be easily resolved. The system even automates pull request flows and provides status endpoints for those requests to populate a real time tracker for developer feedback.
With this self-service Galaxy platform, The Home Depot’s developers can now easily integrate all the required subworkflows as part of their cloud project creation process. This enables developers to focus on writing the resources they need to tackle their business problems and service their customers’ needs, rather than losing valuable time rewriting generic request flows or securing all the necessary permissions for governance purposes. Although THD briefly explored authoring our own custom workflow automation engine, we realized that by utilizing Workflows we could easily achieve our workflow orchestration needs, all with GCP native services. Our developers are also able to draw upon Google’s existing bank of high-quality technical documentation and training resources, rather than expending developer time and resources producing our own custom documentation.
The Home Depot’s Galaxy system empowers our engineering teams by ensuring an accelerated path for the development and testing of new resources and services within our cloud community. Harnessing the efficiency and composability of Google Workflows, this new self-service system is already delivering feature parity with THD’s existing governance and workflow solutions. We are now turning our attention to building easy self-service paths to service web applications in multi- and single-tenant GKE clusters, while ensuring these resources remain fully conformant with THD’s security and compliance policies.
To learn more about Workflows, Google’s serverless orchestration engine, visit the Workflows landing page today, or go directly to the Cloud Console to try out three of its most common patterns.
Read More for the details.
Training ML models can be computationally expensive. If you’re training models on large datasets, you might be used to model training taking hours, or days, or even weeks. But it’s not just a large volume of data that can increase training time. Nonoptimal implementations such as an inefficient input pipeline or low GPU usage can dramatically increase your training time.
Making sure your programs are running efficiently and without bottlenecks is key to faster training. And faster training makes for faster iteration to reach your modeling goals. That’s why we’re excited to introduce the TensorFlow Profiler on Vertex AI, and share five ways you can gain insights into optimizing the training time of your model. Based on the open source TensorFlow Profiler, this feature allows you to profile jobs on the Vertex AI training service in just a few steps.
Let’s dive in and see how to set this feature up, and what insights you can gain from inspecting a profiling session.
Before you can use the TensorFlow Profiler, you’ll need to configure Vertex AI TensorBoard to work with your custom training job. You can find step by step instructions on this setup here. Once TensorBoard is set up, you’ll make a few changes to your training code, and your training job config.
Modify training code
First, you’ll need to install the Vertex AI Python SDK with the cloud_profiler plugin as a dependency for your training code. After installing the plugin, there are three changes you’ll make to your training application code.
First, you’ll need to import the cloud_profiler in your training script:
Then, you’ll need to initialize the profiler with cloud_profiler.init(). For example:
Finally, you’ll add the TensorBoard callback to your training loop. If you’re already a Vertex AI TensorBoard user, this step will look familiar.
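Putting the three changes together, a minimal sketch of a Keras training script might look like this; the model, dataset, and epoch count are placeholders, and the callback's log directory falls back to a placeholder bucket if Vertex AI's AIP_TENSORBOARD_LOG_DIR environment variable is not set.

```python
# Sketch of the three training-code changes; model, dataset, and epochs are placeholders.
import os
import tensorflow as tf
from google.cloud.aiplatform.training_utils import cloud_profiler  # 1. import the plugin

# 2. Initialize the profiler.
cloud_profiler.init()

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])  # placeholder model
model.compile(optimizer="adam", loss="mse")

# 3. Add the TensorBoard callback so profiles land where Vertex AI expects them.
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.environ.get("AIP_TENSORBOARD_LOG_DIR", "gs://your-bucket/logs"),
    histogram_freq=1,
)

# Placeholder dataset; replace with your real input pipeline.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((64, 10)), tf.random.uniform((64, 1)))
).batch(8)

model.fit(dataset, epochs=2, callbacks=[tensorboard_callback])
```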
You can see an example training script here in the docs.
Configure Custom Job
After updating your training code, you can create a custom job with the Vertex AI Python SDK.
Then, run the job specifying your service account and TensorBoard instance.
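As a rough sketch, assuming the training script above is saved as task.py; the project, region, bucket, container image, service account, and TensorBoard resource name below are placeholders rather than values from the original post.

```python
# Hypothetical job configuration; every resource name below is a placeholder.
from google.cloud import aiplatform

aiplatform.init(
    project="your-project",
    location="us-central1",
    staging_bucket="gs://your-staging-bucket",
)

job = aiplatform.CustomJob.from_local_script(
    display_name="profiler-example",
    script_path="task.py",  # the training script modified above
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-11.py310:latest",  # placeholder image
    requirements=["google-cloud-aiplatform[cloud_profiler]"],
)

job.run(
    service_account="training-sa@your-project.iam.gserviceaccount.com",
    tensorboard=(
        "projects/your-project/locations/us-central1/tensorboards/your-tensorboard-id"
    ),
)
```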
Capture Profile
Once you launch your custom job, you’ll be able to see it in the Custom jobs tab on the Training page.
When your training job is in the Training / Running state, a new experiment will appear on the Experiments page. Click it to open your TensorBoard instance.
Once you’re there, go to the Profiler tab and click Capture profile. In the Profile Service URL(s) or TPU name field, enter workerpool0-0.
Select IP address for the Address type, then click CAPTURE.
Note that you can only complete the above steps when your job is in the Training/Running state.
Once you’ve captured a profile, there are numerous insights you can gain from analyzing the hardware resource consumption of the various operations in your model. These insights can help you to resolve performance bottlenecks and, ultimately, make the model execute faster.
The TensorFlow Profiler provides a lot of information and it can be difficult to know where to start. So to make things a little easier, we’ve outlined five ways you can get started with the profiler to better understand your training jobs.
Get a high level understanding of performance with the overview page
The TensorFlow Profiler includes an overview page that provides a summary of your training job performance.
Don’t get overwhelmed by all the information on this page! There are three key numbers that can tell you a lot: Device Compute Time, TF Op placement, and Device Compute Precision.
The device compute time lets you know how much of the step time is from actual device execution. In other words, how much time did your device(s) spend on the computation of the forward and backward passes, as opposed to sitting idle waiting for batches of data to be prepared. In an ideal world, most of the step time should be spent on executing the training computation instead of waiting around.
The TF op placement tells you the percentage of ops placed on the device (e.g., GPU) versus the host (CPU). In general, you want more ops on the device because that will be faster.
Lastly, the device compute precision shows you the percentage of computations that were 16-bit vs. 32-bit. Today, most models use the float32 dtype, which takes 32 bits of memory. However, there are two lower-precision dtypes, float16 and bfloat16, which take 16 bits of memory instead. Modern accelerators can run operations faster in the 16-bit dtypes. If reduced accuracy is acceptable for your use case, you can consider using mixed precision by replacing more of the 32-bit ops with 16-bit ops to speed up training time.
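If reduced precision fits your use case, one low-effort way to experiment is Keras mixed precision. This is standard TensorFlow functionality rather than anything specific to the Profiler, and the small model below is only illustrative.

```python
import tensorflow as tf

# Run most ops in float16 while keeping variables in float32 for numeric stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    # Keep the final layer's output in float32 so the loss stays numerically stable.
    tf.keras.layers.Dense(1, dtype="float32"),
])
```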
You’ll notice that the summary section also provides some recommendations for next steps. So in the following sections we’ll take a look at some more specialized profiler features that can help you to debug.
Deep dive into the performance of your input pipeline
After taking a look at the overview page, a great next step is to evaluate the performance of your input pipeline, which generally includes reading the data, preprocessing the data, and then transferring data from the host (CPU) to the device (GPU/TPU).
GPUs and TPUs can reduce the time required to execute a single training step. But achieving high accelerator utilization depends on an efficient input pipeline that delivers data for the next step before the current step has finished. You don’t want your accelerators sitting idle as the host prepares batches of data!
The TensorFlow Profiler provides an Input-pipeline analyzer that can help you determine if your program is input bound. For example, the profile shown here indicates that the training job is highly input bound. Over 80% of the step time is spent waiting for training data. By preparing the batches of data before the next step is finished, you can reduce the amount of time each step takes, thus reducing total training time overall.
Input-pipeline analyzer
This section of the profiler also provides more insights into the breakdown of step time for both the device and host.
For the device-side graph, the red area corresponds to the portion of the step time the devices were sitting idle waiting for input data from the host. The green area shows how much of the time the device was actually working. So a good rule of thumb here is that if you see a lot of red, it’s time to debug your input pipeline!
The Host-side analysis graph shows you the breakdown of processing time on the CPU. For example, the graph shown here is majority green indicating that a lot of time is being spent preprocessing the data. You could consider performing these operations in parallel or even preprocess the data offline.
The Input-pipeline analyzer even provides specific recommendations. But to learn more about how you can optimize your input pipeline, check out this guide or refer to the tf.data best practices doc.
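As one common pattern from those guides, parallelizing the map step and prefetching the next batch often helps an input-bound job. The preprocess function and source dataset below are placeholders.

```python
import tensorflow as tf

def preprocess(example):
    # Placeholder per-example transformation (decoding, augmentation, etc.).
    return example

dataset = (
    tf.data.Dataset.range(1_000)  # placeholder source dataset
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # parallelize preprocessing
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # overlap data preparation with training
)
```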
Use the trace viewer to maximize GPU utilization
The profiler provides a trace viewer, which displays a timeline that shows the durations for the operations that were executed by your model, as well as which part of the system (host or device) the op was executed. Reading traces can take a bit of time to get used to, but once you do you’ll find that they are an incredibly powerful tool for understanding the details of your program.
When you open the trace viewer, you’ll see a trace for the CPU and for each device. In general, you want to see the host execute input operations like preprocessing training data and transferring it to the device. On the device, you want to see the ops that relate to actual model training.
On the device, you should see timelines for three streams:
Stream 13 is used to launch compute kernels and device-to-device copies
Stream 14 is used for host-to-device copies
Stream 15 is used for device-to-host copies
Trace viewer streams
In the timeline, you can see the duration for your training steps. A common observation when your program is not running optimally is gaps between training steps. In the image of the trace view below, there is a small gap between the steps.
Trace viewer steps
But if you see a large gap as shown in the image below, your GPU is idle during that time. You should double check your input pipeline, or make sure you aren’t doing unnecessary calculations at the end of each step (such as executing callbacks).
For more ways to use the trace viewer to understand GPU performance, check out the guide in the official TensorFlow docs.
Debug OOM issues
If you suspect your training job has a memory leak, you can diagnose it on the memory profile page. In the breakdown table you can see the active memory allocations at the point of peak memory usage in the profiling interval.
In general, it helps to maximize the batch size, which will lead to higher device utilization, and if you’re doing distributed training, amortize the costs of communication across multiple GPUs. Using the memory profiler helps get a sense of how close your program is to peak memory utilization.
Optimize gradient AllReduce for distributed training jobs
If you’re running a distributed training job and using a data parallelism algorithm, you can use the trace viewer to help optimize the AllReduce operation. For synchronous data parallel strategies, each GPU computes the forward and backward passes through the model on a different slice of the input data. The computed gradients from each of these slices are then aggregated across all of the GPUs and averaged in a process known as AllReduce. Model parameters are updated using these averaged gradients.
When going from training with a single GPU to multiple GPUs on the same host, ideally you should experience the performance scaling with only the additional overhead of gradient communication and increased host thread utilization. Because of this overhead, you will not have an exact 2x speedup if you move from 1 to 2 GPUs, for example.
You can check the GPU timeline in your program’s trace view for any unnecessary AllReduce calls, as this results in a synchronization across all devices. But you can also use the trace viewer to get a quick check as to whether the overhead of running a distributed training job is as expected, or if you need to do further performance debugging.
The time to AllReduce should be:
(number of parameters × 4 bytes) / (communication bandwidth)
Note that each model parameter is 4 bytes in size since TensorFlow uses fp32 (float32) to communicate gradients. Even when you have fp16 enabled, NCCL AllReduce utilizes fp32 parameters. You can get the number of parameters in your model from Model.summary.
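As a back-of-the-envelope example (the parameter count and bandwidth below are assumptions, not measurements from a real job):

```python
# Rough estimate of the expected time for one gradient AllReduce.
num_parameters = 25_000_000        # e.g., a ~25M-parameter model (assumption)
bytes_per_parameter = 4            # fp32 gradients
bandwidth_bytes_per_sec = 32e9     # assumed ~32 GB/s effective interconnect bandwidth

expected_allreduce_sec = (num_parameters * bytes_per_parameter) / bandwidth_bytes_per_sec
print(f"Expected AllReduce time: {expected_allreduce_sec * 1e3:.2f} ms")  # ~3.13 ms
```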
If your trace indicates that the time to AllReduce was much longer than this calculation, that means you’re incurring additional and likely unnecessary overheads.
The TensorFlow Profiler is a powerful tool that can help you to diagnose and debug performance bottlenecks, and make your model train faster. Now you know five ways you can use this tool to understand your training performance. To get a deeper understanding of how to use the profiler, be sure to check out the GPU guide and data guide from the official TensorFlow docs. It’s time for you to profile some training jobs of your own!
Read More for the details.
Application Auto Scaling now offers customers more visibility into the scaling decisions it makes for an auto scaled resource. Application Auto Scaling (AAS) is a service that offers a standardized scaling experience across 13 AWS services beyond Amazon EC2, for example Amazon DynamoDB provisioned read and write capacity and Amazon Elastic Container Service (ECS) services. Application Auto Scaling takes scaling actions based on customer-defined scaling policies that act as a guideline for scaling decisions. Until now, customers only got details about successful scaling actions, not about deferred ones. With this feature, customers get more insight into scaling decisions that do not lead to a scaling action, in both descriptive and machine-readable formats.
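For example, with boto3 you might pull these insights for a hypothetical ECS service as sketched below. The resource ID is a placeholder, and the IncludeNotScaledActivities parameter and NotScaledReasons field names reflect our reading of the updated API, so confirm them against the current reference.

```python
# Sketch: inspect why a scaling action was (or was not) taken for an ECS service.
import boto3

client = boto3.client("application-autoscaling")

response = client.describe_scaling_activities(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",  # placeholder resource
    IncludeNotScaledActivities=True,             # also return deferred/not-scaled decisions
)

for activity in response["ScalingActivities"]:
    print(activity["StatusCode"], activity.get("NotScaledReasons", []))
```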
Read More for the details.
Starting today, Amazon CloudWatch Logs is removing the 5 requests per second log stream quota when calling Amazon CloudWatch Logs PutLogEvents API. There will be no new per log stream quota. With this change we have removed the need for splitting your log ingestion across multiple log streams to prevent log stream throttling.
Read More for the details.
Amazon Kinesis Data Streams for Amazon DynamoDB is now available in 11 additional AWS Regions around the world. With Amazon Kinesis Data Streams, you can capture item-level changes in your DynamoDB tables as a Kinesis data stream with a single click in the DynamoDB console, or by using the AWS API, CLI or CloudFormation templates.
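For example, enabling the stream from the AWS API with boto3 might look like the sketch below, where the table name and stream ARN are placeholders.

```python
# Sketch: turn on Kinesis Data Streams for a DynamoDB table (names are placeholders).
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.enable_kinesis_streaming_destination(
    TableName="my-table",
    StreamArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-ddb-changes",
)
```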
Read More for the details.
Amazon Aurora MySQL Version 3 (Compatible with MySQL 8.0) now offers support for Backtrack. Backtrack allows you to move your database to a prior point in time without needing to restore from a backup, and it completes within seconds, even for large databases.
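As a rough boto3 sketch, assuming Backtrack was enabled on the cluster and using a placeholder cluster identifier:

```python
# Sketch: backtrack an Aurora MySQL 3 cluster to a point ten minutes in the past.
import boto3
from datetime import datetime, timedelta, timezone

rds = boto3.client("rds")

rds.backtrack_db_cluster(
    DBClusterIdentifier="my-aurora-mysql3-cluster",                  # placeholder cluster
    BacktrackTo=datetime.now(timezone.utc) - timedelta(minutes=10),  # target point in time
)
```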
Read More for the details.
This insight provides information on key items of clothing worn by individuals within a video and the timestamps at which the clothing appears.
Read More for the details.
In December 2022, the following updates and enhancements were made for Azure Backup for SAP HANA, a Backint-certified database backup solution for SAP HANA databases in Azure VMs. Long-term retention for…
Read More for the details.
The ITSM connector provides a bi-directional connection between Azure and ITSM tools to help track and resolve issues faster.
Read More for the details.