AWS – Amazon MWAA now supports Apache Airflow version 2.4 with Python 3.10
You can now create Apache Airflow version 2.4 environments on Amazon Managed Workflows for Apache Airflow (MWAA) with Python 3.10 support.
Read More for the details.
Azure Data Explorer now supports ingestion of data from Apache Log4j 2.
Read More for the details.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C5n instances are available in AWS Region Asia Pacific (Osaka). C5n instances, powered by 3.0 GHz Intel® Xeon® Scalable processors (Skylake) and based on the AWS Nitro System, offer customers up to 100Gbps networking for network-bound workloads, while continuing to take advantage of the security, scalability and reliability of Amazon’s Virtual Private Cloud (VPC). Customers can also take advantage of C5n instances network performance to accelerate data transfer to and from S3, reducing the data ingestion wait time for applications and speeding up delivery of results. They are built for workloads such as High Performance Computing (HPC), analytics, machine learning, Big Data and data lake applications.
Read More for the details.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) R6i instances are available in AWS Region Asia Pacific (Osaka), Middle East (Bahrain), and Africa (Cape Town). R6i instances are powered by 3rd generation Intel Xeon Scalable processors, with an all-core turbo frequency of 3.5 GHz. These instances come with always-on memory encryption using Intel Total Memory Encryption (TME) and are built on AWS Nitro System. The AWS Nitro System is a collection of AWS designed hardware and software innovations that enables the delivery of efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage. R6i instances are SAP Certified and are built for workloads such as SQL and NoSQL databases, distributed web scale in-memory caches (Memcached and Redis), in-memory databases (SAP HANA), and real time big data analytics (Apache Hadoop and Apache Spark clusters).
Read More for the details.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) R6g instances are available in AWS Region Asia Pacific (Osaka). R6gd instances are available in AWS Regions Asia Pacific (Osaka), Asia Pacific (Seoul) and Europe (Stockholm). M6gd instances are available in AWS Regions Asia Pacific (Osaka) and Africa (Cape Town). These instances are powered by AWS Graviton2 processors, and they are built on AWS Nitro System. The Nitro System is a collection of AWS designed hardware and software innovations that enables the delivery of efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage. R6g instances are built for running memory-intensive workloads such as open-source databases, in-memory caches, and real time big data analytics. R6gd instances provide local SSD storage and are ideal for memory-intensive workloads that need access to high-speed, low latency storage. M6gd offer a balance of compute, memory, networking, and local SSD resources for a broad set of workloads. They are built for applications such as application servers, microservices, gaming servers, mid-size data stores, and caching fleets that also need access to high-speed, low latency storage.
Read More for the details.
Azure Sphere Security Service EU data processing and storage is now generally available.
Read More for the details.
While cloud security skeptics might believe that data in the cloud is just one access configuration mistake away from a breach, the reality is that a well-designed set of defense in depth controls can help minimize the risk of configuration mistakes and other security issues. Our Virtual Private Cloud (VPC) Service Controls can play a vital role in creating an additional layer of security while also making it easier to manage your data in a way that most cloud services can’t do today.
Organizations across industries and business models use cloud services for activities such as processing their data, performing analytics, and deploying systems. VPC Service Controls can empower an organization when deciding how users and data can cross the perimeter of the supported cloud services, if at all. While VPC Service Controls are designed to help stop attackers, they can also enable contextual trusted data sharing (similar to how Zero Trust allows contextual access).
VPC Service Controls help administrators define a security perimeter around Google-managed services, which can control communication to and between those services. The Service Controls isolate your Google Cloud resources from unauthorized networks, including the internet. For example, this can help you keep a clear separation between services that are allowed to run in production and services that are not.
VPC Service Controls can help you prevent mistakes that lead to costly data breaches because they control access to your data at a granular level. They add context-aware access controls on these services, and can help you achieve your organization’s Zero Trust access goals.
Like wearing two layers of clothing made from different fabrics to protect you from winter weather, VPC Service Controls may appear similar to Identity and Access Management (IAM) but they come from a different approach to implementing security. IAM enables granular identity based access control; VPC Service Controls create a security perimeter that protects your cloud resources and sets up private connectivity to Google Cloud’s APIs and services. While it’s recommended to use both, VPC Service Controls have an added bonus: They can support blocking data theft during a breach.
The additional layer of security that VPC Service Controls offer customers is challenging to achieve with on-premises systems or even with other cloud providers. You can think of it as a firewall for APIs that also adds a logical security control around three paths that data can take:
From the public internet to your resources
Inside your VPC and the cloud service perimeter
For service-to-service communication (for example, denying access to someone who wants to load data to BigQuery or exfiltrate data from a BigQuery instance.)
VPC Service Controls are used to enforce a security perimeter. They can help isolate resources of multi-tenant Google Cloud services, which can help reduce the risk of data exfiltration or a data breach.
For example, a bank that migrated financial data processing to Google Cloud can use VPC Service Controls to isolate their processing pipeline from public access (or any unauthorized access) by defining a trusted service perimeter.
VPC Service Controls are used to securely share data across service perimeters with full control over what resource can connect to other resources, or outside the perimeter. This can help mitigate data exfiltration risks stemming from stolen identities, IAM policy misconfigurations, some insider threats, and compromised virtual machines.
Returning to our bank example, that same bank using VPC Service Controls may securely share or access data across Service Perimeters and Organizations. They may allow access to specific partners and for specific operations.
VPC Service Controls deliver Zero Trust access to multi-tenant Google Cloud services. Clients can restrict access to authorized IPs, client context, user identity, and device parameters while connecting to multi-tenant services from the internet and other services.
A bank can use moving its services to the public cloud as an opportunity to abandon outdated access management approaches and adopt Zero Trust access. VPC Service Controls let them create granular access control policies in Access Context Manager based on attributes such as user location and IP address. For example, a policy could allow an analyst to access BigQuery only from a corporate device on the corporate network during business hours. These policies can help ensure the appropriate security controls are in place when granting access to cloud resources from the internet.
Check out these pages to learn more about VPC Service Controls for your sensitive cloud deployments, especially for regulated workloads. This blog is the third in our Best Kept Security Secrets series, which includes how to tap into the power of Organization Policy Service and how Cloud EKM can help resolve the cloud trust paradox.
Read More for the details.
Cloud is a great place to grow your career in 2023. Opportunity abounds, with cloud roles offering strong salaries and scope for growth as a constantly evolving field.1 Some positions do not require a technical background, like project managers, product owners and business analysts. For others, like solutions architects, developers and administrators, coding and technical expertise are a must.
Either way, cloud knowledge and experience are required to land that dream job. But where do you start? And how do you keep up with the fast pace of ever-changing cloud technology? Check out the tips below, along with suggested training opportunities to help support your growth, including no-cost options!
Your experience can be a great way to get into cloud, even if it seems non-traditional. Think creatively about transferable skills and opportunities. Here are a few scenarios where you might find yourself today:
You already work in IT, but in legacy systems or the data center. Forrest Brazeal, Head of Content Marketing at Google Cloud, talks about that in detail in this video.
Use your sales experience to become a sales engineer, or your communications experience to become a developer advocate. Stephanie Wong, Developer Advocate at Google Cloud, discusses that here.
You don’t have that college degree that is included in the job requirements. I’ve talked about that in a recent video here.
Your company has a cloud segment, but your focus is in another area. Go talk to people! Access your colleagues who do what you want to do. Get their advice for skilling up.
If you are looking at a technical position, you will need to show cloud applicable experience, so learn about the cloud and build a portfolio of work. Here are a few key skills we recommend everyone have to start1:
Code is non-negotiable. People who come from software development backgrounds typically find it easier to get into and maneuver through the cloud environment because of their coding experience. Automation, basic data manipulation, and scaling are daily requirements. If you don't already know a language, learning Python is a great place to begin.
Understand Linux. You’ll need to know the Linux filesystem, basic Linux commands and fundamentals of containerization.
Learn core networking concepts like the IP Protocol and the others that layer on top of it, DNS, and subnets.
Make sure you understand the cloud itself, and in particular the specifics about Google Cloud for a role at Google.
Get familiar with open source tooling. Terraform for automation and Kubernetes for containers are portable between clouds and worth taking the time to learn.
Check out Google Cloud Skills Boost for a comprehensive collection of training to help you upskill into a cloud role, including hands-on labs that get you real-world experience in Google Cloud. New users can start off with a 30 day no-cost trial2. Take a look at these recommendations:
No-cost labs and courses
A Tour of Google Cloud Hands-on Labs – 45 minutes
A Tour of Google Cloud Sustainability – 60 minutes
Introduction to SQL for BigQuery and Cloud SQL – 60 minutes
Infrastructure and Application Modernization with Google Cloud – Introductory course with three modules
Preparing for Google Cloud certification – Courses to help you prepare for Google Cloud certification exams
This part is critical for the interview portion. Take the cloud skills you have learned and create something tangible that you can use as a story during an interview. Consider building a project on Github so others can see it working live, and document it well. Be sure to include your decision making process. Here is an example:
Build an API or a web application
Develop the code for the application
Pick the infrastructure to deploy that application in the cloud, choose your storage option, and a database with which it will interact
For tech-adjacent roles, like those in business, sales or administration, having a solid knowledge of cloud principles is critical. We recommend completing the Cloud Digital Leader training courses, at no cost. Or go the extra mile and consider taking the Google Cloud Digital Leader Certification exam once you complete the training:
No-cost course
Cloud Digital Leader Learning Path – understand cloud capabilities, products and services and how they benefit organizations
$99 registration fee
Google Cloud Digital Leader Certification – validate your cloud expertise by earning a certification
Another resource we have is the Google Cloud Innovators Program, which will help you grow on Google Cloud and connect you with other community members. There is no cost to join, and it gives you access to resources to build your skills and help shape the future of cloud! Join today.
Start your new year strong by exploring Google Cloud Data, DevOps, or Networking certifications and completing Arcade games each week. This January, play to win in The Arcade while you learn new skills and earn prizes on Google Cloud Skills Boost. Each week we will feature a new game to help you show and grow your cloud skills while sampling certification-based learning paths.
Make 2023 the year to build your cloud career and commit to learning all year with our $299 annual subscription to Google Cloud Skills Boost. The subscription includes access to the entire training catalog, live-learning events, quarterly technical briefings with executives, $500 of Google Cloud credits (plus a bonus $500 of credits after you successfully certify), and a $200 certification voucher.
1. Starting your career in cloud from IT – Forrest Brazeal, Head of Content Marketing, Google Cloud
2. Credit card required to activate a 30 day no-cost trial for new users.
Read More for the details.
Paper currency — which started gaining prominence in the 1600s — changed the face of global economics and ushered in a new era of international monetary regulation. The primary reason currency created such disruption was its ability to standardize the “medium of exchange”. APIs created a similar effect in the world of technology and digitalization by creating a standardized, reusable, and secure way to exchange information.
Modern web APIs took shape in the early 2000s and played a key role in “.com”mercializing every business. APIs started as a connective tissue primarily relegated to a technical context and quickly evolved into a gateway to new business models, revenue streams, and ecosystems. In 2017, McKinsey estimated a total of $1 Trillion in profit could be up for grabs in the API economy. And in 2022, GGV Capital created an index of API-first startups — a generation of stylistically divergent SaaS companies with leaner operating cost structures and organic usage growth. Just as currency is going through an evolution from banknotes to digital wallets, the world of API management is also on the brink of change.
With more than 15 years of experience managing APIs at Google-scale, we’ve got a unique vantage point from which to observe that change. In today’s post, we will spotlight seven API management use cases that we see growing in prominence — and how you can take advantage of these trends to future-proof your architecture.
As a gateway to a wealth of information, APIs have also quickly become the primary attack vector in security incidents. When we surveyed 500 technology leaders, we learned that more than 50% of organizations experienced an API security incident in the last 12 months. Adding to the growing magnitude of attacks, there is an increasing number of vectors for potential API security incidents, like misconfigurations, outdated APIs/data/components, and bots/spam/abuse.
These security issues aren’t just in production APIs, but at every stage in the API lifecycle. Notably, we found that 67% of the issues are discovered during testing as part of the release management process. This trend ushers in the need for forward-thinking organizations to “shift left with security” — moving controls earlier into the production workflow — by bringing security teams and API teams closer. To stay ahead of security threats, many organizations are actively looking for solutions that allow them to be proactive while minimizing the burden on their security teams. According to our research, integrating capabilities that proactively identify security threats (60%) is top of mind for most IT leaders for the next year.
It comes as no surprise that every organization is relying on APIs to expand and even ground their digital ecosystem — a network of partners, developers, and customers facilitated by modern, cloud-first technologies. There is a growing magnitude and variety of middleware assets, contributing to the growth of IT complexity.
As the number of APIs continues to increase, there is a need to simplify consumption for internal and external developers. Even the most objectively useful APIs remain unseen by most of the organization. In turn, this results in redundant code, reduced developer productivity, or worse, a potential security attack vector. This complexity is shifting focus toward consolidating all middleware assets, growing adoption, and improving education (see below) to improve developer efficiency and demystify the IT complexity.
This sprawl is a growing problem in the world of APIs, but it has a lot in common with an age-old phenomenon in the world of web pages and content—search. Google was born out of this problem to help organize the world’s information. Similar to Google’s knowledge graph for web pages, there is a need to index, organize, and instantly present API information for developers that need it. Although it is an emerging practice, we see an increasing number of digital leaders and security teams in larger organizations with mature API programs invest in solutions that help consolidate all APIs, organize their information, and manage their lifecycle.
APIs have taken on such a vital role in the modern application stack that they have slowly become the neural links across the entire enterprise architecture — bridging legacy and modern applications, shifting architectures towards microservices, and enabling operations across heterogeneous environments. To support all these technological decisions without sacrificing speed, organizations adopted multiple API gateways and fragmented API management solutions. However, this led to a lack of universal visibility, consistent governance, comprehensive security, and meaningful analytics across ALL the enterprise APIs (not just the ones within the confines of a given API management solution). And it increases the maintenance costs — fundamentally undercutting the value of APIs. With this evolution there is a growing need for an omni control plane — analogous to the brain in a human body — across all enterprise APIs.
Despite the clear need for governance, there is still no unified understanding on a good (or right) approach to API governance. With the rapid adoption of APIs without appropriate standardization or quality standards, API governance is top of mind for IT leaders, again.
According to our research, 45% of IT leaders identified API governance as a critical component of their API program. The top three components of API security, performance analytics, and governance demonstrate the critical need for visibility, quality, and security across all APIs.
As digital consumers, we have seen this phenomenon across many industries and digital products. For example AirBnB disrupted the short-term rental market by providing standardized listings, detailed information, and high-resolution photos. In fact, the same governance phenomenon is ubiquitous in the world of e-commerce where there is a clear correlation between a high-quality website or product listing and increasing sales.
The same analogy holds true in the world of APIs, as ~90% of developers use APIs in their work there is a direct correlation between the use of APIs and developer productivity. Digital officers and CIOs need to add appropriate governance controls to standardize API design and improve reuse without adding friction to development timelines.
Adoption of new API architectural styles and microservices has increased the complexity of the modern application stack. Our research found that 54% of organizations use a service mesh and API management in conjunction today to support the API gateway design pattern. In parallel, there is broad adoption of new protocols like GraphQL or AsyncAPI, outpacing the innovation in API gateways. For example, a recent survey from DZone found that GraphQL accounted for 22.7% of application integrations.
In response to this challenge, IT teams are adopting multiple API gateways by design, which creates the need for complex communication patterns for future scalability. The existing design patterns were mostly sufficient when client applications used homogeneous API protocols (for example, REST). Although patterns like Backend For Frontend (BFF) are intended to provide specific API interactions that are relevant on a per-client basis, they still do not account for the complexities of multiple gateways and protocols. With the adoption of new protocols, there is a need to evolve the existing BFF pattern to account for multiple API gateways and protocols.
A digital twin is an effectively indistinguishable virtual representation of a physical object, system, or a process. For example the digital twin of a wind turbine (the object being studied) might be used to capture data like performance, rpm (revolutions per minute), or output captured by various sensors outfitted on the turbine. Digital twin adoption is growing and McKinsey estimates investments in digital twins will reach $58 billion by 2026 with a 58% CAGR. Every digital twin uses APIs to monitor, engage, and possibly control the physical asset. For example, Google created the Digital Buildings project — an open source, Apache-licensed effort to manage applications and analyses between a large heterogeneous portfolio of buildings.
Sustainability is one of the driving forces behind the increased use of digital twins. As the need to reach net zero emissions accelerates, many organizations are tying performance (and in some cases even executive pay) to environmental, social, and governance goals. APIs help connect the dots between digital twins and sustainability. For example, an organization operating a manufacturing process could build a digital twin with APIs to collect behavioral data from sensors, monitoring systems, or other sources — which can eventually be integrated into the organization’s digital platform or applications. These digital twins could be used to analyze and optimize the use of materials and energy, to minimize waste and emissions. Additionally, digital twins could be used to monitor and analyze the performance of systems over time, to identify opportunities for continuous improvement.
Overall, APIs play a valuable role in supporting sustainability efforts by enabling digital twins, effectively driving more efficient operation of systems, and providing insights to improve environmental impact. For further examples, check out this video about driving a green value chain with APIs.
The growing use of data-rich services (like IoT, ML models, remote access services, and web scraping, etc.) coupled with massive ingestion of data everyday is creating massive growth in data delivery paradigms like data lakehouses, data marketplaces, and data streaming systems (global data marketplaces alone are poised to reach $3.5 billion by 2028). Unfortunately, most of these systems are fragmented with almost no relationship or interoperability.
APIs are filling this critical gap for organizations in two critical ways. First, APIs are providing standard and easy access to systems like data lakehouses or analytics hubs. Second, APIs are a key enabler of data products (digital products or services built using data as a core value proposition), a core component of any data sharing system. APIs provide a standardized way for different applications to interact with the data product. For example, an API could be used to allow a mobile app to access data from a weather forecast or a recommendation engine data product. Beyond data products, APIs also provide easy and standardized access to various data management platforms.
APIs continue to play a critical role in every application, experience, and ecosystem. Robust API strategies help organizations adapt to any architecture, business model, or environment in the face of changing technology landscape. Learn more about how Apigee is driving innovation and helping companies future proof their architectures to stay ahead of the top API trends.
Read More for the details.
While many organizations are driving digital transformation by migrating to the cloud, there are some industries, geographies, and use cases that require a different approach to cloud modernization. Regulated industries such as healthcare, insurance, pharmaceutical, energy, telecommunication, and banking have stringent data residency and sovereignty requirements. Other industries need to meet local data processing requirements, while others require real-time data processing with sub-millisecond latencies, for example to detect defects on manufacturing lines. These use cases demand a combination of edge, on-premises and cloud services for their infrastructure.
With these requirements in mind, Google Cloud launched Google Distributed Cloud powered by Anthos to extend the power of Google Cloud infrastructure and services to the edge (or closer). The underlying infrastructure for this service comes in two variants: a 42U rack filled with compute, storage, and networking devices called Google Distributed Cloud Edge Rack and a 1U appliance called Google Distributed Cloud Edge Appliance.
In this blog post, we discuss the Google Distributed Cloud Edge Appliance and how manufacturing, retail, and automotive industry verticals can use it to address common use cases.
But first, let’s talk about the Google Distributed Cloud Edge Appliance itself.
The Google Distributed Cloud Edge Appliance comprises two components: (1) Distributed Cloud Edge infrastructure and (2) the Distributed Cloud Edge service.
The Google Distributed Cloud Edge service runs on Google Cloud and serves as a control plane for the nodes and clusters running on your appliance. In order to perform remote management of the appliance and to collect metrics, the Distributed Cloud Edge service must be connected to Google Cloud at all times, allowing you to manage your workloads on the edge hardware through the Google Cloud Console. For customers who can’t be connected at all times for data residency or sovereignty reasons, we highly recommend that the appliance be connected to the cloud at least once a month to allow for needed security patches and updates.
Google Distributed Cloud Edge Appliances come with built-in network ports that provide connectivity to the control plane via the internet, Cloud VPN, or Dedicated Interconnect, and to your on-prem network. Each Google Distributed Cloud Edge Appliance is homed to a specific Google Cloud region but it is designed to also use any public Google Cloud endpoint to communicate with the control plane in Google Cloud, allowing you to move these appliances between different geographic locations.
There are two NFS shares on each appliance: one is offline, meaning it does not transfer data to Google Cloud, and the other is online, meaning data saved to that share is synced to Cloud Storage on Google Cloud for further processing. The appliance supports the Server Message Block (SMB) and Secure File Transfer Protocol (SFTP) protocols for communication.
Each Google Distributed Cloud Edge Appliance runs Google Distributed Cloud Virtual, enabling you to build a single-node Kubernetes cluster with access to the underlying file system of the appliance. This allows you to build containerized applications on the underlying appliance hardware to address use cases in the following verticals.
Now that you understand how the Google Distributed Cloud Edge Appliance is configured, let's consider some of the industry use cases where it can provide unique value.
In the manufacturing industry, quality control and safety are crucial factors. Businesses need to ensure products are manufactured to the highest standards to remain competitive in their markets, to retain customers, and to keep factory workers safe. To do this, manufacturers need real-time data about the products being manufactured on the production lines, ensuring quality control and gaining a real-time view of where people are on the factory floor.
In manufacturing environments, Google Distributed Cloud Edge Appliance can be used to detect hazards or manufacturing defects in real-time. Figure 1 is a reference architecture for a hazard detection solution running off a Google Distributed Cloud Edge Appliance on a factory floor.
In this architecture, cameras on the factory floor stream live video into the Google Distributed Cloud Edge Appliance. Depending on the number of cameras and appliances, cameras could be split or mapped to different appliances. This architecture makes it possible to initially transfer video data to Google Cloud using an online NFS share. Once in Google Cloud, you can use AutoML to train and build models that can be used as part of the hazard detection solution.
With these trained models, the cameras can stream video data into the appliance using the real-time streaming protocol (RTSP). You can then use AutoML inference to analyze the real-time video streaming data.
For example, in this reference architecture, if an individual comes too close to a forklift, the microservices running on the edge appliance trigger a function that pushes a notification either to a messaging service or to an enterprise resource planning tool. This alerts factory managers to factory floor hazards in real time so they can take corrective action.
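To make the event flow concrete, here is a heavily simplified sketch of the kind of loop the microservices on the appliance might run. It is not the reference architecture's actual implementation; the RTSP URL, detection model, labels, proximity threshold, and webhook endpoint are all placeholders.

```python
# Hypothetical hazard-detection loop; all names and thresholds are placeholders.
import cv2
import requests

RTSP_URL = "rtsp://camera-01.factory.local/stream"   # placeholder camera feed
ALERT_WEBHOOK = "https://example.com/hazard-alerts"  # placeholder messaging/ERP endpoint
PROXIMITY_THRESHOLD_PX = 150                          # assumed pixel-distance threshold

def detect_objects(frame):
    # Placeholder: in practice this would call the locally deployed model
    # (for example, an AutoML edge model) and return (label, x, y) detections.
    return []

def too_close(a, b, threshold=PROXIMITY_THRESHOLD_PX):
    # Simple pixel-space distance check between two detections.
    return ((a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5 < threshold

cap = cv2.VideoCapture(RTSP_URL)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    detections = detect_objects(frame)
    people = [d for d in detections if d[0] == "person"]
    forklifts = [d for d in detections if d[0] == "forklift"]
    if any(too_close(p, f) for p in people for f in forklifts):
        # Notify a messaging service or ERP tool so managers can act in real time.
        requests.post(ALERT_WEBHOOK, json={"event": "person_near_forklift"})
cap.release()
```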
You can also review messages and videos later on for preventive planning purposes, or push streamed videos to Cloud Storage for archive, to use the appliance’s storage space more efficiently.
Data transfers to Google Cloud can be done over Google Cloud Dedicated Interconnect, or VPN between the region and your site. This connectivity also allows you to send the appliance’s control-plane network traffic to the region.
You could also use the reference architecture in figure 2 for a product anomaly detection solution running off a Google Distributed Cloud Edge Appliance on a factory floor or manufacturing line. In this instance, machine learning models are trained to detect anomalies on finished products before final packaging.
In the retail industry, the Google Distributed Cloud Edge Appliance reference architecture in Figure 2 enables a number of transformative capabilities for retail operations, including:
contactless checkout
product scans
mobile-scan-bag
cashierless checkout
unattended retail shops
visual check-out monitoring
It does all this within a retailer’s facilities with the low latency and high throughput you need to process data locally, so you can obtain actionable insights from your data.
Or, you could use Google Distributed Cloud Edge Appliance at the edge to overhaul store management operations, for example, monitoring store occupancy, queue depth and wait times, detecting slips and falls and out-of-stock items, or monitoring inventory compliance.
Advanced Driver Assistance Systems (ADAS) are becoming standard in modern automobiles. To successfully build and roll out continued improvements around ADAS, the automotive industry continues to run extensive tests on ADAS systems that are built into the vehicles they manufacture. Automotive companies can use Google Distributed Cloud Edge Appliance to modernize and transform how they collect data for the ADAS systems they’re developing. For example, test vehicles contain several different sensors that generate data, which can be quickly offloaded to an in-vehicle edge appliance.
Then, within the appliance, you can deploy containerized workloads to transform sensor data, infer videos and images and detect events. This alleviates the need for operators to label all events and allows development teams to quickly gather insights from the tests.
If you want to focus on a subset of information, you can transfer specific data or the entire data payload into Transfer Appliances when vehicles return to the development center. All these systems, i.e., transfer appliances and edge appliances, work in tandem to reduce local system administration and operational costs through a cloud-based control plane.
This approach allows you to deploy, track, monitor and configure services that are running in data centers or at edge locations from the cloud. From the factories, the data can be moved offline or online into Google Cloud where you can use different storage classes and processing capabilities to further process or store the data. You can also deploy newly trained models and business rules back to the edge appliances. In all this, data transfers between the cloud and the appliance are performed using end-to-end encryption, to give you control over your data.
The reference architecture in Figure 3 shows an ADAS implementation where Google Distributed Cloud Edge Appliance is being used to gather, process and transform data at the edge in the automotive industry. It could also be applied to data capture and processing use cases in manned and unmanned vehicles. Notice how the Distributed Edge Appliance extends to the cloud by sending data there, or using other cloud-based services.
These are just a few of the use cases where organizations in the manufacturing, retail and automotive industries are using Google Distributed Cloud Edge Appliance with modern and containerized applications that are powered by Google Cloud. If you’re interested in bringing the power of Google Cloud to the edge using Google Distributed Cloud Edge Appliances to transform your business, reach out to us or any of our accredited partners.
Read More for the details.
After the momentum of COP27 and the increasing commitments to climate solutions from nations and corporations throughout 2022, we’re excited to enter a more sustainable 2023. As more organizations navigate the necessary, sustainability-driven business transformations on the horizon, they’re looking for new tools and technology to help them. We believe the global startup ecosystem plays a pivotal role in accelerating their progress, which is why we’re excited to kick-off 2023 with Google for Startups Accelerator: Climate Change programs, and encourage startups to apply to North America or Europe.
These programs will focus on identifying, supporting and scaling startups that are building technologies to combat climate change. This is the first time we’re running the program in Europe, and we’re looking forward to welcoming participants in our region at a critical time when technology can unblock and accelerate decarbonization of the economy.
Europe has committed to a climate neutral economy by 2050, but to achieve that goal, technology and increasing digitalization will be key. According to a recent study we commissioned from ICG, digital technology has the potential to positively impact all sectors of the economy in achieving decarbonization targets. European companies will be looking to startups for new technologies and solutions to accelerate their transformation.
Startups accepted into the cohort will be matched with our network of experts for advice, mentorship, and support across a range of domains and subject matters. The program will have an increased focus on cloud technology, artificial intelligence and machine learning to help early-stage participants advance their business and technology. Participants will hear talks from climate, sustainability and technology experts, get access to technical mentorship and support, and receive training on the Google Cloud tools that can help accelerate their progress.
This year promises to be a boom not only for climate tech, but also for technology that enables sustainable innovation across the value chain. There’s a huge opportunity for startups to address these needs, and only two weeks left to apply for the program, so don’t miss out!
Read More for the details.
As one of the world’s largest home improvement retailers, The Home Depot (THD) needs to seamlessly manage the flow of work across a wide variety of IT systems to provide first-class retail experiences for our customers both online and in-store. We began our cloud migration journey back in 2017, with early successes leading us to move more of our workloads to the cloud. This shift drove engineering and technical innovation across the company, but has also increased the need for greater governance to ensure compliance and strong cybersecurity.
Our “Cloud Enablement” team needed a workflow management system that could streamline cloud resource creation projects across the company’s engineering teams. Our main challenge was to create a frictionless developer experience while still securing workflows and enforcing best practices. Most importantly, we were searching for a fully managed solution to workflow automation and governance needs. This would help cloud project creation become more efficient and cost-effective, with developers able to focus on writing and coding the resources themselves, rather than on managing infrastructure. We also wanted our engineers to be able to quickly track and trace their workflows at all stages of the development process — especially when they failed to complete — to increase internal efficiencies related to troubleshooting and resource downtime.
After rigorous testing of potential workflow systems, The Home Depot determined Google Workflows was the ideal solution to meet our needs. Workflows is a fully managed Google Cloud service that enables service integration and orchestration, lightweight data and machine learning pipeline orchestration, and cloud platform automation. This low-code, serverless solution allows developers to focus on their core business needs, rather than on their infrastructure management, which enables them to facilitate the creation of more resilient cloud solutions. Not only can Workflows call any internal or external HTTP endpoint, its workflows are also durable — a workflow supports retries and can wait up to one year for operations to complete without incurring any charges. Other key features include support for human-in-the-middle approval flows and for parallel iteration and branching using atomic variables. Workflow executions can be triggered by an event, allowing developers to build applications and pipelines with event-driven architectures and allowing IT and cloud platform teams to trigger automation workflows based on an event, such as a request from a developer for cloud resources.
The Home Depot successfully launched our Galaxy system, which builds upon two previous generations of testing and development, by leveraging the powerful automation and orchestration functionalities of Workflows. One of the biggest benefits of Workflows’ processes is their composability. We can now deploy a microservices approach that enables our IT teams to use this new self-service system to quickly compose new cloud services by drawing from a series of pre-built and custom workflow tasks.
Google Workflows forms the backbone of THD’s Galaxy system, which orchestrates approvals in ServiceNow and cloud resource creation through a GitOps workflow that uses Terraform Enterprise. Galaxy’s user-friendly interface supports the speedy provisioning of cloud resources and makes valuable use of Workflows’ subworkflows feature to ensure authentication and authorization requirements are met and that common error handling issues can be easily resolved. The system even automates pull request flows and provides status endpoints for those requests to populate a real time tracker for developer feedback.
With this self-service Galaxy platform, The Home Depot’s developers can now easily integrate all the required subworkflows as part of their cloud project creation process. This enables developers to focus on writing the resources they need to tackle their business problems and service their customers’ needs, rather than losing valuable time rewriting generic request flows or securing all the necessary permissions for governance purposes. Although THD briefly explored authoring our own custom workflow automation engine, we realized that by utilizing Workflows we could easily achieve our workflow orchestration needs, all with GCP native services. Our developers are also able to draw upon Google’s existing bank of high-quality technical documentation and training resources, rather than expending developer time and resources producing our own custom documentation.
The Home Depot’s Galaxy system empowers our engineering teams by ensuring an accelerated path for the development and testing of new resources and services within our cloud community. Harnessing the efficiency and composability of Google Workflows, this new self-service system is already delivering feature parity with THD’s existing governance and workflow solutions. We are now turning our attention to building easy self-service paths to service web applications in multi- and single-tenant GKE clusters, while ensuring these resources remain fully conformant with THD’s security and compliance policies.
To learn more about Workflows, Google’s serverless orchestration engine, visit the Workflows landing page today, or go directly to the Cloud Console to try out three of its most common patterns.
Read More for the details.
Training ML models can be computationally expensive. If you’re training models on large datasets, you might be used to model training taking hours, or days, or even weeks. But it’s not just a large volume of data that can increase training time. Nonoptimal implementations such as an inefficient input pipeline or low GPU usage can dramatically increase your training time.
Making sure your programs are running efficiently and without bottlenecks is key to faster training. And faster training makes for faster iteration to reach your modeling goals. That’s why we’re excited to introduce the TensorFlow Profiler on Vertex AI, and share five ways you can gain insights into optimizing the training time of your model. Based on the open source TensorFlow Profiler, this feature allows you to profile jobs on the Vertex AI training service in just a few steps.
Let’s dive in and see how to set this feature up, and what insights you can gain from inspecting a profiling session.
Before you can use the TensorFlow Profiler, you’ll need to configure Vertex AI TensorBoard to work with your custom training job. You can find step by step instructions on this setup here. Once TensorBoard is set up, you’ll make a few changes to your training code, and your training job config.
Modify training code
First, you’ll need to install the Vertex AI Python SDK with the cloud_profiler plugin as a dependency for your training code. After installing the plugin, there are three changes you’ll make to your training application code.
First, you’ll need to import the cloud_profiler in your training script:
Then, you’ll need to initialize the profiler with cloud_profiler.init(). For example:
Finally, you’ll add the TensorBoard callback to your training loop. If you’re already a Vertex AI TensorBoard user, this step will look familiar.
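Putting the three changes together, a minimal sketch of a Keras training script might look like this; the model, dataset, and epoch count are placeholders, and the callback's log directory falls back to a placeholder bucket if Vertex AI's AIP_TENSORBOARD_LOG_DIR environment variable is not set.

```python
# Sketch of the three training-code changes; model, dataset, and epochs are placeholders.
import os
import tensorflow as tf
from google.cloud.aiplatform.training_utils import cloud_profiler  # 1. import the plugin

# 2. Initialize the profiler.
cloud_profiler.init()

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])  # placeholder model
model.compile(optimizer="adam", loss="mse")

# 3. Add the TensorBoard callback so profiles land where Vertex AI expects them.
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.environ.get("AIP_TENSORBOARD_LOG_DIR", "gs://your-bucket/logs"),
    histogram_freq=1,
)

# Placeholder dataset; replace with your real input pipeline.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((64, 10)), tf.random.uniform((64, 1)))
).batch(8)

model.fit(dataset, epochs=2, callbacks=[tensorboard_callback])
```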
You can see an example training script here in the docs.
Configure Custom Job
After updating your training code, you can create a custom job with the Vertex AI Python SDK.
Then, run the job specifying your service account and TensorBoard instance.
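As a rough sketch, assuming the training script above is saved as task.py; the project, region, bucket, container image, service account, and TensorBoard resource name below are placeholders rather than values from the original post.

```python
# Hypothetical job configuration; every resource name below is a placeholder.
from google.cloud import aiplatform

aiplatform.init(
    project="your-project",
    location="us-central1",
    staging_bucket="gs://your-staging-bucket",
)

job = aiplatform.CustomJob.from_local_script(
    display_name="profiler-example",
    script_path="task.py",  # the training script modified above
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-11.py310:latest",  # placeholder image
    requirements=["google-cloud-aiplatform[cloud_profiler]"],
)

job.run(
    service_account="training-sa@your-project.iam.gserviceaccount.com",
    tensorboard=(
        "projects/your-project/locations/us-central1/tensorboards/your-tensorboard-id"
    ),
)
```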
Capture Profile
Once you launch your custom job, you’ll be able to see it in the Custom jobs tab on the Training page.
When your training job is in the Training / Running state, a new experiment will appear on the Experiments page. Click it to open your TensorBoard instance.
Once you’re there, go to the Profiler tab and click Capture profile. In the Profile Service URL(s) or TPU name field, enter workerpool0-0.
Select IP address for the Address type, then click CAPTURE.
Note that you can only complete the above steps when your job is in the Training/Running state.
Once you’ve captured a profile, there are numerous insights you can gain from analyzing the hardware resource consumption of the various operations in your model. These insights can help you to resolve performance bottlenecks and, ultimately, make the model execute faster.
The TensorFlow Profiler provides a lot of information and it can be difficult to know where to start. So to make things a little easier, we’ve outlined five ways you can get started with the profiler to better understand your training jobs.
Get a high level understanding of performance with the overview page
The TensorFlow Profiler includes an overview page that provides a summary of your training job performance.
Don’t get overwhelmed by all the information on this page! There are three key numbers that can tell you a lot: Device Compute Time, TF Op placement, and Device Compute Precision.
The device compute time lets you know how much of the step time is from actual device execution. In other words, how much time did your device(s) spend on the computation of the forward and backward passes, as opposed to sitting idle waiting for batches of data to be prepared. In an ideal world, most of the step time should be spent on executing the training computation instead of waiting around.
The TF op placement tells you the percentage of ops placed on the device (e.g., GPU) versus the host (CPU). In general, you want more ops on the device because that will be faster.
Lastly, the device compute precision shows you the percentage of computations that were 16-bit vs. 32-bit. Today, most models use the float32 dtype, which takes 32 bits of memory. However, there are two lower-precision dtypes, float16 and bfloat16, which take 16 bits of memory instead. Modern accelerators can run operations faster in the 16-bit dtypes. If reduced accuracy is acceptable for your use case, you can consider using mixed precision by replacing more of the 32-bit ops with 16-bit ops to speed up training time.
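If reduced precision fits your use case, one low-effort way to experiment is Keras mixed precision. This is standard TensorFlow functionality rather than anything specific to the Profiler, and the small model below is only illustrative.

```python
import tensorflow as tf

# Run most ops in float16 while keeping variables in float32 for numeric stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    # Keep the final layer's output in float32 so the loss stays numerically stable.
    tf.keras.layers.Dense(1, dtype="float32"),
])
```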
You’ll notice that the summary section also provides some recommendations for next steps. So in the following sections we’ll take a look at some more specialized profiler features that can help you to debug.
Deep dive into the performance of your input pipeline
After taking a look at the overview page, a great next step is to evaluate the performance of your input pipeline, which generally includes reading the data, preprocessing the data, and then transferring data from the host (CPU) to the device (GPU/TPU).
GPUs and TPUs can reduce the time required to execute a single training step. But achieving high accelerator utilization depends on an efficient input pipeline that delivers data for the next step before the current step has finished. You don’t want your accelerators sitting idle as the host prepares batches of data!
The TensorFlow Profiler provides an Input-pipeline analyzer that can help you determine if your program is input bound. For example, the profile shown here indicates that the training job is highly input bound. Over 80% of the step time is spent waiting for training data. By preparing the batches of data before the next step is finished, you can reduce the amount of time each step takes, thus reducing total training time overall.
Input-pipeline analyzer
This section of the profiler also provides more insights into the breakdown of step time for both the device and host.
For the device-side graph, the red area corresponds to the portion of the step time the devices were sitting idle waiting for input data from the host. The green area shows how much of the time the device was actually working. So a good rule of thumb here is that if you see a lot of red, it’s time to debug your input pipeline!
The Host-side analysis graph shows you the breakdown of processing time on the CPU. For example, the graph shown here is majority green indicating that a lot of time is being spent preprocessing the data. You could consider performing these operations in parallel or even preprocess the data offline.
The Input-pipeline analyzer even provides specific recommendations. But to learn more about how you can optimize your input pipeline, check out this guide or refer to the tf.data best practices doc.
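As one common pattern from those guides, parallelizing the map step and prefetching the next batch often helps an input-bound job. The preprocess function and source dataset below are placeholders.

```python
import tensorflow as tf

def preprocess(example):
    # Placeholder per-example transformation (decoding, augmentation, etc.).
    return example

dataset = (
    tf.data.Dataset.range(1_000)  # placeholder source dataset
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # parallelize preprocessing
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # overlap data preparation with training
)
```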
Use the trace viewer to maximize GPU utilization
The profiler provides a trace viewer, which displays a timeline that shows the durations for the operations that were executed by your model, as well as which part of the system (host or device) the op was executed. Reading traces can take a bit of time to get used to, but once you do you’ll find that they are an incredibly powerful tool for understanding the details of your program.
When you open the trace viewer, you’ll see a trace for the CPU and for each device. In general, you want to see the host execute input operations like preprocessing training data and transferring it to the device. On the device, you want to see the ops that relate to actual model training.
On the device, you should see timelines for three streams:
Stream 13 is used to launch compute kernels and device-to-device copies
Stream 14 is used for host-to-device copies
Stream 15 is used for device-to-host copies
Trace viewer streams
In the timeline, you can see the duration for your training steps. A common observation when your program is not running optimally is gaps between training steps. In the image of the trace view below, there is a small gap between the steps.
Trace viewer steps
But if you see a large gap as shown in the image below, your GPU is idle during that time. You should double check your input pipeline, or make sure you aren’t doing unnecessary calculations at the end of each step (such as executing callbacks).
For more ways to use the trace viewer to understand GPU performance, check out the guide in the official TensorFlow docs.
Debug OOM issues
If you suspect your training job has a memory leak, you can diagnose it on the memory profile page. In the breakdown table you can see the active memory allocations at the point of peak memory usage in the profiling interval.
In general, it helps to maximize the batch size, which will lead to higher device utilization, and if you’re doing distributed training, amortize the costs of communication across multiple GPUs. Using the memory profiler helps get a sense of how close your program is to peak memory utilization.
Optimize gradient AllReduce for distributed training jobs
If you’re running a distributed training job and using a data parallelism algorithm, you can use the trace viewer to help optimize the AllReduce operation. For synchronous data parallel strategies, each GPU computes the forward and backward passes through the model on a different slice of the input data. The computed gradients from each of these slices are then aggregated across all of the GPUs and averaged in a process known as AllReduce. Model parameters are updated using these averaged gradients.
When going from training with a single GPU to multiple GPUs on the same host, ideally you should experience the performance scaling with only the additional overhead of gradient communication and increased host thread utilization. Because of this overhead, you will not have an exact 2x speedup if you move from 1 to 2 GPUs, for example.
You can check the GPU timeline in your program’s trace view for any unnecessary AllReduce calls, as this results in a synchronization across all devices. But you can also use the trace viewer to get a quick check as to whether the overhead of running a distributed training job is as expected, or if you need to do further performance debugging.
The time to AllReduce should be:
(number of parameters × 4 bytes) / (communication bandwidth)
Note that each model parameter is 4 bytes in size since TensorFlow uses fp32 (float32) to communicate gradients. Even when you have fp16 enabled, NCCL AllReduce utilizes fp32 parameters. You can get the number of parameters in your model from Model.summary.
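As a back-of-the-envelope example (the parameter count and bandwidth below are assumptions, not measurements from a real job):

```python
# Rough estimate of the expected time for one gradient AllReduce.
num_parameters = 25_000_000        # e.g., a ~25M-parameter model (assumption)
bytes_per_parameter = 4            # fp32 gradients
bandwidth_bytes_per_sec = 32e9     # assumed ~32 GB/s effective interconnect bandwidth

expected_allreduce_sec = (num_parameters * bytes_per_parameter) / bandwidth_bytes_per_sec
print(f"Expected AllReduce time: {expected_allreduce_sec * 1e3:.2f} ms")  # ~3.13 ms
```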
If your trace indicates that the time to AllReduce was much longer than this calculation, that means you’re incurring additional and likely unnecessary overheads.
The TensorFlow Profiler is a powerful tool that can help you to diagnose and debug performance bottlenecks, and make your model train faster. Now you know five ways you can use this tool to understand your training performance. To get a deeper understanding of how to use the profiler, be sure to check out the GPU guide and data guide from the official TensorFlow docs. It’s time for you to profile some training jobs of your own!
Read More for the details.
Application Auto Scaling now offers customers more visibility into the scaling decisions it makes for an auto scaled resource. Application Auto Scaling (AAS) is a service that offers a standardized scaling experience across 13 AWS services beyond Amazon EC2, for example Amazon DynamoDB provisioned read and write capacity and Amazon Elastic Container Service (ECS) services. Application Auto Scaling takes scaling actions based on customer-defined scaling policies that act as a guideline for scaling decisions. Until now, customers only got details about successful scaling actions, not about deferred ones. With this feature, customers get more insight into scaling decisions that do not lead to a scaling action, in both descriptive and machine-readable formats.
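For example, with boto3 you might pull these insights for a hypothetical ECS service as sketched below. The resource ID is a placeholder, and the IncludeNotScaledActivities parameter and NotScaledReasons field names reflect our reading of the updated API, so confirm them against the current reference.

```python
# Sketch: inspect why a scaling action was (or was not) taken for an ECS service.
import boto3

client = boto3.client("application-autoscaling")

response = client.describe_scaling_activities(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",  # placeholder resource
    IncludeNotScaledActivities=True,             # also return deferred/not-scaled decisions
)

for activity in response["ScalingActivities"]:
    print(activity["StatusCode"], activity.get("NotScaledReasons", []))
```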
Read More for the details.
Starting today, Amazon CloudWatch Logs is removing the 5 requests per second log stream quota when calling Amazon CloudWatch Logs PutLogEvents API. There will be no new per log stream quota. With this change we have removed the need for splitting your log ingestion across multiple log streams to prevent log stream throttling.
Read More for the details.
Amazon Kinesis Data Streams for Amazon DynamoDB is now available in 11 additional AWS Regions around the world. With Amazon Kinesis Data Streams, you can capture item-level changes in your DynamoDB tables as a Kinesis data stream with a single click in the DynamoDB console, or by using the AWS API, CLI or CloudFormation templates.
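For example, enabling the stream from the AWS API with boto3 might look like the sketch below, where the table name and stream ARN are placeholders.

```python
# Sketch: turn on Kinesis Data Streams for a DynamoDB table (names are placeholders).
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.enable_kinesis_streaming_destination(
    TableName="my-table",
    StreamArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-ddb-changes",
)
```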
Read More for the details.
Amazon Aurora MySQL Version 3 (Compatible with MySQL 8.0) now offers support for Backtrack. Backtrack allows you to move your database to a prior point in time without needing to restore from a backup, and it completes within seconds, even for large databases.
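As a rough boto3 sketch, assuming Backtrack was enabled on the cluster and using a placeholder cluster identifier:

```python
# Sketch: backtrack an Aurora MySQL 3 cluster to a point ten minutes in the past.
import boto3
from datetime import datetime, timedelta, timezone

rds = boto3.client("rds")

rds.backtrack_db_cluster(
    DBClusterIdentifier="my-aurora-mysql3-cluster",                  # placeholder cluster
    BacktrackTo=datetime.now(timezone.utc) - timedelta(minutes=10),  # target point in time
)
```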
Read More for the details.
This insight provides information on key items of clothing worn by individuals within a video and the timestamps at which the clothing appears.
Read More for the details.
In December 2022, the following updates and enhancements were made for Azure Backup for SAP HANA, a Backint-certified database backup solution for SAP HANA databases in Azure VMs. Long-term retention for…
Read More for the details.
The ITSM connector provides a bi-directional connection between Azure and ITSM tools to help track and resolve issues faster.
Read More for the details.