GCP – Announcing partner-delivered professional services on Google Cloud Marketplace
Read More for the details.
Amazon EC2 now supports automated recovery of Microsoft SQL Server databases from Volume Shadow Copy Services (VSS) based EBS snapshots. Customers can use an AWS Systems Manager Automation Runbook and specify a restore point in time to automate the recovery process without needing to stop a running Microsoft SQL Server database.
Volume Shadow Copy Services (VSS) allows application data to be backed up while applications are still running. This new feature allows customers to automate the recovery from VSS-based EBS snapshots and ensure rapid recovery of large databases within minutes. This feature also offers customers the flexibility to restore to a new database or achieve point-in-time recovery.
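As a rough sketch of how such an automation might be started programmatically, the boto3 call below kicks off an SSM automation execution. The runbook document name and parameter names shown here are placeholders rather than the actual AWS-published runbook; see the linked user guide for the exact runbook and inputs.

import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Start the VSS-based restore automation. The document and parameter names
# below are illustrative placeholders -- use the runbook and inputs from the
# Microsoft SQL Server on Amazon EC2 User Guide.
response = ssm.start_automation_execution(
    DocumentName="AWSEC2-ExampleRestoreSqlServerFromVssSnapshot",  # placeholder
    Parameters={
        "InstanceId": ["i-0123456789abcdef0"],
        "RestoreDateTime": ["2025-02-01T03:00:00Z"],  # desired restore point in time
    },
)
print(response["AutomationExecutionId"])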
This feature is available in all commercial AWS Regions and the AWS GovCloud (US) Regions.
To learn more, visit this technical document in the Microsoft SQL Server on Amazon EC2 User Guide.
Read More for the details.
AWS CodeBuild now offers native support for self-hosted Buildkite runners, enabling you to execute Buildkite pipeline jobs within the CodeBuild environment. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages ready for deployment.
Buildkite is a continuous integration and continuous delivery platform. With this feature, your Buildkite jobs can access CodeBuild’s comprehensive suite of instance types and managed images, and utilize native integrations with AWS services. You have control over the build environment, without the overhead of manually provisioning and scaling the compute resources.
The Buildkite runner feature is available in all regions where CodeBuild is offered. For more information about the AWS Regions where CodeBuild is available, see the AWS Regions page.
To use the self-hosted Buildkite runners, follow the tutorial to set up a runner project in CodeBuild. To learn more about how to get started with CodeBuild, visit the AWS CodeBuild product page.
Read More for the details.
A new minor version of Microsoft SQL Server is now available on Amazon RDS for SQL Server, providing performance enhancements and security fixes. Amazon RDS for SQL Server now supports this latest minor version of SQL Server 2019 across the Express, Web, Standard, and Enterprise editions.
We encourage you to upgrade your Amazon RDS for SQL Server database instances at your convenience. You can upgrade with just a few clicks in the Amazon RDS Management Console or by using the AWS CLI. Learn more about upgrading your database instances from the Amazon RDS User Guide. The new minor version is SQL Server 2019 CU30 – 15.0.4415.2.
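As an illustration of the CLI/SDK path, the boto3 sketch below lists the available engine versions and then requests the minor version upgrade. The engine version string shown (15.00.4415.2.v1, the usual RDS format for this CU) is an assumption; confirm it against the describe call before applying.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Confirm the exact engine version string for SQL Server 2019 CU30.
versions = rds.describe_db_engine_versions(Engine="sqlserver-se")
print([v["EngineVersion"] for v in versions["DBEngineVersions"]])

# Request the in-place minor version upgrade; set ApplyImmediately=False to
# defer it to the next maintenance window instead.
rds.modify_db_instance(
    DBInstanceIdentifier="my-sqlserver-instance",
    EngineVersion="15.00.4415.2.v1",  # assumed identifier for CU30 -- verify above
    ApplyImmediately=True,
)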
This minor version is available in all AWS commercial Regions where Amazon RDS for SQL Server databases are available, as well as in the AWS GovCloud (US) Regions.
Amazon RDS for SQL Server makes it simple to set up, operate, and scale SQL Server deployments in the cloud. See Amazon RDS for SQL Server Pricing for pricing details and regional availability.
Read More for the details.
Amazon Connect now provides the ability to choose which states an agent can be in when adhering to their schedule, making it easier for you to customize adherence tracking to match your unique operational needs. With this launch, you can now define custom mappings between agent statuses and schedule activities. For example, schedule activity “Work” can be mapped to multiple agent statuses such as “Available” and “Back-office work.” An agent scheduled for “Work” from 8 AM to 10 AM will be considered adherent if they are either in “Available” or “Back-office work” status. Additionally, you can now view the actual name of the scheduled activity in the real-time adherence dashboard (as opposed to only Productive/Non-productive). With custom mappings and an enhanced real-time dashboard, this launch provides more accurate and flexible agent adherence monitoring.
This feature is available in all AWS Regions where Amazon Connect agent scheduling is available. To learn more about Amazon Connect agent scheduling, click here.
Read More for the details.
Written By: Jacob Paullus, Daniel McNamara, Jake Rawlins, Steven Karschnia
Mandiant exploited flaws in the Microsoft Software Installer (MSI) repair action of Lakeside Software’s SysTrack installer to obtain arbitrary code execution.
An attacker with low-privilege access to a system running the vulnerable version of SysTrack could escalate privileges locally.
Mandiant responsibly disclosed this vulnerability to Lakeside Software, and the issue has been addressed in version 11.0.
Building upon the insights shared in a previous Mandiant blog post, Escalating Privileges via Third-Party Windows Installers, this case study explores the ongoing challenge of securing third-party Windows installers. These vulnerabilities are rooted in insecure coding practices when creating Microsoft Software Installer (MSI) Custom Actions and can be caused by references to missing files, broken shortcuts, or insecure folder permissions. These oversights create gaps that inadvertently allow attackers the ability to escalate privileges.
As covered in our previous blog post, after software is installed with an MSI file, Windows caches the MSI file in the C:\Windows\Installer folder for later use. This allows users on the system to access and use the “repair” feature, which is intended to address various issues that may be impacting the installed software. During execution of an MSI repair, several operations (such as file creation or execution) may be triggered from an NT AUTHORITY\SYSTEM context, even if initiated by a low-privilege user, thereby creating privilege escalation opportunities.
This blog post specifically focuses on the discovery and exploitation of CVE-2023-6080, a local privilege escalation vulnerability that Mandiant identified in Lakeside Software’s SysTrack Agent version 10.7.8.
Mandiant began by using Microsoft’s Process Monitor (ProcMon) to analyze and review file operations executed during the repair process of SysTrack’s MSI. While running the repair process as a low-privileged user, Mandiant observed file creation and execution within the user’s %TEMP% folder from MSIExec.exe.
Figure 1: MSIExec.exe copying and executing .tmp file in user’s %TEMP% folder
Each time Mandiant ran the repair functionality, MSIExec.exe wrote a new .tmp file to the %TEMP% folder using a formula-based name, and then executed it. Mandiant discovered, through dynamic analysis of the installer, that the name generated by the repair function would consist of the string “wac” followed by four randomly chosen hex characters (0-9, A-F). With this naming scheme, there were 65,536 possible filename options.
Due to the %TEMP% folder being writable by a low-privilege user, Mandiant tested the behavior of the repair tool when all possible filenames already existed within the %TEMP% folder. Mandiant created a PowerShell script to copy an arbitrary test executable to each possible file name in the range of wac0000.tmp to wacFFFF.tmp.
# Path to the permutations file
$csvFilePath = '.\permutations.csv'
# Path to the executable
$exePath = '.\test.exe'
# Target directory (using the system's temp directory)
$targetDirectory = [System.IO.Path]::GetTempPath()
# Read the CSV file content
$csvContent = Get-Content -Path $csvFilePath
# Split the content into individual values
$values = $csvContent -split ","
# Loop through each value and copy the exe to the target directory with the new name
foreach ($value in $values) {
    $newFilePath = Join-Path -Path $targetDirectory -ChildPath ($value + ".tmp")
    Copy-Item -Path $exePath -Destination $newFilePath
}
Write-Output "Copy operation completed to $targetDirectory"
Figure 2: Creating all possible .tmp files in %TEMP%
Figure 3: Excerpt of .tmp files created in %TEMP%
After filling the previously identified namespace, Mandiant reran the MSI repair function to observe its subsequent behavior. Upon review of the ProcMon output, Mandiant observed that when the namespace was filled, the application would fail over to an incrementing filename pattern. The pattern began with wac1.tmp and incremented the number each time in a predictable pattern, if the previous file existed. To prove this theory, Mandiant manually created wac1.tmp and wac2.tmp, then observed the MSI repair action in ProcMon. When running the MSI repair function, the resulting filename was wac3.tmp.
Figure 4: MSIExec.exe writing and executing a predicted .tmp file
Additionally, Mandiant observed that there was a small delay between the file write action and the file execution action, which could potentially result in a race condition vulnerability. Since Mandiant could now force the program to use a predetermined filename, Mandiant wrote another PowerShell script designed to attempt to win the race condition by copying a file (test.exe) to the %TEMP% folder, using the predicted filename, between the file write and execution in order to overwrite the file created by MSIExec.exe. In this test, test.exe was a simple proof-of-concept executable that would start notepad.exe.
while ($true) {
    if (Test-Path -Path "C:\Users\USER\AppData\Local\Temp\wac3.tmp") {
        Copy-Item -Path "C:\Users\USER\Desktop\test.exe" -Destination "C:\Users\USER\AppData\Local\Temp\wac3.tmp" -Force
    }
}
Figure 5: PowerShell race condition script to copy arbitrary file into %TEMP%
With the %TEMP% folder prepared with the wac1.tmp and wac2.tmp files created, Mandiant ran both the PowerShell script and the MSI repair action targeting wac3.tmp. With the race condition script running, execution of the repair action resulted in the test.exe file overwriting the intended binary and subsequently being executed by MSIExec.exe, opening cmd.exe as NT AUTHORITY\SYSTEM.
Figure 6: Obtaining NT AUTHORITY\SYSTEM command prompt
As discussed in Mandiant’s previous blog post, misconfigured Custom Actions can be trivial to find and exploit, making them a significant security risk for organizations. It is essential for software developers to follow secure coding practices and review their implemented Custom Actions to prevent attackers from hijacking high-privilege operations triggered by the MSI repair functionality. Refer to the original blog post for general best practices when configuring Custom Actions. During the discovery of CVE-2023-6080, Mandiant identified several misconfigurations and oversights that allowed for privilege escalation to NT AUTHORITY\SYSTEM.
The SysTrack MSI performed file operations including creation and execution in the user’s %TEMP% folder, which provides a low-privilege user the opportunity to alter files being actively used in a high-privilege context. Software developers should keep folder permissions in mind and ensure all privileged file operations are performed from folders that are appropriately secured. This can include altering the read/write permissions for the folder, or using built-in folders such as C:\Program Files or C:\Program Files (x86), which are inherently protected from low-privilege users.
Additionally, the software’s filename generation schema included a failover mechanism that allowed an attacker to force the application into using a predetermined filename. When using randomized filenames, developers should use a sufficiently large length to ensure that an attacker cannot exhaust all possible filenames and force the application into unexpected behavior. In this case, knowing the target filename before execution made it significantly easier to beat the race condition, as opposed to dynamically identifying and replacing the target file between the time of its creation by MSIExec.exe and the time of its execution.
Something security professionals must also consider is the safety of the programs running on corporate machines. Many approved applications may inadvertently contain security vulnerabilities that increase the risk in our environments. Mandiant recommends that companies consider auditing the security of their individual endpoints to ensure that defense in depth is maintained at an organizational level. Furthermore, where possible, companies should monitor the spawning of administrative shells such as cmd.exe and powershell.exe in an elevated context to alert on possible privilege escalation attempts.
Domain privilege escalation is often the focus of security vendors and penetration tests, but it is not the only avenue for privilege escalation or compromise of data integrity in a corporate environment. Compromise of integrity on a single system can allow an attacker to mount further attacks throughout the network; for example, the Network Access Account used by SCCM can be compromised through a single workstation and, when misconfigured, can be used to escalate privileges within the domain and pivot to additional systems within the network.
Mandiant offers dedicated endpoint security assessments, during which customer endpoints are tested from multiple contexts, including the perspective of an adversary with low-privilege access attempting to escalate privileges. For more information about Mandiant’s technical consulting services, including comprehensive endpoint security assessments, visit our website.
We would like to extend a special thanks to Andrew Oliveau, who was a member of the testing team that discovered this vulnerability during his time at Mandiant.
June 13, 2024 – Vulnerability reported to Lakeside Software
July 1, 2024 – Lakeside Software confirmed the vulnerability
August 7, 2024 – Confirmed vulnerability fixed in version 11.0
Read More for the details.
AWS Transfer Family web apps are now available in the following additional Regions: North America (N. California, Canada West, Canada Central), South America (São Paulo), Europe (London, Paris, Zurich, Milan, Spain), Africa (Cape Town), Israel (Tel Aviv), Middle East (Bahrain, UAE), and Asia Pacific (Osaka, Hong Kong, Hyderabad, Jakarta, Melbourne, Seoul, Mumbai). This expansion allows you to create Transfer Family web apps in additional commercial Regions where Transfer Family is available.
AWS Transfer Family web apps provide a simple interface for accessing your data in Amazon S3 through a web browser. With Transfer Family web apps, you can provide your workforce with a fully managed, branded, and secure portal for your end users to browse, upload, and download data in S3.
To learn more about AWS Transfer Family web apps, read our blog and visit the Transfer Family User Guide. For complete regional availability information, see the AWS Region Table.
Read More for the details.
Dashboard Q&A by Amazon Q in QuickSight enables QuickSight Authors to add data Q&A to their dashboards in one click. With dashboard Q&A, QuickSight users can ask and answer questions about their data using natural language.
Dashboard Q&A automatically extracts the semantic information presented in a dashboard and uses it to enable Q&A over that dashboard’s data. It also improves existing Topic-based Q&A experiences by applying those dashboard semantics to Q&A answers. With dashboard Q&A, Authors can quickly deliver self-service access to customized data insights for the entire organization.
Dashboard Q&A is launching in all regions where QuickSight’s generative data Q&A is available today, as documented here.
To learn more, visit our documentation.
Read More for the details.
Amazon Elastic Block Store (Amazon EBS) now supports additional resource-level permissions for creating EBS volumes from snapshots. With this launch, you now have more granular controls to set resource-level permissions for the creation of a volume and the selection of the source snapshot when calling the CreateVolume action in your IAM policy. This allows you to control which IAM identities can create EBS volumes from source snapshots, and the conditions under which they can use those snapshots to create EBS volumes.
To meet your specific permission needs on the source snapshots, you can also specify any of 5 EC2-specific condition keys in your IAM policy: ec2:Encrypted, ec2:VolumeSize, ec2:Owner, ec2:ParentVolume, and ec2:SnapshotTime. Additionally, you can use global condition keys for the source snapshot.
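As a sketch of how these keys might be combined (the account ID, Region, and size cap below are made up for illustration, and the exact statement shape should be checked against the launch blog), an identity-based policy could allow CreateVolume only for encrypted volumes under a size limit, created from snapshots owned by a trusted account:

import json
import boto3

# Illustrative policy: the volume statement constrains encryption and size,
# while the snapshot statement constrains which snapshots may be used.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:CreateVolume",
            "Resource": "arn:aws:ec2:us-east-1:111122223333:volume/*",
            "Condition": {
                "Bool": {"ec2:Encrypted": "true"},
                "NumericLessThanEquals": {"ec2:VolumeSize": "500"},
            },
        },
        {
            "Effect": "Allow",
            "Action": "ec2:CreateVolume",
            "Resource": "arn:aws:ec2:us-east-1::snapshot/*",
            "Condition": {
                "StringEquals": {"ec2:Owner": "111122223333"},
            },
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="restrict-createvolume-from-snapshots",
    PolicyDocument=json.dumps(policy_document),
)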
This new resource-level permission model is available in all AWS Regions where EBS volumes are available. To learn more about using resource-level permissions to create EBS volumes, or about transitioning from the previous permission model to the new one, please visit the launch blog. For more information about Amazon EBS, please visit the product page.
Read More for the details.
Today, Amazon Q Developer announces an improved software development agent capable of running build and test scripts on generated code to validate the code before developers review it. This new capability detects errors, ensures generated code is in sync with the project’s current state, and accelerates the development process by producing higher-quality code on the first iteration.
With the developer’s natural language input request and project-specific context, the Amazon Q Developer agent is designed to assist in implementing complex multi-file features and bug fixes. The agent will analyze the existing codebase, make necessary code changes, and run the selected build and test commands to ensure the code is working as expected. Where errors are found, the agent will iterate on the code prior to requesting the developer’s review. Throughout the process, the agent maintains a real-time connection with the developer, providing updates as changes are made. With control over what commands Amazon Q runs through a Devfile, you can customize the development process for better accuracy.
The Amazon Q Developer agent for software development is available for JetBrains and Visual Studio Code IDEs in all AWS regions where Q Developer is available.
To learn more about Amazon Q Developer, visit the service overview page. For more details about this announcement and how to get started using the Amazon Q Developer agent for software development, read the AWS DevOps & Developer Productivity blog.
Read More for the details.
AWS Deadline Cloud now includes the ability to specify a limit for a specific resource, like a floating license, and also constrain the maximum number of workers that work on a job. AWS Deadline Cloud is a fully managed service that simplifies render management for teams creating computer-generated graphics and visual effects, for films, television and broadcasting, web content, and design.
By adding a limit to your Deadline Cloud farm, you can specify a maximum amount of concurrent usage of resources by workers in your farm. Capping resource usage ensures tasks don’t start until the resources they need are available. For example, if you have 50 floating licenses for a particular plugin required by your rendering workflow, a Deadline Cloud limit lets you ensure that no more than 50 tasks requiring that license run at once, preventing tasks from failing because a license is unavailable. Additionally, setting a maximum number of workers on a job enables you to prevent any single job from consuming all the available workers, so that you can efficiently run multiple jobs concurrently when only a limited number of workers are available.
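A rough boto3 sketch of defining such a limit is shown below; the operation exists in the Deadline Cloud API, but the parameter names used here (notably amountRequirementName and maxCount) are assumptions to verify against the API reference.

import boto3

deadline = boto3.client("deadline", region_name="us-east-1")

# Cap concurrent usage of a floating plugin license at 50 across the farm.
# Parameter names are assumptions -- check the Deadline Cloud API reference.
response = deadline.create_limit(
    farmId="farm-0123456789abcdef0123456789abcdef",  # illustrative farm ID
    displayName="render-plugin-floating-licenses",
    amountRequirementName="amount.limit.render-plugin-license",
    maxCount=50,
)
print(response["limitId"])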
Limits are available in all AWS Regions where Deadline Cloud is available.
To learn more, visit the AWS Deadline Cloud documentation.
Read More for the details.
Amazon Connect now includes the ability for agents to schedule time off up to 24 months in the future, making it easier for managers and agents to plan ahead of time. With this launch, agents can now book time off in Connect up to 24 months ahead of time (an increase from 13 months). Additionally, you can now upload pre-approved time off windows for a scheduling group (group allowance) for up to 27 months at a time (an increase from 13 months). These increased limits provide agents more flexibility to plan their personal time and also provide managers better visibility into future staffing needs, thus enabling more efficient resource allocation.
This feature is available in all AWS Regions where Amazon Connect agent scheduling is available. To learn more about Amazon Connect agent scheduling, click here.
Read More for the details.
Amazon AppStream 2.0 now allows administrators to control whether admin consent is required when users link their OneDrive for Business accounts as a persistent storage option.
The new capability simplifies the management of AppStream 2.0 persistent storage and the admin consent process. After enabling OneDrive for Business for an AppStream 2.0 stack and specifying the OneDrive domains, administrators can now configure whether admin consent is needed for each OneDrive domain. If admin consent is required, administrators must approve users’ OneDrive connections within their Azure Active Directory environment when users attempt to link their account to AppStream 2.0.
This feature is available at no additional cost in all AWS Regions where AppStream 2.0 is offered. It is supported only on AppStream stacks using single-session Windows fleets.
To get started, open the AppStream 2.0 console and create a stack. In the Enable storage step, enable OneDrive for Business and configure the admin consent settings. For more details, refer to Administer OneDrive for Business. You can also programmatically manage the setting using AppStream 2.0 APIs. For API details, see the CreateStack API documentation.
Read More for the details.
AWS Glue announces 14 new connectors for applications, expanding its connectivity portfolio. Customers can now use AWS Glue native connectors to ingest data from Blackbaud Raiser’s Edge NXT, CircleCI, Docusign Monitor, Domo, Dynatrace, Kustomer, Mailchimp, Microsoft Teams, Monday, Okta, Pendo, Pipedrive, Productboard and Salesforce Commerce Cloud.
As enterprises increasingly rely on data-driven decisions, they need to integrate with data from various applications. With 14 new connectors, customers have more options to easily establish a connection to their applications using the AWS Glue console or AWS Glue APIs without the need to learn application-specific APIs. Glue native connectors provide the scalability and performance of the AWS Glue Spark engine along with support for standard authorization and authentication methods like OAuth 2. With these connectors, customers can test connections, validate their connection credentials, preview data, and browse metadata.
AWS Glue native connectors to Blackbaud, CircleCI, Docusign Monitor, Domo, Dynatrace, Kustomer, Mailchimp, Microsoft Teams, Monday, Okta, Pendo, Pipedrive, Productboard, Salesforce Commerce Cloud are available in all AWS commercial regions.
To get started, create new AWS Glue connections with these connectors and use them as a source in AWS Glue Studio. To learn more, visit the AWS Glue documentation for connectors.
Read More for the details.
Amazon RDS Custom for SQL Server now offers enhanced storage and performance capabilities, supporting up to 64 TiB of storage and 256,000 I/O operations per second (IOPS) with io2 Block Express volumes. This represents an improvement from the previous limit of 16 TiB and 64,000 IOPS with io2 Block Express. These enhancements enable transactional databases and data warehouses to handle larger workloads on a single Amazon RDS Custom for SQL Server database instance.
Support for 64 TiB and 256,000 IOPS with io2 Block Express for Amazon RDS Custom for SQL Server is now generally available in all AWS Regions where both Amazon RDS io2 Block Express volumes and Amazon RDS Custom for SQL Server are currently supported.
Amazon RDS Custom for SQL Server is a managed database service that allows customization of the underlying operating system and includes the ability to bring your own licensed SQL Server media or use SQL Server Developer Edition while providing the time-savings, durability, and scalability benefits of a managed database service. To get started, visit the Amazon RDS Custom for SQL Server User Guide. See Amazon RDS Custom Pricing for up-to-date pricing of instances, storage, data transfer and regional availability.
Read More for the details.
Amazon Connect Cases now allows agents and supervisors to filter cases in the agent workspace by custom field values, making it easier to narrow down search results and find relevant cases. Users can also customize the case list view and search results layout by adding custom columns, hiding or rearranging existing columns, and adjusting the number of cases per page. These enhancements enable users to tailor the case list view to meet their needs and manage their case workloads more effectively.
For region availability, please see the availability of Amazon Connect features by Region. To learn more and get started, visit the Amazon Connect Cases webpage and documentation.
Read More for the details.
The Amazon EventBridge console now displays the source and detail type of all available AWS service events when you create a rule in the EventBridge console. This makes it easier for customers to discover and utilize the full range of AWS service events when building event-driven architectures. Additionally, the EventBridge documentation now includes an automatically updated list of all AWS service events, facilitating access to the most current information.
Amazon EventBridge Event Bus is a serverless event router that enables you to create highly scalable event-driven applications by routing events between your own applications, third-party SaaS applications, and other AWS services. With this update, developers can quickly search and filter all available AWS service events, including their detail types, when configuring event patterns in the EventBridge console's sandbox and rules, as well as in the documentation, making it easier to create event-driven integrations and reduce misconfiguration.
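For example, a rule that matches a well-known AWS service event pairs the service's source with one of its detail types. The boto3 sketch below uses the EC2 instance state-change event; the rule name and the state filter are arbitrary choices for illustration.

import json
import boto3

events = boto3.client("events", region_name="us-east-1")

# "source" and "detail-type" are the same fields the console now surfaces
# for every AWS service event.
event_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

events.put_rule(
    Name="ec2-state-change-rule",
    EventBusName="default",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)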
This feature in the EventBridge console is available in all commercial AWS Regions. To learn more about discovering and using AWS service events in Amazon EventBridge, see the updated list of AWS service events in the documentation here.
Read More for the details.
Amazon Managed Service for Prometheus collector, a fully-managed agentless collector for Prometheus metrics, adds support for cross-account ingestion. Starting today, you can agentlessly scrape metrics from Amazon Elastic Kubernetes Service clusters in different accounts than your Amazon Managed Service for Prometheus workspace.
While it was previously possible to apply AWS multi-account best practices for centralized observability with Amazon Managed Service for Prometheus workspaces, you had to use self-managed collection. This meant that you had to run, scale, and patch telemetry agents yourself to scrape metrics from Amazon Elastic Kubernetes Service clusters in various accounts in order to ingest them into a central Amazon Managed Service for Prometheus workspace in a different account. With this launch, you can use the Amazon Managed Service for Prometheus collector to remove this heavy lifting and ingest metrics in a cross-account setup without having to run a collector yourself. In addition, you can now also use the Amazon Managed Service for Prometheus collector to scrape metrics from Amazon Elastic Kubernetes Service clusters and ingest them into Amazon Managed Service for Prometheus workspaces created with customer managed keys.
Amazon Managed Service for Prometheus collector is available in all regions where Amazon Managed Service for Prometheus is available. To learn more about Amazon Managed Service for Prometheus collector, visit the user guide or product page.
Read More for the details.
For developers who want to use the PyTorch deep learning framework with Cloud TPUs, the PyTorch/XLA Python package is key, offering developers a way to run their PyTorch models on Cloud TPUs with only a few minor code changes. It does so by leveraging OpenXLA, developed by Google, which gives developers the ability to define their model once and run it on many different types of machine learning accelerators (i.e., GPUs, TPUs, etc.).
The latest release of PyTorch/XLA comes with several improvements that boost performance and the developer experience:
A new experimental scan operator to speed up compilation for repetitive blocks of code (i.e., for loops)
Host offloading to move TPU tensors to the host CPU’s memory to fit larger models on fewer TPUs
Improved goodput for tracing-bound models through a new base Docker image compiled with the C++ 2011 Standard application binary interface (C++ 11 ABI) flags
In addition to these improvements, we’ve also re-organized the documentation to make it easier to find what you’re looking for!
Let’s take a look at each of these features in greater depth.
Have you ever experienced long compilation times, for example when working with large language models and PyTorch/XLA — especially when dealing with models with numerous decoder layers? During graph tracing, where we traverse the graph of all the operations being performed by the model, these iterative loops are completely “unrolled” — i.e., each loop iteration is copied and pasted for every cycle — resulting in large computation graphs. These larger graphs lead directly to longer compilation times. But now there’s a new solution: the new experimental scan function, inspired by jax.lax.scan.
The scan operator works by changing how loops are handled during compilation. Instead of compiling each iteration of the loop independently, which creates redundant blocks, scan compiles only the first iteration. The resulting compiled high-level operation (HLO) is then reused for all subsequent iterations, so far less HLO or intermediate code is generated for the rest of the loop. Compared to a for loop, scan compiles in a fraction of the time since it only compiles the first loop iteration. This improves developer iteration time when working on models with many homogeneous layers, such as LLMs.
Building on top of torch_xla.experimental.scan, the torch_xla.experimental.scan_layers function offers a simplified interface for looping over sequences of nn.Modules. Think of it as a way to tell PyTorch/XLA “these modules are all the same, just compile them once and reuse them!” For example:
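The sketch below is a minimal illustration rather than a full model; it assumes scan_layers can be imported from torch_xla.experimental.scan_layers and called with a list of structurally identical modules plus an input tensor.

import torch
import torch.nn as nn
import torch_xla
from torch_xla.experimental.scan_layers import scan_layers

class DecoderBlock(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.ffn = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        return torch.relu(self.ffn(x))

device = torch_xla.device()
layers = [DecoderBlock(1024).to(device) for _ in range(64)]
x = torch.randn(8, 1024, device=device)

# A Python for-loop over the 64 blocks would trace and compile each block
# separately; scan_layers compiles the first block once and reuses the
# resulting HLO for the rest.
output = scan_layers(layers, x)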
One thing to note is that custom Pallas kernels do not yet support scan. Here is a complete example of using scan_layers in an LLM for reference.
Another powerful tool for memory optimization in PyTorch/XLA is host offloading. This technique allows you to temporarily move tensors from the TPU to the host CPU’s memory, freeing up valuable device memory during training. This is especially helpful for large models where memory pressure is a concern. You can use torch_xla.experimental.stablehlo_custom_call.place_to_host to offload a tensor and torch_xla.experimental.stablehlo_custom_call.place_to_device to retrieve it later. A typical use case involves offloading intermediate activations during the forward pass and then bringing them back during the backward pass. Here’s an example of host offloading for reference.
Strategic use of host offloading, such as when you’re working with limited memory and are unable to use the accelerator continuously, may significantly improve your ability to train large and complex models within the memory constraints of your hardware.
Have you ever encountered a situation where your TPUs are sitting idle while your host CPU is heavily loaded tracing your model execution graph for just-in-time compilation? This suggests your model is “tracing bound,” meaning performance is limited by the speed of tracing operations.
The C++11 ABI image offers a solution. Starting with this release, PyTorch/XLA offers a choice of C++ ABI flavors for both Python wheels and Docker images. This gives you a choice for which version of C++ you’d like to use with PyTorch/XLA. You’ll now find builds with both the pre-C++11 ABI, which remains the default to match PyTorch upstream, and the more modern C++11 ABI.
Switching to the C++11 ABI wheels or Docker images can lead to noticeable improvements in the above-mentioned scenarios. For example, we observed a 20% relative improvement in goodput with the Mixtral 8x7B model on v5p-256 Cloud TPU (with a global batch size of 1024) when we switched from the pre-C++11 ABI to the C++11 ABI! ML Goodput gives us an understanding of how efficiently a given model utilizes the hardware. So if we have a higher goodput measurement for the same model on the same hardware, that indicates better performance of the model.
An example of using a C++11 ABI docker image in your Dockerfile might look something like:
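(The registry path and tag below are illustrative assumptions; substitute the exact C++11 ABI image name from the 2.6 release notes.)

# Base image: a PyTorch/XLA 2.6 build with the C++11 ABI (tag is illustrative).
FROM us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.6.0_3.10_tpuvm_cxx11
# Layer your training code and dependencies on top of the base image.
WORKDIR /workspace
COPY . /workspace
RUN pip install -r requirements.txt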
Alternatively, if you are not using Docker images, because you’re testing locally for instance, you can install the C++11 ABI wheels for version 2.6 using the following command (Python 3.10 example):
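(The wheel URLs below follow the release's published naming pattern but are assumptions; verify them against the 2.6 release notes before installing.)

pip install torch==2.6.0+cpu.cxx11.abi \
  https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0%2Bcxx11-cp310-cp310-manylinux_2_28_x86_64.whl \
  'torch_xla[tpu]' \
  -f https://storage.googleapis.com/libtpu-releases/index.html \
  -i https://download.pytorch.org/whl/cpu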
The above command works for Python 3.10. We have instructions for other versions within our documentation.
The flexibility to choose between C++ ABIs lets you choose the optimal build for your specific workload and hardware, ultimately leading to better performance and efficiency in your PyTorch/XLA projects!
So, what are you waiting for, go try out the latest version of PyTorch/XLA! For additional information check out the latest release notes.
A note on GPU support
We aren’t offering a PyTorch/XLA:GPU wheel in the PyTorch/XLA 2.6 release. We understand this is important and plan to reinstate GPU support by the 2.7 release. PyTorch/XLA remains an open-source project and we welcome contributions from the community to help maintain and improve the project. To contribute, please start with the contributors guide.
The latest stable version where a PyTorch/XLA:GPU wheel is available is torch_xla 2.5.
Read More for the details.
Modern AI workloads require powerful accelerators and high-speed interconnects to run sophisticated model architectures on an ever-growing diverse range of model sizes and modalities. In addition to large-scale training, these complex models need the latest high-performance computing solutions for fine-tuning and inference.
Today, we’re excited to bring the highly-anticipated NVIDIA Blackwell GPUs to Google Cloud with the preview of A4 VMs, powered by NVIDIA HGX B200. The A4 VM features eight Blackwell GPUs interconnected by fifth-generation NVIDIA NVLink, and offers a significant performance boost over the previous generation A3 High VM. Each GPU delivers 2.25 times the peak compute and 2.25 times the HBM capacity, making A4 VMs a versatile option for training and fine-tuning for a wide range of model architectures, while the increased compute and HBM capacity makes it well-suited for low-latency serving.
The A4 VM integrates Google’s infrastructure innovations with Blackwell GPUs to bring the best cloud experience for Google Cloud customers, from scale and performance, to ease-of-use and cost optimization. Some of these innovations include:
Hudson River Trading, a multi-asset-class quantitative trading firm, will leverage A4 VMs to train its next generation of capital market model research. The A4 VM, with its enhanced inter-GPU connectivity and high-bandwidth memory, is ideal for the demands of larger datasets and sophisticated algorithms, accelerating Hudson River Trading’s ability to react to the market.
Effectively scaling AI model training requires precise and scalable orchestration of infrastructure resources. These workloads often stretch across thousands of VMs, pushing the limits of compute, storage, and networking.
Hypercompute Cluster enables you to deploy and manage these large clusters of A4 VMs with compute, storage and networking as a single unit. This makes it easy to manage complexity while delivering exceptionally high performance and resilience for large distributed workloads. Hypercompute Cluster is engineered to:
We’re excited to be the first hyperscaler to announce preview availability of an NVIDIA Blackwell B200-based offering. Together, A4 VMs and Hypercompute Cluster make it easier for organizations to create and deliver AI solutions across all industries. If you’re interested in learning more, please reach out to your Google Cloud representative.
Read More for the details.