Starting today, Amazon Elastic Compute Cloud (Amazon EC2) M8g and R8g instances are available in AWS Asia Pacific (Hong Kong) region. These instances are powered by AWS Graviton4 processors and deliver up to 30% better performance compared to AWS Graviton3-based instances. Amazon EC2 M8g instances are built for general-purpose workloads, such as application servers, microservices, gaming servers, midsize data stores, and caching fleets. Amazon EC2 R8g instances are ideal for memory-intensive workloads such as databases, in-memory caches, and real-time big data analytics. These instances are built on the AWS Nitro System, which offloads CPU virtualization, storage, and networking functions to dedicated hardware and software to enhance the performance and security of your workloads.
AWS Graviton4-based Amazon EC2 instances deliver the best performance and energy efficiency for a broad range of workloads running on Amazon EC2. These instances offer larger instance sizes with up to 3x more vCPUs and memory compared to Graviton3-based instances. AWS Graviton4 processors are up to 40% faster for databases, 30% faster for web applications, and 45% faster for large Java applications than AWS Graviton3 processors.
AWS is expanding service reference information to indicate which service actions are supported by the IAM Last Accessed and IAM Access Analyzer Policy Generation features. These features help you work toward least-privilege permissions, and you can now easily reference which service actions they support in machine-readable files.
You can automate the retrieval of service reference information, eliminating manual effort and ensuring your policies align with the latest service updates. You can also incorporate this service reference directly into your policy management tools and processes for a seamless integration. This feature is offered at no additional cost. To get started, refer to the documentation on programmatic service reference information.
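Those machine-readable files can be consumed directly in code. The sketch below filters an illustrative service reference document for feature support; note that the field names used here ("Actions", "SupportedBy") and the document shape are assumptions for illustration, not the published schema, so check the programmatic service reference documentation for the real format.

```python
import json

# Illustrative shape of a machine-readable service reference file.
# The "Actions"/"SupportedBy" fields are assumptions for this sketch;
# consult the AWS service reference documentation for the actual schema.
sample = json.loads("""
{
  "Name": "s3",
  "Actions": [
    {"Name": "GetObject",  "SupportedBy": ["IAMLastAccessed", "PolicyGeneration"]},
    {"Name": "ListBucket", "SupportedBy": ["IAMLastAccessed"]}
  ]
}
""")

def actions_supported_by(service_ref: dict, feature: str) -> list[str]:
    """Return the action names that a given feature covers."""
    return [
        action["Name"]
        for action in service_ref["Actions"]
        if feature in action.get("SupportedBy", [])
    ]

print(actions_supported_by(sample, "PolicyGeneration"))
```

The same traversal can be pointed at each service's real reference file once retrieved, letting policy tooling flag actions that the Last Accessed and Policy Generation features cannot yet report on.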
Amazon Connect now supports AWS CloudFormation for quick responses, enabling customers to deploy and manage quick responses using AWS CloudFormation templates. Quick responses allow contact center agents to access pre-configured messages to respond consistently and efficiently to common customer inquiries. With AWS CloudFormation, administrators can now define and deploy these quick responses across environments in a scalable and repeatable way.
Using AWS CloudFormation, organizations can standardize agent communications and reduce manual configuration between Amazon Connect instances. For example, you can use AWS CloudFormation templates to roll out updated response sets for seasonal campaigns or regulatory compliance across multiple Connect environments. This launch simplifies deployment and integrates seamlessly with continuous delivery pipelines.
This feature is available in all AWS Regions where Amazon Connect is offered. For a full list of supported Regions, see the AWS Region table. To learn more, see Quick responses in the Amazon Connect Administrator Guide or visit the Amazon Connect product page.
Amazon ElastiCache now supports Bloom filters as a new data type in ElastiCache version 8.1 for Valkey and above. A Bloom filter is a space-efficient probabilistic data structure that lets you quickly check whether an item is possibly in a set. This new feature is fully compatible with the valkey-bloom module and API-compatible with the Bloom filter command syntax of the Valkey client libraries, such as valkey-py, valkey-java, and valkey-go. Previously, to find whether elements had been added to your cache, you used the Set data type to write items to a set and then checked whether an item already existed. Bloom filters achieve the same outcome using a probabilistic approach and are over 98% more memory efficient than sets, without compromising performance.
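The trade-off can be seen in a minimal pure-Python sketch (a toy for illustration only; on ElastiCache you would use the BF.* commands through a Valkey client such as valkey-py rather than anything like this):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions per item over a fixed bit array.

    Membership answers are "definitely not present" or "possibly present";
    false positives are possible, false negatives are not.
    """
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a big int used as a bit array

    def _positions(self, item: str):
        # Derive k independent positions by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True only if every position for this item is set.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("user:1001")
print(bf.might_contain("user:1001"))  # True
print(bf.might_contain("user:9999"))  # almost certainly False
```

The memory saving comes from storing only bit positions rather than the items themselves, which is why the filter can answer "possibly present" but never reproduce the stored members.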
Bloom filters are available today in Amazon ElastiCache version 8.1 for Valkey in all AWS Regions and for serverless and node-based offerings at no additional cost. To learn more about Bloom filters on ElastiCache for Valkey, check out the ElastiCache documentation. For the full documentation and list of supported commands, see the Bloom filter documentation.
Amazon Aurora PostgreSQL Limitless Database is now available in the US West (N. California), Africa (Cape Town), Asia Pacific (Hyderabad, Jakarta, Malaysia, Melbourne, Mumbai, Osaka, Seoul, Thailand), Canada (Central), Canada West (Calgary), Europe (London, Milan, Paris, Spain, Zurich), Israel (Tel Aviv), Mexico (Central), Middle East (Bahrain, UAE), and South America (Sao Paulo) Regions.
Aurora PostgreSQL Limitless Database makes it easy for you to scale your relational database workloads by providing a serverless endpoint that automatically distributes data and queries across multiple Amazon Aurora Serverless instances while maintaining the transactional consistency of a single database. Aurora PostgreSQL Limitless Database offers capabilities such as distributed query planning and transaction management, removing the need for you to create custom solutions or manage multiple databases to scale. As your workloads increase, Aurora PostgreSQL Limitless Database adds additional compute resources while staying within your specified budget, so there is no need to provision for peak, and compute automatically scales down when demand is low.
Aurora PostgreSQL Limitless Database is available with PostgreSQL 16.6 and 16.8 compatibility in these regions.
AWS Glue now offers a new native connector for Microsoft Dynamics 365, enabling data engineers to easily integrate data from this enterprise resource planning (ERP) and customer relationship management (CRM) platform. This connector allows AWS Glue users to build efficient extract, transform, and load (ETL) jobs that seamlessly connect to Microsoft Dynamics 365 as a data source.
With this new connector, users can streamline their data integration processes, reducing the complexity and time required to incorporate Microsoft Dynamics 365 data into their AWS-based analytics and business intelligence workflows. Organizations can now leverage the power of AWS Glue’s fully-managed ETL service in conjunction with their Microsoft Dynamics 365 data, enabling more comprehensive insights and data-driven decision-making.
The AWS Glue connector for Microsoft Dynamics 365 is available in all regions where AWS Glue is supported.
To learn more about this new connector and how to get started, visit the AWS Glue documentation.
In April, we released Cluster Director, a unified management plane that makes deploying and managing large-scale AI infrastructure simpler and more intuitive than ever before, putting the power of an AI supercomputer at your fingertips. Today, we’re excited to release new features in preview, including an intuitive interface, a managed Slurm experience, and an observability dashboard that surfaces performance anomalies.
From complex configuration to easy creation
AI infrastructure users can spend weeks wrestling with complex configurations for compute, networking, and storage. Because distributed training workloads are highly synchronized jobs across thousands of nodes and are highly sensitive to network latency, performance bottlenecks can be difficult to diagnose and resolve. Cluster Director solves these challenges with a single, unified interface that automates the complex setup of AI and HPC clusters, integrating Google Cloud’s optimized compute, networking, and storage into a cohesive, performant, and easily managed environment.
LG Research uses Google Cloud to train their large language models, most recently Exaone 3.5. They have significantly reduced the time it takes to have a cluster running with their code — from over a week to less than one day. That’s hundreds of GPU hours saved for real workloads.
“Thanks to Cluster Director, we’re able to deploy and operate large-scale, high-performance GPU clusters flexibly and efficiently, even with minimal human resources.” – Jiyeon Jung, AI Infra Sr Engineer, LG AI Research
Biomatter uses Google Cloud to scale their in silico design processes. Cluster Director has made the cluster deployment and management smooth, enabling them to dedicate more focus to the scientific challenges at the core of their work.
“Cluster Director on Google Cloud has significantly simplified the way we create, configure, and manage Slurm-based AI and HPC clusters. With an intuitive UI and easy access to GPU-accelerated instances, we’ve reduced the time and effort spent on infrastructure.” – Irmantas Rokaitis, Chief Technology Officer, Biomatter
Read on for what’s new in the latest version of Cluster Director.
Simplified cluster management across compute, network, and storage
Use a new intuitive view in the Google Cloud console to easily create, update, and delete clusters. Instead of a blank slate, you start with a choice of validated, optimized reference architectures. You can add one or more machine configurations from a range of VM families (including A3 and A4 GPUs) and specify the machine type, the number of GPUs, and the number of instances. You can choose your consumption model, selecting on-demand capacity (where supported), DWS Calendar or Flex start modes, Spot VMs for cost savings, or attaching a specific reservation for capacity assurance.
Cluster Director also simplifies networking by allowing you to deploy the cluster on a new, purpose-built VPC network or an existing one. If you create a new network, the firewall rules required for internal communication and SSH access are configured automatically, removing a common pain point. For storage, you can create and attach a new Filestore or Google Cloud Managed Lustre instance, or connect to an existing Cloud Storage bucket. These integrations help ensure that your high-performance file system is correctly mounted and available to all nodes in the cluster from the moment they launch.
Powerful job scheduling with Managed Slurm
Cluster Director provides fault-tolerant and highly scalable job scheduling out of the box with a managed, pre-configured Slurm environment. The controller node is managed for you, and you can easily configure the login nodes, including machine type, source image, and boot-disk size. Partitions and nodesets are pre-configured based on your compute selections, but you retain the flexibility to customize them, now or in the future.
Topology-aware placement
To maximize performance, Cluster Director is deeply integrated with Google’s network topology. This begins when clusters are created, when VMs are placed in close physical proximity. Crucially, this intelligence is also built directly into the managed Slurm environment. The Slurm scheduler is natively topology-aware, meaning it understands the underlying physical network and automatically co-locates your job’s tasks on nodes with the lowest-latency paths between them. This integration of initial placement and ongoing job scheduling is a key performance enhancer, dramatically reducing network contention during large, distributed training jobs.
Comprehensive visibility and insights
Cluster Director’s integrated observability dashboard provides a clear view of your cluster’s health, utilization, and performance, so you can quickly understand your system’s behavior and diagnose issues in a single place. The dashboard is designed to easily scale to tens of thousands of VMs.
Advanced diagnostics to detect performance anomalies
In distributed ML training, stragglers refer to small numbers of faulty or slow nodes that eventually slow down the entire workload. Cluster Director makes it easy to quickly find and replace stragglers to avoid performance degradation and wasted spend.
Try out Cluster Director today!
We are excited to invite you to be among the first to experience Cluster Director. To learn more and express your interest in joining the preview, talk to your Google Cloud account team or sign up here. We can’t wait to see what you will build.
Building applications is sometimes messy, it’s always iterative, and it often works best when it’s collaborative. As a developer, you regularly experience the frustration of a cryptic error message and the quiet triumph of finding a clever workaround. Either way, finding help or sharing success is best facilitated by a community of builders.
That’s why we are excited to launch the Google Developer Program forums at discuss.google.dev. The new forums are designed to help people build with Google technology. You will find discussion groups to engage with other developers and Google experts; how-to articles, reference architectures, and use cases; and a community of users looking to help.
We’re also migrating the existing Google Cloud, Workspace Developer, AppSheet, and Looker communities, channels and content from googlecloudcommunity.com over to discuss.google.dev. So, existing knowledge isn’t lost – it’s just moving to a new home. And by migrating the community we’re able to focus on two core principles in the new design: high trust and high utility.
Signal over noise
Your Google Developer Program profile is how you will access the forums. By unifying our sign-in and connecting forum profiles directly to Google Developer Program profiles, we can programmatically display the credentials and reputation you’ve earned through learning, events, and meetups across the Google ecosystem.
We’re starting with the Google Developer Expert flair icon next to a user’s name and we plan to extend this to other programs in the near future. Additionally, if you are part of a private product beta or Early Access Program (EAP), your forum account is automatically granted access to the corresponding private discussion groups. No more filling out forms or waiting for permissions. Your Developer Program profile is your passport.
Why we chose Discourse for our new forums
While we were tempted to build a custom solution from scratch, we chose Discourse for a few key reasons:
Built by and for developers: Discourse is an open-source platform that prioritizes function over flash with markdown, code formatting, keyboard navigation, and structured conversations.
Extensibility: Its robust API and plugin architecture allow us to integrate our own Google technologies—like Gemini-powered spam filtering and the Google Developer Program—without reinventing the wheel.
This is your invitation!
This new community is a space for all of us. Come say hello! Ask a question, or answer one. Share what you’re working on, or get help with what you’re stuck on. This is where the real work happens, and we want to be a part of it with you.
In the coming months, you’ll see more of our engineers, product managers, and developer advocates join the conversation to not only help answer questions, but also ask them, share their own ideas, and engage with the same passion as you do. They won’t always have a perfect solution to a tricky question, but they’re committed to listen, engage, and work with the community to find the best path forward.
How to Get Started
Explore Now: Visit https://discuss.google.dev. Browse the categories, read ongoing discussions, and find your community.
Join the Conversation: If you’re a Google Developer Program member, sign in and dive in! Ask those tough questions, share your solutions, and contribute your expertise. Not a member yet? Visit developers.google.com/program to learn more and join at no cost.
For googlecloudcommunity.com users: We’re working to make the transition as smooth as possible. You’ll find familiar topics and a wealth of historical discussions here. We encourage you to explore and continue your conversations on this new, unified platform.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) X8g instances are available in US East (Ohio) region. These instances are powered by AWS Graviton4 processors and deliver up to 60% better performance than AWS Graviton2-based Amazon EC2 X2gd instances. X8g instances offer up to 3 TiB of total memory and increased memory per vCPU compared to other Graviton4-based instances. They have the best price performance among EC2 X-series instances, and are ideal for memory-intensive workloads such as electronic design automation (EDA) workloads, in-memory databases (Redis, Memcached), relational databases (MySQL, PostgreSQL), real-time big data analytics, real-time caching servers, and memory-intensive containerized applications.
X8g instances offer larger instance sizes with up to 3x more vCPU (up to 48xlarge) and memory (up to 3TiB) than Graviton2-based X2gd instances. They offer up to 50 Gbps enhanced networking bandwidth and up to 40 Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS). Elastic Fabric Adapter (EFA) networking support is offered on 24xlarge, 48xlarge, and bare metal sizes, and Elastic Network Adapter (ENA) Express support is available on instance sizes larger than 12xlarge.
X8g instances are currently available in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), and Europe (Frankfurt).
Amazon CloudWatch adds support for Internet Protocol version 6 (IPv6) addresses to ingest and query metrics, as well as manage alarms, dashboards, and Observability Access Manager (OAM) cross-account observability settings. Customers moving to IPv6 can simplify their network stack by running their CloudWatch monitoring on a dual-stack network that supports both IPv4 and IPv6.
The addition of IPv6 support provides customers with a vastly expanded address space, eliminating concerns about address exhaustion and simplifying network architecture for IPv6-native applications. With simultaneous support for both IPv4 and IPv6 clients on CloudWatch public endpoints, customers can gradually transition from IPv4 to IPv6-based systems and applications without needing to switch all systems at once. This enhancement is particularly valuable for modern cloud-native applications and organizations transitioning to IPv6 as part of their modernization efforts.
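A dual-stack migration typically starts by switching the endpoint a client resolves. The sketch below builds the dual-stack hostname following the general AWS convention (`<service>.<region>.api.aws`); treat that pattern, and the boto3 flag shown in the comment, as assumptions to confirm against the CloudWatch endpoint documentation.

```python
# Sketch of opting into a dual-stack (IPv4 + IPv6) CloudWatch endpoint.
# Assumption: CloudWatch follows the common AWS dual-stack naming pattern
# <service>.<region>.api.aws (the IPv4-only form is
# <service>.<region>.amazonaws.com); verify against the service docs.

def dualstack_endpoint(region: str, service: str = "monitoring") -> str:
    return f"https://{service}.{region}.api.aws"

print(dualstack_endpoint("us-east-1"))

# With boto3, the same opt-in is a client config flag rather than a URL:
# from botocore.config import Config
# cw = boto3.client("cloudwatch", config=Config(use_dualstack_endpoint=True))
```

Because both address families are served from the same endpoint, IPv4-only clients keep working unchanged while IPv6-capable clients connect over IPv6.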
Support for IPv6 on CloudWatch endpoints is now available in all commercial AWS Regions, the AWS GovCloud (US) Regions, and the China Regions.
Starting today, Amazon EC2 gives customers the option to skip the graceful operating system (OS) shutdown during an instance stop or terminate. Previously, customers waited by default for a graceful OS shutdown attempt when stopping or terminating their instances. Customers can now skip this attempt for faster application recovery when preserving instance data is not critical. For example, customers with high-availability clusters where instance data is replicated elsewhere can skip the graceful OS shutdown during failover, enabling faster instance state transitions. Customers can enable the option when stopping or terminating instances using the AWS CLI or EC2 Console. To learn more, please refer to our documentation for StopInstances and TerminateInstances.
We are announcing major enhancements to the AWS IoT SiteWise Query API (ExecuteQuery), adding a suite of advanced SQL capabilities that empower customers to perform sophisticated analysis of their industrial data and integrate with business intelligence tools. Customers can now leverage advanced SQL operations including string manipulation (such as pattern matching and substring extraction), aggregation functions (e.g., grouping, SUM, COUNT, and more on telemetry streams), multi-field sorting, and robust datetime operations for time-based analytics. Furthermore, we are introducing an ODBC driver, enabling direct integration with BI tools like Tableau, Power BI, and Excel for enhanced data visualization and reporting on operational data, without custom development or middleware.
These enhancements enable customers to execute advanced queries such as "SELECT avg(rts.double_value), a.asset_name FROM raw_time_series rts, asset a, asset_property ap WHERE a.asset_name LIKE 'Boiler%' AND ap.property_name = 'Temperature' AND rts.event_timestamp > TIMESTAMP '2025-04-15 09:00:00' AND rts.event_timestamp < TIMESTAMP '2025-04-15 17:00:00' GROUP BY a.asset_name ORDER BY 1" to analyze average boiler temperatures during business hours. This helps streamline the path from industrial data collection to actionable business insights.
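In code, a statement like the one above is typically assembled with the desired time window and then submitted through ExecuteQuery. The sketch below runs only the query assembly; the client call is left as a hedged comment since it requires AWS credentials and a SiteWise environment.

```python
from datetime import datetime

def boiler_temperature_query(start: datetime, end: datetime) -> str:
    """Assemble a time-bounded SiteWise SQL statement like the example above."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        "SELECT avg(rts.double_value), a.asset_name "
        "FROM raw_time_series rts, asset a, asset_property ap "
        "WHERE a.asset_name LIKE 'Boiler%' "
        "AND ap.property_name = 'Temperature' "
        f"AND rts.event_timestamp > TIMESTAMP '{start.strftime(fmt)}' "
        f"AND rts.event_timestamp < TIMESTAMP '{end.strftime(fmt)}' "
        "GROUP BY a.asset_name ORDER BY 1"
    )

query = boiler_temperature_query(datetime(2025, 4, 15, 9), datetime(2025, 4, 15, 17))
print(query)

# Submitting the statement (requires credentials and a SiteWise environment):
# client = boto3.client("iotsitewise")
# rows = client.execute_query(queryStatement=query)
```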
This enhanced feature is available in the following AWS Regions: Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Europe (Ireland), Europe (Frankfurt), US East (N. Virginia), US West (Oregon). The ODBC driver is compatible with Windows environments.
AWS IoT SiteWise is a managed service that makes it easy to collect, store, organize and analyze data from industrial equipment at scale to help you make data-driven decisions. To learn more about the enhanced Query API and download the ODBC driver, please visit the user guide.
Cost Optimization Hub, a feature within the Billing and Cost Management Console, announces support for account names in cost optimization opportunities. This enhancement allows you to use account names to easily view, filter, consolidate, and prioritize cost optimization recommendations.
This launch simplifies the process of identifying and managing cost optimization opportunities across complex account structures. With an account name, you can now quickly recognize which accounts are associated with specific recommendations, making it easier to take action on cost-saving opportunities. This enhancement is particularly valuable for large organizations and AWS Partners managing multiple accounts, as it streamlines the cost optimization workflow and enhances visibility into potential savings across the entire account portfolio.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) P6-B200 instances accelerated by NVIDIA Blackwell GPUs are available in US East (N. Virginia) region. These instances offer up to 2x performance compared to P5en instances for AI training and inference.
P6-B200 instances feature 8 Blackwell GPUs with 1440 GB of high-bandwidth GPU memory and a 60% increase in GPU memory bandwidth compared to P5en, 5th Generation Intel Xeon processors (Emerald Rapids), and up to 3.2 terabits per second of Elastic Fabric Adapter (EFAv4) networking. P6-B200 instances are powered by the AWS Nitro System, so you can reliably and securely scale AI workloads within Amazon EC2 UltraClusters to tens of thousands of GPUs.
Amazon RDS for Oracle zero-ETL integration with Amazon Redshift enables near real-time analytics and machine learning (ML) to analyze petabytes of transactional data in Amazon Redshift without complex data pipelines for extract-transform-load (ETL) operations. Within seconds of data being written to Amazon RDS for Oracle, the data is replicated to Amazon Redshift. With zero-ETL integrations, analyzing data from Amazon RDS for Oracle is simple, helping you derive holistic insights across many applications.
With this launch, you can use the AWS management console, API, CLI, and AWS CloudFormation to create and manage zero-ETL integrations between RDS for Oracle and Amazon Redshift. You can choose specific pluggable databases (PDBs) to selectively replicate them. In addition, you can choose specific tables and tailor replication to your needs.
RDS for Oracle zero-ETL integration with Redshift is available with Oracle Database version 19c in supported AWS Regions. To learn more, refer to the Amazon RDS and Amazon Redshift documentation.
Amazon Relational Database Service (Amazon RDS) for PostgreSQL zero-ETL integration with Amazon Redshift is now generally available, enabling near real-time analytics and machine learning (ML) on petabytes of transactional data. With this launch, you can create multiple zero-ETL integrations from a single Amazon RDS database, and you can apply data filtering for each integration to include or exclude specific databases and tables, tailoring replication to your needs. You can also use AWS CloudFormation to automate the configuration and deployment of resources needed for zero-ETL integration.
Zero-ETL integrations make it simpler to analyze data from Amazon RDS to Amazon Redshift by removing the need for you to build and manage complex data pipelines and helping you derive holistic insights across many applications. Within seconds of data being written to Amazon RDS for PostgreSQL, the data is replicated to Amazon Redshift. Using zero-ETL, you can enhance data analysis on near real-time data with the rich analytics capabilities of Amazon Redshift, including integrated ML, Spark support, and materialized views.
This zero-ETL integration is available for all RDS for PostgreSQL versions 15.4 and later, Amazon Redshift Serverless, and Amazon Redshift RA3 instance types in supported AWS Regions. To learn more about this zero-ETL integration, visit the documentation for Amazon RDS and Amazon Redshift.
Starting today, Amazon EC2 Instance Connect and EC2 Serial console are also available in AWS Asia Pacific (Malaysia), AWS Asia Pacific (Thailand) and AWS Mexico (Central) regions.
EC2 Instance Connect allows customers to connect to their instances over Secure Shell (SSH) with a single click from the EC2 console and single command from the AWS CLI. Customers can manage access to instances with AWS Identity and Access Management (IAM) policies as well as generate one-time use SSH keys enhancing the security posture for connectivity. EC2 Serial Console provides customers a simple way to interactively troubleshoot boot and network connectivity issues by establishing a connection to the serial port of an instance. EC2 Serial Console is ideal for situations where you are unable to connect to your instance via normal SSH or RDP connections.
To get started, you can navigate to the Amazon EC2 console, select the instance you want to connect to and with a single click connect using EC2 Serial Console or Instance Connect. You can also connect from the AWS CLI or API. To learn more see our documentation.
The evolution of the cloud has been tremendous over the past decade. Every step of the way, Google Kubernetes Engine (GKE) has been there to meet new challenges. From giving DevOps more scalable foundations to supporting the rise of cloud-native AI, we took Kubernetes’ brilliance and gave it the fully managed service it deserved to thrive.
GKE turns 10 this year, and to celebrate, we’ve launched 10 years of GKE, an ebook that explores this incredible decade and how customers have built global businesses powered by this managed platform. We released Kubernetes as open source in 2014, and one million contributions later, we couldn’t be prouder of what Kubernetes has become, its history, and its future with GKE.
GKE’s leading lights
One of the earliest GKE customers was Signify, a global leader in lighting for professionals and the company behind Philips Hue. Ten years on, it continues to thrive on the service. Growing from 200 million to 3.5 billion daily transactions, Signify scaled from one GKE cluster to seven, and is looking to leverage GKE for new workloads, including platform engineering and AI for multi-cluster supervision.
“The constant improvements made by GKE over the past 10 years profoundly changed the way we design, deploy, and evolve our services,” says Leon Bouwmeester, Director of Engineering and Head of Hue Platform at Signify. “We spend less time on infrastructure management and can focus our efforts on what really matters: the quality of the user experience and the speed of innovation.”
However, what put GKE on the map was Pokémon GO, Niantic’s ground-breaking geolocation game. As millions took to the streets to catch ‘em all, GKE brought the game to life and kept up with its explosive launch. “Never have I taken part in anything close to the growth that Google Cloud customer Niantic experienced with the launch of Pokémon GO,” says Luke Stone, director of customer reliability engineering at Google Cloud.
Target vs. worst case vs. actual traffic to GKE during Niantic’s launch of Pokémon Go.
AI for tomorrow on GKE today
Today, GKE supports brand new businesses in the rapidly evolving world of AI. Customers report how their AI initiatives are made more powerful on GKE, helping them manage the complex demands of their deployments. This means flexibility and scale for AI workloads and cost-efficient inference — so you can focus on training, not managing.
With GKE Autopilot, AI can also help you optimize your configurations and workloads. In the ebook, learn more about how GKE Autopilot mode frees up teams to focus on innovation, with businesses sharing how they automatically improved performance and cost savings — with the stability and security they expect from Google Cloud.
Join the celebration by exploring 10 years of GKE for yourself. We‘ve distilled a decade of insights into what makes GKE so effective, thoughts from customers on how GKE is supporting their work at scale, and why we’re ready for everything AI has in store for the decade ahead. It’s been an amazing ride, and with AI reshaping the future of application development, we’re just getting started.
Securing sensitive data is a crucial part of moving workloads to the cloud. While encrypting data at rest and in transit are standard security practices, safeguarding data in use — while it’s actively being processed in memory — can present unique security and privacy challenges.
To make sure that data in use is also protected, we developed Confidential Computing with our hardware partners to use hardware-based Trusted Execution Environments (TEEs) to isolate and safeguard data in use, even from the cloud provider hosting the data.
To help build a secure and reliable cloud environment, we’ve partnered with SUSE, a global leader in open source and secure enterprise solutions. Together, we’ve developed targeted solutions that can enable organizations to run their sensitive workloads in the cloud, combining the hardware-based security of Google Cloud Confidential Virtual Machines (Confidential VMs) with the security of SUSE Linux Enterprise Server (SLES).
Today, we are excited to announce that SUSE Linux Enterprise Server now supports Google Cloud Confidential VMs that have Confidential Computing technologies AMD SEV, AMD SEV-SNP, or Intel TDX enabled. Previously, SLES was only generally available on AMD SEV and AMD SEV-SNP-based Confidential VMs, but now SLES is also generally available on Intel TDX-based Confidential VMs which run on the performant C3 machine series. This new offering provides customers more choice and flexibility in securing sensitive workloads, while expanding Confidential VM support for guest operating system images.
At Google Cloud, we strongly advocate for a layered approach to security. Here, SUSE Linux Enterprise Server (SLES) strengthens the guest OS layer, while Confidential VMs strengthen the infrastructure layer. Additionally, the comprehensive SLES security portfolio can help support compliance, risk mitigation, and cybersecurity best practices:
Meeting compliance requirements: SLES is designed to help organizations meet regulatory requirements through its security features. SLES comes with Federal Information Processing Standards (FIPS) 140-3 certified cryptographic modules.
Reducing evaluation effort: Utilizing SLES with supplier certifications can help customers streamline their evaluation processes by referencing existing certifications.
Hardening automatically: SLES includes an automated hardening process that can help with Security Technical Implementation Guide (STIG)-compliant hardening during setup with YAST or AutoYAST, which can be adjusted as needed.
The combination of SLES within Google Cloud Confidential VMs can offer several benefits:
Complementing encryption with a secure OS: With its security focus and certifications, SLES can provide a hardened operating system in a trusted environment, making both applications and the OS less susceptible to vulnerabilities.
Supporting integrity and trustworthiness: Customers can have greater confidence that both the hardware and the operating system are working as expected. Confidential VMs offer remote attestation, allowing verification of the VM identity and state. Running a secure OS, such as SLES, on an attested Confidential VM can support overall data and code integrity.
Supporting Confidential Computing technologies: By providing a consistent and secure operating system across all Google Cloud Confidential Computing types (AMD SEV, AMD SEV-SNP, and Intel TDX), SLES can help simplify the deployment and management of sensitive cloud workloads.
Enhancing compliance in sensitive environments: For workloads that require a notable level of data protection due to compliance regulations, this joint security solution of SLES on Confidential VMs can help alleviate cloud migration concerns from internal auditors.
Addressing internal and external threats: While Confidential Computing primarily can help protect against external threats like hypervisors, running a security-focused OS like SLES in a Confidential VM can offer an additional layer of protection against potential internal vulnerabilities in the guest OS itself.
Reinforcing data ownership and control: Confidential Computing can help provide technical assurances that you have retained control and effective ownership of your data, even when your data is processed in the cloud. By encrypting data in use and limiting access to only your authorized workloads within a TEE, you can gain stronger assurances for your digital sovereignty.
Extending Zero Trust to execution: By encrypting data in memory on the CPU, this solution extends the Zero Trust principle of “never trust, always verify” to data even when it’s actively being processed. This helps ensure data remains secure and encrypted throughout its lifecycle, including during execution, reinforcing a true Zero Trust environment.
Establishing a secure foundation for cloud-native workloads: With SLES providing a secure base and Google Cloud Confidential VMs offering hardware-level protection, this environment together with SUSE Cloud Native solutions can deliver a robust foundation for your most sensitive cloud-native applications. By securing the underlying compute resources, you can extend data-in-use protection to higher level containerized and cloud-native workloads.
Organizations can confidently move regulated and confidential applications to Google Cloud, knowing their data is supported throughout its lifecycle, including while in use and with a secure guest OS, to bolster their digital sovereignty.
Broadcom’s VMware vSphere product remains a popular choice for private cloud virtualization, underpinning critical infrastructure. Far from fading, organizations continue to rely heavily on vSphere for stability and control. We’re also seeing a distinct trend where critical workloads are being repatriated from public cloud services to these on-premises vSphere environments, influenced by strategies like bimodal IT and demands for more operational oversight.
The common practice of directly integrating vSphere with Microsoft Active Directory (AD) simplifies administration tasks, but it creates an attack path whose risks are frequently underestimated. This configuration extends the AD attack surface directly to the hypervisor. From a threat actor’s perspective, the integration constitutes a high-value opportunity: it transforms the relatively common task of compromising AD credentials into access to the underlying infrastructure hosting the servers, allowing attackers to gain privileged administrative control over ESXi hosts and vCenter and ultimately seize complete command of the virtualized infrastructure.
Ransomware aimed at vSphere infrastructure, including both ESXi hosts and vCenter Server, poses a uniquely severe risk due to its capacity for immediate and widespread infrastructure paralysis. With the end of general support for vSphere 7.x approaching in October 2025—the version Mandiant has observed running at the large majority of organizations—the threat of targeted ransomware has become urgent. As recovering from such an attack requires substantial time and resources, proactive defense is paramount. It is therefore critical for organizations to understand the specific threats against these core components and implement effective, unified countermeasures to prevent their compromise, especially before support deadlines introduce additional risk.
This blog post breaks down the inherent risks of, and misunderstandings about, integrating vSphere with Microsoft AD. Drawing on Mandiant’s deep experience with vSphere ransomware incidents and proactive assessments of both AD and vSphere, we provide direction for understanding the risk and improving security posture against today’s threats to enterprise vSphere management.
To understand the security risks in a vSphere environment, it’s essential to understand its architecture. A compromise at one layer can have cascading effects throughout the entire virtualized environment.
At its core, vSphere is a platform that pools physical datacenter resources like compute, storage, and networking into a flexible layer of virtual infrastructure, a task primarily accomplished by two key components, ESXi and vCenter, as shown in the following diagram:
ESXi (The Hypervisor): This is the foundational layer of vSphere. ESXi is a bare metal hypervisor, meaning it installs directly onto the physical server hardware without requiring an underlying operating system. Its core job is to partition that server into multiple, isolated virtual machines (VMs). Each VM, which is essentially just a collection of files, runs its own operating system and applications, acting like an independent computer. The hypervisor’s minimal design is intentional, aiming to reduce its own attack surface while efficiently managing the server’s resources.
vCenter (The Control Plane): If ESXi hosts are the workers, the vCenter Server is the “brain” or control plane for the entire environment. It provides a single web-based interface to manage all connected ESXi hosts and the VMs they run. ESXi hosts are registered with vCenter, which uses agents on each host to manage operations and enable advanced features like automatic workload balancing and high availability for failover protection.
Integrating vSphere with AD creates a flexible environment that simplifies identity management, yet it introduces profound security risks. This direct link can turn an AD compromise into a significant threat against the entire vSphere deployment.
An Outdated Blueprint: Re-examining Foundational vSphere Security
Virtualization has been a cornerstone of enterprise IT for nearly two decades, solving server sprawl and delivering transformative operational agility. Alongside it, AD remains a pillar of enterprise IT. This has led to a long-standing directive that all enterprise technology, including critical infrastructure like vSphere, must integrate with AD for centralized authentication. The result is a risky dependency—the security of foundational infrastructure is now directly tied to the security of AD, meaning any compromise within AD becomes a direct threat to the entire virtualization environment.
In the past, vSphere security was often approached in distinct, siloed layers. Perimeter security was stringent, and threats were typically viewed as internal, such as configuration errors, rather than from external threat actors. This, combined with the newfound ease of image-based backups, often led to security efforts becoming primarily focused on robust business continuity and disaster recovery capabilities over proactive defense. As environments expanded, managing local user accounts created significant administrative overhead, so support for AD integration was introduced for centralized identity management.
Mandiant’s observation, based on extensive incident response engagements, is that many vSphere environments today still operate on this foundational architecture, carrying forward security assumptions that haven’t kept pace with the evolving threat landscape. As Mandiant’s assessments frequently identify, these architectures often prioritize functionality and stability over a security design grounded in today’s threats.
So what’s changed? Relying solely on perimeter defenses is an outdated security strategy. The modern security boundary focuses on the user and device, typically protected by agent-based EDR solutions. But here lies the critical gap: the ESXi hypervisor is a purpose-built appliance that, contrary to what many people believe, is not a standard Linux distribution. This specialized architecture inherently prevents the installation of external software, including security tools like EDR agents. vSphere documentation explicitly addresses this, stating:
“The ESXi hypervisor is a specialized, purpose-built solution, similar to a network router’s firmware. While this approach has several advantages, it also makes ESXi unable to run “off-the-shelf” software, including security tools, designed for general-purpose operating systems as the ESXi runtime environment is dissimilar to other operating systems.
The use of Endpoint Detection and Response (EDR) and other security practices inside third-party guest operating systems is supported and recommended.”
Consequently, most organizations focus their security efforts and EDR deployment inside the guest operating systems. This leaves the underlying ESXi hypervisor—the foundation of the entire virtualization environment—as a significant blind spot for security teams.
The vSphere Threat Landscape
The security gap at the hypervisor layer, which we detailed in the previous section, has not gone unnoticed by threat actors. As security for Windows-based operating systems matured with advanced EDR solutions, threat actors have pivoted to a softer, higher-value target—the ESXi hypervisor itself.
This pivot is amplified by common operational realities. The critical role of ESXi hosts often leads to a hesitancy to apply patches promptly for fear of disruption. Many organizations face a rapidly closing window to mitigate risks; however, threat actors aren’t just relying on unpatched vulnerabilities. They frequently leverage compromised credentials, a lack of MFA, and simple misconfigurations to gain access.
The Rise of Hypervisor-Aware Ransomware
Ransomware targeting vSphere is fundamentally more devastating than its traditional Windows counterpart. Instead of encrypting files on servers or end user compute, these attacks aim to cripple the entire infrastructure by encrypting virtual disk files (VMDKs), disabling dozens of VMs at once.
This is not a theoretical threat. According to Google Threat Intelligence Group (GTIG), the focus on vSphere is rapidly increasing. Of the new ransomware families observed, the proportion specifically tailored for vSphere ESXi systems grew from ~2% in 2022 to over 10% in 2024. This demonstrates a clear and accelerating trend that threat actors are actively dedicating resources to build tooling that specifically targets the hypervisor. In incidents investigated by GTIG, threat actors most frequently deployed REDBIKE, RANSOMHUB, and LOCKBIT.BLACK variants.
GTIG analysts have also noted a recent trend of threat actors gaining persistence in vSphere environments via reverse shells deployed on vCenter. This provides a foothold within the vSphere control plane and thus complete control over all infrastructure. It typically manifests in a two-pronged approach: tactical data exfiltration, such as of the AD database (NTDS.dit), followed by the deployment of ransomware and mass encryption of all VMs.
Understanding the Active Directory Integration in vSphere
The decision to integrate vSphere with AD often overlooks the specifics of how this connection actually works. To properly assess the risk, we must look beneath the surface at the technical components that enable this functionality. This analysis will deconstruct those key pieces: the legacy agent responsible for authentication, its inherent inability to support modern security controls like multi-factor authentication (MFA), and the insecure default trust relationships it establishes. By examining these foundational mechanisms, we can expose the direct line from a credential compromise to an infrastructure takeover.
vSphere’s Likewise Agent
When discussing vSphere’s integration with AD, it’s essential to distinguish between two separate components: vCenter Server and the ESXi hosts. Their respective AD integration options are independent and possess different capabilities. This connection is entirely facilitated by the Likewise agent.
The Likewise agent was originally developed by Likewise Software to allow Linux and Unix-based systems to join AD environments, enabling centralized identity management using standard protocols like Kerberos, NTLM, and LDAP/(S). The open-source edition, Likewise Open, included tools such as domainjoin-cli and system daemons like lsassd, which are still found under the hood in ESXi and the vCenter Server Appliance (VCSA). vSphere embedded this agent starting with ESX 4.1 (released in 2010) to facilitate Integrated Windows Authentication (IWA). However, its function differs:
In ESXi, the Likewise agent actively handles AD user authentication when configured.
In vCenter, it is only used for the initial domain join when Integrated Windows Authentication (IWA) is selected as the identity source—all actual authentication is then handled by the vCenter Single Sign-On (SSO) subsystem.
The original Likewise Software was eventually absorbed by BeyondTrust, and the open-source edition of the agent is no longer actively maintained publicly. The Likewise OSS project is now archived and marked as inactive. It is understood the codebase is only maintained internally. Note: The agent’s build version remains identical at Likewise Version 6.2.0 across both ESXi 7 and 8.
Figure 1: ESXi Likewise Agent versions
The following table compares the native AD connection methods for vCenter and ESXi.
| Feature / Capability | ESXi Host | vCenter Server (VCSA) |
| --- | --- | --- |
| AD Integration Method | Integrated Windows Authentication (IWA) only | IWA, LDAP/LDAPS, and Federated Identity (SAML, OIDC) |
| Likewise Agent Used | Yes – exclusively for IWA domain join and authentication | Yes – used for IWA domain join only |
| Authentication Protocols Supported | Kerberos (via IWA only) | Kerberos (IWA), LDAP(S), SAML, OIDC |
| Modern Auth Support (OIDC, SAML, FIDO2) | Not supported | Not supported via AD; supported only when using federated IdPs |
| MFA Support | Not supported | Not supported via AD DS; supported via Identity Federation (ADFS, Azure AD, etc.) |
| Granular Role-Based Access Control (RBAC) | Limited (via host profile or CLI only) | Advanced RBAC with vCenter SSO |
Why Not to Use Likewise-Based AD Integration (ESXi/vCenter)
The following list contains considerations when using AD-based connections managed by the vSphere Likewise agent:
Deprecated software: Likewise is legacy software, no longer maintained or supported upstream.
No support for modern authentication: Likewise only supports Integrated Windows Authentication (Kerberos) and offers no support for SAML, OIDC, or FIDO2.
No MFA: Likewise cannot enforce contextual policies such as MFA, geolocation restrictions, or time-based access.
Credential material stored locally: Kerberos keytabs and cached credentials are stored unencrypted on disk.
VMware recommends leveraging identity federation with modern identity providers, bypassing the limitations of the legacy Likewise-based stack. Broadcom announced on March 25 that IWA will be removed in the next major release.
The MFA Gap
While AD integration offers administrative convenience, it introduces significant security limitations, particularly regarding MFA. Traditional AD authentication methods, including Kerberos and NTLM, are inherently single-factor. These protocols do not natively support MFA, and the vCenter Likewise integration does not extend AD MFA enforcement to vCenter or ESXi.
Critically, ESXi does not support MFA in any form, nor does it support identity federation, SAML, or modern protocols such as OIDC or FIDO2. Even for vCenter, MFA can only be applied to users within the vSphere.local domain (using mechanisms like RSA SecurID or RADIUS), but not to AD-joined users authenticated through IWA or LDAP/S.
Ancillary solutions can offer proxy-based MFA that integrates with AD to enforce MFA for vSphere. AuthLite extends the native AD login process by requiring a second factor during Windows authentication, which can indirectly secure vCenter access when Integrated Windows Authentication is used. Silverfort operates at the domain controller level, enforcing MFA on authentication flows in real time without requiring agents on endpoints or changes to vCenter. Both solutions can help enforce MFA in vSphere environments that lack native support for it, but they also introduce caveats: added complexity, potential authorization loops if AD becomes dependent on the same infrastructure they protect, and the need to treat their control planes or virtual appliances as Tier 0 systems within the vSphere environment.
As a result, in organizations that integrate vSphere with traditional Active Directory, all access to critical vSphere infrastructure (ESXi and vCenter) remains protected by a password alone, with no MFA.
While it is technically possible to enforce MFA in vSphere through Active Directory Federation Services (ADFS), this approach requires careful consideration. It is important to note that ADFS is still a feature included in Windows Server 2025 and is not on any official deprecation list with an end-of-life date. However, the lack of significant new feature development compared to the rapid innovation in Microsoft Entra ID speaks to its status as a legacy technology. This is underscored by the extensive migration resources Microsoft now provides to move applications away from AD FS and into Entra ID.
Therefore, while ADFS remains a supported feature, for the purposes of securing vSphere it is a complex workaround that doesn’t apply to direct ESXi access and runs contrary to Microsoft’s clear strategic direction toward modern, cloud-based identity solutions.
Another common approach involves Privileged Access Management (PAM). While a PAM-centric strategy offers benefits like centralized control and session auditing, several caveats warrant consideration. PAM systems add operational complexity, and the vCenter session itself is typically not directly federated with the primary enterprise identity provider (like Entra ID or Okta). Consequently, context-aware conditional access policies are generally applied only at the initial PAM logon, not within the vCenter session itself.
Ultimately, these workarounds do not address the core issue: vSphere’s reliance on the Likewise agent and traditional AD protocols prevents native MFA enforcement for AD users, leaving the environment vulnerable.
Authentication for these users ultimately relies on delegated logon backed by AD password complexity; any MFA would have to be enforced at the network access layer or workstation login, not at the vCenter login prompt.
The ‘ESX Admins’ Problem Is Not an ESXi Issue, It’s a Trust Issue
In July 2024, Microsoft published a blog post on CVE-2024-37085, an “ESXi vulnerability” that was considered a critical issue and that was promptly addressed in a vSphere patch release. The CVE, present in vSphere ESXi for many years, involved several ESXi advanced settings utilizing insecure default configurations. Upon joining an ESXi host to an AD domain, the “ESX Admins” AD group is automatically granted an ESXi Admin role, potentially expanding the scope of administrative access beyond the intended users.
These settings are configured by the following ESXi advanced controls:

Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd: Controls whether users from a designated administrators group are automatically added to the host’s local administrative group.

Config.HostAgent.plugins.vimsvc.authValidateInterval: Defines the time interval at which the host’s management services validate the authentication credentials (or tickets) of connected clients.

Config.HostAgent.plugins.hostsvc.esxAdminsGroup: Specifies the name (or identifier) of the group whose members are automatically considered for host administrative privileges (when auto-add is enabled by the first setting).

The recommended manual mitigation was to change:

Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd from true to false
Config.HostAgent.plugins.vimsvc.authValidateInterval from 1440 to 90
Config.HostAgent.plugins.hostsvc.esxAdminsGroup from “ESX Admins” to “”

vSphere ESXi 8.0 Update 3 changed the default settings as follows:

Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd from true to false
Config.HostAgent.plugins.vimsvc.authValidateInterval from 1440 to 90
Config.HostAgent.plugins.hostsvc.esxAdminsGroup no change (“ESX Admins”)
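A simple way to operationalize these values is to audit hosts against a hardened baseline. The following is a minimal Python sketch: the setting names and values come from the text above, while the `legacy_defaults` dictionary is a hypothetical example of a host still running the insecure pre-8.0 Update 3 defaults. How you retrieve live values (e.g., with pyVmomi or `esxcli system settings advanced list`) is environment-specific and out of scope here.

```python
# Hardened values for the ESXi advanced settings discussed above.
HARDENED_BASELINE = {
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd": "false",
    "Config.HostAgent.plugins.vimsvc.authValidateInterval": "90",
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroup": "",
}

def non_compliant(current: dict) -> dict:
    """Return {setting: (current_value, expected_value)} for deviations."""
    return {
        key: (current.get(key), expected)
        for key, expected in HARDENED_BASELINE.items()
        if current.get(key) != expected
    }

# Hypothetical host still running the insecure legacy defaults.
legacy_defaults = {
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd": "true",
    "Config.HostAgent.plugins.vimsvc.authValidateInterval": "1440",
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroup": "ESX Admins",
}

print(non_compliant(legacy_defaults))  # all three settings flagged
```

A compliant host (one whose settings already match the baseline) would produce an empty result, making the check easy to wire into fleet-wide reporting.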
Integrating an ESXi host with Microsoft AD introduces a fundamental security issue that is often overlooked—the IdP’s administrators effectively gain administrative control over the ESXi host and any other system relying on that trust. While a common perception, sometimes reinforced by narratives focusing on the endpoint, suggests the ESXi host itself is the primary vulnerability, the more critical security concern is the implicit, far-reaching administrative power wielded by the administrators of the trusted IdP, particularly when using AD authentication with ESXi.
Administrators of Active Directory implicitly become administrators of any ESXi host that trusts it.
Consequently, neither workarounds nor configuration fixes, which only adjust default settings, resolve this core problem when an ESXi host is joined to AD. The issue transcends specific CVEs; it stems from the inherent security implications of the implicit trust model itself, particularly when it involves systems like ESXi and AD, which already possess their own security vulnerabilities and are frequent targets for threat actors.
With respect to ESXi, the following context should be applied:
Automatic full administrative access: When ESXi hosts are joined to AD, a default (or custom configured) AD group (e.g., “ESX Admins”) is granted full root-level administrative privileges on the ESXi hosts. Any member of this AD group instantly gains unrestricted control of the ESXi host.
Group name manipulation: If AD is compromised, threat actors can manipulate any group name configured via the Config.HostAgent.plugins.hostsvc.esxAdminsGroup advanced setting. This is not limited to the group name “ESX Admins.”
Lack of security identifier (SID) tracking: AD group names (not limited to “ESX Admins”) added to ESXi are not tracked by their SIDs. This means a threat actor could delete and recreate an AD group such as “ESX Admins,” reusing the name configured in Config.HostAgent.plugins.hostsvc.esxAdminsGroup, and inherit its elevated privileges. This is a limitation of the Likewise ESXi agent.
Active Directory group management: A threat actor looking to access a domain-joined ESXi host simply requires sufficient permissions to add themselves to the AD group defined via Config.HostAgent.plugins.hostsvc.esxAdminsGroup.
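The name-based trust gap can be illustrated with a small, hypothetical Python model (this is not ESXi or AD code, just the logic): authorization keyed on a group's name survives deletion and recreation of that group, while authorization keyed on its SID does not, because AD assigns a fresh SID to every newly created group.

```python
import itertools

# Toy model of AD group identity: every created group gets a fresh,
# never-reused SID, even when it reuses an old group's name.
_sid_counter = itertools.count(1000)

def create_group(name: str) -> dict:
    return {"name": name, "sid": f"S-1-5-21-EXAMPLE-{next(_sid_counter)}"}

def authorized_by_name(group: dict, trusted_name: str) -> bool:
    # How the Likewise-based ESXi mapping behaves: name match only.
    return group["name"] == trusted_name

def authorized_by_sid(group: dict, trusted_sid: str) -> bool:
    # What SID tracking would give you: a recreated group is rejected.
    return group["sid"] == trusted_sid

original = create_group("ESX Admins")
trusted_sid = original["sid"]  # captured when trust was configured

# Attacker deletes the group, then recreates one with the same name.
recreated = create_group("ESX Admins")

print(authorized_by_name(recreated, "ESX Admins"))  # True  -> access granted
print(authorized_by_sid(recreated, trusted_sid))    # False -> access denied
```

The name-based check is exactly why a recreated "ESX Admins" group regains root-level access on a domain-joined host, while a SID-aware check would have broken that trust when the original group was deleted.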
Recent discussions around vulnerabilities like CVE-2024-37085 have brought this security issue to the forefront: the inherent dangers of joining vSphere ESXi hosts directly to an AD domain. While such integration offers perceived management convenience, it establishes a level of trust that can be easily exploited.
Why Your ESXi Hosts Should Never Be Active Directory Domain Joined
Based on the previous discussion, we can confidently establish that joining an ESXi host to AD carries substantial risk. This risk is compounded in the absence of comprehensive ESXi security controls such as Secure Boot, TPM, execInstalledOnly, vCenter integration, and comprehensive logging with SIEM integration. Compromised AD credentials tied to an ESXi-joined group allow remote threat actors to readily exploit the elevated privileges, executing actions such as virtual machine shutdown and ransomware deployment via SSH. These risks can be summarized as follows:
No MFA support: ESXi does not support MFA for AD users. Domain joining exposes critical hypervisor access to single-factor password-based authentication.
Legacy authentication protocols: ESXi relies on IWA and Kerberos / NTLM / Windows Session Authentication (SSPI)—outdated protocols vulnerable to various attacks, including pass-the-hash and credential relay.
Likewise agent is deprecated: The underlying Likewise agent is a discontinued open-source project. Continued reliance on it introduces maintenance and security risks.
No modern authentication integration: ESXi does not support federated identity, SAML, OIDC, FIDO2, or conditional access.
AD policy enforcement is absent: Group Policy Objects (GPOs), conditional access, and login time restrictions do not extend to ESXi via AD join, undermining centralized security controls.
Complexity without benefit: Domain joining adds administrative overhead without offering meaningful security gains — especially when using vCenter as the primary access point.
Limited role mapping granularity: Group-based role mappings on ESXi are basic and cannot match the RBAC precision available in vCenter, reducing access control fidelity.
To securely remove ESXi hosts from AD, a multistep process is required to shift access management explicitly to vCenter. This involves assessing current AD usage, designing granular vCenter roles, configuring vCenter’s RBAC, removing hosts from the domain via PowerCLI, and preventing future AD re-integration. All management then moves to vCenter, with direct ESXi access minimized. This comprehensive approach prioritizes security and efficiency by moving away from AD reliance for ESXi authentication and authorization towards a vCenter-centric, granular RBAC model. vSphere explicitly discourages joining ESXi hosts to AD:
“ESXi can be joined to an Active Directory domain as well, and that functionality continues to be supported. We recommend directing all configuration & usage through the Role-Based Access Controls (RBAC) present in vCenter Server, though.”
vSphere vCenter Server represents a strategic objective for threat actors due to its authoritative role as the centralized management for virtualized infrastructure. A compromised vCenter instance effectively cedes comprehensive administrative control over the entire virtual estate, encompassing all connected ESXi hypervisors, virtual machines, datastores, and virtual network configurations.
Through its extensive Application Programming Interfaces (APIs), adversaries can programmatically manipulate all managed ESXi hosts and their resident virtual machines, enabling actions such as mass ransomware deployment, large-scale data exfiltration, the provisioning of rogue virtual assets, or the alteration of security postures to evade detection and induce widespread operational disruption.
Furthermore, the vCenter Server appliance itself can be subverted by implanting persistent backdoors, thereby establishing covert command-and-control (C2) channels that allow for entrenched persistence and continued malicious operations. Consequently, its critical function renders vCenter a high-value target. The following should be considered:
Coupled security dependency (compromise amplification risk): Directly linking vCenter to AD makes vSphere security dependent on AD’s integrity. As AD is a prime target, compromising privileged AD accounts mapped to vCenter grants immediate, potentially unrestricted administrative access to the virtual infrastructure, bypassing vSphere-specific security layers. Insufficient application of least privilege for AD accounts in vSphere magnifies this risk.
Single-factor authentication weakness (credential compromise risk): Relying solely on AD password validation makes vCenter highly vulnerable to common credential compromise methods (phishing, brute-force, spraying, stuffing, malware). Without mandatory MFA, a single stolen password for a privileged AD account allows complete authentication bypass, enabling unauthorized access, data breaches, ransomware, or major disruptions.
Lack of native MFA: The direct vsphere.local-to-AD integration offers no built-in enforcement of strong authentication like phishing-resistant FIDO2. While compatibility exists for external systems (Smart Cards, RSA SecurID), these require separate, dedicated infrastructure and are not inherent features, leaving a significant authentication assurance gap if unimplemented.
Facilitation of lateral movement and privilege escalation: Compromised AD credentials, even non-administrative ones with minimal vSphere rights, allow threat actors initial vCenter access. vCenter can then be exploited as a pivot point for further network infiltration, privilege escalation within the virtual environment, or attacks on guest systems via console/API access, all stemming from the initial single-factor credential compromise.
Integrating vSphere vCenter directly with AD for identity management, while common, inherently introduces significant security vulnerabilities stemming from coupled dependencies, reliance on single-factor authentication, a lack of native strong MFA, and facilitated attack pathways. These not only critically expose the virtual infrastructure but also provide avenues to exploit the VCSA appliance’s attack surface, such as its underlying Linux shell and the lack of comprehensive endpoint detection and response (EDR) capabilities.
Securing vSphere: The Tier 0 Challenge
The widespread practice of running Tier 0 services—most critically, AD domain controllers (often used for direct Identity integration)—directly on vSphere hypervisors introduces a significant and often overlooked security risk. By placing Active Directory Domain Controllers on vSphere, any successful attack against the hypervisor effectively hands threat actors the keys to the entire AD environment, enabling complete domain takeover. Mandiant observes that a general lack of awareness and proactive mitigation persists.
The danger is significant and present even for vSphere permissions that appear low-risk or are operationally common. For example, the privilege to snapshot an AD virtual machine can be weaponized for complete AD takeover. This specific vSphere capability, often assigned for backup routines, enables offline NTDS.dit (AD database) exfiltration. This vSphere-level action renders many in-guest Windows Server security controls ineffective, bypassing not only traditional measures like strong passwords and MFA, but also advanced protections such as LSASS credential guard and EDR, which primarily monitor activity within the operating system. This effectively paves a direct route to full domain compromise for a threat actor possessing this specific permission.
Mandiant has observed these tactics, techniques, and procedures (TTPs) attributed to various ransomware groups across multiple incidents. In the absence of VM encryption and logging, obtaining the AD database this way is a relatively simple task that is likely to go undetected.
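The snapshot-and-exfiltrate pattern described above becomes detectable once vCenter events are forwarded to a SIEM. The following is a minimal detection sketch, assuming a normalized event feed; the operation labels, field names, and VM names are illustrative placeholders, not actual vCenter event type identifiers, so adapt them to your SIEM's real schema:

```python
# Illustrative sketch: flag risky vCenter operations that touch Tier 0
# (domain controller) VMs. Operation labels below are placeholders for
# the real vCenter event types emitted for these actions.

# VM names treated as Tier 0 domain controllers (assumed for this example).
DC_VM_NAMES = {"dc01", "dc02"}

# Operations from the threat table worth alerting on for Tier 0 VMs.
RISKY_OPERATIONS = {"CreateSnapshot", "CloneVM", "ExportVm", "HotMigrate"}


def flag_tier0_events(events, dc_names):
    """Return the events where a risky operation targeted a Tier 0 VM."""
    return [
        ev
        for ev in events
        if ev["operation"] in RISKY_OPERATIONS and ev["vm"].lower() in dc_names
    ]


# Sample normalized events, e.g., parsed from vCenter's event feed.
sample = [
    {"vm": "DC01", "operation": "CreateSnapshot", "user": "backup-svc"},
    {"vm": "web01", "operation": "CreateSnapshot", "user": "ops"},
    {"vm": "dc02", "operation": "CloneVM", "user": "unknown"},
]

print(flag_tier0_events(sample, DC_VM_NAMES))
```

Even a crude rule like this surfaces the key signal: snapshot or clone operations on domain controllers are rare in legitimate operations and warrant review of the initiating account.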
The following table contains a list of sample threats matched to related permissions:

| Threat | Risk | Minimum vSphere Permission Required |
| --- | --- | --- |
| Unencrypted vMotion | Memory in transit (e.g., LSASS, krbtgt hashes) can be captured during migration. | Role: Virtual Machine Power User or higher. Permission: Host > Inventory > Migrate powered on virtual machine |
| Unencrypted VM disks | AD database (NTDS.dit), registry hives, and password hashes can be stolen from VMDKs. | Role: Datastore Consumer, VM Admin, or higher. Permissions: Datastore > Browse, Datastore > Low level file operations |
| Snapshot creation | Snapshots preserve memory and disk state and can be used to extract in-memory credentials. | Role: Virtual Machine Power User or higher. Permission: Virtual Machine > State > Create Snapshot |
| Mounting a VMDK to another VM | Enables offline extraction of AD secrets (e.g., NTDS.dit, registry, SYSVOL). | Role: VM Admin or a custom role with disk-level access. Permissions: Virtual Machine > Configuration > Add existing disk, Datastore > Browse |
| Exporting / cloning a VM | Enables offline AD analysis, allowing credential extraction or rollback attacks. | |
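Role assignments granting the permissions in the table above can be audited programmatically, for example against role definitions exported with PowerCLI's `Get-VIRole` or `govc role.ls`. The sketch below is illustrative only: the privilege IDs follow vSphere's dotted naming convention but should be verified against your vCenter version's defined-privileges reference before use.

```python
# Illustrative sketch: cross-check exported role definitions against the
# risky privileges from the threat table. Privilege IDs mimic vSphere's
# dotted naming (verify against your vCenter version's privilege reference).

RISKY_PRIVILEGES = {
    "VirtualMachine.State.CreateSnapshot": "Snapshot creation (offline credential theft)",
    "Datastore.Browse": "Datastore browsing (VMDK theft)",
    "Datastore.FileManagement": "Low-level datastore file operations (VMDK theft)",
    "VirtualMachine.Config.AddExistingDisk": "Mounting a VMDK to another VM",
}


def audit_roles(roles):
    """Map each role name to the sorted list of risky privileges it holds."""
    findings = {}
    for name, privileges in roles.items():
        hits = sorted(set(privileges) & set(RISKY_PRIVILEGES))
        if hits:
            findings[name] = hits
    return findings


# Sample role export: {role name: [privilege IDs]} (hypothetical data).
roles = {
    "Backup-Operators": [
        "VirtualMachine.State.CreateSnapshot",
        "Datastore.Browse",
    ],
    "ReadOnly-Audit": ["System.View"],
}

print(audit_roles(roles))
```

Running a check like this periodically helps catch permission creep, such as a backup role quietly accumulating datastore browse rights it does not need.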
Delegation of trust from vSphere vCenter to AD grants implicit administrator privileges on the trusted systems to any AD domain administrator. This elevates the risk profile of AD compromise, impacting the entire infrastructure. To mitigate this, implement a two-pronged strategy: first, create a separate, dedicated vSphere environment specifically for the most critical Tier 0 assets, including AD. This isolated environment should be physically or logically separated from other systems and highly secured with robust network segmentation. Second, implement a zero-trust security model for the control plane of this environment, verifying every access request regardless of source. Within this isolated environment, deploy a dedicated “infrastructure-only” IdP (on-premises or cloud). Implementing the principle of least privilege is paramount.
A dedicated, isolated vSphere environment for Tier 0 assets (e.g., Active Directory) should have strictly limited administrative access (via a privileged access workstation, or PAW), granting permissions only to those directly managing the infrastructure. This significantly reduces the impact of a breach by preventing lateral movement and minimizing damage. Unnecessary integrations should be avoided to maintain the environment’s security and adhere to the least-privilege model.
To effectively safeguard critical Tier 0 assets operating within the vSphere environment, specifically systems like Privileged Access Management (PAM) and Security Information and Event Management (SIEM) virtual appliances, and any associated AD tools deployed as virtual appliances, a multilayered security approach is essential. These assets must be treated as independent, self-sufficient environments. This means not only isolating their network traffic and operational dependencies but also, critically, implementing a dedicated and entirely separate identity provider (IdP) for their authentication and authorization processes. For the highest level of assurance, these Tier 0 virtual machines should be hosted directly on dedicated physical servers. This practice of physical and logical segregation provides a far greater degree of separation than shared virtualized environments.
The core objective here is to break the authorization dependency chain, ensuring that credentials or permissions compromised elsewhere in the network cannot be leveraged to gain access to these Tier 0 systems. This design creates defense-in-depth security barriers, fundamentally reducing the likelihood and impact of a complete system compromise.
Conclusion
Mandiant has observed that threat actors are increasingly targeting vSphere, not just for ransomware deployment, but also as a key avenue for data exploitation and exfiltration. This shift is demonstrated by recent threat actor activity observed by GTIG, where adversaries have leveraged compromised vSphere environments to exfiltrate sensitive data such as AD databases before or alongside ransomware execution.
As this document has detailed, the widespread reliance on vSphere, coupled with often underestimated risks inherent in its integration with AD and the persistence of insecure default configurations, creates a dangerously vulnerable landscape. Threat actors are not only aware of these weaknesses but are actively exploiting them with sophisticated attacks increasingly targeting ESXi and vCenter to achieve maximum impact.
The usability and stability that make vSphere a foundational standard for on-premises and private clouds can be misleading; they do not equate to inherent security. The evolution of the threat landscape, particularly the direct targeting of the hypervisor layer, which bypasses traditional endpoint defenses, necessitates a fundamental shift in how vSphere security is approached. Relying on outdated practices, backups, or perimeter defenses alone, or assuming EDR on guest VMs provides sufficient protection for the underlying infrastructure, creates significant security gaps and exposes an organization to severe risks.
Identity integration vulnerabilities will be exploited; organizations are therefore strongly urged to immediately assess their vSphere environment’s AD integration status and decisively prioritize the implementation of the mitigation strategies outlined in this document. This proactive stance is crucial to effectively counter modern threats and includes:
Decoupling critical dependencies: Severing direct ESXi host integration with AD is paramount to shrinking the AD attack surface.
Modernizing authentication: Implementing robust, phishing-resistant MFA for vCenter, preferably via identity federation with modern IdPs, is no longer optional but essential.
Systematic hardening: Proactively addressing the insecure defaults for ESXi and vCenter, enabling features like execInstalledOnly, Secure Boot, TPM, Lockdown Mode, and configuring stringent firewall rules.
Enhanced visibility: Implementing comprehensive remote logging for both ESXi and vCenter, feeding into a SIEM with use cases specifically designed to detect hypervisor-level attacks.
Protecting Tier 0 assets: Strategically isolating critical workloads like Active Directory Domain Controllers in dedicated, highly secured vSphere environments with strict, minimized access controls and encrypted VMs and vMotion.
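The hardening and visibility items above lend themselves to automated compliance checks against exported host configuration. A minimal sketch follows, assuming a flat key/value export of ESXi settings; the keys mimic real ESXi advanced settings (e.g., VMkernel.Boot.execInstalledOnly, Syslog.global.logHost), but the export format itself is an assumption, so adapt the keys to however your environment collects host config:

```python
# Illustrative sketch: evaluate an exported ESXi host configuration against
# a subset of the hardening items listed above. The export format (a flat
# dict of setting name to value) is an assumption for this example.

EXPECTED = {
    "VMkernel.Boot.execInstalledOnly": "TRUE",  # only installed, signed binaries may execute
    "lockdownMode": "normal",                   # Lockdown Mode enabled
}


def hardening_gaps(host_config):
    """Return (setting, expected, actual) tuples for every failed check."""
    gaps = []
    for key, want in EXPECTED.items():
        actual = host_config.get(key, "<unset>")
        if str(actual).lower() != want.lower():
            gaps.append((key, want, actual))
    # Remote syslog must point at *some* collector; any non-empty value passes.
    if not host_config.get("Syslog.global.logHost"):
        gaps.append(("Syslog.global.logHost", "<remote collector>", "<unset>"))
    return gaps


# Hypothetical host export: one hardening gap (execInstalledOnly disabled).
host = {
    "VMkernel.Boot.execInstalledOnly": "FALSE",
    "lockdownMode": "normal",
    "Syslog.global.logHost": "tcp://siem.example.internal:514",
}

print(hardening_gaps(host))
```

Folding a check like this into routine configuration management turns the one-time hardening exercise into continuously enforced policy.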
The upcoming end of life for vSphere 7 in October 2025 means that vast numbers of organizations will no longer receive product support, security patches, and updates for a product that underpins their infrastructure. This presents a critical juncture for organizations and a perfect storm for threat actors. The transition away from vSphere 7 should be viewed as a key opportunity to re-architect for security, not merely a routine upgrade to implement new features and obtain support. Failure to proactively address these interconnected risks by implementing the recommended mitigations will leave organizations exposed to targeted attacks that can swiftly cripple their entire virtualized infrastructure, leading to operational disruption and financial loss. The time to adopt a resilient, defense-in-depth security posture to protect these critical vSphere environments is unequivocally now.