Cloud

2022 03 07

AWS – Announcing Amplify iOS Library (Developer Preview), rewritten to entirely use Swift

Today, we are announcing the Developer Preview of the Amplify iOS Library that has been rewritten to exclusively use Swift. This initial release enables Swift developers to add cloud-based app features, including Auth, Storage, Data, and APIs, without having to transition to Objective-C to debug or contribute to the underlying open-source code. In coming releases, we plan to add support for additional Amplify use cases, as well as Swift-based language features like structured concurrency.

Read More for the details.

2022 03 07

GCP – Building cloud into your data strategy delivers higher efficiency

Cloud, Google Cloud gcp

Presently, every government agency has to take a hard look at their data capabilities and decide whether their current infrastructure supports their workflow. For many, it doesn’t. Most data systems are developed with a strict set of parameters in mind before implementation, which can limit flexibility and long-term use. Particularly during a crisis, flexible “living systems” offer tremendous advantages as they’re able to change capacity rapidly. Building living data systems with the cloud in mind allows organizations to respond to a changing world with confidence.

Last summer, the Government Business Council conducted a survey of government employees to understand the impacts of data efficiency on government operations. The report Built to Last: A Survey on Organizational Data Efficiency in Times of Crisis offers key insights into organizational efficacy and whether organizations can adapt to a crisis at speed. It also highlights differences between traditional data systems and living data systems.

Data needs to be readily available

When the pandemic first hit, many agencies needed to create or transition their systems to allow employees to work remotely. This change tested the limits of existing data systems. Even after finding a cloud service provider, agencies encountered the challenges of migrating their data to the cloud.

Government organizations had decades of data stored in paper records. Most have been working to transfer these records to a digital format, but the process has been slow. They are also faced with collecting sizable amounts of data in real time from their ongoing services, which involves interfacing with the public, external vendors, or third-party institutions.

Building the cloud into a flexible data system can solve both issues. Old records can be digitized and given an easy-to-access home for those who need them. Incoming data, both internal and external, can be made accessible as well. Migrating data to the cloud also doubles as a way to create backups of raw data, adding an extra layer of security. Most importantly, building in the cloud unlocks the capacity to scale when demand rises.

Data should be updated in real-time

One of the key takeaways from the Government Business Council report is that agencies are better able to adapt at speed when data efficiencies are higher. 74% of organizations with pandemic-related functions reported a moderate to severe impact on their jobs at the onset of the pandemic. Of those organizations, the ones reporting their data efficiency as “very good” have largely already recovered. That adaptability directly affects an agency’s ability to make informed decisions during a time of crisis.

Having a real-time data solution in place lets agencies make near real-time decisions. A great example of this from early in the pandemic is vaccine distribution. Google Cloud supported multiple states, such as the State of Wyoming, in distributing vaccines efficiently while handling challenges such as reaching rural populations. Data systems that gathered real-time patient data made a difference in the number of vaccines distributed. Knowing population data and patient risk factors enabled quick and effective decision-making.

A global pandemic is far from the only crisis that needs effective data analytics. Natural disasters, food deserts, public health issues, and more can all be handled more efficiently by having real-time data at hand. Effective data analytics systems are the digital equal of “having your ear to the ground” in each community. They provide valuable insights into what people need.

Data needs to be accessible and easy to use

Making data easy to work with and understand sets phenomenal data systems apart from functional ones. Having data in the cloud is a great first step, but agencies need to be able to easily access and quickly use the data to accomplish their goals. This is where traditional data systems fail most often. Traditional IT systems and data strategies are designed for a specific purpose, usually identified before development and implementation begin. That means that when the data living in those systems needs to be used differently, adapting to new requirements can be difficult.

Data can often feel “locked” in traditional systems; the data is there, but there’s no way to get to it or work with it in a way that meets the needs of a crisis. Flexible data systems address this by allowing for greater accessibility. Google Cloud, for example, has customizable tools, such as Contact Center AI and Document AI, which let agencies work with data in ever-changing ways. This also produces greater data transparency since data sets can be worked with and accessed more easily.

Governments need to respond to the changing needs of their constituents in emergencies. While traditional data systems can handle slowly shifting demands on the system, they do not serve agencies well in a crisis. When urgency, accuracy, and accessibility all matter, flexible systems rise to the challenge. The pandemic has pushed agencies to adapt in real time, and many have realized they need a system that adapts with them.

Google Cloud has a suite of tools to create integrated data ecosystems. These ecosystems can scale with increasing demand, meet dynamic development needs, and adapt to a changing landscape. Data-first decision-making is a core tenet of “living data systems.” Google Cloud data systems have handled everything from administering vaccines to detecting fraud, and in each of these applications data-first decision-making was implemented at scale.

For more insights on how flexible data systems help the public sector, download the full report “Built to Last: A Survey on Organizational Data Efficiency in Times of Crisis.”


2022 03 07

GCP – Micro Focus Enterprise Server blueprint available for Google Cloud

Cloud, Google Cloud gcp

For decades, large enterprises have developed and operated their most critical workloads on the mainframe due to the scalability, security and performance demands of these business systems. But finding the talent to maintain and operate these workloads is increasingly difficult, and the business demand for faster innovation, the downward pressure on costs and the need to remove mainframe data lock-in are driving enterprises to look for alternatives.

The advent of mainstream cloud computing offers a genuinely new way of looking at performance, cost and scalability that can address the challenges that many customers are facing with mainframe computing today. And the experience of customers that have moved their applications to the cloud already is that they can then modernize these business systems to take advantage of cloud native services without compromising the value these applications provide.

Micro Focus Enterprise Server Blueprint now available for Google Cloud

Micro Focus Enterprise Server is a scalable production engine for securely executing mainframe applications. When combined with Google Cloud infrastructure, Enterprise Server can provide the quality of service and continuous operations that large mainframe workloads require.

Micro Focus has successfully partnered with customers to deliver thousands of successful mainframe modernization projects – each allowing the customer to modernize their application and run it on the platform of their choice.

Google Cloud provides a high-performance, reliable and high-capacity infrastructure at a low cost that is serving thousands of customers in virtually all industries and geographies.

The Blueprint allows customers to perform an automated deployment of Enterprise Server based on best practices inside a new VPC or existing VPC. The Blueprint also installs a fully functional demonstration application (BankDemo) on Enterprise Server that uses COBOL, CICS, Job Control Language (JCL), Virtual Storage Access Method (VSAM) files, and Performance and Availability Clusters (PACs).

With this solution, enterprises and public institutions can deploy an environment that can host mainframe workloads on Google Cloud (GCP) with high security, high availability, elasticity, and robust system management.

Micro Focus Enterprise Server on Google Cloud utilizes the full power of the cloud for your mainframe business workloads

Business-critical applications that run on the mainframe typically execute large numbers of transactions securely, reliably, and with a Service Level Objective (SLO) of 99.99% or higher. The quality of service that businesses demand requires:

High availability with redundancy and parallel operation built in.

Scalable capacity based on business needs.

Highly secure operations with encryption of data at-rest and in-transit, centralized authentication and authorization, audit trails, key management and policy compliance.

System management and administration that provide centralized monitoring, alerting, logging, metering, patching, backup, and automation.
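To make the 99.99% SLO mentioned above concrete, the following minimal sketch converts an availability target into the downtime it permits over a measurement window. The class and method names are hypothetical and chosen only for illustration; the arithmetic is simply (100 − SLO) / 100 multiplied by the window length.

```java
import java.time.Duration;

/** Converts an availability SLO into the downtime it permits over a period (illustrative only). */
final class DowntimeBudget {

    /**
     * Returns the downtime allowed by an availability target.
     *
     * @param sloPercent availability target in percent, for example 99.99
     * @param period     length of the measurement window
     */
    static Duration allowedDowntime(double sloPercent, Duration period) {
        if (sloPercent < 0.0 || sloPercent > 100.0) {
            throw new IllegalArgumentException("SLO must be between 0 and 100");
        }
        // Fraction of the window during which the service may be unavailable.
        double unavailableFraction = (100.0 - sloPercent) / 100.0;
        long millis = Math.round(period.toMillis() * unavailableFraction);
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        Duration month = allowedDowntime(99.99, Duration.ofDays(30));
        Duration year = allowedDowntime(99.99, Duration.ofDays(365));
        System.out.println("Budget over 30 days (ms): " + month.toMillis());
        System.out.println("Budget over 365 days (ms): " + year.toMillis());
    }
}
```

At 99.99%, the budget works out to roughly 4.3 minutes per 30 days, or about 52.6 minutes per year, which is why redundancy and parallel operation appear first in the list of requirements.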

The fastest, lowest-risk way to move your business-critical mainframe applications to the cloud is to leverage the investment and unique business value of your current applications. But to do this successfully you need an application production environment and infrastructure that will not compromise on the quality of service that you need to run your business. Micro Focus Enterprise Server on Google Cloud provides this.

Enterprise Server is a scalable production engine that delivers the high levels of security, reliability, availability and serviceability demanded by even the largest mainframe workload. Existing mainframe COBOL and PL/I applications can be moved with minimal change and support is provided for online CICS and IMS transactions as well as a batch environment to support the move of current jobs, job control and batch utilities. DB2, IMS-DB, QSAM, and VSAM data can be transitioned into alternative database and file systems and you have the choice to re-platform to different Linux distributions, Windows and UNIX.

Micro Focus Reference Architecture

A foundational, highly available environment can be deployed automatically, with Enterprise Server deployed on Google Compute Engine (GCE) instances and application data (relational and indexed) stored in a relational database such as Cloud SQL. In addition to infrastructure-as-a-service (IaaS) deployment using Google Compute Engine instances, Enterprise Server can also be deployed in Docker containers and orchestrated using Kubernetes with Google Kubernetes Engine (GKE).

Additional third-party utilities for batch scheduling, output and print management can be deployed onto additional Google Compute Engine instances and integrated with the Enterprise Server environment.

Micro Focus Enterprise Server on Google Cloud architecture overview

Mainframe workloads can have stringent non-functional requirements, especially around performance with massive throughput and I/O. A fit-for-purpose approach requires choice to identify the most appropriate compute, storage, IOPS, and networking services on GCP.

GCP provides a wide selection of Google Compute Engine machine types and networking options for scale-up scalability, as well as cloud services like Google Kubernetes Engine or Cloud Run for scale-out scalability. Customers are not limited by the capacity of one or a few machines (scalability bottleneck), nor are they limited to vertical scaling or peak capacity sizing (unused capacity). This means virtually unlimited access to GCP resources, with automatic horizontal scaling delivering the capacity to process peaks in load at any point in time.

Enterprise Server can take advantage of these GCP resources and provides application scale-out capabilities through its Performance and Availability Cluster (PAC). This allows multiple distinct execution environments to run workloads in parallel that can be synchronized and managed as a single instance. This means COBOL and PL/I applications can run in active/active configurations across different zones. And with data including mainframe data files being deployed to relational datastores, data can be shared and replicated across multiple zones that are connected through low-latency links to meet the availability requirements and SLOs.

Micro Focus Enterprise Server blueprint deployment architecture

Scalability to adjust the infrastructure and application capacity dynamically to meet flexible workload needs is important especially in today’s environments where demand fluctuates drastically and customers want to pay only for what they use.

The Enterprise Server PAC is designed for elasticity and horizontal scaling. Combined with Cloud Load Balancing and Managed Instance Groups, Enterprise Server instances on Google Cloud can be added to or removed from a cluster dynamically (the cluster can operate across multiple zones) based on customizable thresholds such as CPU utilization. And with instances being started and stopped on demand, you only pay for what you consume.
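As a rough illustration of scaling on a CPU utilization threshold, the sketch below computes how many instances would be needed to bring average utilization back to a target. It is a simplified model written for this article, not the actual Managed Instance Group autoscaler algorithm; the class name, method name, and all parameters are hypothetical.

```java
/** Simplified target-utilization scaling decision (illustrative only, not the GCP autoscaler). */
final class ScalingSketch {

    /**
     * Recommends an instance count so that average CPU returns to the target.
     *
     * @param currentInstances   number of instances currently serving load
     * @param observedCpuPercent average CPU utilization across those instances (0-100)
     * @param targetCpuPercent   utilization threshold the cluster should stay at (1-100)
     * @param minInstances       lower bound on the cluster size
     * @param maxInstances       upper bound on the cluster size
     */
    static int recommendedInstances(int currentInstances, int observedCpuPercent,
                                    int targetCpuPercent, int minInstances, int maxInstances) {
        if (currentInstances < 1 || targetCpuPercent < 1 || observedCpuPercent < 0
                || minInstances < 1 || maxInstances < minInstances) {
            throw new IllegalArgumentException("invalid scaling parameters");
        }
        // Total load expressed in "instance-percent", divided by the target and rounded up.
        int load = currentInstances * observedCpuPercent;
        int desired = (load + targetCpuPercent - 1) / targetCpuPercent;
        // Clamp to the configured cluster bounds.
        return Math.max(minInstances, Math.min(maxInstances, desired));
    }

    public static void main(String[] args) {
        System.out.println("Scale out to: " + recommendedInstances(4, 90, 60, 2, 12));
        System.out.println("Scale in to: " + recommendedInstances(4, 30, 60, 2, 12));
    }
}
```

Integer percentages are used deliberately so that the rounding step is exact. The lower bound keeps a minimum cluster running when load drops, and the upper bound caps cost during spikes.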

Security is also a key concern so customers moving to GCP will want to inherit best practices of policies, architecture, and operational processes built to satisfy their security requirements.

Enterprise Server itself has a robust security model that provides the high levels of authentication and resource access control mainframe customers expect. This ensures that secure access to business applications and their data can be comprehensively managed. Enterprise Server can also leverage GCP Identity and Access Management (IAM) to centralize access control across all GCP services and regions, with detailed auditing via Cloud Logging and notifications via Cloud Monitoring.

For data confidentiality, integrity, and compliance, Enterprise Server on Google Cloud provides extensive options for encrypting data both at rest and in transit, without application changes. Standard capabilities include encrypting data in transit with TLS 1.3 when web-based services are used to interact with the customer. Because Enterprise Server supports the latest TLS standard, customer data is protected in transit and private data is not visible when it is transferred within the cloud.

Once an application has been deployed to the cloud, it needs to be operated and managed. To do this, Enterprise Server provides system management features and services that can be integrated into an enterprise operations framework.

For administration purposes, Enterprise Server instances and PACs when deployed onto Google Cloud instances are configured and managed using a single web console, Enterprise Server Common Web Administration (ESCWA). This also provides secure, extensible support for RESTful APIs that can be used to integrate with or automate the configuration and management.

For centralized logging and monitoring, Cloud Logging in Google Cloud’s operations suite provides real-time log management and analysis. Cloud Logging ingests log data from VMs, Enterprise Server and GCP services to support performance, troubleshooting, security, and business insights using Log Analytics.

The operations suite’s Cloud Monitoring provides visibility into performance, uptime, and application health. It collects metrics, events, and metadata from Google Cloud services, hosted uptime probes, and application instrumentation, visualizes this data on charts and dashboards, and creates alerts so you are notified when metrics are outside of expected ranges.

For centralized backup, GCP supports taking snapshots of both Enterprise Server Persistent Disks and the managed database. Backup snapshots are saved in Cloud Storage, a reliable and secure object store.

For additional system management needs, you can explore the many out-of-the-box GCP management services. These allow comprehensive management and deployments of Enterprise Server on GCP virtual machine instances in all regions, with automation and resources readily available.

Deploying your applications to Enterprise Server on GCP delivers the quality of service demanded by large-scale critical mainframe workloads and provides the high security, high availability, elasticity, operational excellence, and cost optimization that you require. Combined with Application Analysis tools and modern development environments as part of a DevOps CI/CD pipeline, you can develop, test and modernize applications faster and deliver innovation to the business sooner.

Deploy an enterprise-grade mainframe infrastructure at the press of a button

Moving mainframe workloads to Micro Focus Enterprise Server and Google Cloud means customers can take advantage of a secure, low-risk, high-value strategy to support application, process, and infrastructure modernization.

To get started, try it for yourself and learn more about Micro Focus Enterprise Server on Google Cloud using the Blueprint environment. You can find the Blueprint scripts here.


2022 03 07

GCP – Get more insights from your Java application logs

Cloud, Google Cloud gcp

Today it is even easier to capture logs in your Java applications. Developers can get more data with their application logs using a new version of the Cloud Logging client library for Java. The library implicitly populates every ingested log entry with the current execution context. Read on if you want to learn how to get HTTP request and tracing information, plus additional metadata, in your logs without writing a single line of code.

There are three ways to ingest log data into Google Cloud Logging:

Develop a proprietary solution that directly calls the Logging API.

Leverage logging capabilities of the Google Cloud managed environments like GKE or install Google Cloud Ops agent and print your application logs to stdout and stderr.

Use Google Cloud Logging client library in one of many supported programming languages.

The library provides ready-to-use boilerplate constructs that follow the best practices for using the Logging API. Java applications can use the Google Cloud Logging library to ingest logs through its integrations with Java Logging and the Logback framework.

If you are new to using Google Logging client libraries for Java, follow the steps to set up Cloud Logging for Java and get started.

In the version 3.6 release of the Logging client library for Java, you get many long-requested features, including automatic population of metadata about the environment’s resource (now supporting Cloud Run and Cloud Functions), HTTP request contextual information, tracing correlation that enables displaying grouped log entries in Logs Explorer, and more. This release of the library is composed of three packages:

google-cloud-logging — provides the hand-written layer above Cloud Logging API and the integration with legacy Java Logging solution.

google-cloud-logging-logback is the integration with the Logback framework and ingests logs using the google-cloud-logging package.

google-cloud-logging-servlet-initializer is a new addition to the library; it provides integration with servlet-based Web applications.

The features are available in the versions ≥3.6.3 and ≥0.123.3-alpha of the google-cloud-logging and google-cloud-logging-logback packages respectively.

If you are using Maven, update the packages’ versions in the pom.xml:

```xml
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-logging</artifactId>
  <version>3.6.3</version>
</dependency>
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-logging-logback</artifactId>
  <version>0.123.3-alpha</version>
</dependency>
```

If you are using Gradle, update your dependencies:

```groovy
implementation 'com.google.cloud:google-cloud-logging:3.6.3'
implementation 'com.google.cloud:google-cloud-logging-logback:0.123.3-alpha'
```

You can use the official Google Cloud BOM version 0.167.0 that includes the new releases of the packages.

What is new

The Java library inserts structured information about the executing environment including resource types, HTTP request metadata, tracing and more. Using the library you can write your payloads in one of the three formats:

A text provided as a Java string

A JSON object provided as an instance of Map<String, ?> or Struct

A protobuf object provided as an instance of Any

You can use the structured logs with enhanced filtering in Logs Explorer to observe and troubleshoot your applications. The Logs Explorer uses structured logs to establish correlations between traces and logs and to group together logs that belong to the same transaction. The correlated “child” logs are displayed “under” the entry of the “parent” log:

Grouped logs display in Logs Explorer

With the previous versions of the Logging library you had to write code to explicitly populate these fields. For example, developers who use the Logback framework had to write code like the following to populate the trace field of the ingested logs:

```java
// . . .
String traceInfo = request.getHeader("x-cloud-trace-context");
TraceLoggingEventEnhancer.setCurrentTraceId(traceInfo);
// . . .
```

This code also had to be invoked at the beginning of each transaction.

The new features of the Logging library make implementing the population logic unnecessary. The new version of the library supports automatic population of the following log entry fields:

resource ‒ describes the resource type and its attributes where the application is running. Along with GCE instances, it supports Google Cloud managed services such as GKE, App Engine (both Standard and Flexible), Cloud Run and Cloud Functions.

httpRequest ‒ captures info about HTTP requests from the current application’s context. The context is defined per-thread and can be populated both explicitly in the application code or implicitly from the Jakarta servlet requests pipeline.

trace and spanId ‒ reads the tracing data from the HTTP request header. The tracing data assists in correlating multiple logs that belong to the same transaction.

sourceLocation ‒ stores info about the class and method names as well as the line of code where the application called the log ingestion method. The library retrieves the data by traversing the stack trace up to the first entry that is not part of the Logging library code or the system package.

What is left to you is to set the payload and relevant payload’s metadata labels. The only field in the log entry that the library does not automatically populate now is the operation field.
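The sourceLocation lookup described above can be pictured with a short stack-walking sketch. This is an assumption-laden illustration rather than the library’s actual implementation: the class name `CallerLocator`, the placeholder package prefix `com.example.logging.`, and the choice of `java.`, `jdk.` and `sun.` as “system” prefixes are all hypothetical.

```java
import java.util.Optional;

/** Finds the first stack frame outside a logging library and the JDK (illustrative only). */
final class CallerLocator {

    /** Treats JDK-internal classes as "system" frames; the prefix list is an assumption. */
    private static boolean isSystemClass(String className) {
        return className.startsWith("java.")
                || className.startsWith("jdk.")
                || className.startsWith("sun.");
    }

    /**
     * Walks the stack from the innermost frame outward and returns the first frame
     * that belongs neither to the given library package nor to a system package.
     */
    static Optional<StackTraceElement> firstFrameOutside(StackTraceElement[] stack,
                                                         String libraryPrefix) {
        for (StackTraceElement frame : stack) {
            String className = frame.getClassName();
            if (className.startsWith(libraryPrefix) || isSystemClass(className)) {
                continue; // still inside the library or the JDK
            }
            return Optional.of(frame);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        firstFrameOutside(stack, "com.example.logging.")
                .ifPresent(f -> System.out.println(
                        f.getClassName() + "." + f.getMethodName() + ":" + f.getLineNumber()));
    }
}
```

The returned frame carries the class name, method name and line number, which correspond to the three pieces of information the sourceLocation field stores.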

Disable information auto-population in log entries

You have full control over the auto-population functionality. The auto-population is enabled by default for your convenience. But in certain scenarios it can be desirable to disable it. For example, if your application is log intensive and has a narrow bandwidth, you may want to disable the auto-population in order to save the connection’s bandwidth for the application communication.

If you are ingesting logs using the write() method of the Logging interface, you can configure the LoggingOptions argument to disable the auto-population:

```java
LoggingOptions options = LoggingOptions.newBuilder()
    .setAutoPopulateMetadata(false).build();
Logging logging = options.getService();
```

If you are using Java Logging, you can disable auto population by adding the following to your logging.properties file:

```properties
com.google.cloud.logging.LoggingHandler.autoPopulateMetadata=false
```

If you are using Logback framework, you can disable auto population by adding the following to your Logback configuration:

```xml
<autoPopulateMetadata>false</autoPopulateMetadata>
```

How the current context is populated

The rich query and display capabilities of Logs Explorer, such as displaying correlated logs, use log entry fields such as httpRequest and trace. The new version of the library uses the Context class to store the information about the HTTP request and tracing data in the current application context. The context’s scope is per thread. Before the library ingests logs into Cloud Logging, it reads the HTTP request and tracing information from the current context and sets the respective fields in the log entries. The fields are populated only if the caller did not explicitly provide values for these fields. Using the ContextHandler class you can set up the HTTP request and tracing data of the current context:

```java
import com.google.cloud.logging.Context;
import com.google.cloud.logging.ContextHandler;
import com.google.cloud.logging.HttpRequest;
// . . .
HttpRequest request;
// . . .
ContextHandler ctxHandler = new ContextHandler();
Context ctx = Context.newBuilder()
    .setRequest(request)
    .setTraceId(traceId)
    .setSpanId(spanId)
    .build();
ctxHandler.setCurrentContext(ctx);
```
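The per-thread scope of the context can be pictured with a plain `ThreadLocal`. The sketch below is not the library’s code; `RequestContextHolder` and its nested `RequestContext` class are hypothetical stand-ins that only demonstrate the scoping behavior: a value set on one thread is not visible from another.

```java
/** Per-thread context holder built on ThreadLocal (illustrative only). */
final class RequestContextHolder {

    /** Hypothetical value object; the library's Context class carries more fields. */
    static final class RequestContext {
        final String requestUrl;
        final String traceId;
        final String spanId;

        RequestContext(String requestUrl, String traceId, String spanId) {
            this.requestUrl = requestUrl;
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    // Each thread sees only the value it stored itself.
    private static final ThreadLocal<RequestContext> CURRENT = new ThreadLocal<>();

    static void set(RequestContext ctx) {
        CURRENT.set(ctx);
    }

    static RequestContext get() {
        return CURRENT.get();
    }

    /** Clears the slot so a pooled thread does not leak context into the next request. */
    static void clear() {
        CURRENT.remove();
    }

    /** Reads the context from a freshly started thread to show it is not shared. */
    static RequestContext readFromNewThread() {
        RequestContext[] seen = new RequestContext[1];
        Thread worker = new Thread(() -> seen[0] = get());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        set(new RequestContext("https://example.com/info", "trace-1", "span-1"));
        System.out.println("same thread sees trace: " + get().traceId);
        System.out.println("other thread sees context: " + readFromNewThread());
        clear();
    }
}
```

This scoping is also why asynchronous request handling complicates things: when a request hops between threads, the context has to be captured and restored explicitly.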

After the context is set, all logs that are ingested in the same scope as the context will be populated with the HTTP request and tracing information that was set in the current context. The Context class can set up the HTTP request using partial data such as the URL or request method:

```java
import com.google.cloud.logging.HttpRequest.RequestMethod;
// . . .
ContextHandler ctxHandler = new ContextHandler();
Context ctx = Context.newBuilder()
    .setRequestUrl("https://example.com/info")
    .setRequestMethod(RequestMethod.GET)
    .build();
ctxHandler.setCurrentContext(ctx);
```

The builder of the Context class also supports setting the tracing information from the parsed values of the Google tracing context and W3C tracing context strings using the methods loadCloudTraceContext() and loadW3CTraceParentContext() respectively.

Implementing context population can be a complex task. Java Web servers support asynchronous execution of request handlers, so managing the context in the right scope may require in-depth knowledge of the implementation details of each Web server. The new version of the Logging library provides a simple way to automate current context management, saving you the effort of implementing the code yourself. The automation supports all Web servers that are based on Jakarta servlets, such as Tomcat, Jetty or Undertow. The current implementation supports Jakarta servlets version ≥ 4.0.4. The implementation is provided in the new google-cloud-logging-servlet-initializer package. All you have to do to enable automatic capturing of the current context is add the package to your application.

If you are using Maven add the following to your pom.xml:

```xml
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-logging-servlet-initializer</artifactId>
  <version>0.1.7-alpha</version>
  <type>pom</type>
</dependency>
```

If you are using Gradle, add the following to your dependencies:

```groovy
implementation 'com.google.cloud:google-cloud-logging-servlet-initializer:0.1.7-alpha'
```

The added package uses Java’s Service Provider Interface to register the ContextCaptureInitializer class, which integrates into the servlet pipeline to capture information about current HTTP requests. The information is parsed to populate the HttpRequest structure. The initializer also parses the request’s headers to retrieve tracing information. It supports the “x-cloud-trace-context” (Google tracing context) and “traceparent” (W3C tracing context) headers.
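To show what parsing these two headers involves, here is a minimal standalone sketch. It assumes the commonly documented layouts, `TRACE_ID/SPAN_ID;o=OPTIONS` for x-cloud-trace-context and `version-traceid-parentid-flags` for traceparent, and handles only the four-field version 00 form of the latter. The class `TraceHeaderParser` and its methods are hypothetical and are not the library’s own parser.

```java
import java.util.Optional;

/** Extracts trace and span IDs from tracing headers (illustrative only). */
final class TraceHeaderParser {

    /** Parsed result; spanId may be null when the header carries no span. */
    static final class TraceInfo {
        final String traceId;
        final String spanId;

        TraceInfo(String traceId, String spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    /** Parses "TRACE_ID/SPAN_ID;o=OPTIONS"; the span and options parts are optional. */
    static Optional<TraceInfo> parseCloudTraceContext(String header) {
        if (header == null || header.trim().isEmpty()) {
            return Optional.empty();
        }
        String value = header.trim();
        int semicolon = value.indexOf(';');
        if (semicolon >= 0) {
            value = value.substring(0, semicolon); // drop the options suffix
        }
        int slash = value.indexOf('/');
        String traceId = slash >= 0 ? value.substring(0, slash) : value;
        String spanId = (slash >= 0 && slash + 1 < value.length())
                ? value.substring(slash + 1)
                : null;
        if (traceId.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new TraceInfo(traceId, spanId));
    }

    /** Parses the four-field "version-traceid-parentid-flags" layout. */
    static Optional<TraceInfo> parseW3CTraceParent(String header) {
        if (header == null) {
            return Optional.empty();
        }
        String[] parts = header.trim().split("-");
        // Trace ID is 32 hex characters, parent (span) ID is 16 hex characters.
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            return Optional.empty();
        }
        return Optional.of(new TraceInfo(parts[1], parts[2]));
    }

    public static void main(String[] args) {
        parseCloudTraceContext("105445aa7843bc8bf206b12000100000/1;o=1")
                .ifPresent(t -> System.out.println(t.traceId + " span " + t.spanId));
        parseW3CTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
                .ifPresent(t -> System.out.println(t.traceId + " span " + t.spanId));
    }
}
```

Both parsers return an empty result for missing or malformed headers instead of throwing, so a request without tracing information simply leaves the trace fields unset.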

Use Logging library with logging agents

Many applications utilize the logging capabilities of Google Cloud managed services. The applications output their logs to stdout and stderr, and the logs are ingested into Cloud Logging by logging agents or by the managed services that have logging agent capabilities. This approach benefits from asynchronous log processing that does not consume application resources. The drawback is that if you want to populate fields in the structured logs or provide a structured payload, you have to format the output following the special JSON format that the logging agents can parse. Also, while the logging agents can detect and populate the resource information about the managed environment, they cannot help with auto-population of other fields of the log entry such as traceId or sourceLocation.
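For readers who have not seen the JSON format that logging agents parse, the sketch below hand-builds one such line and prints it to stdout. The field names `severity`, `message` and `logging.googleapis.com/trace` come from Google’s structured logging documentation rather than from this article, so treat them as an assumption to verify. The class `StructuredLogLine`, the project ID `my-project` and the trace ID are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds a one-line JSON log record for a logging agent to parse (illustrative only). */
final class StructuredLogLine {

    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Serializes a flat map of string fields as a single-line JSON object. */
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Formats a record with severity, message and a trace reference. */
    static String format(String severity, String message, String projectId, String traceId) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("severity", severity);
        fields.put("message", message);
        fields.put("logging.googleapis.com/trace",
                "projects/" + projectId + "/traces/" + traceId);
        return toJson(fields);
    }

    public static void main(String[] args) {
        // One JSON object per line on stdout is what the agent picks up.
        System.out.println(format("INFO", "order placed", "my-project",
                "105445aa7843bc8bf206b12000100000"));
    }
}
```

Writing this formatting code by hand for every field is the effort that the redirect option described in this section is meant to remove.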

The new release of the Logging library for Java introduces the support for logging agents in both of its Java Logging and Logback integrations. Now the library’s users can instruct the appropriate handler to redirect the log writing to stdout instead of Logging API.

If you are using Java Logging, add the following to your logging.properties file:

```properties
com.google.cloud.logging.LoggingHandler.redirectToStdout=true
```

If you are using Logback, add the following to the Logback configuration:

```xml
<redirectToStdout>true</redirectToStdout>
```

By default, both LoggingHandler and LoggingAppender write logs by calling the Logging API. You have to add the above configurations to make them utilize the logging agents for the log ingestion.

Some limitations of using Logging Agents

When configuring the library’s Java Logging handler or Logback adapter to redirect log writing to stdout, you should be aware of the constraints that the use of logging agents implies.

Google Cloud managed services (e.g. GKE) automatically install logging agents in the resources that they provision. For example, a GKE cluster has a logging agent installed in each worker node (GCE instance) of the cluster. As a result, logging agents are constrained to the resource they run on and do not support customization of the resource field of the ingested log entries.

Additionally, the logName of all ingested logs is defined by the agent and cannot be changed*. It means that the application cannot define the log name or where the log entry will be stored (a.k.a. log’s destination name).

If it is essential for you to define a custom resource type or to control to which project the logs will be routed and/or the log name, you should not redirect the log writing to standard output.

* It is possible to customize the log name (but not the destination) by customizing the Logging agent’s configuration in GCE instances by defining the name as the “tag”.

What is next

Let’s recap the benefits of upgrading your logging client to the latest version.

Use the new Logging library if you need log correlation capabilities of Log Explorer or forward Cloud Logging structured logs to external solutions and use the data in the auto-populated fields.

Use the google-cloud-logging-servlet-initializer package to automate the context management if you run a request based application that uses Jakarta servlets. Note that it will not work with legacy Java EE servlets or Web servers that are not based on Java servlets such as Netty.

If you run your application in Google Cloud serverless environments like Cloud Run or Cloud Functions, consider using Java Logging or Logback with the configuration that redirects formatted logs to standard output, as described in the previous section. Leveraging logging agents for ingesting logs resolves some reliability problems with asynchronous log ingestion, such as CPU throttling on Cloud Run or the lack of a grace period in Cloud Functions.

Read More for the details.

2022 03 07

GCP – What’s happening in your SAP systems? Find out with Pacemaker Alerts

Cloud, Google Cloud gcp

When critical services fail, businesses risk losing revenue, productivity, and trust. That’s why Google Cloud customers running SAP applications choose to deploy high availability (HA) systems on Google Cloud.

In these deployments Linux operating system clustering provides application and guest awareness for the application state and automates recovery actions in case of failure — including cluster node, resource or node failover or failed action.

Pacemaker is the most popular software Linux administrators use to manage their HA clusters, which includes automating notifications about events — including failover fencing and node, attribute, and resource events — and reporting on events. With automated alerts and reports, Linux administrators can not only learn about events as they happen, but they can also make sure other stakeholders are alerted to take action when critical events occur. They can even discover past events to assess the overall health of their HA systems.

Here, we break down the steps to setting up automated alerts for HA cluster events and alert reporting.

How to Deploy the Alert Script

To set up event-based alerts, you’ll need to take the following steps to execute the script.

1. Download the script file ‘gcp_crm_alert.sh’ from https://github.com/GoogleCloudPlatform/pacemaker-alerts-cloud-logging

2. As the root user, make the script executable and run the deployment:

chmod +x ./gcp_crm_alert.sh
./gcp_crm_alert.sh -d

3. Confirm that the deployment runs successfully. If it does, you will see the following INFO log messages:

In the Red Hat Enterprise Linux (RHEL) system:

gcp_crm_alert.sh:2022-01-24T23:48:30+0000:INFO:'pcs alert recipient add gcp_cluster_alert value=gcp_cluster_alerts id=gcp_cluster_alert_recepient options value=/var/log/crm_alerts_log' rc=0

In the SUSE Linux Enterprise Server (SLES):

gcp_crm_alert.sh:2022-01-25T00:13:27+00:00:INFO:'crm configure alert gcp_cluster_alert /usr/share/pacemaker/alerts/gcp_crm_alert.sh meta timeout=10s timestamp-format=%Y-%m-%dT%H:%M:%S.%06NZ to { /var/log/crm_alerts_log attributes gcloud_timeout=5 gcloud_cmd=/usr/bin/gcloud }' rc=0

Now, in the event of a cluster node, resource, node failover, or failed action, Pacemaker will start the alert mechanism. For further details on the alerting agent, check out the Pacemaker Explained documentation.

How to Use Cloud Logging for Alert Reporting

Alerted events are published in Cloud Logging. Below is an example of the log record payload, where the cluster alert key-value pairs get recorded in the jsonPayload node.

{
  "insertId": "ktildwg1o3fbim",
  "jsonPayload": {
    "CRM_alert_recipient": "/var/log/crm_alerts_log",
    "CRM_alert_attribute_name": "",
    "CRM_alert_kind": "resource",
    "CRM_alert_status": "0",
    "CRM_alert_rsc": "STONITH-sapecc-scs",
    "CRM_alert_rc": "0",
    "CRM_alert_timestamp_usec": "",
    "CRM_alert_interval": "0",
    "CRM_alert_node_sequence": "21",
    "CRM_alert_task": "start",
    "CRM_alert_nodeid": "",
    "CRM_alert_timestamp": "2022-01-25T00:17:06.515313Z",
    "CRM_alert_timestamp_epoch": "",
    "CRM_alert_desc": "ok",
    "CRM_alert_target_rc": "0",
    "CRM_alert_version": "1.1.15",
    "CRM_alert_attribute_value": "",
    "CRM_alert_node": "sapecc-ers",
    "CRM_alert_exec_time": ""
  },
  "resource": {
    "type": "global",
    "labels": {
      "project_id": "gcp-tse-sap-on-gcp-lab"
    }
  },
  "timestamp": "2022-01-25T00:17:09.662557309Z",
  "severity": "INFO",
  "logName": "projects/gcp-tse-sap-on-gcp-lab/logs/sapecc-ers%2F%2Fvar%2Flog%2Fcrm_alerts_log",
  "receiveTimestamp": "2022-01-25T00:17:09.662557309Z"
}
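For reporting, a record like the one above can be reduced to a one-line summary. The sketch below is illustrative only: the `summarize_alert` helper and the choice of fields are assumptions, not part of the alert script, and the sample is an abbreviated copy of the payload shown above.

```python
import json

# Abbreviated copy of the log entry shown above (only a few fields kept).
SAMPLE_ENTRY = """
{
  "jsonPayload": {
    "CRM_alert_kind": "resource",
    "CRM_alert_rsc": "STONITH-sapecc-scs",
    "CRM_alert_task": "start",
    "CRM_alert_rc": "0",
    "CRM_alert_desc": "ok",
    "CRM_alert_node": "sapecc-ers",
    "CRM_alert_timestamp": "2022-01-25T00:17:06.515313Z"
  },
  "severity": "INFO"
}
"""


def summarize_alert(entry_json):
    """Return a one-line summary of an alert log entry (hypothetical helper)."""
    entry = json.loads(entry_json)
    payload = entry.get("jsonPayload", {})
    return "{sev} {ts} node={node} kind={kind} rsc={rsc} task={task} rc={rc} ({desc})".format(
        sev=entry.get("severity", "?"),
        ts=payload.get("CRM_alert_timestamp", ""),
        node=payload.get("CRM_alert_node", ""),
        kind=payload.get("CRM_alert_kind", ""),
        rsc=payload.get("CRM_alert_rsc", ""),
        task=payload.get("CRM_alert_task", ""),
        rc=payload.get("CRM_alert_rc", ""),
        desc=payload.get("CRM_alert_desc", ""),
    )


print(summarize_alert(SAMPLE_ENTRY))
```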

To get notified of a resource event — for example, when the HANA topology resource monitor fails — you can use the following filter for the alerting definition:

jsonPayload.CRM_alert_node=("hana-venus" OR "hana-mercury")
-jsonPayload.CRM_alert_status="0"
jsonPayload.CRM_alert_rsc="rsc_SAPHanaTopology_SBX_HDB00"
jsonPayload.CRM_alert_task="monitor"

To define an alert for a fencing event, you can apply this filter:

jsonPayload.CRM_alert_node=("hana-venus" OR "hana-mercury")
jsonPayload.CRM_alert_kind="fencing"
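If you maintain filters for several clusters, it may help to generate them rather than edit them by hand. The following is a minimal sketch under stated assumptions: `build_alert_filter` is a hypothetical helper, not part of any Google Cloud library, and it assumes that field values contain no double quotes. It simply reproduces the two filter texts shown above.

```python
def build_alert_filter(nodes, equals=None, not_equals=None):
    """Build a log filter string for Pacemaker alert entries (hypothetical helper).

    nodes: node names matched against jsonPayload.CRM_alert_node.
    equals / not_equals: dicts of jsonPayload field name -> required / excluded value.
    Values are assumed to contain no double quotes.
    """
    node_list = " OR ".join('"{}"'.format(n) for n in nodes)
    lines = ["jsonPayload.CRM_alert_node=({})".format(node_list)]
    # A leading "-" negates a comparison, as in the resource filter above.
    for field, value in (not_equals or {}).items():
        lines.append('-jsonPayload.{}="{}"'.format(field, value))
    for field, value in (equals or {}).items():
        lines.append('jsonPayload.{}="{}"'.format(field, value))
    return "\n".join(lines)


NODES = ["hana-venus", "hana-mercury"]

resource_filter = build_alert_filter(
    NODES,
    equals={
        "CRM_alert_rsc": "rsc_SAPHanaTopology_SBX_HDB00",
        "CRM_alert_task": "monitor",
    },
    not_equals={"CRM_alert_status": "0"},
)
fencing_filter = build_alert_filter(NODES, equals={"CRM_alert_kind": "fencing"})

print(resource_filter)
print(fencing_filter)
```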

The fencing log entry gets recorded with warning severity to give you deeper insight, and this additional information is also helpful for more specific filtering criteria:

{
  "insertId": "1plznskfjsxt82",
  "jsonPayload": {
    "CRM_alert_attribute_value": "",
    "CRM_alert_recipient": "/var/log/crm_alerts_log",
    "CRM_alert_rsc": "",
    "CRM_alert_rc": "0",
    "CRM_alert_timestamp_usec": "529261",
    "CRM_alert_desc": "Operation reboot of hana-mercury by hana-venus for crmd.2361@hana-venus: OK (ref=2a9bf814-9adf-4247-af3f-94ac254fc3ca)",
    "CRM_alert_target_rc": "",
    "CRM_alert_nodeid": "",
    "CRM_alert_kind": "fencing",
    "CRM_alert_node_sequence": "33",
    "CRM_alert_task": "st_notify_fence",
    "CRM_alert_status": "",
    "CRM_alert_exec_time": "",
    "CRM_alert_attribute_name": "",
    "CRM_alert_timestamp_epoch": "1643072786",
    "CRM_alert_version": "1.1.19",
    "CRM_alert_timestamp": "2022-01-25T01:06:26.529261Z",
    "CRM_alert_interval": "",
    "CRM_alert_node": "hana-mercury"
  },
  "resource": {
    "type": "global",
    "labels": {
      "project_id": "gcp-tse-sap-on-gcp-lab"
    }
  },
  "timestamp": "2022-01-25T01:06:27.267017052Z",
  "severity": "WARNING",
  "logName": "projects/gcp-tse-sap-on-gcp-lab/logs/hana-venus%2F%2Fvar%2Flog%2Fcrm_alerts_log",
  "receiveTimestamp": "2022-01-25T01:06:27.267017052Z"
}

Alerts can be delivered through multiple channels, including text and email. Below is an example of an email notification for our earlier example, when we defined an alert for a HANA topology resource monitor failure:

You can write and apply filters to your log-based alerts to isolate certain types of incidents and analyze events over time. For example, the following filter will surface resource events occurring within a two-hour window on a specific date:

timestamp>="2022-01-25T00:00:00Z" timestamp<="2022-01-25T02:00:00Z"
jsonPayload.CRM_alert_kind="resource"

With the ability to analyze these logged alerts over time, you can determine whether event patterns warrant any action.
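The time window in such a filter can be computed instead of typed. The sketch below uses only Python's standard `datetime` module; `window_filter` is a hypothetical helper name, and the start time and event kind are taken from the example above.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def window_filter(start, hours, kind):
    """Return a filter for alerts of one kind within [start, start + hours] (hypothetical helper)."""
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(hours=hours)
    return 'timestamp>="{}" timestamp<="{}"\njsonPayload.CRM_alert_kind="{}"'.format(
        start_utc.strftime(TIMESTAMP_FORMAT),
        end_utc.strftime(TIMESTAMP_FORMAT),
        kind,
    )


two_hour_filter = window_filter(
    datetime(2022, 1, 25, 0, 0, 0, tzinfo=timezone.utc), hours=2, kind="resource"
)
print(two_hour_filter)
```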

[SIDEBAR]

The alert script prints details to standard output and to the log file /var/log/crm_alerts_log, which can grow over time. We recommend rotating the log file with the Linux logrotate service to limit the file system space it uses. Use the following command to create the necessary logrotate setting for the alerting log file:

cat > /etc/logrotate.d/crm_alerts_log << END-OF-FILE
/var/log/crm_alerts_log {
  create 0660 root root
  rotate 7
  size 10M
  missingok
  compress
  delaycompress
  copytruncate
  dateext
  dateformat -%Y%m%d-%s
  notifempty
}
END-OF-FILE

[END SIDEBAR]

Tips for Troubleshooting

When you first deploy your alert script, how can you tell for certain that you’ve done it correctly? Use the following commands to test it out:

In RHEL:
pcs alert show

In SLES:
sudo crm config show | grep -A3 gcp_cluster_alert

You should see the following if the script is correct:

In RHEL:

Alerts:
 Alert: gcp_cluster_alert (path=/usr/share/pacemaker/alerts/gcp_crm_alert.sh)
  Description: "Cluster alerting for hana-node-X"
  Options: gcloud_cmd=/usr/bin/gcloud gcloud_timeout=5
  Meta options: timeout=10s timestamp-format=%Y-%m-%dT%H:%M:%S.%06NZ
  Recipients:
   Recipient: gcp_cluster_alert_recepient (value=gcp_cluster_alerts)
    Options: value=/var/log/crm_alerts_log

In SLES:

alert gcp_cluster_alert "/usr/share/pacemaker/alerts/gcp_crm_alert.sh" \
	meta timeout=10s timestamp-format="%Y-%m-%dT%H:%M:%S.%06NZ" \
	to "/var/log/crm_alerts_log" attributes gcloud_timeout=5 gcloud_cmd="/usr/bin/gcloud"

If the commands do not display the alerts properly, re-deploy the script.

In case there is an issue with the script, or if the Cloud Logging records are not presenting as expected, examine the script log file /var/log/crm_alerts_log. Errors and warnings can be filtered with:

egrep '(ERROR|WARN)' /var/log/crm_alerts_log

Any Pacemaker alert failures will be recorded in the messages and/or Pacemaker log. To examine recent alert failures, use the following command:

egrep '(gcp_crm_alert.sh|gcp_cluster_alert)' /var/log/messages /var/log/pacemaker.log

Keep in mind, though, that the Pacemaker log location may be different in your system from the one in the example above.

From reactive to proactive

Your SAP applications are too critical to risk outages. The most effective way to manage high availability clusters for your SAP systems on Google Cloud is to take full advantage of Pacemaker’s alerting capabilities, so you can be proactive in ensuring your systems are healthy and available.

Learn more about running SAP on Google Cloud.

Read More for the details.

2022 03 07

Azure – Public preview: On-demand capacity reservation with Azure Site Recovery safeguards VMs failover

Azure, Cloud Azure

Azure Site Recovery now integrates with on-demand capacity reservation, so you can reserve compute capacity in the disaster recovery region and use it for failover.

Read More for the details.

2022 03 04

AWS – Amazon RDS for MariaDB supports new minor versions 10.6.7, 10.5.15, 10.4.24, 10.3.34, 10.2.43

AWS, Cloud AWS

Amazon Relational Database Service (Amazon RDS) for MariaDB now supports MariaDB minor versions 10.6.7, 10.5.15, 10.4.24, 10.3.34 and 10.2.43. We recommend that you upgrade to the latest minor versions to fix known security vulnerabilities in prior versions of MariaDB, and to benefit from the numerous bug fixes, performance improvements, and new functionality added by the MariaDB community.

Read More for the details.

2022 03 04

AWS – Amazon SageMaker Serverless Inference (in Preview) and Asynchronous Inference add support for SageMaker Python SDK

AWS, Cloud AWS

Amazon SageMaker Serverless and Asynchronous Inference now support Amazon SageMaker Python SDK, which abstracts the steps required for deployment and thereby simplifies the model deployment workflow. The SageMaker Python SDK is an open source library for deploying machine learning models on Amazon SageMaker. You can use any of the optimized machine learning frameworks, SageMaker supported first-party algorithms, or bring your own model to deploy using the Python SDK.

Read More for the details.

2022 03 04

AWS – You can now customize how data is stored on your Amazon FSx for OpenZFS file system to optimize performance for database applications

AWS, Cloud AWS

You can now customize how data is stored on your Amazon FSx for OpenZFS file system to optimize performance for database applications and other workloads that consistently read and write data in small increments.

Read More for the details.

2022 03 04

AWS – Configurable cipher suites now available for Amazon Aurora PostgreSQL

AWS, Cloud AWS

You can now configure your database connections on Amazon Aurora PostgreSQL-Compatible Edition from an allowable list of ciphers. Configurable cipher suites provide you more security control over the connection encryption that your database server accepts.

Read More for the details.

2022 03 04

GCP – How sustainability will drive industry transformation

Cloud, Google Cloud gcp

A century of increasing global industrialization created the woeful unintended consequence of crippling greenhouse gasses in our atmosphere. Now we have far less time than that to address this crisis. We have to reduce emissions five times faster in this decade than we have been over the last. How do we speed up the action?

Awareness is no longer the issue. On Monday the Intergovernmental Panel on Climate Change (IPCC) issued its latest assessment since August 2021. The IPCC, which reviews and synthesizes thousands of scientific papers on the current effects of climate, stated that “climate change is a threat to human well-being and the health of the planet. Any delay in concerted global action will miss the brief, rapidly closing window to secure a liveable future.”

Those catastrophes add to an urgency that companies around the world already know. In 2021 a joint report from the United Nations and Accenture that surveyed more than 1,200 chief executives worldwide found that nearly half of CEOs said extreme weather was seriously affecting their supply chains, and 81% said that they were already developing new products and services leveraging electrification, sustainable design and sustainable materials.

I have worked with companies on issues of climate and sustainability for many years, and can attest to this new urgency, both in the scientific reports and in the awareness among business leaders. It’s an industry transformation. We’re seeing coalitions of CEOs coming together to unlock barriers in the system and share data. The companies that embrace sustainability as core to their business will be the ones that move ahead. This is the Decade of Action.

But how can things move faster? Leadership, collaboration and technology to attack the problem and unlock new business models are all a key part of the solution.

The CEOs I’ve seen steering policy, winning customer and employee loyalty, and mastering exciting new innovation and business challenges share a passionate and personal drive. Sundar Pichai has said climate change is the biggest challenge we face and one that will affect all of us, in deeply personal ways. At all levels of power, leaders speak of the personal effect climate change is having on them, whether it’s in the destruction of a beloved landscape, woe as they contemplate what their grandchildren’s world might resemble, or the frustration they hear from employees keen to do more.

Astonishingly, in a supposed era of the faceless corporation, the personal passion of leaders, their impatience and intolerance of the status quo, is a powerful determinant of progress and success. Leaders with conviction, and who are competitive in creating new business models that incorporate sustainability, will be at the forefront.

Next, collaboration. There is an enormous reset of relations going on because of climate change. We see it happening among nations, between governments and business, and with companies and their partners, customers and employees. Businesses can’t solve this on their own. They need to work with their supply chains to get full visibility of impacts, they need to work with customers and investors to support their business model transformation and they need to work at an industry level to ensure the rules of the game enable these new models. This is an opportunity to find new bases for cooperation, new shared goals, new standards and metrics by which to judge success. These happen best when all parties operate transparently and with a shared commitment to solving one of the most complex problems humankind has known.

Lastly, better data and technology. As strange as it may sound, COVID may have indirectly had a positive impact in the fight against climate change, since it showed how quickly individuals, governments, and businesses can change the way they operate. A great aid in this has been digital technology. Business will be at the heart of this transformation and technology will be a key enabler — through new cleaner energy sources, electrification of mobility but also through smarter, more efficient ways of working enabled by digital technologies. It will be imperative for companies to have better visibility into their data so that they can make more informed, impactful decisions – the underpinning of the sustainability transformation is data.

Unlike earlier industrial processes, digital technology enables far greater measurement of environments, processes, and remediation. That means we can analyze more complex systems, and see impacts as they happen, in system-wide views. The larger amounts of software, even in areas like manufacturing and transport, means that systems are more flexible too, adjusting to changing conditions.

I recently joined Google Cloud because I believe the Digital Era can remediate the unintended consequences of the Industrial Era. Cloud computing focuses the sensors, analytics, and software engineering of the Digital Era in the most efficient manner possible. We can optimize existing systems, and enable large system changes that extend to everything from product life extension to energy system flexibility and smarter distribution, along with sourcing and manufacturing that is fairer to populations in many parts of the world.

Google Cloud both operates the cleanest cloud in the industry today and offers new products, services, and research to make even more dramatic improvements ahead. At the highest level, Google’s 24/7 carbon-free energy goal by 2030 is working towards a dramatic reconfiguration of the global energy grid so we and others can run entirely on carbon-free sources less than eight years from now. Every utility and partner that joins us, and every company using Google Cloud, adds strength to this global mission.

There are a number of related issues that will involve greater use of cloud-based sensors, computer-driven climate models, Machine Learning, and AI. This might be the worsening duration and severity of wildfires because of climate change, an issue we’ve been addressing with AI to better track and predict fire. There are challenges to biodiversity and environmental integrity, something we’ve worked on with Unilever and both governmental and nongovernmental organizations. At an individual level, we’re building tools in search and software development that help companies grow their business with the lowest environmental impact.

It’s a combination of macro solutions, and micro behaviors. There are existing systems to improve, and entirely new things to envision and build. No single entity has all the answers, just as no single effect brought us to this crisis. Rather, it is a systemic view of the world, and a desire to act in more positive ways, that can help us move faster into healing. This is a transformation agenda, and Google Cloud brings the scale, reach and data that helps customers transform. Ultimately, the sustainable option becomes the better option.

Read More for the details.

2022 03 04

GCP – Google Cloud Text-to-Speech API now supports custom voices

Cloud, Google Cloud gcp

With the rise of digital assistants and conversational interfaces, people have grown accustomed to hearing and speaking to synthetic voices. But what do those voices sound like? Often, pretty repetitive. We’re all familiar with the Google Assistant voice, for example.

That’s why we are excited to announce the general availability of Custom Voice in our Cloud Text-to-Speech (TTS) API, a new feature that lets you train custom voice models with your own audio recordings to create unique experiences.

For businesses looking to build a strong brand identity, establishing a unique voice can help turn mobile app interactions or customer service based on interactive voice responses (IVR) into differentiated customer experiences. Our TTS API has included a speech synthesis service with a static list of voices for some time, but now, with Custom Voice, moving beyond these predefined options is easier than ever.

Custom Voice lets you simply submit your audio recordings to get access to the new voice directly in the TTS API. Custom Voice TTS includes guidance on the audio requirements to help make sure you generate a high quality custom TTS voice model. Once this new model is trained, all you have to do to start using the newly trained voice is reference the model ID in your calls to the Cloud TTS API.

At Google, we are committed to building safe and accountable AI products, not only because it’s the right thing to do, but because it is a critical step in ensuring successful use in production. As part of Google Cloud’s Responsible AI governance process, we conducted a deep ethical evaluation of Custom Voice TTS, and its relation to synthetic media, in order to surface and mitigate potential harms that it may create. If you are interested in Custom Voice TTS, there is a review process to help ensure each use case is aligned with our AI Principles and adequate voice actor consent is given.

Additionally, to verify that voice actors are actually the ones producing the audio, you will need to submit an audio file of the voice actor speaking a sentence that Google Cloud chooses (for example: “I agree that my voice will be used to create a synthetic custom Text-to-Speech voice”).

We’re looking forward to seeing this API help businesses solve problems in an easy, fast, and scalable way. TTS Custom Voice is now GA in these languages:

English (US)

English (AU)

English (UK)

Spanish (US)

Spanish (Spain)

French (France)

French (Canada)

Italian (Italy)

German (Germany)

Portuguese (Brazil)

Japanese (Japan)

We plan to continue expanding this lineup in order to meet your needs. Ready to try for yourself? Contact your seller to get started on your use case evaluation today!

Read More for the details.

2022 03 04

GCP – Event Monitoring with explanations on the Google Cloud

Cloud, Google Cloud gcp

Use Cases

This post describes a new production machine learning solution that monitors events in IT and industrial operations and explains their symptoms. The solution is used for a variety of industrial applications, including proactive monitoring of IT operations infrastructure, monitoring events in Industrial Internet of Things (IoT) connected devices, and predictive monitoring of any IT operations management component, such as hyperconverged infrastructure, clouds, virtual infrastructure, applications, networks, and microservices.

The solution is deployed on Google Cloud Platform and combines innovative research from Google’s corporate engineering with the machine learning and operationalization tools of Google Cloud, put together by Google Cloud’s professional services.

Key benefits of our approach are:

Google’s novel solution provides a scalable, unsupervised approach on largely unlabeled data to proactively monitor events in data streams and explain the predictions. Our approach is particularly useful when:

data is correlated and multi-modal,

failures are complex,

conditions are unpredictable, and

monitored components are too new to characterize normal and failure modes.

Our solution provides explanations of the predicted failures by using Google Research’s innovative model explainability technology.

Our solution has been deployed in a variety of industrial and IT management applications including:

Power and climate control in commercial buildings and power equipment.

IT infrastructure monitoring and management.

Badge readers and alarms in physical security systems.

Electromechanical components in power plants.

In this blog we describe how our solution is deployed to address industry critical problems of smart IT operations management for Zenoss, an IT service assurance company.

Algorithm

Imagine you’re a technician who maintains thousands of networked devices. These devices can be virtual machines, servers, HVAC units, engines, etc. that generate a data stream of timestamped, multidimensional measurement updates. Chances are high that at any given time, somewhere in the fleet there are faulty devices that require your attention. Due to complex device interactions and a dynamic environment, it may be impossible to characterize normal operating conditions with rules or even to train a machine learning classifier with labeled failure examples. Unsupervised anomaly detectors (trained without labels), like Isolation Forest and One-Class Support Vector Machines, are commonly used in those situations, but they provide only a nondescript alarm when the device generates unusual updates.

Detecting a faulty device is only the beginning of a technician’s task, and the repair requires more than that nondescript alarm to:

Determine if the anomaly is a true positive,

Diagnose the problem and estimate the root cause,

Triage and prioritize the problem,

Identify and apply a fix, and

Verify the fix was successful.

In the following paragraphs, we will consider three practical anomaly detection concepts that are essential to accomplishing these tasks: accuracy and explainability, sensitivity to correlation and modes, and deploying at scale.

Accuracy and Explainability

A patient might describe their ailment to a doctor with a variable attribution (“my nose is congested and I have a severe headache”) and a contrastive normal (“normally, I can breathe easily and I usually don’t have headaches”). Similarly, we must consider both detection accuracy (false positive and false negative error rates) and explainability. As with human symptoms, an anomaly should be explained by:

variable attributions that assign a “blame” score to the most important variables, and

a nearest contrastive normal point to illustrate how far off the anomaly is from normal.

The chart below compares various anomaly detectors in terms of Detection Accuracy and Explainability.

Univariate Statistical methods apply outlier thresholds to each variable independently, and don’t recognize variable correlations or handle multimodal distributions.

Standard multivariate approaches, e.g., clustering, One Class Support Vector Machine (OC-SVM), Isolation Forest, or Extended Isolation Forest, provide medium to high detection accuracies, but no explanation.

Both DIFFI and Autoencoder+SHAP provide variable attributions and medium detection accuracy.

A supervised classifier trained on failure labels that uses Integrated Gradients provides contrastive explanations, but low detection accuracy.

Our solution combines both high detection accuracy and contrastive explanations.
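To see why per-variable thresholds miss correlated anomalies, consider a thermostat whose room temperature normally tracks its setpoint. The sketch below is a toy illustration with made-up readings, not the method described in this post: it applies a z-score threshold to each variable independently, and then to the residual between the two variables as a simple stand-in for a correlation-aware detector.

```python
import statistics


def zscore(value, sample):
    """Standard score of value relative to a sample of normal readings."""
    return (value - statistics.mean(sample)) / statistics.stdev(sample)


# Made-up normal history: room temperature tracks the setpoint within 0.2 degrees.
setpoints = [18 + 0.5 * i for i in range(17)]  # 18.0 .. 26.0
offsets = [0.2, -0.1, 0.0, 0.1, -0.2]
temps = [s + offsets[i % 5] for i, s in enumerate(setpoints)]

# New reading: each value is ordinary on its own, but the pair is inconsistent.
new_setpoint, new_temp = 25.0, 19.0
THRESHOLD = 3.0

# Univariate check: each variable is tested independently.
univariate_flags = [
    abs(zscore(new_setpoint, setpoints)) > THRESHOLD,
    abs(zscore(new_temp, temps)) > THRESHOLD,
]

# Correlation-aware check: test the temperature-minus-setpoint residual.
residuals = [t - s for t, s in zip(temps, setpoints)]
residual_z = zscore(new_temp - new_setpoint, residuals)

print("univariate flags:", univariate_flags)
print("residual flagged:", abs(residual_z) > THRESHOLD)
```

The univariate checks accept the reading because 25.0 and 19.0 both fall inside their historical ranges, while the residual check flags it because the room is six degrees below its setpoint.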

The figure below shows an explained anomaly on a Variable Air Volume (VAV) unit used to ventilate office spaces.

Anomaly Alert: The device registered a high Anomaly Score of 1.0 at 12:24 pm.

Variable Attribution: The anomaly is attributed to two variables: 43% of the anomaly detection is blamed on the heating water valve and 41% of the anomaly is blamed on the supply air flow rate.

Contrastive Explanation: The observed anomalous valve setting is 100% open, but the nearest normal point is 6%. Likewise the air flow rate is set at 1.2 thousand cubic feet per minute, but the nearest normal is 0.5.

With this information, the technician, like the doctor, can evaluate the symptoms, diagnose the problem and prescribe a treatment. Here, the technician may identify the root cause of this anomaly as an insufficient supply water temperature that is preventing the air from being heated properly, and fix the problem.
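The idea of a contrastive explanation can be sketched in a few lines. This is a deliberately simplified stand-in: the solution described in this post uses Integrated Gradients for attribution, whereas the sketch below picks the nearest normal point by scaled Euclidean distance and splits blame by each variable's share of the squared distance. The normal points and scales are hypothetical; only the anomalous valve and air flow values come from the example above.

```python
import math


def scaled_distance(a, b, scales):
    """Euclidean distance after dividing each variable by its scale."""
    return math.sqrt(sum(((a[k] - b[k]) / scales[k]) ** 2 for k in a))


def nearest_normal(anomaly, normals, scales):
    """Return the normal point closest to the anomaly."""
    return min(normals, key=lambda p: scaled_distance(anomaly, p, scales))


def blame_shares(anomaly, baseline, scales):
    """Share of the squared scaled distance contributed by each variable.

    Simplified attribution for illustration; this is not Integrated Gradients.
    """
    squared = {k: ((anomaly[k] - baseline[k]) / scales[k]) ** 2 for k in anomaly}
    total = sum(squared.values())
    return {k: (v / total if total else 0.0) for k, v in squared.items()}


# Anomalous observation from the VAV example above.
anomaly = {"heating_valve_pct": 100.0, "supply_air_flow_kcfm": 1.2}

# Hypothetical normal observations and per-variable scales.
normals = [
    {"heating_valve_pct": 6.0, "supply_air_flow_kcfm": 0.5},
    {"heating_valve_pct": 0.0, "supply_air_flow_kcfm": 0.3},
    {"heating_valve_pct": 10.0, "supply_air_flow_kcfm": 0.2},
]
scales = {"heating_valve_pct": 100.0, "supply_air_flow_kcfm": 1.5}

baseline = nearest_normal(anomaly, normals, scales)
shares = blame_shares(anomaly, baseline, scales)

print("nearest normal:", baseline)
for name, share in shares.items():
    print("{}: {:.0%} of the blame".format(name, share))
```

The blame percentages produced here depend entirely on the assumed scales and will not match the 43% and 41% reported above, which come from the production model.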

Achieve better detection accuracy

Under normal operations, variables are frequently correlated, sometimes nonlinearly. In a simple case, a thermostat’s room temperature reading and the setpoint should correlate linearly. However, in a diesel engine, the relationship between oil pressure, oil temperature, and RPM is much more complex.

Another complicating factor with devices that operate in different modes (idle/active, day/night, takeoff/cruise/landing) is that normal operations might generate multimodal distributions with multiple peaks in one or more dimensions. The scatterplot below illustrates how just two variables from a device generate complex interactions. The blue dots represent what the anomaly detector identifies as normal, the red dots are selected as contrastive exemplar points, and the gray dots represent anomalous regions that can appear as individual outliers or modes.

Our solution achieves better detection accuracy with its sensitivity to correlations and multimodal distributions by combining negative sampling and deep learning. A negative sample augments the unlabeled training data and represents anomalous space. A deep neural network then learns efficient decision boundaries between normal and anomalous regions, even with data that has complex correlations and multimodal distributions.
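The data preparation step of negative sampling can be sketched as follows. This is a minimal illustration, assuming negatives are drawn uniformly from a slightly padded bounding box around the observed data; the function names, the padding margin, and the sampling ratio are assumptions, and the deep neural network that would be trained on the result is omitted.

```python
import random


def negative_samples(observed, n_samples, margin=0.1, seed=0):
    """Draw points uniformly from the padded bounding box of the observed data."""
    rng = random.Random(seed)
    dims = len(observed[0])
    lows, highs = [], []
    for d in range(dims):
        lo = min(point[d] for point in observed)
        hi = max(point[d] for point in observed)
        pad = (hi - lo) * margin
        lows.append(lo - pad)
        highs.append(hi + pad)
    return [
        tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
        for _ in range(n_samples)
    ]


def build_training_set(observed, ratio=2, seed=0):
    """Label observed points 1 (normal) and sampled points 0 (anomalous)."""
    negatives = negative_samples(observed, ratio * len(observed), seed=seed)
    return [(p, 1) for p in observed] + [(p, 0) for p in negatives]


# Made-up unlabeled observations: (room temperature, air flow).
observed = [(20.0, 0.5), (21.0, 0.6), (22.0, 0.4)]
training = build_training_set(observed, ratio=2)

for point, label in training:
    print(label, point)
```

Some sampled points will land inside normal regions by chance. Because real observations are concentrated there, a classifier trained on the labeled set can still learn where the normal regions are.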

Deploy at Scale

Scale the Fleet with Cohorts. In a system with thousands of devices, it is usually impossible for a single anomaly detector to keep up with the combined data stream. It’s also not efficient to run a separate anomaly detector for each device. We need a way to distribute the anomaly detection processing. In Internet of Things (IoT) applications, devices typically fall into categories or device types. For example, in building climate control systems, there are air conditioners, blowers, hot water systems, ventilation units, shaders, etc. The devices that are of the same category and operating condition can be grouped into cohorts, and devices can be compared against their peers.

Assigning devices into cohorts provides two important advantages:

Larger histories from many similar devices provide richer statistical baselines that lead to better decision boundaries.

Launching anomaly detection processes on each cohort enables on-demand scaling from hundreds to millions of devices.
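Cohort assignment can be as simple as grouping devices by a key. The sketch below assumes each device record carries a type and an operating condition; the field names `id`, `type`, and `mode` are hypothetical, and real assignment logic varies by use case.

```python
from collections import defaultdict


def assign_cohorts(devices):
    """Group device ids by (device type, operating condition)."""
    cohorts = defaultdict(list)
    for device in devices:
        cohorts[(device["type"], device["mode"])].append(device["id"])
    return dict(cohorts)


# Made-up fleet for illustration.
devices = [
    {"id": "vav-1", "type": "ventilation", "mode": "day"},
    {"id": "vav-2", "type": "ventilation", "mode": "day"},
    {"id": "vav-3", "type": "ventilation", "mode": "night"},
    {"id": "blower-1", "type": "blower", "mode": "day"},
]

cohorts = assign_cohorts(devices)
for key, members in cohorts.items():
    print(key, members)
```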

Handle device churn automatically. We have learned that whenever you monitor a large fleet, things are not stationary. Old devices are replaced, and even whole device types may be added or removed. For scalability, this “device churn” should be handled automatically. The logic for assigning cohorts tends to vary with use cases, but, in general, the cohort assignment process periodically updates a membership table that assigns each device to a cohort.

When a new device is added to the cohort, the appropriate anomaly detector begins to query for that device along with all the other devices in its cohort.

When a new cohort is created, a new anomaly detection process is initiated.

Likewise, when a device is removed from a cohort, the anomaly detector removes it from its query configuration.

When a cohort is dropped, the associated anomaly detection process is terminated.
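The four churn rules above amount to comparing two versions of the membership table. The sketch below is one possible way to express that, assuming the table is a plain mapping from device id to cohort name; the `plan_churn` function and the action names are hypothetical.

```python
def plan_churn(old, new):
    """Compare two membership tables (device id -> cohort) and list the actions needed."""
    old_cohorts, new_cohorts = set(old.values()), set(new.values())
    actions = []
    # A new cohort needs a new anomaly detection process.
    for cohort in sorted(new_cohorts - old_cohorts):
        actions.append(("start_detector", cohort))
    # Devices that joined a cohort are added to that detector's query.
    for device in sorted(new):
        if old.get(device) != new[device]:
            actions.append(("add_device", new[device], device))
    # Devices that left a cohort are removed from that detector's query.
    for device in sorted(old):
        if new.get(device) != old[device]:
            actions.append(("remove_device", old[device], device))
    # A dropped cohort has its detection process terminated.
    for cohort in sorted(old_cohorts - new_cohorts):
        actions.append(("stop_detector", cohort))
    return actions


# Made-up membership tables before and after a periodic update.
old_table = {"vav-1": "vav", "vav-2": "vav", "blower-1": "blower"}
new_table = {"vav-1": "vav", "vav-3": "vav", "chiller-1": "chiller"}

churn_actions = plan_churn(old_table, new_table)
for action in churn_actions:
    print(action)
```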

Google’s first implementation on smart buildings. At Google, we developed a general-purpose pipeline for explained anomaly detection that operates at scale. It was originally designed to monitor tens of thousands of climate control devices in hundreds of buildings. We have open-sourced the machine learning algorithm and published a paper on our deep learning algorithm for explained anomalies. It combines negative sampling with a deep learning classifier, and then applies Integrated Gradients to produce an explanation, selecting the “nearest normal” as a baseline point. The variable attributions, or blames, and that nearest normal point form a rich contrastive explanation that gives the technician the anomaly symptoms and enables them to reason about the root cause and select a treatment.

Machine Learning Solution

The solution brings together the following assets within Google:

A novel unsupervised anomaly detection algorithm from Google’s corporate engineering and Research.

A distinctive model explanation technology from Google’s research, implemented in more than 20 Google products, including GCP.

Machine Learning (ML) and ML Operations (MLOps) tools of Google Cloud Platform (GCP).

A key feature of this solution is the MLOps pipeline implemented on GCP with its managed Kubeflow pipelines and Vertex AI Pipelines.

In the IT Operations management example, we have thousands of physical IT devices, virtual infrastructure components, or applications running on an IT infrastructure. We call these the components. We first identify a set of components that are equivalent, having the same fixed properties, such as a disk cluster. We call this the cohort.

Our high-level approach is to identify the cohorts, which are groups of similar components with similar behaviors and data, so that we can monitor each cohort for events. We then rank-order the events or anomalies across the cohorts and explain the results. The figure below captures the steps of this process.
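The cross-cohort ranking step can be sketched as a simple merge of per-cohort results. The event fields (`device`, `score`) are hypothetical, and the sketch assumes anomaly scores are comparable across cohorts, which the source does not state.

```python
import heapq


def rank_anomalies(events_by_cohort, top_k=3):
    """Merge per-cohort anomaly events and return the top_k by score, highest first.

    Each result is a (score, cohort, device) tuple.
    """
    all_events = (
        (event["score"], cohort, event["device"])
        for cohort, events in events_by_cohort.items()
        for event in events
    )
    return heapq.nlargest(top_k, all_events)
```

In practice the ranked list would be paired with each event's explanation so that operators see the most severe anomalies first, along with the variables that contributed to them.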

Going back to the IT Operations management example, hundreds of devices are added and deleted frequently. For the ML system to adapt to such changes, we need ML Operations (MLOps) techniques to:

Automate the execution of the ML pipeline to retrain new models on new data to capture emerging patterns. Continuous training and prediction are done with Kubeflow Pipelines or Vertex AI Pipelines.

Set up a continuous delivery system to deploy new implementations of the entire ML pipeline. CI/CD is achieved with Cloud Build.

The figure below shows a training pipeline implemented with Kubeflow and Vertex AI Pipelines:

We use the trained and deployed models in the following prediction pipeline with Kubeflow and Vertex AI Pipelines:

Zenoss AI-Driven Full-Stack Monitoring

Global enterprise customers in a number of industries are already using our novel monitoring solution to solve industry specific challenges.

For Zenoss, a Forrester leader in Intelligent Application and Service Monitoring, a machine learning solution that performs at very high accuracy with minimal operational overhead is essential to its AIOps strategy. At a high level, AIOps tools do two things: they collect data and they analyze data, both in the interest of accelerating problem resolution in IT operations. Users of application and service monitoring tools for IT operations expect more and more artificial intelligence in the solutions they rely on to keep their systems healthy. The promise of applying machine learning to efficiently analyze data, triage problems, and even remediate them without any human intervention has now become reality.

To address this growing demand for AIOps on infrastructure monitoring platforms, Zenoss partnered with Google Cloud’s AI team to reimagine the way ML could be applied to anomaly detection. We brought our learnings from the Smart Buildings team, who developed an AI-based fault-detection solution to help find and fix problems in climate control devices in large office buildings. We also leveraged Vertex AI, which brings together the Google Cloud services for building ML under one, unified UI and API. The result was a distributed deep-learning solution inside Zenoss’ platform that provides explanations to aid understanding, prioritizing and fixing faults in IT infrastructure.

A key benefit of this solution for Zenoss’ customers is that newly monitored devices no longer need to collect months’ worth of data to train the model to make accurate predictions. Our open-sourced machine learning algorithm provides a scalable, unsupervised approach to proactively monitor events, giving IT operations professionals the immediate insight they need to reliably monitor IT services. Using our algorithm, Zenoss has delivered Google-powered anomaly detection to enterprise customers, helping them ensure their business-critical IT infrastructure is always available.

Screenshot of Zenoss SmartView with anomaly detection

“We know that AIOps represents the future for many of our customers,” says Ani Gujrathi, CTO at Zenoss. “Our customers want highly reliable and actionable data about the state of their infrastructure surfaced to IT Operations with as little human intervention as possible—on-prem, in the public cloud, or in multiple public clouds—seamlessly and securely. We’re excited to make that possible by integrating Google’s proven anomaly detection algorithm with our full-stack monitoring platform, streamlining our MLOps using Vertex AI, and growing our partnership with Google Cloud.”

Google Cloud’s commitments to making machine learning more accessible and useful and to increasing the efficacy of machine learning for enterprises in their unique industries are at the core of everything we do. With the myriad of proven machine learning solutions within Google, the breadth of AI/ML specialists within Google Cloud, and the suite of unified machine learning tools within Vertex AI, organizations can take advantage of these capabilities to transform their businesses.

Ready to start ML modeling with Vertex AI? Start building for free. Want to know how Vertex AI Platform can help your enterprise increase return on ML investments? Contact us.

Acknowledgement: Micah Knox, Principal Architect, Google Cloud

Read More for the details.

2022 03 03

AWS – Amazon Kendra adds spell checker for queries

AWS, Cloud AWS

Amazon Kendra is an intelligent search service powered by machine learning, and helps organizations provide more relevant information to customers and employees, when they need it. Starting today, AWS customers can use Amazon Kendra’s Spell Checker to suggest spell corrections for misspelled words in a query.

Read More for the details.

2022 03 03

AWS – Amazon EMR now supports auto termination of idle clusters in Europe (Milan), Asia Pacific (Hong Kong), Africa (Cape Town), and Middle East (Bahrain) regions

AWS, Cloud AWS

Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. Today, we are excited to announce that Amazon EMR now supports auto termination of idle clusters in Europe (Milan), Asia Pacific (Hong Kong), Africa (Cape Town), and Middle East (Bahrain) regions.

Read More for the details.

2022 03 03

AWS – Amazon RDS for Oracle now supports October 2021 Patch Set Update (PSU) for Oracle Database 12.1

AWS, Cloud AWS

Amazon Relational Database Service (Amazon RDS) for Oracle now supports the October 2021 Patch Set Update (PSU) for Oracle Database 12.1. The October 2021 Release Updates (RU) for Oracle Database 12.2 and 19c have already launched.

Read More for the details.

2022 03 03

AWS – Amazon Keyspaces now helps you automate resource management by using the AWS SDK

AWS, Cloud AWS

Amazon Keyspaces (for Apache Cassandra)—a scalable, highly available, and fully managed Apache Cassandra-compatible database service—now helps you automate resource management by using the AWS SDK.

Read More for the details.

2022 03 03

Azure – Public preview: Azure Percept DK February (2202) software update

Azure, Cloud Azure

The Azure Percept February update includes fixes related to security.

Read More for the details.

2022 03 03

GCP – Four Ways States Use Digital Tools to Deliver Critical Services

Cloud, Google Cloud gcp

Government leaders are reimagining the way they serve communities in the new digital era. They’ve traded manual, paper-based processes and in-person services for digitized, flexible solutions that offer better constituent experiences and improve operations. Google Cloud is working with SpringML to help public sector agencies address four pandemic-driven challenges with digital solutions.

Keeping the World Moving: Digital Immunization Pass

The speed and accuracy in processing COVID-19 documents is important to many organizations, including transportation authorities, hospitals, entertainment venues, educational institutions, fitness centers, and businesses of all sizes. Unlike paper, digital immunization credentials are easy to access and difficult to forge. SpringML’s digital immunization pass provides a safe way for individuals to demonstrate their immunization status or share test results and allows organizations to make informed decisions. Using Google Cloud as its foundation, SpringML’s digital immunization pass can provide access to multiple registries operated by states or private labs.

The solution is available on various devices and can work with existing digital wallet technology. Individuals can control who accesses their records, which can be configured to disclose only the minimum necessary information to the verifier. A call center application interface (CCAI) provides automated agent support, and advanced analytics track mobility trends via the data captured.

Enabling Telehealth: Electronic Visit Verification

Since the start of the pandemic, the number of virtual visits to healthcare providers has skyrocketed. But Medicare, Medicaid, and other healthcare payers must be able to verify such visits, or they run the risk of permitting fraud. In fact, Section 12006 of the 21st Century Cures Act mandates that states must implement an electronic visit verification (EVV) system for all Medicaid-covered home-based healthcare.

EVV verifies the date, time, and site of a provider visit, and identifies the services provided and who provided them. SpringML’s EVV solution can use a few key data points to flag claims with a high probability of fraud, using a machine learning model originally designed to detect false unemployment claims. The solution checks the claim application for anomalies and formulates a “score” for the likelihood of fraud, displaying the information on an anomaly-detection dashboard. It is built on Google Cloud, which provides flexibility and scale while allowing it to integrate with appointment management and patient billing solutions. The solution is secure and compliant with FedRAMP, HIPAA, and SOC 2.

Virtual or Live: Omni-Channel Citizen Engagement

Government call centers are often overwhelmed with requests, making it difficult for live agents to respond to each caller in a timely manner with comprehensive information and assistance. This is particularly challenging in places where citizens might not have reliable internet access, creating equity gaps in areas that often need critical information and government services the most.

SpringML’s AI-enabled call center solution allows states to integrate virtual agents and chatbots to improve customer service, automate common tasks, and boost efficiency. For example, a telephone call might be answered by a virtual agent. If the caller’s needs are not met by the bot, they can be transferred to a live agent, who has access to the previous information. Similarly, someone who connects via a computer interface can seamlessly move from a virtual to a live agent. SpringML’s omni-channel solution ensures that citizens can connect to a person no matter how their contact originated.

Opening the State of Hawaii for Business and Leisure Travel

When pandemic restrictions began to ease, Hawaii’s health officials needed a way to screen and track health data for all travelers so they could quickly identify and quarantine those with symptoms without overly burdening state resources and traveler movement—or risking safety.

The state needed a scalable digital solution that could be deployed across its systems to track traveler data in real time – while avoiding the inconvenience and expense of one-on-one human interactions. The solution needed to support a multilayered process incorporating several state departments – and officials wanted to open the state to visitors as soon as possible to help regenerate the state’s economy.

Google Cloud and SpringML took only six weeks to build Safe Travels, an electronic visit verification program that allows the state to collect and track travel and health information for all visitors. Since its launch in August 2020, more than 2.6 million travelers have used the program.

Better Together

The partnership of SpringML and Google Cloud offers many advantages in terms of rapid application development and deployment. A regional systems integrator, SpringML can move quickly, frequently providing a proof-of-concept within days and going live within weeks. Its solutions are serverless, requiring little-to-no developer experience and minimal technical staff to deploy. SpringML and Google Cloud also offer flexibility; their out-of-the-box public data, dashboards, analytics, and best-in-class models integrate any data source into the analytics platform. This flexibility is more important than ever when speed matters in delivering vital government services.

Read More for the details.
