Challenges With Managing Nodes

A high-performance cluster is an array of modern processors working in parallel to transform massive amounts of data. A cluster presents a new set of challenges: millions of new relationships between CPU/GPU processors, memory, network, storage, and layers of middleware and software. Like a chain, success depends on the weakest link.

Node Image 1

1,000

data points per second in each node

Node Image 2

100

nodes

Node Image 3

8.64 Billion

data points per day

How does a human being track that kind of data? The number of relationships for a human to comprehend is overwhelming.

AutonomousHPC to Manage at Scale

AutonomousHPC can record, analyze, and understand over 1000 metrics per node, per second. A small cluster can easily generate over a billion data points per day. This is beyond any human administrator’s ability to remember, understand, and action in real time.

What can AutonomousHPC help you with?

  1. Understand the bottlenecks in your system.
  2. Prioritize system upgrades efficiently.
  3. Optimize your system configuration.
  4. Keep detailed accounting records for billback.
  5. Plan for new workloads without sacrificing SLAs.
  6. Track trends.
  1. Leverage a built-in graphical display.
  2. Be aware of trends in your system.
  3. Be alerted to failures and critical changes to the function of your system.
  4. Build confidence in storage tiering, archiving, and backup.
  5. Add automation for management and optimization.
Managed at Scale

Health Care

Real-life SLAs

Genome pipeline example: Typical genomic assays in 4 to 6 hours jumped to 24 hours. These performance challenges were preventing the customer from meeting real-life service level agreements (SLAs) for cancer patients. Algorithms in AutonomousHPC helped understand the sophisticated changes in the system and indicate a path to resolution.

Analyze what a normal condition is for the HPC. What happened in the past? What changed? AutonomousHPC knows the system and how it should operate. It determined that the cache was filling up and, even though it operated fine for weeks, the situation ultimately caused the system failure.

Healthcare

Education and Research and Development

Performance

Every industry faces performance challenges. If you do not spend correctly, you could just be throwing dollars at the problem.

System planning example: Education customer has a $500,000 budget to upgrade the system. High-performance clusters are a great way to turn a computational bottleneck into a storage bottleneck.

As the speed of the processor doubles, the supporting infrastructure of bandwidth, buses, memory, network, and storage has to keep pace. Adding additional computational horsepower to a system without adequate supporting infrastructure won’t yield an increase in overall performance.

AutonomousHPC helped understand the performance bottlenecks in order to efficiently distribute the budgetary dollars to ensure maximum output from a harmonized system.

Education

Oil and Gas

Data center efficiency

Massive oil/gas supercomputing clusters: In Houston, HPC managers’ paychecks are tied to data center efficiency. Every dollar counts toward the annual bonus. This includes hardware, software, administrators’ salaries, power, hosting, etc. AutonomousHPC is used to ferret out any inefficiencies in the system to maximize the output while minimizing actual costs.

Gas and Oil

Insurance Company

Operations

Spectrum Scale provides the data layer for key claims and policy operations. Any outage results in immediate challenges to customer-facing operations. AutonomousHPC is on the lookout for configuration changes, suspicious trends, and alerts of critical issues.

Insurance

Every System Is Different

Although supercomputers are typically built from the same commodity components, each cluster is a unique blend of manufacturers, technologies, proportions, workloads, and applications. No one vendor can support the entire system, leaving the support to the incumbent engineering staff. AutonomousHPC is the tool that learns how the system functions and detects subtle trends and substantial changes, ensuring that the human administrative team is operating with the best understanding of the system they own.

Every System is Different

Built on Open Standards to Help You Do More

AutonomousHPC is built on familiar technology. It is a plug-in for Elasticsearch for analytics and Grafana for graphic reporting elements. It is a modular appliance that lives on-premises, so your data never leaves your firewall and it does not add processing overhead to your HPC. Your administrators can manage more nodes and petabytes.

Do More

You Determine How Autonomous You Want It to Run

Just like the latest generation of autonomous cars, which can drive for you but allow you to operate with help (lane notification, emergency braking, etc.), AutonomousHPC can either adjust for you or notify you of which changes are needed.

Alerts sent via email, SMS, and dashboard. You have the data necessary to manage your system the way you want to manage it.