A high-performance cluster is an array of modern processors working in parallel to transform massive amounts of data. A cluster presents a new set of challenges: millions of new relationships between CPU/GPU processors, memory, network, storage, and layers of middleware and software. Like a chain, success depends on the weakest link.
1,000
data points per second in each node
100
nodes
8.64 Billion
data points per day
How does a human being track that kind of data? The number of relationships for a human to comprehend is overwhelming.
AutonomousHPC can record, analyze, and understand over 1000 metrics per node, per second. A small cluster can easily generate over a billion data points per day. This is beyond any human administrator’s ability to remember, understand, and action in real time.
What can AutonomousHPC help you with?
Real-life SLAs
Genome pipeline example: Typical genomic assays in 4 to 6 hours jumped to 24 hours. These performance challenges were preventing the customer from meeting real-life service level agreements (SLAs) for cancer patients. Algorithms in AutonomousHPC helped understand the sophisticated changes in the system and indicate a path to resolution.
Analyze what a normal condition is for the HPC. What happened in the past? What changed? AutonomousHPC knows the system and how it should operate. It determined that the cache was filling up and, even though it operated fine for weeks, the situation ultimately caused the system failure.
Performance
Every industry faces performance challenges. If you do not spend correctly, you could just be throwing dollars at the problem.
System planning example: Education customer has a $500,000 budget to upgrade the system. High-performance clusters are a great way to turn a computational bottleneck into a storage bottleneck.
As the speed of the processor doubles, the supporting infrastructure of bandwidth, buses, memory, network, and storage has to keep pace. Adding additional computational horsepower to a system without adequate supporting infrastructure won’t yield an increase in overall performance.
AutonomousHPC helped understand the performance bottlenecks in order to efficiently distribute the budgetary dollars to ensure maximum output from a harmonized system.
Data center efficiency
Massive oil/gas supercomputing clusters: In Houston, HPC managers’ paychecks are tied to data center efficiency. Every dollar counts toward the annual bonus. This includes hardware, software, administrators’ salaries, power, hosting, etc. AutonomousHPC is used to ferret out any inefficiencies in the system to maximize the output while minimizing actual costs.
Operations
Spectrum Scale provides the data layer for key claims and policy operations. Any outage results in immediate challenges to customer-facing operations. AutonomousHPC is on the lookout for configuration changes, suspicious trends, and alerts of critical issues.
Although supercomputers are typically built from the same commodity components, each cluster is a unique blend of manufacturers, technologies, proportions, workloads, and applications. No one vendor can support the entire system, leaving the support to the incumbent engineering staff. AutonomousHPC is the tool that learns how the system functions and detects subtle trends and substantial changes, ensuring that the human administrative team is operating with the best understanding of the system they own.
AutonomousHPC is built on familiar technology. It is a plug-in for Elasticsearch for analytics and Grafana for graphic reporting elements. It is a modular appliance that lives on-premises, so your data never leaves your firewall and it does not add processing overhead to your HPC. Your administrators can manage more nodes and petabytes.
Just like the latest generation of autonomous cars, which can drive for you but allow you to operate with help (lane notification, emergency braking, etc.), AutonomousHPC can either adjust for you or notify you of which changes are needed.
Alerts sent via email, SMS, and dashboard. You have the data necessary to manage your system the way you want to manage it.