High-performance computing (HPC) applications such as numerical simulation, whether for forecasting, mechanical and structural simulation, or computational chemistry, require a large number of CPUs for processing. To meet these needs, customers must buy a large-scale system that enables parallel processing so that the simulation can be completed in the shortest possible time. Such solutions come in two forms: scale-up and scale-out.
Traditionally, scale-up customers have had no choice but to meet their HPC needs by purchasing high-cost, proprietary shared-memory symmetric multiprocessing (SMP) systems running proprietary operating systems such as AIX, Solaris and HP-UX. These SMP systems require significant investment in system-level architecture by computer manufacturers.
While SMP systems with up to eight processors can use off-the-shelf chipsets to provide most of the required system functions, systems with more processors require significant investment in R&D. The result of that investment has been expensive solutions built on proprietary technology, custom hardware and custom components. Most SMP systems with eight or more processors also use non-x86 processors, which has contributed greatly to their high price.
Then came the Beowulf project, which helped pave the way to an entirely new alternative to the traditional SMP system.
Linux Helps Pioneer a Cluster Revolution
As x86 server systems became the commodity server infrastructure, users began to look for other, more accessible and affordable ways to handle their large workloads. They applied cluster technology to unify computers so that they could handle compute-intensive operations.
The Beowulf cluster project pioneered the use of off-the-shelf, commodity computers running open-source, Unix-like operating systems such as BSD and GNU/Linux for HPC. It wasn’t long before companies like IBM and HP adopted the concept and began to sell their own cluster systems in place of traditional SMPs, and for good reason: Beowulf clusters offered a lower initial purchase price, an open architecture and better performance than SMP systems running proprietary Unix.
Despite Linux’s market penetration, ease of use and portability, proprietary Unix coupled with traditional SMP systems still maintained a significant footprint in the market. The reason was that large-memory and multi-threaded applications could not fit on small-scale, off-the-shelf x86 servers running Linux. Linux clusters, however, captured a significant portion of the market where Message-Passing Interface (MPI) applications were used.
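For readers unfamiliar with the model, the following sketch shows the general shape of an MPI program: independent processes, typically one per core or node, compute partial results and combine them by passing messages rather than sharing memory. It is illustrative only; the compile and launch commands (for example, mpicc and mpirun) vary by MPI implementation.

    /* A minimal sketch of an MPI program of the kind that runs well on
     * Beowulf-style Linux clusters: each process computes a partial sum
     * and the results are combined with a reduction. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process' ID */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        /* Each rank contributes a partial result; message passing, not
         * shared memory, combines them, which is why MPI codes map well
         * onto clusters of separate machines. */
        double partial = (double)rank;
        double total = 0.0;
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks across %d processes: %g\n", size, total);

        MPI_Finalize();
        return 0;
    }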
Even so, clusters still pose key challenges to users, including the complexity of installing and managing multiple nodes, as well as the need for distributed storage and job scheduling, tasks that generally can be handled only by highly trained IT personnel.
That’s where virtualization for aggregation comes in.
Virtualization for Aggregation
Server virtualization and its purpose are familiar to the industry by now: by decoupling the hardware from the operating environment, users can convert a single server into multiple virtual servers to increase hardware utilization.
Virtualization for aggregation does the reverse: it combines a number of commodity x86 servers into one virtual server, providing a larger, single system resource (CPU, RAM, I/O and so on). Users manage a single operating system while the aggregation layer presents a high number of processors with large, contiguous shared memory.
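To make the idea concrete, here is a minimal sketch of how the aggregated machine appears to software: the single Linux instance simply reports all of the CPUs and RAM contributed by the underlying nodes, queried here with standard sysconf() calls rather than any aggregation-specific API.

    /* A minimal sketch: ask the running Linux instance how many CPUs and
     * how much physical memory it sees. On an aggregated VM, these figures
     * reflect the combined resources of all the underlying servers. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long cpus  = sysconf(_SC_NPROCESSORS_ONLN);  /* CPUs visible to this OS */
        long pages = sysconf(_SC_PHYS_PAGES);        /* physical memory pages */
        long psize = sysconf(_SC_PAGESIZE);          /* bytes per page */

        printf("online CPUs : %ld\n", cpus);
        printf("physical RAM: %.1f GB\n",
               (double)pages * (double)psize / (1024.0 * 1024.0 * 1024.0));
        return 0;
    }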
One of the great benefits of a system built with virtualization for aggregation is that it eliminates the complexity of managing a cluster, allowing users to manage their systems more easily and reduce management time overall. This is especially helpful for projects that have no dedicated IT staff.
Thus, aggregation provides an affordable, virtual x86 platform with large, shared memory. Server virtualization for aggregation replaces the functionality of custom and proprietary chipsets with software and utilizes only a tiny fraction of a system’s CPUs and RAM to provide chipset-level services without sacrificing system performance.
Virtualization for aggregation can be implemented in a completely transparent manner and does not require additional device drivers or modifications to the Linux OS.
Using this technology to create a virtual machine (VM), customers can run both distributed and large-memory applications optimally, using the same physical infrastructure and open-source Linux. With x86, Linux can scale up like traditional, large-scale proprietary servers.
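As a rough illustration of the kind of workload that benefits, the following sketch uses OpenMP threads over one large shared array; the array size shown is arbitrary, and on an aggregated VM it could be grown well beyond what any single physical node could hold.

    /* A sketch of a large-memory, multi-threaded workload suited to a big
     * shared-memory system. All threads share one address space, so no
     * message passing is needed. Compile with an OpenMP-capable compiler,
     * for example gcc -fopenmp. */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1L << 28)   /* about 2GB of doubles; adjust to available memory */

    int main(void)
    {
        double *data = malloc(N * sizeof(double));
        if (!data) {
            fprintf(stderr, "allocation failed\n");
            return 1;
        }

        double sum = 0.0;
        /* Threads split the loop across the shared array. */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++) {
            data[i] = (double)i;
            sum += data[i];
        }

        printf("threads: %d, sum: %g\n", omp_get_max_threads(), sum);
        free(data);
        return 0;
    }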
Linux scalability to support these large VMs is critical for the success of aggregated VMs. Recent enhancements to the Linux kernel, such as support for large NUMA systems, make it possible.
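One way to see that support is through the NUMA topology the kernel exposes. The sketch below uses libnuma (assuming the library is installed; link with -lnuma) to list each memory node and its size; on a large NUMA or aggregated system, each underlying board or node shows up this way.

    /* A small sketch that reads the NUMA topology exposed by the kernel
     * via libnuma. It only reports information; it does not change any
     * memory or scheduling policy. */
    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            printf("NUMA is not available on this system\n");
            return 1;
        }

        int max = numa_max_node();
        printf("NUMA nodes: %d\n", max + 1);

        for (int node = 0; node <= max; node++) {
            long long freep = 0;
            long long size = numa_node_size64(node, &freep);
            printf("node %d: %lld MB total, %lld MB free\n",
                   node, size >> 20, freep >> 20);
        }
        return 0;
    }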
Now that Linux provides a scalable OS infrastructure, applications that need more processing power or memory for better performance can adopt virtualization for aggregation while still taking advantage of the price and performance of commodity components.
Even more exciting is that virtualization for aggregation can create the largest SMP systems in the world. These systems are so large that current workloads do not yet exhaust their memory and CPU capacity, which means that in the future, users with compute-intensive needs can write applications without worrying about those limitations.
Shai Fultheim is the founder and CTO of ScaleMP.