Lightweight Linux for high-performance computing

Vendors are working to overcome "operating system jitter" problems caused by daemons and services running by default


Other Linux memory allocation functions were also designed for capacity, not for performance. When Linux allocates memory on a per-segment basis, the resulting memory is typically not physically contiguous. As a result, kernel drivers are less efficient at programming DMA transfers, which can increase overhead and reduce I/O performance. Linux is also only beginning to provide full support for RDMA (remote direct memory access), which is used frequently in HPC, though ongoing industry efforts are improving this.
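To make the DMA cost concrete, here is a minimal sketch of why physical contiguity matters. It is an illustration only, not kernel code: the function name `dma_segments`, the 4 KB page size and the page frame numbers are all assumptions made for the example. It merges the physical pages backing a buffer into contiguous runs, each of which a driver would have to describe to the DMA engine separately.

```python
def dma_segments(page_frames, page_size=4096):
    """Merge the physical page frames backing a buffer into contiguous runs.

    page_frames: hypothetical physical frame numbers, in buffer order.
    Returns a list of (physical_address, length_in_bytes) tuples, one per
    run that a driver would need to describe to a DMA engine.
    """
    segments = []
    for frame in page_frames:
        addr = frame * page_size
        if segments and segments[-1][0] + segments[-1][1] == addr:
            # This page starts exactly where the previous run ends: extend it.
            start, length = segments[-1]
            segments[-1] = (start, length + page_size)
        else:
            # Physical gap: a new DMA segment is required.
            segments.append((addr, page_size))
    return segments


# Four pages that happen to be physically adjacent.
contiguous = [100, 101, 102, 103]
# Four pages scattered across physical memory (only the last two adjacent).
scattered = [100, 517, 42, 43]

print(len(dma_segments(contiguous)))  # 1
print(len(dma_segments(scattered)))   # 3
```

The same 16 KB buffer needs one segment when it is physically contiguous and three when it is scattered. Each extra segment is more setup work for the driver, and that is the overhead described above.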

Developers also must ensure that CNL provides the reliability that users and applications demand. Current industry reliability testing of Linux focuses primarily on conventional enterprise or Web server environments -- not on Linux's reliability in large-scale HPC systems. Linux is a complicated operating system that requires substantial resources, so even a "pruned-back" implementation of Linux such as CNL will encompass many more interactions and significantly more lines of code than a specialized compute node operating system such as Catamount. As a result, HPC compute nodes running Linux are more likely to exhaust resources, and are likely to exhaust them in different ways than conventional Linux environments do. Cray and others developing CNL must ensure that the operating system handles these situations gracefully.

Realizing Linux on compute nodes

Cray has been spearheading the effort to develop and refine CNL and has been working with its partners in the HPC community -- including Argonne National Laboratory and other government-sponsored researchers -- to resolve CNL's remaining issues. This work is progressing steadily. Today, developers are seeing good performance on many HPC applications running on systems that employ CNL, and many of the issues described above have already been successfully addressed. CNL developers are now gathering data at scale. Researchers are currently testing on systems with processor counts in the low thousands, and they will move to the range of 10,000 to 20,000 processors soon.

Cray believes that this work will soon allow even the largest-scale supercomputing environments to benefit from using Linux at both the system level and on HPC compute nodes. In the long run, CNL may hold even greater potential, and may provide an ideal compute node operating system that can benefit all HPC users.

Kaplan is currently a system architect involved in the technical coordination of software design and development, with a focus on operating systems, supervisory systems and hardware interfaces. He earned a bachelor's degree in computer science modified with electrical engineering at Dartmouth College and a master's degree in computer science at NYU's Courant Institute. Following graduate school, he spent three years working for BBN Advanced Computers in the area of parallel operating systems, and then joined Tera Computer Co. in 1991, which became Cray in 2000. At Tera/Cray, he has played a key role in the design and development of several Cray systems, including the Cray XT3 and MTA/Eldorado, primarily as an operating system engineer.

This story, "Lightweight Linux for high-performance computing," was originally published by LinuxWorld (US).

Copyright © 2006 IDG Communications, Inc.
