Bolstering U.S. Supercomputing

The nation’s needs for supercomputers to strengthen defense and national security cannot be satisfied with current policies and spending levels.

In November 2004, IBM’s Blue Gene/L, developed for U.S. nuclear weapons research, was declared the fastest supercomputer on the planet. Supercomputing speed is measured in teraflops: trillions of floating-point operations per second. On one computation, Blue Gene/L achieved 70.72 teraflops, nearly doubling the speed of Japan’s Earth Simulator, the previous record holder at 35.86 teraflops. Despite Blue Gene/L’s blazing speed, however, U.S. preeminence in supercomputing, which is imperative for national security and indispensable for scientific discovery, is in jeopardy.

The past decade’s policies and spending levels are inadequate to meet the growing U.S. demand for supercomputing in critical national areas such as intelligence analysis, oversight of nuclear stockpiles, and tracking climate change. There has been little long-term planning for supercomputing needs and inadequate coordination among relevant federal agencies. These trends have reduced opportunities to make the most of this technology. The federal government must provide stable long-term funding for supercomputer design and manufacture and also support for vendors of supercomputing hardware and software.

Supercomputers combine extremely fast hardware with software that can solve the most complex computational problems. Among these problems are simulation and modeling of physical phenomena such as climate change and explosions, analyzing massive amounts of data from sources such as national security intelligence and genome sequencing, and designing intricate engineered products. Supercomputers are top not only in performance but also in cost: The price tag on the Earth Simulator has been estimated at $500 million.

Supercomputing has become a major contributor to the economic competitiveness of the U.S. automotive, aerospace, medical, and pharmaceutical industries. The discovery of new techniques and substances, as well as cost reduction through simulation rather than physical prototyping, underlies progress in a number of economically important areas. Many technologies initially developed for supercomputers have enriched the mainstream computer industry. For example, multithreading and vector processing are now used on personal computer chips. Application codes that required supercomputing performance when they were developed are now routinely used in industry. This trickle-down process is expected to continue and perhaps even intensify.

But progress in supercomputing has slowed in recent years, even though today’s computational problems require levels of scaling and speed that stress current supercomputers. Many scientific fields need performance improvements of up to seven orders of magnitude to achieve well-defined computational goals. For example, performance measured in petaflops (thousands of teraflops) is necessary to conduct timely simulations that, in the absence of real-world testing, will certify to the nation that the nuclear weapons stockpile is safe and reliable. Another example is climate modeling for increased understanding of climate change and to enable forecasting. A millionfold increase in performance would allow reliable prediction of regional and local effects of certain pollutants on the atmosphere.
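To give a sense of scale for these targets, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper name years_to_gain is hypothetical, and the calculation assumes steady exponential growth with performance doubling every 18 months, which is an assumption and not a forecast.

```python
import math

# Figures taken from the article.
BLUE_GENE_L_TFLOPS = 70.72      # Blue Gene/L, November 2004
EARTH_SIMULATOR_TFLOPS = 35.86  # previous record holder
PETAFLOPS_IN_TFLOPS = 1000.0    # one petaflops = a thousand teraflops


def years_to_gain(factor: float, doubling_months: float = 18.0) -> float:
    """Years needed to improve by `factor` if performance doubles every
    `doubling_months` months (assumed steady exponential growth)."""
    if factor < 1.0 or doubling_months <= 0.0:
        raise ValueError("factor must be >= 1 and doubling_months > 0")
    doublings = math.log2(factor)
    return doublings * doubling_months / 12.0


if __name__ == "__main__":
    ratio = BLUE_GENE_L_TFLOPS / EARTH_SIMULATOR_TFLOPS
    print(f"Blue Gene/L vs. Earth Simulator: {ratio:.2f}x")
    gap = PETAFLOPS_IN_TFLOPS / BLUE_GENE_L_TFLOPS
    print(f"One petaflops vs. Blue Gene/L: {gap:.1f}x")
    for exponent in (3, 6, 7):
        years = years_to_gain(10.0 ** exponent)
        print(f"10^{exponent} improvement: about {years:.0f} years")
```

Under that assumption, a millionfold gain would take about 30 years and seven orders of magnitude roughly 35 years, which suggests why hardware trends alone are unlikely to close the gap on a useful timescale.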

The success of the killer micros

The disheartening state of supercomputing today is largely due to the swift rise of commodity-based supercomputing. That is clear from the TOP500, a regularly updated list of the 500 most powerful computer systems in the world, as measured by performance on the LINPACK dense linear algebra benchmark (an imperfect but widely used measure of performance on real-world computational problems). Most systems on the TOP500 list are now clusters, systems assembled from commercial off-the-shelf processors interconnected by off-the-shelf switches. Fifteen years ago, almost all TOP500 systems were custom supercomputers, built of custom processors and custom switches.

Cluster supercomputers are a prime example of Moore’s law, the observation that processing power doubles every 18 months. Cluster supercomputers have benefited from the huge investments in commodity processors and rapid increases in processor performance. For many applications, cluster technology offers supercomputing performance at the cost/performance ratio of a personal computer. For applications with the characteristics of the LINPACK benchmark, the cost of a cluster can be an order of magnitude lower than the cost of a custom supercomputer with the same performance. However, many important supercomputing applications have characteristics that are very different from those of LINPACK; these applications run well on custom supercomputers but achieve poor performance on clusters.

The success of clusters has reduced the market for custom supercomputers so much that its viability is now heavily dependent on government support. At less than $1 billion annually, the market for high-end systems is a minuscule fraction of the total computer industry, and according to International Data Corporation, more than 80 percent of high-end system purchases in 2003 were made by the public sector. Historically, the government has ensured that supercomputers are available for its missions by funding supercomputing R&D and by forging long-term relationships with key providers. Although active government intervention has risks, it is necessary in situations like this, where the private market is nonexistent or too small to ensure a steady flow of critical products and technologies. This makes sense because supercomputers are public goods, an essential component of government missions ranging from basic research to national security.

Yet government support for the development and acquisition of such platforms has shrunk. And computer suppliers are reluctant to invest in custom supercomputing, because the market is so small, the financial returns are so uncertain, and the opportunity costs of moving skilled personnel away from products designed for the broader IT market are considerable. In addition, the supercomputing market has become unstable, with annual variations of more than 20 percent in sales. Consequently, companies that concentrate primarily on developing supercomputing technologies have a hard time staying in business. Currently, Cray, which almost went out of business in the late 1990s, is the only U.S. firm whose chief business is supercomputing hardware and software and the only U.S. firm that is building custom supercomputers. IBM and Hewlett-Packard produce commodity-based supercomputer systems as one product line among many. Most supercomputing applications software comes from the research community or from the applications developers themselves.

The limits of clusters

For increasingly important problems such as computations that are critical for nuclear stockpile stewardship, intelligence analysis, and climate modeling, an acceptable time to solution can be achieved only by custom supercomputers. Custom systems can sometimes reduce computation time by a factor of 10 or more, so that a computation that would take a cluster supercomputer a month is completed in a few days. Slower computation might cost less, but it also might not meet deadlines in intelligence analysis or allow research to progress fast enough.

This speed problem is getting worse. As semiconductor and packaging technology gets better, different components of a supercomputer improve at different rates. In particular, processor speed increases much faster than memory access time. Custom supercomputers overcome this problem with a processor architecture that can support a very large number of concurrent memory accesses to unrelated memory locations. Commodity processors support a modest number of concurrent memory accesses but reduce the effective memory access time by adding large and often multilevel cache memory systems. Applications that are unable to take advantage of the cache normally will scale in performance at the memory speed, not the processor performance speed. As the gap between processor and memory performance continues to grow, more applications that now make good use of a cache will be limited by memory performance. The problem affects all applications, but it affects scientific computing and supercomputing sooner because commercial applications usually can take better advantage of caches. A similar gap affects global communication: Although processors run faster, the physical dimensions of the largest supercomputers continue to increase, whereas the speed of light, which bounds the speed of interprocessor communication, does not increase.
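The two gaps described above can be made concrete with a small model. The sketch below is not from the report: it uses a simple roofline-style bound (attainable speed is the lesser of the processor's peak rate and what memory can feed it) and a straight-line light-travel estimate. The function names and all hardware numbers are hypothetical, chosen only to show the shape of the problem.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # in vacuum; real signals are slower


def attainable_gflops(peak_gflops: float,
                      mem_bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Upper bound on sustained speed: limited either by the processor's
    peak rate or by how fast memory can deliver operands."""
    return min(peak_gflops, mem_bandwidth_gb_s * flops_per_byte)


def light_delay_cycles(distance_m: float, clock_ghz: float) -> float:
    """Processor cycles that elapse while a signal crosses `distance_m`
    at the speed of light (a lower bound on one-way latency)."""
    seconds = distance_m / SPEED_OF_LIGHT_M_PER_S
    return seconds * clock_ghz * 1e9


if __name__ == "__main__":
    # Hypothetical machine: 5 GB/s of memory bandwidth per processor.
    bandwidth = 5.0
    for peak in (10.0, 20.0, 40.0):  # processor peak doubles twice
        cache_hostile = attainable_gflops(peak, bandwidth, 0.25)
        cache_friendly = attainable_gflops(peak, bandwidth, 16.0)
        print(f"peak {peak:5.1f} GFLOPS: "
              f"cache-hostile code {cache_hostile:.2f}, "
              f"cache-friendly code {cache_friendly:.2f}")

    # Hypothetical 30-meter machine room and 2 GHz clock.
    cycles = light_delay_cycles(30.0, 2.0)
    print(f"crossing 30 m costs at least {cycles:.0f} cycles at 2 GHz")
```

In this toy model, the code that does little arithmetic per byte fetched stays at the same speed no matter how fast the processor becomes, while the cache-friendly code tracks the processor. The light-travel figure likewise grows in cycles as clocks speed up or machines get physically larger.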

As transistors continue to shrink, hardware fails more frequently; this affects very large, tightly coupled systems such as supercomputers more than smaller or less-coupled systems. Also, the ability of microprocessor designers to translate the increasing number of transistors on a chip into increased processor performance seems to have reached its limits; processor performance continues to improve as clock rates continue to increase, but vendors now leverage the increased transistor count by putting an increasing number of processor cores on each chip. As a result, the number of processors per system will need to increase rapidly in order to sustain past rates of supercomputer performance improvement. But current algorithms and applications do not scale easily to systems with hundreds of thousands of processors.
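One standard way to see why applications struggle at this scale is Amdahl's law, which the article does not cite by name and which is used here only as an illustration. It says that if a fraction of a program's work cannot be parallelized, that fraction caps the achievable speedup. The serial fractions and processor counts below are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup over one processor when `serial_fraction` of the work
    cannot be parallelized (Amdahl's law; ignores communication cost)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    for serial in (0.01, 0.001, 0.0001):
        for n in (1_000, 100_000):
            speedup = amdahl_speedup(serial, n)
            efficiency = speedup / n
            print(f"serial {serial:.4f}, {n:>7} processors: "
                  f"speedup {speedup:9.1f}, efficiency {efficiency:.1%}")
```

Even a code that is 99.9 percent parallel cannot run more than 1,000 times faster than on a single processor, so adding processors beyond a few thousand yields little. The model is optimistic, because it leaves out the communication delays and hardware failures discussed above.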

Although clusters have reduced the hardware cost of supercomputing, they have increased the programming effort needed to implement large parallel codes. Scientific codes and the platforms on which they run have become more complex, but the application development environments and tools used to program complex parallel scientific codes are generally less advanced and less robust than those used for general commercial computing. As a result, software productivity is low. Custom systems could support more efficient parallel programming models, but this potential is largely unrealized. No higher-level programming notation that adequately captures parallelism and locality (the two main algorithmic concerns of parallel programming) has emerged. The reasons include the very low investment in supercomputing software such as compilers for parallel systems, the desire to maintain compatibility with prevalent cluster architecture, and the fear of investing in software that runs only on architectures that may disappear in a few years. The software problem will worsen as higher levels of parallelism are required and as global interprocessor communication slows down relative to processor performance.

Thus, there is a clear need for scaling and software improvements in supercomputing. New architectures are needed to cope with the diverging improvement rates of various components such as processor speed versus memory speed. New languages, new tools, and new operating systems are needed to cope with the increased levels of parallelism and low software productivity. And continued improvements are needed in algorithms to handle larger problems; new models that improve performance, accuracy, or generality; and changing hardware characteristics.

It takes time to realize the benefits of research into these problems. It took more than a decade from the creation of the first commercial vector computer until vector programming was well supported by algorithms, languages, and compilers. Insufficient funding for the past several years has emptied the research pipeline. For example, the number of National Science Foundation (NSF) grants supporting research on parallel architectures has been cut in half over little more than 5 years; not coincidentally, the number of scientific publications on high-performance computing has been reduced by half as well. Although many of the top universities had large-scale prototype projects exploring high-performance architectures a decade ago, no such effort exists today in academia.

Making progress in supercomputing

U.S. needs for supercomputing to strengthen defense and national security cannot be satisfied with current policies and levels of spending. Because these needs are distinct from those of the broader information technology (IT) industry, it is up to the government to ensure that the requisite supercomputing platforms and technologies are produced. Government agencies that depend on supercomputing, together with Congress, should take primary responsibility for accelerating advances in supercomputing and ensuring that there are multiple strong domestic suppliers of both hardware and software.

The federal agencies that depend on supercomputing should be jointly responsible for the strength and continued evolution of the U.S. supercomputing infrastructure. Although the agencies that use supercomputers have different missions and requirements, they can benefit from the synergies of coordinated planning, acquisition strategies, and R&D support. An integrated long-range plan—which does not preclude individual agency activities and priorities—is essential to leverage shared efforts. Progress requires the identification of key technologies and their interdependences, roadblocks, and opportunities for coordinated investments. The government agencies responsible for supercomputing should underwrite a community effort to develop and maintain this roadmap. It should be assembled with wide participation from researchers, developers of both commodity and custom technologies, and users. It should be driven both top-down from application needs and bottom-up from technology barriers. It should include measurable milestones to guide the agencies and Congress in making R&D investment decisions.

If the federal government is to ensure domestic leadership in essential supercomputing technologies, a U.S. industrial base of multiple domestic suppliers that can build custom systems must be assured. Not all of these suppliers must be vertically integrated companies such as Cray that design everything from chips to compilers. The viability of these vendors depends on stable long-term government investments at adequate levels; both the absolute investment level and its predictability matter because there is no alternative support. Such stable support can be provided either via government funding of R&D expenses or via steady procurements or both. The model proposed by the British UKHEV initiative, whereby government solicits and funds proposals for the procurement of three successive generations of a supercomputer family over 4 to 6 years, is a good example of a model that reduces instability.

The creation and long-term maintenance of the software that is key to supercomputing require the support of the federal agencies that are responsible for supercomputing R&D. That software includes operating systems, libraries, compilers, software development and data analysis tools, application codes, and databases. Larger and more coordinated investments could significantly improve the productivity of supercomputing platforms. The models for software support are likely to be varied: vertically integrated vendors that produce both hardware and software, horizontal vendors that produce software for many different hardware platforms, not-for-profit organizations, or software developed on an open-source model. No matter which model is used, however, stability and continuity are essential. The need for software to evolve and be maintained over decades requires a stable cadre of developers with intimate knowledge of the software.

Because the supercomputing research community is small, international collaborations are important, and barriers to international collaboration on supercomputer research should be minimized. Such collaboration should include access to domestic supercomputing systems for research purposes. Restrictions on supercomputer imports have not benefited the United States, nor are they likely to do so. Export restrictions on supercomputer systems built from widely available components that are not export-controlled do not make sense and might damage international collaboration. Loosening restrictions need not compromise national security as long as appropriate safeguards are in place.

Supercomputing is critical to advancing science. The U.S. government should ensure that researchers with the most demanding computational requirements have access to the most powerful supercomputing systems. NSF supercomputing centers and Department of Energy (DOE) science centers have been central in providing supercomputing support to scientists. However, these centers have undergone a broadening of their mission even though their budgets have remained flat, and they are under pressure to support an increasing number of users. They need stable funding, sufficient to support an adequate supercomputing infrastructure. Finally, science communities that use supercomputers should have a strong say in and a shared responsibility for providing adequate supercomputing infrastructure, with budgets for acquisition and maintenance of this infrastructure clearly separated from the budgets for IT research.

In fiscal year (FY) 2004, the aggregate U.S. investment in high-end computing was $158 million, according to an estimate in the 2004 report published by the High-End Computing Revitalization Task Force. (This task force was established in 2003 under the National Science and Technology Council to provide a roadmap for federal investments in high-end computing. The proposed roadmap has had little impact on federal investments so far.) This research spending included hardware, software, and systems for basic and applied research, advanced development, prototypes, and testing and evaluation. The task force further noted that federal support for high-end computing activities had decreased from 1996 to 2001. The report of our committee estimated that an investment of roughly $140 million annually is needed for supercomputing research alone, excluding the cost of research into applications using supercomputing, the cost of advanced development and testbeds, and the cost of prototyping activities (which would require additional funding). A healthy procurement process for top-performing supercomputers that would satisfy the computing needs of the major agencies using supercomputing was estimated at about $800 million per year. Additional investments would be needed for capacity supercomputers in a lower performance tier.

The High-End Computing Revitalization Act passed by Congress in November 2004 is a step in the right direction: It called for DOE to establish a supercomputing software research center and authorized $165 million for research. However, no money has been appropriated for the recommended supercomputing research. The High-Performance Computing Revitalization Act of 2005 was introduced to the House Committee on Science in January 2005. This bill amends the High-Performance Computing Act of 1991 and directs the president to implement a supercomputing R&D program. It further requires the Office of Science and Technology Policy to identify the goals and priorities for a federal supercomputing R&D program and to develop a roadmap for high-performance computing systems. However, the FY 2006 budget proposed by the Bush administration does not provide the necessary investment. Indeed, the budget calls for DOE’s Office of Science and its Advanced Simulation and Computing program to be reduced by 10 percent from the FY 2005 level.

Immediate action is needed to preserve the U.S. lead in supercomputing. The agencies and scientists that need supercomputers should act together and push not only for an adequate supercomputing infrastructure now but also for adequate plans and investments that will ensure that they have the tools they need in 5 or 10 years.

Susan L. Graham and Marc Snir cochaired the National Research Council (NRC) committee that produced the report Getting Up to Speed: The Future of Supercomputing. Graham is Pehong Chen Distinguished Professor of Electrical Engineering and Computer Science at the University of California, Berkeley, and former chief computer scientist of the multi-institutional National Partnership for Advanced Computational Infrastructure, which ended in October 2004. Snir is the Michael Faiman and Saburo Muroga Professor and head of the Department of Computer Science at the University of Illinois, Urbana-Champaign. Cynthia A. Patterson was the NRC committee’s study director.