|Future Technologies Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory
One Bethel Valley Road
Oak Ridge, TN 37831-6173
Phone: +1 865-356-1649
2010-11-22 -- Future Tech participates in Gordon Bell Prize at SC10. A team from Georgia Tech, New York University, and Oak Ridge National Laboratory (ORNL) took this year's Gordon Bell Prize by pushing ORNL's Jaguar supercomputer to 700 trillion calculations per second (700 teraflops) with a groundbreaking simulation of blood flow, and demonstrating impressive performance of the same application on graphics processors using the NSF Keeneland Initial Delivery System. See the [Paper] for more information.
2010-06-02 -- Meredith presents new geometric material interface reconstruction algorithm at Eurographics/IEEE Symposium on Visualization. Jeremy Meredith of ORNL’s Future Technologies group will present a new material interface reconstruction algorithm at EuroVis 2010, the Eurographics/IEEE Symposium on Visualization. This type of algorithm generates geometric interfaces from the material volume fraction data produced by scientific applications, and is critical for a variety of visualization and analysis tasks. The paper, “Visualization and Analysis-Oriented Reconstruction of Material Interfaces”, by Meredith and collaborator Hank Childs from UC Davis and LBNL, evaluates this method against competing techniques. The results show that it strikes a favorable balance not just in memory footprint and execution time, but also in accuracy and shape characteristics.
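To make the problem concrete, the following sketch recovers an interface position from volume fraction data in one dimension. This is an illustration of the core idea only, with made-up function names; the algorithm in the paper operates on general multi-material meshes in 2D and 3D.

```python
# Illustrative sketch only: in a "mixed" cell (0 < volume fraction < 1),
# the fraction implies where the material interface sits inside the cell.
# Real reconstruction algorithms handle many materials on 2D/3D meshes.

def interface_position(cell_left, cell_width, volume_fraction):
    """Place the interface inside a cell where material A fills
    `volume_fraction` of the cell, measured from the left edge."""
    return cell_left + volume_fraction * cell_width

def reconstruct_1d(fractions, cell_width=1.0):
    """Scan cells for the first mixed cell and place the interface there."""
    for i, f in enumerate(fractions):
        if 0.0 < f < 1.0:
            return interface_position(i * cell_width, cell_width, f)
    return None  # every cell is pure: no interface to reconstruct
```

For example, volume fractions [1.0, 1.0, 0.25, 0.0] put the interface a quarter of the way into the third cell, at x = 2.25.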
2010-05-30 -- FT develops a highly scalable global address space model for petascale computing on the Cray XT5. Vinod Tipparaju, Weikuan Yu, and Jeffrey Vetter, members of ORNL’s Future Technologies group, have designed and developed a highly optimized Aggregate Remote Memory Copy Interface (ARMCI) runtime library for the 2.3-petaflops Cray XT5 at Oak Ridge National Laboratory. The results were published in May at the 2010 ACM International Conference on Computing Frontiers (CF’10) in Bertinoro, Italy as “Enabling a highly-scalable global address space model for petascale computing” and at the 2010 International Supercomputing Conference in Hamburg, Germany as “Cooperative Server Clustering for a Scalable GAS Model on Petascale Cray XT5 Systems” (PDF). In the first paper, the team describes the design and implementation of ARMCI for the Cray XT5 and its optimization with the flow intimation technique, which yields significant improvements for the Global Arrays (GA) programming model and a real-world chemistry application, NWChem, from small jobs up through 180,000 cores. In the second paper, the team examines the memory requirements of ARMCI on the Cray XT5 and introduces a new technique, “cooperative server clustering,” that improves the memory scalability of ARMCI communication servers, thereby further boosting the scalability of GAS scientific applications. The team demonstrated that this optimization reduces the memory footprint fivefold for a 9,600-process program and cuts the total execution time of the NWChem scientific application by 45% on 2,400 processes (PDF).
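The defining feature of a global address space (GAS) model like ARMCI/Global Arrays is one-sided communication: any process can read or write any part of a logically shared array without cooperation from the data's owner. The toy below sketches that put/get semantics over a block-distributed array. All names here are illustrative assumptions, not the ARMCI API, and the "remote" buffers live in one local address space rather than across nodes.

```python
# Toy sketch of one-sided put/get over a partitioned global array,
# in the spirit of ARMCI/Global Arrays. Not the real API: actual ARMCI
# moves data between the memories of distributed processes.

class ToyGlobalArray:
    def __init__(self, length, nprocs):
        self.nprocs = nprocs
        self.chunk = (length + nprocs - 1) // nprocs  # block distribution
        # One buffer per simulated "process".
        self.local = [[0] * self.chunk for _ in range(nprocs)]

    def owner(self, index):
        """Map a global index to (owning process rank, local offset)."""
        return index // self.chunk, index % self.chunk

    def put(self, index, value):
        """One-sided write: the caller updates the owner's memory directly."""
        rank, off = self.owner(index)
        self.local[rank][off] = value

    def get(self, index):
        """One-sided read: no action by the owning process is required."""
        rank, off = self.owner(index)
        return self.local[rank][off]
```

With 8 elements over 4 processes, global index 5 lands on rank 2 at offset 1, and a put followed by a get round-trips the value without the owner ever participating.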
2010-02-11 -- ORNL Future Technologies Group improves performance of major DOE applications by up to 24% using Memphis, a new tool for analyzing memory access patterns. Collin McCurdy and Jeffrey Vetter, members of ORNL’s Future Technologies group, have developed Memphis, a tool that analyzes memory access patterns in scientific applications on Non-Uniform Memory Access (NUMA) architectures. The authors have been using Memphis to find and fix performance problems in several major DOE applications. These improvements have, so far, yielded performance gains on the Cray XT5 at Oak Ridge of 23% for at-scale runs of XGC1, and of 24% and 13% for single-node runs of CAM and HYCOM, respectively. The results will be published in April at the 2010 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) as “Memphis: Finding and Fixing NUMA-related Performance Problems on Multi-core Platforms.” Historically, high-end scientific applications have been immune to NUMA problems because earlier Symmetric Multiprocessing (SMP) platforms offered uniform memory access latency. However, current trends in microprocessor design, including on-chip memory controllers and multi-core processing, are pushing NUMA issues into small-scale systems. Several platforms, such as the AMD Istanbul and Intel Westmere, are currently NUMA between sockets, and upcoming processors will be NUMA within a socket. Memphis uses hardware performance monitoring to pinpoint memory accesses to data arrays that cause NUMA-related performance problems. The team is continuing to analyze and optimize other applications for NUMA performance problems. The paper is available at the FT PUB site.
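A common instance of the problem class Memphis diagnoses comes from first-touch page placement: a page is homed on the NUMA node of the thread that first writes it, so serial initialization homes every page on one node and later parallel access becomes mostly remote. The toy model below illustrates that effect; it is a sketch under simplifying assumptions (one thread per node), not Memphis itself, which relies on hardware performance counters.

```python
# Toy model of first-touch NUMA placement (illustrative only).
# Thread i is assumed to run on node i; a page's "home" is the node
# of the thread that first touched it.

def remote_fraction(page_homes, accesses):
    """Fraction of (thread, page) accesses hitting a page homed elsewhere."""
    remote = sum(1 for thread, page in accesses if page_homes[page] != thread)
    return remote / len(accesses)

n_threads, pages_per_thread = 4, 8
n_pages = n_threads * pages_per_thread

# In the compute phase, each thread reads only its own block of pages.
accesses = [(t, t * pages_per_thread + p)
            for t in range(n_threads) for p in range(pages_per_thread)]

# Serial init: thread 0 first-touches everything, homing all pages on node 0.
serial_init = [0] * n_pages
# Parallel init: each owner first-touches its own pages.
parallel_init = [p // pages_per_thread for p in range(n_pages)]
```

With 4 threads, serial initialization leaves 3 of every 4 accesses remote (75%), while owner-parallel initialization makes every access local; finding the arrays and loops responsible for such patterns in real codes is what the tool automates.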
--- Read more articles at our news archive. ---