The Sunway TaihuLight Supercomputer:
the Design of the Processor and the System
Haohuan Fu (The National Supercomputing Center in Wuxi)
The Sunway TaihuLight supercomputer is the world’s first system with a peak performance greater than 100 PFlops, and a parallel scale of over 10 million cores. In contrast with other existing heterogeneous supercomputers, which include both CPU processors and PCIe-connected many-core accelerators (NVIDIA GPU or Intel MIC), the computing power of TaihuLight is provided by a homegrown many-core SW26010 CPU that includes both the management processing elements (MPEs) and computing processing elements (CPEs) in one chip. With 260 processing elements in one CPU, a single SW26010 provides a peak performance of over three TFlops. To alleviate the memory bandwidth bottleneck in most applications, each CPE comes with a scratch pad memory, which serves as a user-controlled cache. To support the parallelization of programs on the new many-core architecture, in addition to the basic C/C++ and Fortran compilers, the system provides a customized Sunway OpenACC tool that supports the OpenACC 2.0 syntax and the management of parallel tasks. This talk introduces and discusses the design philosophy behind both the many-core processor and the 10-million-core system. The application performance on both the processor and the system will also be discussed.
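As a rough consistency check on the figures quoted in this abstract, the following Python sketch combines only the stated lower bounds (260 processing elements per chip, over three TFlops per chip, over 10 million cores). The constant and function names are illustrative, and the results are estimates from rounded figures, not official system specifications.

```python
# Back-of-the-envelope check using only the rounded figures in the abstract.
CORES_PER_CHIP = 260        # processing elements per SW26010
CHIP_PEAK_TFLOPS = 3.0      # "over three TFlops" (treated as a lower bound)
SYSTEM_CORES = 10_000_000   # "over 10 million cores" (treated as a lower bound)


def estimate_chip_count(system_cores: int, cores_per_chip: int) -> int:
    """Smallest number of chips that supplies at least system_cores cores."""
    return -(-system_cores // cores_per_chip)  # ceiling division


def estimate_system_peak_pflops(chips: int, chip_peak_tflops: float) -> float:
    """Aggregate peak in PFlops (1 PFlops = 1000 TFlops)."""
    return chips * chip_peak_tflops / 1000.0


chips = estimate_chip_count(SYSTEM_CORES, CORES_PER_CHIP)
peak_pflops = estimate_system_peak_pflops(chips, CHIP_PEAK_TFLOPS)
print(f"at least {chips} chips, at least about {peak_pflops:.1f} PFlops peak")
```

Because both inputs are lower bounds, the product is also a lower bound, and it sits above the 100 PFlops mark quoted for the full system.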
Haohuan Fu is the deputy director of the National Supercomputing Center in Wuxi, and leads the research and development efforts on Sunway TaihuLight, currently the fastest supercomputer in the world. He is also an associate professor in the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science at Tsinghua University, where he leads the High Performance Geo-Computing (HPGC) research group. His research interests include design methodologies for highly efficient and highly scalable simulation applications that can take advantage of emerging multi-core, many-core, and reconfigurable architectures and fully utilize current Peta-Flops and future Exa-Flops supercomputers; and intelligent data management, analysis, and data mining platforms that combine statistical methods and machine learning technologies. Fu has a PhD in computing from Imperial College London. He is a member of IEEE. Since joining Tsinghua in 2011, Dr. Fu has been working towards the goal of providing both the most efficient simulation platforms and the most intelligent data management and analysis platforms for geoscience applications. His research has, for example, led to efficient designs of atmospheric dynamic solvers for the Tianhe-1A, Tianhe-2, and Sunway TaihuLight supercomputers, as well as for reconfigurable computing platforms. The work based on reconfigurable technology can be both faster and more energy efficient than the Tianhe-1A and Tianhe-2 supercomputers, leading to a publication selected as one of the 27 most significant papers of the FPL conference in 25 years (27 out of 1765). The work based on the Sunway TaihuLight supercomputer scaled a fully-implicit solver to over 10 million cores and won the Gordon Bell Prize at SC16.
New Era of Electrification and Vehicle Intelligence
Haruyoshi Kumura (Nissan Motor Co., Ltd.)
Abstract: As the global demand for personal mobility grows continuously, the automotive industry needs to accelerate the development of solutions for social problems such as the environment, energy, resources, traffic accidents, and urban congestion. Vehicle electrification and intelligence are key technologies for resolving these social problems. Autonomous drive system technology installed in the latest vehicles will be shared. Examples of recent development for future autonomous drive systems will also be shared. Technical issues and challenges for semiconductors will be discussed.
Haruyoshi Kumura was appointed Fellow in charge of Technology Intelligence of Nissan Motor Co., Ltd. in April 2009.
He completed the Master’s course in Mechanical Engineering at Tokyo Institute of Technology in March 1981 and joined Nissan Motor. After holding several management positions in Nissan’s Powertrain and Environment Research Laboratory, he became General Manager of the laboratory in 2003. His appointment as Vice President in charge of Nissan Research Center in 2005 was quickly followed by his promotion to Corporate Vice President in 2006.
He holds a doctor’s degree in mechanical engineering from Yokohama National University.
ARM: scaling new heights
David Brash and Nigel Stephens (ARM)
Abstract: In 2011 ARM announced ARMv8-A which included the new 64-bit AArch64 execution state, a step change for the architecture. ARMv8-A is now well established across a wide range of devices spanning consumer and enterprise markets. Fujitsu’s 2016 announcement that Japan’s Post-K supercomputer would be based on ARMv8-A, followed by the disclosure of ARM’s next-generation Scalable Vector Extension (SVE) at HotChips, illustrates how the ARM processor architecture now scales from the smallest IoT devices to the largest HPC systems. This talk will focus on the evolution of the high-end ARMv8-A architecture, its recent enhancements for enterprise solutions, and its building momentum in high-performance scientific computing — all as part of an expanding collaboration between hardware, software and architecture partners. Key aspects of SVE will be outlined as well as the work to promote a thriving development ecosystem for tools, middleware and applications.
David Brash joined ARM in 1998 and took on responsibility for architecture program management in 2000. His responsibilities include chairing the Architecture Review Board and coordinating the development and delivery of the ARM Architecture specifications, along with enabling the supporting collateral needed for compliance and software development activities. The role also includes key partner engagement, evangelizing and encouraging early adoption of the ARM architecture, and fostering a vibrant tools, hardware and software development ecosystem. David started his career with Racal developing government and military communications equipment. He then joined Digital Equipment Corporation (DEC) to design backbone router and infrastructure products. David was promoted to Consultant Engineer and then started a European systems engineering group to provide boards, tools and firmware support for Digital’s semiconductor division. The group was involved with Digital’s Alpha processor, PCI, and StrongARM products. David graduated in 1978 from the University of Strathclyde with a BSc in Electrical & Electronic Engineering, and holds 2 patents from his time with Digital.
Nigel Stephens joined ARM in 2008 to contribute to the specification of ARMv8-A, with a particular focus on its new AArch64 Instruction Set Architecture (ISA). He went on to become lead architect with responsibility for ARMv8-A instruction sets, and was appointed an ARM Fellow in 2015. Most recently Nigel led the development of the Scalable Vector Extension (SVE), which was announced at HotChips 2016. Prior to joining ARM, Nigel was a systems programmer, compiler developer and computer architect. He has specialized in RISC processors ever since he worked on the development of the first ever MIPS-based UNIX graphics workstation at Whitechapel Computer Works Ltd in 1986. Following this he was co-founder of the RISC hardware/software consultancy Algorithmics Ltd, which was acquired by MIPS Technologies in 2002. Nigel then joined the MIPS Architecture team, eventually becoming Technical Director of MIPS UK. Nigel graduated in 1981 from University College London with a BSc in Computer Science. He currently holds 5 patents.
POWER9 Design Innovations
Jeffrey L. Burns (IBM Research, USA)
Abstract: Cognitive computing has become a major trend, driving the rapid development of new applications and workloads. The upcoming IBM POWER9 implements several new features targeting these emerging cognitive workloads. Adding new and improved cognitive capabilities requires higher levels of performance and energy-efficiency in the IT infrastructure. The maturation of semiconductor scaling has made delivering these higher levels more challenging; reliance on scaling alone is insufficient. Because of these trends, innovation in design and architecture is crucial. Innovations are required to improve power-performance of processors, e.g., by increasing the ability to dynamically adjust operating points without compromising the reliability of the computations. Accelerators will increasingly be needed to improve the power/performance of systems. To enable practical accelerator integration, system architectures must be designed up-front to incorporate heterogeneous components such as today’s GPUs, as well as future accelerators for cognitive applications. In this presentation I will describe these trends, the POWER9 innovations that address them, and areas of research towards future systems.
Jeffrey L. Burns received his B.S. in Engineering from UCLA, and his M.S. and Ph.D. in Electrical Engineering from U.C. Berkeley. In 1988 he joined the IBM T.J. Watson Research Center and worked in layout automation and processor design. In 1996 he joined the IBM Austin Research Lab where he worked on the first 1 GHz PowerPC; he then managed the Exploratory VLSI Design group. In 2003 he returned to Watson to work on IBM Research’s annual study into the future of IT. He then managed a program exploring a streaming-oriented supercomputer. From mid-2005 until mid-2009 he managed the VLSI Design department, focusing on high-end processors, SoC designs, and 3D.
From mid-2009 to September 2015 he was Director of VLSI Systems at Watson, and since September 2015 he has been Director, Systems Architecture and Design, managing the Division’s activities in VLSI design, design automation, microprocessor and systems architecture, and accelerator design.
Topics: “Cool chips for the next decade”
Organizer and Moderator:
Hideharu Amano (Keio Univ.)
Abstract: CMOS process technology is still advancing, but the end is coming into sight. Semiconductor chips built on advanced processes beyond 21nm are so expensive that they are developed only for products that sell in the millions. On the other hand, advanced AI, IoT and big data technologies require more and more computation and communication power within a tightly limited power budget. How can we develop “Cool Chips” in the next decade? And how can the conference “Coolchips” contribute?
Hironori Kasahara. “Cool Chips, Low Power Multicores, Open the Way to the Future”.
Yoshiaki Hagiwara. “Consumer Electronics from HOT to COOLchips”.
And more (TBA).
Hideharu Amano is a Professor in the Dept. of Information and Computer Science, Keio University. He has developed various types of LSI chips for parallel machines, low-power accelerators, and reconfigurable systems. He is now developing a building-block computing system using a wireless inductive through-chip interface.
Tadao Nakamura is currently a Professor Emeritus of Tohoku University, and also a Professor (visiting status) of Keio University. In 1994 he was given the status of Full Professor (visiting status) in the Electrical Engineering Department at Stanford University, and he still spends time at Stanford today. In 2007 he was also inducted as a Professorial Fellow at Imperial College London. His research interests are in computer systems. In 2004 he received the IEEE Computer Society’s Taylor L. Booth Award. He has been Advisory Committee Chair, after serving as Organizing Committee Chair, of the COOL Chips conference series, which is fully sponsored by the IEEE Computer Society. He is a Life Fellow of the IEEE.
Hironori Kasahara is IEEE Computer Society (CS) 2018 President and 2017 President Elect and has served as a chair or member of 245 society and government committees, including the CS Board of Governors; Executive Committee; Planning Committee; chair of CS Multicore STC and CS Japan chapter; associate editor of IEEE Transactions on Computers; vice PC chair of the 1996 ENIAC 50th Anniversary International Conference on Supercomputing; general chair of LCPC; PC member of SC, PACT, and ASPLOS; board member of IEEE Tokyo section; and member of the Earth Simulator committee. He received a PhD in 1985 from Waseda University, Tokyo, joined its faculty in 1986, and has been a professor of computer science since 1997 and a director of the Advanced Multicore Research Institute since 2004. He was a visiting scholar at University of California, Berkeley, and the University of Illinois at Urbana–Champaign’s Center for Supercomputing R&D. Kasahara’s honors include the CS Golden Core Member Award, IEEE Fellow, the IFAC World Congress Young Author Prize, IPSJ Fellow and the Sakai Special Research Award, and the Japanese Minister’s Science and Technology Prize. He led Japanese national projects on parallelizing compilers and embedded multicores, has presented 214 papers and 139 invited talks, and holds 28 patents. His research on multicore architectures and software has appeared in 560 newspaper and Web articles.
Yoshiaki Hagiwara received BS, MS, and PhD degrees in electrical engineering, with a minor in physics, from California Institute of Technology, Pasadena, California, in 1971, 1972, and 1975, respectively. He joined Sony Corporation, Tokyo, Japan, where he was involved in early developments of CCD imagers and their video camera systems, and served as an engineering manager of various signal processing and control chips for consumer electronics. After retiring from Sony in 2008, he joined Sojo University in Kumamoto-city, Japan, as a professor in the Faculty of Communication and Information Sciences, until his retirement in March 2017. He currently serves as the president of the Artificial Intelligent Partner Systems (AIPS) consortium in Atsugi-city, Kanagawa, Japan (see http://www.aiplab.com/). He is an IEEE Fellow and has served as the ISSCC2008 international program chair and as the COOLChips2013 vice general chair.
Special Sessions (invited lectures)
TrueNorth: A Neurosynaptic Integrated Circuit
with 1 Million Spiking Digital Neurons
Yutaka Nakamura (IBM Japan)
Abstract: Inspired by the brain’s structure, we have developed an efficient, scalable, and flexible non–von Neumann architecture that leverages contemporary silicon technology. To demonstrate, we built TrueNorth, a 5.4-billion-transistor chip with 4096 neurosynaptic cores interconnected via an intra-chip network that integrates 1 million programmable spiking neurons and 256 million configurable synapses. Chips can be tiled in two dimensions via an inter-chip communication interface, seamlessly scaling the architecture to a cortex-like sheet of arbitrary size. The architecture is well suited to many applications that use complex neural networks in real time, for example, multi-object detection and classification. With 400-pixel-by-240-pixel video input at 30 frames per second, the chip consumes 63 mW.
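To put the abstract's figures in perspective, the short Python sketch below derives a few per-unit quantities from them (energy per video frame, energy per input pixel, and average neurons and synapses per unit). The names are illustrative, and because the neuron and synapse counts in the abstract are rounded, the per-core and per-neuron values are approximations rather than design parameters.

```python
# Derived quantities from the rounded figures quoted in the abstract.
FRAME_WIDTH_PX = 400
FRAME_HEIGHT_PX = 240
FRAMES_PER_SECOND = 30
CHIP_POWER_W = 0.063        # 63 mW
CORES = 4096
NEURONS = 1_000_000         # "1 million" (rounded in the abstract)
SYNAPSES = 256_000_000      # "256 million" (rounded in the abstract)


def energy_per_frame_joules(power_w: float, fps: float) -> float:
    """Energy spent per video frame at constant power."""
    return power_w / fps


pixels_per_frame = FRAME_WIDTH_PX * FRAME_HEIGHT_PX
frame_energy_j = energy_per_frame_joules(CHIP_POWER_W, FRAMES_PER_SECOND)
pixel_energy_nj = frame_energy_j / pixels_per_frame * 1e9

neurons_per_core = NEURONS / CORES
synapses_per_neuron = SYNAPSES / NEURONS

print(f"{frame_energy_j * 1e3:.2f} mJ per frame, {pixel_energy_nj:.1f} nJ per pixel")
print(f"about {neurons_per_core:.0f} neurons per core, "
      f"about {synapses_per_neuron:.0f} synapses per neuron")
```

The per-frame energy follows directly from dividing the stated power by the stated frame rate, so it assumes the 63 mW figure is an average over the video workload.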
Yutaka Nakamura is currently a Research Staff Member in IBM Research – Tokyo and his research interests include memory circuit design, redundancy repair, ECC, and neuromorphic chip/system. He received the B.S. degree from the Department of Mechanical Engineering, Waseda University, Tokyo, Japan, in 1985.
Digital Microfluidic Biochips
Tsung-Yi Ho (National Tsing Hua University, Taiwan)
Abstract: A digital microfluidic biochip (DMFB) is an attractive technology platform for automating laboratory procedures in biochemistry. However, today’s DMFBs suffer from several limitations: (i) constraints on droplet size and the inability to vary droplet volume in a fine-grained manner; (ii) the lack of integrated sensors for real-time detection; (iii) the need for special fabrication processes and reliability/yield concerns. To overcome the above problems, DMFBs based on a micro-electrode-dot-array (MEDA) architecture, and fabricated using a TSMC 350 nm process, have recently been demonstrated.
This presentation will first describe a biochemistry synthesis approach for such MEDA biochips. This synthesis method targets operation scheduling, module placement, routing of droplets of various sizes, and diagonal movement of droplets in a two-dimensional array. Simulation results using benchmarks and experimental results using a fabricated MEDA biochip will be presented to demonstrate the effectiveness of the proposed co-optimization technique. Finally, the presentation will describe an efficient built-in self-test (BIST) solution for MEDA biochips. Simulation results based on HSPICE and experiments using fabricated MEDA biochips will highlight the effectiveness of the proposed BIST architecture.
Tsung-Yi Ho received his Ph.D. in Electrical Engineering from National Taiwan University in 2005. He is a Professor with the Department of Computer Science of National Tsing Hua University, Hsinchu, Taiwan. His research interests include design automation and test for microfluidic biochips and nanometer integrated circuits. He has presented 10 tutorials and contributed 11 special sessions in ACM/IEEE conferences, all in design automation for microfluidic biochips. He has been the recipient of the Invitational Fellowship of the Japan Society for the Promotion of Science (JSPS), the Humboldt Research Fellowship from the Alexander von Humboldt Foundation, and the Hans Fischer Fellowship from the Institute of Advanced Study of the Technical University of Munich. He was a recipient of the Best Paper Awards at the VLSI Test Symposium (VTS) in 2013 and IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2015. He served as a Distinguished Visitor of the IEEE Computer Society for 2013-2015, the Chair of the IEEE Computer Society Tainan Chapter for 2013-2015, and the Chair of the ACM SIGDA Taiwan Chapter for 2014-2015. Currently he serves as an ACM Distinguished Speaker, a Distinguished Lecturer of the IEEE Circuits and Systems Society, an Associate Editor of the ACM Journal on Emerging Technologies in Computing Systems, ACM Transactions on Design Automation of Electronic Systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, and IEEE Transactions on Very Large Scale Integration Systems, and a Guest Editor of IEEE Design & Test of Computers. He also serves on the Technical Program Committees of major conferences, including DAC, ICCAD, DATE, ASP-DAC, ISPD, and ICCD.