The Sunway TaihuLight Supercomputer:
the Design of the Processor and the System
Haohuan Fu (The National Supercomputing Center in Wuxi)
The Sunway TaihuLight supercomputer is the world’s first system with a peak performance greater than 100 PFlops and a parallel scale of over 10 million cores. In contrast with other existing heterogeneous supercomputers, which combine CPUs with PCIe-connected many-core accelerators (NVIDIA GPU or Intel MIC), TaihuLight draws its computing power from a homegrown many-core SW26010 CPU that integrates both management processing elements (MPEs) and computing processing elements (CPEs) on one chip. With 260 processing elements in one CPU, a single SW26010 provides a peak performance of over three TFlops. To alleviate the memory bandwidth bottleneck seen in most applications, each CPE comes with a scratch pad memory, which serves as a user-controlled cache. To support the parallelization of programs on the new many-core architecture, the system provides, in addition to the basic C/C++ and Fortran compilers, a customized Sunway OpenACC tool that supports the OpenACC 2.0 syntax and the management of parallel tasks. This talk introduces and discusses the design philosophy behind both the many-core processor and the 10-million-core system. Application performance on both the processor and the system will also be discussed.
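As a rough check on the figures above, the following sketch relates the per-chip numbers to the system-level ones. The core-group layout (four groups, each with one MPE and 64 CPEs), the node count of 40,960, and the per-chip peak of 3.06 TFlops are not stated in this abstract; they come from publicly reported descriptions of the system and should be read as assumptions.

```python
# Back-of-the-envelope check of the TaihuLight scale figures.
# Values marked "assumption" are NOT given in the abstract itself.

CORE_GROUPS_PER_CHIP = 4       # assumption: reported SW26010 layout
MPES_PER_GROUP = 1             # assumption: one management element per group
CPES_PER_GROUP = 64            # assumption: 8 x 8 mesh of computing elements
NODES = 40_960                 # assumption: one SW26010 per node
PEAK_TFLOPS_PER_CHIP = 3.06    # assumption; the abstract says "over three TFlops"


def cores_per_chip() -> int:
    """Processing elements on one SW26010 (MPEs plus CPEs)."""
    return CORE_GROUPS_PER_CHIP * (MPES_PER_GROUP + CPES_PER_GROUP)


def total_cores() -> int:
    """Processing elements across the whole system."""
    return NODES * cores_per_chip()


def peak_pflops() -> float:
    """System peak in PFlops (1 PFlops = 1000 TFlops)."""
    return NODES * PEAK_TFLOPS_PER_CHIP / 1000.0


if __name__ == "__main__":
    print("cores per chip:", cores_per_chip())
    print("total cores:   ", total_cores())
    print("peak (PFlops): ", round(peak_pflops(), 1))
```

Under these assumptions the totals agree with the abstract: 260 processing elements per chip, more than 10 million cores, and a peak above 100 PFlops.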
Haohuan Fu is the deputy director of the National Supercomputing Center in Wuxi, and leads the research and development efforts on Sunway TaihuLight, currently the fastest supercomputer in the world. He is also an associate professor in the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science at Tsinghua University, where he leads the High Performance Geo-Computing (HPGC) research group. His research interests include design methodologies for highly efficient and highly scalable simulation applications that can take advantage of emerging multi-core, many-core, and reconfigurable architectures and make full use of current Peta-Flops and future Exa-Flops supercomputers; and intelligent data management, analysis, and data mining platforms that combine statistical methods and machine learning technologies. Fu has a PhD in computing from Imperial College London. He is a member of IEEE. Since joining Tsinghua in 2011, Dr. Fu has been working towards the goal of providing both the most efficient simulation platforms and the most intelligent data management and analysis platforms for geoscience applications. His research has, for example, led to efficient designs of atmospheric dynamic solvers for the Tianhe-1A, Tianhe-2, and Sunway TaihuLight supercomputers, as well as for reconfigurable computing platforms. The work based on reconfigurable technology can be both faster and more energy efficient than the Tianhe-1A and Tianhe-2 supercomputers, and led to a publication selected as one of the 27 most significant papers of the FPL conference in 25 years (27 out of 1765). The work based on the Sunway TaihuLight supercomputer scaled a fully-implicit solver to over 10 million cores and won the Gordon Bell Prize at SC16.
ARM: scaling new heights
David Brash and Nigel Stephens (ARM)
Abstract: In 2011 ARM announced ARMv8-A, which included the new 64-bit AArch64 execution state, a step change for the architecture. ARMv8-A is now well established across a wide range of devices spanning consumer and enterprise markets. Fujitsu’s 2016 announcement that Japan’s Post-K supercomputer would be based on ARMv8-A, followed by the disclosure of ARM’s next-generation Scalable Vector Extension (SVE) at HotChips, illustrates how the ARM processor architecture now scales from the smallest IoT devices to the largest HPC systems. This talk will focus on the evolution of the high-end ARMv8-A architecture, its recent enhancements for enterprise solutions, and its building momentum in high-performance scientific computing, all as part of an expanding collaboration between hardware, software and architecture partners. Key aspects of SVE will be outlined, as well as the work to promote a thriving development ecosystem for tools, middleware and applications.
David Brash joined ARM in 1998 and took on responsibility for architecture program management in 2000. His responsibilities include chairing the Architecture Review Board, coordinating the development and delivery of the ARM Architecture specifications, and enabling the supporting collateral needed for compliance and software development activities. The role also includes key partner engagement, evangelizing and encouraging early adoption of the ARM architecture, and fostering a vibrant tools, hardware and software development ecosystem. David started his career with Racal, developing government and military communications equipment. He then joined Digital Equipment Corporation (DEC) to design backbone router and infrastructure products. David was promoted to Consultant Engineer and then started a European systems engineering group to provide boards, tools and firmware support for Digital’s semiconductor division. The group was involved with Digital’s Alpha processor, PCI, and StrongARM products. David graduated in 1978 from the University of Strathclyde with a BSc in Electrical & Electronic Engineering, and holds 2 patents from his time with Digital.
Nigel Stephens joined ARM in 2008 to contribute to the specification of ARMv8-A, with a particular focus on its new AArch64 Instruction Set Architecture (ISA). He went on to become lead architect with responsibility for ARMv8-A instruction sets, and was appointed an ARM Fellow in 2015. Most recently Nigel led the development of the Scalable Vector Extension (SVE), which was announced at HotChips 2016. Prior to joining ARM, Nigel was a systems programmer, compiler developer and computer architect. He has specialized in RISC processors ever since he worked on the development of the first ever MIPS-based UNIX graphics workstation at Whitechapel Computer Works Ltd in 1986. Following this he co-founded the RISC hardware/software consultancy Algorithmics Ltd, which was acquired by MIPS Technologies in 2002. Nigel then joined the MIPS Architecture team, eventually becoming Technical Director of MIPS UK. Nigel graduated in 1981 from University College London with a BSc in Computer Science. He currently holds 5 patents.
New Era of Electrification and Vehicle Intelligence
Haruyoshi Kumura (Nissan Motor Co., Ltd.)
Abstract: As the global demand for personal mobility continues to grow, the automotive industry needs to accelerate the development of solutions to social problems such as the environment, energy, resources, traffic accidents and urban congestion. Vehicle electrification and vehicle intelligence are key technologies for resolving these problems. The talk will present the autonomous drive system technology installed in the latest vehicle, along with examples of recent development work for future autonomous drive systems. Technical issues and challenges for semiconductors will also be discussed.
Haruyoshi Kumura was appointed Fellow in charge of Technology Intelligence of Nissan Motor Co., Ltd. in April 2009.
He completed the master’s course in mechanical engineering at Tokyo Institute of Technology in March 1981 and joined Nissan Motor. After holding several management positions in Nissan’s Powertrain and Environment Research Laboratory, he became General Manager of the laboratory in 2003. His appointment as Vice President in charge of the Nissan Research Center in 2005 was quickly followed by his promotion to Corporate Vice President in 2006.
He holds a doctorate in mechanical engineering from Yokohama National University.
POWER9 Design Innovations (Tentative)
Jeffrey L. Burns (IBM Research, USA)
Topics: “Cool chips for the next decade”
Organizer and Moderator:
Hideharu Amano (Keio Univ.)
Abstract: CMOS process technology is still advancing, but the end is coming into sight. Semiconductor chips built on advanced processes beyond 21nm are so expensive that they are developed only for products that sell in the millions. On the other hand, advanced AI, IoT and big data technologies require more and more computation and communication power within a tightly limited power budget. How can we develop “cool chips” in the next decade? And how can the conference “Coolchips” contribute?
Special Sessions (invited lectures)
TrueNorth: A Neurosynaptic Integrated Circuit
with 1 Million Spiking Digital Neurons
Yutaka Nakamura (IBM Japan)
Abstract: Inspired by the brain’s structure, we have developed an efficient, scalable, and flexible non–von Neumann architecture that leverages contemporary silicon technology. To demonstrate it, we built TrueNorth, a 5.4-billion-transistor chip with 4096 neurosynaptic cores interconnected via an intra-chip network that integrates 1 million programmable spiking neurons and 256 million configurable synapses. Chips can be tiled in two dimensions via an inter-chip communication interface, seamlessly scaling the architecture to a cortex-like sheet of arbitrary size. The architecture is well suited to many applications that use complex neural networks in real time, for example multi-object detection and classification. With 400-pixel-by-240-pixel video input at 30 frames per second, the chip consumes 63 mW.
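The chip-level figures in this abstract imply a regular per-core organization, which the short sketch below works out. It assumes that "1 million" and "256 million" are rounded powers of two (2^20 and 256 x 2^20); that reading is an assumption, as are the function names, which are purely illustrative. The energy-per-frame figure is simply the quoted power divided by the quoted frame rate.

```python
# Per-core figures implied by the chip-level numbers in the abstract.
# Assumption: "1 million" = 2**20 and "256 million" = 256 * 2**20.

CORES = 4096
NEURONS = 2**20
SYNAPSES = 256 * 2**20
POWER_WATTS = 0.063            # 63 mW, from the abstract
FRAMES_PER_SECOND = 30         # from the abstract


def neurons_per_core() -> int:
    return NEURONS // CORES


def synapses_per_core() -> int:
    return SYNAPSES // CORES


def synapses_per_neuron() -> int:
    return SYNAPSES // NEURONS


def energy_per_frame_millijoules() -> float:
    """Average energy spent per video frame at the quoted power and rate."""
    return POWER_WATTS / FRAMES_PER_SECOND * 1000.0


if __name__ == "__main__":
    print("neurons per core:   ", neurons_per_core())
    print("synapses per core:  ", synapses_per_core())
    print("synapses per neuron:", synapses_per_neuron())
    print("energy per frame mJ:", round(energy_per_frame_millijoules(), 2))
```

Under this reading each core holds 256 neurons and 256 x 256 = 65,536 synapses, and the video workload costs roughly 2.1 mJ per frame.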
Yutaka Nakamura is currently a Research Staff Member at IBM Research – Tokyo. His research interests include memory circuit design, redundancy repair, ECC, and neuromorphic chips and systems. He received the B.S. degree from the Department of Mechanical Engineering, Waseda University, Tokyo, Japan, in 1985.
Digital Microfluidic Biochips
Tsung-Yi Ho (National Tsing Hua University, Taiwan)
Abstract: A digital microfluidic biochip (DMFB) is an attractive technology platform for automating laboratory procedures in biochemistry. However, today’s DMFBs suffer from several limitations: (i) constraints on droplet size and the inability to vary droplet volume in a fine-grained manner; (ii) the lack of integrated sensors for real-time detection; and (iii) the need for special fabrication processes, together with reliability and yield concerns. To overcome these problems, DMFBs based on a micro-electrode-dot-array (MEDA) architecture, and fabricated using a TSMC 350 nm process, have recently been demonstrated.
This presentation will first describe a biochemistry synthesis approach for such MEDA biochips. This synthesis method targets operation scheduling, module placement, routing of droplets of various sizes, and diagonal movement of droplets in a two-dimensional array. Simulation results using benchmarks and experimental results using a fabricated MEDA biochip will be presented to demonstrate the effectiveness of the proposed co-optimization technique. Finally, the presentation will describe an efficient built-in self-test (BIST) solution for MEDA biochips. Simulation results based on HSPICE and experiments using fabricated MEDA biochips will highlight the effectiveness of the proposed BIST architecture.
Tsung-Yi Ho received his Ph.D. in Electrical Engineering from National Taiwan University in 2005. He is a Professor with the Department of Computer Science of National Tsing Hua University, Hsinchu, Taiwan. His research interests include design automation and test for microfluidic biochips and nanometer integrated circuits. He has presented 10 tutorials and contributed 11 special sessions in ACM/IEEE conferences, all in design automation for microfluidic biochips. He has been the recipient of the Invitational Fellowship of the Japan Society for the Promotion of Science (JSPS), the Humboldt Research Fellowship of the Alexander von Humboldt Foundation, and the Hans Fischer Fellowship of the Institute of Advanced Study of the Technical University of Munich. He received Best Paper Awards from the VLSI Test Symposium (VTS) in 2013 and from IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2015. He served as a Distinguished Visitor of the IEEE Computer Society for 2013-2015, the Chair of the IEEE Computer Society Tainan Chapter for 2013-2015, and the Chair of the ACM SIGDA Taiwan Chapter for 2014-2015. He currently serves as an ACM Distinguished Speaker; a Distinguished Lecturer of the IEEE Circuits and Systems Society; an Associate Editor of the ACM Journal on Emerging Technologies in Computing Systems, ACM Transactions on Design Automation of Electronic Systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, and IEEE Transactions on Very Large Scale Integration Systems; and a Guest Editor of IEEE Design & Test of Computers. He also serves on the Technical Program Committees of major conferences, including DAC, ICCAD, DATE, ASP-DAC, ISPD, and ICCD.