PDF version is here (as of 2019-Feb-5)
“GPU: A true AI Cool-Chip with High Performance/Power Efficiency and Full-Programmability”
Toru Baji (NVIDIA)
Abstract: In the early days of processor LSIs, CPU performance increased almost 1.5 times per year thanks to Moore’s Law. However, around 2010, leakage current, overly complex CPU architectures, and the limits imposed by Amdahl’s Law slowed this rate to 1.1 times per year. Today, with Moore’s Law at its end, the performance increase is reported to be only about 3% per year. In contrast, the GPU, which is dedicated to parallel processing, has continued to grow its performance at a rate of 1.5 times per year, and even with Moore’s Law ending, it still continues to improve through architectural evolution and various built-in accelerators. The Volta GPU came with the Tensor Core, which accelerates AI matrix multiplication 12 times. The latest Turing GPU further added an RT Core accelerator that speeds up the most complex form of 3D computer graphics (ray tracing) by more than 10 times. The GPU delivers a performance advantage of around one to two orders of magnitude over the CPU and is now the most widely used processor accelerator in AI and supercomputing. In addition, owing to its local data processing architecture and careful circuit/layout design, its performance/power efficiency is almost one order of magnitude better than that of the CPU. AI applications demand both extraordinarily high performance and full programmability to keep up with rapidly evolving algorithms. It might now be said that only the GPU can meet these contradictory requirements. This talk introduces the basic GPU technology that realizes this high performance and high power efficiency. Applications to supercomputing, AI and ML, and advanced autonomous driving will also be reported.
Toru Baji graduated from Osaka University Graduate School and joined Hitachi’s Central Research Lab in 1977, where he conducted research on solid-state image sensors and processor architecture. From 1984, he was engaged in analog-digital circuit research at the University of California, Berkeley, and in processor architecture research at Hitachi America R&D. He transferred to Hitachi’s Semiconductor Division in 1993; the division was later moved to Renesas, where he served as General Manager of the Automotive Application Technology Department. He joined NVIDIA in 2008 as a Senior Solution Architect for the automotive business, supporting customers worldwide with automotive processor applications. Since 2016, he has served as an NVIDIA technology advisor and GPU Evangelist.
“Quantum Computing at IBM – from hardware to software”
Patryk Gumann (IBM)
Abstract: In quantum computing, information is processed in a fundamentally different way than in classical computing: by taking advantage of quantum phenomena such as entanglement and interference, a number of quantum algorithms have been theoretically proven to offer a speedup over classical algorithms. However, the difficulty lies in controlling and operating the quantum bits, or qubits, that make up a quantum computer. All physical qubits are notoriously fragile and sensitive to any fluctuations in their environment, which makes the promise of quantum computing difficult to realize. Fortunately, thanks to tremendous materials research and development, a deeper understanding of the underlying decoherence mechanisms, and smart microwave and cryogenic engineering, we are getting closer to building the first NISQ-type quantum computers. At IBM we focus on parallel efforts to improve the qubits and controls to ultimately achieve fault tolerance, while also searching for near-term applications that do not require error correction. Recent results in quantum chemistry and on error mitigation techniques show that quantum computing may be able to tackle some of its most anticipated applications before fully universal fault-tolerant quantum computers are realized.
Patryk Gumann received his PhD in experimental physics from Darmstadt University of Technology, Germany, in 2007. He has worked at various research facilities over his career, including Leiden University in the Netherlands, the Institute for Solid State Physics at the University of Tokyo, the Low Temperature Laboratory at Kyoto University, Rutgers University, the Institute for Quantum Computing at the University of Waterloo in Canada, and the Department of Physics at Harvard University. His research focus has ranged from quantum fluids and solids, through quantum sensing (including nitrogen-vacancy defects in diamond and phosphorus defects in silicon), to experimental superconducting quantum computing, which he has been pursuing at IBM since 2016, where he is a manager of the quantum processor & system integration group.
“Architectures for efficient, low-power AI Edge processing”
Sanjay Patel (Wave Computing)
Abstract: There is a need in AI to move intelligence to the edge to circumvent the latency, security, and bandwidth challenges that edge use cases face. This trend will accelerate with the continued proliferation of IoT. Using cloud-based AI-as-a-service to address edge use cases can become exceedingly expensive, and AI processing on a remote server system can be unreliable where connectivity is poor. Processing AI at the edge comes with its own challenges: achieving the desired performance at low power with constrained memory and processing resources. This talk discusses the ramifications of moving AI processing to the edge for not only inference but also training, focusing on the potential of CPU-centric edge-AI architectures with acceleration assists and on the distinction between data formats for inference and training.
Sanjay Patel is a Director of IP Architecture at Wave Computing. He has previously worked at Motorola, Sun Microsystems, Afara Websystems, Oracle, and MIPS Technologies. He has focused on processor and SoC design in the context of multi-threaded and multi-core system architectures. His interests now include the development of low-power compute solutions for neural networks on edge AI processing platforms. He holds approximately 20 patents in the area of compute architectures.
“Vector Engine Processor of NEC’s Brand-New Supercomputer SX-Aurora TSUBASA”
Yoshihiro Konno (NEC)
Abstract: NEC released its latest vector supercomputer, SX-Aurora TSUBASA, in 2018. It features superior sustained performance, especially for memory-intensive scientific applications, and it inherits more than 30 years of SX Series DNA, from the SX-1/2 to the SX-ACE, with their specialized vector processors. The system architecture of SX-Aurora TSUBASA has changed drastically from that of its SX Series predecessors. The SX-Aurora TSUBASA mainly consists of a vector host (VH) and one or more vector engines (VEs). The VH is a standard x86/Linux server that provides Linux OS functions, and the VE OS, developed by NEC, runs on Linux to control the VEs. The VE is implemented as a PCI Express (PCIe) card equipped with the newly developed vector processor and is connected to the VH. With this architectural change, the SX-Aurora TSUBASA can be deployed in configurations ranging from dedicated large-scale supercomputers to desktop servers. In the presentation at Cool Chips, we will elaborate on the design of the VE processor, including its vector architecture (such as the configuration of vector pipelines and registers), execution sequences of vector operations, the configuration of the memory network, and power control and fault tolerance mechanisms. We will also show overall configurations of the SX-Aurora TSUBASA system, the VE card implementation including cooling, sustained performance across a wider range of benchmark programs, and some use cases.
Yoshihiro Konno joined NEC Corporation in 1992, where he first worked on CAD development and design methodology for supercomputers and mainframes. He subsequently worked on their LSI development and is now a senior manager in the AI Platform Division.
“DLU and Domain Specific Computing”
Takumi Maruyama (Fujitsu)
Abstract: Fujitsu recently started development of domain-specific processors such as the DLU (Deep Learning Unit) and the DA (Digital Annealer), in addition to the conventional server processors it has developed for many years, such as GS processors for mainframes, SPARC processors for UNIX servers, and SPARC/ARM processors for HPC. Using the DLU and DA as examples, this talk explains what domain-specific processors have in common with conventional general-purpose processors and what is unique to them, and discusses the future of computer architecture.
Takumi Maruyama is a senior director of the AI Platform Business Unit at Fujitsu and an architect of the DLU, the first AI processor Fujitsu has ever designed. He started working on the first SPARC64 processor design in 1993 and has been involved in the development of various SPARC64 processors, including the SPARC64 VIIIfx, the processor of the K supercomputer. He holds a BE in Mathematical Engineering and Instrumentation Physics from the University of Tokyo.
“A64FX High Performance Processor Architecture and its Design Challenges”
Shuji Yamamura (Fujitsu)
Abstract: Last year, Fujitsu introduced the A64FX at Hot Chips 30. The A64FX is Fujitsu’s latest HPC processor, designed for the Post-K supercomputer, which Fujitsu is developing with RIKEN as a successor to the K computer. It is based on our own microarchitecture, as used in our SPARC64 and mainframe processor development. This talk explains the A64FX microarchitecture, as well as the cache memory system and on-chip network that provide coherent, high-throughput memory access to the many cores on the chip. We will also introduce our steady development effort in combining the front-end and back-end implementations.
Shuji Yamamura is a senior professional engineer in the AI Platform Business Unit at Fujitsu. He joined Fujitsu Laboratories in 2001. From 2007, he was engaged in the development of SPARC64 processors at Fujitsu, including the SPARC64 VIIIfx, the processor of the K supercomputer. His technical interests are in HPC and AI processor architectures. He has a PhD in electronics and information science from the Kyoto Institute of Technology.
Topics: “Where will the computer architecture go?”
Deming Chen (Univ. of Illinois at Urbana-Champaign)
Takumi Maruyama (Fujitsu)
Sanjay Patel (Wave Computing)
Abstract: Discussions on computer architecture are on the rise. Domain-specific architectures, deep learning chips, approximate computing, security-aware architectures, and even quantum computers are among the topics. Along with this, users’ demands are changing and diversifying; some expect ICT or AI to help us solve social problems and lead enjoyable lives. In addition, demands from the technology side, such as low power consumption, are also critical. Taking these technological and social changes into account, this panel will discuss what we architects and engineers should do right now, and how we should act to promote R&D from a long-term perspective, in order to eventually meet the needs and expectations of society. We would also like to discuss human resource development and project management systems.
Yasunori Kimura is a Senior Fellow at Fujitsu Laboratories Ltd. and a Principal Fellow at the Japan Science and Technology Agency (JST).
He joined Fujitsu Limited in 1981. Throughout his career since then, he has been engaged primarily in computer system design and development at the company. To name a few examples, he contributed to the Japanese Fifth Generation Computer Systems project and led the ‘KEI’ supercomputer development team there.
He was President and CEO of Fujitsu Laboratories of America, Inc. in California from 2011 through 2015. He has held his positions at the organizations above since January 2017. He spent a summer at Stanford University as a visiting scholar in 1995, and served for four years from 2002 as a Visiting Professor at the University of Tokyo, where he earned a Ph.D. in Computer Science.
Special Sessions (invited lectures)
“Design, Compilation, and Acceleration for Deep Neural Networks in IoT Applications”
Deming Chen (UIUC)
Abstract: Many new IoT (Internet of Things) applications are driven by the fast creation, adaptation, and enhancement of various types of Deep Neural Networks (DNNs). DNNs are computation intensive. Without efficient hardware implementations of DNNs, these promising IoT applications will not be practically realizable. In this talk, we will analyze several challenges facing the AI and IoT community in mapping DNNs to hardware accelerators. In particular, we will evaluate the potential role of FPGAs in accelerating DNNs for both cloud and edge devices. Although FPGAs can provide desirable customized hardware solutions, they are difficult to program and optimize. We will present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include automated hardware/software co-design, the use of configurable DNN IPs, resource allocation across DNN layers, smart pipeline scheduling, Winograd and FFT techniques, and DNN reduction and re-training. We showcase several design solutions, including a Long-term Recurrent Convolutional Network (LRCN) for video captioning, a bidirectional LSTM for machine translation, and an Inception module (GoogLeNet) for face recognition. We will also present some of our recent work on developing new DNN models and data structures that achieve higher accuracy for several interesting applications such as crowd counting, music synthesis, and smart sound.
Deming Chen obtained his BS in computer science from the University of Pittsburgh, Pennsylvania, in 1995, and his MS and PhD in computer science from the University of California at Los Angeles in 2001 and 2005, respectively. He joined the ECE department of the University of Illinois at Urbana-Champaign (UIUC) in 2005 and has been a full professor in the same department since 2015. His current research interests include machine learning and acceleration, GPU and reconfigurable computing, system-level and high-level synthesis, computational genomics, and hardware security. He has given about 100 invited talks sharing these research results worldwide. Dr. Chen is a technical committee member for a series of top conferences and symposia on EDA, FPGA, low-power design, and embedded systems design. He is an associate editor for several leading IEEE and ACM journals. He received the NSF CAREER Award in 2008, the ACM SIGDA Outstanding New Faculty Award in 2010, and the IBM Faculty Award in 2014 and 2015. He also received eight Best Paper Awards and the First Place Winner Award of the DAC International Hardware Contest on IoT in 2017. He was included in the List of Teachers Ranked as Excellent in 2008 and 2017. He was previously involved in two startup companies, both of which were acquired. In 2016, he co-founded a new startup, Inspirit IoT, Inc., for design and synthesis for machine learning applications targeting the IoT industry. He is the Donald Biggar Willett Faculty Scholar, an IEEE Fellow, an ACM Distinguished Speaker, and the Editor-in-Chief of ACM Transactions on Reconfigurable Technology and Systems.
“Low Power Design: Facts, Myths, and Misunderstandings”
Youngsoo Shin (KAIST)
Abstract: Low-power design is standard practice these days, so people often do not pay much attention to the details. This can create misconceptions and misunderstandings. One example is the simple statement “Our design consumes 100mW”. Power consumption cannot be declared as a single concrete number, because the estimation or measurement process is complicated by inherent inaccuracies and uncertainties. Another popular example is “Our design consumes lowest power of 100mW”. Once one understands that a chip is typically designed with a huge amount of margin, it is easy to see that the actual limit lies far beyond that figure. This talk addresses a few such details of low-power design, which often go unnoticed but deserve careful consideration.
Youngsoo Shin received his B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Korea. From 2001 to 2004, he was a Research Staff Member at the IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. He joined the School of Electrical Engineering, KAIST, Korea, in 2004, where he is currently a Professor. His recent research interests include low-power design, computational lithography, design for manufacturability, and neuromorphic circuit design. Since 2016, Dr. Shin has been the CTO of Baum, which specializes in fast power analysis solutions. He is an IEEE Fellow.