COOLCHIPS XVI

CALL FOR PARTICIPATION

Please check the timetable and detailed program on the Advance Program page.

Keynote Presentations

CoolChips at the core of a healthier world

Bert Gyselinckx (IMEC/Holst Centre, Netherlands)

Abstract: A healthier world ... is that not what we all wish for in our new year's resolutions? So why is it then that so many people suffer from disease that is caused by lifestyle choices? There are many ways to approach this question. In this talk, we take the stance that people have a strong trust in the incumbent healthcare system and underestimate the impact daily behavior can have in the longer term. As a result, we are seeing pandemics of chronic disease that are saturating our healthcare systems and driving costs through the roof. Technology cannot change the world, only people can. But technology can help people change the world. Biochemical progress has equipped us with drugs that can cure disease. Engineering and physics have brought us very advanced imaging systems that can look inside us without using a scalpel. Now we are ready for the next step: technology that can help us make lifestyle changes, technology that will help us manage our health. And in the odd event that we do get ill, the same technology will help us get back on our feet as fast as possible. The magic ingredient for all of this is the body area network (BAN). Think of it as an aura of invisible health guards. These health guards are embedded in clothing, jewelry, or eyewear and communicate with the cloud through a mobile device. This revolution is happening today. Many players are introducing cardiac, brain, sleep, or energy expenditure monitoring systems. This is just the first step of this new wave. Cool chips, chips that are very small and consume hardly any power, will allow such systems to be miniaturized further. This will lead to thoroughly pervasive BAN systems. In 10 years' time, we will all be walking around with embedded health guards, just as we carry around our cellphones today. In 10 years' time, we will manage our own health. Or, we will at least be better equipped to do so if we want.

Bert Gyselinckx is General Manager of Holst Centre - imec. Bert was instrumental in defining the technical strategy of the Holst Centre at its creation in 2005. He brought to the Holst Centre his innovation management experience and know-how in wireless research from imec. Bert is well known in the scientific community for his pioneering contributions to wireless OFDM communications, which led to the current generation of WiFi modems. Bert lives by the golden rule “working hard, playing hard”. In 2001, he swapped his office chair for a bike saddle and went on a 12-month odyssey in the Asia Pacific region. After 15,000 km in “under” developed countries, he was inspired to create technologies that can have a true impact on society. For this purpose, he established the Human++ program within imec. This program develops disruptive technologies for health and comfort monitoring. As the exponent of the Human++ program, Bert became known as a thought leader in the area of body area networks. Bert is also a board member of NanoLabNL, a Dutch national facility for nanotechnology research. Bert received the M.S. degree in Electrical Engineering from the Rijksuniversiteit Gent, Belgium, and the DEA degree in Air and Space Electronics from the École nationale supérieure de l'aéronautique et de l'espace, Toulouse, France. He was also a trainee at the Research and Development group of Siemens in Munich, Germany.

Why and how “Watson” Answered Questions on the TV Quiz Show?

Hiroshi Kanayama (IBM Japan, Japan)

Abstract: IBM Research decided to build a question answering system, Watson, to compete with humans on the American TV Quiz show, Jeopardy! We improved Natural Language Processing (NLP), Information Retrieval (IR), Machine Learning (ML), massively parallel computation and Knowledge Representation and Reasoning (KR&R), and created an architecture to integrate all of them, and the resulting system competed on television against two famous human champions on an equal footing. Our results proved that our architecture is effective and extensible and may be used as a foundation for combining, deploying, evaluating and advancing a wide range of algorithmic techniques to rapidly advance the field of open domain question answering. In this talk I will introduce the grand challenge, present an overview of the technology upon which Watson is built, including hardware points of view, and explore how the technology is being applied to new industries and domains, such as healthcare.

Hiroshi Kanayama joined IBM Research - Tokyo just after receiving his master's degree in 2000. His research interest is natural language processing, mainly syntactic and semantic analysis of the Japanese language. His research results on Japanese syntactic parsing and sentiment analysis were integrated into the software product IBM Content Analytics, where they differentiated IBM's text mining solutions. In 2008 he joined the Watson project (Jeopardy! challenge) and provided lexical resources for type matching using information extraction techniques. In 2012 he received his Ph.D. degree from the University of Tokyo for his work on sentiment analysis.

What Can Supercomputers Learn from Phones?

Michael McCool (Intel, USA)

Abstract: Energy efficiency and power management have become crucial factors in all forms of computation, from phones to supercomputers, and Intel produces processors for the entire range. Phones, with their limited thermal and battery power budgets, need careful management of power consumption driven by the specific use-case scenarios in which these devices are used. For example, mobile applications are frequently very "bursty" in nature and it is possible to temporarily exceed the thermal budget of the device and race to halt, improving both the user experience (since the task is completed faster) and lowering the overall system power requirements (since the display and other components can be turned off sooner). A great deal of effort also goes into systems management in phones so that features not actually needed for a specific task can be powered down, and so that the minimum processor power needed for the task can be used. Supercomputer applications also have limited power budgets. Targets for Exascale computing in particular emphasize not only raw performance, but achieving that performance within a specific system power budget. Compared with phones, supercomputers are used for continuous operation, and so the strategies used for power management are different---or are they? What can we learn from phones to optimize power consumption in supercomputers? In this talk I will review the power management facilities in Intel processors across the compute continuum, their evolution, convergence, and divergence, and point out some interesting research opportunities.
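The race-to-halt argument in the abstract above can be illustrated with a small energy calculation. This is only a sketch under stated assumptions: the function name window_energy_mj and all power and time figures are hypothetical and are not taken from the talk or from any Intel product. The model charges CPU power plus platform power (display and other components) while the task runs, then a low idle power for the rest of a fixed time window.

```python
def window_energy_mj(cpu_mw, platform_mw, idle_mw, busy_s, window_s):
    """Energy in millijoules over a fixed window: a busy phase, then idle.

    Powers are in milliwatts and times in seconds, so mW * s = mJ.
    All parameters are illustrative assumptions, not measured data.
    """
    if busy_s > window_s:
        raise ValueError("task does not finish inside the window")
    busy_energy = (cpu_mw + platform_mw) * busy_s   # CPU and platform both on
    idle_energy = idle_mw * (window_s - busy_s)     # everything powered down
    return busy_energy + idle_energy


# Hypothetical scenario: the same task run fast and hot, or slow and cool.
race = window_energy_mj(cpu_mw=3000, platform_mw=2000, idle_mw=100,
                        busy_s=1, window_s=4)
pace = window_energy_mj(cpu_mw=1000, platform_mw=2000, idle_mw=100,
                        busy_s=2, window_s=4)
print(f"race to halt: {race} mJ, slower run: {pace} mJ")
```

With these assumed figures the faster run uses less total energy because the platform components are switched off sooner. The outcome depends on the numbers: setting platform_mw to 0 in both calls makes the slower run cheaper in this model.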

Michael McCool (Intel Principal Engineer) has degrees in Computer Engineering (University of Waterloo, BASc) and Computer Science (University of Toronto, M.Sc. and PhD.) with specializations in mathematics (BASc) and biomedical engineering (MSc) as well as computer graphics and parallel computing (MSc, PhD). He has research and application experience in the areas of data mining, computer graphics (specifically sampling, rasterization, path rendering, texture hardware, antialiasing, shading, illumination, function approximation, compression, and visualization), medical imaging, signal and image processing, financial analysis, and parallel languages and programming platforms. In order to commercialize research work into many-core computing platforms done while he was an Associate Professor at the University of Waterloo, in 2004 he co-founded RapidMind, which in 2009 was acquired by Intel. Currently he is a software architect with Intel working on parallel programming languages, applications, and mobile computing. In addition to his university teaching, he has presented numerous tutorials at Eurographics, SIGGRAPH, and SC on graphics and/or parallel computing, and has co-authored three books. The most recent book, Structured Parallel Programming, was co-authored with James Reinders and Arch Robison. It presents a pattern-based approach to parallel programming using a large number of examples in Intel Cilk Plus and Intel Threading Building Blocks.

Next Generation Vector Supercomputer for Providing Higher Sustained Performance

Shintaro Momose (NEC, Japan)

Abstract: Many current supercomputers are designed to target higher peak performance with lower memory bandwidth due to the memory wall issue. However, the characteristics of applications in the High Performance Computing (HPC) area are becoming more diverse, and the sustained performance of each application strongly depends not only on the peak performance of the system but also on its memory bandwidth. NEC's goal is to provide higher sustained performance for every application area, with the vector supercomputer for memory-intensive applications and with commodity products such as x86-based clusters and accelerators for compute-intensive applications. Within a year, we are going to launch the Next Generation Vector Supercomputer (NGV), the successor to our current SX-9 products, aimed at much higher sustained performance, particularly for memory-intensive applications. NGV is designed with the big core strategy, targeting higher sustained performance in parallel execution and improved programmability/productivity for users. NGV has both the world's highest single-core performance of 64GF and the world's highest memory bandwidth per core of 64GB/s. Four cores, memory controllers and network controllers are integrated into one CPU LSI, giving a CPU performance of 256GF and a memory bandwidth of 256GB/s. NEC is also participating in the feasibility study of future HPC technologies as part of the Japanese governmental HPCI project with Tohoku University and JAMSTEC. Our target will remain unchanged: a big core-based design balanced with large memory bandwidth of over 1 Byte/Flop, pursuing higher sustained performance (not Linpack) for enhanced applicability to disaster prevention/mitigation and industrial applications.
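The per-chip figures quoted in the abstract follow from the per-core figures, and the ratio of bandwidth to peak performance is the Byte/Flop balance the abstract refers to. The sketch below reproduces that arithmetic. The second function is a roofline-style upper bound added purely for illustration; it is a common textbook model, not NEC's own performance model, and the function names and the 0.5 flop/byte workload are assumptions.

```python
def chip_totals(cores, gflops_per_core, gbs_per_core):
    """Return (peak GFLOPS, memory bandwidth in GB/s, bytes per flop)."""
    peak_gflops = cores * gflops_per_core
    bandwidth_gbs = cores * gbs_per_core
    return peak_gflops, bandwidth_gbs, bandwidth_gbs / peak_gflops


def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Figures from the abstract: 4 cores, 64 GF and 64 GB/s per core.
peak, bandwidth, bytes_per_flop = chip_totals(4, 64, 64)
print(peak, bandwidth, bytes_per_flop)

# Hypothetical memory-intensive kernel doing 0.5 flop per byte moved.
balanced = attainable_gflops(peak, bandwidth, 0.5)
# Hypothetical chip with the same peak but a quarter of the bandwidth.
starved = attainable_gflops(peak, bandwidth / 4, 0.5)
print(balanced, starved)
```

Under this simple bound, the assumed memory-intensive kernel is limited by bandwidth in both cases, which is why the balance between bandwidth and peak matters more for sustained performance than peak alone.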

Shintaro Momose is a manager in the HPC Division at NEC Corporation, Tokyo, Japan. He is a hardware architect of the SX vector supercomputer. His research focuses in particular on building a grand design for HPC systems and analyzing future HPC trends. He received the B.E. degree in Mechanical Engineering, and the M.S. and Ph.D. degrees in Information Sciences, from Tohoku University in 1999, 2003, and 2005 respectively.

Invited Presentations

A 28nm HKMG Single-Chip Communications processor with 1.5GHz dual core application processor and LTE/HSPA+ capable Baseband processor

Takeshi Kataoka (Renesas Mobile Corp., Japan)

Abstract: The R-Mobile U2 (RMU2) integrates a 1.5GHz dual-core application processor and a 2G/3G/HSPA+/LTE baseband processor on a single chip, using a 28nm HKMG High-Performance and Low-Leakage CMOS bulk process that delivers both low leakage current and high performance. Additionally, the chip features a clock control mechanism, “Power Saver”, to limit maximum power and to reduce short-term IR drop. It also features an IO NMOS power switch and Dual Mode Low-leak SRAM to minimize leakage current.

Takeshi Kataoka is a Director of the Mobile SoC Design Department at Renesas Mobile Corporation. He received the B.E. and M.E. degrees in Applied Physics from the University of Tokyo, Tokyo, Japan in 1991 and 1993, respectively. He also received the M.S. degree in Electrical Engineering from Stanford University, California, United States in 2001. He was a designer and architect of the SH-series 32-bit RISC CPU core at Hitachi Ltd. and Renesas Technology Corporation, and a co-author of papers in CoolChips VII, VIII and XI. For many years he was also engaged in the development and business of microcontrollers for the automotive field, including power train, car multimedia and dashboard applications. He is now engaged in the development and business of SoCs for the mobile application and baseband fields.

Zero Overhead State-Retention Power-Gating and Gate-Bias on a Dual-Core ARM Cortex-A5MP Processor for 50-80% Idle Power Reduction

James Myers (ARM Ltd, UK)

Abstract: Power gating is now a mainstream leakage mitigation technique implemented in all modern SoCs, but saving away program state and actually shutting down is an energy gamble left to the Operating System. Hardware state retention registers allow more opportunity for power gating with less software, energy and latency overhead. A dual-core ARM® processor is fabricated on a 65nm process with zero-area retention, along with on-chip voltage reduction circuitry, which enables five intermediate power modes of gradually increasing leakage reduction and wake latency. Leakage savings of between 50% and 80% are observed at room temperature.
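The abstract above describes five intermediate power modes that trade leakage reduction against wake latency. One way a power manager might choose among such modes is sketched below. This is an illustration only: the mode names, the per-mode saving fractions and latencies, and the selection policy are all hypothetical and are not taken from the ARM design. Only the overall 50% to 80% range comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerMode:
    name: str
    leakage_saving: float   # fraction of idle leakage removed (assumed)
    wake_latency_us: int    # time to resume execution (assumed)


# Hypothetical table: deeper modes save more leakage but wake more slowly.
MODES = [
    PowerMode("mode-1", 0.50, 1),
    PowerMode("mode-2", 0.58, 2),
    PowerMode("mode-3", 0.65, 5),
    PowerMode("mode-4", 0.72, 20),
    PowerMode("mode-5", 0.80, 100),
]


def pick_mode(modes, latency_budget_us):
    """Pick the deepest mode whose wake latency fits the budget, else None."""
    eligible = [m for m in modes if m.wake_latency_us <= latency_budget_us]
    return max(eligible, key=lambda m: m.leakage_saving, default=None)


choice = pick_mode(MODES, latency_budget_us=10)
print(choice.name if choice else "stay active")
```

A real policy would also weigh the energy cost of entering and leaving each mode against the expected idle time; that term is left out here to keep the sketch short.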

James Myers is a Staff Engineer in ARM's Silicon R&D group. He joined ARM in 2007 where he was initially responsible for developing reference implementation flows for the various ARM soft processor cores. Joining R&D full time in 2009, he has since focused on deployable techniques for reduction of CPU and SoC power. His current research interests include low power circuits, advanced power gating, low voltage and better than worst-case design. James holds an MEng from Imperial College, London.

SPARC64™ X: Fujitsu's New Generation 16 Core Processor for UNIX servers

Toshio Yoshida (Fujitsu Limited, Japan)

Abstract: Fujitsu has developed a new processor for UNIX servers, SPARC64™ X, which runs at 3GHz and consists of 16 cores, a 24MB shared level 2 cache, memory controllers, IO controllers and system controllers which connect multiple chips. We have strengthened the microarchitecture and introduced the enhanced instruction set HPC-ACE (High Performance Computing - Arithmetic Computational Extensions), which has already been applied to the K computer. Peak memory bandwidth reaches 102GB/s. These features deliver extremely high throughput performance. In addition, we have added new functions to the core pipelines which accelerate software such as cryptography processing and some specific applications. We call these architectures 'Software on Chip.' Furthermore, high-reliability technology for mainframes is used to ensure stable operation of a mission-critical system. This presentation will give an overview of SPARC64™ X and of the performance and power efficiency of 'Software on Chip.'

Toshio Yoshida is a director of LSI Development Division in the Next Generation Technical Computing unit at Fujitsu. His technical interests include microprocessor architecture. Yoshida received his MS in physics from the Faculty of Science and Graduate School of Science at the University of Tokyo.

Panel Discussion

Topics: "The Next Step in Processor Evolution"

Organizer and Moderator:

Yoshio Masubuchi (Toshiba, Japan)

Panelists:

Bert Gyselinckx (IMEC, Netherlands)
Michael McCool (Intel, USA)
Shintaro Momose (NEC, Japan)
James Myers (ARM, UK)
Toshio Yoshida (Fujitsu, Japan)

Special Sessions (invited lectures)

STHORM: A Multi-Processor Platform and Programming Environment

Pierre G. Paulin (STMicroelectronics, Canada)

Abstract: In this talk we will present the STMicroelectronics STHORM* multi-processor fabric and its programming environment.
(* formerly known as Platform 2012)
・ STHORM many-core platform overview
  ・ Multi-cluster and multi-core platform
  ・ Asynchronous Network-on-Chip
  ・ Memory hierarchy and DMA
  ・ STHORM SoC
・ Platform programming model (PPM) support:
  ・ OpenCL: Standard PPM for S/W-based platform variants
  ・ Native Programming Model (NPM): A flexible, open source component-based environment and runtime for programming model experimentation
  ・ Predicated Execution Data Flow (PEDF), for mixed HW-SW platform variants
・ Programming model-aware debug, trace, visualization and analysis tools
・ STHORM platforms
  ・ Virtual platform: Functional and performance models
  ・ STHORM evaluation board
・ Application and benchmarking results
  ・ Focus on computer vision and augmented reality applications
  ・ Application library, integrated in OpenCV and Android environments
  ・ Benchmark comparison with general-purpose processor

Dr. Pierre G. Paulin is director of System-on-Chip Platform Automation at STMicroelectronics, Ottawa, Canada. He is responsible for the platform programming tools of a large-scale multi-processor SoC fabric in ST. Previously, he was director of Embedded Systems Technologies for ST in Grenoble, France. Before this, he managed embedded software tools and high-level synthesis R&D with BNR, the research lab of Nortel Networks. He obtained a Ph.D. from Carleton University, Ottawa, and B.Sc. and M.Sc. degrees from Laval University, Quebec. He won the best presentation award at DAC in 1986 and the best paper award at ISSS-Codes in 2004. His paper on force-directed scheduling for high-level synthesis was chosen in 1988 for the DAC compendium of the most influential papers over a 25-year period. He is a member of the IEEE.

Hot Research Issues in Main Memory Subsystem

Jung Ho Ahn (Seoul National University, Korea)
Sungjoo Yoo (POSTECH, Korea)

Abstract: DRAM has served as the de facto standard for main memory for decades owing to its low latency and high density. Recently, the power and latency of its inter-package and on-chip global datalines have become the primary inhibitors to further improving the performance and energy efficiency of main-memory systems. 3D stacking technologies, such as through-silicon vias, have been suggested and adopted to alleviate the problem by lowering the impedance and physical distance between storage and computation components. In this talk, we first survey the solution space of 3D stacked DRAM systems from industry and academia. After reviewing their recent progress, we identify the active issues in developing and utilizing 3D stacked DRAM systems. In the long term, emerging memory technologies can resolve the scaling problem of DRAM. We focus on a promising emerging memory, phase-change RAM (PRAM). First, we present an overview of recent research work on applying PRAM to the main memory subsystem. Then, we explain key solutions in the topics of hybrid DRAM/PRAM, data encoding for bit update reduction, wear leveling, error correction, and write performance improvement. Finally, we present current issues in realizing multi-level cell PRAM, which is imperative for achieving low bit cost in PRAM.

Jung Ho Ahn received the B.S. degree in electrical engineering from Seoul National University in 1997, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University in 2002 and 2007, respectively. He is currently an Assistant Professor in the Graduate School of Convergence Science and Technology at Seoul National University, where he leads the Scalable Computer Architecture Laboratory. Prior to joining Seoul National University, he was a research scientist at Hewlett-Packard Laboratories, Palo Alto, California, from 2007 to 2009. His research interests include designing high-performance and high-efficiency memory systems and interconnection networks, as well as bridging the gap between the performance and energy efficiency demands of emerging applications and the potential of modern and future massively parallel systems.

Sungjoo Yoo is currently an assistant professor at the Department of EE, POSTECH, Korea. He received his Ph.D. from Seoul National University in 2000. He worked as a researcher at the TIMA laboratory, Grenoble, France from 2000 to 2004. He was also with Samsung System LSI from 2004 to 2008, where he led the system-on-chip architecture design team and was involved in memory and bus architecture designs for mobile application processors and in performance modeling and optimization of solid state disks. His research interests include software, architecture and RTL design for low power SoCs, and the memory and storage hierarchy from cache, DRAM and phase-change RAM to solid state disk. He received the Best Paper Award at the International SoC Conference (ISOCC) in 2006 and Best Paper Award nominations at the Design Automation Conference (DAC) in 2011 and at Design Automation and Test in Europe (DATE) in 2002 and 2009.
