Call for Participation | IEEE COOLChips 27

PDF version available (as of 2024-04-10)

Keynote Presentations

“Accelerating AI with Analog In-Memory Computing”

Stefano Ambrogio (IBM Research)

Abstract: The last decade has witnessed the pervasive spread of AI across a variety of domains, from image and video recognition and classification to speech and text transcription and generation. In general, we have observed a relentless race toward larger models with huge numbers of parameters. This has led to a dramatic increase in computational workload, requiring many CPUs and GPUs to train neural networks and run inference on them. Improvements in hardware have therefore become increasingly essential. To deliver improved performance, in-memory computing offers a very interesting solution. While digital computing cores are limited by the data bandwidth between memory and processor, computing in memory avoids the weight transfer, increasing power efficiency and speed. The talk will give a general overview, highlighting our own 14-nm chip, based on 34 crossbar arrays of Phase-Change Memory technology with a total of around 35 million devices. We demonstrate the efficiency of this architecture on a selection of MLPerf networks, showing that Analog AI can provide superior power performance compared with digital cores, at comparable accuracy. We then provide guidelines for the next steps in the development of reliable and efficient Analog AI chips, with a specific focus on the architectural constraints and opportunities involved in implementing larger and improved Deep Neural Networks.

Stefano Ambrogio obtained his PhD in 2016 at Politecnico di Milano, Italy, studying the reliability of resistive memories and their application to neuromorphic networks. He is now a Senior Research Scientist at IBM Research, Almaden, in the Analog AI team, working on hardware accelerators based on Non-Volatile Memories for neural network inference.

“Energy-Efficient Heterogeneous Photonics for Next Generation AI and Hardware Accelerators”

Stanley Cheung (Hewlett Packard Enterprise)

Abstract: As Moore’s law, Dennard scaling, and the von Neumann bottleneck continually push the boundaries of conventional computing, there is a need to seek alternative architectures, systems, and devices. At the same time, AI/ML applications in mega-data centers and high-performance post-exa-scale computing will require an interconnect solution that scales in bandwidth and energy efficiency. In this talk, I will first introduce our unique heterogeneous III-V quantum-dot-on-Si platform for high-performance optical interconnects and post-exa-scale computing systems. Key devices and architectures that improve bandwidth-density and energy-efficiency metrics by orders of magnitude will be discussed. This lays the groundwork for exploring photonic in-memory neuromorphic computing, leveraging our recent breakthroughs in non-volatile photonic devices: photonic memristors and photonic charge-trap memory.

Stanley Cheung is currently a Principal Research Scientist in the Large-Scale-Integrated-Photonics (LSIP) Laboratory at Hewlett Packard Enterprise. His interests lie in pushing the state of the art in integrated photonics for communications and computing applications. He has recently shifted his attention toward the use of non-volatile, ultra-low-power, heterogeneous III-V/Si photonics for novel neuromorphic/brain-inspired computing architectures as well as general-purpose programmable optics. He has authored and co-authored more than 80 journal and conference papers and has been granted 11 patents, with more than 50 pending.

“Processing-in-Memory: from Technology to Products”

Kyomin Sohn (Samsung Electronics)

Abstract: The traditional computing architecture represented by von Neumann maintains a simple memory hierarchy that is still in use today. However, the strong demand for computing power, which began with big data and AI applications, is giving rise to a new memory hierarchy. In particular, the emergence of large language models in generative AI requires memory with higher bandwidth and higher capacity. This talk explains the key concepts of in-memory computing, referred to as CIM (compute-in-memory) or PIM (processing-in-memory). CIM technology uses the memory array itself as a processing unit, exploiting the inherent multiplication between wordline and bitline. In contrast, PIM technology exploits internal memory bandwidth by placing processing units near the memory array and activating them simultaneously. The concept of PIM technology has already been proven by HBM2-PIM and GDDR6-AiM from the major DRAM vendors. It is now time to apply this technology to mass-produced DRAM products. For systems and applications with low operational intensity, PIM technology looks very attractive for overcoming the memory-bandwidth limitation. However, there are obstacles to applying PIM technology directly to conventional DRAM. These challenges will be discussed and several suggestions will be given. The journey from PIM technology to DRAM products incorporating it will continue.

Kyomin Sohn is a Samsung Master (VP of Technology) at Samsung Electronics, where he is responsible for the future architecture and circuit technology of DRAM. He received the B.S. and M.S. degrees in Electrical Engineering from Yonsei University, Seoul, in 1994 and 1996, respectively. From 1996 to 2003, he was with Samsung Electronics, Korea, in the SRAM Design Team, where he designed various kinds of high-speed SRAMs. He received the Ph.D. degree in EECS from KAIST, Korea, in 2007. He rejoined Samsung Electronics in 2007, where he has worked in the DRAM Design Team. His interests include 3D-DRAM, reliable memory design, and processing-in-memory. In addition, he has served as a Technical Program Committee member of the Symposium on VLSI Circuits since 2012.

“Intel Foundry Advanced Packaging and Test: Enabling Disaggregation in AI and PC”

Chunqing Peng (Intel)

Abstract: Intel Foundry Advanced Packaging and Test solutions are well positioned to enable the next generation of AI and PC packaging. The future of packaging relies on disaggregation on all fronts. Intel’s broad portfolio ranges from FCBGA 2D packages to 2.5D EMIB packages, 3D Foveros, and 3.5D CoEMIB technology. This presentation will share Intel’s technology roadmap and examples from both high-performance AI compute chips and PC compute chips.

Chunqing Peng has been a Principal Engineer and Advanced Packaging Customer Engineering Manager in the Intel Foundry Packaging Business Group since 2023. Chunqing received his PhD degree in Materials Science from the Georgia Institute of Technology, Atlanta, Georgia, in 2011, and then joined Intel’s Assembly Test and Technology Development (ATTD) department. He has 13 years of IC packaging development experience in package design, assembly process, and quality and reliability, covering 3DIC heterogeneous advanced packaging (Foveros, EMIB 3.5D), 3DIC packages, EMIB 2.5D packaging, Package on Package, System in Package (SiP), FCxGA packaging, and FCCSP packaging, for products ranging from Xeon server products, client computing products, modems, and smartphone products to discrete GPU and HPC GPU products.

“Hot AI by COOL SoCs”

Hoi-Jun Yoo (KAIST)

Abstract: In the current landscape of computing, the prevalence of AI applications on mobile devices has emphasized the critical importance of designing energy-efficient System-on-Chip (SoC) designs to curtail power consumption. Recognizing this, we present a comprehensive suite of low-power design techniques tailored to the intricacies of various AI applications. One of the key pillars of our approach is harnessing the inherent sparsity within Convolutional Neural Networks (CNN). By strategically leveraging sparsity, we can skip unnecessary operations during inference. Our design includes specific strategies such as Single Zero Skipping Logic, Dual Zero Skipping Logic, and Triple Zero Skipping Logic. These mechanisms collectively achieve state-of-the-art energy efficiency, setting a new standard in the field. Beyond CNN optimization, we introduce Spiking Neural Networks (SNN) to increase sparsity within the input feature map. This enhances our ability to tailor the SoC design to the specific characteristics of AI workloads, further contributing to gains in energy efficiency. Moreover, we explore the synergies between CNN and SNN, presenting an approach that capitalizes on the strengths of both architectures for high energy efficiency. The culmination of these advancements is a highly energy-efficient SoC capable of processing a wide range of AI applications with remarkable power efficiency. Our design techniques extend beyond conventional applications, achieving state-of-the-art energy efficiency in specialized domains such as deep reinforcement learning, 3D rendering using Neural Radiance Fields (NeRF), and natural language processing with Large Language Models (LLM).
In summary, our multifaceted approach to SoC design not only addresses the pressing need for energy efficiency in the realm of neural network processing, but also pushes the boundaries of AI applications that can be processed, making significant progress toward sustainable high-performance computing.

Hoi-Jun Yoo (Fellow, IEEE) graduated from the Department of Electronics, Seoul National University, Seoul, South Korea, in 1983. He received the M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 1985 and 1988, respectively. Dr. Yoo has served as a member of the Executive Committee for the International Solid-State Circuits Conference (ISSCC), the Symposium on Very Large-Scale Integration (VLSI), and the Asian Solid-State Circuits Conference (A-SSCC); as TPC Chair of A-SSCC 2008 and the International Symposium on Wearable Computers (ISWC) 2010; as an IEEE Distinguished Lecturer from 2010 to 2011; as Far East Chair of the ISSCC from 2011 to 2012; as Technology Direction Sub-Committee Chair of the ISSCC in 2013; as TPC Vice-Chair of the ISSCC in 2014; and as TPC Chair of the ISSCC in 2015. More details are available at http://ssl.kaist.ac.kr.

Special Invited Presentation

“Achieving the most energy-efficient compute fabric for ML and HPC using thousands of RISC-V cores”

Dave Ditzel (Esperanto Technologies Inc.)

Abstract: Esperanto Technologies will describe high-performance compute fabrics for machine learning and high-performance computing. The fabric is based on clusters of 32 custom-designed RISC-V processors called ET-Minions, each of which has an attached vector unit and an attached tensor unit. This fabric, and the software to drive it, was implemented in 7nm with over a thousand ET-Minions on a single die. Future implementations will use a chiplet-based approach so that the compute fabric can grow across multiple dies. Chiplet designs for 4nm and 2nm are discussed, allowing for up to 4096 ET-Minions and performance levels that could exceed high-end GPUs. The implementation approach focuses on energy efficiency through low-voltage techniques, and performance-per-watt projections for different data types are presented. The conclusion is that an array of general-purpose processors based on RISC-V can provide one of the most energy-efficient compute fabrics.

Dave Ditzel is the founder and CTO of Esperanto Technologies, Inc., a company founded in 2014 that builds energy-efficient processors for AI and beyond based on the RISC-V instruction set. Prior to Esperanto, Dave spent six years at Intel Corporation as vice-president of Hybrid Computing, leading a team building a high-performance out-of-order processor that used binary translation to run legacy x86 or ARM applications with improved energy efficiency. In 2007 he founded ThruChip Communications to reduce I/O energy between dies by using wireless inductive communication. In 1995 Dave founded Transmeta Corporation, which developed low-power x86-compatible processors using Code Morphing binary translation on top of an energy-efficient VLIW architecture. Dave spent 10 years at Sun Microsystems as CTO for the SPARC Technology Business and led the development of the 64-bit SPARC ISA and various SPARC processors. Prior to Sun, Dave spent 10 years at AT&T Bell Laboratories, where he worked on a series of RISC processors optimized for the C programming language. Dave Ditzel was a graduate student under U.C. Berkeley Professor David Patterson, and in 1980 they co-authored “The Case for the Reduced Instruction Set Computer”, which catalyzed the movement to RISC processors.

Panel Discussion

Topic: “Exploring the Potentials, Limitations, and Challenges of PiM (Processing-in-Memory) and CiM (Computation-in-Memory)”

Organizer and Moderator:
Yasuhiko Nakashima (NAIST)

Panelists:

Stefano Ambrogio (IBM Research)
Kyomin Sohn (Samsung Electronics)
Hoi-Jun Yoo (KAIST)
Yu-Guang Chen (National Central Univ.)

Abstract: This panel discusses Processing-in-Memory (PiM) and Computation-in-Memory (CiM). The main topics include the potential, limitations, and challenges of each computational technique from their respective standpoints. We will continue the discussion with comments and questions from the audience to help us understand future directions.

Yasuhiko NAKASHIMA received B.E., M.E., and Ph.D. degrees in Computer Engineering from Kyoto University in 1986, 1988 and 1998, respectively. He was a computer architect in the Computer and System Architecture Department, FUJITSU Limited, from 1988 to 1999. From 1999 to 2005, he was an associate professor at the Graduate School of Economics at Kyoto University. Since 2006, he has been a professor at the Graduate School of Information Science, Nara Institute of Science and Technology. His research interests include computer architecture, emulation, circuit design, and accelerators.

Special Sessions (invited lectures)

“Navigating Aging Realities: Integrating Reliability into Cutting-Edge Computing Systems”

Andy Yu-Guang Chen (National Central Univ. Taiwan)

Abstract: As CMOS technology continues to scale down, the aging effect poses a significant threat to the lifetime reliability of computing systems, with the potential to induce performance degradation or timing failures. Effectively addressing these challenges requires a comprehensive understanding of how aging effects impact modern computing systems, prompting the development of methodologies dedicated to aging detection, mitigation, and tolerance. This presentation aims to provide a concise overview of major aging effects and their root causes. It then explores two pivotal aspects in more depth: (1) aging-aware, energy-efficient task deployment for heterogeneous multicore systems, and (2) an aging-aware SRAM-based Computing-In-Memory architecture specifically tailored for multiply-accumulate operations. Throughout the talk, I will showcase concepts devised by our research team to confront these challenges and elaborate on the implementation difficulties we encountered. The overarching goal is to give the audience a foundational background in the design of reliable computing systems and to inspire more researchers to contribute to this dynamic and evolving field.

Andy Yu-Guang Chen received his B.S. and Ph.D. degrees in Computer Science from National Tsing Hua University, Hsinchu, Taiwan, in 2009 and 2016, respectively. He was a Lecturer at Missouri University of Science and Technology, MO, USA, in 2015, and later a research fellow at the University of Notre Dame, IN, USA, in 2016. Following this, Dr. Chen worked as a project assistant on the ICT project in St. Kitts and Nevis with ICDF Taiwan from 2016 to 2017. From 2017 to 2019, he was with the Department of Computer Science and Engineering at Yuan Ze University. Since 2019, he has been an Assistant Professor in the Department of Electrical Engineering at National Central University, Taoyuan, Taiwan. Additionally, he has been an Adjunct Assistant Professor in the Department of Computer Science at National Tsing Hua University, Hsinchu, Taiwan, since 2018. Dr. Chen’s research focuses on reliable circuit and system design, Computing-In-Memory (CIM) architecture design, AI for physical design, and hardware security. He has authored numerous technical papers and has served as a committee member for major conferences such as DAC, ASP-DAC, A-SSCC, ISVLSI, GLSVLSI, and SASIMI. He has also served as a reviewer for journals such as TCAD, TVLSI, and ACM JETC. Dr. Chen co-chaired the CAD Contest at ICCAD and CADathlon at ICCAD from 2019 to 2023. He has received several awards, including the Chinese Institute of Electrical Engineering Outstanding Young Electrical Engineer Award (2023), the National Central University Outstanding Teaching Award (2023), National Central University Excellent Hostel Instructor (2023), and the Best Paper Award at the Workshop on Synthesis and System Integration of Mixed Information Technologies (SASIMI) in 2024.

“Radiation-hardened circuit design for space application”

SinNyoung Kim (imec)

Abstract: Space projects fall into two distinct categories. The first consists of projects driven by private funding, which focus on cost, time to market, and volume as new business models. The second comprises traditional projects led by national or international space agencies. There is not only a growing demand for Commercial Off-The-Shelf (COTS) products in the first category but also an increasing importance of radiation-hardened System-on-Chip (SoC) designs in the second, particularly in light of moon exploration efforts. As more countries join the race for lunar exploration, for example through the Artemis program, highly reliable microelectronic systems for communication, rover control, environmental observation, and other tasks become essential for lunar missions. Consequently, radiation-hardened SoCs are required to ensure the high reliability of systems on the moon, which must withstand high-energy particles and a high probability of particle hits. This talk will introduce the design of radiation-hardened circuits, one of the basic building blocks of radiation-hardened SoCs, to ensure the successful operation of your chips on the lunar surface.

SinNyoung Kim received her M.S. degree from Hanyang University in 2009 and her Ph.D. degree from Kyoto University in 2014. At Kyoto University, she studied the analysis and design of radiation-hardened phase-locked loops (PLLs). At Toshiba Electronic Devices & Storage Corporation, she worked as an analog mixed-signal designer (2014–2017). She is currently at imec, where she is in charge of radiation-hardened PLL design in several European space projects.