[pdf version is here](As of 2023-04-13)
“Sustainability and Fleet Manageability Innovations with 4th Gen Intel Xeon processor”
Arijit Biswas and Pankaj Kumar (Intel)
Abstract: This session presents a system architecture approach, along with architecture innovations in Intel Xeon processors, to lowering customers’ energy-use emissions, also known as scope 2 emissions. The energy efficiency comes from power management, platform monitoring technology, and the design of built-in accelerators. These accelerators are designed for today’s in-demand workloads and deliver a significant performance-per-watt advantage. Another innovation is the Optimized Power Mode feature that, when enabled, provides significant energy savings while only minimally impacting performance. This session will also highlight new telemetry features of Intel Xeon processors that enable companies to better monitor and control electricity consumption and carbon emissions, as well as Platform Monitoring Technology, which improves fleet management by exposing CPU core temperature, power consumption, package C-state residency, and error information telemetry to both in-band and out-of-band agents.
Arijit Biswas is a Senior Principal Engineer in the Datacenter Products Architecture group at Intel. He graduated from Carnegie Mellon University in 1997, joining Intel’s Pentium 4 design team immediately thereafter. During his 20+ year career he has done everything from post-Si debug through circuit design & validation to architecture & micro-architecture. Arijit is currently the technical director of the Technologies for Reliability & Usage group, responsible for developing several marquee technologies such as Turbo Boost Max 3.0 and Xeon soft error reliability mitigation architectures. Arijit is the chief architect of the Sapphire Rapids & Emerald Rapids Xeon CPUs.
Pankaj Kumar is Chief Architect for the Xeon server platform and a Senior Principal Engineer in Intel’s Datacenter and AI Group, where he drives the architecture definition of next-generation server platforms. During his 20+ year career at Intel, he has driven the platform architecture of multiple generations of server and storage platforms for enterprise, hyperscale, and edge datacenters. His area of expertise is storage systems, and he established the Xeon processor as the architecture of choice for storage workloads.
“RISC-V Robust Ecosystem”
Mark Himelstein (RISC-V International)
Abstract: This talk will discuss the state of RISC-V and its software ecosystem. RISC-V members have already shipped more than 10 billion cores for profit, so what are they and their customers running on RISC-V? From EDA tools to firmware, operating systems, runtime infrastructure, and applications, the RISC-V ecosystem provides both open source and commercial products that implementers and customers can use to enable solutions and, inevitably, success.
Before RISC-V International, Mark Himelstein was the President of Heavenstone, Inc., which concentrated on strategic, management, and technology consulting, providing hardware and software product architecture, analysis, mentoring, and interim management. Previously, Mark started Graphite Systems, Inc. (acquired by EMC), where he was the VP of Engineering and CTO, developing large analytics appliances using highly integrated flash memory. Prior to Graphite, Mark held positions as the CTO of Quantum Corp, Vice President of Solaris development engineering at Sun Microsystems, and other technical management roles at Apple, Infoblox, and MIPS. Mark has a bachelor’s degree in Computer Science and Math from Wilkes University in Pennsylvania and a master’s degree in Computer Science from University of California Davis/Livermore. In addition to publishing numerous technical papers and holding many patents, he is the author of the book “100 Questions to Ask Your Software Organization”.
“Next Generation Cryogenic Superconductor Computing ~ From Classic to Quantum ~”
Koji Inoue (Kyushu University)
Abstract: Moore’s Law, the doubling of the number of transistors in a chip every two years, has so far contributed to the evolution of computer systems. Unfortunately, we cannot expect sustainable transistor shrinking anymore, marking the beginning of the so-called post-Moore era. Therefore, it has become essential to explore emerging devices, and superconductor single-flux-quantum (SFQ) logic that operates in a 4.2-kelvin environment is a promising candidate. Josephson junctions (JJs) are used as switching elements in SFQ logic to compose a superconductor ring (SFQ ring) that can store (or trap) and transfer a single magnetic flux quantum. SFQ logic is fundamentally driven by voltage pulses, which makes it possible to achieve extremely low-latency and low-energy JJ switching. This talk shares the history of our SFQ research, e.g., revisiting microarchitecture and demonstrating over 30 GHz microprocessors, AI accelerator designs, and recently targeting quantum computers. Then, the role of computer architecture for such emerging device computing is discussed.
Koji Inoue is a professor in the Department of Advanced Information Technology at Kyushu University, Japan. His broader research interests are in computer architecture, IoT platforms, and cyber-physical systems. His current research targets emerging devices for superconductor computing, nanophotonic computing, and quantum computing. He has served as a general chair of many conferences, including the IEEE/ACM International Symposium on Microarchitecture (MICRO) 2018, the International Forum on Embedded MPSoC and Multicore (MPSoC) 2011, and the International Symposium on Low Power Electronics and Design (ISLPED) 2011. He received his PhD in Computer Science and Communication Engineering from Kyushu University in 2001. He also joined Halo LSI Design & Technology, Inc., NY, as a circuit designer in 1999.
“AI Software stack: enabling co-optimizations on Deep Learning frameworks”
Kazuaki Ishizaki (IBM)
Abstract: Deep neural networks are becoming popular since they improve the accuracy of machine learning tasks in multiple domains, including image, object detection, language, speech, conversation, code, and others. These improvements are enabled by increasing the number of parameters and computational operations. AI accelerators are critical to achieving high accuracy in these tasks, since the tasks require a lot of computational resources for training and inference. In this talk, I will review co-optimizations among hardware, algorithms, and software to achieve high performance per watt in an AI accelerator. Algorithms that maintain the same level of accuracy in low-precision arithmetic can simplify the hardware design and implementation. That hardware can achieve high performance at good power efficiency. I will present how the AI software stack can enable these optimizations during compilation from Deep Learning frameworks to AI accelerators.
Kazuaki Ishizaki is a Senior Technical Staff Member at IBM Research – Tokyo. His research interests include optimizing compilers, performance optimizations, and parallel processing. He has worked on various compiler and performance research projects, including High Performance Fortran compiler, Java just-in-time compiler, Python compiler, distributed frameworks, and web applications for over 30 years. Many contributions have been incorporated into IBM products, such as XL Fortran compiler, IBM Java runtime, WebSphere application server, and Db2. He is an ACM Distinguished Member and is a committer of Apache Spark and Apache Arrow.
“Compute Express Link (CXL): Shaping the compute landscape”
Debendra Das Sharma (Intel)
Abstract: High-performance workloads demand heterogeneous processing, tiered memory architecture, infrastructure accelerators such as SmartNICs, and infrastructure processing units to meet the demands of the emerging compute landscape. Applications such as artificial intelligence, machine learning, data analytics, 5G, automotive, and high-performance computing are driving significant changes within cloud computing, intelligent edge and client computing infrastructure. Interconnect is a key pillar in this evolving computational landscape. The recent advent of Compute Express Link (CXL), a new open standard for cache-coherent interconnect, with its memory and coherency semantics has made it possible to pool computational and memory resources at the rack level using low-latency, higher-throughput, and memory-coherent access mechanisms. CXL is adopting networking features such as multi-host connectivity, pooled memory, persistence flows, and fabric managers while keeping its low-latency load-store semantics intact. CXL is evolving to provide efficient access mechanisms across multiple nodes with advanced atomics, acceleration, SmartNICs, persistent memory support, etc. In this talk we will explore how synergistic evolution across load-store interconnects and fabrics can benefit the compute infrastructure of the future.
Debendra Das Sharma is an Intel Senior Fellow and co-GM of Memory and I/O Technologies in the Data Platforms and Artificial Intelligence Group at Intel Corporation. He is a leading expert on I/O subsystem and interface architecture. Das Sharma is a member of the Board of Directors for the PCI Special Interest Group (PCI-SIG) and a lead contributor to PCIe specifications since its inception. He is a co-inventor and founding member of the CXL consortium and co-leads the CXL Technical Task Force. He co-invented the chiplet interconnect standard UCIe and is the chair of the UCIe consortium. Das Sharma has a bachelor’s in technology (with honors) degree in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur and a Ph.D. in Computer Engineering from the University of Massachusetts, Amherst. He holds 160+ US patents and 400+ patents world-wide. He is a frequent keynote speaker, plenary speaker, distinguished lecturer, invited speaker, and panelist at the IEEE Hot Interconnects, IEEE Cool Chips, IEEE 3DIC, SNIA SDC, PCI-SIG Developers Conference, CXL consortium, Open Server Summit, Open Fabrics Alliance, Flash Memory Summit, Intel Innovation, various Universities (CMU, Texas A&M, UIUC), and Intel Developer Forum. He has been awarded the Distinguished Alumnus Award from Indian Institute of Technology, Kharagpur in 2019, the IEEE Region 6 Outstanding Engineer Award in 2021, the first PCI-SIG Lifetime Contribution Award in 2022, and the IEEE Circuits and Systems Industrial Pioneer Award in 2022.
Topics “The Past, Present, and Future of COOLChips”
Hiroaki Kobayashi (Tohoku University)
Hideharu Amano (Keio University)
Kunio Uchiyama (AIST)
Makoto Ikeda (The University of Tokyo)
Abstract: In this panel discussion, looking back over the past quarter century of COOLChips, the current OC chair and four of our former OC chairs will share their views on the development of low-power and high-speed microprocessors and systems over the past 25 years, on current trends, and on their visions for COOLChips over the next 25 years.
Fumio Arakawa is a designated researcher at d.lab at the University of Tokyo and an invited professor at the Graduate School of Informatics at Nagoya University. His research interests include the architecture and micro-architecture of low-power and high-performance processors. He founded an R&D and consulting company, Famer Systems, Inc., to contribute to industry with his R&D experience at Hitachi and Renesas Electronics. He is an organizing committee co-chair of the Cool Chips conference series. He has served as a Guest Editor of IEEE Micro seven times and as a TPC member of conferences including ISSCC, VLSI Circuits Symposium, A-SSCC, and MCSoC. He has a Ph.D. in electrical engineering from the University of Tokyo. He is a member of IEEE and IEICE.
Tadao Nakamura is currently a Professor Emeritus of Tohoku University and a Visiting Professor at Keio University. In 1994 he was appointed a visiting Full Professor in the Electrical Engineering Department at Stanford University, and he still spends time at Stanford today. In 2007 he was also inducted as a Professorial Fellow at Imperial College London. His research interests are in electronic and quantum-gated computer systems. In 2004 he received the IEEE Computer Society’s Taylor L. Booth Award. He has served as Steering Committee Chair, after serving as Organizing Committee Chair and Advisory Committee Chair, of the COOL Chips conference series, which is fully sponsored by the IEEE Computer Society. He is a Life Fellow of the IEEE.
Hiroaki Kobayashi is currently a Professor and Special Adviser for Digital Innovation to the President of Tohoku University. In 1995, 1997-1998, and 2001-2002, he was a Visiting Associate Professor at Stanford University (EE department and Computer Systems Lab.). From 2008 to 2016, he was Director of the Cyberscience Center (Supercomputer Center) of Tohoku University. He is also an associate member of the Science Council of Japan. His research interests include
– Power-Aware High-Performance Microprocessor Architectures
– Supercomputing systems and their applications
– Quantum-Classical Computing Hybrid Architectures and their applications
He was awarded the 2017 Minister of Education Award in the Field of Computer Science and the 2018 Minister of Education Award in the field of Science and Engineering.
Hideharu Amano received the Ph.D. degree in electrical engineering from Keio University, Yokohama, Japan, in 1986. He is a Professor of Information and Computer Science at Keio University. His research interests include parallel architecture and reconfigurable computing.
Kunio Uchiyama, Invited Senior Researcher at AIST in Japan, is currently the head of the AI Chip Design Center of AIST. He joined the Central Research Laboratory of Hitachi in 1978 and served as Corporate Officer and Chief Scientist. He is a Fellow of IEEE and IEICE. He was a member of the Board of Governors of the IEEE Computer Society from 2016 to 2021.
Makoto Ikeda received the B.S., M.S., and Ph.D. degrees in electronic engineering from the University of Tokyo, Tokyo, Japan, in 1991, 1993, and 1996, respectively. He joined the Electronic Engineering Department, University of Tokyo, as a Faculty Member in 1996, and he is currently a full Professor with the Systems Design Lab at the University of Tokyo. His research interests include hardware security, including cryptographic engine design, asynchronous system design, and smart image sensor designs. He has served in numerous conference positions, including OC Chair (25 and 26), Program co-Chair (VI-24), and Program vice-chair (II-V) for COOL Chips, ISSCC 2021 ITPC Chair, Symposium on VLSI Circuits 2017 Program Chair and 2019 Symposium Chair, and A-SSCC 2025 Program Chair. He also served as an IEEE SSCS Distinguished Lecturer in 2015 and 2016. He is a senior member of IEEE and IEICE Japan, and a member of IPSJ and ACM.
Special Sessions (invited lectures)
“Vortex: An open-source RISC-V based GPGPU accelerator”
Hyesoon Kim (Georgia Tech)
Abstract: Vortex is an open source hardware and software project to support GPGPU based on RISC-V ISA extensions. Currently Vortex supports OpenCL/CUDA and runs on FPGA. The Vortex platform is highly customizable and scalable, with a complete open source compiler, driver, and runtime software stack to enable research in GPU architectures, compilers, and run-time systems. In this talk, I will present the Vortex architecture and software stack.
Hyesoon Kim is an associate chair in the School of Computer Science at the Georgia Institute of Technology and a co-director of the center for novel computing hierarchy. Her research areas include the intersection of computer architectures and compilers, with an emphasis on heterogeneous architectures, such as GPUs and near-data-processing. She is a recipient of the NSF CAREER award and is a member of the MICRO Hall of Fame. She is the chair of IEEE TCuARCH. Her research has been recognized with a best paper award at PACT 2015. She is an associate editor of Transactions on Architecture and Code Optimization and IEEE-CAL.
“The Parameter and Chip Wars: Shifting the Focus from Model-centric to Data-centric AI”
Vijay Janapa Reddi (Harvard University)
Abstract: In recent years, deep learning has revolutionized the field of artificial intelligence by providing a powerful tool for solving complex problems across various domains, from computer vision to natural language processing. Traditionally, deep learning has focused on developing complex models to solve challenging problems, resulting in intense competition in the form of parameter and chip wars to create more powerful hardware. Furthermore, the dramatic scaling at the individual model level has had significant ramifications at the system level, requiring management of the growing complexity surrounding AI systems. Despite these advancements, recent research has highlighted the significant impact of data quality and quantity on model capabilities and performance, revealing that improving data quality often leads to better results than designing more complex models. This finding has prompted a shift towards a data-centric approach, emphasizing the acquisition of high-quality data and the design of effective data engineering pipelines. This talk delves into the challenges and directions presented by the parameter and chip wars in deep learning, including recent developments in hardware and algorithms, and it suggests that a data-centric approach for systems may be a more viable approach to offset the scaling challenges posed by the parameter and chip wars.
Vijay Janapa Reddi is an Associate Professor at Harvard University, VP and a founding member of MLCommons (mlcommons.org), a nonprofit organization aiming to accelerate machine learning innovation for everyone. He also serves on the MLCommons board of directors and is a Co-Chair of the MLCommons Research organization. Before joining Harvard, he was an Associate Professor at The University of Texas at Austin in the Department of Electrical and Computer Engineering. His research sits at the intersection of machine learning, computer architecture, and runtime software. He specializes in building computing systems for tiny IoT devices, as well as mobile and edge computing. Dr. Janapa Reddi is a recipient of multiple honors and awards, including the National Academy of Engineering (NAE) Gilbreth Lecturer Honor (2016), IEEE TCCA Young Computer Architect Award (2016), Intel Early Career Award (2013), Google Faculty Research Awards (2012, 2013, 2015, 2017, 2020), Best Papers at the 2020 Design Automation Conference (DAC), 2005 International Symposium on Microarchitecture (MICRO), 2009 International Symposium on High-Performance Computer Architecture (HPCA), and IEEE’s Top Picks in Computer Architecture awards (2006, 2010, 2011, 2016, 2017, 2021). He has been inducted into the MICRO and HPCA Hall of Fame (in 2018 and 2019, respectively). He is passionate about widening access and education to Machine Learning for STEM, Diversity, Wildlife Conservation, and so forth. He designed the Tiny Machine Learning (TinyML) series on edX, a massive open online course (MOOC) that sits at the intersection of embedded systems and ML that thousands of global learners can access and audit free of cost. He was also responsible for the Austin Hands-on Computer Science (HaCS) that is deployed in the Austin Independent School District for K-12 CS education. Dr. Janapa Reddi received a Ph.D. in computer science from Harvard University, an M.S. from the University of Colorado at Boulder, and a B.S. from Santa Clara University.