Call for Participation

PDF version is here (as of 2022-03-25)

Keynote Presentations

“Heterogeneity with oneAPI: A Play in two acts”

Joseph Curley and Timothy Mattson (Intel)

Abstract:  Heterogeneous computing promises high performance with optimized energy consumption. In Act 1, programmers write code with one API, which our system maps onto any single processor from a range of processor types. We’ll show, for example, a single codebase moving between CPUs and GPUs. In Act 2, we look to the future, where many heterogeneous processors work together to solve a single problem. We still focus on one API mapping onto many processor types, but now we run parallel programs over distributed heterogeneous systems. Our name for this amazing system… oneAPI of course.

Joseph (Joe) Curley serves Intel Corporation as Vice President and General Manager of Software Products and Ecosystem in Intel’s Software and Advanced Technology Group. His primary responsibilities include the oneAPI industry initiative, product management of developer and foundational software, and supporting the oneAPI developer ecosystem. Mr. Curley joined Intel Corporation in 2007, and has served in multiple other strategic planning, ecosystem development, and business leadership roles. Prior to joining Intel, Joe worked at Dell, Inc. leading the global workstation product line, the consumer and small business desktop product line, and in a series of engineering roles. He began his career at computer graphics pioneer Tseng Labs.

Tim Mattson is a parallel programmer obsessed with every variety of science (Ph.D. Chemistry, UCSC, 1985). He is a senior principal engineer in Intel’s parallel computing lab. Tim has been with Intel since 1993 and has worked with brilliant people on great projects including: (1) the first TFLOP computer (ASCI Red), (2) MPI, OpenMP and OpenCL, (3) two different research processors (Intel’s TFLOP chip and the 48 core SCC), (4) Data management systems (Polystore systems and Array-based storage engines), and (5) the GraphBLAS API for expressing graph algorithms as sparse linear algebra. Tim has over 150 publications including five books on different aspects of parallel computing, the latest (Published November 2019) titled “The OpenMP Common Core: making OpenMP Simple Again”.


“Software-Defined Architecture and platforms – automotive and beyond”

Masaki Gondo (eSOL)

Abstract:  The embedded system industries, notably but not only automotive, are all talking about “Software-Defined.” They typically mean “Software-Defined Architecture (SDA),” but the meaning is quite vague. In this talk, we start by reviewing the different uses of the term in the industry. Then, using the next-generation automotive software platform based on AUTOSAR Adaptive as an example, we look into the background of SDA trends and the design approach of such platforms, which must tackle the challenge of developing highly energy-efficient, high-performance, safe, and yet affordable systems. The business and market implications of such trends and approaches will also be discussed briefly if time allows.

Masaki Gondo is SEVP/CTO/Head of Software Division at eSOL, the company that provides POSIX/AUTOSAR/TRON RTOS, various embedded software development tools, and full-stack engineering services. He has more than 25 years of experience in the field of OS architecture and related technologies for use in a wide range of embedded system applications including automotive, industrial, and electronic appliances. Over the last decade he has been working on a scalable heterogeneous multi-manycore OS called eMCOS, application parallelization tools eMBP, domain-knowledge-based machine-learned driver models eBRAD, Scrum development eWeaver, and functional safety. He also acts as an architect of the AUTOSAR Adaptive Platform specification, IEEE Std. 2804 SHIM WG Chair, Vice-chair of the Embedded Multicore Consortium, Chief Architect for AUBASS, and visiting research fellow at the Advanced Multicore Processor Research Institute at Waseda University, among others.



“The IBM Telum enterprise-class processor”

Christian Jacobi  (IBM)

Abstract:  The IBM Telum Processor powers scalable performance for enterprise workloads, embedded AI acceleration, and industry-leading security and RAS. In this presentation, learn how Telum balances high-performance and high-frequency design, power efficiency, and reliability using a multi-faceted approach including micro-architecture, advanced power and thermal management, and optimization of processor and system design. Telum-based systems will grow total workload capacity, provide new functionality like real-time embedded AI inferencing, and improve power efficiency compared to prior-generation IBM Z systems.

Christian Jacobi is a Distinguished Engineer and Chief Architect for microprocessors at IBM with over 20 years of experience in IBM Z and IBM POWER systems. Christian is highly experienced in micro-architecture, logic design, performance analysis, verification, development and hardware and software co-optimization. Dr. Jacobi was the Chief Architect for the z15 microprocessor and is currently working on next generation IBM Z systems.



“NanoBridge-based FPGA for Space Applications”

Makoto Miyamura  (NanoBridge Semiconductor)

Abstract:  We demonstrate an FPGA based on NanoBridge, a novel resistive-change switch. NanoBridge, which is integrated in the back end of line (BEOL), features a high ON/OFF conductance ratio, weak temperature dependence of its resistance, non-volatility, and endurance against soft errors. In place of SRAM and a pass transistor, NanoBridge is utilized as a configuration switch in the FPGA. In this presentation, we will report on the NanoBridge-based FPGA (NBFPGA) for applications in low-power and harsh environments, such as space applications. Under JAXA’s innovative satellite technology demonstration program, NBFPGA was mounted on a camera module and installed in RAPIS-1 (RAPid Innovative payload demonstration Satellite-1). Successful one-year operation in space, soft error reliability of the NanoBridge device, and low-power operation will also be reported.

Makoto Miyamura received the B.S. and M.S. degrees in Electronic Engineering from the University of Tokyo, Japan, in 2000 and 2002, respectively. After joining NEC Corporation, Japan, he worked on the research of the random variability in highly-scaled MOSFETs and low power CMOS process with high-k gate dielectric. From 2008 to 2009, he was a visiting researcher at Stanford University, CA. He is currently working on the low-power FPGA design and its application at NanoBridge Semiconductor, Inc.



“Xilinx 7nm Edge Processors”

Juanjo Noguera   (Xilinx)

Abstract:  In this presentation, Xilinx will provide an overview of the 7nm Versal architecture, focusing on Edge applications. This talk will introduce the key target use-cases, focusing on AI/ML inference applications at the edge, give a device-level Versal architecture overview, and provide details on the second-generation AIE architecture (AIE-ML). We’ll provide an overview of the programming abstractions for these products, and finally demonstrate some system-level performance/W comparisons to different architectures.

Juanjo Noguera is a member of the Xilinx silicon architecture group responsible for the AIE architecture. Juanjo joined the Xilinx CTO office back in 2006, and then moved to the product engineering organization to productize the AIE architecture. Before joining Xilinx, Juanjo worked for Hewlett-Packard. Juanjo holds a PhD in Computer Architecture from the Technical University of Catalunya (UPC, Barcelona, Spain). Juanjo has published a multitude of technical research papers in international conferences and has had more than twenty patent applications granted.




“Universal Chiplet Interconnect Express (UCIe): Poised to change the Compute Landscape”

Debendra Das Sharma (Intel Corporation)

Abstract:  High-performance workloads demand on-package integration of heterogeneous processing units, on-package memory, and communication infrastructure to meet the demands of the emerging compute landscape. Applications such as artificial intelligence, machine learning, data analytics, 5G, automotive, and high-performance computing are driving these demands to meet the needs of cloud computing, intelligent edge, and client computing infrastructure. On-package interconnects are a critical component in delivering power-efficient performance with the right feature set in this evolving landscape. Universal Chiplet Interconnect Express (UCIe) is an open industry standard with a fully specified stack that comprehends plug-and-play interoperability of chiplets on a package, similar to the seamless interoperability on board with well-established and successful off-package interconnect standards such as PCI Express®, Universal Serial Bus (USB)®, and Compute Express Link (CXL)®. In this talk, we will discuss the usages and key metrics associated with different technology choices in UCIe. We will also delve into the different layers as well as the software model associated with UCIe, along with the compliance and interoperability mechanisms. We will also discuss how this open standard could potentially evolve to incorporate additional usage models in the future.

Debendra Das Sharma is an Intel Senior Fellow in the Data Platforms and Artificial Intelligence Group and Chief Architect of the I/O Technology and Standards at Intel Corporation. He is a leading expert on I/O subsystem and interface architecture. Das Sharma’s team delivers Intel-wide critical interconnect technologies in Peripheral Component Interconnect Express (PCIe), Compute Express Link (CXL), Universal Chiplet Interconnect Express (UCIe), and Intel’s Coherency interconnect. He is a key driver of external standards for PCIe, CXL, and UCIe, and internal proprietary interfaces, as well as implementation.  Das Sharma joined Intel in 2001 as a technical lead in the Advanced Components Division, designing server chipsets. He previously worked with Hewlett-Packard, where he led development of their server chipsets. He holds 148 US patents. He is a frequent keynote speaker, plenary speaker, distinguished lecturer, invited speaker, and panelist at the Hot Interconnects, SNIA SDC, PCI-SIG Developers Conference, CXL consortium, Open Server Summit, Open Fabrics Alliance, Flash Memory Summit, Intel Innovation, and Intel Developer Forum.  Das Sharma is a member of the Board of Directors for the PCI Special Interest Group (PCI-SIG) and a lead contributor to PCIe specifications since its inception. He is a co-inventor and founding member of the CXL consortium and co-leads the CXL Technical Task Force. He co-invented the chiplet interconnect standard UCIe and is the chair of the UCIe consortium.  Das Sharma has a bachelor’s in technology (with honors) degree in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur and a Ph.D. in Computer Engineering from the University of Massachusetts, Amherst. He has been awarded the Distinguished Alumnus Award from Indian Institute of Technology, Kharagpur in 2019. He has also been awarded the 2021 IEEE Region 6 Outstanding Engineer Award.


“RISC-V-based parallel processor IP with vector extension for embedded systems” 

Shotaro Shintani  (NSITEXE)

Abstract:  There are a variety of controls in automobiles, and the amount of processing is continuously increasing as algorithms become more complex. Moreover, the progress of autonomous driving has been remarkable in recent years, and performance requirements have increased dramatically alongside the diversification of processing. Automotive systems also have traditional requirements, such as hard real-time performance, functional safety, power efficiency, and software portability. However, software development becomes more complex and less portable when several processors or accelerators are adopted to meet these requirements. To solve this problem, we are developing new processors that offer enough performance and flexibility in a single architecture to handle the various types of processing for autonomous driving while also satisfying the traditional automotive requirements such as functional safety. In this presentation, we introduce our first-generation Data Flow Processor (DFP), DR1000C, which targets safety-critical systems and delivers the real-time performance and high-throughput data processing required of microcontrollers for embedded systems. DR1000C is a RISC-V-based multiple instruction stream, multiple data stream (MIMD) processor with a vector processing unit. It also supports a hardware multi-threading mechanism that allows up to 16 threads to share the vector processing unit simultaneously, resulting in highly efficient resource use. Furthermore, DR1000C has the necessary functional safety modules and meets ISO 26262 ASIL B to D safety requirements without additional external special safety mechanisms. Our evaluation results show that DR1000C can efficiently process control algorithms such as Model Predictive Control (MPC). DR1000C is also the world’s first RISC-V processor with vector extensions to achieve ISO 26262 ASIL D Ready certification.
Lastly, we present our product roadmap and portfolio for addressing autonomous driving, which is the target of the next-generation DFP.


Shotaro Shintani is a Project Assistant Manager in Semiconductor IP R&D Unit of NSITEXE Inc. His specialty is processor microarchitecture and design, and he is engaged in developing RISC-V processor cores including out-of-order superscalar processors and vector processors. He received a master’s degree from the University of Tokyo and started working for Yokohama Research Laboratory, Hitachi, Ltd in 2015. There he researched hardware architecture and hardware accelerator for enterprise storage systems. After joining Sony Corporation in 2017 as a vision processor engineer, he was engaged in developing CPU subsystems in Intelligent Vision Sensors with AI Processing Functionality. In 2019, he joined NSITEXE, Inc. as a processor development engineer.


Invited presentations

“AMD Ryzen 6000 Series Processor”

Jim Gibney   (AMD)

Abstract:  AMD will present the AMD Ryzen™ 6000 Series processor for laptops, bringing the new “Zen 3+” core architecture together with AMD RDNA™ 2 architecture-based on-chip graphics. Fabricated using TSMC’s 6nm process technology, the processor SoC delivers higher single and multi-threaded CPU performance than the last generation, new levels of built-in graphics performance, new platform features and long battery life. The updated AMD “Zen 3+” core is optimized to deliver high frequency and performance-per-watt.  The SoC features up to eight high-performance cores, delivering 16 threads of processing, with clock speeds up to 5 GHz. These are the first notebook processors to feature RDNA™ 2 architecture-based built-in graphics, with performance up to 2X of the last generation.  The SoC includes an all-new integrated display engine, allowing for ultra-high resolutions and refresh rates. New power management features include deep sleep states that save power and greatly extend battery life. The platform is entirely new with the SoC including support for DDR5 memory, PCIe® 4.0, USB4, and AI noise cancellation.  This is the first x86 processor to fully support advanced Windows® 11 security features with the integrated Microsoft® Pluton security processor.

Jim Gibney is a Fellow SoC architect at AMD working on next generation client processors. He received a Bachelor of Computer Engineering and MSEE from Georgia Tech in 1996. He worked at Digital/Compaq on AlphaServer chipsets until 2003, when he joined ATI to work on gaming console graphics processors. His interest area is memory system architecture for high performance notebook processors.



Panel Discussion

Topics: “The future of Mission-critical, mixed-criticality high-performance embedded systems.”

Organizer and Moderator:
           Masaki Gondo   (eSOL)
           Christian Jacobi  (IBM)
           Timothy Mattson (Intel)
           Makoto Miyamura  (NanoBridge Semiconductor)
          Juanjo Noguera (Xilinx)

Abstract:  This panel will discuss the future of mission-critical, mixed-criticality high-performance embedded systems, which are expected in automotive and other vehicle systems, industrial devices, and other edge computing. The focus is on the system software that serves as the gateway between next-generation hardware and the software system. We will discuss the technological and business challenges, with keywords like “Open Standards”, “Safety”, “Software-Defined”, “Parallel processing”, and others. The goal is to grasp the landscape by sharing insights from experts in different fields and to get a sense of a practical, implementable strategic direction for the system software and hardware of future high-performance mission-critical systems.


Masaki Gondo is SEVP/CTO/Head of Software Division at eSOL, the company that provides POSIX/AUTOSAR/TRON RTOS, various embedded software development tools, and full-stack engineering services. He has more than 25 years of experience in the field of OS architecture and related technologies for use in a wide range of embedded system applications including automotive, industrial, and electronic appliances. Over the last decade he has been working on a scalable heterogeneous multi-manycore OS called eMCOS, application parallelization tools eMBP, domain-knowledge-based machine-learned driver models eBRAD, Scrum development eWeaver, and functional safety. He also acts as an architect of the AUTOSAR Adaptive Platform specification, IEEE Std. 2804 SHIM WG Chair, Vice-chair of the Embedded Multicore Consortium, Chief Architect for AUBASS, and visiting research fellow at the Advanced Multicore Processor Research Institute at Waseda University, among others.


Special Sessions (invited lectures)

“Closing the Gap between Quantum Algorithms and Machines with Hardware-Software Co-Design”

Fred Chong  (University of Chicago and

Abstract:  Quantum computing is at an inflection point, where 127-qubit (quantum bit) machines are deployed, and 1000-qubit machines are perhaps only a few years away.  These machines have the potential to fundamentally change our concept of what is computable and demonstrate practical applications in areas such as quantum chemistry, optimization, and quantum simulation.  Yet a significant resource gap remains between practical quantum algorithms and real machines.  A promising approach to closing this gap is to design architectural interfaces that selectively expose to programming languages and compilers some of the key physical properties of emerging quantum technologies.  I will describe some of our recent work that focuses on techniques that break traditional abstractions, including compiling programs directly to analog control pulses, adapting programs for machine variations, computing with ternary quantum bits, efficient error correction using 2.5D structures, and designing just enough tunability in qubits to avoid crosstalk.

Fred Chong is the Seymour Goodman Professor in the Department of Computer Science at the University of Chicago and the Chief Scientist at He is also Lead Principal Investigator for the EPiQC Project (Enabling Practical-scale Quantum Computing), an NSF Expedition in Computing. Chong received his Ph.D. from MIT in 1996 and was a faculty member and Chancellor’s fellow at UC Davis from 1997-2005. He was also a Professor of Computer Science, Director of Computer Engineering, and Director of the Greenscale Center for Energy-Efficient Computing at UCSB from 2005-2015. He is a recipient of the NSF CAREER award, the Intel Outstanding Researcher Award, and 12 best paper awards.


“Computer Architecture Challenges for the Security of Persistent Memory”

Yan Solihin (University of Central Florida)

Abstract:  Non-volatile or persistent memory (PM) is increasingly integrated into the main memory of computer systems. PM allows programmers to keep persistent data in data structures in memory instead of files in storage. Such structures can be wrapped into containers that we refer to as Persistent Memory Objects (PMOs). While a PMO allows fine-grain access at low latency to persistent data, it also presents several challenges. Among these challenges are: (1) the need for a new abstraction to define its use, access, and sharing, (2) security vulnerabilities that arise from keeping data in memory instead of files, and (3) its effect on memory encryption and integrity verification. In this talk, I will give a broad landscape of computer architecture challenges with PM and discuss our approaches to solving these problems.

Yan Solihin is the (Interim) Department Chair of Computer Science, Charles N. Millican Professor, and Director for the Cybersecurity and Privacy Cluster at the University of Central Florida. Prior to joining UCF, he was a Program Director at the National Science Foundation, with responsibilities in managing the Computer Systems Research (CSR) cluster, Scalable Parallelism in the eXtreme (SPX), and Secure and Trustworthy Cyberspace (SaTC), among others. He co-founded the NSF/Intel Partnership on Foundational Microarchitecture Research (FoMR) and SPX. He was a Professor of Electrical and Computer Engineering at NCSU from 2002 to 2018. His past research contributions to Secure Execution Environment include split counter mode encryption, Bonsai Merkle Tree, self-encrypted memory, and ObfusMem, among others. In 2017, he was elected IEEE Fellow “for contributions to shared cache hierarchies and secure processors.” He obtained his B.S. degree in computer science from Institut Teknologi Bandung in 1995, B.S. degree in Mathematics from Universitas Terbuka Indonesia in 1995, M.A.Sc degree in computer engineering from Nanyang Technological University in 1997, and M.S. and Ph.D. degrees in computer science from the University of Illinois at Urbana-Champaign in 1999 and 2002. He is a recipient of the 2010 and 2005 IBM Faculty Partnership Awards, the 2004 NSF Faculty Early Career Award, and the 1997 AT&T Leadership Award. He is listed in the ISCA and HPCA Hall of Fame. His research interests include computer architecture, memory hierarchy design, non-volatile memory architecture, programming models, and workload cloning. He has published more than 70 papers in computer architecture and performance modeling, and authored 40+ patents. He has released several software packages to the public: ACAPP – a cache performance model toolset, HeapServer – a secure heap management library, Scaltool – parallel program scalability pinpointer, and Fodex – a forensic document examination toolset.
He has written two graduate-level textbooks, including Fundamentals of Parallel Multicore Architecture, CRC Press, 2015.