PDF version is available. (As of 2020-Mar-30)
“How to Uplift the World with ‘Memory’”
Kenichi Mori (Kioxia)
Abstract: NAND flash memory has expanded its markets and applications, and changed our lifestyles, by reducing the cost per bit ($/GB). Many innovations have been introduced to sustain this cost trend and to improve performance. Now, as the next step, 5G and AI are changing our society, and advanced computing systems are required for such high-level computing.
Non-volatile memory is a key component enabling this paradigm shift. The challenges and opportunities of NAND flash and other emerging memories over the next decades will be discussed.
Kenichi Mori is General Manager of the System Technology Research & Development Center at Kioxia Corporation, the company to which Toshiba Memory Corporation was renamed on October 1, 2019. He joined Toshiba Corporation in 1990 and was engaged in research and development of real-time computer graphics for home video game console system LSIs. He then served as the software development leader for SpursEngine, a high-performance multi-core stream processor for media processing acceleration. He now leads research and development of storage systems and of controller hardware and software for non-volatile memory. He received his M.E. in Electrical and Communication Engineering from Tohoku University in 1990.
“Reconfigurable Cloud-Scale AI”
Aaron Smith (Microsoft)
Aaron Smith leads the machine learning compiler team in Microsoft Azure’s Cloud Systems group and is a Reader in Informatics at the University of Edinburgh. His research interests include optimizing compilers, computer architecture and cloud computing.
“An Extremely Quantized Deep Neural Network Accelerator for Edge Devices”
Hiroyuki Tokunaga (LeapMind)
Abstract: Deep learning has greatly improved performance in a wide variety of research areas, including image and audio processing. Although significant progress has been made in terms of accuracy, the huge computation cost remains a major issue. Various ideas, including quantization, have been proposed to address it. For quantized neural networks, quantization means a significant reduction in the numerical precision of a computation, such as from a 32-bit floating-point number to an 8-bit integer. The ultimate quantization reduces weights and activations to just 1 bit; it is known that neural networks still work even in this extreme case, although accuracy is slightly lowered. Techniques that reduce numerical precision below 8 bits are called extremely low-bit or ultra-low-bit quantization. In this setting, the inner product of two vectors can be replaced by a few bit manipulations. In this talk, we will give an overview of a new accelerator IP dedicated to extremely quantized neural networks, and present how to realize extremely low-bit quantization.
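As an illustration of the bit-manipulation trick mentioned in the abstract (a minimal sketch of the standard XNOR-popcount formulation, not the accelerator's actual datapath), the inner product of two {-1, +1} vectors packed into machine words can be computed without any multiplications:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Inner product of two n-element vectors with entries in {-1, +1},
    each encoded as an n-bit integer (bit = 1 means +1, bit = 0 means -1)."""
    mask = (1 << n) - 1
    # XNOR marks the positions where the two vectors agree.
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")  # population count
    # Each agreement contributes +1 to the dot product, each disagreement -1.
    return 2 * matches - n

# Example: a = [+1, -1, +1, +1] (element i in bit i) -> 0b1101
#          b = [+1, +1, -1, +1]                      -> 0b1011
# Direct computation: 1 - 1 - 1 + 1 = 0
print(binary_dot(0b1101, 0b1011, 4))  # -> 0
```

One XNOR plus one popcount per word replaces n multiply-accumulate operations, which is why such extreme quantization maps so well to compact hardware.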
Hiroyuki Tokunaga received his BS in mechanical science from Osaka University in 2005 and his MS in creative informatics from the University of Tokyo in 2007. He worked as a software engineer for several companies, including Preferred Infrastructure and SmartNews. Since 2018, he has served as CTO at LeapMind. His recent interest is accelerating machine learning algorithms, especially deep neural networks, from both algorithm and implementation perspectives.
“Disruptive Evolutions: Technology Challenges and Countermeasures”
Shinichi Yoshioka (Renesas)
Abstract: The disruptive evolution of digital transformation in industry and society poses technology challenges for embedded processors and their solutions. After introducing the key issues and criteria of these challenges in several market segments, such as automotive, industrial, and infrastructure, countermeasure examples will be explained, showing how to address the issues while meeting criteria such as efficiency (performance per power and cost), effective performance in real use cases, system robustness, and easy-to-use/reuse software and development environments. A real solution example, with high-end processors and microcontrollers for a mission-critical application, is introduced to tie together the technologies explained in the presentation.
Shinichi Yoshioka serves as Senior Vice President and Chief Technology Officer at Renesas. He was appointed to this role in August 2019 on the strength of his experience and technological expertise in the products and the market, built over his years at Renesas. He began his career at Hitachi, Ltd. in 1986. After the joint venture between Hitachi and Mitsubishi Electric was established, he held various posts at Renesas Technology Corporation, later becoming Vice President and Head of System Solution Business Unit 2, followed by Vice President and Head of the Mobile Multimedia SoC Business Division.
In 2010, he served as Senior Executive Vice President and COO of Renesas Mobile Corporation. Following its absorption-type merger into Renesas Electronics Corporation, he has held many key roles: Vice President and Head of the Automotive Control and Analog & Power Systems Business Division in 2013; Vice President and Deputy General Manager of the 1st Solution Business Unit and Head of the Automotive Control and Analog & Power Systems Business Division in 2014; Vice President and Deputy General Manager of the 1st Solution Business Unit and Head of the Safety Solution Business Division in 2016; and Senior Vice President and Deputy General Manager of the Automotive Solutions Business Unit in 2017. Prior to his appointment as CTO, Mr. Yoshioka was Senior Vice President and Chief Technology Officer of the Automotive Solution Business Unit in 2018, contributing to the growth of the automotive business at Renesas Electronics. Mr. Yoshioka holds a Bachelor of Engineering degree in Applied Physics from the University of Tokyo and a Master of Science degree in Electrical Engineering from Stanford University.
“Virtualization for Non-volatile Memory Devices”
Takahiro Hirofuchi (AIST)
Abstract: Non-volatile memory (NVM) technologies, which are accessible in the same manner as DRAM, are considered indispensable for expanding main memory capacity. For example, Intel Optane DCPMM is a long-awaited product that drastically increases main memory capacity. However, a substantial performance gap exists between DRAM and NVM. This gap presents a new challenge to researchers: we need new system software technologies that efficiently support the emerging hybrid memory architecture. In this talk, I first present RAMinate, a hypervisor-based virtualization mechanism for hybrid memory systems and a key technology for addressing the performance gap in main memory systems. It provides great flexibility in memory management and maximizes the performance of virtual machines (VMs) by dynamically optimizing memory mappings. Through experiments, we confirmed that even when a VM has only 1% of its RAM in DRAM, its performance degradation is drastically alleviated by memory mapping optimization. This talk will also cover performance emulation of memory devices, which is indispensable for system software studies targeting future memory devices. We developed a new NVM emulation mechanism that is not only lightweight but also aware of the read/write latency gap in NVM-based main memory. The emulator accurately emulates the write latencies of NVM-based main memory: in our experiments, it emulated NVM write latencies in a range from 200 ns to 1000 ns with negligible errors of 0.2% to 1.1%. If time remains, I will also introduce our ongoing research project, joint work with spintronics researchers, that aims at massively energy-efficient memory subsystems by developing approximate computing techniques throughout the whole memory hierarchy. We are rethinking the design of memory cells, computer hardware, and software from scratch.
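To make the idea of write-latency-aware emulation concrete, here is a toy model (my illustration only, not the mechanism presented in the talk, and userspace Python rather than a system-level emulator; the latency constants are assumptions): each write to an emulated region is penalized by the extra latency of NVM over DRAM, while reads are left at native speed.

```python
import time

DRAM_WRITE_NS = 100  # assumed DRAM write latency
NVM_WRITE_NS = 500   # emulated NVM write latency (the talk covers 200-1000 ns)

class EmulatedNVM:
    """Toy write-latency-aware NVM emulation over a DRAM-backed buffer."""

    def __init__(self, size: int):
        self.buf = bytearray(size)

    def write(self, offset: int, data: bytes) -> None:
        self.buf[offset:offset + len(data)] = data
        # Inject only the latency *gap*; the underlying DRAM write already
        # cost roughly DRAM_WRITE_NS. (time.sleep is far too coarse for real
        # nanosecond accuracy -- this is purely illustrative.)
        time.sleep((NVM_WRITE_NS - DRAM_WRITE_NS) * 1e-9)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.buf[offset:offset + length])

nvm = EmulatedNVM(64)
nvm.write(0, b"hello")
print(nvm.read(0, 5))  # -> b'hello'
```

A real emulator of this kind has to account for write-back caches and apply delays at a granularity the hardware can measure, which is what makes accurate read/write-asymmetric emulation a research problem rather than a sleep call.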
Takahiro Hirofuchi is a senior research scientist at the National Institute of Advanced Industrial Science and Technology (AIST) in Japan. He works on system software technologies for emerging memory devices. He is the principal investigator of a research project, supported by a JSPS Grant-in-Aid for Scientific Research (A), that aims at massively energy-efficient memory subsystems by developing approximate computing techniques throughout the whole memory hierarchy. He received the best paper award at ACM SoCC 2016 and the Yamashita SIG Research Award of the Information Processing Society of Japan in 2014. He obtained a Ph.D. in engineering in 2007 at the Graduate School of Information Science of the Nara Institute of Science and Technology (NAIST), and a BS in Geophysics from the Faculty of Science at Kyoto University in 2002. He is an expert in system software technologies, especially operating systems and virtual machines. His research interests also cover computer architecture, FPGAs, and network technologies.
“Intel Optane™ Data Center Persistent Memory – A True Breakthrough to Break the Traditional Memory-Storage Technologies Barriers”
Jane Jianping Xu / Kaushik Balasubramanian (Intel)
Abstract: Intel Optane™ DC persistent memory offers big and cost-effective persistent memory for all current, and possibly future, data center applications. With this new type of memory, a user will be able to accelerate data processing closer to the CPUs, with substantially lower latency than data previously located on a NAND disk. Intel Optane™ DC persistent memory redefines traditional architectures, offering a large and persistent memory tier at affordable cost. With breakthrough performance levels in memory-intensive workloads, virtual machine density, and fast storage capacity, Intel Optane™ DC persistent memory, combined with 2nd Gen Intel® Xeon® Scalable processors, accelerates IT transformation to support the demands of the data era, with faster-than-ever-before analytics, cloud services, and next-generation communication services.
Jane Jianping Xu is a Principal Memory Architect in the Data Platforms Group (formerly the Datacenter Group) at Intel Corporation. She joined Intel in 1999 as a research scientist in the Microprocessor Research Labs, where her primary focus was on improving performance with low-power interconnects and memory interfaces. She has contributed to a number of innovations at Intel, including FIVR and the Silicon Photonics initiative early in her career. Her focus later shifted to defining the Optane™ SSD in the Storage Technology Group and enabling the Optane™ memory architecture in the Datacenter Group. She has published a number of papers at Hot Chips, ISSCC, JSSC, and HPCA, and in Physics Letters and IEEE Photonics Technology, among other venues. She received her Ph.D. in electrical and computer engineering from Purdue University and holds 31 issued patents.
Kaushik Balasubramanian is a Principal Engineer in the Datacenter Platforms Group, focusing on system performance with Xeon and adjacencies such as DCPMM. He has over 18 years of experience in design, validation, and architecture. He holds a Master's in Electrical Engineering from the Indian Institute of Technology, Mumbai, India.
Special Sessions (invited lectures)
“Using AI to Bridge the Gap Between AI Models and the Hardware of Today and Tomorrow”
Luis Ceze (University of Washington)
Abstract: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms, such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs), requires significant manual effort. In this talk I will present our work on the TVM stack, which exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates the optimization of low-level programs to hardware characteristics by employing a novel learning-based cost modeling method for rapid exploration of code optimizations. To address the threat that changes in algorithms, models, operators, or numerical systems pose to the viability of specialized hardware accelerators, we developed VTA, a programmable deep learning architecture template tightly coupled to TVM. VTA achieves this flexibility via a parameterizable architecture, a two-level ISA, and a JIT compiler. TVM/VTA was incubated as an Apache Foundation project and benefits from a thriving community of developers. I will end the talk with ideas and possibilities for AI systems in a post-Moore's-law world, including using hybrid molecular-electronic systems for similarity search.
Luis Ceze is a Professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, Co-founder and CEO of OctoML, and Venture Partner at Madrona Venture Group. His research focuses on the intersection of computer architecture, programming languages, machine learning, and biology. His current focus is on approximate computing for efficient machine learning and DNA-based data storage. He co-directs the Molecular Information Systems Lab (MISL), the Systems and Architectures for Machine Learning lab (SAML), and the Sampa Lab for HW/SW co-design. He has co-authored over 100 papers in these areas and has had several papers selected as IEEE Micro Top Picks and CACM Research Highlights. His research has been featured prominently in the media, including the New York Times, Popular Science, MIT Technology Review, and the Wall Street Journal. He is a recipient of an NSF CAREER Award, a Sloan Research Fellowship, a Microsoft Research Faculty Fellowship, the IEEE TCCA Young Computer Architect Award, and the UIUC Distinguished Alumni Award.
“Evolving Hardware Security Landscape in the AI Era”
Guru Prasadh Venkataramani (The George Washington University)