The 4th International High Performance Computing Forum

Chengdu, China     May 16-17, 2019



  Group Photo


  Invited Speakers


Jack Dongarra

University Distinguished Professor
University of Tennessee
Innovative Computing Laboratory
Center for Information Technology Research
Distinguished Research Staff
Oak Ridge National Laboratory

  Jack Dongarra holds an appointment as University Distinguished Professor of Computer Science in the Electrical Engineering and Computer Science Department at the University of Tennessee and holds the title of Distinguished Research Staff in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL). He is also a Turing Fellow at Manchester University and an Adjunct Professor in the Computer Science Department at Rice University. He is the director of the Innovative Computing Laboratory at the University of Tennessee. He is also the director of the Center for Information Technology Research at the University of Tennessee, which coordinates and facilitates IT research efforts at the University. He specializes in numerical algorithms in linear algebra, parallel computing, the use of advanced computer architectures, programming methodology, and tools for parallel computers. His research includes the development, testing and documentation of high-quality mathematical software. He has contributed to the design and implementation of the following open source software packages and systems: EISPACK, LINPACK, the BLAS, LAPACK, ScaLAPACK, Netlib, PVM, MPI, NetSolve, Top500, ATLAS, and PAPI. He is a Fellow of the AAAS, ACM, IEEE, and SIAM, a foreign member of the Russian Academy of Sciences, and a member of the US National Academy of Engineering.

TITLE: High Performance Computing and Big Data: Challenges for the Future

  We will look at the challenges and opportunities presented by the convergence of HPC, big data, and machine learning. We will discuss what is driving this convergence and what capabilities it might provide beyond the current scope and timescale of traditional HPC.


William Gropp

Director and Chief Scientist
National Center for Supercomputing Applications
Thomas M. Siebel Chair in Computer Science
University of Illinois Urbana-Champaign

  William Douglas "Bill" Gropp is the director of the National Center for Supercomputing Applications (NCSA) and the Thomas M. Siebel Chair in the Department of Computer Science at the University of Illinois at Urbana–Champaign. He is also the founding Director of the Parallel Computing Institute. Gropp helped to create the Message Passing Interface, also known as MPI, and the Portable, Extensible Toolkit for Scientific Computation, also known as PETSc. He is a Fellow of ACM, IEEE, and SIAM, and an elected member of the National Academy of Engineering.

TITLE: Challenges in Intranode and Internode Programming for HPC Systems

  After more than two decades of relative architectural stability for distributed memory parallel computers, the end of Dennard scaling and the looming end of Moore's "law" are forcing major changes in computing systems. To continue to provide increased performance, computer architects are producing innovative new systems. This innovation is creating challenges for exascale systems that are different from the challenges for the extreme scale systems of the past. This talk discusses some of the issues in building a software ecosystem for extreme scale systems, with an emphasis on leveraging software for commodity elements while also providing the support needed by high performance applications.


Kai Lu

Deputy Dean
National University of Defense Technology
College of Computer

  Kai Lu received the Ph.D. degree from the National University of Defense Technology, Changsha, China, in 1999. He is currently a professor and the Deputy Dean of the College of Computer, National University of Defense Technology. His research interests include parallel and distributed system software, operating systems, parallel tool suites and fault-tolerant computing technology.

TITLE: Challenges and Opportunities: the Exascale Road of Tianhe

Ying-Chih YANG

Lead Architect for the European Processor Initiative (EPI) project

  Ying-Chih Yang is the Lead Architect for the EPI project. He served as advanced technology officer in the consumer digital division of STMicroelectronics France from 2015 to 2018. Before that, he was the head of engineering teams for the set-top box chip product line at MediaTek-MStar, and CTO of Sunplus Technology in Taiwan.

TITLE: From EPI to ExaScale

The European Processor Initiative (EPI) represents Europe's ambition in the HPC field. This joint project, backed by the EU H2020 framework, brings together 23 project members from industrial and scientific domains. It aims to develop critical technologies and generate key components for next-generation HPC systems, with a primary focus on the EPI high-performance, low-energy-consumption processor. In this talk we will present the project and the vision of the team.


Weifeng Liu

China University of Petroleum
Department of Computer Science and Technology

  Weifeng Liu is currently a Full Professor at the Department of Computer Science and Technology of the China University of Petroleum, Beijing, China. Formerly, he was a Marie Curie Fellow at the Department of Computer Science of the Norwegian University of Science and Technology, Norway. He received his Ph.D. in 2016 from the University of Copenhagen, Denmark. Before he moved to Copenhagen, he worked as a Senior Researcher in high performance computing technology at SINOPEC Exploration & Production Research Institute for about six years. He also worked briefly as a Research Associate at STFC Rutherford Appleton Laboratory, UK. He received his B.E. degree and M.E. degree in computer science, both from China University of Petroleum, Beijing, in 2002 and 2006, respectively. His research interests include numerical linear algebra and parallel computing, particularly designing parallel and scalable algorithms and data structures for sparse matrix computations on throughput-oriented architectures.

TITLE: Sparse Matrix Computations: Scalability, Performance and Practicability

Sparse matrices arise in many computational problems in science and engineering. Over the last decades, researchers have continually sought faster parallel algorithms for sparse matrices. Recently, the emergence of massively parallel platforms has introduced more conflicts between scalability, performance and practicability. On the one hand, such platforms require a large number of fine-grained tasks to saturate their resources for high scalability and performance. On the other hand, the irregular structure of sparse matrices in practice makes such task partitioning difficult. This talk will discuss several conflicts between scalability, performance and practicability in existing sparse matrix research. Several key challenges in this area will be presented as well.


Longbing Cao

University of Technology Sydney
Advanced Analytics Institute (AAI)

  Longbing Cao was awarded a PhD in computing science at UTS and another PhD in Pattern Recognition and Intelligent Systems from the Chinese Academy of Sciences. He is a professor of information technology at the Faculty of Engineering and IT, UTS, and the founding Director of the UTS Advanced Analytics Institute.
  Before joining UTS, Longbing had several years of research experience at the Chinese Academy of Sciences, and experience managing and leading industry and commercial projects in telecommunications, banking and publishing, as a manager or chief technology officer. He was also the Research Leader of the Data Mining Program at the Australian Capital Markets Cooperative Research Centre.

TITLE: Data science challenges and prospects for high-performance computing and analytics

  Data science brings science, technology, the economy and society into a new era: the age of data and data-driven science, technology, economy and society. The era of data science faces significant challenges and opportunities. This talk will briefly highlight some of the significant challenges in deeply understanding and managing complex data, behavior and systems; illustrate X-complexities, X-intelligences, X-informatics, X-analytics, and X-opportunities in data science; and discuss possible opportunities for high-performance computing (HPC) and high-performance analytics (HPA). Real-life analytics and learning problems and applications will be used to illustrate these prospects and opportunities for HPC and HPA.


Vladimir Voevodin

Lomonosov Moscow State University
Deputy Director
MSU Research Computing Center

  Vladimir Voevodin is a Full Professor in the specialization “Applied and system software, computers, computing systems and networks”, the Deputy Director of the Research Computing Center of Moscow State University and the Head of the Supercomputers and Quantum Informatics Department, MSU CMC. His specialization and scientific interests include: 1) parallel and distributed computing, high-performance computing, supercomputing; 2) computer architectures, supercomputers, performance and efficiency of computers; 3) program optimization and fine tuning, program transformations, program analysis. He has received the Informatics Europe Curriculum Best Practices Award: “Parallelism & Concurrency” (2011), Doctor Honoris Causa of Open Siberian University (2011) and Honorary Scientist of Moscow State University (2009).

TITLE: Unlocking the hidden potential of supercomputer centers

A modern supercomputer is a very complex and delicate tool that, without proper use and control, will operate at extremely low efficiency, and over time the situation will only become more complicated. The tens of thousands of hardware and software components of widely varying kinds that keep a supercomputer functioning and process its job flow require constant attention and special approaches to designing supercomputer applications. In this situation, total monitoring of the software and hardware environment, the use of a formal model of supercomputers, automatic analysis of all events and wide adoption of the parallel structure of algorithms are vital for the efficient operation of supercomputer systems. A precise description of fine architectural features and a proper approach to supercomputing education are two components that help to unlock the true potential of high-performance systems, increasing their productivity not by a few percent but severalfold. This approach is implemented at Lomonosov Moscow State University and will be presented in the talk.


Yu Zhang

Xidian University
Shaanxi Key Laboratory of Large Scale Electromagnetic Computing

  Dr. Yu Zhang received the B.S., M.S., and Ph.D. degrees from Xidian University, Xi’an, China, in 1999, 2002, and 2004, respectively. He joined Xidian University as a Faculty Member in June 2004. He was a Visiting Scholar and Adjunct Professor at Syracuse University from 2006 to 2009. He is the director of Shaanxi Key Laboratory of Large Scale Electromagnetic Computing. As Principal Investigator, he has worked on projects funded by, among others, the National Natural Science Foundation of China and the National High-tech R&D Program of China (863 Program). He has authored five books, namely Parallel Computation in Electromagnetics (Xidian Univ. Press, 2006), Parallel Solution of Integral Equation-based EM Problems in the Frequency Domain (Wiley–IEEE, 2009), Time and Frequency Domain Solutions of EM Problems Using Integral Equations and a Hybrid Methodology (Wiley, 2010), Higher Order Basis Based Integral Equation Solver (HOBBIES) (Wiley, 2012), and Super-Large Scale Parallel Method of Moments in Computational Electromagnetics (Xidian Univ. Press, 2016), as well as more than 100 journal articles and 40 conference papers.

TITLE: Large-Scale Electromagnetic Simulation on Supercomputers

  When large-scale and high-accuracy computations are of interest, electromagnetic simulators that combine full-wave computational electromagnetics (CEM) algorithms with high-performance computing technology might be the best solution for electrically large, complicated systems. We have therefore carried out a series of studies on high-performance CEM algorithms on several supercomputers. Progress has been made on parallel CEM, including an efficient pivoting strategy for parallel LU factorization of the complex dense matrices generated by the method of moments (MoM), a large-scale parallel FEM domain decomposition method (FEM-DDM), an efficient parallel strategy for MLFMA based on a truncated tree, and so on. The developed high-performance CEM codes have been deployed on supercomputers such as Tianhe-2, Sunway TaihuLight and the prototypes of China's new-generation exascale supercomputers, and have served various industrial applications.


Bernd Mohr

Jülich Supercomputing Centre
Institute for Advanced Simulation
Forschungszentrum Jülich, Germany

  Dr. Bernd Mohr, Deputy Division Head "Application support" at Jülich Supercomputing Centre, has been developing tools for performance analysis of parallel programs since 1987. He is known for his work on the TAU, KOJAK, Score-P, and Scalasca tools. Since 1996 he has been a senior scientist at Forschungszentrum Jülich, Germany's largest multidisciplinary research center, and the "Performance Optimization" team leader. Since 2007, he has served as deputy head of the division "Application support".

TITLE: Juelich's Modular Supercomputing Architecture

  Juelich Supercomputing Centre (JSC) is one of the three national supercomputing centres of Germany. For more than three decades, it has provided leadership-class high-performance computing systems to the scientific community in Germany and Europe, supported application developers in making efficient use of the systems, and developed highly scalable HPC middleware and tools.
  In this talk, we discuss the development of cutting-edge supercomputing technology by JSC. It is illustrated with the supercomputer evolution experienced at JSC, with three unique architecture approaches developed in the centre in slightly over a decade, namely the Dual Supercomputer approach, the Cluster-Booster concept and the Modular Supercomputer architecture. They represent an evolution in the way HPC systems are built and operated, aiming at offering real-world applications the best possible computing platform.


Daniel A. Jacobson

Chief Scientist for Computational Systems Biology,
Oak Ridge National Laboratory, USA
Winner of the 2018 ACM Gordon Bell Prize

  Daniel’s lab focuses on the development and subsequent application of mathematical, statistical and computational methods to biological datasets in order to yield new insights into complex biological systems. Their approaches include the use of Network Theory and Topology Discovery/Clustering, Wavelet Theory, Machine & Deep Learning (amongst others: iterative Random Forests, Deep Neural Networks, etc.) and Linear Algebra (primarily as applied to large-scale multivariate modeling), together with traditional and more advanced computing architectures, such as MPI parallelization and Apache Spark. They make use of various programming languages including C, Python, Perl, Scala and R. Areas of Statistics of particular interest to the lab include the use of both frequentist (parametric and non-parametric) and Bayesian methods as well as the development of new methods for Genome-Wide Association Studies (GWAS) and Phenome-Wide Association Studies (PheWAS).

TITLE: Exascale Biology: Supercomputing as an Engine for Discovery in Systems Biology

  The cost of generating biological data is dropping exponentially, resulting in a growth in data that has far outstripped the growth in computational power predicted by Moore’s Law. This flood of data has opened a new era of systems biology in which there are unprecedented opportunities to gain insights into complex biological systems. Integrated biological models need to capture the higher order complexity of the interactions among cellular components. Solving such complex combinatorial problems will give us extraordinary levels of understanding of biological systems. Paradoxically, understanding higher order sets of relationships among biological objects leads to a combinatorial explosion in the search space of biological data. These exponentially increasing volumes of data, combined with the desire to model more and more sophisticated sets of relationships within a cell and across an organism (or in some cases even ecosystems), have led to a need for computational resources and sophisticated algorithms that can make use of such datasets.


Xian-He Sun

Distinguished Professor of Computer Science
Illinois Institute of Technology
IEEE Fellow

  Dr. Xian-He Sun is the director of the Scalable Computing Software laboratory at IIT and a guest faculty member in the Mathematics and Computer Science Division at the Argonne National Laboratory. Before joining IIT, he worked at DoE Ames National Laboratory, at ICASE, NASA Langley Research Center, at Louisiana State University, Baton Rouge, and was an ASEE fellow at Navy Research Laboratories. Dr. Sun is an IEEE fellow and is known for his memory-bounded speedup model, also called Sun-Ni’s Law, for scalable computing. His research interests include data-intensive high-performance computing, memory and I/O systems, software systems for big data applications, and performance evaluation and optimization. He has over 250 publications and 6 patents in these areas. He is the Associate Chief Editor of IEEE Transactions on Parallel and Distributed Systems, a Golden Core member of the IEEE CS society, a former vice chair of the IEEE Technical Committee on Scalable Computing, and the past chair of the Computer Science Department at IIT, and he has served on the editorial boards of leading professional journals in the field of parallel processing. More information about Dr. Sun can be found at his web site.

TITLE: Deep Memory-Storage Hierarchy for Big Data Applications

  High-performance computing (HPC) applications generate massive amounts of data. However, the performance improvement of disk-based storage systems has been much slower than that of memory, creating a significant I/O performance gap. To reduce the performance gap, storage subsystems are undergoing extensive changes, adopting new technologies and adding more layers into the memory/storage hierarchy. With a deeper memory hierarchy, the data movement complexity of memory systems increases significantly, making it harder to utilize the potential of the deep memory-storage hierarchy (DMSH) architecture. In this talk, we present the development of Hermes, an intelligent, multi-tiered, dynamic, and distributed I/O caching system that utilizes DMSH to significantly accelerate I/O performance. Hermes is a large software development project supported by the US NSF. It extends HPC I/O stacks to integrated memory and parallel I/O systems, extends the widely used Hierarchical Data Format (HDF) and HDF5 library to understand users’ needs and to achieve application-aware optimization in a DMSH environment, and extends caching systems to support vertical and horizontal non-inclusive caching in a distributed parallel I/O environment. We will introduce Hermes’ design and implementation, discuss its uniqueness and challenges, and present some initial implementation results.


Scott Lathrop

Education, Outreach & Training Technical Program Manager
National Center for Supercomputing Applications, Blue Waters Team, US

  Scott Lathrop currently works at the National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign. Scott does research in Computer Architecture, Parallel Computing and Distributed Computing. Their most recent publication is 'Building a Community of Practice to Prepare the HPC Workforce'.

TITLE: Enhancing HPC Education through International Collaboration

  The session will address the challenges and opportunities for enhancing HPC Education. There is a critical global need to better prepare researchers to utilize computational and data-enabled tools, resources, and methods. The computational research community can be best served by producing research findings that are verified, validated and reproducible. The international community is working together to share best practices, lessons learned, and quality materials to prepare the workforce to advance discovery. Attendees will learn about the activities of the ACM SIGHPC Education chapter of the ACM SIGHPC special interest group to enhance HPC education and training. The presenter will describe the Chapter goals and activities, and use this session to encourage conference attendees to take advantage of the resources and services provided by the Chapter, to participate in the Chapter activities, and to become members of a growing international community of practice focused on HPC education. Information about the Chapter is available at


Vassil Alexandrov

Chief Science Officer
Hartree Centre, Science &
Technology Facilities Council, UK

  Vassil Alexandrov was appointed in 2019 as Chief Science Officer at Hartree Centre-STFC. Previously he was an ICREA Research Professor in Computational Science and Extreme Computing group leader at Barcelona Supercomputing Center. Before that, he held positions at the University of Liverpool, UK; at the University of Reading, UK, as the Director of the ACET Centre and a Professor in Computational Science, School of Systems Engineering; and at Monterrey Institute of Technology and Higher Education (ITESM), Mexico, as a Distinguished Visiting Professor (Jan 2015-Jan 2018). He holds an MSc degree in Applied Mathematics from Moscow State University, and a PhD degree from the Bulgarian Academy of Sciences.
  His research interests are in the area of scalable mathematical methods and algorithms, with a focus on extreme scale computing and on methods and algorithms for discovering global properties of data. He has significant experience in stochastic modelling, stochastic methods (Monte Carlo methods and algorithms, etc.) and hybrid stochastic/deterministic methods and algorithms.

TITLE: Hartree Centre Experience in Addressing Industrial, Societal and Scientific Challenges

  The Hartree Centre is transforming UK industry through High Performance Computing, Big Data and cognitive technologies. As such, the Hartree Centre performs transformative research and development addressing key industrial, societal and scientific challenges. Backed by over £170 million of government funding and significant strategic partnerships with organisations such as IBM, Atos and Intel, and a recent MoU with the Turing Institute, the Hartree Centre is home to some of the most technically advanced high performance computing, data analytics and machine learning technologies in the UK and is applying them to diverse applications with high industrial, societal and scientific impact. The talk gives examples of the work being undertaken.


Mitsuhisa Sato

Deputy Director
RIKEN Center for Computational Science

  Mitsuhisa Sato received the M.S. degree and the Ph.D. degree in information science from the University of Tokyo in 1984 and 1990, respectively. He was a senior researcher at Electrotechnical Laboratory from 1991 to 1996, and chief of the Parallel and Distributed System Performance Laboratory in Real World Computing Partnership, Japan, from 1996 to 2001. Currently, he is a professor at the Center for Computational Sciences, University of Tsukuba, and has been team leader of the programming environment research team at the Advanced Institute for Computational Science (AICS), RIKEN, since 2010. He served as director of the Center for Computational Sciences, University of Tsukuba, from 2007 to 2013. His research interests include computer architecture, compilers and performance evaluation for parallel computer systems, OpenMP and parallel programming. Dr. Sato is a member of IEEE CS, IPSJ, IEICE, and JSIAM.

TITLE: The Post-K system and Arm-SVE enabled A64FX processor for energy-efficiency and sustained application performance

  The Post-K system is the successor of the Japanese flagship supercomputer, K. RIKEN and Fujitsu have developed a new Arm-SVE enabled processor, called A64FX, for the Post-K system. The processor is designed for energy efficiency and sustained application performance. The system will be installed next year. In this talk, the features and some preliminary performance results of the Post-K system will be presented, as well as the schedule of the project and the supported software.


Wenguang Chen

Tsinghua University

  Wenguang Chen is a professor in the Department of Computer Science and Technology, Tsinghua University. His research interests are in parallel and distributed systems and programming systems. He received the Bachelor’s and Ph.D. degrees in computer science from Tsinghua University in 1995 and 2000, respectively. Before joining Tsinghua in 2003, he was the CTO of Opportunity International Inc. He served as the associate head of the Department of Computer Science and Technology from 2007 to 2014. He has published over 50 papers in international conferences and journals such as Supercomputing, EuroSys, USENIX ATC, OOPSLA, and ICSE. He is a distinguished member and distinguished speaker of CCF (China Computer Federation). He is an ACM member, vice chair of the ACM China Council, and Editor-in-Chief of Communications of the ACM (China Edition). He has served on the program committees of many conferences, such as PLDI 2012; PPoPP 2013 and 2014; SC 2015; ASPLOS 2016; CGO 2014 and 2016; IPDPS 2011; CCGrid 2014; ICPP 2009, 2010, 2011, and 2015; and APSYS 2011, 2013, and 2015. He received the Distinguished Young Scholar Award of the Natural Science Foundation in 2015.

TITLE: ShenTu: Processing Multi-Trillion Edge Graphs on Millions of Cores in Seconds

  Graphs are an important abstraction used in many scientific fields. With the magnitude of graph-structured data constantly increasing, effective data analytics requires efficient and scalable graph processing systems. Although HPC systems have long been used for scientific computing, people have only recently started to assess their potential for graph processing, a workload with inherent load imbalance, lack of locality, and access irregularity. We propose ShenTu, the first general-purpose graph processing framework that can efficiently utilize an entire petascale system to process multi-trillion edge graphs in seconds. ShenTu embodies four key innovations: hardware specialization, supernode routing, on-chip sorting, and degree-aware messaging, which together enable its unprecedented performance and scalability. It can traverse a record-size 70-trillion-edge graph in seconds. ShenTu enables the processing of internet-scale web graphs and shows potential for brain simulation.