About me

I am currently a Research Assistant Professor at the Department of Computing, The Hong Kong Polytechnic University. I am also a member of the Internet and Mobile Computing Laboratory (IMCL).

I received my Ph.D. degree from the Department of Computer Science, The University of Hong Kong, where I was very fortunate to work with my supervisor, Professor Cho-Li Wang. I received my Bachelor’s degree in Computer Science from Xi’an Jiaotong University.

Email: zhaorui.zhang@polyu.edu.hk

Address: PQ748, Mong Man Wai Building, PolyU

Research Interests:

AI Infrastructure, MLSys, LLMs:

Large-scale model (LLM) training, fine-tuning, checkpointing, and inference optimization. I am broadly interested in building and optimizing AI systems (MLSys) from both the systems side and the machine learning algorithm side, across a wide range of computing platforms (e.g., distributed, cloud, HPC, IoT, AIoT, and even quantum and photonic platforms) for emerging big data and AI applications. Topics include distributed communication optimization, data compression, and fault tolerance.

HPC, Distributed Systems, Cloud Computing:

I am also interested in high-performance computing (HPC), distributed systems, data reduction, cloud computing, fault tolerance, and FPGAs.


We are actively looking for motivated colleagues and students at all levels (Postdoc, PhD, MSc, Undergraduate, etc.). If you are interested in large-scale model (LLM) fine-tuning and inference optimization, please reach out and join us!


29 Jan. 2025, Invited to serve on the Technical Program Committee of SC’25, “The International Conference for High-Performance Computing, Networking, Storage and Analysis”.

26 Nov. 2024, Our paper “StoreLLM: Energy Efficient Large Language Model Inference with Permanently Pre-stored Attention Matrices” has been accepted by ACM e-Energy 2025, the 16th ACM International Conference on Future and Sustainable Energy Systems, Rotterdam, Netherlands, June 17-20, 2025.

18 July 2024, Our paper “A Compiler-Like Framework for Optimizing Cryptographic Big Integer Multiplication on GPUs” has been accepted by MICRO’2024, the 57th IEEE/ACM International Symposium on Microarchitecture, Austin, Texas, USA, November 2-6, 2024.

14 June 2024, Our paper “Versatile Datapath Soft Error Detection on the Cheap for HPC Applications” has been accepted by SC’2024, The International Conference for High-Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, November 17-22, 2024.

17 April 2024, Our paper “FedFa: A Fully Asynchronous Training Paradigm for Federated Learning” has been accepted by IJCAI’2024, the International Joint Conference on Artificial Intelligence, Jeju, August 3-9, 2024.

31 Jan. 2024, Our paper “An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression” has been accepted by IPDPS’2024, the 38th IEEE International Parallel & Distributed Processing Symposium, San Francisco, California, USA, May 27-31, 2024.

29 Nov. 2023, Our paper “Accelerating High-Precision Integer Multiplication used in Cryptosystems with GPUs” has been accepted by PPoPP’2024, the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Edinburgh, UK, March 2-6, 2024.


[ACM e-Energy 2025] Dan Wang, Boan Liu, Rui Lu, Zhaorui Zhang, Shuntao Zhu, StoreLLM: Energy Efficient Large Language Model Inference with Permanently Pre-stored Attention Matrices.

[MICRO’2024] Zhuoran Ji, Jianyu Zhao, Zhaorui Zhang, Jiming Xu, Shoumeng Yan, Lei Ju, A Compiler-Like Framework for Optimizing Cryptographic Big Integer Multiplication on GPUs. (CCF-A Conference for Computer Architecture)

[SC’2024] Yafan Huang, Sheng Di, Zhaorui Zhang, Xiaoyi Lu, Guanpeng Li, Versatile Datapath Soft Error Detection on the Cheap for HPC Applications. (CCF-A Conference for High-Performance Computing)

[IJCAI’2024] Haotian Xu, Zhaorui Zhang, Sheng Di, Benben Liu, Alharthi Khalid, Jiannong Cao, FedFa: A Fully Asynchronous Training Paradigm for Federated Learning. (CCF-A Conference for AI Systems)

[IPDPS’2024] Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur, An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression. (CCF-B Conference for Distributed and Parallel Systems)

[PPoPP’2024] Zhuoran Ji, Zhaorui Zhang, Jiming Xu, Lei Ju, Accelerating High-Precision Integer Multiplication used in Cryptosystems with GPUs. (CCF-A Conference for Distributed and Parallel Systems)

Zhaorui Zhang, Efficient Parameter Update Strategy for Distributed Deep Learning Systems, HKU Theses Online (HKUTO), 2021.

[TPDS] Zhaorui Zhang, Cho-Li Wang, MIPD: An Adaptive Gradient Sparsification Framework for Distributed DNNs Training, IEEE Transactions on Parallel and Distributed Systems, Special Section on Parallel and Distributed Computing Techniques for AI, ML, and DL, 2022.

[TPDS] Zhaorui Zhang, Cho-Li Wang, SaPus: Self-Adaptive Parameter Update Strategy for DNN Training on Multi-GPU Clusters, IEEE Transactions on Parallel and Distributed Systems, 2021 (accepted directly in the first round of review).

[JPDC] Zhaorui Zhang, Zhuoran Ji, Cho-Li Wang, Momentum-Driven Adaptive Synchronization Model for Distributed DNN Training on HPC Clusters, Journal of Parallel and Distributed Computing, 2021.

Xuebin Chi, Liping Liu, Yangang Wang, Zhaorui Zhang, et al., Development Report on National High-Performance Computing Environment, Book, published by Science Press, 2018.

Zhaorui Zhang, Xin Y, Liu B, Li WXY, Lee K.H., Ng C.F., Stoyanov D, Cheung RCC, Kwok KW, FPGA-based High-Performance Collision Detection: An Enabling Technique for Image-Guided Robotic Surgery, Frontiers in Robotics and AI, August 2016.

[TCBB] Yao Xin, Will X. Y. Li, Zhaorui Zhang, Ray C. C. Cheung, Dong Song, Theodore W. Berger, An Application Specific Instruction Set Processor (ASIP) for Adaptive Filters in Neural Prosthetics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2015.

Professional Services:

Conference Program Committee:

SC’25: The International Conference for High-Performance Computing, Networking, Storage and Analysis

IEEE Cloud Summit’25

ICLR’25: The Thirteenth International Conference on Learning Representations

IEEE Cluster’24: IEEE International Conference on Cluster Computing

HiPC’24: IEEE International Conference on High-Performance Computing, Data, and Analytics

SSDBM’24: International Conference on Scientific and Statistical Database Management


IEEE Transactions on Parallel and Distributed Systems (TPDS)

IEEE Transactions on Computers (TC)

IEEE/ACM Transactions on Networking (TON)

IEEE Transactions on Mobile Computing (TMC)

IEEE Transactions on Consumer Electronics

Academic Employment Experiences:

The Hong Kong Polytechnic University

Research Assistant Professor in the Department of Computing

Professional Collaboration:

  1. HKUST: X-GPU cluster, computing resources.
  2. China National Grid: helped maintain the high-performance computing platform and co-authored a book, in collaboration with the Computer Network Information Center of the Chinese Academy of Sciences.
  3. AWS (public cloud): computing resources; helped invite AWS technical staff to give a talk at HKU.

Teaching Experiences:

COMP4442: Service and Cloud Computing, Department of Computing, The Hong Kong Polytechnic University, Spring Semester 2023, 2024, and 2025

COMP7104: The Introduction of the Linux Operating System, Lecturer, The University of Hong Kong

COMP7305: Cloud and Cluster Computing, Teaching Assistant, Department of Computer Science, The University of Hong Kong

COMP8301: Advanced Topics in Computer Systems, Teaching Assistant, Department of Computer Science, The University of Hong Kong

COMP9301: Systems Design and Implementation, Teaching Assistant, Department of Computer Science, The University of Hong Kong


Last updated: Oct. 2022