I am currently a principal researcher (资深算法研究员) at SenseTime Smart City Group (SCG). Before that, I was a principal researcher (主任研究员) at Huawei Noah's Ark Lab. I focus on bringing intelligent decision-making techniques to the real world. I am specifically interested in Reinforcement Learning (RL), Multi-Agent Reinforcement Learning (MARL), and Large Language Models (LLMs). I serve as a (senior) program committee member of top AI conferences such as NeurIPS/ICML and AAAI/IJCAI. I also serve as an executive committee member of CCF-AI-MAS (中国计算机学会-人工智能专委会-多智能体学组).

I am looking for self-motivated interns. If you are interested, please do not hesitate to drop me an email: hy.mao@pku.edu.cn.

Education Experience

Ph.D. @ PKU
Sept. 2015 - Jul. 2020

  • School of Electronics Engineering and Computer Science
  • Institute of Network Computing and Information Systems
  • Ranked 1st among all students in our institute
  • Research Focus: Multi-Agent Reinforcement Learning
  • Advisor: Prof. Zhen Xiao
  • "Excellent Graduate" by PKU
  • "Excellent Doctoral Thesis Award" by China Computer Federation (CCF)

B.S. @ NUAA
Sept. 2011 - Jul. 2015

  • School of Computer Science and Technology
  • Department of Software Engineering
  • Ranked 1st among all students in our department
  • "National Scholarship" by NUAA

Selected Work Experience

Principal Researcher (资深算法研究员)
Dec. 2022 - now

  • Time: May 2023 - now.
  • Task: LLM-based AI Agent for Complex Task Automation.
  • Note: We are working on this. If interested, drop me an email.

  • Time: Dec. 2020 - now.
  • Task: RL-based Traffic Signal Control.
  • Note: We are working on this. If interested, drop me an email.

Senior, then promoted to Principal Researcher (高级,主任研究员)
Jul. 2020 - Dec. 2022

  • Time: Jul. 2020 - Sept. 2022.
  • Task: RL-based Chip Test Case Generation.
  • My Role: Project Manager (Project ID: 9433586).
  • Result: Achieved 100% coverage automatically for the first time while saving 90% of the test cases in real systems.
  • Award: First prize of "Innovation Pioneer President Award" (Top 2/500).
  • Award: 2012 STAR.
  • Award: STAR of HISILICON.
  • Award: "Excellent Project" in Noah's Ark Lab in 2022 (Top 5/50+).

  • Time: Nov. 2021 - Sept. 2022.
  • Task: RL-based Automatic Driving Optimization.
  • My Role: Team Member.
  • Result: Achieved performance comparable to commercial methods.

  • Time: Apr. 2021 - Aug. 2021.
  • Task: RL-based Large Language Model Parameter Optimization.
  • My Role: Team Leader.
  • Result: Improved the success rate of the GPT-based dialog system from 90.4% to 94.8%, exceeding the best public score of 93.0%.

  • Time: Jul. 2020 - Feb. 2021.
  • Task: NeurIPS MineRL Competition.
  • My Role: Team Leader.
  • Result: Ranked 1st among 90+ teams and 700+ users (39.55 vs. 13.29).
  • Award: First Place Award.

Research Intern
Jun. 2019 - Oct. 2019

  • Task: MARL-based Wi-Fi Parameter Optimization.
  • My Role: Team Leader.
  • Result: Controlled agents at large scale for the first time.
  • Award: Third prize of "Innovation Pioneer President Award" (Top 13/500).
  • Award: "Excellent Intern" (Top 10/200).

School-Enterprise Cooperation
Jan. 2017 - Jan. 2019

  • Task: MARL-based Network Traffic Control.
  • My Role: Team Leader.
  • Result: Improved on baselines by 30%+.
  • Award: Granted 1,000,000 CNY and rated "Excellent Project".

Selected Publications

Google Scholar: https://scholar.google.com/citations?user=EtVHsgcAAAAJ
* means equal contribution.

    1. Mingzhe Xing, Hangyu Mao, Shenglin Yin, Lichen Pan, Zhengchao Zhang, Zhen Xiao, and Jieyi Long. A Dual-Agent Scheduler for Distributed Deep Learning Jobs on Public Cloud via Reinforcement Learning. KDD 2023 (CCF-A).
    2. Jianye Hao, Kun Shao, Kai Li, Dong Li, Hangyu Mao, Shuyue Hu, and Zhen Wang. Research and Applications of Game Intelligence (in Chinese). SCIENTIA SINICA Informationis 2023.
    3. Jianye Hao, Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, and Zhen Wang. Boosting Multiagent Reinforcement Learning via Permutation Invariant and Permutation Equivariant Networks. ICLR 2023.
    4. Ming Yan, Junjie Chen*, Hangyu Mao*, Jiajun Jiang, Jianye Hao, Xingjian Li, Zhao Tian, Zhichao Chen, Dong Li, Zhangkong Xian, Yanwei Guo, Wulong Liu, Bin Wang, Yuefeng Sun, and Yongshun Cui. Achieving Last-Mile Functional Coverage in Testing Chip Design Software Implementations. ICSE 2023 (CCF-A).
    5. Xianjie Zhang, Yu Liu, Hangyu Mao, and Chao Yu. Common Belief Multi-Agent Reinforcement Learning Based on Variational Recurrent Models. Neurocomputing 2022 (CCF-C).
    6. Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Matthew E. Taylor, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, and Xiaotie Deng. Multiagent Q-learning with Sub-Team Coordination. NeurIPS 2022 (CCF-A).
    7. Lichen Pan, Jun Qian, Wei Xia, Hangyu Mao, Jun Yao, PengZe Li, and Zhen Xiao. Optimizing Communication in Deep Reinforcement Learning with XingTian. Middleware 2022 (CCF-B).
    8. Jinpeng Li, Guangyong Chen, Hangyu Mao, Danruo Deng, Dong Li, Jianye Hao, Qi Dou, and Pheng-Ann Heng. Flat-aware Cross-stage Distilled Framework for Imbalanced Medical Image Classification. MICCAI 2022 (CCF-B). Provisional Accept Recommendation (Top 13%).
    9. Mingzhe Xing, Hangyu Mao, and Zhen Xiao. Fast and Fine-grained Autoscaler for Streaming Jobs with Reinforcement Learning. IJCAI 2022 (CCF-A).
    10. Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, and Li Wang. What About Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator. AAAI 2022 (CCF-A).
    11. Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, and Xiaotie Deng. Multiagent Q-learning with Sub-Team Coordination. Extended Abstract at AAMAS 2022 (CCF-B).
    12. Hangyu Mao*, Chao Wang*, Xiaotian Hao*, Yihuan Mao*, Yiming Lu*, Chengjie Wu*, Jianye Hao, Dong Li, and Pingzhong Tang. SEIHAI: A Sample-Efficient Hierarchical AI for the MineRL Competition. DAI 2021. Champion Solution for NeurIPS20 MineRL Competition (Top 1 among 90+ teams).
    13. Tianpei Yang*, Weixun Wang*, Hongyao Tang*, Jianye Hao, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Chengwei Zhang, Yujing Hu, Yingfeng Chen, and Changjie Fan. An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning. NeurIPS 2021 (CCF-A).
    14. Xianjie Zhang, Yu Liu, Xiujuan Xu, Qiong Huang, Hangyu Mao, and Anil Carie. Structural Relational Inference Actor-Critic for Multi-Agent Reinforcement Learning. Neurocomputing 2021 (CCF-C).
    15. Changmin Yu, Dong Li, Hangyu Mao, Jianye Hao, and Neil Burgess. Learning State Representations via Temporal Cycle-Consistency Constraint in Model-Based Reinforcement Learning. SSL-RL Workshop at ICLR 2021.
    16. William Hebgen Guss, ..., Hangyu Mao, ..., et al. Towards Robust and Domain Agnostic Reinforcement Learning Competitions: MineRL 2020. NeurIPS 2020 Competition and Demonstration Track (CCF-A).
    17. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, and Yan Ni. Learning Multi-Agent Communication with Double Attentional Deep Reinforcement Learning. JAAMAS 2020 (CCF-B).
    18. Hangyu Mao, Wulong Liu, Jianye Hao, Jun Luo, Dong Li, Zhengchao Zhang, Jun Wang, and Zhen Xiao. Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning. AAAI 2020 (CCF-A). Long Oral Presentation (Top 5%).
    19. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, and Yan Ni. Learning Agent Communication under Limited Bandwidth by Message Pruning. AAAI 2020 (CCF-A).
    20. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, and Zhibo Gong. Modelling the Dynamic Joint Policy of Teammates with Attention Multi-Agent DDPG. AAMAS 2019 (CCF-B).
    21. Hangyu Mao, Yang Xiao, Yuan Wang, Jiakang Wang, and Zhen Xiao. Topic-Specific Retweet Count Ranking for Weibo. PAKDD 2018 (CCF-C).
    22. Yuan Wang, Hangyu Mao, and Zhen Xiao. Identifying Influential Users’ Professions via the Microblogs They Forward. SocInf Workshop at IJCAI 2017 (CCF-A).
    23. Yang Xiao, Yuan Wang, Hangyu Mao, and Zhen Xiao. Predicting Restaurant Consumption Level through Social Media Footprints. COLING 2016 (CCF-B).

Preprints

    1. Jingqing Ruan*, Yihong Chen*, Bin Zhang*, Zhiwei Xu*, Tianpeng Bao*, Guoqing Du*, Shiwei Shi*, Hangyu Mao*, Xingyu Zeng, and Rui Zhao. TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents. arXiv 2023.
    2. Jingqing Ruan, Xiaotian Hao, Dong Li, and Hangyu Mao. Learning to Collaborate by Grouping: a Consensus-oriented Strategy for Multi-agent Reinforcement Learning. arXiv 2023.
    3. Bin Zhang, Hangyu Mao, Lijuan Li, Zhiwei Xu, Dapeng Li, Rui Zhao, and Guoliang Fan. Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems. arXiv 2023.
    4. Hangyu Mao, Rui Zhao, Hao Chen, Jianye Hao, Yiqun Chen, Dong Li, Junge Zhang, and Zhen Xiao. Transformer in Transformer as Backbone for Deep Reinforcement Learning. arXiv 2022.
    5. Yiqun Chen, Hangyu Mao, Tianle Zhang, Shiguang Wu, Bin Zhang, Jianye Hao, Dong Li, Bin Wang, and Hongxing Chang. PTDE: Personalized Training with Distillated Execution for Multi-Agent Reinforcement Learning. arXiv 2022.
    6. Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, and Jianye Hao. Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment. arXiv 2021.
    7. Hangyu Mao, Zhibo Gong, and Zhen Xiao. Reward Design in Cooperative Multi-Agent Reinforcement Learning for Packet Routing. arXiv 2020.
    8. Hangyu Mao, Zhibo Gong, Zhengchao Zhang, Zhen Xiao, and Yan Ni. Learning Multi-Agent Communication under Limited-Bandwidth Restriction for Internet Packet Routing. arXiv 2019.
    9. Hangyu Mao, Zhibo Gong, Yan Ni, and Zhen Xiao. Actor-Coordinator-Critic Net for "Learning-to-Communicate" with Deep Multi-agent Reinforcement Learning. arXiv 2017.

Patents

    1. Hangyu Mao, Wulong Liu, and Jianye Hao. Agent Training Method, Apparatus, and Computer-readable Storage Medium. U.S. Application No. 17877063, issued on Nov. 17, 2022. US Patent.
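    2. Hangyu Mao, Yanwei Guo, and Zhangkong Xian. Method and Apparatus for Determining Inputs for Chip Testing. Patent No. 202210626411.6. Integrated into real products.
    3. Junjie Chen, Hangyu Mao, Jianye Hao, Yuefeng Sun, and Jiajun Jiang. Test Case Generation Method, Apparatus, and Storage Medium for Chips. Patent No. 202111663515.6. Integrated into real products. Rated a "Potential High-Value" patent by Huawei in Apr. 2022.
    4. Zhengchao Zhang, Zhen Xiao, Hangyu Mao, and Lichen Pan. Method and System for Cluster Resource Management and Task Scheduling Based on Deep Reinforcement Learning. Patent No. 202010581407.3.
    5. Lichen Pan, Hangyu Mao, Zhen Xiao, and Zhengchao Zhang. Method and System for Cluster Resource Scheduling Based on Multi-Agent Deep Reinforcement Learning. Patent No. 202010322543.0.
    6. Hangyu Mao, Wulong Liu, and Jianye Hao. Method and Apparatus for Training Agents. Patent No. 202010077714.8. Integrated into real products. International patent applied for.
    7. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Yan Ni, and Zhibo Gong. Traffic Scheduling Method and Apparatus. Patent No. 201811505121.6.
    8. Benchao Li, Hangyu Mao, Yang Xiao, and Zhen Xiao. Network Traffic Monitoring Method and Network Device. Patent No. 201710681276.4.
    9. Yang Xiao, Kai Chen, Benchao Li, Hangyu Mao, and Zhen Xiao. Method and Apparatus for Multipath Traffic Transmission. Patent No. 201610915269.1.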


  • Conference PC/SPC Member:
      • AAAI: 2021 (Top 25%), 2022.
      • IJCAI: 2020, 2021 (SPC), 2022, 2023 (SPC).
      • NeurIPS: 2022, 2023.
      • ICML: 2022, 2023.
      • Other Conferences: ECAI 2020, 2023; CoRL 2020, 2021, 2022, 2023; KDD 2023; etc.
  • Journal Reviewer: IEEE Transactions on Cybernetics, IEEE Transactions on Communications, IEEE Transactions on Games, IEEE/CAA Journal of Automatica Sinica (自动化学报英文版), etc.
  • Student Party Branch: I served as the secretary (or committee member) of the Student Party Branch from 2012 to 2019. We held many activities such as "the Introduction of Web 2.0" and "the Introduction of AI". I was awarded "the Outstanding Communist" by the School of EECS, PKU in June 2018.
  • Daily Pastime: I enjoy running, reading, badminton, rope skipping, swimming, and skating. I completed the Beijing Half Marathon in 2016 and 2017.
  • English Skill: I passed the College English Test-4 (CET-4) with a score of 531 in Jun. 2012, and CET-6 with a score of 506 in Dec. 2012. However, my spoken English is not so good.