I am currently a principal researcher (资深算法研究员) at SenseTime. Before that, I was a principal researcher (主任研究员) at Huawei Noah's Ark Lab. I focus on bringing intelligent decision-making techniques to the real world. I am particularly interested in Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). I serve as a (senior) program committee member for top AI conferences such as NeurIPS, ICML, AAAI, and IJCAI.

I am looking for self-motivated interns. If you are interested, please do not hesitate to drop me an email: hy.mao@pku.edu.cn.

Education Experience

Ph.D. @ PKU
Sept. 2015 - Jul. 2020

  • School of Electronics Engineering and Computer Science
  • Institute of Network Computing and Information Systems
  • Ranked 1st among all students in the institute
  • Research Focus: Multi-Agent Reinforcement Learning
  • Advisor: Prof. Zhen Xiao
  • "Excellent Graduate" by PKU
  • "Excellent Doctoral Thesis Award" by China Computer Federation (CCF)

B.S. @ NUAA
Sept. 2011 - Jul. 2015

  • School of Computer Science and Technology
  • Department of Software Engineering
  • Ranked 1st among all students in the department
  • "National Scholarship" by NUAA

Selected Work Experience

Principal Researcher (资深算法研究员) @ SenseTime
Dec. 2022 - now

  • Task: coming soon.
  • My Role: coming soon.
  • Result: coming soon.
  • Award: coming soon.

Senior Researcher, promoted to Principal Researcher (高级、主任研究员) @ Huawei Noah's Ark Lab
Jul. 2020 - Dec. 2022

  • Time: Jul. 2020 - Sept. 2022.
  • Task: RL-based Chip Test Case Generation.
  • My Role: Project Manager (Project ID: 9433586).
  • Result: Achieved 100% functional coverage automatically for the first time, while reducing the number of test cases by 90%.
  • Award: First prize of "Innovation Pioneer President Award" (Top 2/500).
  • Award: 2012 STAR.
  • Award: STAR of HiSilicon.
  • Award: "Excellent Project" in Noah's Ark Lab in 2022 (Top 5/50+).

  • Time: Nov. 2021 - Sept. 2022.
  • Task: RL-based Automatic Driving Optimization.
  • My Role: Team Member.
  • Result: Achieved performance comparable to commercial methods.

  • Time: Apr. 2021 - Aug. 2021.
  • Task: RL-based Big Model Parameter Optimization.
  • My Role: Team Leader.
  • Result: Improved the success rate of the GPT-based dialog system from 90.4% to 94.8%, surpassing the best public score of 93.0%.

  • Time: Jul. 2020 - Feb. 2021.
  • Task: NeurIPS MineRL Competition.
  • My Role: Team Leader.
  • Result: Ranked 1st among 90+ teams and 700+ users (39.55 vs. 13.29).
  • Award: First Place Award.

Research Intern
Jun. 2019 - Oct. 2019

  • Task: MARL-based Wi-Fi Parameter Optimization.
  • My Role: Team Leader.
  • Result: Achieved control of large-scale agents for the first time.
  • Award: Third prize of "Innovation Pioneer President Award" (Top 13/500).
  • Award: "Excellent Intern" (Top 10/200).

School-Enterprise Cooperation
Jan. 2017 - Jan. 2019

  • Task: MARL-based Network Traffic Control.
  • My Role: Team Leader.
  • Result: Improved over baselines by more than 30%.
  • Award: Granted 1,000,000 CNY of funding and rated "Excellent Project".

Selected Publications

    Google Scholar: https://scholar.google.com/citations?user=EtVHsgcAAAAJ
    1. Jianye Hao, Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, and Zhen Wang. Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework. Submitted to ICLR 2023 (top international deep learning conference; ratings: 6/6/6/8).
    2. Ming Yan, Junjie Chen*, Hangyu Mao*, Jiajun Jiang, Jianye Hao, Xingjian Li, Zhao Tian, Zhichao Chen, Dong Li, Zhangkong Xian, Yanwei Guo, Wulong Liu, Bin Wang, Yuefeng Sun, and Yongshun Cui. Achieving Last-Mile Functional Coverage in Testing Chip Design Software Implementations. ICSE 2023 (CCF-A).
    3. Xianjie Zhang, Yu Liu, Hangyu Mao, and Chao Yu. Common Belief Multi-Agent Reinforcement Learning Based on Variational Recurrent Models. Neurocomputing 2022 (CCF-C).
    4. Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Matthew E. Taylor, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, and Xiaotie Deng. Multiagent Q-learning with Sub-Team Coordination. NeurIPS 2022 (CCF-A).
    5. Lichen Pan, Jun Qian, Wei Xia, Hangyu Mao, Jun Yao, PengZe Li, and Zhen Xiao. Optimizing Communication in Deep Reinforcement Learning with XingTian. Middleware 2022 (CCF-B).
    6. Jinpeng Li, Guangyong Chen, Hangyu Mao, Danruo Deng, Dong Li, Jianye Hao, Qi Dou, and Pheng-Ann Heng. Flat-aware Cross-stage Distilled Framework for Imbalanced Medical Image Classification. MICCAI 2022 (CCF-B). Provisional Accept Recommendation (Top 13%).
    7. Mingzhe Xing, Hangyu Mao, and Zhen Xiao. Fast and Fine-grained Autoscaler for Streaming Jobs with Reinforcement Learning. IJCAI 2022 (CCF-A).
    8. Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, and Li Wang. What About Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator. AAAI 2022 (CCF-A).
    9. Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, and Xiaotie Deng. Multiagent Q-learning with Sub-Team Coordination. Extended Abstract at AAMAS 2022 (CCF-B).
    10. Hangyu Mao*, Chao Wang*, Xiaotian Hao*, Yihuan Mao*, Yiming Lu*, Chengjie Wu*, Jianye Hao, Dong Li, and Pingzhong Tang. SEIHAI: A Sample-Efficient Hierarchical AI for the MineRL Competition. DAI 2021 (top Chinese conference in the multi-agent field). Champion solution for the NeurIPS 2020 MineRL Competition (1st among 90+ teams).
    11. Tianpei Yang*, Weixun Wang*, Hongyao Tang*, Jianye Hao, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Chengwei Zhang, Yujing Hu, Yingfeng Chen, and Changjie Fan. An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning. NeurIPS 2021 (CCF-A).
    12. Xianjie Zhang, Yu Liu, Xiujuan Xu, Qiong Huang, Hangyu Mao, and Anil Carie. Structural Relational Inference Actor-Critic for Multi-Agent Reinforcement Learning. Neurocomputing 2021 (CCF-C).
    13. Changmin Yu, Dong Li, Hangyu Mao, Jianye Hao, and Neil Burgess. Learning State Representations via Temporal Cycle-Consistency Constraint in Model-Based Reinforcement Learning. SSL-RL Workshop at ICLR 2021 (top international deep learning conference).
    14. William Hebgen Guss, ..., Hangyu Mao, ..., et al. Towards Robust and Domain Agnostic Reinforcement Learning Competitions: MineRL 2020. NeurIPS 2020 Competition and Demonstration Track (CCF-A).
    15. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, and Yan Ni. Learning Multi-Agent Communication with Double Attentional Deep Reinforcement Learning. JAAMAS 2020 (CCF-B).
    16. Hangyu Mao, Wulong Liu, Jianye Hao, Jun Luo, Dong Li, Zhengchao Zhang, Jun Wang, and Zhen Xiao. Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning. AAAI 2020 (CCF-A). Long Oral Presentation (Top 5%).
    17. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, and Yan Ni. Learning Agent Communication under Limited Bandwidth by Message Pruning. AAAI 2020 (CCF-A).
    18. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, and Zhibo Gong. Modelling the Dynamic Joint Policy of Teammates with Attention Multi-Agent DDPG. AAMAS 2019 (CCF-B).
    19. Hangyu Mao, Yang Xiao, Yuan Wang, Jiakang Wang, and Zhen Xiao. Topic-Specific Retweet Count Ranking for Weibo. PAKDD 2018 (CCF-C).
    20. Yuan Wang, Hangyu Mao, and Zhen Xiao. Identifying Influential Users’ Professions via the Microblogs They Forward. SocInf Workshop at IJCAI 2017 (CCF-A).
    21. Yang Xiao, Yuan Wang, Hangyu Mao, and Zhen Xiao. Predicting Restaurant Consumption Level through Social Media Footprints. COLING 2016 (CCF-B).

Preprints

    1. Yiqun Chen, Hangyu Mao, Tianle Zhang, Shiguang Wu, Bin Zhang, Jianye Hao, Dong Li, Bin Wang, and Hongxing Chang. PTDE: Personalized Training with Distillated Execution for Multi-Agent Reinforcement Learning. Arxiv 2022.
    2. Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, Zhen Wang, and Jianye Hao. Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework. Arxiv 2022.
    3. Hangyu Mao, Zhibo Gong, and Zhen Xiao. Reward Design in Cooperative Multi-Agent Reinforcement Learning for Packet Routing. Arxiv 2020.
    4. Hangyu Mao, Zhibo Gong, Zhengchao Zhang, Zhen Xiao, and Yan Ni. Learning Multi-Agent Communication under Limited-Bandwidth Restriction for Internet Packet Routing. Arxiv 2019.
    5. Hangyu Mao, Zhibo Gong, Yan Ni, and Zhen Xiao. Actor-Coordinator-Critic Net for "Learning-to-Communicate" with Deep Multi-agent Reinforcement Learning. Arxiv 2017.

Patents

    1. Hangyu Mao, Wulong Liu, and Jianye Hao. Agent Training Method, Apparatus, and Computer-readable Storage Medium. U.S. Application No. 17877063, issued on Nov. 17, 2022. US Patent.
    2. Hangyu Mao, Yanwei Guo, and Zhangkong Xian. Method and Apparatus for Determining Inputs for Chip Testing. Patent No. 202210626411.6. Integrated into real products.
    3. Junjie Chen, Hangyu Mao, Jianye Hao, Yuefeng Sun, and Jiajun Jiang. Method, Apparatus, and Storage Medium for Generating Chip Test Cases. Patent No. 202111663515.6. Integrated into real products. Rated a "potentially high-value" patent by Huawei in Apr. 2022.
    4. Zhengchao Zhang, Zhen Xiao, Hangyu Mao, and Lichen Pan. Method and System for Cluster Resource Management and Task Scheduling Based on Deep Reinforcement Learning. Patent No. 202010581407.3.
    5. Lichen Pan, Hangyu Mao, Zhen Xiao, and Zhengchao Zhang. Method and System for Cluster Resource Scheduling Based on Multi-Agent Deep Reinforcement Learning. Patent No. 202010322543.0.
    6. Hangyu Mao, Wulong Liu, and Jianye Hao. Method and Apparatus for Training Agents. Patent No. 202010077714.8. Integrated into real products. International patent application filed.
    7. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Yan Ni, and Zhibo Gong. Traffic Scheduling Method and Apparatus. Patent No. 201811505121.6.
    8. Benchao Li, Hangyu Mao, Yang Xiao, and Zhen Xiao. Network Traffic Monitoring Method and Network Device. Patent No. 201710681276.4.
    9. Yang Xiao, Kai Chen, Benchao Li, Hangyu Mao, and Zhen Xiao. Method and Apparatus for Multi-Path Traffic Transmission. Patent No. 201610915269.1.


  • Conference PC/SPC Member:
  • AAAI: 2021 (Top 25%), 2022.
  • IJCAI: 2020, 2021 (SPC), 2022, 2023 (SPC).
  • NeurIPS: 2022.
  • ICML: 2022, 2023.
  • ECAI: 2020.
  • CoRL: 2020, 2021, 2022.
  • Journal Reviewer: IEEE Transactions on Cybernetics, IEEE Transactions on Communications, IEEE/CAA Journal of Automatica Sinica, etc.
  • Student Party Branch: I served as the secretary (or a committee member) of the Student Party Branch from 2012 to 2019. We organized many activities, such as "An Introduction to Web 2.0" and "An Introduction to AI". I was awarded "Outstanding Communist Party Member" by the School of EECS, PKU in Jun. 2018.
  • Daily Pastime: I enjoy running, reading, badminton, rope-skipping, swimming, and skating. I completed the Beijing Half Marathon in 2016 and 2017.
  • English Skill: I passed the College English Test-4 (CET-4) with a score of 531 in Jun. 2012, and CET-6 with a score of 506 in Dec. 2012. However, my spoken English still needs improvement.