I am currently a principal researcher in the Large Model Group at SenseTime. Before that, I was a principal researcher at Huawei Noah's Ark Lab. I focus on bringing intelligent decision-making techniques to the real world. I am specifically interested in Reinforcement Learning (RL), Multi-Agent Reinforcement Learning (MARL), and Large Language Models (LLMs). I serve as a (senior) program committee member for top AI conferences such as NeurIPS/ICML and AAAI/IJCAI, and as an executive committee member of CCF-AI-MAS.

I am looking for self-motivated interns. If interested, please drop me an email: hy.mao@pku.edu.cn.

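Hangyu Mao currently works in the Large Model business unit at SenseTime as a principal researcher, responsible for pre-research and application of technologies such as reinforcement learning and large language models. He has mentored one "Outstanding Individual" and one "Future Star" awardee (only 7 slots across the entire group) and received the "R&D Department Annual Excellent Project Award". Before joining SenseTime, he was a principal researcher at Huawei Noah's Ark Lab, where he received Huawei's "Innovation Pioneer President Award" as first author and won the NeurIPS20 MineRL international AI competition. Before joining Huawei, he pursued his Ph.D. at Peking University, where he was named an "Excellent Graduate" by both Peking University and Beijing and received the "Excellent Doctoral Thesis Award in Multi-Agent Research" from the multi-agent systems group of the China Computer Federation (CCF). He is committed to bringing intelligent decision-making techniques to real-world applications, has produced more than 40 related papers and patents, and has served for several consecutive years as a (senior) program committee member of international AI conferences such as AAAI/IJCAI/NeurIPS/ICML/ICLR and as an executive committee member of the CCF multi-agent systems group.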

Education Experience

Ph.D. @ PKU
Sept. 2015 - Jul. 2020

  • School of Electronics Engineering and Computer Science
  • Institute of Network Computing and Information Systems
  • Ranked 1st among all students in our institute
  • Research Focus: Multi-Agent Reinforcement Learning
  • Advisor: Prof. Zhen Xiao
  • "Excellent Graduate" by PKU
  • "Excellent Doctoral Thesis Award" by China Computer Federation (CCF)

B.S. @ NUAA
Sept. 2011 - Jul. 2015

  • School of Computer Science and Technology
  • Department of Software Engineering
  • Ranked 1st among all students in our department
  • "National Scholarship" by NUAA

Selected Work Experience

Principal Researcher
Dec. 2022 - now

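  • Beyond algorithm research and project support, I also take part in SenseTime scholarship reviews and in the project initiation and closing reviews of the R&D department. Below are the projects I am driving myself: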

  • Time: May 2023 - now.
  • Task: LLM-based AI Agent for Complex Task Automation.
  • My Role: Project Manager.
  • Result: Improved operational efficiency by 35%-70% in BAIFUZHANG, a commercial security system.
  • Result: Best Text-to-SQL result on the Spider Leaderboard (as of 03/2024).
  • Award: "R&D Excellent Project" in 2023 (Top 7/50+).
  • Award: Mentored one "Outstanding Individual" and one "Future Star" awardee (only 7 slots across the entire group).

  • Time: Dec. 2022 - May 2023.
  • Task: RL-based Traffic Signal Control.
  • My Role: Technology Leader.
  • Result: Reduced travel time by ~10% in real-world scenarios (in Shaoxing City, Zhejiang Province, China).

Senior Researcher, then promoted to Principal Researcher
Jul. 2020 - Dec. 2022

  • Time: Jul. 2020 - Sept. 2022.
  • Task: RL-based Chip Test Case Generation.
  • My Role: Project Manager (Project ID: 9433586).
  • Result: Achieved a 100% coverage rate automatically for the first time while cutting test cases by 90% in real systems.
  • Award: First prize of "Innovation Pioneer President Award" (Top 2/500).
  • Award: 2012 STAR.
  • Award: STAR of HISILICON.
  • Award: "Excellent Project" in Noah's Ark Lab in 2022 (Top 5/50+).

  • Time: Apr. 2021 - Aug. 2021.
  • Task: RL-based Large Language Model Parameter Optimization.
  • My Role: Team Leader.
  • Result: Improved the success rate of the GPT-based dialog system from 90.4% to 94.8%, exceeding the best public score of 93.0%.

  • Time: Jul. 2020 - Feb. 2021.
  • Task: NeurIPS MineRL Competition.
  • My Role: Team Leader.
  • Result: Ranked 1st among 90+ teams and 700+ users (39.55 vs. 13.29).
  • Award: First Place Award.

Research Intern
Jun. 2019 - Oct. 2019

  • Task: MARL-based Wi-Fi Parameter Optimization.
  • My Role: Team Leader.
  • Result: Controlled large-scale agents for the first time.
  • Award: Third prize of "Innovation Pioneer President Award" (Top 13/500).
  • Award: "Excellent Intern" (Top 10/200).

School-Enterprise Cooperation
Jan. 2017 - Jan. 2019

  • Task: MARL-based Network Traffic Control.
  • My Role: Team Leader.
  • Result: Improved on baselines by 30%+.
  • Award: Granted 1,000,000 CNY and named an "Excellent Project".

Selected Publications

    Google Scholar: https://scholar.google.com/citations?user=EtVHsgcAAAAJ
    * means equal contribution.
    1. Haoyuan Jiang, Ziyue Li, Zhishuai Li, Lei Bai, Hangyu Mao, Wolfgang Ketter, and Rui Zhao. A General Scenario-Agnostic Reinforcement Learning for Traffic Signal Control. IEEE Transactions on Intelligent Transportation Systems 2024.
    2. Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, and Guoliang Fan. Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach. LLMAgents Workshop at ICLR 2024.
    3. Yilun Kong*, Jingqing Ruan*, Yihong Chen*, Bin Zhang*, Tianpeng Bao*, Shiwei Shi*, Guoqing Du*, Xiaoru Hu*, Hangyu Mao*, Ziyue Li, Xingyu Zeng, Rui Zhao, and Xueqian Wang. TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems. LLMAgents Workshop at ICLR 2024.
    4. Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, and Jiangjin Yin. PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning. AAMAS 2024 (CCF-B).
    5. Jiaming Lu, Jingqing Ruan, Haoyuan Jiang, Ziyue Li, Hangyu Mao, and Rui Zhao. DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge. AAMAS 2024 (CCF-B).
    6. Jingqing Ruan*, Yihong Chen*, Bin Zhang*, Zhiwei Xu*, Tianpeng Bao*, Guoqing Du*, Shiwei Shi*, Hangyu Mao*, Xingyu Zeng, and Rui Zhao. TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents. FMDM Workshop at NeurIPS 2023 (CCF-A).
    7. Shaokang Dong, Hangyu Mao, Shangdong Yang, Shengyu Zhu, Wenbin Li, Jianye Hao, and Yang Gao. Learning When to Explore in Multi-Agent Reinforcement Learning. TCYB 2023 (CCF-B).
    8. Jingqing Ruan, Xiaotian Hao, Dong Li, and Hangyu Mao. Learning to Collaborate by Grouping: a Consensus-oriented Strategy for Multi-agent Reinforcement Learning. ECAI 2023 (CCF-B).
    9. Mingzhe Xing, Hangyu Mao, Shenglin Yin, Lichen Pan, Zhengchao Zhang, Zhen Xiao, and Jieyi Long. A Dual-Agent Scheduler for Distributed Deep Learning Jobs on Public Cloud via Reinforcement Learning. KDD 2023 (CCF-A).
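    10. Jianye Hao, Kun Shao, Kai Li, Dong Li, Hangyu Mao, Shuyue Hu, and Zhen Wang. Research and Applications of Game Intelligence (in Chinese). SCIENTIA SINICA Informationis 2023.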
    11. Jianye Hao, Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, and Zhen Wang. Boosting Multiagent Reinforcement Learning via Permutation Invariant and Permutation Equivariant Networks. ICLR 2023.
    12. Ming Yan, Junjie Chen*, Hangyu Mao*, Jiajun Jiang, Jianye Hao, Xingjian Li, Zhao Tian, Zhichao Chen, Dong Li, Zhangkong Xian, Yanwei Guo, Wulong Liu, Bin Wang, Yuefeng Sun, and Yongshun Cui. Achieving Last-Mile Functional Coverage in Testing Chip Design Software Implementations. ICSE 2023 (CCF-A).
    13. Xianjie Zhang, Yu Liu, Hangyu Mao, and Chao Yu. Common Belief Multi-Agent Reinforcement Learning Based on Variational Recurrent Models. Neurocomputing 2022 (CCF-C).
    14. Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Matthew E. Taylor, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, and Xiaotie Deng. Multiagent Q-learning with Sub-Team Coordination. NeurIPS 2022 (CCF-A).
    15. Lichen Pan, Jun Qian, Wei Xia, Hangyu Mao, Jun Yao, PengZe Li, and Zhen Xiao. Optimizing Communication in Deep Reinforcement Learning with XingTian. Middleware 2022 (CCF-B).
    16. Jinpeng Li, Guangyong Chen, Hangyu Mao, Danruo Deng, Dong Li, Jianye Hao, Qi Dou, and Pheng-Ann Heng. Flat-aware Cross-stage Distilled Framework for Imbalanced Medical Image Classification. MICCAI 2022 (CCF-B). Provisional Accept Recommendation (Top 13%).
    17. Mingzhe Xing, Hangyu Mao, and Zhen Xiao. Fast and Fine-grained Autoscaler for Streaming Jobs with Reinforcement Learning. IJCAI 2022 (CCF-A).
    18. Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, and Li Wang. What About Inputing Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator. AAAI 2022 (CCF-A).
    19. Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, and Xiaotie Deng. Multiagent Q-learning with Sub-Team Coordination. Extended Abstract at AAMAS 2022 (CCF-B).
    20. Hangyu Mao*, Chao Wang*, Xiaotian Hao*, Yihuan Mao*, Yiming Lu*, Chengjie Wu*, Jianye Hao, Dong Li, and Pingzhong Tang. SEIHAI: A Sample-Efficient Hierarchical AI for the MineRL Competition. DAI 2021. Champion Solution for NeurIPS20 MineRL Competition (Top 1 among 90+ teams).
    21. Tianpei Yang*, Weixun Wang*, Hongyao Tang*, Jianye Hao, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Chengwei Zhang, Yujing Hu, Yingfeng Chen, and Changjie Fan. An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning. NeurIPS 2021 (CCF-A).
    22. Xianjie Zhang, Yu Liu, Xiujuan Xu, Qiong Huang, Hangyu Mao, and Anil Carie. Structural Relational Inference Actor-Critic for Multi-Agent Reinforcement Learning. Neurocomputing 2021 (CCF-C).
    23. Changmin Yu, Dong Li, Hangyu Mao, Jianye Hao, and Neil Burgess. Learning State Representations via Temporal Cycle-Consistency Constraint in Model-Based Reinforcement Learning. SSL-RL Workshop at ICLR 2021.
    24. William Hebgen Guss, ..., Hangyu Mao, ..., et al. Towards Robust and Domain Agnostic Reinforcement Learning Competitions: MineRL 2020. NeurIPS 2020 Competition and Demonstration Track (CCF-A).
    25. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, and Yan Ni. Learning Multi-Agent Communication with Double Attentional Deep Reinforcement Learning. JAAMAS 2020 (CCF-B).
    26. Hangyu Mao, Wulong Liu, Jianye Hao, Jun Luo, Dong Li, Zhengchao Zhang, Jun Wang, and Zhen Xiao. Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning. AAAI 2020 (CCF-A). Long Oral Presentation (Top 5%).
    27. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, and Yan Ni. Learning Agent Communication under Limited Bandwidth by Message Pruning. AAAI 2020 (CCF-A).
    28. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, and Zhibo Gong. Modelling the Dynamic Joint Policy of Teammates with Attention Multi-Agent DDPG. AAMAS 2019 (CCF-B).
    29. Hangyu Mao, Yang Xiao, Yuan Wang, Jiakang Wang, and Zhen Xiao. Topic-Specific Retweet Count Ranking for Weibo. PAKDD 2018 (CCF-C).
    30. Yuan Wang, Hangyu Mao, and Zhen Xiao. Identifying Influential Users’ Professions via the Microblogs They Forward. SocInf Workshop at IJCAI 2017 (CCF-A).
    31. Yang Xiao, Yuan Wang, Hangyu Mao, and Zhen Xiao. Predicting Restaurant Consumption Level through Social Media Footprints. COLING 2016 (CCF-B).

    Preprints:
    1. Zhishuai Li*, Xiang Wang*, Jingjing Zhao*, Sun Yang*, Guoqing Du*, Xiaoru Hu*, Bin Zhang*, Yuxiao Ye*, Ziyue Li, Rui Zhao, and Hangyu Mao. PET-SQL: A Prompt-enhanced Two-stage Text-to-SQL Framework with Cross-consistency. arXiv 2024.
    2. Bin Zhang*, Yuxiao Ye*, Guoqing Du*, Xiaoru Hu*, Zhishuai Li*, Sun Yang*, Chi Harold Liu, Rui Zhao, Ziyue Li, and Hangyu Mao. Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation. arXiv 2024.
    3. Guanghu Sui*, Zhishuai Li*, Ziyue Li, Sun Yang, Jingqing Ruan, Hangyu Mao, and Rui Zhao. Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function - with Real Applications in Traffic Domain. arXiv 2023.
    4. Bin Zhang, Hangyu Mao, Lijuan Li, Zhiwei Xu, Dapeng Li, Rui Zhao, and Guoliang Fan. Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems. arXiv 2023.
    5. Hangyu Mao, Rui Zhao, Hao Chen, Jianye Hao, Yiqun Chen, Dong Li, Junge Zhang, and Zhen Xiao. Transformer in Transformer as Backbone for Deep Reinforcement Learning. arXiv 2022.
    6. Yiqun Chen, Hangyu Mao, Tianle Zhang, Shiguang Wu, Bin Zhang, Jianye Hao, Dong Li, Bin Wang, and Hongxing Chang. PTDE: Personalized Training with Distillated Execution for Multi-Agent Reinforcement Learning. arXiv 2022.
    7. Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, and Jianye Hao. Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment. arXiv 2021.
    8. Hangyu Mao, Zhibo Gong, and Zhen Xiao. Reward Design in Cooperative Multi-Agent Reinforcement Learning for Packet Routing. arXiv 2020.
    9. Hangyu Mao, Zhibo Gong, Zhengchao Zhang, Zhen Xiao, and Yan Ni. Learning Multi-Agent Communication under Limited-Bandwidth Restriction for Internet Packet Routing. arXiv 2019.
    10. Hangyu Mao, Zhibo Gong, Yan Ni, and Zhen Xiao. Actor-Coordinator-Critic Net for "Learning-to-Communicate" with Deep Multi-agent Reinforcement Learning. arXiv 2017.
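
    Patents:
    1. Hangyu Mao, Wulong Liu, and Jianye Hao. Agent Training Method, Apparatus, and Computer-readable Storage Medium. U.S. Application No. 17877063, issued on Nov. 17, 2022. US Patent.
    2. Hangyu Mao, Yanwei Guo, and Zhangkong Xian. Method and Apparatus for Determining Inputs for Chip Testing. Patent No. 202210626411.6. Integrated into a real product.
    3. Junjie Chen, Hangyu Mao, Jianye Hao, Yuefeng Sun, and Jiajun Jiang. Method, Apparatus, and Storage Medium for Generating Chip Test Cases. Patent No. 202111663515.6. Integrated into a real product. Rated a "potentially high-value" patent by Huawei in April 2022.
    4. Zhengchao Zhang, Zhen Xiao, Hangyu Mao, and Lichen Pan. Method and System for Cluster Resource Management and Task Scheduling Based on Deep Reinforcement Learning. Patent No. 202010581407.3.
    5. Lichen Pan, Hangyu Mao, Zhen Xiao, and Zhengchao Zhang. Method and System for Cluster Resource Scheduling Based on Multi-Agent Deep Reinforcement Learning. Patent No. 202010322543.0.
    6. Hangyu Mao, Wulong Liu, and Jianye Hao. Method and Apparatus for Training an Agent. Patent No. 202010077714.8. Integrated into a real product. An international patent application has been filed.
    7. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Yan Ni, and Zhibo Gong. Traffic Scheduling Method and Apparatus. Patent No. 201811505121.6.
    8. Benchao Li, Hangyu Mao, Yang Xiao, and Zhen Xiao. Network Traffic Monitoring Method and Network Device. Patent No. 201710681276.4.
    9. Yang Xiao, Kai Chen, Benchao Li, Hangyu Mao, and Zhen Xiao. Method and Apparatus for Multipath Traffic Transmission. Patent No. 201610915269.1.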