Thanks for visiting Hangyu Mao's homepage. Hangyu Mao currently works at Kuaishou Technology, where he leads knowledge-enhancement R&D for the KwaiYii large language model and also heads the Intelligent Interaction team. His research focuses on technologies such as Agents, RAG, RL, and LLMs. He has published 30+ papers at CCF-A/B conferences and journals including ICLR, NeurIPS, and ICML, and has filed 10+ international and domestic patents; the related research has been deployed in industrial scenarios with substantial business impact. He has served as a PC member, Senior PC member, and Area Chair for the international conferences above, as a forum chair of the China Conference on Data Mining (CCDM), and as an executive committee member of the CCF Multi-Agent Systems Group. He and the teams he has led have won: the "AI Large Model Typical Scenario Application Case" award at the Global Digital Economy Conference (as team lead); the championship of a NeurIPS reinforcement learning competition (as team lead); the CCF Outstanding Doctoral Dissertation Award in Multi-Agent Research (the only recipient nationwide that year); "Outstanding Graduate" of Beijing and of Peking University (the only PhD from PKU's networking institute that year); and Huawei's "Innovation Pioneer President Award" (second only to the Pangu large model team that year).

I am recruiting self-motivated interns, new graduates, and experienced hires. If interested, please email me at hy.mao@pku.edu.cn. Many previous collaborators have had strong output and placements, for example: Dr. Zhang X from CAS (published multiple papers during our collaboration; obtained a faculty position at the CAS Institute of Automation), Dr. Xu XX from CAS (multiple papers; faculty position at the School of Artificial Intelligence, Shandong University), Dr. Ruan XX from CAS (multiple papers; special-program offers from Ant Group and Meituan), Dr. Xing XX from PKU (multiple papers; offer from Zhongguancun Laboratory), Chen XX, a master's student from CAS (one IJCAI paper; went on to a PhD at Renmin University of China), Dr. Hao XX from Tianjin University (multiple papers; offer from ByteDance), and Dr. Wu XX from Tsinghua (multiple papers; won a NeurIPS competition championship). I look forward to meeting those of you ready to work hard together in these formative years. For motivated students, I will do my best to provide good research ideas, ample GPU resources, guidance on writing and revising papers, and recommendation letters for jobs and faculty positions.

Thanks for visiting Hangyu Mao's Homepage. My research focuses on topics such as Large Language Models (LLM), Reinforcement Learning (RL), and Multi-Agent Reinforcement Learning (MARL). I have served as a (senior) program committee member for top AI conferences such as NeurIPS/ICML and AAAI/IJCAI, and as an executive committee member of CCF-AI-MAS.

I am looking for self-motivated interns. If interested, please drop me an email: hy.mao@pku.edu.cn.

Selected Publications

  Full list on Google Scholar: https://scholar.google.com/citations?user=EtVHsgcAAAAJ
  Notation: * denotes equal contribution; ** denotes corresponding author.
  1. Jiangjin Yin, Hangyu Mao, Rongbo Zhu, and Shiwei Xu. Parallel Missing Tag Identification for Anonymous Multiple Users RFID Systems. SECON 2024 (CCF-B).
  2. Yilun Kong*, Jingqing Ruan*, Yihong Chen*, Bin Zhang*, Tianpeng Bao*, Shiwei Shi*, Guoqing Du*, Xiaoru Hu*, Hangyu Mao*|**, Ziyue Li**, Xingyu Zeng, Rui Zhao, and Xueqian Wang. TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Industry Systems. EMNLP 2024 Industry Track (CCF-B).
  3. Ziyue Li, Haoyuan Jiang, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, and Rui Zhao. X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner. AIforCI Workshop at IJCAI 2024 (CCF-A).
  4. Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, and Rui Zhao. SQL-to-Schema Enhances Schema Linking in Text-to-SQL. DEXA 2024.
  5. Jingqing Ruan, Ziyue Li**, Hua Wei, Haoyuan Jiang, Jiaming Lu, Xuantang Xiong, Hangyu Mao**, and Rui Zhao. CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control. KDD 2024 (CCF-A).
  6. Bin Zhang, Hangyu Mao, Lijuan Li, Zhiwei Xu, Dapeng Li, Rui Zhao, and Guoliang Fan. Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach. ICML 2024 (CCF-A).
  7. Yiqun Chen, Hangyu Mao, Jiaxin Mao, Shiguang Wu, Tianle Zhang, Bin Zhang, Wei Yang, and Hongxing Chang. PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning. IJCAI 2024 (CCF-A).
  8. Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, and Rui Zhao. X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner. IJCAI 2024 (CCF-A).
  9. Haoyuan Jiang, Ziyue Li, Zhishuai Li, Lei Bai, Hangyu Mao, Wolfgang Ketter, and Rui Zhao. A General Scenario-Agnostic Reinforcement Learning for Traffic Signal Control. TITS 2024 (CCF-B).
  10. Bin Zhang, Hangyu Mao**, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, and Guoliang Fan. Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach. LLMAgents Workshop at ICLR 2024.
  11. Yilun Kong*, Jingqing Ruan*, Yihong Chen*, Bin Zhang*, Tianpeng Bao*, Shiwei Shi*, Guoqing Du*, Xiaoru Hu*, Hangyu Mao*|**, Ziyue Li, Xingyu Zeng, Rui Zhao, and Xueqian Wang. TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems. LLMAgents Workshop at ICLR 2024.
  12. Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, and Jiangjin Yin. PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning. AAMAS 2024 (CCF-B).
  13. Jiaming Lu, Jingqing Ruan, Haoyuan Jiang, Ziyue Li, Hangyu Mao, and Rui Zhao. DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge. AAMAS 2024 (CCF-B).
  14. Jingqing Ruan*, Yihong Chen*, Bin Zhang*, Zhiwei Xu*, Tianpeng Bao*, Guoqing Du*, Shiwei Shi*, Hangyu Mao*|**, Xingyu Zeng, and Rui Zhao. TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents. FMDM Workshop at NeurIPS 2023 (CCF-A).
  15. Shaokang Dong, Hangyu Mao, Shangdong Yang, Shengyu Zhu, Wenbin Li, Jianye Hao, and Yang Gao. Learning When to Explore in Multi-Agent Reinforcement Learning. TCYB 2023 (CCF-B).
  16. Jingqing Ruan, Xiaotian Hao, Dong Li, and Hangyu Mao. Learning to Collaborate by Grouping: a Consensus-oriented Strategy for Multi-agent Reinforcement Learning. ECAI 2023 (CCF-B).
  17. Mingzhe Xing, Hangyu Mao, Shenglin Yin, Lichen Pang, Zhengchao Zhang, Zhen Xiao, and Jieyi Long. A Dual-Agent Scheduler for Distributed Deep Learning Jobs on Public Cloud via Reinforcement Learning. KDD 2023 (CCF-A).
  18. Jianye Hao, Kun Shao, Kai Li, Dong Li, Hangyu Mao, Shuyue Hu, and Zhen Wang. Research and Applications of Game Intelligence. SCIENTIA SINICA Informationis 2023 (Chinese journal, CCF-T1).
  19. Jianye Hao, Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, and Zhen Wang. Boosting Multiagent Reinforcement Learning via Permutation Invariant and Permutation Equivariant Networks. ICLR 2023.
  20. Ming Yan, Junjie Chen*, Hangyu Mao*, Jiajun Jiang, Jianye Hao, Xingjian Li, Zhao Tian, Zhichao Chen, Dong Li, Zhangkong Xian, Yanwei Guo, Wulong Liu, Bin Wang, Yuefeng Sun, and Yongshun Cui. Achieving Last-Mile Functional Coverage in Testing Chip Design Software Implementations. ICSE 2023 (CCF-A).
  21. Xianjie Zhang, Yu Liu, Hangyu Mao, and Chao Yu. Common Belief Multi-Agent Reinforcement Learning Based on Variational Recurrent Models. Neurocomputing 2022 (CCF-C).
  22. Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Matthew E. Taylor, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, and Xiaotie Deng. Multiagent Q-learning with Sub-Team Coordination. NeurIPS 2022 (CCF-A).
  23. Lichen Pan, Jun Qian, Wei Xia, Hangyu Mao, Jun Yao, Pengze Li, and Zhen Xiao. Optimizing Communication in Deep Reinforcement Learning with XingTian. Middleware 2022 (CCF-B).
  24. Jinpeng Li, Guangyong Chen, Hangyu Mao, Danruo Deng, Dong Li, Jianye Hao, Qi Dou, and Pheng-Ann Heng. Flat-aware Cross-stage Distilled Framework for Imbalanced Medical Image Classification. MICCAI 2022 (CCF-B). Provisional Accept Recommendation (Top 13%).
  25. Mingzhe Xing, Hangyu Mao, and Zhen Xiao. Fast and Fine-grained Autoscaler for Streaming Jobs with Reinforcement Learning. IJCAI 2022 (CCF-A).
  26. Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, and Li Wang. What About Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator. AAAI 2022 (CCF-A).
  27. Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, and Xiaotie Deng. Multiagent Q-learning with Sub-Team Coordination. Extended Abstract at AAMAS 2022 (CCF-B).
  28. Hangyu Mao*, Chao Wang*, Xiaotian Hao*, Yihuan Mao*, Yiming Lu*, Chengjie Wu*, Jianye Hao, Dong Li, and Pingzhong Tang. SEIHAI: A Sample-Efficient Hierarchical AI for the MineRL Competition. DAI 2021. Champion solution of the NeurIPS 2020 MineRL Competition (Top 1 among 90+ teams).
  29. Tianpei Yang*, Weixun Wang*, Hongyao Tang*, Jianye Hao, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Chengwei Zhang, Yujing Hu, Yingfeng Chen, and Changjie Fan. An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning. NeurIPS 2021 (CCF-A).
  30. Xianjie Zhang, Yu Liu, Xiujuan Xu, Qiong Huang, Hangyu Mao, and Anil Carie. Structural Relational Inference Actor-Critic for Multi-Agent Reinforcement Learning. Neurocomputing 2021 (CCF-C).
  31. Changmin Yu, Dong Li, Hangyu Mao, Jianye Hao, and Neil Burgess. Learning State Representations via Temporal Cycle-Consistency Constraint in Model-Based Reinforcement Learning. SSL-RL Workshop at ICLR 2021.
  32. William Hebgen Guss, ..., Hangyu Mao, ..., et al. Towards Robust and Domain Agnostic Reinforcement Learning Competitions: MineRL 2020. NeurIPS 2020 Competition and Demonstration Track (CCF-A).
  33. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, and Yan Ni. Learning Multi-Agent Communication with Double Attentional Deep Reinforcement Learning. JAAMAS 2020 (CCF-B).
  34. Hangyu Mao, Wulong Liu, Jianye Hao, Jun Luo, Dong Li, Zhengchao Zhang, Jun Wang, and Zhen Xiao. Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning. AAAI 2020 (CCF-A). Long Oral Presentation (Top 5%).
  35. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, and Yan Ni. Learning Agent Communication under Limited Bandwidth by Message Pruning. AAAI 2020 (CCF-A).
  36. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, and Zhibo Gong. Modelling the Dynamic Joint Policy of Teammates with Attention Multi-Agent DDPG. AAMAS 2019 (CCF-B).
  37. Hangyu Mao, Yang Xiao, Yuan Wang, Jiakang Wang, and Zhen Xiao. Topic-Specific Retweet Count Ranking for Weibo. PAKDD 2018 (CCF-C).
  38. Yuan Wang, Hangyu Mao, and Zhen Xiao. Identifying Influential Users’ Professions via the Microblogs They Forward. SocInf Workshop at IJCAI 2017 (CCF-A).
  39. Yang Xiao, Yuan Wang, Hangyu Mao, and Zhen Xiao. Predicting Restaurant Consumption Level through Social Media Footprints. COLING 2016 (CCF-B).
Patents

  1. Hangyu Mao, Wulong Liu, and Jianye Hao. Agent Training Method, Apparatus, and Computer-readable Storage Medium. U.S. Patent Application No. 17877063, issued on Nov. 17, 2022.
  2. Hangyu Mao, Yanwei Guo, and Zhangkong Xian. Method and Apparatus for Determining Inputs for Chip Testing. Patent No. 202210626411.6. Integrated into real products.
  3. Junjie Chen, Hangyu Mao, Jianye Hao, Yuefeng Sun, and Jiajun Jiang. Test Case Generation Method, Apparatus, and Storage Medium for Chips. Patent No. 202111663515.6. Integrated into real products. Rated as a Huawei "potential high-value" patent in April 2022.
  4. Zhengchao Zhang, Zhen Xiao, Hangyu Mao, and Lichen Pan. Deep Reinforcement Learning-based Method and System for Cluster Resource Management and Task Scheduling. Patent No. 202010581407.3.
  5. Lichen Pan, Hangyu Mao, Zhen Xiao, and Zhengchao Zhang. Multi-Agent Deep Reinforcement Learning-based Method and System for Cluster Resource Scheduling. Patent No. 202010322543.0.
  6. Hangyu Mao, Wulong Liu, and Jianye Hao. Method and Apparatus for Training an Agent. Patent No. 202010077714.8. Integrated into real products. An international counterpart has also been filed.
  7. Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Yan Ni, and Zhibo Gong. Traffic Scheduling Method and Apparatus. Patent No. 201811505121.6.
  8. Benchao Li, Hangyu Mao, Yang Xiao, and Zhen Xiao. Network Traffic Monitoring Method and Network Device. Patent No. 201710681276.4.
  9. Yang Xiao, Kai Chen, Benchao Li, Hangyu Mao, and Zhen Xiao. Method and Apparatus for Multi-Path Traffic Transmission. Patent No. 201610915269.1.

Research Topics

LLM Research: Agents with Task Planning and Tool Usage
  1. TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Industry Systems. EMNLP 2024 Industry Track.
  2. Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach. LLMAgents Workshop at ICLR 2024.
  3. TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems. LLMAgents Workshop at ICLR 2024.
  4. TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents. FMDM Workshop at NeurIPS 2023.
LLM Research: Text-to-SQL, NL-to-SQL
  1. SQL-to-Schema Enhances Schema Linking in Text-to-SQL. DEXA 2024.
  2. PET-SQL: A Prompt-enhanced Two-stage Text-to-SQL Framework with Cross-consistency. arXiv 2024.
  3. Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation. arXiv 2024.
  4. Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function - with Real Applications in Traffic Domain. arXiv 2023.
Transformer-based RL and MARL
  1. CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control. KDD 2024.
  2. Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach. ICML 2024.
  3. X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner. IJCAI 2024.
  4. PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning. AAMAS 2024.
  5. Transformer in Transformer as Backbone for Deep Reinforcement Learning. arXiv 2022.
MARL Research: Novel Perspectives for Cooperation
  1. Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach. ICML 2024.
  2. PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning. IJCAI 2024.
  3. Boosting Multiagent Reinforcement Learning via Permutation Invariant and Permutation Equivariant Networks. ICLR 2023.
  4. Multiagent Q-learning with Sub-Team Coordination. NeurIPS 2022.
  5. SEIHAI: A Sample-Efficient Hierarchical AI for the MineRL Competition. DAI 2021.
  6. Learning Agent Communication under Limited Bandwidth by Message Pruning. AAAI 2020.
  7. Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning. AAAI 2020.
  8. Modelling the Dynamic Joint Policy of Teammates with Attention Multi-Agent DDPG. AAMAS 2019.
MARL Research: Grouping for Cooperation
  1. CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control. KDD 2024.
  2. Learning to Collaborate by Grouping: a Consensus-oriented Strategy for Multi-agent Reinforcement Learning. ECAI 2023.
  3. Multiagent Q-learning with Sub-Team Coordination. NeurIPS 2022.
  4. Structural Relational Inference Actor-Critic for Multi-Agent Reinforcement Learning. Neurocomputing 2021.
  5. Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning. AAAI 2020.
MARL Research: Cognition Consistency for Cooperation
  1. Common Belief Multi-Agent Reinforcement Learning Based on Variational Recurrent Models. Neurocomputing 2022.
  2. Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning. AAAI 2020.
MARL Research: Attention Mechanism for Cooperation
  1. Learning Multi-Agent Communication with Double Attentional Deep Reinforcement Learning. JAAMAS 2020.
  2. Modelling the Dynamic Joint Policy of Teammates with Attention Multi-Agent DDPG. AAMAS 2019.
MARL Research: Communication for Cooperation
  1. Learning Multi-Agent Communication with Double Attentional Deep Reinforcement Learning. JAAMAS 2020.
  2. Learning Agent Communication under Limited Bandwidth by Message Pruning. AAAI 2020.
  3. Learning Multi-Agent Communication under Limited-Bandwidth Restriction for Internet Packet Routing. arXiv 2019.
  4. Actor-Coordinator-Critic Net for "Learning-to-Communicate" with Deep Multi-agent Reinforcement Learning. arXiv 2017.
MARL Applications: Traffic Signal Control
  1. CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control. KDD 2024.
  2. X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner. IJCAI 2024.
  3. A General Scenario-Agnostic Reinforcement Learning for Traffic Signal Control. IEEE Transactions on Intelligent Transportation Systems 2024.
  4. DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge. AAMAS 2024.
MARL Applications: Cloud Resource Scheduling
  1. A Dual-Agent Scheduler for Distributed Deep Learning Jobs on Public Cloud via Reinforcement Learning. KDD 2023.
  2. Fast and Fine-grained Autoscaler for Streaming Jobs with Reinforcement Learning. IJCAI 2022.
MARL Applications: Internet Packet Routing
  1. Reward Design in Cooperative Multi-Agent Reinforcement Learning for Packet Routing. arXiv 2020.
  2. Learning Multi-Agent Communication under Limited-Bandwidth Restriction for Internet Packet Routing. arXiv 2019.