I am an assistant professor and master’s supervisor at the School of Computer Science, where I do fundamental research on reinforcement learning and multi-agent systems in the AI lab led by Professor Lei Chen (陈蕾). I am also recruiting graduate students; if you are interested, feel free to email me at sdyang@njupt.edu.cn.

I received my bachelor’s degree from the School of Information Science and Technology, Southwest Jiaotong University (西南交通大学信息学院) in 2013 and my Ph.D. from the Department of Computer Science and Technology, Nanjing University (南京大学计算机系) in 2020, advised by Professor Yang Gao (高阳).

My research interests include reinforcement learning, multi-agent systems, and multi-armed bandits. I have published more than 20 papers in top international AI conferences and journals such as AAAI, AAMAS, TNNLS, and TCYB.

🔥 News

2024.08: One paper on “Multi-agent reinforcement learning” was accepted by ESWA.
2024.06: One paper on “Multi-agent reinforcement learning” was accepted by TITS.
2024.04: One paper on “Multi-agent reinforcement learning” was accepted by IJCAI 2024.
2024.04: Two papers on “Reinforcement learning” and “Multi-agent reinforcement learning” were accepted by the Journal of Software.
2024.04: One paper on “Reinforcement learning” was accepted by the Chinese Journal of Computers.
2023.12: One paper on “Multi-agent reinforcement learning” was accepted by ICASSP 2024.
2023.10: One paper on “Multi-agent reinforcement learning” was accepted by TCYB.

📝 Publications

🐕 Reinforcement Learning

Shangdong Yang, Dingyuanhao Sun, Xingguo Chen. Off-Policy Temporal Difference Learning with Bellman Residuals, Mathematics, 2024.
Xingguo Chen, Guang Yang, Shangdong Yang, Huihui Wang, Shaokang Dong, Yang Gao. Online Attentive Kernel-Based Temporal Difference Learning, Knowledge-Based Systems, 2023.
Shangdong Yang, Miaoying Yu, Xingguo Chen, Wenbin Li, Lei Chen. Group-wise Contrastive Learning Based Sequence-aware Skill Discovery, CCFAI 2023 best paper/Journal of Software, 2024.
Hongye Cao, Xiao Liu, Shaokang Dong, Shangdong Yang, Jing Huo, Wenbin Li, Yang Gao. A Survey of Interpretability Research Methods for Reinforcement Learning, Chinese Journal of Computers, 2024, 8: 1853-1882.
Hongye Cao, Shangdong Yang, Jing Huo, Xingguo Chen, Yang Gao. Enhancing OOD Generalization in Offline Reinforcement Learning with Energy-Based Policy Optimization, ECAI 2023.
Xingguo Chen, Xingzhou Ma, Yang Li, Guang Yang, Shangdong Yang, Yang Gao. Modified Retrace for Off-Policy Temporal Difference Learning, UAI 2023.
Shangdong Yang, Huihui Wang, Shaokang Dong, Xingguo Chen. Leveraging Transition Exploratory Bonus for Efficient Exploration in Hard-Transiting Reinforcement Learning Problems, Future Generation Computer Systems, 2023, 145: 442-453.
Xiao Liu, Shuyang Liu, Bo An, Yang Gao, Shangdong Yang, Wenbin Li. Effective Interpretable Policy Distillation via Critical Experiences Identification, IEEE Intelligent Systems, 2023.
Xingguo Chen, Dingyuanhao Sun, Guang Yang, Shangdong Yang, Yang Gao. A Survey of Reinforcement Learning Algorithms from a Fixed Point Perspective, Chinese Journal of Computers, 2023.
Shangdong Yang, Yang Gao, Bo An, Hao Wang, Xingguo Chen. Efficient Average Reward Reinforcement Learning Using Constant Shifting Values, AAAI 2016.

🧑🏻‍🤝‍🧑🏼 Multi-agent Systems

Shaokang Dong, Chao Li, Shangdong Yang, Wenbin Li, Yang Gao. Decentralized Counterfactual Value with Threat Detection for Multi-Agent Reinforcement Learning in Mixed Cooperative and Competitive Environments, Expert Systems with Applications, 2024, 257: 125116.
Wubing Chen, Shangdong Yang, Wenbin Li, Yujing Hu, Xiao Liu, Yang Gao. Learning Multi-intersection Traffic Signal Control via Coevolutionary Multi-Agent Reinforcement Learning, IEEE Transactions on Intelligent Transportation Systems, 2024.
Chao Li, Yujing Hu, Shangdong Yang, Tangjie Lv, Changjie Fan, Wenbin Li, Chongjie Zhang, Yang Gao. STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations, IJCAI 2024.
Shaokang Dong, Chao Li, Guang Yang, Zhenxing Ge, Hongye Cao, Wubing Chen, Shangdong Yang, Xingguo Chen, Wenbin Li, Yang Gao. Survey on Solutions and Applications for Mixed-motive Games, Journal of Software, 2024.
Chao Li, Shaokang Dong, Shangdong Yang, Hongye Cao, Wenbin Li, Yang Gao. Multi-agent Sparse Interaction Modeling Is an Anomaly Detection Problem, ICASSP 2024.
Shaokang Dong, Hangyu Mao, Shangdong Yang, Shengyu Zhu, Wenbin Li, Jianye Hao, Yang Gao. WToE: Learning When to Explore in Multi-Agent Reinforcement Learning, IEEE Transactions on Cybernetics, 2023.
Yunkai Zhuang, Shangdong Yang, Wenbin Li, Yang Gao. Convergence Analysis of Graphical Game-based Nash Q-learning Using the Interaction Detection Signal of N-step Return, ICASSP 2023.
Wubing Chen, Wenbin Li, Xiao Liu, Shangdong Yang, Yang Gao. Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient, AAAI 2023.
Zhenxing Ge, Shangdong Yang, Pinzhuo Tian, Zixuan Chen, Yang Gao. Modeling Rationality: Toward Better Performance Against Unknown Agents in Sequential Games, IEEE Transactions on Cybernetics, 2023.

🎰 Multi-armed Bandits

Shangdong Yang, Yang Gao. An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-optimal Mean Reward, IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(5): 2285-2291.
Shangdong Yang, Hao Wang, Chenyu Zhang, Yang Gao. Contextual Bandits with Hidden Features to Online Recommendation Via Sparse Interactions, IEEE Intelligent Systems, 2020, 35(5): 62-72.
Chenyu Zhang, Hao Wang, Shangdong Yang, Yang Gao. A Contextual Bandit Approach to Personalized Online Recommendation via Sparse Interactions, PAKDD 2019.
Shangdong Yang, Hao Wang, Yang Gao, Xingguo Chen. An Optimal Algorithm for the Stochastic Bandits with Knowing Near-optimal Mean Reward, AAMAS 2018.

🍀 Others

Fan Meng, Qunli Yang, Zhengda He, Shangdong Yang*, Weidong Tang. GUARD: Multigranularity-based Unsupervised Anomaly Detection Algorithm for Multivariate Time Series, CCIS 2022.
Yansheng Wu, Chengju Li, Shangdong Yang*. New Galois Hulls of Generalized Reed-Solomon Codes, Finite Fields and Their Applications, 2022, 83: 102084.
Chenyu Zhang, Hao Wang, Shangdong Yang, Yang Gao. Incremental Nonnegative Matrix Factorization Based on Matrix Sketching and k-means Clustering, IDEAL 2016.

🏆 Honors and Awards

2023.07, Outstanding Paper Award, CCFAI 2023
2022.09, Youth Fund of the National Natural Science Foundation of China, NSFC
2022.07, State Key Laboratory of Novel Software Technology Project, Nanjing University
2020.10, Double-Innovation Doctor Project, Jiangsu Province
2018.10, Innovation Ability Improvement Plan for Excellent Doctoral Students, Nanjing University
2017.10, CETC The 14th Research Institute Glarun Scholarship, Nanjing University
2010.10, National Scholarship, Southwest Jiaotong University

👨‍🎓 Education

2013.09 - 2020.06, Ph.D., Department of Computer Science and Technology, Nanjing University, Nanjing.
2016.10 - 2016.11, Visiting student, Data Science Lab led by Professor Longbing Cao, University of Technology Sydney (UTS).
2009.09 - 2013.06, Bachelor’s degree, School of Information Science and Technology, Southwest Jiaotong University, Chengdu.
2006.09 - 2009.06, Huaiyin High School, Huai’an.

💬 Invited Talks

2018.08, Excellent Paper Report, the 5th Seminar on Agents and Multi-agent System of CCF

📚 Courses

Python Programming (Blended Teaching) (for NJUPT undergraduate students, Spring, 2025)
Python Programming (Blended Teaching) (for NJUPT undergraduate students, Fall, 2023-2024)
Data Structure (for NJUPT undergraduate students, Fall and Spring, 2021-2022)

Shangdong Yang