Fast-converging Deep Reinforcement Learning for Optimal Dispatch of Large-scale Power Systems Under Transient Security Constraints

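Journal	Journal of Modern Power Systems and Clean Energy
Publication date	September 2025
Volume/Issue	Vol. 2025, No. 5
Technical category	Intelligence and AI applications
Technical tags	Reinforcement learning, deep learning, grid integration technology, peak shaving and frequency regulation
Relevance score	★★★★ 4.0 / 5.0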

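Chinese Abstract (translated)

This paper addresses the high-dimensional state and action spaces and the sparse-reward problem that deep reinforcement learning faces in transient security-constrained optimal power flow (TSC-OPF). It proposes an improved MDP formulation and the DDPG-CL-PE-ED algorithm, which markedly improve training efficiency and decision accuracy, and validates them on the IEEE 39-bus system and a practical 710-bus power grid.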
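The ensemble decision-making (ED) component named in DDPG-CL-PE-ED can be pictured with a minimal Python sketch. The selection rule used here (every trained actor proposes a dispatch and the highest-scoring proposal wins) is an assumption made for illustration; this summary does not state how the paper combines its agents, and `ensemble_decide`, `actors`, `score`, and the toy helpers are hypothetical names.

```python
from typing import Callable, List, Sequence

State = List[float]
Action = List[float]


def ensemble_decide(
    state: State,
    actors: Sequence[Callable[[State], Action]],
    score: Callable[[State, Action], float],
) -> Action:
    """Return the candidate action with the highest score.

    Assumption: `score` stands in for whatever evaluator is available,
    e.g. a critic network or a transient simulation of the dispatch.
    """
    if not actors:
        raise ValueError("at least one actor is required")
    candidates = [actor(state) for actor in actors]
    return max(candidates, key=lambda action: score(state, action))


# Toy usage: three hypothetical "actors" scale the load vector differently,
# and the hypothetical score prefers total generation close to total load.
def make_actor(scale: float) -> Callable[[State], Action]:
    return lambda state: [scale * x for x in state]


def balance_score(state: State, action: Action) -> float:
    return -abs(sum(action) - sum(state))


toy_actors = [make_actor(0.8), make_actor(1.0), make_actor(1.3)]
best_action = ensemble_decide([100.0, 50.0], toy_actors, balance_score)
```

Selecting among several independently trained actors is one simple way to reduce the impact of any single poorly converged policy; averaging the proposals would be an equally plausible alternative.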

English Abstract

Power system optimal dispatch with transient security constraints is commonly represented as transient security-constrained optimal power flow (TSC-OPF). Deep reinforcement learning (DRL)-based TSC-OPF trains efficient decision-making agents that are adaptable to various scenarios and provide solution results quickly. However, due to the high dimensionality of the state space and action spaces, as well as the non-smoothness of dynamic constraints, existing DRL-based TSC-OPF solution methods face a significant challenge of the sparse reward problem. To address this issue, a fast-converging DRL method for optimal dispatch of large-scale power systems under transient security constraints is proposed in this paper. The Markov decision process (MDP) modeling of TSC-OPF is improved by reducing the observation space and smoothing the reward design, thus facilitating agent training. An improved deep deterministic policy gradient algorithm with curriculum learning, parallel exploration, and ensemble decision-making (DDPG-CL-PE-ED) is introduced to drastically enhance the efficiency of agent training and the accuracy of decision-making. The effectiveness, efficiency, and accuracy of the proposed method are demonstrated through experiments in the IEEE 39-bus system and a practical 710-bus regional power grid. The source code of the proposed method is made public on GitHub.
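To make the sparse-reward issue and the idea of "smoothing the reward design" concrete, the following Python sketch contrasts two reward functions. It is an illustration under stated assumptions, not the paper's formulation: `stability_margin` (negative when the dispatch is transiently insecure), `penalty_weight`, and the flat penalty of -1000 are all hypothetical.

```python
def sparse_reward(cost: float, stability_margin: float) -> float:
    """Flat penalty for any insecure dispatch (hypothetical value -1000).

    Every insecure action looks equally bad, so the agent gets no hint
    about which direction moves it toward security.
    """
    return -cost if stability_margin >= 0.0 else -1000.0


def smoothed_reward(
    cost: float, stability_margin: float, penalty_weight: float = 10.0
) -> float:
    """Penalty grows with the size of the violation (assumed linear form).

    A less severe violation earns a higher reward, which gives the agent
    a usable learning signal even before it finds a secure dispatch.
    """
    violation = max(0.0, -stability_margin)
    return -cost - penalty_weight * violation


# Two insecure dispatches with the same cost but different violation sizes.
mild = smoothed_reward(cost=5.0, stability_margin=-0.5)
severe = smoothed_reward(cost=5.0, stability_margin=-2.0)
```

With the sparse design both insecure dispatches receive the same reward, whereas the smoothed design ranks the milder violation higher.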

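SunView In-depth Analysis

This study is a valuable reference for grid-level coordinated dispatch involving Sungrow's iSolarCloud intelligent O&M platform and the PowerTitan/ST series energy storage PCS. Its fast-converging DRL algorithm could be embedded in iSolarCloud to support joint PV-storage frequency regulation and peak shaving decisions with millisecond-level transient security awareness, improving the dynamic support capability of grid-forming PCS in weak grids. It is recommended to integrate the algorithm framework into the PowerTitan cluster control module to strengthen its autonomous response to fault disturbances in new-type power systems, with pilot deployments prioritized in regions with a high share of renewables such as Qinghai and Gansu.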