Human-in-the-loop Reinforcement Learning Method for Volt/Var Control in Active Distribution Network with Safe Operation Mechanism
| Authors | Yuechuan Tao · Zhao Yang Dong · Jing Qiu · Shuying Lai · Xianzhuo Sun · Jiaqi Ruan |
| Journal | IEEE Transactions on Sustainable Energy |
| Publication Date | June 2025 |
| Technology Category | Energy Storage System Technology |
| Tags | Energy Storage Systems · Reinforcement Learning |
| Relevance Score | ★★★★★ 5.0 / 5.0 |
| Keywords | distributed energy resources · active distribution networks · human-in-the-loop deep reinforcement learning · Volt/Var control · security-clipped proximal policy optimization (SC-PPO) |
Chinese Abstract (translated)
The integration of distributed energy resources makes the operation of active distribution networks increasingly complex, and traditional voltage regulation methods struggle to cope. This paper proposes a human-in-the-loop deep reinforcement learning (HITL-DRL) framework that incorporates human experience, together with a Security-Clipped Proximal Policy Optimization (SC-PPO) algorithm that guarantees safety during learning. Three intervention strategies (human demonstration, human feedback, and adversary setting) improve learning efficiency and interpretability. Simulations on the IEEE 33-bus system show faster convergence and stronger robustness than conventional DRL algorithms, a 73.4% reduction in the voltage violation rate, decision times under 1 millisecond, and performance close to the optimal solution, indicating real-time application potential.
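One of the intervention strategies above, human demonstration, is commonly implemented by mixing expert demonstrations into the agent's training minibatches. The paper's exact mechanism is not detailed in the abstract, so the sketch below is only an illustrative example of this general technique; all names and the `demo_frac` parameter are hypothetical.

```python
import random

def mixed_minibatch(agent_buffer, demo_buffer, batch_size=8, demo_frac=0.25, rng=None):
    """Sample a training minibatch that blends agent experience with human
    demonstrations -- one common way to inject expert knowledge into DRL
    training (illustrative; the paper's mechanism may differ)."""
    rng = rng or random.Random(0)
    # Reserve a fixed fraction of the batch for expert demonstrations.
    n_demo = min(len(demo_buffer), int(batch_size * demo_frac))
    batch = rng.sample(demo_buffer, n_demo)
    # Fill the remainder with the agent's own experience.
    batch += rng.sample(agent_buffer, batch_size - n_demo)
    rng.shuffle(batch)
    return batch
```

Raising `demo_frac` early in training and decaying it later is a typical design choice: demonstrations accelerate initial convergence, while self-collected experience dominates once the policy is competent.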
English Abstract
In recent years, distributed energy resources (DERs) have been increasingly integrated into distribution networks. DERs improve the flexibility and economy of active distribution networks (ADNs) while introducing greater complexity and new challenges in maintaining stable and efficient system operation. Traditional voltage regulation methods struggle to cope with these complexities, highlighting the need for more advanced and adaptive control strategies for fast-response PVs and battery energy storage systems (BESS). This paper proposes a novel human-in-the-loop deep reinforcement learning (HITL-DRL) framework for Volt/Var control in ADNs, addressing the limitations of existing approaches by integrating human experience and knowledge into the learning process. Additionally, a Security-Clipped Proximal Policy Optimization (SC-PPO) algorithm is introduced to ensure safe operation during reinforcement learning. The paper explores three human-intervention strategies, namely human demonstration, human feedback, and adversary setting, which enhance the learning process by leveraging expert knowledge and experience. The proposed HITL-DRL framework demonstrates improved convergence speed, robustness, reduced exploration risk, and increased interpretability and trust, paving the way for more effective voltage regulation in complex power systems. The method is verified on the IEEE 33-bus system, where it outperforms standard DRL algorithms in training speed and robustness, achieving the highest average reward and the second-fastest computational time. Compared to traditional PPO, it significantly excels in managing unforeseen contingencies, reducing the voltage violation rate by 73.4%. Compared with the model-based method, the HITL-DRL strategy closely matches the optimization results in terms of energy loss and voltage violation rate, while offering a clear advantage in decision-making time: it responds within 1 millisecond and can thus rapidly adapt to time-varying conditions in ADNs.
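The abstract does not give the exact form of the SC-PPO objective, but the general idea of combining PPO's clipped surrogate with a penalty on voltage-limit violations can be sketched as follows. The function name, the voltage-deviation signal, and the `penalty` weight are assumptions for illustration, not the paper's formulation.

```python
def sc_ppo_surrogate(ratio, advantage, eps=0.2, v_dev=0.0, v_limit=0.05, penalty=10.0):
    """Illustrative single-sample surrogate for a security-clipped PPO update.

    ratio     -- pi_new(a|s) / pi_old(a|s), the importance-sampling ratio
    advantage -- estimated advantage A(s, a)
    eps       -- standard PPO clipping range
    v_dev     -- per-unit voltage deviation caused by the action (assumed signal)
    v_limit   -- allowed deviation band, e.g. +/-0.05 p.u. (assumed)
    penalty   -- safety-penalty weight (hypothetical)
    """
    # Standard PPO clipped surrogate: a pessimistic bound on policy improvement.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    surrogate = min(ratio * advantage, clipped * advantage)

    # Security clipping (sketch): subtract a penalty proportional to how far
    # the action drives the bus voltage outside the allowed band, steering the
    # policy away from unsafe exploration.
    violation = max(0.0, abs(v_dev) - v_limit)
    return surrogate - penalty * violation
```

In a full training loop this per-sample value would be averaged over a minibatch and maximized by gradient ascent, exactly as in standard PPO; only the safety term differs.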
SunView In-Depth Analysis
This human-in-the-loop reinforcement learning Volt/Var control technique has significant application value for Sungrow's distribution-side energy storage systems. The safety-constraint mechanism of the SC-PPO algorithm can be applied directly to the voltage-regulation strategy of the PowerTitan storage system, safeguarding operation in grid-connection scenarios with distributed PV. Its millisecond-level decision response matches the real-time control requirements of the ST-series storage power conversion systems, and the 73.4% reduction in the voltage violation rate can markedly improve power quality in distribution networks with a high share of SG inverters. The human-experience integration mechanism also offers a new approach for intelligent O&M on the iSolarCloud platform: expert operation and maintenance knowledge can be distilled into adaptive control strategies, strengthening robustness under complex operating conditions and advancing storage systems from passive response toward active regulation.