Human-in-the-Loop Reinforcement Learning Method for Volt/Var Control in Active Distribution Network With Safe Operation Mechanism
| Authors | Yuechuan Tao · Zhao Yang Dong · Jing Qiu · Shuying Lai · Xianzhuo Sun · Jiaqi Ruan · Jiafeng Lin · Junhua Zhao |
| Journal | IEEE Transactions on Sustainable Energy |
| Publication Date | June 2025 |
| Volume/Issue | Vol. 17, No. 1 |
| Technical Category | Control and Algorithms |
| Technical Tags | Reinforcement Learning · Deep Learning · Grid-Connected Inverters · Microgrids |
| Relevance Score | ★★★★★ 5.0 / 5.0 |
| Keywords | |
Chinese Abstract (translated)
This paper proposes a human-in-the-loop deep reinforcement learning (HITL-DRL) framework that combines human expert experience with a Security-Clipped PPO algorithm to enable fast, coordinated Volt/Var control of PV and battery energy storage systems in active distribution networks, reducing the voltage violation rate by 73.4% while improving robustness and interpretability.
English Abstract
In recent years, distributed energy resources (DERs) have been increasingly integrated into the distribution network. DERs improve the flexibility and economy of active distribution networks (ADNs) while introducing increased complexity and challenges in maintaining stable and efficient system operation. Traditional voltage regulation methods struggle to cope with these complexities, highlighting the need for more advanced and adaptive control strategies for fast-response PVs and battery energy storage systems (BESS). This paper proposes a novel human-in-the-loop deep reinforcement learning (HITL-DRL) framework for Volt/Var control in ADNs, addressing the limitations of existing approaches by integrating human experience and knowledge into the learning process. Additionally, a Security-Clipped Proximal Policy Optimization (SC-PPO) algorithm is introduced to ensure safe operation during reinforcement learning. The paper explores three human-intervention strategies: human demonstration, human feedback, and setting an adversary, which enhance the learning process by leveraging expert knowledge and experience. The proposed HITL-DRL framework demonstrates improved convergence speed, robustness, reduced exploration risk, and increased interpretability and trust, paving the way for more effective voltage regulation in complex power systems. The proposed HITL-DRL method is verified on the IEEE 33-bus system, demonstrating superior performance over standard DRL algorithms in training speed and robustness, achieving the highest average reward and the second-fastest computational time. Compared to traditional PPO, the method significantly excels at managing unforeseen contingencies, reducing the voltage violation rate by 73.4%. Compared with the model-based method, the HITL-DRL strategy is very close to the optimization results in terms of energy loss and voltage violation rate, while showing a clear advantage in decision-making time: it responds within 1 millisecond and can rapidly adapt to time-varying conditions in ADNs.
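The abstract names a Security-Clipped PPO (SC-PPO) objective but does not spell it out. As a purely illustrative sketch (the paper's exact formulation is not given here), one plausible reading is a standard PPO clipped surrogate whose trust region is tightened on transitions flagged as unsafe, e.g. samples where a voltage limit was violated. The function names, the `unsafe` mask, and the `eps_safe` parameter below are assumptions, not the authors' definitions:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped-surrogate objective (to be maximized)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

def security_clipped_loss(ratio, advantage, unsafe, eps=0.2, eps_safe=0.05):
    """Hypothetical security-clipped variant: on transitions flagged as
    unsafe (e.g., a bus voltage outside limits), the clipping range is
    narrowed so the policy update moves only cautiously on those samples."""
    eps_eff = np.where(unsafe, eps_safe, eps)  # per-sample trust region
    clipped = np.clip(ratio, 1.0 - eps_eff, 1.0 + eps_eff) * advantage
    return np.minimum(ratio * advantage, clipped)
```

For a probability ratio of 1.5 with positive advantage, the standard objective caps the update at 1 + 0.2, while the security-clipped variant caps an unsafe sample at the much tighter 1 + 0.05, which is one way such a mechanism could reduce exploration risk during training.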
SunView In-Depth Analysis
This study aligns closely with Sungrow's strategic focus on intelligent coordinated PV-storage control. Its HITL-DRL framework could directly enable adaptive Volt/Var regulation for the ST-series PCS, PowerTitan, and the iSolarCloud platform, improving the dynamic reactive-power response accuracy and safety of string inverters under weak-grid conditions. We suggest embedding the SC-PPO algorithm in iSolarCloud edge controllers to enable online safe policy optimization across fleets of distributed PV plants, and to provide a verifiable AI control paradigm for grid-forming PV-storage systems.