Energy Storage System Technology · Reinforcement Learning · ★ 5.0

Rethinking Safe Policy Learning for Complex Constraints Satisfaction: A Glimpse in Real-Time Security Constrained Economic Dispatch Integrating Energy Storage Units

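Authors: Jianxiong Hu · Yujian Ye · Yizhi Wu · Peilin Zhao · Liu Liu
Journal: IEEE Transactions on Power Systems
Publication date: June 2024
Technical category: Energy Storage System Technology
Technical tags: Reinforcement Learning
Relevance score: ★★★★★ 5.0 / 5.0
Keywords: real-time security-constrained economic dispatch, reinforcement learning, time-coupling constraints, safe deep reinforcement learning, dispatch method effectiveness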

Abstract

Reinforcement learning (RL) for real-time security-constrained economic dispatch (RT-SCED) problems has been the subject of significant research interest in recent years. However, ordinary RL approaches struggle to ensure that system-level and device-level constraints are satisfied, and have to penalize each constraint violation individually. With the increasing penetration of renewable energy sources (RES), large-scale energy storage is being integrated because of its ability to mitigate RES intermittency. As a result, RT-SCED problems must also satisfy time-coupling constraints. Existing safe RL methods take one of two routes. Some rectify unsafe actions at each time step with a safety layer, which may yield sub-optimal actions at the boundary of the feasible space and may violate time-coupling constraints. Others construct a safety evaluation model, which may violate single-step constraints. To address these limitations, this paper proposes a novel safe deep RL method with safety exploration and safety optimization modules that together support comprehensive satisfaction of both single-step and time-coupling constraints. Furthermore, the policy network uses a residual network architecture and directly computes the real-valued dispatch of all controllable resources, adapting to their distinct power output ranges. Case studies on the IEEE 39-bus and 118-bus test systems validate the effectiveness of the proposed method in cost efficiency, operational security, computational performance, and scalability, compared with state-of-the-art model-driven and data-driven baseline methods.
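
To make the distinction between single-step and time-coupling constraints concrete, the following minimal Python sketch shows how a per-step correction can keep every action inside the power limits while the storage state of charge (SOC) still leaves its bounds over several steps. It is an illustration only, not the paper's method: the function names, the lossless SOC model, and all numeric values (20 MW limit, 100 MWh capacity, 1 h steps, SOC bounds of 0.1 and 0.9) are assumptions chosen for the example.

```python
def clip_power(p_mw, p_max_mw):
    """Single-step correction: clamp requested power to [-p_max, p_max]."""
    return max(-p_max_mw, min(p_max_mw, p_mw))


def soc_trajectory(powers_mw, soc0, capacity_mwh, dt_h=1.0):
    """SOC after each step (positive power = charging, losses ignored)."""
    soc, trajectory = soc0, []
    for p in powers_mw:
        soc = soc + p * dt_h / capacity_mwh  # time-coupling: SOC carries over
        trajectory.append(soc)
    return trajectory


def violates_soc(trajectory, soc_min=0.1, soc_max=0.9, tol=1e-9):
    """True if any SOC value leaves [soc_min, soc_max] (with tolerance)."""
    return any(s < soc_min - tol or s > soc_max + tol for s in trajectory)


# Hypothetical discharge requests from an unconstrained policy, in MW.
raw_actions = [-30.0, -30.0, -30.0, -30.0]

# Each action is made feasible on its own (power limit respected) ...
stepwise_safe = [clip_power(p, p_max_mw=20.0) for p in raw_actions]

# ... yet the accumulated discharge drives SOC below its lower bound.
trajectory = soc_trajectory(stepwise_safe, soc0=0.5, capacity_mwh=100.0)
print(stepwise_safe)
print(violates_soc(trajectory))
```

In this toy case every corrected action satisfies the single-step power limit, but the SOC bound is broken from the third step onward, which is the kind of failure the abstract attributes to per-step safety layers.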

SunView In-Depth Interpretation

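This security-constrained economic dispatch technique has significant application value for Sungrow's PowerTitan energy storage system and the iSolarCloud cloud platform. Reinforcement learning combined with hierarchical constraint modeling can be applied directly to real-time dispatch optimization of the ST-series power conversion systems, with a safety-aware reward mechanism ensuring that the storage system satisfies SOC constraints, power ramp-rate limits, and grid security constraints when it participates in frequency regulation and peak shaving and valley filling. The method can be integrated into the iSolarCloud intelligent operations and maintenance platform to coordinate optimal dispatch across multiple storage sites, improving economic returns while keeping system operation secure. In integrated PV-plus-storage scenarios, the technique can optimize the coordinated control strategy between SG inverters and storage systems, improving renewable energy accommodation and grid friendliness, and providing theoretical support for Sungrow in building an intelligent dispatch decision engine.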
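
As a rough illustration of how SOC limits, ramp-rate limits, and power ratings interact for one storage unit, the sketch below computes the feasible power interval for the next dispatch step and projects a requested setpoint into it. This is a simplified, myopic one-step check under assumed values, not the paper's safety modules and not a Sungrow or iSolarCloud API; `feasible_power_range`, `project_setpoint`, and every parameter value are hypothetical.

```python
def feasible_power_range(soc, p_prev_mw, p_max_mw, ramp_mw, capacity_mwh,
                         dt_h=1.0, soc_min=0.1, soc_max=0.9):
    """Feasible [low, high] power for the next step (positive = charging).

    Intersects three intervals: rated power, ramp rate around the previous
    setpoint, and the power that keeps SOC inside [soc_min, soc_max] after
    one step of a lossless storage model.
    """
    soc_low = (soc_min - soc) * capacity_mwh / dt_h   # most discharge allowed
    soc_high = (soc_max - soc) * capacity_mwh / dt_h  # most charge allowed
    low = max(-p_max_mw, p_prev_mw - ramp_mw, soc_low)
    high = min(p_max_mw, p_prev_mw + ramp_mw, soc_high)
    if low > high:
        raise ValueError("no feasible setpoint for the given limits")
    return low, high


def project_setpoint(p_request_mw, low, high):
    """Move the requested setpoint to the nearest point in [low, high]."""
    return max(low, min(high, p_request_mw))


# Hypothetical unit: nearly empty, idle in the previous interval.
low, high = feasible_power_range(soc=0.15, p_prev_mw=0.0, p_max_mw=20.0,
                                 ramp_mw=10.0, capacity_mwh=100.0)
discharge = project_setpoint(-30.0, low, high)  # limited by the SOC floor
charge = project_setpoint(15.0, low, high)      # limited by the ramp rate
print(low, high, discharge, charge)
```

Here the discharge request is bounded by the SOC floor rather than by the rated power, while the charge request is bounded by the ramp rate. A check like this only looks one step ahead, so it does not by itself guarantee good behavior over a full dispatch horizon.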