Intelligence and AI Applications ★ 5.0

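A Reinforcement-Learning-Based Optimal Method for In Situ Power Hardware-in-the-Loop Interface Control: Theory and Application of Hybrid-Iteration-Based Adaptive Dynamic Programming to Tackle Uncertain Dynamics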

A Reinforcement-Learning, Optimal Approach to In Situ Power Hardware-in-the-Loop Interface Control for Testing Inverter-Based Resources: Theory and Application of the Adaptive Dynamic Programming Based on the Hybrid Iteration to Tackle Uncertain Dynamics

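Authors: Masoud Davari · Omar Qasem · Weinan Gao · Frede Blaabjerg · Panos C. Kotsampopoulos · Georg Lauss
Journal: IEEE Transactions on Industrial Electronics
Publication date: November 2024
Technical category: Intelligence and AI Applications
Relevance score: ★★★★★ 5.0 / 5.0
Keywords: inverter-based resources, power hardware-in-the-loop interface control, adaptive dynamic programming, hybrid iteration, test performance comparison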

Chinese Abstract (English translation)

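Testing inverter-based resources (IBRs) is crucial. This paper proposes a novel power hardware-in-the-loop (PHIL) interface control (PHIL-IC) method that uses a reinforcement-learning approach based on adaptive dynamic programming (ADP, also called approximate dynamic programming) to strengthen PHIL-simulation-based testing of IBRs. Because the dynamics (states and disturbances) of the entire system associated with the IBR, the power amplifier, all components involved in PHIL-simulation-based testing, and their delays are "unavailable" or "uncertain", the method uses output feedback control; it designs the PHIL-IC optimally while accounting for all uncertainties and unavailable information of every system involved. To this end, the proposed ADP-based PHIL-IC adopts a new hybrid iteration (HI) method that differs from conventional ADP strategies: unlike the policy iteration method, the HI algorithm does not require prior knowledge of an admissible control policy. Moreover, the proposed HI method has a quadratic rate of convergence and converges much faster than the value iteration method, so it saves considerable learning time and iterations compared with value iteration. Comparing PHIL-simulation-based test results obtained with the proposed method against those obtained with a proportional-resonant controller (the conventional PHIL-IC) and a μ-synthesis-based robust PHIL-IC (the current state-of-the-art PHIL-IC) demonstrates the effectiveness and practicality of the proposed method. These comparisons use the ideal transformer model (also called the voltage-type interface) commonly employed in PHIL-simulation-based testing, together with practical cases of the Thévenin equivalent impedance (resistive, resistive-inductive, and inductive) of the grid-related model of interest.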

English Abstract

Testing inverter-based resources (IBRs) is of utmost importance. This paper proposes a novel power hardware-in-the-loop (PHIL) interface control (PHIL-IC) employing a reinforcement-learning approach based on adaptive dynamic programming (ADP, also known as approximate dynamic programming) to enhance the PHIL-simulation-based testing of IBRs by virtue of an ADP-based method. It deploys output feedback control because of “unavailable” or “uncertain” dynamics of the entire systems (states and disturbances) linked to IBRs, power amplifiers, all the components associated with the PHIL-simulation-based testing, and their delays; it optimally designs PHIL-IC while considering all uncertainties and unavailable information about all the systems involved. To this end, the proposed ADP-based PHIL-IC utilizes a new hybrid iteration (HI) method, which differs from the traditional ADP strategies; compared with the policy iteration method, the HI algorithm does not require prior knowledge of an admissible control policy. Moreover, with a quadratic rate of convergence, the proposed HI method converges much faster than the value iteration method. Therefore, the proposed HI method saves significant learning time and iterations compared to the value iteration method. Comparing the results of the PHIL-simulation-based testing utilizing the proposed method with those of the proportional-resonant controller (as the conventional PHIL-IC) and the robust PHIL-IC based on μ-synthesis (as the current state-of-the-art PHIL-IC) reveals the effectiveness and practicality of the proposed method. Those comparative results are generated by the ideal transformer model (also known as voltage-type interface) commonly used in the PHIL-simulation-based testing and practical cases of the Thévenin equivalent impedance (resistive, resistive-inductive, and inductive ones) of the model of interest associated with the power networks.
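The trade-off between value iteration (VI), policy iteration (PI) and hybrid iteration (HI) described in the abstract can be illustrated on a toy problem. The sketch below is a minimal, model-based illustration under assumptions that are ours, not the paper's: a known scalar discrete-time plant x[k+1] = A·x[k] + B·u[k] with quadratic cost, made-up numbers, and a switching test that simply checks the closed-loop pole because the toy model is known. It is not the paper's data-driven, output-feedback PHIL-IC. VI starts from P = 0 and needs no stabilizing gain but converges only gradually; PI converges quadratically but must start from an admissible (stabilizing) gain. The hybrid scheme runs VI until its gain becomes admissible and then switches to PI, which, as we understand it, is the idea behind HI.

```python
import math

# Hypothetical toy plant x[k+1] = A*x[k] + B*u[k], cost sum(Q*x^2 + R*u^2).
# A > 1 makes the plant open-loop unstable, so the zero gain is NOT admissible.
A, B, Q, R = 1.05, 0.1, 1.0, 1.0
TOL = 1e-10


def gain(p):
    """Gain K (u = -K*x) that is greedy with respect to the value estimate p."""
    return A * B * p / (R + B * B * p)


def is_admissible(k):
    """True if u = -k*x stabilizes the (known) toy plant."""
    return abs(A - B * k) < 1.0


def vi_step(p):
    """One value-iteration (Riccati difference equation) update."""
    return Q + A * A * p - (A * B * p) ** 2 / (R + B * B * p)


def pi_evaluate(k):
    """Policy evaluation: solve P = Q + R*k^2 + (A - B*k)^2 * P for P."""
    if not is_admissible(k):
        raise ValueError("policy evaluation needs an admissible gain")
    acl = A - B * k
    return (Q + R * k * k) / (1.0 - acl * acl)


def riccati_exact():
    """Positive root of B^2 P^2 + (R - Q B^2 - A^2 R) P - Q R = 0."""
    c2, c1, c0 = B * B, R - Q * B * B - A * A * R, -Q * R
    return (-c1 + math.sqrt(c1 * c1 - 4.0 * c2 * c0)) / (2.0 * c2)


def value_iteration(p=0.0, max_iter=10_000):
    """VI from p = 0: no admissible initial policy needed, slow convergence."""
    for n in range(1, max_iter + 1):
        p_next = vi_step(p)
        if abs(p_next - p) < TOL:
            return p_next, n
        p = p_next
    raise RuntimeError("value iteration did not converge")


def hybrid_iteration(p=0.0, max_iter=10_000):
    """VI until the induced gain is admissible, then PI (quadratic convergence)."""
    n = 0
    while not is_admissible(gain(p)):  # stage 1: value iteration
        p = vi_step(p)
        n += 1
        if n >= max_iter:
            raise RuntimeError("no admissible gain found")
    while n < max_iter:  # stage 2: policy iteration from an admissible gain
        p_next = pi_evaluate(gain(p))
        n += 1
        if abs(p_next - p) < TOL:
            return p_next, n
        p = p_next
    raise RuntimeError("policy iteration did not converge")


if __name__ == "__main__":
    p_star = riccati_exact()
    p_vi, n_vi = value_iteration()
    p_hi, n_hi = hybrid_iteration()
    print(f"exact P = {p_star:.6f}")
    print(f"VI: P = {p_vi:.6f} after {n_vi} iterations")
    print(f"HI: P = {p_hi:.6f} after {n_hi} iterations")
```

Both schemes reach the same Riccati solution, but the hybrid run needs far fewer iterations because only the first few steps are VI; the exact counts depend on the made-up plant numbers.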

SunView In-Depth Analysis

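From Sungrow Power Supply's business perspective, this power hardware-in-the-loop (PHIL) testing technology based on adaptive dynamic programming has significant strategic value. It directly addresses the challenge of testing inverter-based resources (IBRs), which is closely tied to the R&D verification workflow for our core products such as PV inverters and energy storage converters.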

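The reinforcement-learning method proposed in the paper overcomes limitations of conventional PHIL testing. In our current grid-connection tests of inverters, we often face uncertain system dynamics and delays that are hard to compensate, which cause test results to deviate from real operating conditions. Because the method uses output feedback control, it does not require an accurate state-space model of the entire system, which is especially useful for complex energy storage systems and multi-unit parallel configurations. Its hybrid iteration algorithm converges at a quadratic rate, much faster than the value iteration method, which can substantially shorten our product test cycles and speed up development iterations.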

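In terms of application value, the technology can markedly improve test accuracy for our high-power inverters, energy storage PCS, and other products under weak-grid and high-impedance conditions. The paper verifies the method's effectiveness for different types of Thévenin equivalent impedance (resistive, resistive-inductive, and inductive), which matches our need to test product adaptability under different grid conditions. In particular, under China's "dual carbon" goals (carbon peaking and carbon neutrality), renewable penetration keeps rising and the grid becomes highly uncertain, so the value of this adaptive testing approach will become even more apparent.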
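Why weak, high-impedance grids stress conventional PHIL interfaces can be seen with the ideal transformer model (ITM) named in the abstract. As a minimal sketch under hypothetical assumptions (a purely resistive Thévenin impedance R_s on the simulated grid side, a resistive hardware load R_l, an otherwise ideal amplifier, and a fixed delay of a few samples), the ITM loop reduces to v[k] = E - (R_s/R_l)·v[k - d], which is stable only if R_s/R_l < 1. A weaker (higher-impedance) simulated grid raises this ratio. The resistive-inductive and inductive cases require a frequency-dependent check involving Z_s(jω)/Z_l(jω) and are not covered here, and the sketch does not represent the paper's PHIL-IC; `simulate_itm` and its parameters are illustrative names only.

```python
def simulate_itm(r_s, r_l, e=1.0, delay=2, steps=200):
    """Simulate a toy ITM PHIL loop and return the PCC voltage samples.

    Hypothetical setup: the simulated grid is a source e behind a resistive
    Thevenin impedance r_s; the amplifier reproduces v on a resistive
    hardware load r_l; the measured current returns after `delay` samples.
    """
    if delay < 1:
        raise ValueError("the ITM loop needs at least one sample of delay")
    v = [0.0] * delay  # voltage history before the test starts
    for _ in range(steps):
        i_meas = v[-delay] / r_l    # hardware current measured `delay` samples ago
        v.append(e - r_s * i_meas)  # simulated grid side: v = e - r_s * i
    return v[delay:]


if __name__ == "__main__":
    stable = simulate_itm(r_s=0.5, r_l=1.0)    # impedance ratio 0.5 < 1
    unstable = simulate_itm(r_s=1.5, r_l=1.0)  # impedance ratio 1.5 > 1
    print(f"stable case, last sample:   {stable[-1]:.4f}")  # -> 0.6667
    print(f"unstable case, last sample: {unstable[-1]:.3e}")
```

In the stable case the voltage settles to the voltage-divider value E·R_l/(R_s + R_l); in the unstable case the delayed feedback makes the oscillation grow without bound, even though the underlying circuit itself is perfectly well behaved.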

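Regarding technical challenges, putting reinforcement-learning algorithms into engineering practice requires support from a specialized control-theory and AI team, as well as a complete training database. We recommend starting with a low-power test platform in the lab, gradually building experience, and eventually applying the method to type tests of megawatt-class products, thereby developing a differentiated test and verification capability and strengthening our technical moat.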