A bi-level optimization strategy for flexible and economic operation of the CHP units based on reinforcement learning and multi-objective MPC
| Authors | Keyan Zhu · Guangming Zhang · Chen Zhu · Yuguang Niu · Jizhen Liu |
| Journal | Applied Energy |
| Publication date | January 2025 |
| Volume/Issue | Volume 391 |
| Technical category | Electric vehicle drives |
| Technical tags | Model predictive control (MPC), reinforcement learning |
| Relevance score | ★★★★ 4.0 / 5.0 |
| Keywords | A bi-level optimization strategy integrating RL and MOMPC is proposed |
Abstract
Enhancing the comprehensive performance of combined heat and power (CHP) units is crucial for accommodating renewable energy and achieving energy conservation and emission reduction. To this end, a bi-level optimization strategy based on reinforcement learning (RL) and multi-objective model predictive control (MOMPC) is proposed to enhance the flexibility and economic performance of CHP units. First, a CHP unit model is constructed, and its parameters are incorporated into the rolling optimization of the MOMPC, which serves as the lower-level follower and solves the fundamental control problem. Second, a bi-level optimization strategy integrating the twin delayed deep deterministic policy gradient (TD3) algorithm with MOMPC (TD3-MOMPC) is proposed, with the TD3 agent designated as the upper-level leader. By decomposing the complex flexibility requirements and the optimal control sequence of the CHP unit, tasks are assigned to the upper-level leader and the lower-level follower for bi-level interactive optimization. Third, a multi-criterion reward function is designed for the upper level, with power flexibility, heating quality, and operational economy as its guiding objectives. The actions of the upper-level TD3 agent are then defined as the dynamic weights and time-varying prediction horizon used in the rolling optimization of the MOMPC, serving as the bridge that connects and guides the two levels. Finally, to verify the effectiveness of the bi-level optimization strategy, extensive load-variation and disturbance-rejection tests were conducted on a 300 MW CHP unit. The results show that the proposed strategy enhances the unit's load flexibility, heating quality, and operational economy.
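The leader/follower interaction described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the plant is a toy scalar first-order model rather than a CHP unit model, `placeholder_policy` is a hand-written stand-in for a trained TD3 actor, the lower level uses a single blocked control move so that the optimum has a closed form, and all names and numeric ranges (`map_action`, `mpc_step`, the horizon range 5 to 30) are hypothetical.

```python
def map_action(action, n_min=5, n_max=30):
    """Map an upper-level action in [-1, 1]^2 to MPC weights and a horizon.

    Assumed mapping: the first component sets the tracking weight, the
    second sets the prediction horizon. Ranges are illustrative only.
    """
    a_w, a_n = (max(-1.0, min(1.0, a)) for a in action)
    w_track = 0.05 + 0.9 * (a_w + 1.0) / 2.0      # tracking weight in [0.05, 0.95]
    w_econ = 1.0 - w_track                         # economy (control effort) weight
    horizon = n_min + round((a_n + 1.0) / 2.0 * (n_max - n_min))
    return w_track, w_econ, horizon


def mpc_step(x, ref, w_track, w_econ, horizon,
             a=0.9, b=0.1, u_min=-1.0, u_max=1.0):
    """Lower-level follower: one rolling-optimization step on a toy plant.

    Plant (assumed): x[k+1] = a * x[k] + b * u[k].
    The control is held constant over the horizon (single-move blocking),
    so the predicted state is x_i = a**i * x + b * s_i * u with
    s_i = 1 + a + ... + a**(i-1), and the cost
        sum_i w_track * (x_i - ref)**2 + w_econ * horizon * u**2
    is a scalar quadratic in u with a closed-form minimizer.
    """
    num = 0.0
    den = w_econ * horizon
    a_pow, s = 1.0, 0.0
    for _ in range(horizon):
        s = a * s + 1.0          # s_i
        a_pow *= a               # a**i
        g = b * s                # sensitivity of x_i to u
        num += w_track * g * (ref - a_pow * x)
        den += w_track * g * g
    u = num / den
    # For a scalar convex quadratic, clipping gives the constrained optimum.
    return max(u_min, min(u_max, u))


def placeholder_policy(x, ref):
    """Stand-in for a trained TD3 actor (NOT a learned policy).

    Heuristic: a larger tracking error asks for more tracking weight
    and a shorter horizon.
    """
    a_w = max(-1.0, min(1.0, 2.0 * abs(ref - x) - 1.0))
    return a_w, -a_w


def simulate(steps=50, x0=0.0, ref=0.5, a=0.9, b=0.1):
    """Closed loop: the upper level picks weights and horizon, the lower level picks u."""
    x, history = x0, []
    for _ in range(steps):
        w_track, w_econ, horizon = map_action(placeholder_policy(x, ref))
        u = mpc_step(x, ref, w_track, w_econ, horizon, a, b)
        x = a * x + b * u
        history.append((x, u, horizon))
    return history
```

Calling `simulate()` returns a list of `(state, control, horizon)` tuples, which makes it easy to see how the horizon and weights chosen by the upper level change as the tracking error shrinks. In an actual TD3-MOMPC setup the placeholder policy would be replaced by a trained actor network and the lower level by a constrained multi-objective MPC over the full unit model.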
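SunView In-Depth Analysis
This bi-level optimization strategy has significant application value for Sungrow's energy storage systems (ST series PCS/PowerTitan). The architecture combining TD3 reinforcement learning with multi-objective MPC can be adapted to scenarios in which energy storage participates in frequency regulation and peak shaving: the upper-level TD3 agent dynamically adjusts the MPC weights and prediction horizon, while the lower-level MPC executes power control, balancing flexibility against economy. The method can optimize the charging and discharging strategy of energy storage systems for renewable energy accommodation and strengthen the intelligent dispatch capability of the iSolarCloud platform. The bi-level decoupling idea also applies to multi-objective coordinated control of integrated PV-storage-charging stations, improving grid friendliness and operating revenue.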
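For the storage scenario, the balance between flexibility and economy that the upper-level agent is meant to learn could be expressed as a weighted reward. The sketch below is only an assumed form: the function name `storage_reward`, the normalization constants, and the coefficients `c_flex` and `c_econ` are hypothetical and do not come from the paper or from any Sungrow product.

```python
def storage_reward(p_cmd, p_out, energy_cost,
                   p_rated=1.0, cost_scale=1.0,
                   c_flex=0.7, c_econ=0.3):
    """Hypothetical two-criterion reward for an upper-level agent.

    p_cmd / p_out : dispatched and delivered power (same unit as p_rated)
    energy_cost   : operating cost over the step (same unit as cost_scale)
    The reward is the negative weighted sum of a normalized tracking
    penalty (flexibility) and a normalized cost penalty (economy).
    """
    flex_penalty = abs(p_cmd - p_out) / p_rated
    econ_penalty = energy_cost / cost_scale
    return -(c_flex * flex_penalty + c_econ * econ_penalty)
```

With these assumed coefficients, perfect tracking at zero cost gives a reward of zero, and any tracking error or operating cost makes the reward negative, so the agent is pushed toward weight and horizon choices that reduce both.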