
Deep Reinforcement Learning With Dueling DQN for Partial Computation Offloading and Resource Allocation in Mobile Edge Computing

Authors Ehzaz Mustafa · Junaid Shuja · Faisal Rehman · Abdallah Namoun · Mazhar Ali · Abdullah Alourani
Journal IEEE Access
Publication Date January 2025
Technology Category Energy Storage System Technology
Technology Tags Energy Storage Systems · Reinforcement Learning
Relevance Score ★★★★ 4.0 / 5.0
Keywords Computation Offloading · Deep Reinforcement Learning · Multi-Branch Dueling Deep Q-Network · Long Short-Term Memory Network · Adaptive Cost-Weighting Mechanism

Chinese Abstract

Computation offloading moves resource-intensive tasks from IoT devices to powerful edge servers, minimizing latency and reducing the computational load on the devices. Deep reinforcement learning is widely used to optimize offloading decisions, but existing studies fall short in two respects: they do not comprehensively optimize the state space, and Q-learning and DQN struggle to identify the optimal action in large action spaces. This paper proposes a multi-branch dueling deep Q-network (MBDDQN) to address the challenges of high-dimensional state-action spaces and long-term cost optimization in dynamic environments. The dueling DQN eases the complexity of simultaneous offloading and resource-allocation decisions: each branch independently controls a subset of the decision variables, so the model scales efficiently as the number of IoT devices grows and avoids combinatorial explosion. An LSTM network with distinct advantage-value layers strengthens both short-term action selection and long-term cost estimation, improving the model's temporal learning capacity. A novel adaptive cost-weighting mechanism dynamically balances the competing objectives of energy consumption, latency, and bandwidth. Simulation results show that MBDDQN reduces delay by 17.88% compared with DQN and 12.28% compared with DDPG, and improves energy consumption by 10.1% over DQN and 7.64% over DDPG.
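
To make the multi-branch dueling structure concrete, below is a minimal PyTorch sketch of such a network, assuming a shared LSTM state encoder, a single shared value stream, one advantage head per decision branch, and the common aggregation Q = V + (A − mean A). The layer widths, branch count, and per-branch action discretization are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal PyTorch sketch of a multi-branch dueling Q-network with an LSTM
# state encoder, loosely following the MBDDQN idea described in the abstract.
# Layer sizes, number of branches, and the aggregation rule Q = V + (A - mean A)
# are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class MBDuelingDQN(nn.Module):
    def __init__(self, state_dim, n_branches, actions_per_branch, hidden=128):
        super().__init__()
        # LSTM encoder captures temporal patterns in the observation sequence.
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        # Shared state-value stream (one V(s) used by all branches).
        self.value = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        # One advantage stream per branch; each branch controls a subset of
        # the decision variables (e.g., offloading ratio vs. resource share).
        self.advantages = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, actions_per_branch))
            for _ in range(n_branches)
        ])

    def forward(self, state_seq):
        # state_seq: (batch, time, state_dim); use the last hidden state.
        _, (h, _) = self.lstm(state_seq)
        h = h[-1]
        v = self.value(h)                              # (batch, 1)
        qs = []
        for adv_head in self.advantages:
            a = adv_head(h)                            # (batch, actions_per_branch)
            qs.append(v + a - a.mean(dim=1, keepdim=True))
        return qs                                      # one Q-vector per branch


# Example: hypothetical setup with 3 decision branches and 11 discrete levels each.
net = MBDuelingDQN(state_dim=40, n_branches=3, actions_per_branch=11)
q_per_branch = net(torch.randn(4, 8, 40))              # batch of 4, 8 time steps
actions = [q.argmax(dim=1) for q in q_per_branch]      # greedy action per branch
```

Because each branch emits its own small Q-vector, the joint action space grows linearly with the number of branches rather than multiplicatively, which is the scaling property the abstract attributes to the branched design.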

English Abstract

Computation offloading transfers resource-intensive tasks from local Internet of Things (IoT) devices to powerful edge servers, which minimizes latency and reduces the computational load on IoT devices. Deep Reinforcement Learning (DRL) is widely utilized to optimize computation offloading decisions. However, previous studies fall short in two main ways: firstly, they do not collectively optimize the comprehensive state space, and secondly, their reliance on Q-learning and Deep Q Networks (DQN) makes it challenging for agents to discern the optimal action in large action spaces, as many actions may possess similar values. In this paper, we introduce a multi-branch Dueling Deep Q Network (MBDDQN) that tackles the challenges of high-dimensional state-action spaces and long-term cost optimizations in dynamic environments. The Dueling DQN alleviates the complexity of simultaneous offloading and resource allocation decisions, with each branch independently controlling a subset of the decision variables to scale efficiently with an increasing number of IoT devices, thereby avoiding the combinatorial explosion of potential actions. Furthermore, we implement a long short-term memory (LSTM) network with distinct advantage-value layers to enhance both short-term action selection and long-term system cost estimation, as well as improve the temporal learning capacity of the model. Finally, we propose an innovative adaptive cost-weighting mechanism within the reward function to dynamically balance competing objectives, including energy consumption, latency, and bandwidth utilization. Unlike prior works that use fixed reward structures, we leverage weighted state-action advantage values to dynamically adjust the optimization variables. This approach also enables the agent to self-tune, allowing it to prioritize delay minimization in delay-sensitive scenarios and energy conservation in resource-constrained environments. Simulation results demonstrate the superiority of the proposed scheme compared to benchmarks. For instance, MBDDQN reduces delay by 17.88% over DQN and 12.28% over DDPG. Additionally, regarding energy consumption, MBDDQN achieves a 10.1% improvement over DQN and a 7.64% enhancement over DDPG.
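
The paper derives its adaptive weights from state-action advantage values, and the abstract does not spell out the exact update rule. The toy sketch below therefore only illustrates the general idea of re-weighting delay, energy, and bandwidth costs at run time: objectives that currently dominate the total cost receive larger weights. The class name `AdaptiveCostReward`, the smoothing factor, and the normalization are assumptions, not the paper's mechanism.

```python
# Hedged sketch of an adaptive cost-weighting reward for offloading decisions.
# The paper adapts weights via state-action advantage values; this stand-in
# simply tracks each cost's recent share of the total, so the dominant
# objective (delay, energy, or bandwidth) is weighted more heavily.
import numpy as np

class AdaptiveCostReward:
    def __init__(self, n_objectives=3, smoothing=0.9):
        self.weights = np.full(n_objectives, 1.0 / n_objectives)
        self.running_costs = np.ones(n_objectives)   # running average per objective
        self.smoothing = smoothing

    def __call__(self, delay, energy, bandwidth):
        costs = np.array([delay, energy, bandwidth], dtype=float)
        # Track a smoothed estimate of each cost's magnitude.
        self.running_costs = (self.smoothing * self.running_costs
                              + (1.0 - self.smoothing) * costs)
        # Objectives that currently dominate the total cost get larger weights,
        # one simple way for the agent to "self-tune" toward the pressing objective.
        self.weights = self.running_costs / self.running_costs.sum()
        return -float(np.dot(self.weights, costs))    # reward = negative weighted cost


reward_fn = AdaptiveCostReward()
r = reward_fn(delay=0.12, energy=0.45, bandwidth=0.08)   # illustrative cost values
```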

SunView In-Depth Analysis

This multi-branch reinforcement learning technique could be applied to intelligent dispatch optimization for Sungrow energy storage systems. Sungrow ST-series energy storage converters face multi-objective optimization challenges in grid-side and commercial/industrial scenarios, where energy consumption, response latency, and power allocation must be considered simultaneously. The adaptive weighting mechanism of the MBDDQN algorithm could be integrated into the Sungrow EMS energy management system to enable dynamic optimization of storage operation across scenarios such as peak shaving and valley filling, frequency and peak regulation, and demand response. Combined with big-data analytics on the Sungrow iSolarCloud platform, the technique could improve storage system economics by an estimated 15-20%, optimize charge-discharge strategies, extend battery life, and enhance grid friendliness and user returns.
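
Purely as an illustration of how the branch structure might map onto the storage-dispatch setting described above (this mapping is not taken from the paper or from any Sungrow product documentation), the sketch below frames charge/discharge power, regulation reserve, and demand-response commitment as separate decision branches. Every field name, discretization level, and cost term is hypothetical.

```python
# Illustrative sketch (not from the paper) of mapping the MBDDQN branch idea
# onto an energy-storage dispatch problem inside an EMS: one branch picks the
# charge/discharge power level, another the reserve share for frequency
# regulation, a third the demand-response commitment. All fields, level
# counts, and cost terms are hypothetical.
from dataclasses import dataclass

@dataclass
class StorageState:
    soc: float              # state of charge, 0..1
    grid_price: float       # current tariff signal
    load_forecast: float    # short-term site load forecast (kW)
    freq_deviation: float   # grid frequency deviation (Hz)

N_POWER_LEVELS = 11          # branch 1: discrete charge/discharge setpoints
N_RESERVE_LEVELS = 5         # branch 2: share of capacity reserved for regulation
N_DR_LEVELS = 3              # branch 3: demand-response commitment level

def dispatch_cost(energy_cost, response_delay, cycle_ageing, weights):
    """Weighted dispatch cost; the weights could come from an adaptive
    mechanism analogous to the one sketched for the offloading reward."""
    return (weights[0] * energy_cost
            + weights[1] * response_delay
            + weights[2] * cycle_ageing)
```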