Hybrid ML Framework for Data Injection Attack Detection Using PCA and Stacked Autoencoders
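| Authors | Shahid Tufail · Hasan Iqbal · Mohd Tariq · Arif I. Sarwat |
| Journal | IEEE Access |
| Publication date | January 2025 |
| Technical category | Energy storage system technology |
| Technical tags | Energy storage systems, machine learning |
| Relevance score | ★★★★★ 5.0 / 5.0 |
| Keywords | Cyberattacks, data imbalance, stacked autoencoder, random forest, smart grid security |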
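Chinese Abstract (English translation)
As smart grids become increasingly interconnected, cyberattacks, and data injection attacks in particular, are becoming more common. Model training also requires accurate, unbiased, high-quality data, yet most data collected from the real world is sparse, incomplete, inconsistent, and skewed. To address these problems, this study proposes a framework for detecting such attacks. A stacked autoencoder architecture generates synthetic instances of minority-class data, and the generated samples correct the data imbalance, improving model generalization and coverage of diverse attack scenarios. Among the machine learning algorithms evaluated, the Random Forest (RF) model consistently achieved the highest accuracy, ranging from 99.32% to 95.89%. Traditional algorithms such as Logistic Regression (LR) were sensitive to dimensionality reduction: LR's accuracy dropped by 16.96% when the principal components were reduced from all to 10. In contrast, RF was resilient, with a mean accuracy drop of only 1.67% under similar conditions. Both RF and XGBoost stood out, maintaining high accuracy and robust performance even after dimensionality reduction via principal component analysis (PCA). The study shows the importance of understanding algorithmic behavior and data characteristics and how they affect ML model performance. It is intended to strengthen smart grid cybersecurity and highlights the critical need for careful feature selection and tuning.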
English Abstract
Cyberattacks, especially data injection attacks, are becoming more common as smart grids grow increasingly interconnected. Model training also requires accurate, unbiased, high-quality data, yet most real-world data is sparse, incomplete, inconsistent, and skewed. To address these issues, this study proposes a framework for detecting such attacks. A stacked autoencoder architecture generates synthetic instances of minority-class data; the generated samples correct the class imbalance, improving model generalizability and coverage of diverse attack scenarios. Among the machine learning algorithms evaluated, the Random Forest (RF) model consistently achieved the highest accuracy, ranging from 99.32% to 95.89%. Traditional algorithms such as Logistic Regression (LR) were sensitive to dimensionality reduction: LR's accuracy dropped by 16.96% when the principal components were reduced from all to 10. In contrast, RF was resilient, with a mean accuracy drop of only 1.67% under similar conditions. Both RF and XGBoost (XGB) stood out, maintaining high accuracy and robust performance even with dimensionality reduction via principal component analysis (PCA). However, reducing the PCA components from 10 to 5 degraded performance in all models, with the Support Vector Machine (SVM) classifier showing the largest accuracy drop, 14.21%. The study shows the importance of understanding algorithmic behavior and data features and how they affect the performance of ML models. The analysis is intended to strengthen cybersecurity in smart grids and highlights the critical need for careful feature selection and tuning, particularly for models sensitive to dimensionality reduction.
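The dimensionality-reduction comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses a synthetic dataset from `make_classification` as a stand-in, since the paper's data is not reproduced here. The helper name `accuracy_by_components`, the hyperparameters, and reading "accuracy drop" as a percentage-point difference are all assumptions for illustration, not the authors' setup, and the numbers it prints will not match the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption): not the paper's dataset.
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=12, n_redundant=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

def accuracy_by_components(make_model, component_counts):
    """Return {n_components: test accuracy} for a scaler -> PCA -> model pipeline."""
    results = {}
    for n in component_counts:
        pipeline = make_pipeline(StandardScaler(), PCA(n_components=n), make_model())
        pipeline.fit(X_train, y_train)
        results[n] = pipeline.score(X_test, y_test)  # mean accuracy on the test split
    return results

component_counts = [None, 10, 5]  # None keeps all principal components
models = {
    "LR": lambda: LogisticRegression(max_iter=1000),
    "RF": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
}
sweep = {
    name: accuracy_by_components(factory, component_counts)
    for name, factory in models.items()
}

for name, accs in sweep.items():
    # Percentage-point drop when going from all components to 10 (assumed definition).
    drop = (accs[None] - accs[10]) * 100
    print(f"{name}: all={accs[None]:.4f} 10={accs[10]:.4f} 5={accs[5]:.4f} "
          f"drop(all->10)={drop:.2f} pp")
```

Fitting the scaler and PCA inside the pipeline keeps both transforms trained on the training split only, so the test accuracy is not affected by leakage from the held-out data.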
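SunView In-Depth Analysis
This data injection attack detection technique is highly relevant to smart grid security at Sungrow. Sungrow's iSolarCloud platform and ST energy storage systems connect to grid SCADA systems and are therefore exposed to false data injection attacks. The hybrid approach in this study, combining stacked autoencoders with Random Forest, could be integrated into Sungrow's cybersecurity protection stack to detect anomalous data and attack behavior. In grid-side energy storage scenarios, a data injection attack could cause the storage system to operate incorrectly and affect grid stability. Given the RF model's high accuracy (up to 99.32%) and its robustness to dimensionality reduction, it could be deployed in the edge security module of Sungrow's energy storage converters for real-time threat detection. Combined with Sungrow's cloud security center and device-level security chips, the technique could form part of a defense-in-depth architecture that protects smart grid data integrity and system security and keeps energy storage systems operating reliably.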
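The autoencoder-based oversampling step in the hybrid approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `MLPRegressor` trained end to end as a multi-layer autoencoder (rather than greedy layer-wise stacking), toy Gaussian data in place of grid measurements, and latent-space jitter as the generation rule, which the abstract does not specify. The helpers `encode`, `decode`, and `synthesize` are hypothetical names.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy imbalanced data (assumption): 500 normal rows, 40 attack rows, 8 features.
X_normal = rng.normal(0.0, 1.0, size=(500, 8))
X_attack = rng.normal(2.0, 0.5, size=(40, 8))

scaler = StandardScaler().fit(X_attack)
Z = scaler.transform(X_attack)

# Multi-layer autoencoder: 8 -> 16 -> 4 -> 16 -> 8, trained to reconstruct its input.
autoencoder = MLPRegressor(
    hidden_layer_sizes=(16, 4, 16), activation="tanh",
    solver="adam", max_iter=3000, random_state=0,
)
autoencoder.fit(Z, Z)

def encode(x):
    """Forward pass through the first two hidden layers (down to the 4-unit bottleneck)."""
    h = x
    for W, b in zip(autoencoder.coefs_[:2], autoencoder.intercepts_[:2]):
        h = np.tanh(h @ W + b)
    return h

def decode(code):
    """Forward pass from the bottleneck through the last hidden layer and linear output."""
    h = np.tanh(code @ autoencoder.coefs_[2] + autoencoder.intercepts_[2])
    return h @ autoencoder.coefs_[3] + autoencoder.intercepts_[3]

def synthesize(n_new, noise_scale=0.05):
    """Pick minority rows at random, jitter their latent codes, decode to feature space."""
    idx = rng.integers(0, len(Z), size=n_new)
    codes = encode(Z[idx]) + rng.normal(0.0, noise_scale, size=(n_new, 4))
    return scaler.inverse_transform(decode(codes))

# Generate enough synthetic attack rows to match the majority class size.
n_needed = len(X_normal) - len(X_attack)
X_synthetic = synthesize(n_needed)
X_attack_balanced = np.vstack([X_attack, X_synthetic])
print(X_attack_balanced.shape)
```

The balanced attack set would then be combined with the normal samples before training a downstream classifier such as Random Forest. In practice the synthetic rows should be generated from the training split only, so that no information from test data leaks into the classifier.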