IMAX: A Power-Efficient Multilevel Pipelined CGLA and Applications
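| Authors | Tomoya Akabe · Vu Trung Duong LE · Yasuhiko Nakashima |
| Journal | IEEE Access |
| Publication date | January 2025 |
| Technical category | Energy storage system technology |
| Technical tags | Energy storage systems, multilevel |
| Relevance score | ★★★★ 4.0 / 5.0 |
| Keywords | Artificial intelligence, IMAX3 architecture, hardware acceleration, performance improvement, energy efficiency |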
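Abstract (translated from Chinese)
Rapid progress in artificial intelligence applications is driving demand for flexible, efficient hardware architectures. To meet this demand, the authors propose IMAX, a novel coarse-grained linear array (CGLA) architecture that alternates cache memory and processing units in a linear structure to absorb irregular memory access latencies, achieving high performance and energy efficiency. IMAX3 extends the architecture with optimized communication, double buffering, and advanced sparse matrix multiplication (SpGEMM) techniques, which yield significant performance improvements. Real-time evaluation on the Xilinx VPK180 SoC shows that IMAX3 runs SpGEMM up to 503 times faster than the GTX 1080Ti and delivers 10 times the energy efficiency of the Jetson AGX Orin in FFT. In matrix multiplication, IMAX3 also outperforms related architectures: it is approximately 23 times faster than STRELA and 61 times faster than RipTide, with substantially better energy efficiency. These results position IMAX3 as an energy-efficient, flexible hardware platform that surpasses traditional GPUs and other architectures, particularly in real-time processing and low-power environments, and as a strong candidate for the evolving computational needs of AI-driven applications.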
English Abstract
The rapid advancement of artificial intelligence (AI) applications has driven a growing need for flexible and highly efficient hardware architectures. To address these demands, we propose IMAX, a novel coarse-grained linear array (CGLA) architecture that alternates cache memory and processing units in a linear structure to absorb irregular memory access latencies. This design achieves exceptional performance and energy efficiency. IMAX3 further enhances the architecture by introducing optimized communication, double buffering, and advanced sparse matrix multiplication (SpGEMM) techniques, delivering significant performance improvements. Real-time evaluations on the Xilinx VPK180 SoC demonstrate IMAX3’s remarkable capabilities: up to 503 times faster execution than GTX 1080Ti in SpGEMM and 10 times the energy efficiency of Jetson AGX Orin in FFT. Additionally, IMAX3 outperforms related architectures in matrix multiplication, achieving speeds approximately 23 times faster than STRELA and 61 times faster than RipTide, with substantial gains in energy efficiency. These results confirm IMAX3 as a cutting-edge, energy-efficient, and flexible hardware platform that surpasses traditional GPUs and other architectures, particularly in real-time processing and low-power environments. IMAX3 sets a new standard in hardware acceleration, making it an ideal solution for the evolving computational needs of AI-driven applications.
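SunView In-Depth Analysis
This energy-efficient hardware acceleration architecture is a useful reference for Sungrow's edge AI applications. Sungrow's smart inverters and energy storage systems need efficient edge computing, and IMAX3's combination of low power consumption and high performance matches those product requirements. Sungrow could draw on the multilevel pipeline design concept to optimize the FPGA/ASIC chip designs used in its inverters and energy storage systems, improving the execution efficiency of AI algorithms, lowering power consumption, strengthening real-time control and intelligent diagnostics, and making its products more competitive.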