Modern convolutional neural networks (CNNs) are essential in information and communication technology (ICT) applications, including edge computing, IoT devices, and mobile platforms, where energy efficiency and throughput are critical. These systems increasingly rely on multi-precision arithmetic to balance accuracy and resource efficiency. However, traditional methods that assign separate fixed-precision multipliers to different bit-widths are inefficient, as the largest multiplier often dominates the critical path and limits overall performance. In this paper, we introduce two scalable, power-efficient multiplier architectures with runtime reconfigurability: R4RC16 and R4RC32. These architectures are designed for CNN acceleration under multi-precision pruning. Each design features a low-power mode (8-bit) and a default mode (16-bit for R4RC16 and 32-bit for R4RC32), allowing dynamic precision adjustment during inference with minimal overhead. When operating in low-power mode, our proposed multipliers achieve up to 7.6× greater energy efficiency than state-of-the-art approximate logarithmic multipliers, and up to 13.8× greater than approximate Booth-based designs. Additionally, they provide 2× (R4RC16) and 4× (R4RC32) higher throughput than exact 8-bit multipliers when processing pruned CNN workloads. Notably, the overhead in low-power mode is largely independent of the full bit-width, resulting in a nearly constant power-delay product across both the 16-bit and 32-bit designs. These findings highlight the significance of reconfigurable arithmetic units as critical components of ICT infrastructure supporting healthcare, education, and multimedia, enabling CNNs to dynamically balance accuracy, energy, and throughput with less than 1% area overhead.
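
To make the reconfigurability concrete, the sketch below is a minimal behavioral model, not the authors' RTL, of how a 16-bit datapath can switch at runtime between one 16-bit multiplication (default mode) and two packed, independent signed 8-bit multiplications (low-power mode), which is the mechanism behind the 2× throughput figure quoted for R4RC16. The function name, mode flag, and operand-packing layout are illustrative assumptions.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` of `value` as a two's-complement integer."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def r4rc16_behavioral(a: int, b: int, low_power: bool):
    """One pass of a hypothetical R4RC16-style datapath (behavioral only).

    Default mode : a, b are signed 16-bit operands -> one 32-bit product.
    Low-power mode: a, b each pack two independent signed 8-bit operands
                    (high byte | low byte) -> two 16-bit products per pass.
    """
    if not low_power:
        return (to_signed(a, 16) * to_signed(b, 16),)
    # Split each 16-bit word into two independent 8-bit lanes; in hardware
    # the unused partial-product logic would be gated off to save power.
    a_hi, a_lo = to_signed(a >> 8, 8), to_signed(a, 8)
    b_hi, b_lo = to_signed(b >> 8, 8), to_signed(b, 8)
    return (a_hi * b_hi, a_lo * b_lo)

# Example: two 8-bit MACs from a pruned CNN layer issued in a single
# low-power pass, versus one full-precision 16-bit multiplication.
packed_a = ((-3 & 0xFF) << 8) | (25 & 0xFF)
packed_b = ((7 & 0xFF) << 8) | (-4 & 0xFF)
print(r4rc16_behavioral(packed_a, packed_b, low_power=True))   # (-21, -100)
print(r4rc16_behavioral(1200, -45, low_power=False))           # (-54000,)
```

An R4RC32-style datapath would extend the same idea to four 8-bit lanes per pass, matching the 4× throughput claim for the 32-bit design.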
