Array-Specific Dataflow Caches for High-Level Synthesis of Memory-Intensive Algorithms on FPGAs
Designs implemented on field-programmable gate arrays (FPGAs) via high-level synthesis (HLS) suffer from off-chip memory latency and bandwidth bottlenecks. FPGAs can access both large but slow off-chip memories (DRAM) and fast but small on-chip memories (block RAMs and registers). HLS tools allow exploiting the memory hierarchy in a scratchpad-like