The first step in implementing a CNN layer on an FPGA is to efficiently extract the operation window from the input feature map (IFM). An approach called “sliding window” is commonly used to capture the window from an IFM. In this blog post, I’ll first introduce the basic idea of the sliding window. Then, I’ll discuss an implementation of the sliding window that handles both stride and padding. Some details can be learned from VIVADO HLS 2D Convolution on hardware, and the implementation is modified from FPGA-ZynqNet.
Basic Idea
The basic idea of the sliding window is as follows.
It maintains a 2D array that represents the current target window.
Assume the window is W by W. It maintains W-1 line buffers, each holding one row of the IFM.
In each iteration, one element streams into the “sliding window” core (function) and becomes the bottom-right element of the window.
In each iteration, every column of the window moves one step to the left.
The last (rightmost) column is then filled from the line buffers and the newly streamed element.
Finally, it updates the line buffers by moving the element at the current column of Line Buffer X to the same position in Line Buffer X-1, and it stores the new element in the last line buffer.
Sliding Window Core Function
In the following example, we consider a kernel size of 5 by 5, which is widely used in CNNs. Other sizes can be implemented easily by changing the number of line buffers and a few parameters.
The inputs of the sliding window core are: (1) the newly streamed data element; (2) the window, which carries its history and is updated in place; (3) the column currently being considered; and (4) the line buffers. The detailed code is listed as follows.
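As a minimal sketch of such a core, here is how the update could look in plain C++. The IFM width of 8, the float data type, and the parameter names are my assumptions for illustration; this is not the FPGA-ZynqNet code, and HLS pragmas are left out.

```cpp
#include <cassert>

constexpr int K = 5;      // window (kernel) size: 5 x 5
constexpr int IFM_W = 8;  // IFM width; an assumed value for this sketch

// Pushes one streamed element into the window.
//   in       : (1) the newly streamed element
//   window   : (2) the K x K window, updated in place
//   col      : (3) the IFM column that `in` belongs to
//   line_buf : (4) K-1 line buffers; line_buf[K-2] holds the most recent row
void window_generator_5_5(float in, float window[K][K], int col,
                          float line_buf[K - 1][IFM_W]) {
    assert(col >= 0 && col < IFM_W);

    // Move every window column one step to the left.
    for (int r = 0; r < K; ++r)
        for (int c = 0; c < K - 1; ++c)
            window[r][c] = window[r][c + 1];

    // Fill the rightmost column: the upper K-1 rows come from the line
    // buffers, the bottom-right element is the new input.
    for (int r = 0; r < K - 1; ++r)
        window[r][K - 1] = line_buf[r][col];
    window[K - 1][K - 1] = in;

    // Update the line buffers at this column: Line Buffer X moves to
    // Line Buffer X-1, and the new element goes into the last buffer.
    for (int r = 0; r < K - 2; ++r)
        line_buf[r][col] = line_buf[r + 1][col];
    line_buf[K - 2][col] = in;
}
```

If the IFM is streamed row by row with one call per element, the window only holds a valid 5 by 5 patch once at least four full rows and the first five elements of the current row have been pushed. Before that, it still contains start-up or wrap-around values, so the caller has to decide when to use it.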
In many applications, the window has to slide by more than one step at a time (the stride). In addition, the border may require extra padding (e.g., zeros). Both requirements can be realized on top of “window_generator_5_5”. In the following, I list code that considers both stride and padding. The code can also serve as a test bench to check whether we obtain the correct results.
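A minimal sketch of one way to layer stride and padding on top of the core is shown below. The 6 by 6 IFM, padding of 2, stride of 2, and the helper name `slide_with_stride_and_padding` are assumptions chosen for illustration, and the core is repeated in compact form so the block stands alone. The idea is to stream a zero-padded IFM into the core and only use the window at positions that match the stride. To make the result easy to check, each selected window is reduced to the sum of its elements.

```cpp
#include <cassert>
#include <vector>

constexpr int K = 5;                          // kernel size
constexpr int H = 6, W = 6;                   // IFM size (assumed)
constexpr int PAD = 2;                        // zeros added on each border (assumed)
constexpr int STRIDE = 2;                     // window step (assumed)
constexpr int PH = H + 2 * PAD;               // padded height
constexpr int PW = W + 2 * PAD;               // padded width
constexpr int OUT_H = (PH - K) / STRIDE + 1;  // output rows
constexpr int OUT_W = (PW - K) / STRIDE + 1;  // output columns

// Compact form of the sliding window core: shift left, fill the last
// column, and move each line buffer entry up by one buffer.
void window_generator_5_5(float in, float window[K][K], int col,
                          float line_buf[K - 1][PW]) {
    for (int r = 0; r < K; ++r) {
        for (int c = 0; c < K - 1; ++c) window[r][c] = window[r][c + 1];
        float v = (r < K - 1) ? line_buf[r][col] : in;
        window[r][K - 1] = v;
        if (r > 0) line_buf[r - 1][col] = v;
    }
}

// Streams the zero-padded IFM through the core and returns the sum of
// every selected window, row-major, as OUT_H * OUT_W values.
std::vector<float> slide_with_stride_and_padding(float ifm[H][W]) {
    float window[K][K] = {};
    float line_buf[K - 1][PW] = {};
    std::vector<float> out;
    for (int y = 0; y < PH; ++y) {
        for (int x = 0; x < PW; ++x) {
            // Padding: positions outside the original IFM stream a zero.
            int iy = y - PAD, ix = x - PAD;
            bool inside = iy >= 0 && iy < H && ix >= 0 && ix < W;
            window_generator_5_5(inside ? ifm[iy][ix] : 0.0f, window, x, line_buf);

            // Stride: use the window only when it is completely filled
            // and its position is a multiple of STRIDE in both directions.
            bool full = y >= K - 1 && x >= K - 1;
            bool on_stride = (y - (K - 1)) % STRIDE == 0 &&
                             (x - (K - 1)) % STRIDE == 0;
            if (full && on_stride) {
                float sum = 0.0f;
                for (int r = 0; r < K; ++r)
                    for (int c = 0; c < K; ++c) sum += window[r][c];
                out.push_back(sum);
            }
        }
    }
    return out;
}
```

With an all-ones IFM, each output value equals the number of non-padding elements under that window, which gives a quick check that both the padding and the stride are applied as intended.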