Design NN on ZCU102 (2) — Slide Window in HLS

The first step in implementing a CNN layer on an FPGA is to efficiently extract the operation window from the input feature map (IFM). The approach called “slide window” is commonly used to capture the window from an IFM. In this blog, I’ll first introduce the basic idea of the sliding window. Then, I’ll discuss the implementation of the slide window considering both stride and padding. Some details can be learnt from VIVADO HLS 2D Convolution on hardware, and the implementation is modified from FPGA-ZynqNet.

Basic Idea

First of all, the basic idea of the sliding window is as follows.

  • It maintains a 2D array that represents the current target window.
  • Assume the window is W by W. It maintains W-1 line buffers, each holding one row of the (padded) IFM.
  • In each iteration, one element streams into the “slide window” core (function) and becomes the bottom-right element of the window.
  • In each iteration, every column in the window shifts one step from right to left.
  • The last (rightmost) column is then filled with the elements read from the line buffers at the current column, plus the newly streamed element.
  • Finally, it updates the line buffers: the element at the current column of Line Buffer X moves to the same position in Line Buffer X-1, and the new element is stored in the last line buffer. (The code below numbers the buffers in the opposite direction: linebuf1 receives the new element and linebuf4 holds the oldest row.)
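The steps above can be sketched on a smaller case. The following is a minimal illustration, not the implementation used later in this post: it assumes a 3 by 3 window, plain int data, and rows of 4 elements, and the names SlideWindow3x3, kWin, kCols and stream_example are made up for this example.

```cpp
#include <array>

// Hypothetical sizes chosen only for this illustration.
constexpr int kWin = 3;   // window height and width (W)
constexpr int kCols = 4;  // number of elements in one input row

struct SlideWindow3x3 {
    std::array<int, kWin * kWin> win{};  // row-major window, starts as zeros
    std::array<int, kCols> newer{};      // line buffer: the previous row
    std::array<int, kCols> older{};      // line buffer: the row before that

    // Consume one streamed element that belongs to column `col`.
    void push(int d_in, int col) {
        // Shift every window column one step to the left.
        for (int r = 0; r < kWin; ++r)
            for (int c = 0; c < kWin - 1; ++c)
                win[r * kWin + c] = win[r * kWin + c + 1];
        // Fill the rightmost column: oldest row on top, new element at the bottom.
        win[0 * kWin + kWin - 1] = older[col];
        win[1 * kWin + kWin - 1] = newer[col];
        win[2 * kWin + kWin - 1] = d_in;
        // Age the line buffers at this column and store the new element.
        older[col] = newer[col];
        newer[col] = d_in;
    }
};

// Stream a 3-row, 4-column image holding 1..12 in row-major order and
// return the window as it stands after the last element.
std::array<int, kWin * kWin> stream_example() {
    SlideWindow3x3 sw;
    int value = 1;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < kCols; ++c)
            sw.push(value++, c);
    return sw.win;
}
```

After the last element of the third row, the window covers the three rightmost columns of all three rows, so stream_example() returns 2 3 4 / 6 7 8 / 10 11 12.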

Slide Window core function

In the following example, we consider a kernel size of 5 by 5, which is widely employed in CNNs. Other sizes can be implemented easily by modifying the number of line buffers and a few parameters.
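One way to make the kernel size a parameter is sketched here. It is an assumption-laden variant rather than the code from this post: the template parameters T, K and WIDTH and the 2D linebuf array are introduced for illustration, and whether an HLS tool maps these loops and arrays as intended has not been checked.

```cpp
// Hypothetical generic variant: K is the kernel height/width and WIDTH is
// the length of one (padded) input row. linebuf[0] holds the most recent
// row and linebuf[K-2] the oldest one.
template <typename T, int K, int WIDTH>
void window_generator(T d_in, T win_out[K * K], int column,
                      T linebuf[K - 1][WIDTH]) {
    static_assert(K >= 2, "at least one line buffer is needed");

    // Collect the new rightmost column: temp[0] is newest, temp[K-1] oldest.
    T temp[K];
    temp[0] = d_in;
    for (int r = 1; r < K; ++r)
        temp[r] = linebuf[r - 1][column];

    // Shift every window column one step to the left.
    for (int r = 0; r < K; ++r)
        for (int c = 0; c < K - 1; ++c)
            win_out[r * K + c] = win_out[r * K + c + 1];

    // Write the rightmost column, oldest row on top.
    for (int r = 0; r < K; ++r)
        win_out[r * K + K - 1] = temp[K - 1 - r];

    // Age the line buffers at this column; the new element enters linebuf[0].
    for (int r = 0; r < K - 1; ++r)
        linebuf[r][column] = temp[r];
}
```

Because the array parameters decay to pointers, K and WIDTH cannot be deduced and have to be given explicitly, for example window_generator<int, 3, 4>(d, win, col, lb).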

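The inputs of the slide window are: (1) the newly streamed data element; (2) the window, which keeps its history and is updated in place; (3) the column currently being considered; (4) the line buffers. A scratch array (temp) is also passed in to hold the new column. The detailed code is listed as follows.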

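// Written by Weiwen Jiang: jiang.wwen@pitt.edu
// FPGA_DATA, kernel_row, kernel_size, ifm_row and win_pad_size are
// expected to be defined elsewhere (e.g., in a shared header).
void window_generator_5_5(
    FPGA_DATA d_in,
    FPGA_DATA win_out[kernel_size],
    int column,
    FPGA_DATA linebuf1[ifm_row+win_pad_size+win_pad_size],
    FPGA_DATA linebuf2[ifm_row+win_pad_size+win_pad_size],
    FPGA_DATA linebuf3[ifm_row+win_pad_size+win_pad_size],
    FPGA_DATA linebuf4[ifm_row+win_pad_size+win_pad_size],
    FPGA_DATA temp[kernel_row]
){
    // Gather the new column: temp[0] is the newest element,
    // temp[4] comes from the oldest buffered row.
    temp[0] = d_in;
    temp[1] = linebuf1[column];
    temp[2] = linebuf2[column];
    temp[3] = linebuf3[column];
    temp[4] = linebuf4[column];

    // Shift every column of the row-major window one step to the left.
    for(int i=0;i<kernel_row-1;i++){
        for(int j=i;j<kernel_size;j+=kernel_row){
            win_out[j] = win_out[j+1];
        }
    }

    // Fill the rightmost column: oldest row on top, d_in at the bottom.
    int i=1;
    for(int j=kernel_row-1;j<kernel_size;j+=kernel_row){
        win_out[j] = temp[kernel_row-i];
        i++;
    }

    // Age the line buffers at this column; linebuf1 receives the new element.
    linebuf1[column] = temp[0];
    linebuf2[column] = temp[1];
    linebuf3[column] = temp[2];
    linebuf4[column] = temp[3];
}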

Stride and Paddings

In many applications, the window should slide by more than one step at a time. In addition, the border may require additional padding (e.g., zeros). These two requirements can be realized on top of “window_generator_5_5”.
In the following, I list code that handles both stride and padding. The same code also serves as a test bench to check whether we obtain the correct results.
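As a quick sanity check on how many windows to expect, the usual stride and padding arithmetic can be written as two small helpers. This is a sketch under assumptions: the names out_dim and completes_window are made up here, and positions are counted from 0 at the first padded element of a row or column.

```cpp
// Number of windows along one dimension for input size n, kernel size k,
// stride s and padding p on each side (integer division rounds down for
// the non-negative values used here).
constexpr int out_dim(int n, int k, int s, int p) {
    return (n + 2 * p - k) / s + 1;
}

// True when the streamed element at padded position `pos` is the last
// row/column of a window that should be emitted: the first window ends at
// pos == k-1, and every s-th position after that ends another one.
constexpr bool completes_window(int pos, int k, int s) {
    return pos >= k - 1 && (pos - (k - 1)) % s == 0;
}
```

For a 6*6 IFM with a 5*5 kernel, stride 2 and padding 2, out_dim(6, 5, 2, 2) gives 3, so 3*3 = 9 windows are expected. A window is emitted only when both the row position and the column position complete one.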

// Written by Weiwen Jiang: jiang.wwen@pitt.edu
// FPGA_DATA, kernel_row, kernel_size, ifm_row, win_pad_size and win_stride
// are expected to be defined elsewhere (e.g., in a shared header).
#include <iostream>
#include <cstring>
using namespace std;

int main(){
    const int dim=6;  // IFM is dim x dim; this should match ifm_row

    // Build the test IFM: A[i][j] = 10 + i + j
    int A[dim*dim];
    for(int i=0;i<dim;i++){
        for(int j=0;j<dim;j++){
            A[i*dim+j] = 10+j+i;
        }
    }

    cout<<"Inputs"<<endl;
    for(int i=0;i<dim;i++){
        for(int j=0;j<dim;j++){
            cout<<A[i*dim+j]<<" ";
        }
        cout<<endl;
    }
    cout<<endl;

    int i_step=0,j_step=0;              // stride counters for rows and columns
    FPGA_DATA window[kernel_size]={0};  // zeroed so the first shifts read defined values
    int src_buf_addr=0;                 // next IFM element to stream in
    FPGA_DATA linebuf1[ifm_row+win_pad_size+win_pad_size];
    FPGA_DATA linebuf2[ifm_row+win_pad_size+win_pad_size];
    FPGA_DATA linebuf3[ifm_row+win_pad_size+win_pad_size];
    FPGA_DATA linebuf4[ifm_row+win_pad_size+win_pad_size];
    FPGA_DATA Line5[kernel_row]={0};

    memset(linebuf1,0,(ifm_row+win_pad_size+win_pad_size)*sizeof(FPGA_DATA));
    memset(linebuf2,0,(ifm_row+win_pad_size+win_pad_size)*sizeof(FPGA_DATA));
    memset(linebuf3,0,(ifm_row+win_pad_size+win_pad_size)*sizeof(FPGA_DATA));
    memset(linebuf4,0,(ifm_row+win_pad_size+win_pad_size)*sizeof(FPGA_DATA));

    // Walk over the padded IFM; (i, j) are coordinates in the unpadded IFM.
    for(int i=0-win_pad_size;i<dim+win_pad_size;i++){
        for(int j=0-win_pad_size;j<dim+win_pad_size;j++){
            // Positions outside the IFM are padding and stream in as 0.
            bool border = (i < 0 || i >=dim || j < 0 || j >= dim);
            window_generator_5_5(border?(FPGA_DATA)0:A[src_buf_addr], window, j+win_pad_size, linebuf1, linebuf2,linebuf3,linebuf4,Line5);
            if(!border)
                src_buf_addr++;
            // Wrap the stride counters.
            if(j_step == win_stride)
                j_step = 0;
            if(i_step == win_stride)
                i_step = 0;
            // A window is valid once it is completely filled and both
            // stride counters are at 0.
            bool valid = (j >= kernel_row-1-win_pad_size) && (i >= kernel_row-1-win_pad_size) && !j_step && !i_step;
            if(valid){
                cout<<"Iteration "<<i<<" "<<j<<endl;
                for(int k=0;k<kernel_size;k++){
                    cout<<window[k]<<" ";
                    if((k+1)%kernel_row==0){
                        cout<<endl;
                    }
                }
                cout<<"===================="<<endl;
                cout<<endl;
            }
            // Count columns only after the first full window.
            if(j>=kernel_row-1-win_pad_size){
                j_step++;
            }
        }
        j_step = 0;
        // Count rows only after the first full window.
        if(i>=kernel_row-1-win_pad_size)
            i_step++;
    }
    return 0;
}

Results for parameters (kernel: 5*5; IFM: 6*6; stride: 2; padding: 2)

Inputs
10 11 12 13 14 15
11 12 13 14 15 16
12 13 14 15 16 17
13 14 15 16 17 18
14 15 16 17 18 19
15 16 17 18 19 20

Iteration 2 2
0 0 0 0 0
0 0 0 0 0
0 0 10 11 12
0 0 11 12 13
0 0 12 13 14
====================

Iteration 2 4
0 0 0 0 0
0 0 0 0 0
10 11 12 13 14
11 12 13 14 15
12 13 14 15 16
====================

Iteration 2 6
0 0 0 0 0
0 0 0 0 0
12 13 14 15 0
13 14 15 16 0
14 15 16 17 0
====================

Iteration 4 2
0 0 10 11 12
0 0 11 12 13
0 0 12 13 14
0 0 13 14 15
0 0 14 15 16
====================

Iteration 4 4
10 11 12 13 14
11 12 13 14 15
12 13 14 15 16
13 14 15 16 17
14 15 16 17 18
====================

Iteration 4 6
12 13 14 15 0
13 14 15 16 0
14 15 16 17 0
15 16 17 18 0
16 17 18 19 0
====================

Iteration 6 2
0 0 12 13 14
0 0 13 14 15
0 0 14 15 16
0 0 15 16 17
0 0 0 0 0
====================

Iteration 6 4
12 13 14 15 16
13 14 15 16 17
14 15 16 17 18
15 16 17 18 19
0 0 0 0 0
====================

Iteration 6 6
14 15 16 17 0
15 16 17 18 0
16 17 18 19 0
17 18 19 20 0
0 0 0 0 0
====================
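
The printed windows can be cross-checked against a direct, non-streaming computation. The helper below is only a sketch for that purpose and is not part of the design: reference_window is a made-up name, it uses plain int data, and (i, j) is assumed to be the bottom-right corner of the window in unpadded IFM coordinates, matching the "Iteration i j" labels above.

```cpp
#include <vector>

// Build the k x k window whose bottom-right corner is at (i, j) of a
// dim x dim row-major IFM, reading zeros for positions outside the IFM.
std::vector<int> reference_window(const std::vector<int>& ifm, int dim,
                                  int k, int i, int j) {
    std::vector<int> win(k * k, 0);
    for (int r = 0; r < k; ++r) {
        for (int c = 0; c < k; ++c) {
            int y = i - (k - 1) + r;  // IFM row of this window element
            int x = j - (k - 1) + c;  // IFM column of this window element
            if (y >= 0 && y < dim && x >= 0 && x < dim)
                win[r * k + c] = ifm[y * dim + x];
        }
    }
    return win;
}
```

With the test IFM above (element (i, j) equal to 10 + i + j), reference_window(ifm, 6, 5, 4, 6) starts with the row 12 13 14 15 0, the same as the first row printed for "Iteration 4 6".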

July 14, 2018 Weiwen Jiang jiang.wwen@pitt.edu At UPITT