In this blog, I’ll outline the main operations implemented in HLS to generate the IP core for LeNet. Since the ZCU102 has enough on-chip memory to hold all weights and inputs, we allocate buffers large enough to store all parameters. Therefore, the PS only needs to send the input feature map to the IP core, and no data other than the intermediate results is transferred between PS and PL.
The design of programmable logic (PL) part, using Vivado HLS
To show the LeNet pipeline on one FPGA clearly, we design one IP core per layer. Each IP core has two channels (in_stream and out_stream) and an integer. The weights and the input feature map (IFM) are sent to the PL through in_stream, and the output feature map (OFM) is transmitted back to the PS through out_stream. The integer is an operation code that indicates which operation to perform.
Definition of variables
We define three sets of variables related to IFM, OFM, and kernel, respectively.
First, let’s see the variables related to IFM.
IFM Notations | Definition |
---|---|
ifm_ch_mem | The number of channels that the on-chip buffers can store |
ifm_ch_proc | The number of channels that will be processed in each iteration |
ifm_len | The number of pixels in each channel of IFM |
ifm_row,ifm_col | ifm_len=ifm_row*ifm_col |
Then, let’s see the variables related to OFM. They are similar to those for IFM.
OFM Notations | Definition |
---|---|
ofm_ch_mem | The number of channels that the on-chip buffers can store |
ofm_ch_proc | The number of channels that will be processed in each iteration |
ofm_len | The number of pixels in each channel of OFM |
ofm_row,ofm_col | ofm_len=ofm_row*ofm_col |
Note that ifm_len and ofm_len are related: ofm_len is determined by ifm_len together with the kernel size, stride, etc.
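To make this relation concrete, here is a minimal sketch in plain C++. It assumes a sliding window without padding, which this post does not state explicitly; `out_dim`, `FmShape`, and `ofm_shape` are hypothetical helper names, not part of the IP core.

```cpp
#include <cassert>

// Output length along one dimension for a sliding window.
// Assumption: no padding ("valid" mode).
constexpr int out_dim(int in_dim, int kernel_dim, int stride) {
    return (in_dim - kernel_dim) / stride + 1;
}

// Hypothetical container for the row/col/len notation used in the tables.
struct FmShape {
    int row;
    int col;
    int len;  // len = row * col
};

// Derives the OFM shape from the IFM shape for a square kernel.
inline FmShape ofm_shape(int ifm_row, int ifm_col, int kernel_row, int stride) {
    assert(stride > 0 && kernel_row <= ifm_row && kernel_row <= ifm_col);
    const int r = out_dim(ifm_row, kernel_row, stride);
    const int c = out_dim(ifm_col, kernel_row, stride);
    return FmShape{r, c, r * c};
}
```

For example, under these assumptions a 28*28 IFM with a 5*5 kernel and stride 1 would give a 24*24 OFM, and a 2*2 pooling window with stride 2 would then reduce it to 12*12.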
Finally, let’s see the variables related to kernel/weights.
Kernel Notations | Definition |
---|---|
kernel_size | The size of a kernel, e.g., for 5*5 kernel, kernel_size=25 |
kernel_row | The number of rows in one kernel |
The design of conv1 IP Core
We define the operation code as follows, i.e., the IP core performs the following operations when op=x.
- x=1: The IP core receives weights and bias from PS part.
- x=2: The IP core will perform five operations sequentially.
- It receives IFM from PS.
- It obtains the processing window.
- It performs convolution operation.
- It performs pool operation.
- It sends out the intermediate results to PS.
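The control flow above can be modelled in software as follows. This is only a sketch under several assumptions: `std::queue` stands in for the stream channels (HLS code would use `hls::stream`), `Conv1Model` and `read_n` are hypothetical names, and the three compute stages are passed in as a single placeholder callback rather than implemented here. `std::function` is used for brevity and is not meant to be synthesizable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

using Stream = std::queue<float>;   // software stand-in for in_stream / out_stream
using Buffer = std::vector<float>;

// Pops exactly n values from a stream into a buffer.
inline Buffer read_n(Stream& s, std::size_t n) {
    assert(s.size() >= n);
    Buffer b;
    b.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        b.push_back(s.front());
        s.pop();
    }
    return b;
}

// Hypothetical software model of one layer's IP core.
struct Conv1Model {
    std::size_t n_weights;
    std::size_t n_bias;
    std::size_t ifm_len;
    Buffer weights;  // stays resident between calls, like on-chip buffers
    Buffer bias;

    // Placeholder for window generation + convolution + pooling.
    using Compute =
        std::function<Buffer(const Buffer& ifm, const Buffer& w, const Buffer& b)>;

    void run(Stream& in, Stream& out, int op, const Compute& compute) {
        if (op == 1) {
            // op=1: receive weights and bias from PS.
            weights = read_n(in, n_weights);
            bias = read_n(in, n_bias);
        } else if (op == 2) {
            // op=2: receive IFM, run the compute stages, send results to PS.
            const Buffer ifm = read_n(in, ifm_len);
            for (float v : compute(ifm, weights, bias)) {
                out.push(v);
            }
        }
    }
};
```

Keeping the weights inside the model between calls mirrors the point made earlier: after one op=1 call, only the IFM has to be streamed in for each op=2 call.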
The operations that most affect the overall performance are “window generation”, “convolution”, and “pooling”. We will introduce these three operations one by one.
Window Generation
First, let’s talk about the window generation. The code is attached as follows.
The idea is to use two line buffers to keep the previously processed rows. Each time we receive a new value at column (c), every column in the window (win_out) moves left by one step, the new value is placed at the bottom-right of the window, and the data at column (c) of the line buffers fills the rest of the last column. Finally, we update the line buffers by moving the data at column (c) from linebuf_x to linebuf_x+1. You can refer to the video VIVADO HLS 2D Convolution on hardware for more details.
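The update described above can be sketched in plain C++ as follows. This is an illustrative model, not the HLS code of this design: `WindowGen` and its template parameters are hypothetical names, and the sketch keeps K-1 line buffers for a K*K window, so K=3 corresponds to the two line buffers mentioned above.

```cpp
#include <cassert>

// Software model of a sliding-window generator fed in raster order.
// linebuf[0] holds the most recent previous row, linebuf[1] the one before, etc.
template <int K, int COLS>
struct WindowGen {
    static_assert(K >= 2, "window must be at least 2x2");

    float linebuf[K - 1][COLS] = {};
    float win[K][K] = {};

    // Pushes the pixel that arrives at column c of the current row.
    void push(float px, int c) {
        assert(c >= 0 && c < COLS);

        // 1) Every column in the window moves left by one step.
        for (int r = 0; r < K; ++r)
            for (int j = 0; j < K - 1; ++j)
                win[r][j] = win[r][j + 1];

        // 2) The last column comes from the line buffers at column c,
        //    with the new pixel at the bottom-right.
        for (int r = 0; r < K - 1; ++r)
            win[r][K - 1] = linebuf[K - 2 - r][c];
        win[K - 1][K - 1] = px;

        // 3) Move column c of linebuf_x to linebuf_x+1, then store the new pixel.
        for (int x = K - 2; x > 0; --x)
            linebuf[x][c] = linebuf[x - 1][c];
        linebuf[0][c] = px;
    }
};
```

In this model the window only holds a fully valid K*K patch once at least K-1 complete rows and K pixels of the current row have been pushed; earlier calls still contain the zero-initialised buffer contents.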
Convolution
Now, let’s talk about the convolution operation. In each iteration, we’d like to perform ofm_ch_proc*ifm_ch_proc*kernel_size multiplications simultaneously. [Details can be found in our CASES 18 paper, titled “Heterogeneous FPGA-based Cost-Optimal Design for Timing-Constrained CNNs”]. Then, we use an adder tree to sum up the results.
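As a rough illustration of one such iteration, the sketch below computes all ofm_ch_proc*ifm_ch_proc*kernel_size products and accumulates them per output channel. It is a plain C++ model under assumptions: the channel counts are placeholder values, `conv_step` is a hypothetical name, and the reduction here is a simple serial sum rather than an adder tree.

```cpp
#include <cassert>

constexpr int OFM_CH_PROC = 2;   // assumed value, for illustration only
constexpr int IFM_CH_PROC = 1;   // assumed value, for illustration only
constexpr int KERNEL_SIZE = 25;  // 5*5 kernel

// One iteration: multiply every window pixel with its weight, then
// add the per-channel sum onto acc (which may already hold the bias).
void conv_step(const float win[IFM_CH_PROC][KERNEL_SIZE],
               const float w[OFM_CH_PROC][IFM_CH_PROC][KERNEL_SIZE],
               float acc[OFM_CH_PROC]) {
    assert(win != nullptr && w != nullptr && acc != nullptr);

    float prod[OFM_CH_PROC][IFM_CH_PROC][KERNEL_SIZE];

    // All products are independent of each other. In HLS these loops would
    // typically be fully unrolled (e.g. with "#pragma HLS UNROLL") so that
    // the multiplications run in parallel.
    for (int o = 0; o < OFM_CH_PROC; ++o)
        for (int i = 0; i < IFM_CH_PROC; ++i)
            for (int k = 0; k < KERNEL_SIZE; ++k)
                prod[o][i][k] = win[i][k] * w[o][i][k];

    // Serial reduction, kept simple for this sketch.
    for (int o = 0; o < OFM_CH_PROC; ++o) {
        float sum = 0.0f;
        for (int i = 0; i < IFM_CH_PROC; ++i)
            for (int k = 0; k < KERNEL_SIZE; ++k)
                sum += prod[o][i][k];
        acc[o] += sum;
    }
}
```

Separating the multiply stage from the reduction keeps the products free of loop-carried dependencies, which is what allows them to be computed at the same time.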
In the final step, we invoke ADDER_TREE_25 to sum up the results obtained from the multiplications.
For ease of implementation, the adder tree treats the number of inputs as 2^5=32, the smallest power of two that is at least 25.
The detailed code is given as follows.
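A minimal plain C++ sketch of such a padded adder tree is shown here. It is not the ADDER_TREE_25 listing of this design: `adder_tree_25_sketch` is a hypothetical name, and padding the 7 unused inputs with zeros is an assumption about how the 25 products are extended to 32.

```cpp
#include <cassert>

constexpr int N_PROD = 25;  // products of one 5*5 kernel
constexpr int N_TREE = 32;  // 2^5 inputs of the tree

// Sums 25 products with a balanced binary tree of 32 inputs.
float adder_tree_25_sketch(const float prod[N_PROD]) {
    assert(prod != nullptr);

    // Pad the unused inputs with zeros (assumption of this sketch).
    float level[N_TREE];
    for (int i = 0; i < N_TREE; ++i)
        level[i] = (i < N_PROD) ? prod[i] : 0.0f;

    // Five levels: 32 -> 16 -> 8 -> 4 -> 2 -> 1.
    // Element i of the next level only reads elements 2i and 2i+1,
    // so the reduction can safely be done in place.
    for (int width = N_TREE / 2; width >= 1; width /= 2)
        for (int i = 0; i < width; ++i)
            level[i] = level[2 * i] + level[2 * i + 1];

    return level[0];
}
```

The additions within one level do not depend on each other, so in hardware each level can be evaluated in parallel and the depth is 5 additions instead of 24 sequential ones.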
Pooling
The pooling operation in LeNet is max pooling with a 2 by 2 window. To implement pooling in HLS, we just need to traverse the OFM and select the maximum value in each window as the corresponding result in the OFM_POOL matrix.
The code is listed as follows.
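The traversal can be sketched in plain C++ as below. This is an illustrative model rather than the HLS listing: `max_pool_2x2` is a hypothetical name, the feature map is stored as a flat row-major array, and a stride of 2 (non-overlapping windows) is assumed.

```cpp
#include <cassert>

// 2x2 max pooling with stride 2 over one channel.
// ofm:      rows * cols values, row-major
// ofm_pool: (rows/2) * (cols/2) values, row-major
void max_pool_2x2(const float* ofm, int rows, int cols, float* ofm_pool) {
    assert(ofm != nullptr && ofm_pool != nullptr);
    assert(rows % 2 == 0 && cols % 2 == 0);

    const int out_cols = cols / 2;
    for (int r = 0; r < rows / 2; ++r) {
        for (int c = 0; c < out_cols; ++c) {
            // Start from the top-left pixel of the window ...
            float m = ofm[(2 * r) * cols + (2 * c)];
            // ... and compare against the other three pixels.
            for (int dr = 0; dr < 2; ++dr) {
                for (int dc = 0; dc < 2; ++dc) {
                    const float v = ofm[(2 * r + dr) * cols + (2 * c + dc)];
                    if (v > m) m = v;
                }
            }
            ofm_pool[r * out_cols + c] = m;
        }
    }
}
```

For a 4*4 input holding the values 0 to 15 in row-major order, the four windows yield 5, 7, 13, and 15.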