Export: other / legacy
n2d2 "mnist24_16c4s2_24c5s2_150_10.ini" -export CPP_OpenCL
Export types:
- C: C export using OpenMP
- C_HLS: C export tailored for HLS with Vivado HLS
- CPP_OpenCL: C++ export using OpenCL
- CPP_Cuda: C++ export using Cuda
- CPP_cuDNN: C++ export using cuDNN
- SC_Spike: SystemC spike export
Other program options related to the exports:
| Option [default value] | Description |
|---|---|
| | Number of bits for the weights and signals. Must be 8, 16, 32 or 64 for integer export, or -32, -64 for floating point export. The number of bits can be arbitrary for the |
| | Number of stimuli used for the calibration. 0 = no calibration (default), -1 = use the full test dataset for calibration |
| | Number of KL passes for determining the layer output values distribution truncation threshold (0 = use the max. value, no truncation) |
| | If present, disable the use of unsigned data type in integer exports |
| | Max. number of stimuli to export (0 = no dataset export, -1 = unlimited) |
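To give a feel for what the integer bit widths and the unsigned data type imply, the following minimal sketch computes the representable value range for a given width. It assumes standard two's-complement signed integers and plain unsigned integers; the helper name `integer_range` is hypothetical and is not part of N2D2, and the generated code may represent values differently.

```python
def integer_range(nb_bits, unsigned=False):
    """Return (min, max) representable with nb_bits.

    Assumption: two's-complement signed integers, plain unsigned integers.
    """
    if nb_bits <= 0:
        raise ValueError("integer_range expects a positive bit width")
    if unsigned:
        return 0, 2 ** nb_bits - 1
    return -(2 ** (nb_bits - 1)), 2 ** (nb_bits - 1) - 1


# Compare signed and unsigned ranges for two of the integer widths listed above.
for bits in (8, 16):
    print(bits, "signed:", integer_range(bits),
          "unsigned:", integer_range(bits, unsigned=True))
```

Under these assumptions, an unsigned type offers twice the positive range of a signed type of the same width, which helps for values that are never negative.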
C export
Test the exported network:
cd export_C_int8
make
./bin/n2d2_test
The result should look like:
...
1652.00/1762 (avg = 93.757094%)
1653.00/1763 (avg = 93.760635%)
1654.00/1764 (avg = 93.764172%)
Tested 1764 stimuli
Success rate = 93.764172%
Process time per stimulus = 187.548186 us (12 threads)
Confusion matrix:
-------------------------------------------------
| T \ E | 0 | 1 | 2 | 3 |
-------------------------------------------------
| 0 | 329 | 1 | 5 | 2 |
| | 97.63% | 0.30% | 1.48% | 0.59% |
| 1 | 0 | 692 | 2 | 6 |
| | 0.00% | 98.86% | 0.29% | 0.86% |
| 2 | 11 | 27 | 609 | 55 |
| | 1.57% | 3.85% | 86.75% | 7.83% |
| 3 | 0 | 0 | 1 | 24 |
| | 0.00% | 0.00% | 4.00% | 96.00% |
-------------------------------------------------
T: Target E: Estimated
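The figures in this sample output can be cross-checked from the confusion matrix alone. The short sketch below is an illustration, not part of the generated export: it recomputes the success rate as the diagonal sum over the total number of stimuli, and the per-row percentages as each count divided by its row total. The function names are hypothetical.

```python
# Rows are targets (T), columns are estimated classes (E),
# copied from the sample output above.
confusion = [
    [329, 1, 5, 2],
    [0, 692, 2, 6],
    [11, 27, 609, 55],
    [0, 0, 1, 24],
]


def success_rate(matrix):
    """Return (correct, total, percentage of correctly classified stimuli)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, 100.0 * correct / total


def row_percentages(matrix):
    """Return each count as a percentage of its target (row) total."""
    result = []
    for row in matrix:
        n = sum(row)
        result.append([100.0 * v / n if n else 0.0 for v in row])
    return result


correct, total, rate = success_rate(confusion)
print(f"{correct}/{total} (avg = {rate:.6f}%)")
for target, row in enumerate(row_percentages(confusion)):
    print(target, " ".join(f"{p:6.2f}%" for p in row))
```

The diagonal sums to 1654 out of 1764 stimuli, which matches the reported success rate, and each diagonal percentage is the recall for that target class.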
CPP_OpenCL export
The OpenCL export can run the generated program on GPU or CPU architectures. Compilation features:
| Preprocessor command [default value] | Description |
|---|---|
| | Compile the binary with a synchronization between each layer and return the mean execution time of each layer. This preprocessor option can decrease performance. |
| | Generate the binary output of the OpenCL kernel .cl file used. The binary is stored in the /bin folder. |
| | Tell the program to load an OpenCL kernel as a binary from the /bin folder instead of a .cl file. |
| | Use the CUDA OpenCL SDK located at /usr/local/cuda |
| | Use the MALI OpenCL SDK located at /usr/Mali_OpenCL_SDK_vXXX |
| | Use the INTEL OpenCL SDK located at /opt/intel/opencl |
| | Use the AMD OpenCL SDK located at /opt/AMDAPPSDK-XXX |
Program options related to the OpenCL export:
| Option [default value] | Description |
|---|---|
| | If present, force the program to run on a CPU architecture |
| | If present, force the program to run on a GPU architecture |
| | Size of the batch to use |
| | Path to a specific input stimulus to test. For example, the option -stimulus /stimulus/env0000.pgm will test the file env0000.pgm of the stimulus folder. |
Test the exported network:
cd export_CPP_OpenCL_float32
make
./bin/n2d2_opencl_test -gpu
CPP_cuDNN export
The cuDNN export can run the generated program on NVIDIA GPU architectures. It uses the CUDA and cuDNN libraries. Compilation features:
| Preprocessor command [default value] | Description |
|---|---|
| | Compile the binary with a synchronization between each layer and return the mean execution time of each layer. This preprocessor option can decrease performance. |
| | Compile the binary with 32-bit architecture compatibility. |
Program options related to the cuDNN export:
| Option [default value] | Description |
|---|---|
| | Size of the batch to use |
| | CUDA Device ID selection |
| | Path to a specific input stimulus to test. For example, the option -stimulus /stimulus/env0000.pgm will test the file env0000.pgm of the stimulus folder. |
Test the exported network:
cd export_CPP_cuDNN_float32
make
./bin/n2d2_cudnn_test
C_HLS export
Test the exported network:
cd export_C_HLS_int8
make
./bin/n2d2_test
Run the High-Level Synthesis (HLS) with Xilinx Vivado HLS:
vivado_hls -f run_hls.tcl
Layer compatibility table
Layer compatibility by export type:
| Layer \ Export type | C | C_HLS | CPP_OpenCL | CPP_TensorRT |
|---|---|---|---|---|
| Conv | ✓ | ✓ | ✓ | ✓ |
| Pool | ✓ | ✓ | ✓ | ✓ |
| Fc | ✓ | ✓ | ✓ | ✓ |
| Softmax | ✓ | ✗ | ✓ | ✓ |
| FMP | ✓ | ✗ | ✓ | ✗ |
| Deconv | ✗ | ✗ | ✗ | ✓ |
| ElemWise | ✗ | ✗ | ✗ | ✓ |
| Resize | ✓ | ✗ | ✗ | ✓ |
| Padding | ✗ | ✗ | ✗ | ✓ |
| LRN | ✗ | ✗ | ✗ | ✓ |
| Anchor | ✗ | ✗ | ✗ | ✓ |
| ObjectDet | ✗ | ✗ | ✗ | ✓ |
| ROIPooling | ✗ | ✗ | ✗ | ✓ |
| RP | ✗ | ✗ | ✗ | ✓ |
BatchNorm is not mentioned because batch normalization parameters are automatically fused with the convolution parameters by the “-fuse” command.
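As an illustration of what fusing batch normalization into a convolution means, the sketch below folds per-output-channel batch normalization parameters into the convolution weights and bias using the textbook formulation. It is not taken from the N2D2 sources: the function name `fuse_batchnorm`, the flat-list weight layout and the `eps` default are assumptions made for this example, and the actual implementation may differ.

```python
import math


def fuse_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm parameters into convolution weights and bias.

    weights: one flat list of kernel coefficients per output channel.
    bias, gamma, beta, mean, var: one value per output channel.
    Uses BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta.
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        # Scaling the kernel scales the convolution output by the same factor.
        fused_w.append([w * scale for w in w_c])
        # The bias absorbs the mean shift and the batch norm offset.
        fused_b.append((b_c - m) * scale + bt)
    return fused_w, fused_b


# One output channel with a two-coefficient kernel (illustrative values only).
w, b = fuse_batchnorm([[1.0, 2.0]], [0.5], gamma=[2.0], beta=[0.1],
                      mean=[1.0], var=[4.0], eps=0.0)
print(w, b)
```

After fusing, a single convolution with the new weights and bias produces the same output as the original convolution followed by batch normalization, so no separate BatchNorm layer has to be exported.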