Hung LC

Kneron KL520 NPU mPCIe MiniCard Module


MINI-AI-520 - AAEON


MINI-AI-520

Kneron MINI-AI-520 UM
MINI-AI-520
Kneron KL520 NPU mPCIe MiniCard Module
User's Manual 2nd Ed
Last Updated: April 10, 2020

Kneron KL520 NPU Module

Copyright Notice
This document is copyrighted, 2020. All rights are reserved. The original manufacturer reserves the right to make improvements to the products described in this manual at any time without notice.
No part of this manual may be reproduced, copied, translated, or transmitted in any form or by any means without the prior written permission of the original manufacturer. Information provided in this manual is intended to be accurate and reliable. However, the original manufacturer assumes no responsibility for its use, or for any infringements upon the rights of third parties that may result from its use.
The material in this document is for product information only and is subject to change without notice. While reasonable efforts have been made in the preparation of this document to assure its accuracy, AAEON assumes no liabilities resulting from errors or omissions in this document, or from the use of the information contained herein.
AAEON reserves the right to make changes in the product design without notice to its users.


Acknowledgements
- Microsoft Windows® and Windows® 10 are registered trademarks of Microsoft Corp.
- Ubuntu is a registered trademark of Canonical.
- Kneron and the Kneron logo are trademarks of Kneron Inc.
- TensorFlow™ is a registered trademark of Google LLC.
- Apache, Apache MXNet, and MXNet are registered trademarks of the Apache Software Foundation.
All other product names or trademarks are properties of their respective owners. No ownership is implied or assumed for products, names or trademarks not herein listed by the publisher of this document.


Packing List

Before setting up your product, please make sure the following items have been shipped:

Item                        Quantity
MINI-AI-520 M.2 Module      1
M3 screw                    2

If any of these items are missing or damaged, please contact your distributor or sales representative immediately.


About this Document
This User's Manual contains all the essential information, such as detailed descriptions and explanations on the product's hardware and software features (if any), its specifications, dimensions, jumper/connector settings/definitions, and driver installation instructions (if any), to facilitate users in setting up their product.
Users may refer to the product page on AAEON.com for the latest version of this document.


Safety Precautions


Please read the following safety instructions carefully. It is advised that you keep this manual for future reference.

1. All cautions and warnings on the device should be noted.
2. Make sure the power source matches the power rating of the device.
3. Position the power cord so that people cannot step on it. Do not place anything over the power cord.
4. Always completely disconnect the power before working on the system's hardware.
5. No connections should be made when the system is powered, as a sudden rush of power may damage sensitive electronic components.
6. If the device is not to be used for a long time, disconnect it from the power supply to avoid damage by transient over-voltage.
7. Always disconnect this device from any power supply before cleaning.
8. While cleaning, use a damp cloth instead of liquid or spray detergents.
9. Make sure the device is installed near a power outlet and is easily accessible.
10. Keep this device away from humidity.
11. Place the device on a solid surface during installation to prevent falls.
12. Do not cover the openings on the device to ensure optimal heat dissipation.
13. Watch out for high temperatures when the system is running.
14. Do not touch the heat sink or heat spreader when the system is running.
15. Never pour any liquid into the openings. This could cause fire or electric shock.
16. As most electronic components are sensitive to static electrical charge, be sure to ground yourself to prevent static charge when installing the internal components. Use a grounding wrist strap and keep all electronic components in static-shielded containers.


17. If any of the following situations arises, please contact our service personnel:
    i. Damaged power cord or plug
    ii. Liquid intrusion to the device
    iii. Exposure to moisture
    iv. Device is not working as expected or in the manner described in this manual
    v. The device is dropped or damaged
    vi. Any obvious signs of damage displayed on the device
18. Do not leave this device in an uncontrolled environment with temperatures beyond the device's permitted storage temperatures (see chapter 1) to prevent damage.
19. Do NOT disassemble the motherboard so as not to damage the system or void your warranty.
20. If the thermal pad has been damaged, please contact an AAEON salesperson to purchase a new one. Do NOT use those of other brands.
21. The Hex Cylinder Coppers on the front panel are not removable.
22. Repeatedly assembling and disassembling the system may cause damage to the exterior paint, surface, and screw holes.
23. Use the right size screwdriver.
24. Use the screwdriver correctly to remove screws from the system.


FCC Statement
This device complies with Part 15 FCC Rules. Operation is subject to the following two conditions: (1) this device may not cause harmful interference, and (2) this device must accept any interference received including interference that may cause undesired operation.
Caution:
There is a danger of explosion if the battery is incorrectly replaced. Replace only with the same or equivalent type recommended by the manufacturer. Dispose of used batteries according to the manufacturer's instructions and your local government's recycling or disposal directives.
Attention:
Il y a un risque d'explosion si la batterie est remplacée de façon incorrecte. Ne la remplacer qu'avec le même modèle ou équivalent recommandé par le constructeur. Recycler les batteries usées en accord avec les instructions du fabricant et les directives gouvernementales de recyclage.


China RoHS Requirements (CN)



[Table: Chinese-language version of the hazardous substances table for AAEON Embedded Box PC/ Industrial System, covering (Pb), (Hg), (Cd), (Cr(VI)), (PBB), and (PBDE), with O and X markings defined against SJ/T 11363-2006.]

China RoHS Requirement (EN)

Poisonous or Hazardous Substances or Elements in Products
AAEON Embedded Box PC/ Industrial System

Components:
- PCB & Other Components
- Wires & Connectors for External Connections
- Chassis
- CPU & RAM
- Hard Disk
- PSU

Poisonous or Hazardous Substances or Elements:
- Lead (Pb)
- Mercury (Hg)
- Cadmium (Cd)
- Hexavalent Chromium (Cr(VI))
- Polybrominated Biphenyls (PBB)
- Polybrominated Diphenyl Ethers (PBDE)

O: The quantity of poisonous or hazardous substances or elements found in each of the component's parts is below the SJ/T 11363-2006-stipulated requirement.

X: The quantity of poisonous or hazardous substances or elements found in at least one of the component's parts is beyond the SJ/T 11363-2006-stipulated requirement.

Note: The Environment Friendly Use Period as labeled on this product is applicable under normal usage only.


Table of Contents
Chapter 1 - Product Specifications
    1.1 MINI-AI-520 Kneron NPU Module Specifications
Chapter 2 - Hardware Information
    2.1 Dimensions
    2.2 Block Diagram
    2.3 Board Design
    2.4 List of Connectors
        2.4.1 UART Connector (CN2)
        2.4.2 UART Connector (CN3)
        2.4.3 Mini-Card Connector (CN4)
Appendix A - Toolchain User Manual
    A.1 Toolchain User Manual


Chapter 1 - Product Specifications

1.1 MINI-AI-520 Kneron NPU Module Specifications

System
IC Type                      Kneron KL520 Integrated SoC
Support Framework            ONNX, TensorFlow, Keras, Caffe
Support Model                Vgg16, Resnet, GoogleNet, YOLO, Tiny YOLO, Lenet, MobileNet, DenseNet
Memory Type                  LPDDR2
NPU Power Efficiency         0.56TOPS/W
Overall Power Consumption    0.5W

Other Specifications
Operating Temperature        32°F ~ 158°F (0°C ~ 70°C)
Storage Temperature          -40°F ~ 185°F (-40°C ~ 85°C)
Operating Humidity           0% ~ 90% relative humidity, non-condensing
Certification                CE/FCC Class A


Chapter 2 - Hardware Information

2.1 Dimensions


2.2 Block Diagram


2.3 Board Design


2.4 List of Connectors

This section details the connectors featured on the MINI-AI-520 module. Use it as a reference for setup and configuration in your application.

Label    Connector Type
CN2      UART Connector
CN3      UART Connector
CN4      Mini-Card Connector

2.4.1 UART Connector (CN2)

Pin    Signal Description
1      UART0_RX
2      UART0_TX
3      GND

2.4.2 UART Connector (CN3)

Pin    Signal Description
1      UART1_RX
2      UART1_TX
3      GND

2.4.3 Mini-Card Connector (CN4)

Pin    Signal Description    Pin    Signal Description
1      X_PTN                 27     GND
2      3.3V_M2               28     NC
3      NC                    29     GND
4      GND                   30     NC
5      NC                    31     NC
6      NC                    32     NC
7      NC                    33     NC
8      NC                    34     GND
9      GND                   35     GND
10     NC                    36     USB_D-
11     NC                    37     GND
12     NC                    38     USB_D+
13     NC                    39     3.3V_M2
14     NC                    40     GND
15     GND                   41     3.3V_M2
16     NC                    42     NC
17     NC                    43     GND
18     GND                   44     NC
19     NC                    45     NC
20     X_PTN                 46     NC
21     GND                   47     NC
22     NC                    48     NC
23     NC                    49     NC
24     3.3V_M2               50     GND
25     NC                    51     NC
26     GND                   52     3.3V_M2

Appendix A - Toolchain User Manual

A.1 Toolchain User Manual
The following attachment, Toolchain User Manual, is supplied with this document and is meant for use with AAEON products featuring the Kneron KL520 NPU Module. If you have any questions regarding this document or your AAEON product, please contact your sales representative for assistance.


Kneron Toolchain User Guide
for AAEON Products with KL520 NPU

Kneron Toolchain User Guide

Supplemental Guide for KL520 NPU

Table of Contents
Chapter 1 Overview
Chapter 2 Introduction
    2.1 Work Flow
Chapter 3 Docker Installation
    3.1 System Requirements
    3.2 Installation
Chapter 4 Sample Tutorial
    4.1 Start the Docker Image
    4.2 Converter
        4.2.1 Keras to ONNX
        4.2.2 Tensorflow to ONNX
        4.2.3 Pytorch to ONNX
        4.2.4 Pytorch-ONNX to ONNX
        4.2.5 Caffe to ONNX
        4.2.6 ONNX to ONNX
        4.2.7 Edit Function
    4.3 FpAnalyser, Compiler and IpEvaluator
        4.3.1 Fill Input Parameters
        4.3.2 Running the Program
        4.3.3 Get the Result
    4.4 Simulator and Emulator
        4.4.1 Fill the Input Parameters
        4.4.2 Running the Programs
        4.4.3 Get the Results
    4.5 Compiler and Evaluator
        4.5.1 Fill the Input Parameters
        4.5.2 Running the Programs
        4.5.3 Get the Result
    4.6 FpAnalyser and Batch-Compile
        4.6.1 Fill the Input Parameters
        4.6.2 Running the Programs
        4.6.3 Get the Result
    4.7 Draw YOLO Result on Images
        4.7.1 Steps
    4.8 FAQ
        4.8.1 How to configure the input_params.json?
        4.8.2 Fails when implement models with SSD structure.
        4.8.3 Fails in the step of FpAnalyser
        4.8.4 Other unsupported models
        4.8.5 The functions KDP520 NPU supports
        4.8.6 What's the meaning of simulator's output?
        4.8.7 How to configure the batch_compile_input_params.json?
        4.8.8 What's the meaning of the output files of batch-compile?
        4.8.9 How to use customized methods for image preprocess?
Chapter 5 Firmware Management
    5.1 Update Firmware
    5.2 General Model Firmware
    5.3 Model Update

Chapter 1 Overview

The KDP toolchain is a software package that integrates a series of libraries to simulate the operation of the KDP520 hardware. Table 1 lists the functions the KDP520 supports.

Table 1: KDP520 Supported Functions

Layers/Modules      Functions/Parameters            Spec.
Convolution         Convolution kernel dimension    1x1 up to 11x11
                    Stride                          1,2,4
                    Padding                         0-15
                    Depthwise Conv                  Yes
                    Deconvolution                   Use Upsampling + Conv
Pooling             Max pooling 3x3                 stride 1,2,3
                    Max pooling 2x2                 stride 1,2
                    Ave Pooling 3x3                 stride 1,2,3
                    Ave Pooling 2x2                 stride 1,2
                    global ave pooling              Support
                    global max pooling              Support
Activation          ReLu                            Support
                    Leaky ReLU                      Support
                    PReLU                           Support
                    ReLU6                           Support
Other processing    Batch Normalization             Support
                    Add                             Support
                    Concatenation                   Support
                    Dense/Fully Connected           Support
                    Flatten                         Support
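The convolution limits in Table 1 can be checked before a model is handed to the toolchain. The following is a minimal sketch, not part of the Kneron toolchain: the ConvLayer class and conv_issues function are hypothetical names, and square kernels are assumed because the table only gives sizes of the form NxN.

```python
from dataclasses import dataclass
from typing import List

# Limits transcribed from the Convolution rows of Table 1.
KERNEL_RANGE = range(1, 12)    # 1x1 up to 11x11
ALLOWED_STRIDES = (1, 2, 4)    # Stride: 1,2,4
PADDING_RANGE = range(0, 16)   # Padding: 0-15


@dataclass
class ConvLayer:
    """Hypothetical description of one convolution layer (square kernel assumed)."""
    name: str
    kernel: int
    stride: int
    padding: int


def conv_issues(layer: ConvLayer) -> List[str]:
    """Return the Table 1 limits that the layer violates; an empty list means none."""
    issues = []
    if layer.kernel not in KERNEL_RANGE:
        issues.append(f"{layer.name}: kernel size {layer.kernel} is outside 1..11")
    if layer.stride not in ALLOWED_STRIDES:
        issues.append(f"{layer.name}: stride {layer.stride} is not one of 1, 2, 4")
    if layer.padding not in PADDING_RANGE:
        issues.append(f"{layer.name}: padding {layer.padding} is outside 0..15")
    return issues


if __name__ == "__main__":
    for layer in (ConvLayer("conv1", 3, 1, 1), ConvLayer("conv2", 13, 3, 16)):
        print(layer.name, conv_issues(layer) or "within Table 1 limits")
```

A check like this only covers the three numeric convolution parameters; the other rows of Table 1 (pooling strides, activations) would need their own rules.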


Chapter 2 Introduction
2.1 Work Flow
To fully utilize the Kneron SDK and get detailed information from the running programs, Kneron provides, besides the toolchain GUI, a Linux command toolchain with the following functions:
(1) Converting deep learning models from different deep learning frameworks (Keras, Tensorflow, Pytorch, Caffe) to ONNX format;
(2) Conducting fixed-point analysis on the selected model and image dataset, and compiling the related model file to the Kneron IP's corresponding instructions, weight file, and data flow controls;
(3) Running the IP evaluator, as well as the simulator and emulator, on the selected model.


Chapter 3 Docker Installation
3.1 System Requirements
The system must be running Ubuntu 16.04.
3.2 Installation
Open Terminal and enter the following command:
$ sudo chmod +x install_docker
Next, enter the following command:
$ sudo ./install_docker
Docker is now installed.


Chapter 4 Sample Tutorial
4.1 Start the Docker Image
After installing Docker, you can start the Docker image you just pulled and get a Docker container to run the toolchain. When you start it, you need to configure a local folder for exchanging files between your local environment and the container. In this example, this folder is called the Interactive Folder. Assume the absolute path of the folder you configure is absolute_path_of_your_folder.
The start command is:
docker run -it --rm -v absolute_path_of_your_folder:/data1 kneron/toolchain:linux_command_toolchain
For example, if the absolute path of the folder you configure is /home/aaeon/Document/test_docker, the command is:
docker run -it --rm -v /home/aaeon/Document/test_docker:/data1 kneron/toolchain:linux_command_toolchain
After running the start command, you are inside the Docker container. Then copy the example materials to the Interactive Folder with the following command:
cp -r /workspace/examples/* /data1/
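The start command can also be assembled programmatically, which makes it harder to pass a relative path by mistake. This is a small sketch under the assumption that the host is a Linux machine as required in Chapter 3; docker_run_command is a hypothetical helper, while the image name and the /data1 mount point are taken from the command above.

```python
import posixpath
import shlex
from typing import List

IMAGE = "kneron/toolchain:linux_command_toolchain"  # image name from this guide
CONTAINER_MOUNT = "/data1"                          # mount point used throughout this guide


def docker_run_command(host_folder: str) -> List[str]:
    """Build the argument list that mounts host_folder as the Interactive Folder."""
    if not posixpath.isabs(host_folder):
        raise ValueError("the Interactive Folder must be given as an absolute path")
    volume = f"{host_folder}:{CONTAINER_MOUNT}"
    return ["docker", "run", "-it", "--rm", "-v", volume, IMAGE]


if __name__ == "__main__":
    cmd = docker_run_command("/home/aaeon/Document/test_docker")
    # Print the command for copy and paste; it is not executed here.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a single string keeps the arguments separate, so the same value could be passed to subprocess.run without shell quoting concerns.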


4.2 Converter
4.2.1 Keras to ONNX
Since the Onet model is in Keras format, you need to convert it from Keras to ONNX with the following command:
python /workspace/onnx-keras/generate_onnx.py -o absolute_path_of_output_model_file absolute_path_of_input_model_file -O -C --duplicate-shared-weights
For example:
python /workspace/onnx-keras/generate_onnx.py -o /data1/onet0.417197.onnx /data1/keras/onet0.417197.hdf5 -O -C --duplicate-shared-weights
Some warnings may be logged when running this program. You can check whether the conversion succeeded by checking whether the onnx file was generated.
If the model file has a customized input shape, use the following command:
python /workspace/onnx-keras/generate_onnx.py absolute_path_of_input_model_file -o absolute_path_of_output_model_file -I 1 model_input_width model_input_height num_of_channel
4.2.2 Tensorflow to ONNX
Use the following command to convert from Tensorflow to ONNX:
/workspace/scripts/tf2onnx.sh absolute_path_of_input_model_file absolute_path_of_output_onnx_model_file name_of_input_layer:0 name_of_output_layer:0
For example:
/workspace/scripts/tf2onnx.sh /data1/tensorflow/model/mnist.pb /data1/mnist.pb.onnx Placeholder:0 fc2/add:0


4.2.3 Pytorch to ONNX
Use the following command to convert from Pytorch to ONNX:
python /workspace/scripts/pytorch2onnx.py absolute_path_of_input_model_file channel_number model_input_height model_input_width absolute_path_of_output_model_file
For example:
python /workspace/scripts/pytorch2onnx.py /data1/pytorch/models/resnet34.pth 3 224 224 /data1/resnet34.onnx
4.2.4 Pytorch-ONNX to ONNX
Although Pytorch can produce an onnx file itself, that file still needs to be converted here.
python /workspace/scripts/pytorch2onnx.py absolute_path_of_input_pytorch_onnx_model_file channel_number model_input_height model_input_width absolute_path_of_output_model_file
4.2.5 Caffe to ONNX
Use the following command to convert from Caffe to ONNX:
python /workspace/onnx-caffe/generate_onnx.py -o absolute_path_of_output_onnx_model_file -w absolute_path_of_input_caffe_weight_file -n absolute_path_of_input_caffe_model_file
For example:
python /workspace/onnx-caffe/generate_onnx.py -o /data1/mobilenetv2.onnx -w /data1/caffe/models/mobilenetv2.caffemodel -n /data1/caffe/models/mobilenetv2.prototxt
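The converters above each take their arguments in a different order, so a thin wrapper can help avoid mix-ups. The sketch below only builds the command lines shown in Sections 4.2.1 to 4.2.5; conversion_command and its keyword names are hypothetical, and the script paths and flags are copied from this guide rather than verified against the scripts themselves.

```python
from typing import List


def conversion_command(framework: str, src: str, dst: str, **opts) -> List[str]:
    """Return the argument list for converting src to the ONNX file dst."""
    if framework == "keras":
        return ["python", "/workspace/onnx-keras/generate_onnx.py",
                "-o", dst, src, "-O", "-C", "--duplicate-shared-weights"]
    if framework == "tensorflow":
        # input_layer and output_layer look like "Placeholder:0" and "fc2/add:0".
        return ["/workspace/scripts/tf2onnx.sh", src, dst,
                opts["input_layer"], opts["output_layer"]]
    if framework == "pytorch":
        # Argument order from Section 4.2.3: channels, height, width, then output.
        return ["python", "/workspace/scripts/pytorch2onnx.py", src,
                str(opts["channels"]), str(opts["height"]), str(opts["width"]), dst]
    if framework == "caffe":
        # src is the model (.prototxt) file; weights is the .caffemodel file.
        return ["python", "/workspace/onnx-caffe/generate_onnx.py",
                "-o", dst, "-w", opts["weights"], "-n", src]
    raise ValueError(f"unsupported framework: {framework}")


if __name__ == "__main__":
    print(" ".join(conversion_command(
        "pytorch", "/data1/pytorch/models/resnet34.pth", "/data1/resnet34.onnx",
        channels=3, height=224, width=224)))
```

The printed line matches the Pytorch example in Section 4.2.3; inside the container the list could be passed to subprocess.run.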


4.2.6 ONNX to ONNX
If you have your own onnx file, or if there is a problem with the onnx file converted by the commands above, run the following command:
python /workspace/scripts/onnx2onnx.py absolute_path_of_your_input_onnx_model_file -o absolute_path_of_output_onnx_model_file (-m)
Add -m when there is a customized layer in your model.
This operation optimizes the mathematical computation in your model.
4.2.7 Edit Function


4.2.7.1 Feature
There is a script called editor.py in the folder /workspace/scripts. It is a simple ONNX editor that provides the following functions:
(1) Add nop BN or Conv nodes.
(2) Delete specific nodes or inputs.
(3) Cut the graph from a certain node (delete all the nodes following the node).
(4) Reshape inputs and outputs.
4.2.7.2 Usage
Usage of the Edit Function is as follows:
editor.py [-h] [-c CUT_NODE [CUT_NODE ...]]
          [--cut-type CUT_TYPE [CUT_TYPE ...]]
          [-d DELETE_NODE [DELETE_NODE ...]]
          [--delete-input DELETE_INPUT [DELETE_INPUT ...]]
          [-i INPUT_CHANGE [INPUT_CHANGE ...]]
          [-o OUTPUT_CHANGE [OUTPUT_CHANGE ...]]
          [--add-conv ADD_CONV [ADD_CONV ...]]
          [--add-bn ADD_BN [ADD_BN ...]]
          in_file out_file
Edit an ONNX model. The processing sequence is 'delete nodes/values' -> 'add nodes' -> 'change shapes'. Cutting cannot be combined with other operations.


Positional arguments:
  in_file     input ONNX FILE
  out_file    output ONNX FILE
Optional arguments:
  -h, --help
      show this help message and exit
  -c CUT_NODE [CUT_NODE ...], --cut CUT_NODE [CUT_NODE ...]
      remove nodes from the given nodes(inclusive)
  --cut-type CUT_TYPE [CUT_TYPE ...]
      remove nodes by type from the given nodes(inclusive)
  -d DELETE_NODE [DELETE_NODE ...], --delete DELETE_NODE [DELETE_NODE ...]
      delete nodes by names and only those nodes
  --delete-input DELETE_INPUT [DELETE_INPUT ...]
      delete inputs by names
  -i INPUT_CHANGE [INPUT_CHANGE ...], --input INPUT_CHANGE [INPUT_CHANGE ...]
      change input shape (e.g. -i 'input_0 1 3 224 224')
  -o OUTPUT_CHANGE [OUTPUT_CHANGE ...], --output OUTPUT_CHANGE [OUTPUT_CHANGE ...]
      change output shape (e.g. -o 'input_0 1 3 224 224')
  --add-conv ADD_CONV [ADD_CONV ...]
      add nop conv using specific input
  --add-bn ADD_BN [ADD_BN ...]
      add nop bn using specific input


4.2.7.3 Example
(1) In the /workspace/scripts/res folder, there is a VDSR model from Tensorflow. Convert this model first:
cd /workspace/scripts && ./tf2onnx.sh res/vdsr_41_20layer_1.pb res/tmp.onnx images:0 output:0
(2) This ONNX file seems valid, but its input and output are channel last. It uses Transpose nodes to convert to channel first, which affects performance. Use the editor to delete the Transpose nodes and reset the shapes:
cd /workspace/scripts && python editor.py res/tmp.onnx new.onnx -d Conv2D__6 Conv2D_19__84 -i 'images:0 1 3 41 41' -o 'output:0 1 3 41 41'
Now the model has no Transpose nodes and takes channel-first inputs directly.
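The shape strings passed to -i and -o are the channel-first form of the model's channel-last shapes. The sketch below shows that reordering; both function names are hypothetical, and the channel-last shape (1, 41, 41, 3) is an assumption inferred from the channel-first value '1 3 41 41' used in the example, not something read from the VDSR model file.

```python
from typing import Sequence, Tuple


def nhwc_to_nchw(shape: Sequence[int]) -> Tuple[int, int, int, int]:
    """Reorder a channel-last (N, H, W, C) shape into channel-first (N, C, H, W)."""
    n, h, w, c = shape
    return (n, c, h, w)


def editor_shape_arg(tensor_name: str, shape: Sequence[int]) -> str:
    """Format 'name d0 d1 d2 d3' in the style of the -i/-o examples in this guide."""
    return tensor_name + " " + " ".join(str(dim) for dim in shape)


if __name__ == "__main__":
    channel_last = (1, 41, 41, 3)  # assumed original Tensorflow layout
    print(editor_shape_arg("images:0", nhwc_to_nchw(channel_last)))
    print(editor_shape_arg("output:0", nhwc_to_nchw(channel_last)))
```

Only the shape metadata is reordered here; the editor command itself is what removes the Transpose nodes from the graph.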
4.3 FpAnalyser, Compiler and IpEvaluator
4.3.1 Fill Input Parameters
Before running the programs, configure the input parameters in input_params.json in the Interactive Folder. The initial input_params.json is set up for the Keras Onet model. See FAQ question 1 for a detailed explanation of the input parameters.
4.3.2 Running the Program
After filling in the related parameters in input_params.json, run the programs with the following command:
cd /workspace/scripts && ./fpAnalyserCompilerIpevaluator.sh.x
After this program runs, folders called compiler and fpAnalyser are generated in the Interactive Folder. They store the results of the compiler, ipEvaluator, and fpAnalyser.


4.3.3 Get the Result
In the Interactive Folder, you'll find a folder called fpAnalyser, which contains the preprocessed image txt files, and a folder called compiler, which contains the binary files generated by the compiler as well as the evaluation result of ipEvaluator.
4.4 Simulator and Emulator
4.4.1 Fill the Input Parameters
Fill in the simulator and emulator input parameters in input_params.json in the Interactive Folder. Please refer to FAQ question 1 for the related parameters.
4.4.2 Running the Programs
To run the simulator:
cd /workspace/scripts && ./simulator.sh.x
A folder called simulator is generated in the Interactive Folder; it stores the result of the simulator.
To run the emulator:
cd /workspace/scripts && ./emulator.sh.x
A folder called emulator is generated in the Interactive Folder; it stores the result of the emulator.
4.4.3 Get the Results
In the Interactive Folder, you'll find a folder called simulator, which contains the output files of the simulator, and a folder called emulator, which contains the output folders of the emulator. In each folder, there are three files: the input image file, the output of the last layer (a file whose name has the format "temp***.txt"), and the preprocessed image result.
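When the emulator has been run on several images, it can be handy to list the last-layer output file for each result folder. The following sketch assumes that the "temp***.txt" naming can be matched with the glob pattern temp*.txt and that each image has its own subfolder under the emulator folder; last_layer_outputs is a hypothetical helper, and the file contents are not parsed because their format is not described here.

```python
from pathlib import Path
from typing import Dict, List


def last_layer_outputs(result_root: str) -> Dict[str, List[str]]:
    """Map each result subfolder name to the temp*.txt file names it contains."""
    root = Path(result_root)
    found = {}
    for folder in sorted(root.iterdir()):
        if folder.is_dir():
            # Assumption: "temp***.txt" in this guide means any name matching temp*.txt.
            found[folder.name] = sorted(p.name for p in folder.glob("temp*.txt"))
    return found


if __name__ == "__main__":
    emulator_dir = Path("/data1/emulator")  # result path used in this guide
    if emulator_dir.is_dir():
        for name, files in last_layer_outputs(str(emulator_dir)).items():
            print(name, files)
```

For the simulator folder, which holds its files directly rather than in subfolders, the same glob pattern could be applied to the folder itself.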


4.5 Compiler and Evaluator
This part is similar to Section 4.3, except that it does not run fpAnalyser. It can be used when your model structure is prepared but the model hasn't been trained.
4.5.1 Fill the Input Parameters
Fill in the input parameters in input_params.json in the Interactive Folder. Please refer to FAQ question 1 for the related parameters.
4.5.2 Running the Programs
To run the compiler and ipEvaluator:
cd /workspace/scripts && ./compilerIpevaluator.sh.x
A folder called compiler is generated in the Interactive Folder; it stores the result of the compiler and ipEvaluator.
4.5.3 Get the Result
In the Interactive Folder, you'll find a folder called compiler, which contains the output files of the compiler and ipEvaluator.
4.6 FpAnalyser and Batch-Compile
This part gives the instructions for batch-compile, which generates the binary file required by the firmware.
4.6.1 Fill the Input Parameters
Fill in the input parameters in /data1/batch_compile_input_params.json in the Interactive Folder. Please refer to FAQ question 7 for the related parameters.


The following two examples show how to configure the batch_compile_input_params.json.
(1) tiny_yolo_v3
{
  "input_image_folder": ["/data1/caffe/images"],
  "img_channel": ["RGB"],
  "model_input_width": [224],
  "model_input_height": [224],
  "img_preprocess_method": ["yolo"],
  "input_onnx_file": ["/data1/yolov3-tiny-224.h5.onnx"],
  "keep_aspect_ratio": "True",
  "command_addr": "0x30000000",
  "weight_addr": "0x40000000",
  "sram_addr": "0x50000000",
  "dram_addr": "0x60000000",
  "whether_encryption": "No",
  "encryption_key": "0x12345678",
  "model_id_list": [19],
  "model_version_list": [1]
}
(2) tiny_yolo_v3 and Onet
{
  "input_image_folder": ["/data1/caffe/images", "/data1/keras/n000645"],
  "img_channel": ["RGB", "L"],
  "model_input_width": [224, 48],
  "model_input_height": [224, 48],
  "img_preprocess_method": ["yolo", "kneron"],
  "input_onnx_file": ["/data1/yolov3-tiny-224.h5.onnx", "/data1/onet0.417197.onnx"],
  "keep_aspect_ratio": "True",
  "command_addr": "0x30000000",
  "weight_addr": "0x40000000",
  "sram_addr": "0x50000000",
  "dram_addr": "0x60000000",
  "whether_encryption": "No",
  "encryption_key": "0x12345678",
  "model_id_list": [19, 20],
  "model_version_list": [1, 1]
}
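In both examples, every list-valued field has one entry per model, in the same order. A quick consistency check before running batch-compile might look like the sketch below. The equal-length rule is an assumption inferred from the two examples rather than a documented requirement, and check_batch_config is a hypothetical helper.

```python
import json
from typing import Dict

# Fields that hold one entry per model in the examples of this section.
PER_MODEL_KEYS = [
    "input_image_folder", "img_channel", "model_input_width", "model_input_height",
    "img_preprocess_method", "input_onnx_file", "model_id_list", "model_version_list",
]


def check_batch_config(config: Dict) -> int:
    """Return the number of models, or raise ValueError if the list lengths disagree."""
    lengths = {key: len(config[key]) for key in PER_MODEL_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"per-model lists differ in length: {lengths}")
    return lengths[PER_MODEL_KEYS[0]]


if __name__ == "__main__":
    path = "/data1/batch_compile_input_params.json"  # location used in this guide
    try:
        with open(path) as handle:
            print("models configured:", check_batch_config(json.load(handle)))
    except FileNotFoundError:
        print(f"{path} not found; run this inside the toolchain container")
```

The address and encryption fields are single values shared by all models in the examples, so they are left out of the length check.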
4.6.2 Running the Programs
To run fpAnalyser and batch-compile:
cd /workspace/scripts && ./fpAnalyserBatchCompile.sh.x
A folder called batch_compile is generated in the Interactive Folder; it stores the result of fpAnalyser and batch-compile. (all_models.bin and fw_info.bin are in the batch_compile/compiler folder.)


4.6.3 Get the Result
In the Interactive Folder, you'll find a folder called compiler, which contains the output files of fpAnalyser and batch-compile. If you have questions about the meaning of the output files, please refer to FAQ question 8.
4.7 Draw YOLO Result on Images
The toolchain also provides the function of drawing final result on images for yolo model, i.e. drawing the box and class name.
4.7.1 Steps
(1) Follow the part 3.4 Simulator and Emulator, it will generate the result of emulator for multiply images, and the result folder path is "/data1/emulator". In this folder, the original image, the preprocess image txt file and the final output of the model are classified in different folder.
(2) run the scripts to draw the yolo result cd /workspace/scripts/utils/yolo && python convert_sim_result_yolo.py
After this step, the drawing result will be saved in the subfolders of "/data1/emulator", with the format "imgname_thresh_xxx.png", xxx means the threshold for the box score, which means only the boxes with score higher than this threshold will be drawn in this image.
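The threshold rule can be illustrated with a short sketch. This is not the code of convert_sim_result_yolo.py; the box representation (a dictionary with a "score" key), the helper names, and the way the threshold is formatted into the file name are all assumptions made for illustration.

```python
def boxes_to_draw(boxes, thresh):
    """Keep only the boxes whose score is strictly higher than thresh."""
    return [box for box in boxes if box["score"] > thresh]


def output_name(img_name, thresh):
    # Follows the "imgname_thresh_xxx.png" pattern; the exact number
    # formatting used by the real script is an assumption.
    return f"{img_name}_thresh_{thresh}.png"


# Hypothetical detections for one image.
detections = [
    {"label": "dog", "score": 0.82},
    {"label": "cat", "score": 0.31},
]

print(boxes_to_draw(detections, 0.5))
print(output_name("dog", 0.5))
```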


4.8 FAQ


4.8.1 How to configure the input_params.json?
By following the above instructions, the input_params.json will be saved in the Interactive Folder.
Please do not change the parameters' names.
The parameters in input_params.json are:
(1) input_image_folder
The absolute path of the input image folder for fpAnalyser.
(2) img_channel
Options: L, RGB
The channel information after the input image is preprocessed. L means single channel. Input for fpAnalyser.
(3) model_input_width
The width of the model input size. Input for fpAnalyser.
(4) model_input_height
The height of the model input size.
(5) img_preprocess_method
Options: kneron, tensorflow, yolo, caffe, pytorch
The image preprocess method, input for fpAnalyser. The related formats are as follows:
"kneron": RGB/256 - 0.5
"tensorflow": RGB/127.5 - 1.0
"yolo": RGB/255.0
"pytorch": (RGB/255. - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
"caffe" (BGR format): BGR - [103.939, 116.779, 123.68]
"customized": please refer to FAQ question 9
(6) input_onnx_file
The absolute path of the onnx file, which is the input file for fpAnalyser.
(7) keep_aspect_ratio
Options: True, False
Indicates whether or not to keep the aspect ratio.
(8) command_addr
Address for command, input for compiler.
(9) weight_addr
Address for weight, input for compiler.
(10) sram_addr
Address for sram, input for compiler.
(11) dram_addr
Address for dram, input for compiler.
(12) whether_encryption
Options: Yes, No
Whether to encrypt the bin files generated by the compiler, input for compiler.
(13) encryption_key
Encryption key for the bin files, input for compiler.
(14) simulator_img_file
Input for simulator. The absolute path of the image you want the simulator to run inference on.
(15) emulator_img_folder


The absolute path of the image folder you want the emulator to run inference on.
(16) cmd_bin
The absolute path of the command binary file, which is an input file for the simulator or emulator.
(17) weight_bin
The absolute path of the weight binary file, which is an input file for the simulator or emulator.
(18) setup_bin
The absolute path of the setup binary file, which is an input file for the simulator or emulator.
(19) whether_npu_preprocess
Whether the simulator or emulator uses the same image processing as the NPU. If false, parameters (20) - (25) will not be used. Parameters (20) - (23) are for NPU image preprocessing.
(20) raw_img_fmt
The input image format for the simulator and emulator.
Options: IMG, RGB565, NIR888
IMG: jpg/png/jpeg/bmp image files
RGB565: binary file in rgb565 format
NIR888: binary file in nir888 format
(21) radix
The radix information for the NPU image processing. The formula for radix is
7 - ceil(log2(abs_max))
For example, if the image processing method is "kneron", which is introduced in parameter (5), the related formula is "kneron": RGB/256 - 0.5, the processed value range is (-0.5, 0.5), and then
abs_max = max(abs(-0.5), abs(0.5)) = 0.5


radix = 7 - ceil(log2(abs_max)) = 7 - (-1) = 8
(22) pad_mode
The mode for adding padding. It is used only when (7) keep_aspect_ratio is True. It has two options: 0 and 1.
0 - If the original width is too small, the padding is added equally on the left and right sides; if the original height is too small, the padding is added equally on the top and bottom sides.
1 - If the original width is too small, the padding is added on the right side only; if the original height is too small, the padding is added on the bottom side only.
(23) rotate
It has three options:
0 - no rotation
1 - rotate 90 degrees clockwise
2 - rotate 90 degrees counter-clockwise
(24) pCrop
The parameters for cropping the image. It has four sub-parameters:
- bCropFirstly: whether to crop the image first. If false, the following parameters are not used and no cropping is performed.
- crop_x, crop_y: the coordinates of the top-left cropping point.
- crop_w: the width of the cropped image.
- crop_h: the height of the cropped image.
(25) imgSize
- width: input image width
- height: input image height
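The radix calculation for parameter (21) can be checked with a few lines of Python. This is a minimal sketch using only the formula stated above; the function name radix and the way abs_max is derived by scanning the pixel values 0 to 255 are illustrative choices, not toolchain code.

```python
import math


def radix(abs_max):
    """radix = 7 - ceil(log2(abs_max)), the formula given for parameter (21)."""
    return 7 - math.ceil(math.log2(abs_max))


# "kneron" preprocessing (RGB/256 - 0.5) applied to every 8-bit pixel value.
values = [v / 256 - 0.5 for v in range(256)]
abs_max = max(abs(min(values)), abs(max(values)))

print(abs_max)         # 0.5
print(radix(abs_max))  # 8
```

For a customized preprocess method, the same function can be applied to the largest absolute value that the method can produce.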


4.8.2 Failure when implementing models with an SSD structure
Currently, our NPU does not support SSD-like networks because they have Reshape and Concat operations at the end of the model, but we do offer a workaround for this situation.
The reasons these Reshape and Concat operations are not supported are as follows.
The NPU does offer Reshape capability, but only for regular shape transformations. This means you can flatten your data or extract some channels from the feature map. However, the NPU does not support complex transformations. For example, in Figure 13, a 1x12x4x5 feature map is reshaped to 1x40x6.
For the Concat operation, the NPU supports channel-based feature map concatenation. However, it does not support Concat operations based on another axis. For example, in Figure 14, the first concatenation is based on axis 1 and the following concatenation is based on axis 2.
The workaround is to delete these Reshape and Concat operations and turn the model into a multiple-output model. This is possible because the operations are at the end of the model and do not change the output feature map data. The converted model should look like the one in Figure 15.


4.8.3 Failure in the FpAnalyser step

If the log shows "mystery", the model you input contains customized layers, which are not supported now.
If the log shows "start datapath analysis", check whether you have input the proper image preprocess parameters.
4.8.4 Other unsupported models

This version of the SDK does not support models in the following situations:
(1) Models that have customized layers.
4.8.5 The functions KDP520 NPU supports

Layers/Modules     Functions/Parameters             Spec.
Convolution        Convolution kernel dimension     1x1 up to 11x11
                   Stride                           1,2,4
                   Padding                          0-15
                   Depthwise Conv                   Yes
                   Deconvolution                    Use Upsampling + Conv
Pooling            Max pooling 3x3                  stride 1,2,3
                   Max pooling 2x2                  stride 1,2
                   Ave pooling 3x3                  stride 1,2,3
                   Ave pooling 2x2                  stride 1,2
                   Global ave pooling               Support
                   Global max pooling               Support
Activation         ReLU                             Support
                   Leaky ReLU                       Support
                   PReLU                            Support
                   ReLU6                            Support
Other processing   Batch Normalization              Support
                   Add                              Support
                   Concatenation                    Support
                   Dense/Fully Connected            Support
                   Flatten                          Support
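Before converting a model, the convolution limits in the table can be checked layer by layer. The sketch below is illustrative only: the function conv_issues is hypothetical, and it assumes square kernels and reads "1x1 up to 11x11" as a kernel side length from 1 to 11.

```python
# Limits copied from the Convolution rows of the table above.
KERNEL_MIN, KERNEL_MAX = 1, 11
SUPPORTED_STRIDES = {1, 2, 4}
PAD_MIN, PAD_MAX = 0, 15


def conv_issues(kernel, stride, padding):
    """Return the reasons a square-kernel convolution falls outside the table."""
    issues = []
    if not KERNEL_MIN <= kernel <= KERNEL_MAX:
        issues.append(f"kernel {kernel}x{kernel} is outside 1x1 to 11x11")
    if stride not in SUPPORTED_STRIDES:
        issues.append(f"stride {stride} is not one of 1, 2, 4")
    if not PAD_MIN <= padding <= PAD_MAX:
        issues.append(f"padding {padding} is outside 0-15")
    return issues


print(conv_issues(3, 1, 1))    # a typical 3x3 convolution
print(conv_issues(13, 3, 16))  # violates all three limits
```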


4.8.6 What is the meaning of the simulator's output?
estimate FPS float => average frames per second
total time => total time for a single image inference on the NPU
MAC idle time => time during which the NPU MAC engine is waiting for weight loading or data loading
MAC running time => time during which the NPU MAC engine is running
average DRAM bandwidth => average DRAM bandwidth used by the NPU to complete inference
total theoretical convolution time => theoretical minimum total run time of the model when MAC efficiency is 100%
MAC efficiency to total time => ratio of the theoretical convolution time to the total time
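As a worked example of the last item, the MAC efficiency is the theoretical convolution time divided by the total time. The numbers below are hypothetical and are not taken from a real simulator run; the function name is also an illustrative choice.

```python
def mac_efficiency(theoretical_conv_time, total_time):
    """Ratio of the theoretical convolution time to the total time."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return theoretical_conv_time / total_time


# Hypothetical values in milliseconds.
print(mac_efficiency(15.0, 30.0))  # 0.5
```

A ratio of 0.5 would mean that half of the total inference time is accounted for by the theoretical minimum convolution time.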
4.8.7 How to configure the batch_compile_input_params.json?
By following the above instructions, the batch_compile_input_params.json will be saved in the Interactive Folder.
Please do not change the parameters' names.
The parameters in batch_compile_input_params.json are:
(1) input_image_folder
The absolute path of the input image folder for fpAnalyser. Since batch-compile can compile more than one model together, the order of input_image_folder corresponds to the order of input_onnx_file.
(2) img_channel
Options: L, RGB
The channel information after the input image is preprocessed. L means single channel. Input for fpAnalyser. Same as (1), the order of img_channel corresponds to the order of input_onnx_file.
(3) model_input_width


The width of the model input size. Input for fpAnalyser. Same as (1), the order of model_input_width corresponds to the order of input_onnx_file.
(4) model_input_height
The height of the model input size. Same as (1), the order of model_input_height corresponds to the order of input_onnx_file.
(5) img_preprocess_method
Options: kneron, tensorflow, yolo, caffe, pytorch
Same as (1), the order of img_preprocess_method corresponds to the order of input_onnx_file. The image preprocess method, input for fpAnalyser. The related formats are as follows:
"kneron": RGB/256 - 0.5
"tensorflow": RGB/127.5 - 1.0
"yolo": RGB/255.0
"pytorch": (RGB/255. - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
"caffe" (BGR format): BGR - [103.939, 116.779, 123.68]
(6) input_onnx_file
The absolute path of the onnx file, which is the input file for fpAnalyser. Since batch-compile can compile more than one model at a time, the order of the models in the input_onnx_file array decides the order of the model binaries in all_models.bin.
(7) keep_aspect_ratio
Options: True, False
Indicates whether or not to keep the aspect ratio.
(8) command_addr
Address for command, input for compiler.
(9) weight_addr
Address for weight, input for compiler.


(10) sram_addr
Address for sram, input for compiler.
(11) dram_addr
Address for dram, input for compiler.
(12) whether_encryption
Options: Yes, No
Whether to encrypt the bin files generated by the compiler, input for compiler.
(13) encryption_key
Encryption key for the bin files, input for compiler.
(14) model_id_list
The list of model id information.
(15) model_version_list
The list of model version information.
4.8.8 What's the meaning of the output files of batch-compile?
The result of 3.7 FpAnalyser and Batch-Compile is generated in a folder called batch_compile in the Interactive Folder. It has two subfolders called fpAnalyser and compiler.
The fpAnalyser subfolder contains folders with the name format input_img_txt_X, which hold the .txt files produced by image preprocessing. The index X corresponds to the order of the model files in FAQ question 7 (6), which means the folder input_img_txt_X holds the preprocessed image text files of model number X.
The compiler subfolder contains the following files: all_models.bin, fw_info.bin, and temp_X_ioinfo.csv, where X is again the order of the models.
- all_models.bin and fw_info.bin are for the firmware to use.
- temp_X_ioinfo.csv contains the CPU node and output node information. If you find a CPU node in temp_X_ioinfo.csv, whose format is "c,**,**", you need to implement and register this function in the SDK.
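Looking for CPU nodes in temp_X_ioinfo.csv can be automated with a short script. The sketch below relies only on the "c,**,**" pattern mentioned above, i.e. rows whose first field is "c". The sample content, including the meaning of the other columns and the second row, is made up for illustration and does not describe the real file layout.

```python
import csv
import io


def cpu_nodes(csv_text):
    """Return the rows whose first field is "c", the CPU node marker."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row and row[0].strip() == "c"]


# Made-up file content; only the leading "c" follows the documented format.
sample = "c,0,my_custom_op\no,1,output_a\n"

print(cpu_nodes(sample))
```

For a real file, the text can be read with open(path).read() and passed to the same function. An empty result suggests that no CPU function needs to be registered for that model.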


4.8.9 How to use customized methods for image preprocess?
(1) In input_params.json, set the value of "img_preprocess_method" to "customized".
(2) Edit the file /workspace/scripts/img_preprocess.py, search for the text "#this is the customized part", and add your customized image preprocess method there.
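As a reference point when writing a customized method, the built-in formulas listed in FAQ question 1 (5) can be written per pixel as follows. This is a sketch only: the function preprocess_pixel and its signature are hypothetical, and the interface actually expected inside img_preprocess.py is not described in this guide, so the code would need to be adapted to that file.

```python
PYTORCH_MEAN = (0.485, 0.456, 0.406)
PYTORCH_STD = (0.229, 0.224, 0.225)
CAFFE_MEAN_BGR = (103.939, 116.779, 123.68)


def preprocess_pixel(rgb, method):
    """Apply one of the documented preprocess formulas to an (R, G, B) pixel."""
    r, g, b = rgb
    if method == "kneron":
        return tuple(v / 256 - 0.5 for v in (r, g, b))
    if method == "tensorflow":
        return tuple(v / 127.5 - 1.0 for v in (r, g, b))
    if method == "yolo":
        return tuple(v / 255.0 for v in (r, g, b))
    if method == "pytorch":
        return tuple((v / 255.0 - m) / s
                     for v, m, s in zip((r, g, b), PYTORCH_MEAN, PYTORCH_STD))
    if method == "caffe":
        # Caffe works in BGR order, so reorder before subtracting the mean.
        return tuple(v - m for v, m in zip((b, g, r), CAFFE_MEAN_BGR))
    raise ValueError(f"unknown preprocess method: {method}")


print(preprocess_pixel((128, 0, 255), "kneron"))
print(preprocess_pixel((255, 0, 0), "yolo"))
```

A customized method would follow the same pattern with its own scaling and offset. Its largest absolute output value also determines the radix described in FAQ question 1 (21).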


Chapter 5 Firmware Management
5.1 Update Firmware
Use the following steps and commands to update the firmware.
Step 1: Install packages
$ sudo apt-get install cmake
$ sudo apt-get install libusb-1.0-0-dev
$ sudo apt-get install g++
Step 2
$ cd kl520_sdk_<version>/host_lib
Step 3: Change the g++ file path in build.sh
$ which g++
For example: /usr/bin/g++
$ nano build.sh
$ sudo chmod +x build.sh
$ sudo ./build.sh
$ cd kl520_sdk_<version>/host_lib/example/build
Program udt_fw
Program udt_fw can be used to test the update firmware feature. The firmware files are in "../test_image/ota/work1". An alternate directory "work2" exists in the same directory and can be used to test switching between the two banks.
Usage: udt_fw fw_id (0 - no operation, 1 - scpu, 2 - ncpu)
For example:
To update the scpu firmware
$ sudo ./udt_fw 1
To update the ncpu firmware
$ sudo ./udt_fw 2
Note: After the firmware update finishes, the KL520 resets. The KL520 needs to be restarted manually.
5.2 General Model Firmware
Step 1: Go to the terminal and enter the following command:
cd kl520_sdk_<version>/ota
Step 2: Copy fw_info.bin and all_models.bin from 3.6.2 BatchCompile to kl520_sdk_<version>/ota
Step 3: Run the command lines below (fw_info.bin and all_models.bin are generated by Batch Compile [refer to Page 5])
sudo chmod +x gen_ota_binary_for_linux
./gen_ota_binary_for_linux -model fw_info.bin all_models.bin model_ota.bin
The model file model_ota.bin is generated in kl520_sdk_<version>/ota


5.3 Model Update
Program udt_md
Program udt_md can be used to test the update model feature. The model file is "kl520_sdk_<version>/host_lib/example/test_image/ota model_ota.bin".
Step 1: Replace the model_ota.bin in the test_image folder with the model_ota.bin generated in 5.2 General Model Firmware.
Usage: udt_md model_id (0 - no operation, other value - model id)
Step 2:
To update the model with model id 1
$ sudo ./udt_md 1
To run the model update with no operation
$ sudo ./udt_md 0
Note: After the model update finishes, the KL520 resets. The KL520 needs to be restarted manually, either by SPI or by JTAG. After the system has started successfully, the KL520 can send back the response.
APIs used: kdp_update_model()
