KNEREX

Introduction

KNEREX is an ongoing project that aims to develop tools that automatically analyze the statistics of deep neural networks and generate refined networks for dynamic fixed-point computation.

Details

1. Problems and Motivations

Some specialized hardware, such as DSPs and Tensor Processing Units, supports fast integer arithmetic. However, models are usually trained with floats or doubles rather than integers, so the weights and attributes of pre-trained neural networks are typically floating-point numbers.

To support this type of hardware, we have to convert the floats to integers, which introduces a loss of significance. See this wiki for more details. This loss can degrade the accuracy of the neural network, so we need to choose a proper bit width for each calculation.
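The float-to-integer conversion described above can be illustrated with a minimal sketch of dynamic fixed-point quantization. It is not KNEREX's actual method, which is not described here. The helper names (ChooseFractionalBits, Quantize, Dequantize) and the rule of deriving the fractional length from the largest magnitude are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: pick the number of fractional bits so that the largest
// magnitude in `values` still fits into a signed integer of `bit_width` bits.
int ChooseFractionalBits(const std::vector<float>& values, int bit_width) {
  assert(bit_width >= 2 && bit_width <= 16);
  float max_abs = 0.0f;
  for (float v : values) max_abs = std::max(max_abs, std::fabs(v));
  if (max_abs == 0.0f) return bit_width - 1;  // all zeros: any scale works
  // Bits needed for the integer part, excluding the sign bit.
  const int integer_bits = static_cast<int>(std::floor(std::log2(max_abs))) + 1;
  return bit_width - 1 - integer_bits;  // can be negative for large ranges
}

// Converts a float to a fixed-point integer: round(value * 2^fractional_bits),
// saturated to the range of a signed `bit_width`-bit integer.
std::int16_t Quantize(float value, int fractional_bits, int bit_width) {
  const long max_q = (1L << (bit_width - 1)) - 1;
  const long min_q = -(1L << (bit_width - 1));
  const long rounded = std::lround(std::ldexp(value, fractional_bits));
  return static_cast<std::int16_t>(std::max(min_q, std::min(max_q, rounded)));
}

// Converts a fixed-point integer back to a float: q * 2^(-fractional_bits).
float Dequantize(std::int16_t q, int fractional_bits) {
  return std::ldexp(static_cast<float>(q), -fractional_bits);
}
```

In this sketch, a tensor whose largest magnitude is 1.5 gets 6 fractional bits at a bit width of 8, so a value such as 0.1 is stored as 6 and read back as 0.09375. The difference between the two is the loss of significance that a well-chosen bit width has to keep small.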

2. Aims

We want to develop tools to:

  1. automatically analyze the statistical information on neural networks;
  2. provide the proper bit width;
  3. generate refined neural networks for the hardware;
  4. satisfy all the constraints of different hardware (for example, the bit shifts of the input and output of a convolution layer must not exceed a given limit).

Besides, the tools should support:

  1. parallel analysis: analyzing thousands of images in parallel;
  2. adjustable configuration: different hardware has different constraints, so the tools should be configurable for them.
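As a rough illustration of the parallel analysis requirement, the sketch below collects the value range of a tensor over many images using several worker threads and merges the partial results. The RangeStats type, the AnalyzeInParallel function, and the choice of min/max as the collected statistic are all hypothetical. The sketch uses plain std::thread with a static split of the work rather than the project's thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical per-tensor statistic: the value range observed so far.
struct RangeStats {
  float min = std::numeric_limits<float>::max();
  float max = std::numeric_limits<float>::lowest();

  void Update(float v) {
    min = std::min(min, v);
    max = std::max(max, v);
  }
  void Merge(const RangeStats& other) {
    min = std::min(min, other.min);
    max = std::max(max, other.max);
  }
};

// Splits `images` into contiguous chunks, one per worker, analyzes the chunks
// concurrently, and merges the partial results after all workers have joined.
// Each inner vector stands for the values one image produces for one tensor.
RangeStats AnalyzeInParallel(const std::vector<std::vector<float>>& images,
                             std::size_t num_workers) {
  assert(num_workers > 0);
  num_workers = std::min(num_workers, std::max<std::size_t>(1, images.size()));
  const std::size_t chunk = (images.size() + num_workers - 1) / num_workers;

  std::vector<RangeStats> partial(num_workers);  // one slot per worker, no locking
  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < num_workers; ++w) {
    workers.emplace_back([&images, &partial, chunk, w] {
      const std::size_t begin = w * chunk;
      const std::size_t end = std::min(images.size(), begin + chunk);
      for (std::size_t i = begin; i < end; ++i)
        for (float v : images[i]) partial[w].Update(v);
    });
  }
  for (std::thread& t : workers) t.join();

  RangeStats total;
  for (const RangeStats& p : partial) total.Merge(p);
  return total;
}
```

Because every worker writes only to its own slot, no mutex is needed. A thread pool would additionally reuse threads across tensors and balance uneven workloads.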

3. Methods and Solutions

We are preparing to submit a paper. I will discuss this part after the paper is published.

4. Performance

So far, for several typical neural networks (including ResNet50, Inception, etc.), our integer computation using at most 16 bits achieves classification performance very similar to that of 32-bit floating-point computation.

The experiments are ongoing. I will present detailed performance numbers when all results are available.

Related Technologies

  1. ONNX: We use ONNX to store neural networks.
  2. Protocol Buffers: We use Protocol Buffers to store statistical information on neural networks. Actually, ONNX itself uses Protocol Buffers.
  3. Jsoncpp: We use Jsoncpp to parse and generate JSON files.
  4. Concurrency: We develop a thread pool to support parallel computation.
  5. Algorithms: Neural networks are basically directed graphs. The algorithms in this project rely heavily on BFS, DFS, and Union-Find.
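Of the graph algorithms listed above, Union-Find is the least self-explanatory, so a minimal sketch follows. How KNEREX actually uses it is not stated here. One plausible use, given purely as an assumption, is grouping tensors that must share the same fixed-point format, such as the inputs of an element-wise addition. The class name and interface are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Minimal Union-Find (disjoint-set) structure with path halving and
// union by size. Elements are identified by indices 0..n-1, for example
// hypothetical tensor ids in a network graph.
class UnionFind {
 public:
  explicit UnionFind(std::size_t n) : parent_(n), size_(n, 1) {
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
  }

  // Returns the representative of the set containing `x`.
  std::size_t Find(std::size_t x) {
    assert(x < parent_.size());
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];  // path halving: skip one level
      x = parent_[x];
    }
    return x;
  }

  // Merges the sets containing `a` and `b`.
  // Returns true if they were distinct sets before the call.
  bool Union(std::size_t a, std::size_t b) {
    a = Find(a);
    b = Find(b);
    if (a == b) return false;
    if (size_[a] < size_[b]) std::swap(a, b);  // attach smaller under larger
    parent_[b] = a;
    size_[a] += size_[b];
    return true;
  }

 private:
  std::vector<std::size_t> parent_;
  std::vector<std::size_t> size_;
};
```

Under the assumption above, calling Union on every pair of tensors that feed the same addition leaves each group with a single representative, and one fixed-point format can then be chosen per group.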

Programming Languages

  1. C++: 90%. The main projects are written in C++.
  2. Bash: 5%. I write Bash scripts for testing.
  3. Python: 5%. I write Python scripts for testing, especially to validate correctness against Keras and TensorFlow.