diff --git a/README.md b/README.md index b3994c8..12ebd78 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,12 @@ -[cars-yolo-output]: examples/assets/cars.gif "Sample Output with YOLO" -[cows-tf-ssd-output]: examples/assets/cows.gif "Sample Output with SSD" +# Application Areas -# Multi-object trackers in Python -Easy to use implementation of various multi-object tracking algorithms. +This project is built around **applying acoustic levitation to droplet manipulation on superhydrophobic surfaces**, and on that basis constructs a three-axis droplet manipulation system assisted by **machine vision**. The goal is to use neural networks for droplet detection and tracking, so that droplet manipulation can be automated and its precision improved. A lightweight network makes it possible to run accurate droplet detection and tracking even on edge computing devices. -[![DOI](https://zenodo.org/badge/148338463.svg)](https://zenodo.org/badge/latestdoi/148338463) +## Available Object Detector - -`YOLOv3 + CentroidTracker` | `TF-MobileNetSSD + CentroidTracker` -:-------------------------:|:-------------------------: -![Cars with YOLO][cars-yolo-output] | ![Cows with tf-SSD][cows-tf-ssd-output] -Video source: [link](https://flic.kr/p/L6qyxj) | Video source: [link](https://flic.kr/p/26WeEWy) +``` +NanoDet-Plus +``` ## Available Multi Object Trackers @@ -21,84 +17,18 @@ CentroidKF_Tracker SORT ``` -## Available OpenCV-based object detectors: - -``` -detector.TF_SSDMobileNetV2 -detector.Caffe_SSDMobileNet -detector.YOLOv3 -``` - ## Installation -Pip install for OpenCV (version 3.4.3 or later) is available [here](https://pypi.org/project/opencv-python/) and can be done with the following command: - ``` -git clone https://github.com/adipandas/multi-object-tracker +git clone https://github.com/vvEverett/multi-object-tracker.git cd multi-object-tracker pip install -r requirements.txt -pip install -e . +# pip install -e . +python setup.py develop +python setup_nanodet.py develop ``` -**Note - for using neural network models with GPU** -For using the opencv `dnn`-based object detection modules provided in this repository with GPU, you may have to compile a CUDA enabled version of OpenCV from source. -* To build opencv from source, refer the following links: -[[link-1](https://docs.opencv.org/master/df/d65/tutorial_table_of_content_introduction.html)], -[[link-2](https://www.pyimagesearch.com/2020/02/03/how-to-use-opencvs-dnn-module-with-nvidia-gpus-cuda-and-cudnn/)] - -## How to use?: Examples +## How to use? -The interface for each tracker is simple and similar. Please refer the example template below. - -``` -from motrackers import CentroidTracker # or IOUTracker, CentroidKF_Tracker, SORT -input_data = ... -detector = ... -tracker = CentroidTracker(...) # or IOUTracker(...), CentroidKF_Tracker(...), SORT(...) -while True: - done, image = - if done: - break - detection_bboxes, detection_confidences, detection_class_ids = detector.detect(image) - # NOTE: - # * `detection_bboxes` are numpy.ndarray of shape (n, 4) with each row containing (bb_left, bb_top, bb_width, bb_height) - # * `detection_confidences` are numpy.ndarray of shape (n,); - # * `detection_class_ids` are numpy.ndarray of shape (n,). - output_tracks = tracker.update(detection_bboxes, detection_confidences, detection_class_ids) - # `output_tracks` is a list with each element containing tuple of - # (, , , , , , , , , ) - for track in output_tracks: - frame, id, bb_left, bb_top, bb_width, bb_height, confidence, x, y, z = track - assert len(track) == 10 - print(track) -``` - -Please refer [examples](https://github.com/adipandas/multi-object-tracker/tree/master/examples) folder of this repository for more details. You can clone and run the examples. - -## Pretrained object detection models - -You will have to download the pretrained weights for the neural-network models. -The shell scripts for downloading these are provided [here](https://github.com/adipandas/multi-object-tracker/tree/master/examples/pretrained_models) below respective folders. -Please refer [DOWNLOAD_WEIGHTS.md](https://github.com/adipandas/multi-object-tracker/blob/master/DOWNLOAD_WEIGHTS.md) for more details. - -### Notes -* There are some variations in implementations as compared to what appeared in papers of `SORT` and `IoU Tracker`. -* In case you find any bugs in the algorithm, I will be happy to accept your pull request or you can create an issue to point it out. - -## References, Credits and Contributions -Please see [REFERENCES.md](https://github.com/adipandas/multi-object-tracker/blob/master/docs/readme/REFERENCES.md) and [CONTRIBUTING.md](https://github.com/adipandas/multi-object-tracker/blob/master/docs/readme/CONTRIBUTING.md). - -## Citation - -If you use this repository in your work, please consider citing it with: -``` -@misc{multiobjtracker_amd2018, - author = {Deshpande, Aditya M.}, - title = {Multi-object trackers in Python}, - year = {2020}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/adipandas/multi-object-tracker}}, -} -``` +Run main.py to start droplet detection and tracking on test.avi.
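For reference, the detection-plus-tracking loop that main.py and the bundled notebook (examples/example_notebooks/mot_Nanodet.ipynb) implement looks roughly like the sketch below. This is a minimal sketch based on the API used in that notebook; the weight, config, and video paths shown here are placeholders to replace with your own, and the 0.43 score threshold is simply the value used in the notebook.

```
import cv2 as cv
from motrackers.detectors import Nanodet
from motrackers import CentroidTracker  # or CentroidKF_Tracker, SORT, IOUTracker
from motrackers.utils import draw_tracks
from nanodet.util import Logger, cfg, load_config

# Placeholder paths: point these at your own config, weights and video.
load_config(cfg, "config/LiquidDetect416.yml")
model = Nanodet(cfg, "weight/LiquidV4.pth", Logger(0, use_tensorboard=False), "cpu:0")
tracker = CentroidTracker(max_lost=0, tracker_output_format="mot_challenge")

cap = cv.VideoCapture("test.avi")
while True:
    ok, image = cap.read()
    if not ok:
        break
    # NanoDet-Plus inference, then thresholded boxes drawn on the frame.
    meta, res = model.inference(image)
    bboxes, confidences, class_ids, vis = model.visualize(res[0], meta, cfg.class_names, 0.43)
    # Feed the detections to the tracker and overlay the resulting track IDs.
    tracks = tracker.update(bboxes, confidences, class_ids)
    vis = draw_tracks(vis, tracks)
    cv.imshow("image", vis)
    if cv.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv.destroyAllWindows()
```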
diff --git a/config/LiquidDetect.yml b/config/LiquidDetect.yml new file mode 100644 index 0000000..3e3c3af --- /dev/null +++ b/config/LiquidDetect.yml @@ -0,0 +1,115 @@ +#Config File example +save_dir: workspace/lqd +model: + weight_averager: + name: ExpMovingAverager + decay: 0.9998 + arch: + name: NanoDetPlus + detach_epoch: 10 + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: GhostPAN + in_channels: [116, 232, 464] + out_channels: 96 + kernel_size: 5 + num_extra_level: 1 + use_depthwise: True + activation: LeakyReLU + head: + name: NanoDetPlusHead + num_classes: 1 + input_channel: 96 + feat_channels: 96 + stacked_convs: 2 + kernel_size: 5 + strides: [8, 16, 32, 64] + activation: LeakyReLU + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + # Auxiliary head, only use in training time.
+ aux_head: + name: SimpleConvHead + num_classes: 1 + input_channel: 192 + feat_channels: 192 + stacked_convs: 4 + strides: [8, 16, 32, 64] + activation: LeakyReLU + reg_max: 7 + +class_names: &class_names ['Liquid'] #Please fill in the category names (not include background category) +data: + train: + name: XMLDataset + class_names: *class_names + img_path: lq/train/img + ann_path: lq/train/an + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.8, 1.2] + saturation: [0.8, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: XMLDataset + class_names: *class_names + img_path: lq/valid/img + ann_path: lq/valid/an + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] # Set like [0, 1, 2, 3] if you have multi-GPUs + workers_per_gpu: 8 + batchsize_per_gpu: 4 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: AdamW + lr: 0.001 + weight_decay: 0.05 + warmup: + name: linear + steps: 500 + ratio: 0.0001 + total_epochs: 300 + lr_schedule: + name: CosineAnnealingLR + T_max: 300 + eta_min: 0.00005 + val_intervals: 10 +grad_clip: 35 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 diff --git a/config/LiquidDetect416.yml b/config/LiquidDetect416.yml new file mode 100644 index 0000000..1ca3ceb --- /dev/null +++ b/config/LiquidDetect416.yml @@ -0,0 +1,115 @@ +#Config File example +save_dir: workspace/lqd +model: + weight_averager: + name: ExpMovingAverager + decay: 0.9998 + arch: + name: NanoDetPlus + detach_epoch: 10 + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: GhostPAN + in_channels: [116, 232, 464] + out_channels: 96 + kernel_size: 5 + num_extra_level: 1 + use_depthwise: True + activation: LeakyReLU + head: + name: NanoDetPlusHead + num_classes: 1 + input_channel: 96 + feat_channels: 96 + stacked_convs: 2 + kernel_size: 5 + strides: [8, 16, 32, 64] + activation: LeakyReLU + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + # Auxiliary head, only use in training time. 
+ aux_head: + name: SimpleConvHead + num_classes: 1 + input_channel: 192 + feat_channels: 192 + stacked_convs: 4 + strides: [8, 16, 32, 64] + activation: LeakyReLU + reg_max: 7 + +class_names: &class_names ['Liquid'] #Please fill in the category names (not include background category) +data: + train: + name: XMLDataset + class_names: *class_names + img_path: lq/train/img + ann_path: lq/train/an + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.8, 1.2] + saturation: [0.8, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: XMLDataset + class_names: *class_names + img_path: lq/valid/img + ann_path: lq/valid/an + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] # Set like [0, 1, 2, 3] if you have multi-GPUs + workers_per_gpu: 8 + batchsize_per_gpu: 4 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: AdamW + lr: 0.001 + weight_decay: 0.05 + warmup: + name: linear + steps: 500 + ratio: 0.0001 + total_epochs: 300 + lr_schedule: + name: CosineAnnealingLR + T_max: 300 + eta_min: 0.00005 + val_intervals: 10 +grad_clip: 35 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 diff --git a/config/convnext/nanodet-plus_convnext-nano_640.yml b/config/convnext/nanodet-plus_convnext-nano_640.yml new file mode 100644 index 0000000..dfc0a85 --- /dev/null +++ b/config/convnext/nanodet-plus_convnext-nano_640.yml @@ -0,0 +1,130 @@ +save_dir: workspace/convnext/nanodet-plus_convnext-nano_640 +model: + weight_averager: + name: ExpMovingAverager + decay: 0.9998 + arch: + name: NanoDetPlus + detach_epoch: 10 + backbone: + name: TIMMWrapper + model_name: convnext_nano + features_only: True + pretrained: True + # output_stride: 32 + out_indices: [1, 2, 3] + fpn: + name: GhostPAN + in_channels: [160, 320, 640] + out_channels: 128 + kernel_size: 5 + num_extra_level: 1 + use_depthwise: True + activation: SiLU + head: + name: NanoDetPlusHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + kernel_size: 5 + strides: [8, 16, 32, 64] + activation: SiLU + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + # Auxiliary head, only use in training time. 
+ aux_head: + name: SimpleConvHead + num_classes: 80 + input_channel: 256 + feat_channels: 256 + stacked_convs: 4 + strides: [8, 16, 32, 64] + activation: SiLU + reg_max: 7 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [640,640] #[w,h] + keep_ratio: False + pipeline: + perspective: 0.0 + scale: [0.1, 2.0] + stretch: [[0.8, 1.2], [0.8, 1.2]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [640,640] #[w,h] + keep_ratio: False + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0, 1, 2, 3] + workers_per_gpu: 8 + batchsize_per_gpu: 24 +schedule: +# resume: +# load_model: + optimizer: + name: AdamW + lr: 0.001 + weight_decay: 0.05 + no_norm_decay: True + param_level_cfg: + backbone: + lr_mult: 0.1 + warmup: + name: linear + steps: 500 + ratio: 0.0001 + total_epochs: 50 + lr_schedule: + name: CosineAnnealingLR + T_max: 50 + eta_min: 0.0005 + val_intervals: 5 +grad_clip: 35 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP +log: + interval: 50 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite0_320.yml b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite0_320.yml new file mode 100644 index 0000000..1e43f10 --- /dev/null +++ b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite0_320.yml @@ -0,0 +1,118 @@ +# nanodet-EfficientNet-Lite0_320 +# COCO mAP(0.5:0.95) = 0.247 +# AP_50 = 0.404 +# AP_75 = 0.250 +# AP_small = 0.079 +# AP_m = 0.243 +# AP_l = 0.406 +save_dir: workspace/efficient0_320 +model: + arch: + name: OneStageDetector + backbone: + name: EfficientNetLite + model_name: efficientnet_lite0 + out_stages: [2,4,6] + activation: ReLU6 + fpn: + name: PAN + in_channels: [40, 112, 320] + out_channels: 96 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 96 + feat_channels: 96 + activation: ReLU6 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: /coco/train2017 + 
ann_path: /coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] + val: + name: CocoDataset + img_path: /coco/val2017 + ann_path: /coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] +device: + gpu_ids: [0] + workers_per_gpu: 12 + batchsize_per_gpu: 150 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.15 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 190 + lr_schedule: + name: MultiStepLR + milestones: [140,170,180,185] + gamma: 0.1 + val_intervals: 1 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite1_416.yml b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite1_416.yml new file mode 100644 index 0000000..2e83ab3 --- /dev/null +++ b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite1_416.yml @@ -0,0 +1,119 @@ +# nanodet-EfficientNet-Lite1_416 +# COCO mAP(0.5:0.95) = 0.303 +# AP_50 = 0.471 +# AP_75 = 0.313 +# AP_small = 0.122 +# AP_m = 0.321 +# AP_l = 0.432 +save_dir: workspace/efficient1_416_SGD +model: + arch: + name: OneStageDetector + backbone: + name: EfficientNetLite + model_name: efficientnet_lite1 + out_stages: [2,4,6] + activation: ReLU6 + pretrain: True + fpn: + name: PAN + in_channels: [40, 112, 320] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 3 + activation: ReLU6 + share_cls_reg: True + octave_base_scale: 8 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 10 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: /coco/train2017 + ann_path: /coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.5] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + 
saturation: [0.5, 1.2] + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] + val: + name: CocoDataset + img_path: /coco/val2017 + ann_path: /coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] +device: + gpu_ids: [0] + workers_per_gpu: 12 + batchsize_per_gpu: 100 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.07 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 170 + lr_schedule: + name: MultiStepLR + milestones: [130,150,160,165] + gamma: 0.1 + val_intervals: 5 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite2_512.yml b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite2_512.yml new file mode 100644 index 0000000..62278a6 --- /dev/null +++ b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite2_512.yml @@ -0,0 +1,119 @@ +# nanodet-EfficientNet-Lite2_512 +# COCO mAP(0.5:0.95) = 0.326 +# AP_50 = 0.501 +# AP_75 = 0.344 +# AP_small = 0.152 +# AP_m = 0.342 +# AP_l = 0.481 +save_dir: workspace/efficientlite2_512 +model: + arch: + name: OneStageDetector + backbone: + name: EfficientNetLite + model_name: efficientnet_lite2 + out_stages: [2,4,6] + activation: ReLU6 + pretrain: True + fpn: + name: PAN + in_channels: [48, 120, 352] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 4 + activation: ReLU6 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 10 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: /coco/train2017 + ann_path: /coco/annotations/instances_train2017.json + input_size: [512,512] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.5] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] + val: + name: CocoDataset + img_path: /coco/val2017 + ann_path: /coco/annotations/instances_val2017.json + input_size: [512,512] #[w,h] + keep_ratio: True + pipeline: + normalize: [[127.0, 127.0, 
127.0], [128.0, 128.0, 128.0]] +device: + gpu_ids: [0] + workers_per_gpu: 12 + batchsize_per_gpu: 60 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.06 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 135 + lr_schedule: + name: MultiStepLR + milestones: [90,110,120,130] + gamma: 0.1 + val_intervals: 5 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/RepVGG/nanodet-RepVGG-A0_416.yml b/config/legacy_v0.x_configs/RepVGG/nanodet-RepVGG-A0_416.yml new file mode 100644 index 0000000..6694512 --- /dev/null +++ b/config/legacy_v0.x_configs/RepVGG/nanodet-RepVGG-A0_416.yml @@ -0,0 +1,115 @@ +# nanodet-EfficientNet-Lite1_416 +save_dir: workspace/RepVGG-A0-416 +model: + arch: + name: OneStageDetector + backbone: + name: RepVGG + arch: A0 + out_stages: [2,3,4] + activation: ReLU + last_channel: 512 + deploy: False + fpn: + name: PAN + in_channels: [96, 192, 512] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + conv_type: Conv + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + activation: ReLU + share_cls_reg: True + octave_base_scale: 8 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 10 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: /coco/train2017 + ann_path: /coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.5] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: /coco/val2017 + ann_path: /coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 1 + batchsize_per_gpu: 100 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.07 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 170 + lr_schedule: + name: MultiStepLR + milestones: [130,150,160,165] + gamma: 0.1 + val_intervals: 5 +evaluator: + name: CocoDetectionEvaluator + 
save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/Transformer/nanodet-t.yml b/config/legacy_v0.x_configs/Transformer/nanodet-t.yml new file mode 100644 index 0000000..cc9748a --- /dev/null +++ b/config/legacy_v0.x_configs/Transformer/nanodet-t.yml @@ -0,0 +1,122 @@ +# NanoDet-m with transformer attention +# COCO mAP(0.5:0.95) = 0.217 +# AP_50 = 0.363 +# AP_75 = 0.218 +# AP_small = 0.069 +# AP_m = 0.214 +# AP_l = 0.364 + +save_dir: workspace/nanodet_t +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: TAN # transformer attention network + in_channels: [116, 232, 464] + out_channels: 128 + feature_hw: [20,20] # size for position embedding + num_heads: 8 + num_encoders: 1 + mlp_ratio: 4 + dropout_ratio: 0.1 + activation: LeakyReLU + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.8, 1.2] + saturation: [0.8, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 160 +schedule: + resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 190 + lr_schedule: + name: MultiStepLR + milestones: [140,170,180,185] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 
'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-g.yml b/config/legacy_v0.x_configs/nanodet-g.yml new file mode 100644 index 0000000..93cb982 --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-g.yml @@ -0,0 +1,122 @@ +# NanoDet-g-416 is designed for edge NPU, GPU or TPU with high parallel computing power but low memory bandwidth +# COCO mAP(0.5:0.95) = 22.9 +# Flops = 4.2B +# Params = 3.8M +# COCO pre-trained weight link: https://drive.google.com/file/d/10uW7oqZKw231l_tr4C1bJWkbCXgBf7av/view?usp=sharing +save_dir: workspace/nanodet_g +model: + arch: + name: OneStageDetector + backbone: + name: CustomCspNet + net_cfg: [[ 'Conv', 3, 32, 3, 2], # 1/2 + [ 'MaxPool', 3, 2 ], # 1/4 + [ 'CspBlock', 32, 1, 3, 1 ], # 1/4 + [ 'CspBlock', 64, 2, 3, 2 ], # 1/8 + [ 'CspBlock', 128, 2, 3, 2 ], # 1/16 + [ 'CspBlock', 256, 3, 3, 2 ]] # 1/32 + out_stages: [3,4,5] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [128, 256, 512] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + conv_type: Conv + activation: LeakyReLU + input_channel: 128 + feat_channels: 128 + stacked_convs: 1 + share_cls_reg: True + octave_base_scale: 8 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 10 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 10 + batchsize_per_gpu: 128 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.1 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 190 + lr_schedule: + name: MultiStepLR + milestones: [130,160,175,185] + gamma: 0.1 + val_intervals: 5 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 
'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m-0.5x.yml b/config/legacy_v0.x_configs/nanodet-m-0.5x.yml new file mode 100644 index 0000000..f5e6e85 --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m-0.5x.yml @@ -0,0 +1,117 @@ +# nanodet-m-0.5x +# COCO mAP(0.5:0.95) = 0.135 +# AP_50 = 0.245 +# AP_75 = 0.129 +# AP_small = 0.036 +# AP_m = 0.119 +# AP_l = 0.232 +save_dir: workspace/nanodet_m_0.5x +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 0.5x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [48, 96, 192] + out_channels: 96 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 96 + feat_channels: 96 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.5] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 96 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.07 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 1000 + ratio: 0.00001 + total_epochs: 180 + lr_schedule: + name: MultiStepLR + milestones: [130,160,175] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 50 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 
'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m-1.5x-416.yml b/config/legacy_v0.x_configs/nanodet-m-1.5x-416.yml new file mode 100644 index 0000000..f4ff310 --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m-1.5x-416.yml @@ -0,0 +1,117 @@ +#nanodet-m-1.5x-416 +# COCO mAP(0.5:0.95) = 0.268 +# AP_50 = 0.424 +# AP_75 = 0.276 +# AP_small = 0.098 +# AP_m = 0.277 +# AP_l = 0.420 +save_dir: workspace/nanodet_m_1.5x_416 +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.5x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [176, 352, 704] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 176 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 280 + lr_schedule: + name: MultiStepLR + milestones: [240,260,275] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m-1.5x.yml b/config/legacy_v0.x_configs/nanodet-m-1.5x.yml new file mode 100644 index 0000000..c622c2f --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m-1.5x.yml @@ -0,0 +1,117 @@ +#nanodet-m-1.5x +# COCO 
mAP(0.5:0.95) = 0.235 +# AP_50 = 0.384 +# AP_75 = 0.239 +# AP_small = 0.069 +# AP_m = 0.235 +# AP_l = 0.389 +save_dir: workspace/nanodet_m_1.5x +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.5x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [176, 352, 704] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 192 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 280 + lr_schedule: + name: MultiStepLR + milestones: [240,260,275] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m-416.yml b/config/legacy_v0.x_configs/nanodet-m-416.yml new file mode 100644 index 0000000..58c84ad --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m-416.yml @@ -0,0 +1,117 @@ +#nanodet-m-416 +# COCO mAP(0.5:0.95) = 0.235 +# AP_50 = 0.384 +# AP_75 = 0.242 +# AP_small = 0.082 +# AP_m = 0.240 +# AP_l = 0.375 +save_dir: workspace/nanodet_m_416 +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [116, 232, 464] + out_channels: 96 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 96 + 
feat_channels: 96 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 192 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 280 + lr_schedule: + name: MultiStepLR + milestones: [240,260,275] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m.yml b/config/legacy_v0.x_configs/nanodet-m.yml new file mode 100644 index 0000000..1c719fd --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m.yml @@ -0,0 +1,111 @@ +#Config File example +save_dir: workspace/nanodet_m +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [116, 232, 464] + out_channels: 96 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 96 + feat_channels: 96 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] 
+ stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 192 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 280 + lr_schedule: + name: MultiStepLR + milestones: [240,260,275] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/examples/example_notebooks/logs.txt b/examples/example_notebooks/logs.txt new file mode 100644 index 0000000..58d1c24 --- /dev/null +++ b/examples/example_notebooks/logs.txt @@ -0,0 +1,2 @@ +INFO:root:Press "Esc", "q" or "Q" to exit. +INFO:root:Press "Esc", "q" or "Q" to exit. 
diff --git a/examples/example_notebooks/mot_Nanodet.ipynb b/examples/example_notebooks/mot_Nanodet.ipynb new file mode 100644 index 0000000..8e4080d --- /dev/null +++ b/examples/example_notebooks/mot_Nanodet.ipynb @@ -0,0 +1,793 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multiple object tracking with Nanodet-based object detection" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import cv2 as cv\n", + "from motrackers.detectors import Nanodet\n", + "from motrackers import CentroidTracker, CentroidKF_Tracker, SORT, IOUTracker\n", + "from motrackers.utils import draw_tracks\n", + "from nanodet.util import Logger, cfg, load_config, load_model_weight\n", + "import ipywidgets as widgets" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "VIDEO_FILE = r\"D:\\shijue\\LiquidDrop\\22.avi\"\n", + "WEIGHTS_PATH = r'D:\\shijue\\multi-object-tracker\\weight\\LiquidV4.pth'\n", + "CONFIG_FILE_PATH = r'D:\\shijue\\multi-object-tracker\\config\\LiquidDetect416.yml'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ee9c2b6ebbb3476791fc9262227dce83", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Select(description='MOTracker:', options=('CentroidTracker', 'CentroidKF_Tracker', 'SORT', 'IOUTracker'), valu…" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chosen_tracker = widgets.Select(\n", + " options=[\"CentroidTracker\", \"CentroidKF_Tracker\", \"SORT\", \"IOUTracker\"],\n", + " value='CentroidTracker',\n", + " rows=5,\n", + " description='MOTracker:',\n", + " disabled=False\n", + ")\n", + "chosen_tracker" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "if chosen_tracker.value == 'CentroidTracker':\n", + " tracker = CentroidTracker(max_lost=0, tracker_output_format='mot_challenge')\n", + "elif chosen_tracker.value == 'CentroidKF_Tracker':\n", + " tracker = CentroidKF_Tracker(max_lost=0, tracker_output_format='mot_challenge')\n", + "elif chosen_tracker.value == 'SORT':\n", + " tracker = SORT(max_lost=3, tracker_output_format='mot_challenge', iou_threshold=0.3)\n", + "elif chosen_tracker.value == 'IOUTracker':\n", + " tracker = IOUTracker(max_lost=2, iou_threshold=0.5, min_detection_confidence=0.4, max_detection_confidence=0.7,\n", + " tracker_output_format='mot_challenge')\n", + "else:\n", + " print(\"Please choose one tracker from the above list.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "model size is 1.0x\n", + "init weights...\n", + "=> loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth\n", + "Finish initialize NanoDet-Plus Head.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[1m\u001b[35m[root]\u001b[0m\u001b[34m[04-10 23:38:06]\u001b[0m\u001b[32mINFO:\u001b[0m\u001b[37mPress \"Esc\", \"q\" or \"Q\" to exit.\u001b[0m\n" + ] + } + ], + "source": [ + "# 导入模型文件\n", + "local_rank = 0\n", + "modelpath = WEIGHTS_PATH\n", + "device = \"cpu:0\"\n", + "config = CONFIG_FILE_PATH\n", + "logger = Logger(local_rank, 
use_tensorboard=False)\n", + "load_config(cfg, config)\n", + "detmodel = Nanodet(cfg, modelpath, logger, device)\n", + "logger.log('Press \"Esc\", \"q\" or \"Q\" to exit.')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "def main(video_path, model, tracker):\n", + "\n", + " cap = cv.VideoCapture(video_path)\n", + " while True:\n", + " ok, image = cap.read()\n", + "\n", + " if not ok:\n", + " print(\"Cannot read the video feed.\")\n", + " break\n", + " \n", + " meta, res = model.inference(image)\n", + " bboxes,confidences,class_ids,updated_image = model.visualize(res[0], meta, cfg.class_names, 0.43)\n", + " \n", + " tracks = tracker.update(bboxes, confidences, class_ids)\n", + "\n", + " updated_image = draw_tracks(updated_image, tracks)\n", + "\n", + " cv.imshow(\"image\", updated_image)\n", + " if cv.waitKey(1) & 0xFF == ord('q'):\n", + " break\n", + "\n", + " cap.release()\n", + " cv.destroyAllWindows()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "forward time: 0.077s | decode time: 0.004s | viz time: 0.000s\n", + "forward time: 0.055s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.066s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.069s | decode time: 0.004s | viz time: 0.000s\n", + "forward time: 0.059s | decode time: 0.004s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.062s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.055s | decode time: 0.003s | viz time: 0.001s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.055s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.097s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.000s | viz time: 0.003s\n", + "forward time: 0.053s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.054s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.051s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.045s | decode time: 0.001s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.002s 
| viz time: 0.000s\n",
+ "forward time: 0.049s | decode time: 0.002s 
| viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.050s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.044s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.049s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.044s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.050s | decode time: 0.003s | viz time: 0.001s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.046s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.046s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.050s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.052s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.055s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.055s | decode time: 0.004s | viz time: 0.001s\n", + "forward time: 0.054s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.004s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.000s\n", + "Cannot read the video feed.\n" + ] + } + ], + "source": [ + "main(VIDEO_FILE, detmodel, tracker)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "ist", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.16" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/example_notebooks/mot_YOLOv3.ipynb 
b/examples/example_notebooks/mot_YOLOv3.ipynb index 248ee6d..f280059 100644 --- a/examples/example_notebooks/mot_YOLOv3.ipynb +++ b/examples/example_notebooks/mot_YOLOv3.ipynb @@ -46,7 +46,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "ae18feabad2649079498e476cb1cc240", + "model_id": "70c91504b2554928915ed6de8c9dfe63", "version_major": 2, "version_minor": 0 }, @@ -54,8 +54,9 @@ "Select(description='MOTracker:', options=('CentroidTracker', 'CentroidKF_Tracker', 'SORT', 'IOUTracker'), valu…" ] }, + "execution_count": 3, "metadata": {}, - "output_type": "display_data" + "output_type": "execute_result" } ], "source": [ @@ -145,7 +146,15 @@ "cell_type": "code", "execution_count": 7, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cannot read the video feed.\n" + ] + } + ], "source": [ "main(VIDEO_FILE, model, tracker)" ] @@ -160,9 +169,9 @@ ], "metadata": { "kernelspec": { - "display_name": "work_env", + "display_name": "ist", "language": "python", - "name": "work_env" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -174,7 +183,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.9" + "version": "3.8.16" } }, "nbformat": 4, diff --git a/logs.txt b/logs.txt new file mode 100644 index 0000000..c19fe0a --- /dev/null +++ b/logs.txt @@ -0,0 +1 @@ +INFO:root:Press "Esc", "q" or "Q" to exit. diff --git a/main.py b/main.py new file mode 100644 index 0000000..d578be3 --- /dev/null +++ b/main.py @@ -0,0 +1,64 @@ +import numpy as np +import cv2 as cv +from motrackers.detectors import Nanodet +from motrackers import CentroidTracker, CentroidKF_Tracker, SORT, IOUTracker +from motrackers.utils import draw_tracks +from nanodet.util import Logger, cfg, load_config, load_model_weight + +VIDEO_FILE = "test.avi" +WEIGHTS_PATH = 'weight/LiquidV5.pth' +CONFIG_FILE_PATH = 'config/LiquidDetect416.yml' +CHOSEN_TRACKER = 'SORT' +CONFIDENCE_THRESHOLD = 0.4 # 目标检测的置信度筛选 + + + +if CHOSEN_TRACKER == 'CentroidTracker': + tracker = CentroidTracker(max_lost=0, tracker_output_format='mot_challenge') +elif CHOSEN_TRACKER == 'CentroidKF_Tracker': + tracker = CentroidKF_Tracker(max_lost=0, tracker_output_format='mot_challenge') +elif CHOSEN_TRACKER == 'SORT': + tracker = SORT(max_lost=3, tracker_output_format='mot_challenge', iou_threshold=0.3) +elif CHOSEN_TRACKER == 'IOUTracker': + tracker = IOUTracker(max_lost=2, iou_threshold=0.5, min_detection_confidence=0.4, max_detection_confidence=0.7, + tracker_output_format='mot_challenge') +else: + print("Please choose one tracker from the above list.") + +# 导入模型文件 +local_rank = 0 +modelpath = WEIGHTS_PATH +device = "cpu:0" +config = CONFIG_FILE_PATH +logger = Logger(local_rank, use_tensorboard=False) +load_config(cfg, config) +detmodel = Nanodet(cfg, modelpath, logger, device) +logger.log('Press "Esc", "q" or "Q" to exit.') + +def main(video_path, model, tracker): + + cap = cv.VideoCapture(video_path) + while True: + ok, image = cap.read() + + if not ok: + print("Cannot read the video feed.") + break + + meta, res = model.inference(image) + bboxes,confidences,class_ids,updated_image = model.visualize(res[0], meta, cfg.class_names, CONFIDENCE_THRESHOLD) + + tracks = tracker.update(bboxes, confidences, class_ids) + + updated_image = draw_tracks(updated_image, tracks) + + cv.imshow("image", updated_image) + if cv.waitKey(1) & 0xFF == ord('q'): + break + + cap.release() + cv.destroyAllWindows() + + + +main(VIDEO_FILE, 
detmodel, tracker) \ No newline at end of file diff --git a/motrackers/detectors/__init__.py b/motrackers/detectors/__init__.py index eacd013..58350f4 100644 --- a/motrackers/detectors/__init__.py +++ b/motrackers/detectors/__init__.py @@ -1,3 +1,4 @@ from motrackers.detectors.tf import TF_SSDMobileNetV2 from motrackers.detectors.caffe import Caffe_SSDMobileNet from motrackers.detectors.yolo import YOLOv3 +from motrackers.detectors.nanodet import Nanodet diff --git a/motrackers/detectors/nanodet.py b/motrackers/detectors/nanodet.py new file mode 100644 index 0000000..ff78c9a --- /dev/null +++ b/motrackers/detectors/nanodet.py @@ -0,0 +1,80 @@ +import cv2 +import numpy as np +from nanodet.data.batch_process import stack_batch_img +from nanodet.data.collate import naive_collate +from nanodet.data.transform import Pipeline +from nanodet.model.arch import build_model +from nanodet.util import Logger, cfg, load_config, load_model_weight +from tool import infotrans +import numpy as np +import os +import time +import torch + +class Nanodet(object): + def __init__(self, cfg, model_path, logger, device="cpu:0"): + self.cfg = cfg + self.device = device + model = build_model(cfg.model) + ckpt = torch.load(model_path, map_location=lambda storage, loc: storage) + load_model_weight(model, ckpt, logger) + if cfg.model.arch.backbone.name == "RepVGG": + deploy_config = cfg.model + deploy_config.arch.backbone.update({"deploy": True}) + deploy_model = build_model(deploy_config) + from nanodet.model.backbone.repvgg import repvgg_det_model_convert + model = repvgg_det_model_convert(model, deploy_model) + self.model = model.to(device).eval() + self.pipeline = Pipeline(cfg.data.val.pipeline, cfg.data.val.keep_ratio) + + def inference(self, img): + self.image = img.copy() + img_info = {"id": 0} + if isinstance(img, str): + img_info["file_name"] = os.path.basename(img) + img = cv2.imread(img) + else: + img_info["file_name"] = None + + height, width = img.shape[:2] + img_info["height"] = height + img_info["width"] = width + meta = dict(img_info=img_info, raw_img=img, img=img) + meta = self.pipeline(None, meta, self.cfg.data.val.input_size) + meta["img"] = torch.from_numpy(meta["img"].transpose(2, 0, 1)).to(self.device) + meta = naive_collate([meta]) + meta["img"] = stack_batch_img(meta["img"], divisible=32) + with torch.no_grad(): + results = self.model.inference(meta) + return meta, results + + def visualize(self, dets, meta, class_names, score_thres, wait=0): + """ + 由可视化函数修改得的信息输出函数 + + Outputs: + bboxes (int): [x,y,w,h] + confidences (float): 置信度 + class_ids (int): 类别 + """ + time1 = time.time() + result_img, all_box = self.model.head.show_result( + meta["raw_img"][0], dets, class_names, score_thres=score_thres, show=True + ) + bboxes , confidences , class_ids = infotrans(all_box) + print("viz time: {:.3f}s".format(time.time() - time1)) + self.class_names = dict(zip(class_ids,class_names)) + np.random.seed(12345) + for bb, conf, cid in zip(bboxes, confidences, class_ids): + # bbox_colors = {key: np.random.randint(0, 255, size=(3,)).tolist() for key in self.class_names.keys()} + # clr = [int(c) for c in bbox_colors[cid]] + cv2.rectangle(self.image, (bb[0], bb[1]), (bb[0] + bb[2], bb[1] + bb[3]), (253,230,224), 2) + # label = "{}:{:.4f}".format(self.class_names[cid], conf) + # (label_width, label_height), baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2) + # y_label = max(bb[1], label_height) + # cv2.rectangle(self.image, (bb[0], y_label - label_height), (bb[0] + label_width, y_label + 
baseLine), + # (255, 255, 255), cv2.FILLED) + # cv2.putText(self.image, label, (bb[0], y_label), cv2.FONT_HERSHEY_SIMPLEX, 0.5, clr, 2) + bboxes = np.array(bboxes).astype('int') + confidences = np.array(confidences) + return bboxes , confidences , class_ids , self.image diff --git a/motrackers/detectors/yolo.py b/motrackers/detectors/yolo.py index d6fd24d..e153fa5 100644 --- a/motrackers/detectors/yolo.py +++ b/motrackers/detectors/yolo.py @@ -23,7 +23,7 @@ def __init__(self, weights_path, configfile_path, labels_path, confidence_thresh object_names = load_labelsjson(labels_path) layer_names = self.net.getLayerNames() - if cv2.__version__ == '4.6.0': + if cv.__version__ == '4.6.0': self.layer_names = [layer_names[i - 1] for i in self.net.getUnconnectedOutLayers()] else: self.layer_names = [layer_names[i[0] - 1] for i in self.net.getUnconnectedOutLayers()] diff --git a/nanodet/__about__.py b/nanodet/__about__.py new file mode 100644 index 0000000..57c1e20 --- /dev/null +++ b/nanodet/__about__.py @@ -0,0 +1,24 @@ +import time + +_this_year = time.strftime("%Y") +__version__ = "1.0.0-alpha" +__author__ = "RangiLyu" +__author_email__ = "lyuchqi@gmail.com" +__license__ = "Apache-2.0" +__copyright__ = f"Copyright (c) 2020-{_this_year}, {__author__}." +__homepage__ = "https://github.com/RangiLyu/nanodet" + +__docs__ = ( + "NanoDet: Deep learning object detection toolbox for super fast and " + "lightweight anchor-free object detection models." +) + +__all__ = [ + "__author__", + "__author_email__", + "__copyright__", + "__docs__", + "__homepage__", + "__license__", + "__version__", +] diff --git a/nanodet/__init__.py b/nanodet/__init__.py new file mode 100644 index 0000000..c0a320a --- /dev/null +++ b/nanodet/__init__.py @@ -0,0 +1,8 @@ +"""package info.""" + +import os + +from nanodet.__about__ import * # noqa: F401 F403 + +_PACKAGE_ROOT = os.path.dirname(__file__) +_PROJECT_ROOT = os.path.dirname(_PACKAGE_ROOT) diff --git a/nanodet/data/batch_process.py b/nanodet/data/batch_process.py new file mode 100644 index 0000000..f84170a --- /dev/null +++ b/nanodet/data/batch_process.py @@ -0,0 +1,37 @@ +from typing import Sequence + +import torch +import torch.nn.functional as F + + +def stack_batch_img( + img_tensors: Sequence[torch.Tensor], divisible: int = 0, pad_value: float = 0.0 +) -> torch.Tensor: + """ + Args: + img_tensors (Sequence[torch.Tensor]): + divisible (int): + pad_value (float): value to pad + + Returns: + torch.Tensor. + """ + assert len(img_tensors) > 0 + assert isinstance(img_tensors, (tuple, list)) + assert divisible >= 0 + img_heights = [] + img_widths = [] + for img in img_tensors: + assert img.shape[:-2] == img_tensors[0].shape[:-2] + img_heights.append(img.shape[-2]) + img_widths.append(img.shape[-1]) + max_h, max_w = max(img_heights), max(img_widths) + if divisible > 0: + max_h = (max_h + divisible - 1) // divisible * divisible + max_w = (max_w + divisible - 1) // divisible * divisible + + batch_imgs = [] + for img in img_tensors: + padding_size = [0, max_w - img.shape[-1], 0, max_h - img.shape[-2]] + batch_imgs.append(F.pad(img, padding_size, value=pad_value)) + return torch.stack(batch_imgs, dim=0).contiguous() diff --git a/nanodet/data/collate.py b/nanodet/data/collate.py new file mode 100644 index 0000000..b559c1a --- /dev/null +++ b/nanodet/data/collate.py @@ -0,0 +1,84 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import collections +import re + +import torch +# from torch._six import string_classes + +string_classes = (str, bytes) + +np_str_obj_array_pattern = re.compile(r"[SaUO]") + +default_collate_err_msg_format = ( + "default_collate: batch must contain tensors, numpy arrays, numbers, " + "dicts or lists; found {}" +) + + +def collate_function(batch): + r"""Puts each data field into a tensor with outer dimension batch size""" + + elem = batch[0] + elem_type = type(elem) + if isinstance(elem, torch.Tensor): + out = None + if torch.utils.data.get_worker_info() is not None: + # If we're in a background process, concatenate directly into a + # shared memory tensor to avoid an extra copy + numel = sum([x.numel() for x in batch]) + storage = elem.storage()._new_shared(numel) + out = elem.new(storage) + return torch.stack(batch, 0, out=out) + elif ( + elem_type.__module__ == "numpy" + and elem_type.__name__ != "str_" + and elem_type.__name__ != "string_" + ): + elem = batch[0] + if elem_type.__name__ == "ndarray": + # array of string classes and object + if np_str_obj_array_pattern.search(elem.dtype.str) is not None: + raise TypeError(default_collate_err_msg_format.format(elem.dtype)) + + return batch + elif elem.shape == (): # scalars + return batch + elif isinstance(elem, float): + return torch.tensor(batch, dtype=torch.float64) + elif isinstance(elem, int): + return torch.tensor(batch) + elif isinstance(elem, string_classes): + return batch + elif isinstance(elem, collections.abc.Mapping): + return {key: collate_function([d[key] for d in batch]) for key in elem} + elif isinstance(elem, tuple) and hasattr(elem, "_fields"): # namedtuple + return elem_type(*(collate_function(samples) for samples in zip(*batch))) + elif isinstance(elem, collections.abc.Sequence): + transposed = zip(*batch) + return [collate_function(samples) for samples in transposed] + + raise TypeError(default_collate_err_msg_format.format(elem_type)) + + +def naive_collate(batch): + """Only collate dict value in to a list. E.g. meta data dict and img_info + dict will be collated.""" + + elem = batch[0] + if isinstance(elem, dict): + return {key: naive_collate([d[key] for d in batch]) for key in elem} + else: + return batch diff --git a/nanodet/data/dataset/__init__.py b/nanodet/data/dataset/__init__.py new file mode 100644 index 0000000..92c405b --- /dev/null +++ b/nanodet/data/dataset/__init__.py @@ -0,0 +1,41 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import copy +import warnings + +from .coco import CocoDataset +from .xml_dataset import XMLDataset + + +def build_dataset(cfg, mode): + dataset_cfg = copy.deepcopy(cfg) + name = dataset_cfg.pop("name") + if name == "coco": + warnings.warn( + "Dataset name coco has been deprecated. Please use CocoDataset instead." + ) + return CocoDataset(mode=mode, **dataset_cfg) + elif name == "xml_dataset": + warnings.warn( + "Dataset name xml_dataset has been deprecated. " + "Please use XMLDataset instead." + ) + return XMLDataset(mode=mode, **dataset_cfg) + elif name == "CocoDataset": + return CocoDataset(mode=mode, **dataset_cfg) + elif name == "XMLDataset": + return XMLDataset(mode=mode, **dataset_cfg) + else: + raise NotImplementedError("Unknown dataset type!") diff --git a/nanodet/data/dataset/base.py b/nanodet/data/dataset/base.py new file mode 100644 index 0000000..c47d578 --- /dev/null +++ b/nanodet/data/dataset/base.py @@ -0,0 +1,123 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import random +from abc import ABCMeta, abstractmethod +from typing import Dict, Optional, Tuple + +import numpy as np +from torch.utils.data import Dataset + +from ..transform import Pipeline + + +class BaseDataset(Dataset, metaclass=ABCMeta): + """ + A base class of detection dataset. Referring from MMDetection. + A dataset should have images, annotations and preprocessing pipelines + NanoDet use [xmin, ymin, xmax, ymax] format for box and + [[x0,y0], [x1,y1] ... [xn,yn]] format for key points. + instance masks should decode into binary masks for each instance like + { + 'bbox': [xmin,ymin,xmax,ymax], + 'mask': mask + } + segmentation mask should decode into binary masks for each class. + Args: + img_path (str): image data folder + ann_path (str): annotation file path or folder + use_instance_mask (bool): load instance segmentation data + use_seg_mask (bool): load semantic segmentation data + use_keypoint (bool): load pose keypoint data + load_mosaic (bool): using mosaic data augmentation from yolov4 + mode (str): 'train' or 'val' or 'test' + multi_scale (Tuple[float, float]): Multi-scale factor range. 
+ """ + + def __init__( + self, + img_path: str, + ann_path: str, + input_size: Tuple[int, int], + pipeline: Dict, + keep_ratio: bool = True, + use_instance_mask: bool = False, + use_seg_mask: bool = False, + use_keypoint: bool = False, + load_mosaic: bool = False, + mode: str = "train", + multi_scale: Optional[Tuple[float, float]] = None, + ): + assert mode in ["train", "val", "test"] + self.img_path = img_path + self.ann_path = ann_path + self.input_size = input_size + self.pipeline = Pipeline(pipeline, keep_ratio) + self.keep_ratio = keep_ratio + self.use_instance_mask = use_instance_mask + self.use_seg_mask = use_seg_mask + self.use_keypoint = use_keypoint + self.load_mosaic = load_mosaic + self.multi_scale = multi_scale + self.mode = mode + + self.data_info = self.get_data_info(ann_path) + + def __len__(self): + return len(self.data_info) + + def __getitem__(self, idx): + if self.mode == "val" or self.mode == "test": + return self.get_val_data(idx) + else: + while True: + data = self.get_train_data(idx) + if data is None: + idx = self.get_another_id() + continue + return data + + @staticmethod + def get_random_size( + scale_range: Tuple[float, float], image_size: Tuple[int, int] + ) -> Tuple[int, int]: + """ + Get random image shape by multi-scale factor and image_size. + Args: + scale_range (Tuple[float, float]): Multi-scale factor range. + Format in [(width, height), (width, height)] + image_size (Tuple[int, int]): Image size. Format in (width, height). + + Returns: + Tuple[int, int] + """ + assert len(scale_range) == 2 + scale_factor = random.uniform(*scale_range) + width = int(image_size[0] * scale_factor) + height = int(image_size[1] * scale_factor) + return width, height + + @abstractmethod + def get_data_info(self, ann_path): + pass + + @abstractmethod + def get_train_data(self, idx): + pass + + @abstractmethod + def get_val_data(self, idx): + pass + + def get_another_id(self): + return np.random.random_integers(0, len(self.data_info) - 1) diff --git a/nanodet/data/dataset/coco.py b/nanodet/data/dataset/coco.py new file mode 100644 index 0000000..3c46b14 --- /dev/null +++ b/nanodet/data/dataset/coco.py @@ -0,0 +1,158 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os + +import cv2 +import numpy as np +import torch +from pycocotools.coco import COCO + +from .base import BaseDataset + + +class CocoDataset(BaseDataset): + def get_data_info(self, ann_path): + """ + Load basic information of dataset such as image path, label and so on. + :param ann_path: coco json file path + :return: image info: + [{'license': 2, + 'file_name': '000000000139.jpg', + 'coco_url': 'http://images.cocodataset.org/val2017/000000000139.jpg', + 'height': 426, + 'width': 640, + 'date_captured': '2013-11-21 01:34:01', + 'flickr_url': + 'http://farm9.staticflickr.com/8035/8024364858_9c41dc1666_z.jpg', + 'id': 139}, + ... 
+ ] + """ + self.coco_api = COCO(ann_path) + self.cat_ids = sorted(self.coco_api.getCatIds()) + self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)} + self.cats = self.coco_api.loadCats(self.cat_ids) + self.class_names = [cat["name"] for cat in self.cats] + self.img_ids = sorted(self.coco_api.imgs.keys()) + img_info = self.coco_api.loadImgs(self.img_ids) + return img_info + + def get_per_img_info(self, idx): + img_info = self.data_info[idx] + file_name = img_info["file_name"] + height = img_info["height"] + width = img_info["width"] + id = img_info["id"] + if not isinstance(id, int): + raise TypeError("Image id must be int.") + info = {"file_name": file_name, "height": height, "width": width, "id": id} + return info + + def get_img_annotation(self, idx): + """ + load per image annotation + :param idx: index in dataloader + :return: annotation dict + """ + img_id = self.img_ids[idx] + ann_ids = self.coco_api.getAnnIds([img_id]) + anns = self.coco_api.loadAnns(ann_ids) + gt_bboxes = [] + gt_labels = [] + gt_bboxes_ignore = [] + if self.use_instance_mask: + gt_masks = [] + if self.use_keypoint: + gt_keypoints = [] + for ann in anns: + if ann.get("ignore", False): + continue + x1, y1, w, h = ann["bbox"] + if ann["area"] <= 0 or w < 1 or h < 1: + continue + if ann["category_id"] not in self.cat_ids: + continue + bbox = [x1, y1, x1 + w, y1 + h] + if ann.get("iscrowd", False): + gt_bboxes_ignore.append(bbox) + else: + gt_bboxes.append(bbox) + gt_labels.append(self.cat2label[ann["category_id"]]) + if self.use_instance_mask: + gt_masks.append(self.coco_api.annToMask(ann)) + if self.use_keypoint: + gt_keypoints.append(ann["keypoints"]) + if gt_bboxes: + gt_bboxes = np.array(gt_bboxes, dtype=np.float32) + gt_labels = np.array(gt_labels, dtype=np.int64) + else: + gt_bboxes = np.zeros((0, 4), dtype=np.float32) + gt_labels = np.array([], dtype=np.int64) + if gt_bboxes_ignore: + gt_bboxes_ignore = np.array(gt_bboxes_ignore, dtype=np.float32) + else: + gt_bboxes_ignore = np.zeros((0, 4), dtype=np.float32) + annotation = dict( + bboxes=gt_bboxes, labels=gt_labels, bboxes_ignore=gt_bboxes_ignore + ) + if self.use_instance_mask: + annotation["masks"] = gt_masks + if self.use_keypoint: + if gt_keypoints: + annotation["keypoints"] = np.array(gt_keypoints, dtype=np.float32) + else: + annotation["keypoints"] = np.zeros((0, 51), dtype=np.float32) + return annotation + + def get_train_data(self, idx): + """ + Load image and annotation + :param idx: + :return: meta-data (a dict containing image, annotation and other information) + """ + img_info = self.get_per_img_info(idx) + file_name = img_info["file_name"] + image_path = os.path.join(self.img_path, file_name) + img = cv2.imread(image_path) + if img is None: + print("image {} read failed.".format(image_path)) + raise FileNotFoundError("Cant load image! Please check image path!") + ann = self.get_img_annotation(idx) + meta = dict( + img=img, img_info=img_info, gt_bboxes=ann["bboxes"], gt_labels=ann["labels"] + ) + if self.use_instance_mask: + meta["gt_masks"] = ann["masks"] + if self.use_keypoint: + meta["gt_keypoints"] = ann["keypoints"] + + input_size = self.input_size + if self.multi_scale: + input_size = self.get_random_size(self.multi_scale, input_size) + + meta = self.pipeline(self, meta, input_size) + + meta["img"] = torch.from_numpy(meta["img"].transpose(2, 0, 1)) + return meta + + def get_val_data(self, idx): + """ + Currently no difference from get_train_data. + Not support TTA(testing time augmentation) yet. 
+ :param idx: + :return: + """ + # TODO: support TTA + return self.get_train_data(idx) diff --git a/nanodet/data/dataset/xml_dataset.py b/nanodet/data/dataset/xml_dataset.py new file mode 100644 index 0000000..5300660 --- /dev/null +++ b/nanodet/data/dataset/xml_dataset.py @@ -0,0 +1,157 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import os +import time +import xml.etree.ElementTree as ET +from collections import defaultdict + +from pycocotools.coco import COCO + +from .coco import CocoDataset + + +def get_file_list(path, type=".xml"): + file_names = [] + for maindir, subdir, file_name_list in os.walk(path): + for filename in file_name_list: + apath = os.path.join(maindir, filename) + ext = os.path.splitext(apath)[1] + if ext == type: + file_names.append(filename) + return file_names + + +class CocoXML(COCO): + def __init__(self, annotation): + """ + Constructor of Microsoft COCO helper class for + reading and visualizing annotations. + :param annotation: annotation dict + :return: + """ + # load dataset + self.dataset, self.anns, self.cats, self.imgs = dict(), dict(), dict(), dict() + self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list) + dataset = annotation + assert type(dataset) == dict, "annotation file format {} not supported".format( + type(dataset) + ) + self.dataset = dataset + self.createIndex() + + +class XMLDataset(CocoDataset): + def __init__(self, class_names, **kwargs): + self.class_names = class_names + super(XMLDataset, self).__init__(**kwargs) + + def xml_to_coco(self, ann_path): + """ + convert xml annotations to coco_api + :param ann_path: + :return: + """ + logging.info("loading annotations into memory...") + tic = time.time() + ann_file_names = get_file_list(ann_path, type=".xml") + logging.info("Found {} annotation files.".format(len(ann_file_names))) + image_info = [] + categories = [] + annotations = [] + for idx, supercat in enumerate(self.class_names): + categories.append( + {"supercategory": supercat, "id": idx + 1, "name": supercat} + ) + ann_id = 1 + for idx, xml_name in enumerate(ann_file_names): + tree = ET.parse(os.path.join(ann_path, xml_name)) + root = tree.getroot() + file_name = root.find("filename").text + width = int(root.find("size").find("width").text) + height = int(root.find("size").find("height").text) + info = { + "file_name": file_name, + "height": height, + "width": width, + "id": idx + 1, + } + image_info.append(info) + for _object in root.findall("object"): + category = _object.find("name").text + if category not in self.class_names: + logging.warning( + "WARNING! {} is not in class_names! 
" + "Pass this box annotation.".format(category) + ) + continue + for cat in categories: + if category == cat["name"]: + cat_id = cat["id"] + xmin = int(_object.find("bndbox").find("xmin").text) + ymin = int(_object.find("bndbox").find("ymin").text) + xmax = int(_object.find("bndbox").find("xmax").text) + ymax = int(_object.find("bndbox").find("ymax").text) + w = xmax - xmin + h = ymax - ymin + if w < 0 or h < 0: + logging.warning( + "WARNING! Find error data in file {}! Box w and " + "h should > 0. Pass this box annotation.".format(xml_name) + ) + continue + coco_box = [max(xmin, 0), max(ymin, 0), min(w, width), min(h, height)] + ann = { + "image_id": idx + 1, + "bbox": coco_box, + "category_id": cat_id, + "iscrowd": 0, + "id": ann_id, + "area": coco_box[2] * coco_box[3], + } + annotations.append(ann) + ann_id += 1 + + coco_dict = { + "images": image_info, + "categories": categories, + "annotations": annotations, + } + logging.info( + "Load {} xml files and {} boxes".format(len(image_info), len(annotations)) + ) + logging.info("Done (t={:0.2f}s)".format(time.time() - tic)) + return coco_dict + + def get_data_info(self, ann_path): + """ + Load basic information of dataset such as image path, label and so on. + :param ann_path: coco json file path + :return: image info: + [{'file_name': '000000000139.jpg', + 'height': 426, + 'width': 640, + 'id': 139}, + ... + ] + """ + coco_dict = self.xml_to_coco(ann_path) + self.coco_api = CocoXML(coco_dict) + self.cat_ids = sorted(self.coco_api.getCatIds()) + self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)} + self.cats = self.coco_api.loadCats(self.cat_ids) + self.img_ids = sorted(self.coco_api.imgs.keys()) + img_info = self.coco_api.loadImgs(self.img_ids) + return img_info diff --git a/nanodet/data/transform/__init__.py b/nanodet/data/transform/__init__.py new file mode 100644 index 0000000..c30ae76 --- /dev/null +++ b/nanodet/data/transform/__init__.py @@ -0,0 +1,17 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .pipeline import Pipeline + +__all__ = ["Pipeline"] diff --git a/nanodet/data/transform/color.py b/nanodet/data/transform/color.py new file mode 100644 index 0000000..9eb0236 --- /dev/null +++ b/nanodet/data/transform/color.py @@ -0,0 +1,70 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import random + +import cv2 +import numpy as np + + +def random_brightness(img, delta): + img += random.uniform(-delta, delta) + return img + + +def random_contrast(img, alpha_low, alpha_up): + img *= random.uniform(alpha_low, alpha_up) + return img + + +def random_saturation(img, alpha_low, alpha_up): + hsv_img = cv2.cvtColor(img.astype(np.float32), cv2.COLOR_BGR2HSV) + hsv_img[..., 1] *= random.uniform(alpha_low, alpha_up) + img = cv2.cvtColor(hsv_img, cv2.COLOR_HSV2BGR) + return img + + +def normalize(meta, mean, std): + img = meta["img"].astype(np.float32) + mean = np.array(mean, dtype=np.float64).reshape(1, -1) + stdinv = 1 / np.array(std, dtype=np.float64).reshape(1, -1) + cv2.subtract(img, mean, img) + cv2.multiply(img, stdinv, img) + meta["img"] = img + return meta + + +def _normalize(img, mean, std): + mean = np.array(mean, dtype=np.float32).reshape(1, 1, 3) / 255 + std = np.array(std, dtype=np.float32).reshape(1, 1, 3) / 255 + img = (img - mean) / std + return img + + +def color_aug_and_norm(meta, kwargs): + img = meta["img"].astype(np.float32) / 255 + + if "brightness" in kwargs and random.randint(0, 1): + img = random_brightness(img, kwargs["brightness"]) + + if "contrast" in kwargs and random.randint(0, 1): + img = random_contrast(img, *kwargs["contrast"]) + + if "saturation" in kwargs and random.randint(0, 1): + img = random_saturation(img, *kwargs["saturation"]) + # cv2.imshow('trans', img) + # cv2.waitKey(0) + img = _normalize(img, *kwargs["normalize"]) + meta["img"] = img + return meta diff --git a/nanodet/data/transform/mosaic.py b/nanodet/data/transform/mosaic.py new file mode 100644 index 0000000..e69de29 diff --git a/nanodet/data/transform/pipeline.py b/nanodet/data/transform/pipeline.py new file mode 100644 index 0000000..71b8f7d --- /dev/null +++ b/nanodet/data/transform/pipeline.py @@ -0,0 +1,59 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import functools +import warnings +from typing import Dict, Tuple + +from torch.utils.data import Dataset + +from .color import color_aug_and_norm +from .warp import ShapeTransform, warp_and_resize + + +class LegacyPipeline: + def __init__(self, cfg, keep_ratio): + warnings.warn( + "Deprecated warning! Pipeline from nanodet v0.x has been deprecated," + "Please use new Pipeline and update your config!" + ) + self.warp = functools.partial( + warp_and_resize, warp_kwargs=cfg, keep_ratio=keep_ratio + ) + self.color = functools.partial(color_aug_and_norm, kwargs=cfg) + + def __call__(self, meta, dst_shape): + meta = self.warp(meta, dst_shape=dst_shape) + meta = self.color(meta=meta) + return meta + + +class Pipeline: + """Data process pipeline. Apply augmentation and pre-processing on + meta_data from dataset. + + Args: + cfg (Dict): Data pipeline config. + keep_ratio (bool): Whether to keep aspect ratio when resizing image. 
+ + """ + + def __init__(self, cfg: Dict, keep_ratio: bool): + self.shape_transform = ShapeTransform(keep_ratio, **cfg) + self.color = functools.partial(color_aug_and_norm, kwargs=cfg) + + def __call__(self, dataset: Dataset, meta: Dict, dst_shape: Tuple[int, int]): + meta = self.shape_transform(meta, dst_shape=dst_shape) + meta = self.color(meta=meta) + return meta diff --git a/nanodet/data/transform/warp.py b/nanodet/data/transform/warp.py new file mode 100644 index 0000000..a102348 --- /dev/null +++ b/nanodet/data/transform/warp.py @@ -0,0 +1,352 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import random +from typing import Dict, Optional, Tuple + +import cv2 +import numpy as np + + +def get_flip_matrix(prob=0.5): + F = np.eye(3) + if random.random() < prob: + F[0, 0] = -1 + return F + + +def get_perspective_matrix(perspective=0.0): + """ + + :param perspective: + :return: + """ + P = np.eye(3) + P[2, 0] = random.uniform(-perspective, perspective) # x perspective (about y) + P[2, 1] = random.uniform(-perspective, perspective) # y perspective (about x) + return P + + +def get_rotation_matrix(degree=0.0): + """ + + :param degree: + :return: + """ + R = np.eye(3) + a = random.uniform(-degree, degree) + R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=1) + return R + + +def get_scale_matrix(ratio=(1, 1)): + """ + + :param ratio: + """ + Scl = np.eye(3) + scale = random.uniform(*ratio) + Scl[0, 0] *= scale + Scl[1, 1] *= scale + return Scl + + +def get_stretch_matrix(width_ratio=(1, 1), height_ratio=(1, 1)): + """ + + :param width_ratio: + :param height_ratio: + """ + Str = np.eye(3) + Str[0, 0] *= random.uniform(*width_ratio) + Str[1, 1] *= random.uniform(*height_ratio) + return Str + + +def get_shear_matrix(degree): + """ + + :param degree: + :return: + """ + Sh = np.eye(3) + Sh[0, 1] = math.tan( + random.uniform(-degree, degree) * math.pi / 180 + ) # x shear (deg) + Sh[1, 0] = math.tan( + random.uniform(-degree, degree) * math.pi / 180 + ) # y shear (deg) + return Sh + + +def get_translate_matrix(translate, width, height): + """ + + :param translate: + :return: + """ + T = np.eye(3) + T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width # x translation + T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height # y translation + return T + + +def get_resize_matrix(raw_shape, dst_shape, keep_ratio): + """ + Get resize matrix for resizing raw img to input size + :param raw_shape: (width, height) of raw image + :param dst_shape: (width, height) of input image + :param keep_ratio: whether keep original ratio + :return: 3x3 Matrix + """ + r_w, r_h = raw_shape + d_w, d_h = dst_shape + Rs = np.eye(3) + if keep_ratio: + C = np.eye(3) + C[0, 2] = -r_w / 2 + C[1, 2] = -r_h / 2 + + if r_w / r_h < d_w / d_h: + ratio = d_h / r_h + else: + ratio = d_w / r_w + Rs[0, 0] *= ratio + Rs[1, 1] *= ratio + + T = np.eye(3) + T[0, 2] = 0.5 * d_w + T[1, 2] = 0.5 * d_h + return T @ Rs @ C + else: + Rs[0, 0] *= d_w / r_w + Rs[1, 1] 
*= d_h / r_h + return Rs + + +def warp_and_resize( + meta: Dict, + warp_kwargs: Dict, + dst_shape: Tuple[int, int], + keep_ratio: bool = True, +): + # TODO: background, type + raw_img = meta["img"] + height = raw_img.shape[0] # shape(h,w,c) + width = raw_img.shape[1] + + # center + C = np.eye(3) + C[0, 2] = -width / 2 + C[1, 2] = -height / 2 + + # do not change the order of mat mul + if "perspective" in warp_kwargs and random.randint(0, 1): + P = get_perspective_matrix(warp_kwargs["perspective"]) + C = P @ C + if "scale" in warp_kwargs and random.randint(0, 1): + Scl = get_scale_matrix(warp_kwargs["scale"]) + C = Scl @ C + if "stretch" in warp_kwargs and random.randint(0, 1): + Str = get_stretch_matrix(*warp_kwargs["stretch"]) + C = Str @ C + if "rotation" in warp_kwargs and random.randint(0, 1): + R = get_rotation_matrix(warp_kwargs["rotation"]) + C = R @ C + if "shear" in warp_kwargs and random.randint(0, 1): + Sh = get_shear_matrix(warp_kwargs["shear"]) + C = Sh @ C + if "flip" in warp_kwargs: + F = get_flip_matrix(warp_kwargs["flip"]) + C = F @ C + if "translate" in warp_kwargs and random.randint(0, 1): + T = get_translate_matrix(warp_kwargs["translate"], width, height) + else: + T = get_translate_matrix(0, width, height) + M = T @ C + # M = T @ Sh @ R @ Str @ P @ C + ResizeM = get_resize_matrix((width, height), dst_shape, keep_ratio) + M = ResizeM @ M + img = cv2.warpPerspective(raw_img, M, dsize=tuple(dst_shape)) + meta["img"] = img + meta["warp_matrix"] = M + if "gt_bboxes" in meta: + boxes = meta["gt_bboxes"] + meta["gt_bboxes"] = warp_boxes(boxes, M, dst_shape[0], dst_shape[1]) + if "gt_masks" in meta: + for i, mask in enumerate(meta["gt_masks"]): + meta["gt_masks"][i] = cv2.warpPerspective(mask, M, dsize=tuple(dst_shape)) + + # TODO: keypoints + # if 'gt_keypoints' in meta: + + return meta + + +def warp_boxes(boxes, M, width, height): + n = len(boxes) + if n: + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + n * 4, 2 + ) # x1y1, x2y2, x1y2, x2y1 + xy = xy @ M.T # transform + xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + # clip boxes + xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) + xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) + return xy.astype(np.float32) + else: + return boxes + + +# def warp_keypoints(keypoints, M, width, height): +# n = len(keypoints) +# if n: +# # warp points +# xy = np.ones((n * 4, 3)) +# # x1y1, x2y2, x1y2, x2y1 +# xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2) +# xy = xy @ M.T # transform +# xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale +# # create new boxes +# x = xy[:, [0, 2, 4, 6]] +# y = xy[:, [1, 3, 5, 7]] +# xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T +# # clip boxes +# xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) +# xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) +# return xy + + +def get_minimum_dst_shape( + src_shape: Tuple[int, int], + dst_shape: Tuple[int, int], + divisible: Optional[int] = None, +) -> Tuple[int, int]: + """Calculate minimum dst shape""" + src_w, src_h = src_shape + dst_w, dst_h = dst_shape + + if src_w / src_h < dst_w / dst_h: + ratio = dst_h / src_h + else: + ratio = dst_w / src_w + + dst_w = int(ratio * src_w) + dst_h = int(ratio * src_h) + + if divisible and divisible > 0: + dst_w = max(divisible, int((dst_w + divisible - 1) // divisible * divisible)) + 
dst_h = max(divisible, int((dst_h + divisible - 1) // divisible * divisible)) + return dst_w, dst_h + + +class ShapeTransform: + """Shape transforms including resize, random perspective, random scale, + random stretch, random rotation, random shear, random translate, + and random flip. + + Args: + keep_ratio: Whether to keep aspect ratio of the image. + divisible: Make image height and width is divisible by a number. + perspective: Random perspective factor. + scale: Random scale ratio. + stretch: Width and height stretch ratio range. + rotation: Random rotate degree. + shear: Random shear degree. + translate: Random translate ratio. + flip: Random flip probability. + """ + + def __init__( + self, + keep_ratio: bool, + divisible: int = 0, + perspective: float = 0.0, + scale: Tuple[int, int] = (1, 1), + stretch: Tuple = ((1, 1), (1, 1)), + rotation: float = 0.0, + shear: float = 0.0, + translate: float = 0.0, + flip: float = 0.0, + **kwargs + ): + self.keep_ratio = keep_ratio + self.divisible = divisible + self.perspective = perspective + self.scale_ratio = scale + self.stretch_ratio = stretch + self.rotation_degree = rotation + self.shear_degree = shear + self.flip_prob = flip + self.translate_ratio = translate + + def __call__(self, meta_data, dst_shape): + raw_img = meta_data["img"] + height = raw_img.shape[0] # shape(h,w,c) + width = raw_img.shape[1] + + # center + C = np.eye(3) + C[0, 2] = -width / 2 + C[1, 2] = -height / 2 + + P = get_perspective_matrix(self.perspective) + C = P @ C + + Scl = get_scale_matrix(self.scale_ratio) + C = Scl @ C + + Str = get_stretch_matrix(*self.stretch_ratio) + C = Str @ C + + R = get_rotation_matrix(self.rotation_degree) + C = R @ C + + Sh = get_shear_matrix(self.shear_degree) + C = Sh @ C + + F = get_flip_matrix(self.flip_prob) + C = F @ C + + T = get_translate_matrix(self.translate_ratio, width, height) + M = T @ C + + if self.keep_ratio: + dst_shape = get_minimum_dst_shape( + (width, height), dst_shape, self.divisible + ) + + ResizeM = get_resize_matrix((width, height), dst_shape, self.keep_ratio) + M = ResizeM @ M + img = cv2.warpPerspective(raw_img, M, dsize=tuple(dst_shape)) + meta_data["img"] = img + meta_data["warp_matrix"] = M + if "gt_bboxes" in meta_data: + boxes = meta_data["gt_bboxes"] + meta_data["gt_bboxes"] = warp_boxes(boxes, M, dst_shape[0], dst_shape[1]) + if "gt_masks" in meta_data: + for i, mask in enumerate(meta_data["gt_masks"]): + meta_data["gt_masks"][i] = cv2.warpPerspective( + mask, M, dsize=tuple(dst_shape) + ) + + return meta_data diff --git a/nanodet/evaluator/__init__.py b/nanodet/evaluator/__init__.py new file mode 100644 index 0000000..4285845 --- /dev/null +++ b/nanodet/evaluator/__init__.py @@ -0,0 +1,25 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
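+
+# Evaluator factory. `build_evaluator` pops the evaluator name from the config
+# and currently only supports "CocoDetectionEvaluator"; the dataset handed in
+# must expose `coco_api`, `class_names` and `cat_ids`, as the COCO/XML datasets
+# above do. Typical use (a sketch; `cfg.evaluator`, `val_dataset`, `results`
+# and `save_dir` are assumed to come from the surrounding training script):
+#     evaluator = build_evaluator(cfg.evaluator, val_dataset)
+#     eval_results = evaluator.evaluate(results, save_dir)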
+import copy + +from .coco_detection import CocoDetectionEvaluator + + +def build_evaluator(cfg, dataset): + evaluator_cfg = copy.deepcopy(cfg) + name = evaluator_cfg.pop("name") + if name == "CocoDetectionEvaluator": + return CocoDetectionEvaluator(dataset) + else: + raise NotImplementedError diff --git a/nanodet/evaluator/coco_detection.py b/nanodet/evaluator/coco_detection.py new file mode 100644 index 0000000..5b51d54 --- /dev/null +++ b/nanodet/evaluator/coco_detection.py @@ -0,0 +1,149 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import contextlib +import copy +import io +import itertools +import json +import logging +import os +import warnings + +import numpy as np +from pycocotools.cocoeval import COCOeval +from tabulate import tabulate + +logger = logging.getLogger("NanoDet") + + +def xyxy2xywh(bbox): + """ + change bbox to coco format + :param bbox: [x1, y1, x2, y2] + :return: [x, y, w, h] + """ + return [ + bbox[0], + bbox[1], + bbox[2] - bbox[0], + bbox[3] - bbox[1], + ] + + +class CocoDetectionEvaluator: + def __init__(self, dataset): + assert hasattr(dataset, "coco_api") + self.class_names = dataset.class_names + self.coco_api = dataset.coco_api + self.cat_ids = dataset.cat_ids + self.metric_names = ["mAP", "AP_50", "AP_75", "AP_small", "AP_m", "AP_l"] + + def results2json(self, results): + """ + results: {image_id: {label: [bboxes...] } } + :return coco json format: {image_id: + category_id: + bbox: + score: } + """ + json_results = [] + for image_id, dets in results.items(): + for label, bboxes in dets.items(): + category_id = self.cat_ids[label] + for bbox in bboxes: + score = float(bbox[4]) + detection = dict( + image_id=int(image_id), + category_id=int(category_id), + bbox=xyxy2xywh(bbox), + score=score, + ) + json_results.append(detection) + return json_results + + def evaluate(self, results, save_dir, rank=-1): + results_json = self.results2json(results) + if len(results_json) == 0: + warnings.warn( + "Detection result is empty! Please check whether " + "training set is too small (need to increase val_interval " + "in config and train more epochs). Or check annotation " + "correctness." 
+ ) + empty_eval_results = {} + for key in self.metric_names: + empty_eval_results[key] = 0 + return empty_eval_results + json_path = os.path.join(save_dir, "results{}.json".format(rank)) + json.dump(results_json, open(json_path, "w")) + coco_dets = self.coco_api.loadRes(json_path) + coco_eval = COCOeval( + copy.deepcopy(self.coco_api), copy.deepcopy(coco_dets), "bbox" + ) + coco_eval.evaluate() + coco_eval.accumulate() + + # use logger to log coco eval results + redirect_string = io.StringIO() + with contextlib.redirect_stdout(redirect_string): + coco_eval.summarize() + logger.info("\n" + redirect_string.getvalue()) + + # print per class AP + headers = ["class", "AP50", "mAP"] + colums = 6 + per_class_ap50s = [] + per_class_maps = [] + precisions = coco_eval.eval["precision"] + # dimension of precisions: [TxRxKxAxM] + # precision has dims (iou, recall, cls, area range, max dets) + assert len(self.class_names) == precisions.shape[2] + + for idx, name in enumerate(self.class_names): + # area range index 0: all area ranges + # max dets index -1: typically 100 per image + precision_50 = precisions[0, :, idx, 0, -1] + precision_50 = precision_50[precision_50 > -1] + ap50 = np.mean(precision_50) if precision_50.size else float("nan") + per_class_ap50s.append(float(ap50 * 100)) + + precision = precisions[:, :, idx, 0, -1] + precision = precision[precision > -1] + ap = np.mean(precision) if precision.size else float("nan") + per_class_maps.append(float(ap * 100)) + + num_cols = min(colums, len(self.class_names) * len(headers)) + flatten_results = [] + for name, ap50, mAP in zip(self.class_names, per_class_ap50s, per_class_maps): + flatten_results += [name, ap50, mAP] + + row_pair = itertools.zip_longest( + *[flatten_results[i::num_cols] for i in range(num_cols)] + ) + table_headers = headers * (num_cols // len(headers)) + table = tabulate( + row_pair, + tablefmt="pipe", + floatfmt=".1f", + headers=table_headers, + numalign="left", + ) + logger.info("\n" + table) + + aps = coco_eval.stats[:6] + eval_results = {} + for k, v in zip(self.metric_names, aps): + eval_results[k] = v + return eval_results diff --git a/nanodet/model/arch/__init__.py b/nanodet/model/arch/__init__.py new file mode 100644 index 0000000..c15509b --- /dev/null +++ b/nanodet/model/arch/__init__.py @@ -0,0 +1,42 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import warnings + +from .nanodet_plus import NanoDetPlus +from .one_stage_detector import OneStageDetector + + +def build_model(model_cfg): + model_cfg = copy.deepcopy(model_cfg) + name = model_cfg.arch.pop("name") + if name == "GFL": + warnings.warn( + "Model architecture name is changed to 'OneStageDetector'. " + "The name 'GFL' is deprecated, please change the model->arch->name " + "in your YAML config file to OneStageDetector." 
+ ) + model = OneStageDetector( + model_cfg.arch.backbone, model_cfg.arch.fpn, model_cfg.arch.head + ) + elif name == "OneStageDetector": + model = OneStageDetector( + model_cfg.arch.backbone, model_cfg.arch.fpn, model_cfg.arch.head + ) + elif name == "NanoDetPlus": + model = NanoDetPlus(**model_cfg.arch) + else: + raise NotImplementedError + return model diff --git a/nanodet/model/arch/nanodet_plus.py b/nanodet/model/arch/nanodet_plus.py new file mode 100644 index 0000000..0de099d --- /dev/null +++ b/nanodet/model/arch/nanodet_plus.py @@ -0,0 +1,57 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy + +import torch + +from ..head import build_head +from .one_stage_detector import OneStageDetector + + +class NanoDetPlus(OneStageDetector): + def __init__( + self, + backbone, + fpn, + aux_head, + head, + detach_epoch=0, + ): + super(NanoDetPlus, self).__init__( + backbone_cfg=backbone, fpn_cfg=fpn, head_cfg=head + ) + self.aux_fpn = copy.deepcopy(self.fpn) + self.aux_head = build_head(aux_head) + self.detach_epoch = detach_epoch + + def forward_train(self, gt_meta): + img = gt_meta["img"] + feat = self.backbone(img) + fpn_feat = self.fpn(feat) + if self.epoch >= self.detach_epoch: + aux_fpn_feat = self.aux_fpn([f.detach() for f in feat]) + dual_fpn_feat = ( + torch.cat([f.detach(), aux_f], dim=1) + for f, aux_f in zip(fpn_feat, aux_fpn_feat) + ) + else: + aux_fpn_feat = self.aux_fpn(feat) + dual_fpn_feat = ( + torch.cat([f, aux_f], dim=1) for f, aux_f in zip(fpn_feat, aux_fpn_feat) + ) + head_out = self.head(fpn_feat) + aux_head_out = self.aux_head(dual_fpn_feat) + loss, loss_states = self.head.loss(head_out, gt_meta, aux_preds=aux_head_out) + return head_out, loss, loss_states diff --git a/nanodet/model/arch/one_stage_detector.py b/nanodet/model/arch/one_stage_detector.py new file mode 100644 index 0000000..e791d9f --- /dev/null +++ b/nanodet/model/arch/one_stage_detector.py @@ -0,0 +1,68 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
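+
+# Generic one-stage detector: backbone -> optional FPN -> optional head.
+# `forward` returns the raw head output, `inference` additionally runs the
+# head's post-processing on a meta dict, and `forward_train` returns the
+# predictions together with the loss and loss-state dict computed by the head.
+# Training-time sketch (assuming `gt_meta` is a batch dict with an "img"
+# tensor, as produced by the data pipeline above):
+#     preds, loss, loss_states = model.forward_train(gt_meta)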
+ +import time + +import torch +import torch.nn as nn + +from ..backbone import build_backbone +from ..fpn import build_fpn +from ..head import build_head + + +class OneStageDetector(nn.Module): + def __init__( + self, + backbone_cfg, + fpn_cfg=None, + head_cfg=None, + ): + super(OneStageDetector, self).__init__() + self.backbone = build_backbone(backbone_cfg) + if fpn_cfg is not None: + self.fpn = build_fpn(fpn_cfg) + if head_cfg is not None: + self.head = build_head(head_cfg) + self.epoch = 0 + + def forward(self, x): + x = self.backbone(x) + if hasattr(self, "fpn"): + x = self.fpn(x) + if hasattr(self, "head"): + x = self.head(x) + return x + + def inference(self, meta): + with torch.no_grad(): + # torch.cuda.synchronize() + time1 = time.time() + preds = self(meta["img"]) + # torch.cuda.synchronize() + time2 = time.time() + print("forward time: {:.3f}s".format((time2 - time1)), end=" | ") + results = self.head.post_process(preds, meta) + # torch.cuda.synchronize() + print("decode time: {:.3f}s".format((time.time() - time2)), end=" | ") + return results + + def forward_train(self, gt_meta): + preds = self(gt_meta["img"]) + loss, loss_states = self.head.loss(preds, gt_meta) + + return preds, loss, loss_states + + def set_epoch(self, epoch): + self.epoch = epoch diff --git a/nanodet/model/backbone/__init__.py b/nanodet/model/backbone/__init__.py new file mode 100644 index 0000000..e66cdff --- /dev/null +++ b/nanodet/model/backbone/__init__.py @@ -0,0 +1,47 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy + +from .custom_csp import CustomCspNet +from .efficientnet_lite import EfficientNetLite +from .ghostnet import GhostNet +from .mobilenetv2 import MobileNetV2 +from .repvgg import RepVGG +from .resnet import ResNet +from .shufflenetv2 import ShuffleNetV2 +from .timm_wrapper import TIMMWrapper + + +def build_backbone(cfg): + backbone_cfg = copy.deepcopy(cfg) + name = backbone_cfg.pop("name") + if name == "ResNet": + return ResNet(**backbone_cfg) + elif name == "ShuffleNetV2": + return ShuffleNetV2(**backbone_cfg) + elif name == "GhostNet": + return GhostNet(**backbone_cfg) + elif name == "MobileNetV2": + return MobileNetV2(**backbone_cfg) + elif name == "EfficientNetLite": + return EfficientNetLite(**backbone_cfg) + elif name == "CustomCspNet": + return CustomCspNet(**backbone_cfg) + elif name == "RepVGG": + return RepVGG(**backbone_cfg) + elif name == "TIMMWrapper": + return TIMMWrapper(**backbone_cfg) + else: + raise NotImplementedError diff --git a/nanodet/model/backbone/custom_csp.py b/nanodet/model/backbone/custom_csp.py new file mode 100644 index 0000000..441d149 --- /dev/null +++ b/nanodet/model/backbone/custom_csp.py @@ -0,0 +1,168 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +import torch.nn as nn + +from ..module.conv import ConvModule + + +class TinyResBlock(nn.Module): + def __init__( + self, in_channels, kernel_size, norm_cfg, activation, res_type="concat" + ): + super(TinyResBlock, self).__init__() + assert in_channels % 2 == 0 + assert res_type in ["concat", "add"] + self.res_type = res_type + self.in_conv = ConvModule( + in_channels, + in_channels // 2, + kernel_size, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + self.mid_conv = ConvModule( + in_channels // 2, + in_channels // 2, + kernel_size, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + if res_type == "add": + self.out_conv = ConvModule( + in_channels // 2, + in_channels, + kernel_size, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + + def forward(self, x): + x = self.in_conv(x) + x1 = self.mid_conv(x) + if self.res_type == "add": + return self.out_conv(x + x1) + else: + return torch.cat((x1, x), dim=1) + + +class CspBlock(nn.Module): + def __init__( + self, + in_channels, + num_res, + kernel_size=3, + stride=0, + norm_cfg=dict(type="BN", requires_grad=True), + activation="LeakyReLU", + ): + super(CspBlock, self).__init__() + assert in_channels % 2 == 0 + self.in_conv = ConvModule( + in_channels, + in_channels, + kernel_size, + stride, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + res_blocks = [] + for i in range(num_res): + res_block = TinyResBlock(in_channels, kernel_size, norm_cfg, activation) + res_blocks.append(res_block) + self.res_blocks = nn.Sequential(*res_blocks) + self.res_out_conv = ConvModule( + in_channels, + in_channels, + kernel_size, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + + def forward(self, x): + x = self.in_conv(x) + x1 = self.res_blocks(x) + x1 = self.res_out_conv(x1) + out = torch.cat((x1, x), dim=1) + return out + + +class CustomCspNet(nn.Module): + def __init__( + self, + net_cfg, + out_stages, + norm_cfg=dict(type="BN", requires_grad=True), + activation="LeakyReLU", + ): + super(CustomCspNet, self).__init__() + assert isinstance(net_cfg, list) + assert set(out_stages).issubset(i for i in range(len(net_cfg))) + self.out_stages = out_stages + self.activation = activation + self.stages = nn.ModuleList() + for stage_cfg in net_cfg: + if stage_cfg[0] == "Conv": + in_channels, out_channels, kernel_size, stride = stage_cfg[1:] + stage = ConvModule( + in_channels, + out_channels, + kernel_size, + stride, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + elif stage_cfg[0] == "CspBlock": + in_channels, num_res, kernel_size, stride = stage_cfg[1:] + stage = CspBlock( + in_channels, num_res, kernel_size, stride, norm_cfg, activation + ) + elif stage_cfg[0] == "MaxPool": + kernel_size, stride = stage_cfg[1:] + stage = nn.MaxPool2d( + kernel_size, stride, padding=(kernel_size - 1) // 2 + ) + else: + raise ModuleNotFoundError + self.stages.append(stage) + self._init_weight() + + def forward(self, x): + output = [] + for i, stage in 
enumerate(self.stages): + x = stage(x) + if i in self.out_stages: + output.append(x) + return tuple(output) + + def _init_weight(self): + for m in self.modules(): + if self.activation == "LeakyReLU": + nonlinearity = "leaky_relu" + else: + nonlinearity = "relu" + if isinstance(m, nn.Conv2d): + nn.init.kaiming_normal_( + m.weight, mode="fan_out", nonlinearity=nonlinearity + ) + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() diff --git a/nanodet/model/backbone/efficientnet_lite.py b/nanodet/model/backbone/efficientnet_lite.py new file mode 100644 index 0000000..090ab7c --- /dev/null +++ b/nanodet/model/backbone/efficientnet_lite.py @@ -0,0 +1,287 @@ +import math + +import torch +import torch.functional as F +import torch.utils.model_zoo as model_zoo +from torch import nn + +from ..module.activation import act_layers + +efficientnet_lite_params = { + # width_coefficient, depth_coefficient, image_size, dropout_rate + "efficientnet_lite0": [1.0, 1.0, 224, 0.2], + "efficientnet_lite1": [1.0, 1.1, 240, 0.2], + "efficientnet_lite2": [1.1, 1.2, 260, 0.3], + "efficientnet_lite3": [1.2, 1.4, 280, 0.3], + "efficientnet_lite4": [1.4, 1.8, 300, 0.3], +} + +model_urls = { + "efficientnet_lite0": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite0.pth", # noqa: E501 + "efficientnet_lite1": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite1.pth", # noqa: E501 + "efficientnet_lite2": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite2.pth", # noqa: E501 + "efficientnet_lite3": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite3.pth", # noqa: E501 + "efficientnet_lite4": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite4.pth", # noqa: E501 +} + + +def round_filters(filters, multiplier, divisor=8, min_width=None): + """Calculate and round number of filters based on width multiplier.""" + if not multiplier: + return filters + filters *= multiplier + min_width = min_width or divisor + new_filters = max(min_width, int(filters + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%. 
+ if new_filters < 0.9 * filters: + new_filters += divisor + return int(new_filters) + + +def round_repeats(repeats, multiplier): + """Round number of filters based on depth multiplier.""" + if not multiplier: + return repeats + return int(math.ceil(multiplier * repeats)) + + +def drop_connect(x, drop_connect_rate, training): + if not training: + return x + keep_prob = 1.0 - drop_connect_rate + batch_size = x.shape[0] + random_tensor = keep_prob + random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype, device=x.device) + binary_mask = torch.floor(random_tensor) + x = (x / keep_prob) * binary_mask + return x + + +class MBConvBlock(nn.Module): + def __init__( + self, + inp, + final_oup, + k, + s, + expand_ratio, + se_ratio, + has_se=False, + activation="ReLU6", + ): + super(MBConvBlock, self).__init__() + + self._momentum = 0.01 + self._epsilon = 1e-3 + self.input_filters = inp + self.output_filters = final_oup + self.stride = s + self.expand_ratio = expand_ratio + self.has_se = has_se + self.id_skip = True # skip connection and drop connect + + # Expansion phase + oup = inp * expand_ratio # number of output channels + if expand_ratio != 1: + self._expand_conv = nn.Conv2d( + in_channels=inp, out_channels=oup, kernel_size=1, bias=False + ) + self._bn0 = nn.BatchNorm2d( + num_features=oup, momentum=self._momentum, eps=self._epsilon + ) + + # Depthwise convolution phase + self._depthwise_conv = nn.Conv2d( + in_channels=oup, + out_channels=oup, + groups=oup, # groups makes it depthwise + kernel_size=k, + padding=(k - 1) // 2, + stride=s, + bias=False, + ) + self._bn1 = nn.BatchNorm2d( + num_features=oup, momentum=self._momentum, eps=self._epsilon + ) + + # Squeeze and Excitation layer, if desired + if self.has_se: + num_squeezed_channels = max(1, int(inp * se_ratio)) + self._se_reduce = nn.Conv2d( + in_channels=oup, out_channels=num_squeezed_channels, kernel_size=1 + ) + self._se_expand = nn.Conv2d( + in_channels=num_squeezed_channels, out_channels=oup, kernel_size=1 + ) + + # Output phase + self._project_conv = nn.Conv2d( + in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False + ) + self._bn2 = nn.BatchNorm2d( + num_features=final_oup, momentum=self._momentum, eps=self._epsilon + ) + self._relu = act_layers(activation) + + def forward(self, x, drop_connect_rate=None): + """ + :param x: input tensor + :param drop_connect_rate: drop connect rate (float, between 0 and 1) + :return: output of block + """ + + # Expansion and Depthwise Convolution + identity = x + if self.expand_ratio != 1: + x = self._relu(self._bn0(self._expand_conv(x))) + x = self._relu(self._bn1(self._depthwise_conv(x))) + + # Squeeze and Excitation + if self.has_se: + x_squeezed = F.adaptive_avg_pool2d(x, 1) + x_squeezed = self._se_expand(self._relu(self._se_reduce(x_squeezed))) + x = torch.sigmoid(x_squeezed) * x + + x = self._bn2(self._project_conv(x)) + + # Skip connection and drop connect + if ( + self.id_skip + and self.stride == 1 + and self.input_filters == self.output_filters + ): + if drop_connect_rate: + x = drop_connect(x, drop_connect_rate, training=self.training) + x += identity # skip connection + return x + + +class EfficientNetLite(nn.Module): + def __init__( + self, model_name, out_stages=(2, 4, 6), activation="ReLU6", pretrain=True + ): + super(EfficientNetLite, self).__init__() + assert set(out_stages).issubset(i for i in range(0, 7)) + assert model_name in efficientnet_lite_params + + self.model_name = model_name + # Batch norm parameters + momentum = 0.01 + epsilon = 1e-3 + 
width_multiplier, depth_multiplier, _, dropout_rate = efficientnet_lite_params[ + model_name + ] + self.drop_connect_rate = 0.2 + self.out_stages = out_stages + + mb_block_settings = [ + # repeat|kernel_size|stride|expand|input|output|se_ratio + [1, 3, 1, 1, 32, 16, 0.25], # stage0 + [2, 3, 2, 6, 16, 24, 0.25], # stage1 - 1/4 + [2, 5, 2, 6, 24, 40, 0.25], # stage2 - 1/8 + [3, 3, 2, 6, 40, 80, 0.25], # stage3 + [3, 5, 1, 6, 80, 112, 0.25], # stage4 - 1/16 + [4, 5, 2, 6, 112, 192, 0.25], # stage5 + [1, 3, 1, 6, 192, 320, 0.25], # stage6 - 1/32 + ] + + # Stem + out_channels = 32 + self.stem = nn.Sequential( + nn.Conv2d(3, out_channels, kernel_size=3, stride=2, padding=1, bias=False), + nn.BatchNorm2d(num_features=out_channels, momentum=momentum, eps=epsilon), + act_layers(activation), + ) + + # Build blocks + self.blocks = nn.ModuleList([]) + for i, stage_setting in enumerate(mb_block_settings): + stage = nn.ModuleList([]) + ( + num_repeat, + kernal_size, + stride, + expand_ratio, + input_filters, + output_filters, + se_ratio, + ) = stage_setting + # Update block input and output filters based on width multiplier. + input_filters = ( + input_filters + if i == 0 + else round_filters(input_filters, width_multiplier) + ) + output_filters = round_filters(output_filters, width_multiplier) + num_repeat = ( + num_repeat + if i == 0 or i == len(mb_block_settings) - 1 + else round_repeats(num_repeat, depth_multiplier) + ) + + # The first block needs to take care of stride and filter size increase. + stage.append( + MBConvBlock( + input_filters, + output_filters, + kernal_size, + stride, + expand_ratio, + se_ratio, + has_se=False, + ) + ) + if num_repeat > 1: + input_filters = output_filters + stride = 1 + for _ in range(num_repeat - 1): + stage.append( + MBConvBlock( + input_filters, + output_filters, + kernal_size, + stride, + expand_ratio, + se_ratio, + has_se=False, + ) + ) + + self.blocks.append(stage) + self._initialize_weights(pretrain) + + def forward(self, x): + x = self.stem(x) + output = [] + idx = 0 + for j, stage in enumerate(self.blocks): + for block in stage: + drop_connect_rate = self.drop_connect_rate + if drop_connect_rate: + drop_connect_rate *= float(idx) / len(self.blocks) + x = block(x, drop_connect_rate) + idx += 1 + if j in self.out_stages: + output.append(x) + return output + + def _initialize_weights(self, pretrain=True): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels + m.weight.data.normal_(0, math.sqrt(2.0 / n)) + if m.bias is not None: + m.bias.data.zero_() + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() + if pretrain: + url = model_urls[self.model_name] + if url is not None: + pretrained_state_dict = model_zoo.load_url(url) + print("=> loading pretrained model {}".format(url)) + self.load_state_dict(pretrained_state_dict, strict=False) + + def load_pretrain(self, path): + state_dict = torch.load(path) + self.load_state_dict(state_dict, strict=True) diff --git a/nanodet/model/backbone/ghostnet.py b/nanodet/model/backbone/ghostnet.py new file mode 100644 index 0000000..06c7119 --- /dev/null +++ b/nanodet/model/backbone/ghostnet.py @@ -0,0 +1,348 @@ +""" +2020.06.09-Changed for building GhostNet +Huawei Technologies Co., Ltd. +Creates a GhostNet Model as defined in: +GhostNet: More Features from Cheap Operations By Kai Han, Yunhe Wang, +Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu. 
+https://arxiv.org/abs/1911.11907 +Modified from https://github.com/d-li14/mobilenetv3.pytorch +and https://github.com/rwightman/pytorch-image-models +""" +import logging +import math +import warnings + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..module.activation import act_layers + + +def get_url(width_mult=1.0): + if width_mult == 1.0: + return "https://raw.githubusercontent.com/huawei-noah/CV-Backbones/master/ghostnet_pytorch/models/state_dict_73.98.pth" # noqa E501 + else: + logging.info("GhostNet only has 1.0 pretrain model. ") + return None + + +def _make_divisible(v, divisor, min_value=None): + """ + This function is taken from the original tf repo. + It ensures that all layers have a channel number that is divisible by 8 + It can be seen here: + https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py + """ + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%. + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +def hard_sigmoid(x, inplace: bool = False): + if inplace: + return x.add_(3.0).clamp_(0.0, 6.0).div_(6.0) + else: + return F.relu6(x + 3.0) / 6.0 + + +class SqueezeExcite(nn.Module): + def __init__( + self, + in_chs, + se_ratio=0.25, + reduced_base_chs=None, + activation="ReLU", + gate_fn=hard_sigmoid, + divisor=4, + **_ + ): + super(SqueezeExcite, self).__init__() + self.gate_fn = gate_fn + reduced_chs = _make_divisible((reduced_base_chs or in_chs) * se_ratio, divisor) + self.avg_pool = nn.AdaptiveAvgPool2d(1) + self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True) + self.act1 = act_layers(activation) + self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True) + + def forward(self, x): + x_se = self.avg_pool(x) + x_se = self.conv_reduce(x_se) + x_se = self.act1(x_se) + x_se = self.conv_expand(x_se) + x = x * self.gate_fn(x_se) + return x + + +class ConvBnAct(nn.Module): + def __init__(self, in_chs, out_chs, kernel_size, stride=1, activation="ReLU"): + super(ConvBnAct, self).__init__() + self.conv = nn.Conv2d( + in_chs, out_chs, kernel_size, stride, kernel_size // 2, bias=False + ) + self.bn1 = nn.BatchNorm2d(out_chs) + self.act1 = act_layers(activation) + + def forward(self, x): + x = self.conv(x) + x = self.bn1(x) + x = self.act1(x) + return x + + +class GhostModule(nn.Module): + def __init__( + self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, activation="ReLU" + ): + super(GhostModule, self).__init__() + self.oup = oup + init_channels = math.ceil(oup / ratio) + new_channels = init_channels * (ratio - 1) + + self.primary_conv = nn.Sequential( + nn.Conv2d( + inp, init_channels, kernel_size, stride, kernel_size // 2, bias=False + ), + nn.BatchNorm2d(init_channels), + act_layers(activation) if activation else nn.Sequential(), + ) + + self.cheap_operation = nn.Sequential( + nn.Conv2d( + init_channels, + new_channels, + dw_size, + 1, + dw_size // 2, + groups=init_channels, + bias=False, + ), + nn.BatchNorm2d(new_channels), + act_layers(activation) if activation else nn.Sequential(), + ) + + def forward(self, x): + x1 = self.primary_conv(x) + x2 = self.cheap_operation(x1) + out = torch.cat([x1, x2], dim=1) + return out + + +class GhostBottleneck(nn.Module): + """Ghost bottleneck w/ optional SE""" + + def __init__( + self, + in_chs, + mid_chs, + out_chs, + dw_kernel_size=3, + stride=1, + activation="ReLU", + se_ratio=0.0, + ): + super(GhostBottleneck, 
self).__init__() + has_se = se_ratio is not None and se_ratio > 0.0 + self.stride = stride + + # Point-wise expansion + self.ghost1 = GhostModule(in_chs, mid_chs, activation=activation) + + # Depth-wise convolution + if self.stride > 1: + self.conv_dw = nn.Conv2d( + mid_chs, + mid_chs, + dw_kernel_size, + stride=stride, + padding=(dw_kernel_size - 1) // 2, + groups=mid_chs, + bias=False, + ) + self.bn_dw = nn.BatchNorm2d(mid_chs) + + # Squeeze-and-excitation + if has_se: + self.se = SqueezeExcite(mid_chs, se_ratio=se_ratio) + else: + self.se = None + + # Point-wise linear projection + self.ghost2 = GhostModule(mid_chs, out_chs, activation=None) + + # shortcut + if in_chs == out_chs and self.stride == 1: + self.shortcut = nn.Sequential() + else: + self.shortcut = nn.Sequential( + nn.Conv2d( + in_chs, + in_chs, + dw_kernel_size, + stride=stride, + padding=(dw_kernel_size - 1) // 2, + groups=in_chs, + bias=False, + ), + nn.BatchNorm2d(in_chs), + nn.Conv2d(in_chs, out_chs, 1, stride=1, padding=0, bias=False), + nn.BatchNorm2d(out_chs), + ) + + def forward(self, x): + residual = x + + # 1st ghost bottleneck + x = self.ghost1(x) + + # Depth-wise convolution + if self.stride > 1: + x = self.conv_dw(x) + x = self.bn_dw(x) + + # Squeeze-and-excitation + if self.se is not None: + x = self.se(x) + + # 2nd ghost bottleneck + x = self.ghost2(x) + + x += self.shortcut(residual) + return x + + +class GhostNet(nn.Module): + def __init__( + self, + width_mult=1.0, + out_stages=(4, 6, 9), + activation="ReLU", + pretrain=True, + act=None, + ): + super(GhostNet, self).__init__() + assert set(out_stages).issubset(i for i in range(10)) + self.width_mult = width_mult + self.out_stages = out_stages + # setting of inverted residual blocks + self.cfgs = [ + # k, t, c, SE, s + # stage1 + [[3, 16, 16, 0, 1]], # 0 + # stage2 + [[3, 48, 24, 0, 2]], # 1 + [[3, 72, 24, 0, 1]], # 2 1/4 + # stage3 + [[5, 72, 40, 0.25, 2]], # 3 + [[5, 120, 40, 0.25, 1]], # 4 1/8 + # stage4 + [[3, 240, 80, 0, 2]], # 5 + [ + [3, 200, 80, 0, 1], + [3, 184, 80, 0, 1], + [3, 184, 80, 0, 1], + [3, 480, 112, 0.25, 1], + [3, 672, 112, 0.25, 1], + ], # 6 1/16 + # stage5 + [[5, 672, 160, 0.25, 2]], # 7 + [ + [5, 960, 160, 0, 1], + [5, 960, 160, 0.25, 1], + [5, 960, 160, 0, 1], + [5, 960, 160, 0.25, 1], + ], # 8 + ] + # ------conv+bn+act----------# 9 1/32 + + self.activation = activation + if act is not None: + warnings.warn( + "Warning! act argument has been deprecated, " "use activation instead!" 
+ ) + self.activation = act + + # building first layer + output_channel = _make_divisible(16 * width_mult, 4) + self.conv_stem = nn.Conv2d(3, output_channel, 3, 2, 1, bias=False) + self.bn1 = nn.BatchNorm2d(output_channel) + self.act1 = act_layers(self.activation) + input_channel = output_channel + + # building inverted residual blocks + stages = [] + block = GhostBottleneck + for cfg in self.cfgs: + layers = [] + for k, exp_size, c, se_ratio, s in cfg: + output_channel = _make_divisible(c * width_mult, 4) + hidden_channel = _make_divisible(exp_size * width_mult, 4) + layers.append( + block( + input_channel, + hidden_channel, + output_channel, + k, + s, + activation=self.activation, + se_ratio=se_ratio, + ) + ) + input_channel = output_channel + stages.append(nn.Sequential(*layers)) + + output_channel = _make_divisible(exp_size * width_mult, 4) + stages.append( + nn.Sequential( + ConvBnAct(input_channel, output_channel, 1, activation=self.activation) + ) + ) # 9 + + self.blocks = nn.Sequential(*stages) + + self._initialize_weights(pretrain) + + def forward(self, x): + x = self.conv_stem(x) + x = self.bn1(x) + x = self.act1(x) + output = [] + for i in range(10): + x = self.blocks[i](x) + if i in self.out_stages: + output.append(x) + return tuple(output) + + def _initialize_weights(self, pretrain=True): + print("init weights...") + for name, m in self.named_modules(): + if isinstance(m, nn.Conv2d): + if "conv_stem" in name: + nn.init.normal_(m.weight, 0, 0.01) + else: + nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1]) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.BatchNorm2d): + nn.init.constant_(m.weight, 1) + if m.bias is not None: + nn.init.constant_(m.bias, 0.0001) + nn.init.constant_(m.running_mean, 0) + elif isinstance(m, nn.BatchNorm1d): + nn.init.constant_(m.weight, 1) + if m.bias is not None: + nn.init.constant_(m.bias, 0.0001) + nn.init.constant_(m.running_mean, 0) + elif isinstance(m, nn.Linear): + nn.init.normal_(m.weight, 0, 0.01) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + if pretrain: + url = get_url(self.width_mult) + if url is not None: + state_dict = torch.hub.load_state_dict_from_url(url, progress=True) + self.load_state_dict(state_dict, strict=False) diff --git a/nanodet/model/backbone/mobilenetv2.py b/nanodet/model/backbone/mobilenetv2.py new file mode 100644 index 0000000..11d7978 --- /dev/null +++ b/nanodet/model/backbone/mobilenetv2.py @@ -0,0 +1,176 @@ +from __future__ import absolute_import, division, print_function + +import warnings + +import torch.nn as nn + +from ..module.activation import act_layers + + +class ConvBNReLU(nn.Sequential): + def __init__( + self, + in_planes, + out_planes, + kernel_size=3, + stride=1, + groups=1, + activation="ReLU", + ): + padding = (kernel_size - 1) // 2 + super(ConvBNReLU, self).__init__( + nn.Conv2d( + in_planes, + out_planes, + kernel_size, + stride, + padding, + groups=groups, + bias=False, + ), + nn.BatchNorm2d(out_planes), + act_layers(activation), + ) + + +class InvertedResidual(nn.Module): + def __init__(self, inp, oup, stride, expand_ratio, activation="ReLU"): + super(InvertedResidual, self).__init__() + self.stride = stride + assert stride in [1, 2] + + hidden_dim = int(round(inp * expand_ratio)) + self.use_res_connect = self.stride == 1 and inp == oup + + layers = [] + if expand_ratio != 1: + # pw + layers.append( + ConvBNReLU(inp, hidden_dim, kernel_size=1, activation=activation) + ) + layers.extend( + [ + # dw + ConvBNReLU( + hidden_dim, + hidden_dim, + 
stride=stride, + groups=hidden_dim, + activation=activation, + ), + # pw-linear + nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), + nn.BatchNorm2d(oup), + ] + ) + self.conv = nn.Sequential(*layers) + + def forward(self, x): + if self.use_res_connect: + return x + self.conv(x) + else: + return self.conv(x) + + +class MobileNetV2(nn.Module): + def __init__( + self, + width_mult=1.0, + out_stages=(1, 2, 4, 6), + last_channel=1280, + activation="ReLU", + act=None, + ): + super(MobileNetV2, self).__init__() + # TODO: support load torchvison pretrained weight + assert set(out_stages).issubset(i for i in range(7)) + self.width_mult = width_mult + self.out_stages = out_stages + input_channel = 32 + self.last_channel = last_channel + self.activation = activation + if act is not None: + warnings.warn( + "Warning! act argument has been deprecated, " "use activation instead!" + ) + self.activation = act + self.interverted_residual_setting = [ + # t, c, n, s + [1, 16, 1, 1], + [6, 24, 2, 2], + [6, 32, 3, 2], + [6, 64, 4, 2], + [6, 96, 3, 1], + [6, 160, 3, 2], + [6, 320, 1, 1], + ] + + # building first layer + self.input_channel = int(input_channel * width_mult) + self.first_layer = ConvBNReLU( + 3, self.input_channel, stride=2, activation=self.activation + ) + # building inverted residual blocks + for i in range(7): + name = "stage{}".format(i) + setattr(self, name, self.build_mobilenet_stage(stage_num=i)) + + self._initialize_weights() + + def build_mobilenet_stage(self, stage_num): + stage = [] + t, c, n, s = self.interverted_residual_setting[stage_num] + output_channel = int(c * self.width_mult) + for i in range(n): + if i == 0: + stage.append( + InvertedResidual( + self.input_channel, + output_channel, + s, + expand_ratio=t, + activation=self.activation, + ) + ) + else: + stage.append( + InvertedResidual( + self.input_channel, + output_channel, + 1, + expand_ratio=t, + activation=self.activation, + ) + ) + self.input_channel = output_channel + if stage_num == 6: + last_layer = ConvBNReLU( + self.input_channel, + self.last_channel, + kernel_size=1, + activation=self.activation, + ) + stage.append(last_layer) + stage = nn.Sequential(*stage) + return stage + + def forward(self, x): + x = self.first_layer(x) + output = [] + for i in range(0, 7): + stage = getattr(self, "stage{}".format(i)) + x = stage(x) + if i in self.out_stages: + output.append(x) + + return tuple(output) + + def _initialize_weights(self): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + nn.init.normal_(m.weight, std=0.001) + if m.bias is not None: + m.bias.data.zero_() + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() diff --git a/nanodet/model/backbone/repvgg.py b/nanodet/model/backbone/repvgg.py new file mode 100644 index 0000000..8ae9634 --- /dev/null +++ b/nanodet/model/backbone/repvgg.py @@ -0,0 +1,234 @@ +""" +@article{ding2101repvgg, + title={RepVGG: Making VGG-style ConvNets Great Again}, + author={Ding, Xiaohan and Zhang, Xiangyu and Ma, Ningning and Han, + Jungong and Ding, Guiguang and Sun, Jian}, + journal={arXiv preprint arXiv:2101.03697}} +RepVGG Backbone from paper RepVGG: Making VGG-style ConvNets Great Again +Code from https://github.com/DingXiaoH/RepVGG +""" + +import numpy as np +import torch +import torch.nn as nn + +from nanodet.model.module.conv import RepVGGConvModule + +optional_groupwise_layers = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26] +g2_map = {layer: 2 for layer in optional_groupwise_layers} +g4_map = {layer: 4 for layer in optional_groupwise_layers} 
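+
+# The indices above are the layer positions where the RepVGG-B*g2 / B*g4
+# variants use grouped 3x3 convolutions (groups=2 or 4); all other variants
+# keep override_groups_map=None and use ordinary dense convolutions.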
+ +model_param = { + "RepVGG-A0": dict( + num_blocks=[2, 4, 14, 1], + width_multiplier=[0.75, 0.75, 0.75, 2.5], + override_groups_map=None, + ), + "RepVGG-A1": dict( + num_blocks=[2, 4, 14, 1], + width_multiplier=[1, 1, 1, 2.5], + override_groups_map=None, + ), + "RepVGG-A2": dict( + num_blocks=[2, 4, 14, 1], + width_multiplier=[1.5, 1.5, 1.5, 2.75], + override_groups_map=None, + ), + "RepVGG-B0": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[1, 1, 1, 2.5], + override_groups_map=None, + ), + "RepVGG-B1": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2, 2, 2, 4], + override_groups_map=None, + ), + "RepVGG-B1g2": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2, 2, 2, 4], + override_groups_map=g2_map, + ), + "RepVGG-B1g4": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2, 2, 2, 4], + override_groups_map=g4_map, + ), + "RepVGG-B2": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2.5, 2.5, 2.5, 5], + override_groups_map=None, + ), + "RepVGG-B2g2": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2.5, 2.5, 2.5, 5], + override_groups_map=g2_map, + ), + "RepVGG-B2g4": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2.5, 2.5, 2.5, 5], + override_groups_map=g4_map, + ), + "RepVGG-B3": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[3, 3, 3, 5], + override_groups_map=None, + ), + "RepVGG-B3g2": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[3, 3, 3, 5], + override_groups_map=g2_map, + ), + "RepVGG-B3g4": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[3, 3, 3, 5], + override_groups_map=g4_map, + ), +} + + +def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1): + result = nn.Sequential() + result.add_module( + "conv", + nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=groups, + bias=False, + ), + ) + result.add_module("bn", nn.BatchNorm2d(num_features=out_channels)) + return result + + +class RepVGG(nn.Module): + def __init__( + self, + arch, + out_stages=(1, 2, 3, 4), + activation="ReLU", + deploy=False, + last_channel=None, + ): + super(RepVGG, self).__init__() + # TODO: Update code to Xiaohan's repo + model_name = "RepVGG-" + arch + assert model_name in model_param + assert set(out_stages).issubset((1, 2, 3, 4)) + num_blocks = model_param[model_name]["num_blocks"] + width_multiplier = model_param[model_name]["width_multiplier"] + assert len(width_multiplier) == 4 + self.out_stages = out_stages + self.activation = activation + self.deploy = deploy + self.override_groups_map = ( + model_param[model_name]["override_groups_map"] or dict() + ) + + assert 0 not in self.override_groups_map + + self.in_planes = min(64, int(64 * width_multiplier[0])) + + self.stage0 = RepVGGConvModule( + in_channels=3, + out_channels=self.in_planes, + kernel_size=3, + stride=2, + padding=1, + activation=activation, + deploy=self.deploy, + ) + self.cur_layer_idx = 1 + self.stage1 = self._make_stage( + int(64 * width_multiplier[0]), num_blocks[0], stride=2 + ) + self.stage2 = self._make_stage( + int(128 * width_multiplier[1]), num_blocks[1], stride=2 + ) + self.stage3 = self._make_stage( + int(256 * width_multiplier[2]), num_blocks[2], stride=2 + ) + out_planes = last_channel if last_channel else int(512 * width_multiplier[3]) + self.stage4 = self._make_stage(out_planes, num_blocks[3], stride=2) + + def _make_stage(self, planes, num_blocks, stride): + strides = [stride] + [1] * (num_blocks - 1) + blocks = [] + for stride in strides: + cur_groups = 
self.override_groups_map.get(self.cur_layer_idx, 1) + blocks.append( + RepVGGConvModule( + in_channels=self.in_planes, + out_channels=planes, + kernel_size=3, + stride=stride, + padding=1, + groups=cur_groups, + activation=self.activation, + deploy=self.deploy, + ) + ) + self.in_planes = planes + self.cur_layer_idx += 1 + return nn.Sequential(*blocks) + + def forward(self, x): + x = self.stage0(x) + output = [] + for i in range(1, 5): + stage = getattr(self, "stage{}".format(i)) + x = stage(x) + if i in self.out_stages: + output.append(x) + return tuple(output) + + +def repvgg_model_convert(model, deploy_model, save_path=None): + """ + Examples: + >>> train_model = RepVGG(arch='A0', deploy=False) + >>> deploy_model = RepVGG(arch='A0', deploy=True) + >>> deploy_model = repvgg_model_convert( + >>> train_model, deploy_model, save_path='repvgg_deploy.pth') + """ + converted_weights = {} + for name, module in model.named_modules(): + if hasattr(module, "repvgg_convert"): + kernel, bias = module.repvgg_convert() + converted_weights[name + ".rbr_reparam.weight"] = kernel + converted_weights[name + ".rbr_reparam.bias"] = bias + elif isinstance(module, torch.nn.Linear): + converted_weights[name + ".weight"] = module.weight.detach().cpu().numpy() + converted_weights[name + ".bias"] = module.bias.detach().cpu().numpy() + del model + + for name, param in deploy_model.named_parameters(): + print("deploy param: ", name, param.size(), np.mean(converted_weights[name])) + param.data = torch.from_numpy(converted_weights[name]).float() + + if save_path is not None: + torch.save(deploy_model.state_dict(), save_path) + + return deploy_model + + +def repvgg_det_model_convert(model, deploy_model): + converted_weights = {} + deploy_model.load_state_dict(model.state_dict(), strict=False) + for name, module in model.backbone.named_modules(): + if hasattr(module, "repvgg_convert"): + kernel, bias = module.repvgg_convert() + converted_weights[name + ".rbr_reparam.weight"] = kernel + converted_weights[name + ".rbr_reparam.bias"] = bias + elif isinstance(module, torch.nn.Linear): + converted_weights[name + ".weight"] = module.weight.detach().cpu().numpy() + converted_weights[name + ".bias"] = module.bias.detach().cpu().numpy() + del model + for name, param in deploy_model.backbone.named_parameters(): + print("deploy param: ", name, param.size(), np.mean(converted_weights[name])) + param.data = torch.from_numpy(converted_weights[name]).float() + return deploy_model diff --git a/nanodet/model/backbone/resnet.py b/nanodet/model/backbone/resnet.py new file mode 100644 index 0000000..0a863c9 --- /dev/null +++ b/nanodet/model/backbone/resnet.py @@ -0,0 +1,196 @@ +from __future__ import absolute_import, division, print_function + +import torch.nn as nn +import torch.utils.model_zoo as model_zoo + +from ..module.activation import act_layers + +model_urls = { + "resnet18": "https://download.pytorch.org/models/resnet18-5c106cde.pth", + "resnet34": "https://download.pytorch.org/models/resnet34-333f7ec4.pth", + "resnet50": "https://download.pytorch.org/models/resnet50-19c8e357.pth", + "resnet101": "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth", + "resnet152": "https://download.pytorch.org/models/resnet152-b121ed2d.pth", +} + + +def conv3x3(in_planes, out_planes, stride=1): + """3x3 convolution with padding""" + return nn.Conv2d( + in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False + ) + + +class BasicBlock(nn.Module): + expansion = 1 + + def __init__(self, inplanes, planes, stride=1, 
downsample=None, activation="ReLU"): + super(BasicBlock, self).__init__() + self.conv1 = conv3x3(inplanes, planes, stride) + self.bn1 = nn.BatchNorm2d(planes) + self.act = act_layers(activation) + self.conv2 = conv3x3(planes, planes) + self.bn2 = nn.BatchNorm2d(planes) + self.downsample = downsample + self.stride = stride + + def forward(self, x): + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.act(out) + + out = self.conv2(out) + out = self.bn2(out) + + if self.downsample is not None: + residual = self.downsample(x) + + out += residual + out = self.act(out) + + return out + + +class Bottleneck(nn.Module): + expansion = 4 + + def __init__(self, inplanes, planes, stride=1, downsample=None, activation="ReLU"): + super(Bottleneck, self).__init__() + self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) + self.bn1 = nn.BatchNorm2d(planes) + self.conv2 = nn.Conv2d( + planes, planes, kernel_size=3, stride=stride, padding=1, bias=False + ) + self.bn2 = nn.BatchNorm2d(planes) + self.conv3 = nn.Conv2d( + planes, planes * self.expansion, kernel_size=1, bias=False + ) + self.bn3 = nn.BatchNorm2d(planes * self.expansion) + self.act = act_layers(activation) + self.downsample = downsample + self.stride = stride + + def forward(self, x): + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.act(out) + + out = self.conv2(out) + out = self.bn2(out) + out = self.act(out) + + out = self.conv3(out) + out = self.bn3(out) + + if self.downsample is not None: + residual = self.downsample(x) + + out += residual + out = self.act(out) + + return out + + +def fill_fc_weights(layers): + for m in layers.modules(): + if isinstance(m, nn.Conv2d): + nn.init.normal_(m.weight, std=0.001) + # torch.nn.init.kaiming_normal_(m.weight.data, nonlinearity='relu') + # torch.nn.init.xavier_normal_(m.weight.data) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + + +class ResNet(nn.Module): + resnet_spec = { + 18: (BasicBlock, [2, 2, 2, 2]), + 34: (BasicBlock, [3, 4, 6, 3]), + 50: (Bottleneck, [3, 4, 6, 3]), + 101: (Bottleneck, [3, 4, 23, 3]), + 152: (Bottleneck, [3, 8, 36, 3]), + } + + def __init__( + self, depth, out_stages=(1, 2, 3, 4), activation="ReLU", pretrain=True + ): + super(ResNet, self).__init__() + if depth not in self.resnet_spec: + raise KeyError("invalid resnet depth {}".format(depth)) + assert set(out_stages).issubset((1, 2, 3, 4)) + self.activation = activation + block, layers = self.resnet_spec[depth] + self.depth = depth + self.inplanes = 64 + self.out_stages = out_stages + + self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False) + self.bn1 = nn.BatchNorm2d(64) + self.act = act_layers(self.activation) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + self.layer1 = self._make_layer(block, 64, layers[0]) + self.layer2 = self._make_layer(block, 128, layers[1], stride=2) + self.layer3 = self._make_layer(block, 256, layers[2], stride=2) + self.layer4 = self._make_layer(block, 512, layers[3], stride=2) + self.init_weights(pretrain=pretrain) + + def _make_layer(self, block, planes, blocks, stride=1): + downsample = None + if stride != 1 or self.inplanes != planes * block.expansion: + downsample = nn.Sequential( + nn.Conv2d( + self.inplanes, + planes * block.expansion, + kernel_size=1, + stride=stride, + bias=False, + ), + nn.BatchNorm2d(planes * block.expansion), + ) + + layers = [] + layers.append( + block(self.inplanes, planes, stride, downsample, activation=self.activation) + ) + self.inplanes = planes * 
block.expansion + for i in range(1, blocks): + layers.append(block(self.inplanes, planes, activation=self.activation)) + + return nn.Sequential(*layers) + + def forward(self, x): + x = self.conv1(x) + x = self.bn1(x) + x = self.act(x) + x = self.maxpool(x) + output = [] + for i in range(1, 5): + res_layer = getattr(self, "layer{}".format(i)) + x = res_layer(x) + if i in self.out_stages: + output.append(x) + + return tuple(output) + + def init_weights(self, pretrain=True): + if pretrain: + url = model_urls["resnet{}".format(self.depth)] + pretrained_state_dict = model_zoo.load_url(url) + print("=> loading pretrained model {}".format(url)) + self.load_state_dict(pretrained_state_dict, strict=False) + else: + for m in self.modules(): + if self.activation == "LeakyReLU": + nonlinearity = "leaky_relu" + else: + nonlinearity = "relu" + if isinstance(m, nn.Conv2d): + nn.init.kaiming_normal_( + m.weight, mode="fan_out", nonlinearity=nonlinearity + ) + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() diff --git a/nanodet/model/backbone/shufflenetv2.py b/nanodet/model/backbone/shufflenetv2.py new file mode 100644 index 0000000..e821f41 --- /dev/null +++ b/nanodet/model/backbone/shufflenetv2.py @@ -0,0 +1,207 @@ +import torch +import torch.nn as nn +import torch.utils.model_zoo as model_zoo + +from ..module.activation import act_layers + +model_urls = { + "shufflenetv2_0.5x": "https://download.pytorch.org/models/shufflenetv2_x0.5-f707e7126e.pth", # noqa: E501 + "shufflenetv2_1.0x": "https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth", # noqa: E501 + "shufflenetv2_1.5x": None, + "shufflenetv2_2.0x": None, +} + + +def channel_shuffle(x, groups): + # type: (torch.Tensor, int) -> torch.Tensor + batchsize, num_channels, height, width = x.data.size() + channels_per_group = num_channels // groups + + # reshape + x = x.view(batchsize, groups, channels_per_group, height, width) + + x = torch.transpose(x, 1, 2).contiguous() + + # flatten + x = x.view(batchsize, -1, height, width) + + return x + + +class ShuffleV2Block(nn.Module): + def __init__(self, inp, oup, stride, activation="ReLU"): + super(ShuffleV2Block, self).__init__() + + if not (1 <= stride <= 3): + raise ValueError("illegal stride value") + self.stride = stride + + branch_features = oup // 2 + assert (self.stride != 1) or (inp == branch_features << 1) + + if self.stride > 1: + self.branch1 = nn.Sequential( + self.depthwise_conv( + inp, inp, kernel_size=3, stride=self.stride, padding=1 + ), + nn.BatchNorm2d(inp), + nn.Conv2d( + inp, branch_features, kernel_size=1, stride=1, padding=0, bias=False + ), + nn.BatchNorm2d(branch_features), + act_layers(activation), + ) + else: + self.branch1 = nn.Sequential() + + self.branch2 = nn.Sequential( + nn.Conv2d( + inp if (self.stride > 1) else branch_features, + branch_features, + kernel_size=1, + stride=1, + padding=0, + bias=False, + ), + nn.BatchNorm2d(branch_features), + act_layers(activation), + self.depthwise_conv( + branch_features, + branch_features, + kernel_size=3, + stride=self.stride, + padding=1, + ), + nn.BatchNorm2d(branch_features), + nn.Conv2d( + branch_features, + branch_features, + kernel_size=1, + stride=1, + padding=0, + bias=False, + ), + nn.BatchNorm2d(branch_features), + act_layers(activation), + ) + + @staticmethod + def depthwise_conv(i, o, kernel_size, stride=1, padding=0, bias=False): + return nn.Conv2d(i, o, kernel_size, stride, padding, bias=bias, groups=i) + + def forward(self, x): + if self.stride == 1: + x1, x2 = x.chunk(2, 
dim=1) + out = torch.cat((x1, self.branch2(x2)), dim=1) + else: + out = torch.cat((self.branch1(x), self.branch2(x)), dim=1) + + out = channel_shuffle(out, 2) + + return out + + +class ShuffleNetV2(nn.Module): + def __init__( + self, + model_size="1.5x", + out_stages=(2, 3, 4), + with_last_conv=False, + kernal_size=3, + activation="ReLU", + pretrain=True, + ): + super(ShuffleNetV2, self).__init__() + # out_stages can only be a subset of (2, 3, 4) + assert set(out_stages).issubset((2, 3, 4)) + + print("model size is ", model_size) + + self.stage_repeats = [4, 8, 4] + self.model_size = model_size + self.out_stages = out_stages + self.with_last_conv = with_last_conv + self.kernal_size = kernal_size + self.activation = activation + if model_size == "0.5x": + self._stage_out_channels = [24, 48, 96, 192, 1024] + elif model_size == "1.0x": + self._stage_out_channels = [24, 116, 232, 464, 1024] + elif model_size == "1.5x": + self._stage_out_channels = [24, 176, 352, 704, 1024] + elif model_size == "2.0x": + self._stage_out_channels = [24, 244, 488, 976, 2048] + else: + raise NotImplementedError + + # building first layer + input_channels = 3 + output_channels = self._stage_out_channels[0] + self.conv1 = nn.Sequential( + nn.Conv2d(input_channels, output_channels, 3, 2, 1, bias=False), + nn.BatchNorm2d(output_channels), + act_layers(activation), + ) + input_channels = output_channels + + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + + stage_names = ["stage{}".format(i) for i in [2, 3, 4]] + for name, repeats, output_channels in zip( + stage_names, self.stage_repeats, self._stage_out_channels[1:] + ): + seq = [ + ShuffleV2Block( + input_channels, output_channels, 2, activation=activation + ) + ] + for i in range(repeats - 1): + seq.append( + ShuffleV2Block( + output_channels, output_channels, 1, activation=activation + ) + ) + setattr(self, name, nn.Sequential(*seq)) + input_channels = output_channels + output_channels = self._stage_out_channels[-1] + if self.with_last_conv: + conv5 = nn.Sequential( + nn.Conv2d(input_channels, output_channels, 1, 1, 0, bias=False), + nn.BatchNorm2d(output_channels), + act_layers(activation), + ) + self.stage4.add_module("conv5", conv5) + self._initialize_weights(pretrain) + + def forward(self, x): + x = self.conv1(x) + x = self.maxpool(x) + output = [] + for i in range(2, 5): + stage = getattr(self, "stage{}".format(i)) + x = stage(x) + if i in self.out_stages: + output.append(x) + return tuple(output) + + def _initialize_weights(self, pretrain=True): + print("init weights...") + for name, m in self.named_modules(): + if isinstance(m, nn.Conv2d): + if "first" in name: + nn.init.normal_(m.weight, 0, 0.01) + else: + nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1]) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.BatchNorm2d): + nn.init.constant_(m.weight, 1) + if m.bias is not None: + nn.init.constant_(m.bias, 0.0001) + nn.init.constant_(m.running_mean, 0) + if pretrain: + url = model_urls["shufflenetv2_{}".format(self.model_size)] + if url is not None: + pretrained_state_dict = model_zoo.load_url(url) + print("=> loading pretrained model {}".format(url)) + self.load_state_dict(pretrained_state_dict, strict=False) diff --git a/nanodet/model/backbone/timm_wrapper.py b/nanodet/model/backbone/timm_wrapper.py new file mode 100644 index 0000000..ccd2cd8 --- /dev/null +++ b/nanodet/model/backbone/timm_wrapper.py @@ -0,0 +1,66 @@ +# Copyright 2022 RangiLyu. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging + +import torch.nn as nn + +logger = logging.getLogger("NanoDet") + + +class TIMMWrapper(nn.Module): + """Wrapper to use backbones in timm + https://github.com/rwightman/pytorch-image-models.""" + + def __init__( + self, + model_name, + features_only=True, + pretrained=True, + checkpoint_path="", + in_channels=3, + **kwargs, + ): + try: + import timm + except ImportError as exc: + raise RuntimeError( + "timm is not installed, please install it first" + ) from exc + super(TIMMWrapper, self).__init__() + self.timm = timm.create_model( + model_name=model_name, + features_only=features_only, + pretrained=pretrained, + in_chans=in_channels, + checkpoint_path=checkpoint_path, + **kwargs, + ) + + # Remove unused layers + self.timm.global_pool = None + self.timm.fc = None + self.timm.classifier = None + + feature_info = getattr(self.timm, "feature_info", None) + if feature_info: + logger.info(f"TIMM backbone feature channels: {feature_info.channels()}") + + def forward(self, x): + outs = self.timm(x) + if isinstance(outs, (list, tuple)): + features = tuple(outs) + else: + features = (outs,) + return features diff --git a/nanodet/model/fpn/__init__.py b/nanodet/model/fpn/__init__.py new file mode 100644 index 0000000..e55e2f6 --- /dev/null +++ b/nanodet/model/fpn/__init__.py @@ -0,0 +1,35 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy + +from .fpn import FPN +from .ghost_pan import GhostPAN +from .pan import PAN +from .tan import TAN + + +def build_fpn(cfg): + fpn_cfg = copy.deepcopy(cfg) + name = fpn_cfg.pop("name") + if name == "FPN": + return FPN(**fpn_cfg) + elif name == "PAN": + return PAN(**fpn_cfg) + elif name == "TAN": + return TAN(**fpn_cfg) + elif name == "GhostPAN": + return GhostPAN(**fpn_cfg) + else: + raise NotImplementedError diff --git a/nanodet/model/fpn/fpn.py b/nanodet/model/fpn/fpn.py new file mode 100644 index 0000000..a163ca1 --- /dev/null +++ b/nanodet/model/fpn/fpn.py @@ -0,0 +1,100 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.nn as nn +import torch.nn.functional as F + +from ..module.conv import ConvModule +from ..module.init_weights import xavier_init + + +class FPN(nn.Module): + def __init__( + self, + in_channels, + out_channels, + num_outs, + start_level=0, + end_level=-1, + conv_cfg=None, + norm_cfg=None, + activation=None, + ): + super(FPN, self).__init__() + assert isinstance(in_channels, list) + self.in_channels = in_channels + self.out_channels = out_channels + self.num_ins = len(in_channels) + self.num_outs = num_outs + self.fp16_enabled = False + + if end_level == -1: + self.backbone_end_level = self.num_ins + assert num_outs >= self.num_ins - start_level + else: + # if end_level < inputs, no extra level is allowed + self.backbone_end_level = end_level + assert end_level <= len(in_channels) + assert num_outs == end_level - start_level + self.start_level = start_level + self.end_level = end_level + self.lateral_convs = nn.ModuleList() + + for i in range(self.start_level, self.backbone_end_level): + l_conv = ConvModule( + in_channels[i], + out_channels, + 1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + activation=activation, + inplace=False, + ) + + self.lateral_convs.append(l_conv) + self.init_weights() + + # default init_weights for conv(msra) and norm in ConvModule + def init_weights(self): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + xavier_init(m, distribution="uniform") + + def forward(self, inputs): + assert len(inputs) == len(self.in_channels) + + # build laterals + laterals = [ + lateral_conv(inputs[i + self.start_level]) + for i, lateral_conv in enumerate(self.lateral_convs) + ] + + # build top-down path + used_backbone_levels = len(laterals) + for i in range(used_backbone_levels - 1, 0, -1): + laterals[i - 1] += F.interpolate( + laterals[i], scale_factor=2, mode="bilinear" + ) + + # build outputs + outs = [ + # self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels) + laterals[i] + for i in range(used_backbone_levels) + ] + return tuple(outs) + + +# if __name__ == '__main__': diff --git a/nanodet/model/fpn/ghost_pan.py b/nanodet/model/fpn/ghost_pan.py new file mode 100644 index 0000000..0cb4740 --- /dev/null +++ b/nanodet/model/fpn/ghost_pan.py @@ -0,0 +1,244 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import torch +import torch.nn as nn + +from ..backbone.ghostnet import GhostBottleneck +from ..module.conv import ConvModule, DepthwiseConvModule + + +class GhostBlocks(nn.Module): + """Stack of GhostBottleneck used in GhostPAN. + + Args: + in_channels (int): Number of input channels. 
+ out_channels (int): Number of output channels. + expand (int): Expand ratio of GhostBottleneck. Default: 1. + kernel_size (int): Kernel size of depthwise convolution. Default: 5. + num_blocks (int): Number of GhostBottlecneck blocks. Default: 1. + use_res (bool): Whether to use residual connection. Default: False. + activation (str): Name of activation function. Default: LeakyReLU. + """ + + def __init__( + self, + in_channels, + out_channels, + expand=1, + kernel_size=5, + num_blocks=1, + use_res=False, + activation="LeakyReLU", + ): + super(GhostBlocks, self).__init__() + self.use_res = use_res + if use_res: + self.reduce_conv = ConvModule( + in_channels, + out_channels, + kernel_size=1, + stride=1, + padding=0, + activation=activation, + ) + blocks = [] + for _ in range(num_blocks): + blocks.append( + GhostBottleneck( + in_channels, + int(out_channels * expand), + out_channels, + dw_kernel_size=kernel_size, + activation=activation, + ) + ) + self.blocks = nn.Sequential(*blocks) + + def forward(self, x): + out = self.blocks(x) + if self.use_res: + out = out + self.reduce_conv(x) + return out + + +class GhostPAN(nn.Module): + """Path Aggregation Network with Ghost block. + + Args: + in_channels (List[int]): Number of input channels per scale. + out_channels (int): Number of output channels (used at each scale) + num_csp_blocks (int): Number of bottlenecks in CSPLayer. Default: 3 + use_depthwise (bool): Whether to depthwise separable convolution in + blocks. Default: False + kernel_size (int): Kernel size of depthwise convolution. Default: 5. + expand (int): Expand ratio of GhostBottleneck. Default: 1. + num_blocks (int): Number of GhostBottlecneck blocks. Default: 1. + use_res (bool): Whether to use residual connection. Default: False. + num_extra_level (int): Number of extra conv layers for more feature levels. + Default: 0. + upsample_cfg (dict): Config dict for interpolate layer. + Default: `dict(scale_factor=2, mode='nearest')` + norm_cfg (dict): Config dict for normalization layer. + Default: dict(type='BN') + activation (str): Activation layer name. + Default: LeakyReLU. 
+ """ + + def __init__( + self, + in_channels, + out_channels, + use_depthwise=False, + kernel_size=5, + expand=1, + num_blocks=1, + use_res=False, + num_extra_level=0, + upsample_cfg=dict(scale_factor=2, mode="bilinear"), + norm_cfg=dict(type="BN"), + activation="LeakyReLU", + ): + super(GhostPAN, self).__init__() + assert num_extra_level >= 0 + assert num_blocks >= 1 + self.in_channels = in_channels + self.out_channels = out_channels + + conv = DepthwiseConvModule if use_depthwise else ConvModule + + # build top-down blocks + self.upsample = nn.Upsample(**upsample_cfg) + self.reduce_layers = nn.ModuleList() + for idx in range(len(in_channels)): + self.reduce_layers.append( + ConvModule( + in_channels[idx], + out_channels, + 1, + norm_cfg=norm_cfg, + activation=activation, + ) + ) + self.top_down_blocks = nn.ModuleList() + for idx in range(len(in_channels) - 1, 0, -1): + self.top_down_blocks.append( + GhostBlocks( + out_channels * 2, + out_channels, + expand, + kernel_size=kernel_size, + num_blocks=num_blocks, + use_res=use_res, + activation=activation, + ) + ) + + # build bottom-up blocks + self.downsamples = nn.ModuleList() + self.bottom_up_blocks = nn.ModuleList() + for idx in range(len(in_channels) - 1): + self.downsamples.append( + conv( + out_channels, + out_channels, + kernel_size, + stride=2, + padding=kernel_size // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + ) + self.bottom_up_blocks.append( + GhostBlocks( + out_channels * 2, + out_channels, + expand, + kernel_size=kernel_size, + num_blocks=num_blocks, + use_res=use_res, + activation=activation, + ) + ) + + # extra layers + self.extra_lvl_in_conv = nn.ModuleList() + self.extra_lvl_out_conv = nn.ModuleList() + for i in range(num_extra_level): + self.extra_lvl_in_conv.append( + conv( + out_channels, + out_channels, + kernel_size, + stride=2, + padding=kernel_size // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + ) + self.extra_lvl_out_conv.append( + conv( + out_channels, + out_channels, + kernel_size, + stride=2, + padding=kernel_size // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + ) + + def forward(self, inputs): + """ + Args: + inputs (tuple[Tensor]): input features. + Returns: + tuple[Tensor]: multi level features. + """ + assert len(inputs) == len(self.in_channels) + inputs = [ + reduce(input_x) for input_x, reduce in zip(inputs, self.reduce_layers) + ] + # top-down path + inner_outs = [inputs[-1]] + for idx in range(len(self.in_channels) - 1, 0, -1): + feat_heigh = inner_outs[0] + feat_low = inputs[idx - 1] + + inner_outs[0] = feat_heigh + + upsample_feat = self.upsample(feat_heigh) + + inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx]( + torch.cat([upsample_feat, feat_low], 1) + ) + inner_outs.insert(0, inner_out) + + # bottom-up path + outs = [inner_outs[0]] + for idx in range(len(self.in_channels) - 1): + feat_low = outs[-1] + feat_height = inner_outs[idx + 1] + downsample_feat = self.downsamples[idx](feat_low) + out = self.bottom_up_blocks[idx]( + torch.cat([downsample_feat, feat_height], 1) + ) + outs.append(out) + + # extra layers + for extra_in_layer, extra_out_layer in zip( + self.extra_lvl_in_conv, self.extra_lvl_out_conv + ): + outs.append(extra_in_layer(inputs[-1]) + extra_out_layer(outs[-1])) + + return tuple(outs) diff --git a/nanodet/model/fpn/pan.py b/nanodet/model/fpn/pan.py new file mode 100644 index 0000000..807ddf9 --- /dev/null +++ b/nanodet/model/fpn/pan.py @@ -0,0 +1,94 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. 
+ +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.nn.functional as F + +from .fpn import FPN + + +class PAN(FPN): + """Path Aggregation Network for Instance Segmentation. + + This is an implementation of the `PAN in Path Aggregation Network + `_. + + Args: + in_channels (List[int]): Number of input channels per scale. + out_channels (int): Number of output channels (used at each scale) + num_outs (int): Number of output scales. + start_level (int): Index of the start input backbone level used to + build the feature pyramid. Default: 0. + end_level (int): Index of the end input backbone level (exclusive) to + build the feature pyramid. Default: -1, which means the last level. + conv_cfg (dict): Config dict for convolution layer. Default: None. + norm_cfg (dict): Config dict for normalization layer. Default: None. + activation (str): Config dict for activation layer in ConvModule. + Default: None. + """ + + def __init__( + self, + in_channels, + out_channels, + num_outs, + start_level=0, + end_level=-1, + conv_cfg=None, + norm_cfg=None, + activation=None, + ): + super(PAN, self).__init__( + in_channels, + out_channels, + num_outs, + start_level, + end_level, + conv_cfg, + norm_cfg, + activation, + ) + self.init_weights() + + def forward(self, inputs): + """Forward function.""" + assert len(inputs) == len(self.in_channels) + + # build laterals + laterals = [ + lateral_conv(inputs[i + self.start_level]) + for i, lateral_conv in enumerate(self.lateral_convs) + ] + + # build top-down path + used_backbone_levels = len(laterals) + for i in range(used_backbone_levels - 1, 0, -1): + laterals[i - 1] += F.interpolate( + laterals[i], scale_factor=2, mode="bilinear" + ) + + # build outputs + # part 1: from original levels + inter_outs = [laterals[i] for i in range(used_backbone_levels)] + + # part 2: add bottom-up path + for i in range(0, used_backbone_levels - 1): + inter_outs[i + 1] += F.interpolate( + inter_outs[i], scale_factor=0.5, mode="bilinear" + ) + + outs = [] + outs.append(inter_outs[0]) + outs.extend([inter_outs[i] for i in range(1, used_backbone_levels)]) + return tuple(outs) diff --git a/nanodet/model/fpn/tan.py b/nanodet/model/fpn/tan.py new file mode 100644 index 0000000..6ffc305 --- /dev/null +++ b/nanodet/model/fpn/tan.py @@ -0,0 +1,123 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
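+# Usage sketch: like the other necks in this package, TAN is normally constructed through
+# `build_fpn` (see fpn/__init__.py), which pops the "name" key and forwards the remaining
+# config entries as keyword arguments. The channel and transformer sizes below are
+# illustrative values only:
+#
+#   from nanodet.model.fpn import build_fpn
+#   fpn_cfg = dict(name="TAN", in_channels=[116, 232, 464], out_channels=128,
+#                  feature_hw=[20, 20], num_heads=8, num_encoders=1,
+#                  mlp_ratio=4, dropout_ratio=0.1, activation="LeakyReLU")
+#   neck = build_fpn(fpn_cfg)  # equivalent to TAN(in_channels=[116, 232, 464], ...)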
+ +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..module.conv import ConvModule +from ..module.init_weights import normal_init +from ..module.transformer import TransformerBlock + + +class TAN(nn.Module): + """ + Transformer Attention Network. + + :param in_channels: Number of input channels per scale. + :param out_channels: Number of output channel. + :param feature_hw: Size of feature map input to transformer. + :param num_heads: Number of attention heads. + :param num_encoders: Number of transformer encoder layers. + :param mlp_ratio: Hidden layer dimension expand ratio in MLP. + :param dropout_ratio: Probability of an element to be zeroed. + :param activation: Activation layer type. + """ + + def __init__( + self, + in_channels, + out_channels, + feature_hw, + num_heads, + num_encoders, + mlp_ratio, + dropout_ratio, + activation="LeakyReLU", + ): + super(TAN, self).__init__() + assert isinstance(in_channels, list) + self.in_channels = in_channels + self.out_channels = out_channels + self.num_ins = len(in_channels) + assert self.num_ins == 3 + + self.lateral_convs = nn.ModuleList() + for i in range(self.num_ins): + l_conv = ConvModule( + in_channels[i], + out_channels, + 1, + norm_cfg=dict(type="BN"), + activation=activation, + inplace=False, + ) + self.lateral_convs.append(l_conv) + self.transformer = TransformerBlock( + out_channels * self.num_ins, + out_channels, + num_heads, + num_encoders, + mlp_ratio, + dropout_ratio, + activation=activation, + ) + self.pos_embed = nn.Parameter( + torch.zeros(feature_hw[0] * feature_hw[1], 1, out_channels) + ) + + self.init_weights() + + def init_weights(self): + torch.nn.init.trunc_normal_(self.pos_embed, std=0.02) + for m in self.modules(): + if isinstance(m, nn.Linear): + torch.nn.init.trunc_normal_(m.weight, std=0.02) + if isinstance(m, nn.Linear) and m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.LayerNorm): + nn.init.constant_(m.bias, 0) + nn.init.constant_(m.weight, 1.0) + elif isinstance(m, nn.Conv2d): + normal_init(m, 0.01) + + def forward(self, inputs): + assert len(inputs) == len(self.in_channels) + + # build laterals + laterals = [ + lateral_conv(inputs[i]) for i, lateral_conv in enumerate(self.lateral_convs) + ] + + # transformer attention + mid_shape = laterals[1].shape[2:] + mid_lvl = torch.cat( + ( + F.interpolate(laterals[0], size=mid_shape, mode="bilinear"), + laterals[1], + F.interpolate(laterals[2], size=mid_shape, mode="bilinear"), + ), + dim=1, + ) + mid_lvl = self.transformer(mid_lvl, self.pos_embed) + + # build outputs + outs = [ + laterals[0] + + F.interpolate(mid_lvl, size=laterals[0].shape[2:], mode="bilinear"), + laterals[1] + mid_lvl, + laterals[2] + + F.interpolate(mid_lvl, size=laterals[2].shape[2:], mode="bilinear"), + ] + return tuple(outs) diff --git a/nanodet/model/head/__init__.py b/nanodet/model/head/__init__.py new file mode 100644 index 0000000..d1ef2dd --- /dev/null +++ b/nanodet/model/head/__init__.py @@ -0,0 +1,21 @@ +import copy + +from .gfl_head import GFLHead +from .nanodet_head import NanoDetHead +from .nanodet_plus_head import NanoDetPlusHead +from .simple_conv_head import SimpleConvHead + + +def build_head(cfg): + head_cfg = copy.deepcopy(cfg) + name = head_cfg.pop("name") + if name == "GFLHead": + return GFLHead(**head_cfg) + elif name == "NanoDetHead": + return NanoDetHead(**head_cfg) + elif name == "NanoDetPlusHead": + return NanoDetPlusHead(**head_cfg) + elif name == "SimpleConvHead": + return SimpleConvHead(**head_cfg) + else: + raise 
NotImplementedError diff --git a/nanodet/model/head/assigner/assign_result.py b/nanodet/model/head/assigner/assign_result.py new file mode 100644 index 0000000..fb7c65e --- /dev/null +++ b/nanodet/model/head/assigner/assign_result.py @@ -0,0 +1,227 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch + +from nanodet.util import util_mixins + + +class AssignResult(util_mixins.NiceRepr): + """ + Stores assignments between predicted and truth boxes. + + Attributes: + num_gts (int): the number of truth boxes considered when computing this + assignment + + gt_inds (LongTensor): for each predicted box indicates the 1-based + index of the assigned truth box. 0 means unassigned and -1 means + ignore. + + max_overlaps (FloatTensor): the iou between the predicted box and its + assigned truth box. + + labels (None | LongTensor): If specified, for each predicted box + indicates the category label of the assigned truth box. + + Example: + >>> # An assign result between 4 predicted boxes and 9 true boxes + >>> # where only two boxes were assigned. + >>> num_gts = 9 + >>> max_overlaps = torch.LongTensor([0, .5, .9, 0]) + >>> gt_inds = torch.LongTensor([-1, 1, 2, 0]) + >>> labels = torch.LongTensor([0, 3, 4, 0]) + >>> self = AssignResult(num_gts, gt_inds, max_overlaps, labels) + >>> print(str(self)) # xdoctest: +IGNORE_WANT + + >>> # Force addition of gt labels (when adding gt as proposals) + >>> new_labels = torch.LongTensor([3, 4, 5]) + >>> self.add_gt_(new_labels) + >>> print(str(self)) # xdoctest: +IGNORE_WANT + + """ + + def __init__(self, num_gts, gt_inds, max_overlaps, labels=None): + self.num_gts = num_gts + self.gt_inds = gt_inds + self.max_overlaps = max_overlaps + self.labels = labels + # Interface for possible user-defined properties + self._extra_properties = {} + + @property + def num_preds(self): + """int: the number of predictions in this assignment""" + return len(self.gt_inds) + + def set_extra_property(self, key, value): + """Set user-defined new property.""" + assert key not in self.info + self._extra_properties[key] = value + + def get_extra_property(self, key): + """Get user-defined property.""" + return self._extra_properties.get(key, None) + + @property + def info(self): + """dict: a dictionary of info about the object""" + basic_info = { + "num_gts": self.num_gts, + "num_preds": self.num_preds, + "gt_inds": self.gt_inds, + "max_overlaps": self.max_overlaps, + "labels": self.labels, + } + basic_info.update(self._extra_properties) + return basic_info + + def __nice__(self): + """str: a "nice" summary string describing this assign result""" + parts = [] + parts.append(f"num_gts={self.num_gts!r}") + if self.gt_inds is None: + parts.append(f"gt_inds={self.gt_inds!r}") + else: + parts.append(f"gt_inds.shape={tuple(self.gt_inds.shape)!r}") + if self.max_overlaps is None: + parts.append(f"max_overlaps={self.max_overlaps!r}") + else: + parts.append("max_overlaps.shape=" f"{tuple(self.max_overlaps.shape)!r}") + if 
self.labels is None: + parts.append(f"labels={self.labels!r}") + else: + parts.append(f"labels.shape={tuple(self.labels.shape)!r}") + return ", ".join(parts) + + @classmethod + def random(cls, **kwargs): + """Create random AssignResult for tests or debugging. + + Args: + num_preds: number of predicted boxes + num_gts: number of true boxes + p_ignore (float): probability of a predicted box assinged to an + ignored truth + p_assigned (float): probability of a predicted box not being + assigned + p_use_label (float | bool): with labels or not + rng (None | int | numpy.random.RandomState): seed or state + + Returns: + :obj:`AssignResult`: Randomly generated assign results. + + Example: + >>> from nanodet.model.head.assigner.assign_result import AssignResult + >>> self = AssignResult.random() + >>> print(self.info) + """ + rng = kwargs.get("rng", None) + num_gts = kwargs.get("num_gts", None) + num_preds = kwargs.get("num_preds", None) + p_ignore = kwargs.get("p_ignore", 0.3) + p_assigned = kwargs.get("p_assigned", 0.7) + p_use_label = kwargs.get("p_use_label", 0.5) + num_classes = kwargs.get("p_use_label", 3) + + import numpy as np + + if rng is None: + rng = np.random.mtrand._rand + elif isinstance(rng, int): + rng = np.random.RandomState(rng) + else: + rng = rng + if num_gts is None: + num_gts = rng.randint(0, 8) + if num_preds is None: + num_preds = rng.randint(0, 16) + + if num_gts == 0: + max_overlaps = torch.zeros(num_preds, dtype=torch.float32) + gt_inds = torch.zeros(num_preds, dtype=torch.int64) + if p_use_label is True or p_use_label < rng.rand(): + labels = torch.zeros(num_preds, dtype=torch.int64) + else: + labels = None + else: + import numpy as np + + # Create an overlap for each predicted box + max_overlaps = torch.from_numpy(rng.rand(num_preds)) + + # Construct gt_inds for each predicted box + is_assigned = torch.from_numpy(rng.rand(num_preds) < p_assigned) + # maximum number of assignments constraints + n_assigned = min(num_preds, min(num_gts, is_assigned.sum())) + + assigned_idxs = np.where(is_assigned)[0] + rng.shuffle(assigned_idxs) + assigned_idxs = assigned_idxs[0:n_assigned] + assigned_idxs.sort() + + is_assigned[:] = 0 + is_assigned[assigned_idxs] = True + + is_ignore = torch.from_numpy(rng.rand(num_preds) < p_ignore) & is_assigned + + gt_inds = torch.zeros(num_preds, dtype=torch.int64) + + true_idxs = np.arange(num_gts) + rng.shuffle(true_idxs) + true_idxs = torch.from_numpy(true_idxs) + gt_inds[is_assigned] = true_idxs[:n_assigned] + + gt_inds = torch.from_numpy(rng.randint(1, num_gts + 1, size=num_preds)) + gt_inds[is_ignore] = -1 + gt_inds[~is_assigned] = 0 + max_overlaps[~is_assigned] = 0 + + if p_use_label is True or p_use_label < rng.rand(): + if num_classes == 0: + labels = torch.zeros(num_preds, dtype=torch.int64) + else: + labels = torch.from_numpy( + # remind that we set FG labels to [0, num_class-1] + # since mmdet v2.0 + # BG cat_id: num_class + rng.randint(0, num_classes, size=num_preds) + ) + labels[~is_assigned] = 0 + else: + labels = None + + self = cls(num_gts, gt_inds, max_overlaps, labels) + return self + + def add_gt_(self, gt_labels): + """Add ground truth as assigned results. 
+ + Args: + gt_labels (torch.Tensor): Labels of gt boxes + """ + self_inds = torch.arange( + 1, len(gt_labels) + 1, dtype=torch.long, device=gt_labels.device + ) + self.gt_inds = torch.cat([self_inds, self.gt_inds]) + + self.max_overlaps = torch.cat( + [self.max_overlaps.new_ones(len(gt_labels)), self.max_overlaps] + ) + + if self.labels is not None: + self.labels = torch.cat([gt_labels, self.labels]) diff --git a/nanodet/model/head/assigner/atss_assigner.py b/nanodet/model/head/assigner/atss_assigner.py new file mode 100644 index 0000000..c182bff --- /dev/null +++ b/nanodet/model/head/assigner/atss_assigner.py @@ -0,0 +1,174 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch + +from ...loss.iou_loss import bbox_overlaps +from .assign_result import AssignResult +from .base_assigner import BaseAssigner + + +class ATSSAssigner(BaseAssigner): + """Assign a corresponding gt bbox or background to each bbox. + + Each proposals will be assigned with `0` or a positive integer + indicating the ground truth index. + + - 0: negative sample, no assigned gt + - positive integer: positive sample, index (1-based) of assigned gt + + Args: + topk (float): number of bbox selected in each level + """ + + def __init__(self, topk): + self.topk = topk + + # https://github.com/sfzhang15/ATSS/blob/master/atss_core/modeling/rpn/atss/loss.py + + def assign( + self, bboxes, num_level_bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None + ): + """Assign gt to bboxes. + + The assignment is done in following steps + + 1. compute iou between all bbox (bbox of all pyramid levels) and gt + 2. compute center distance between all bbox and gt + 3. on each pyramid level, for each gt, select k bbox whose center + are closest to the gt center, so we total select k*l bbox as + candidates for each gt + 4. get corresponding iou for the these candidates, and compute the + mean and std, set mean + std as the iou threshold + 5. select these candidates whose iou are greater than or equal to + the threshold as postive + 6. limit the positive sample's center in gt + + + Args: + bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4). + num_level_bboxes (List): num of bboxes in each level + gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4). + gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are + labelled as `ignored`, e.g., crowd boxes in COCO. + gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ). + + Returns: + :obj:`AssignResult`: The assign result. 
+ """ + INF = 100000000 + bboxes = bboxes[:, :4] + num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0) + + # compute iou between all bbox and gt + overlaps = bbox_overlaps(bboxes, gt_bboxes) + + # assign 0 by default + assigned_gt_inds = overlaps.new_full((num_bboxes,), 0, dtype=torch.long) + + if num_gt == 0 or num_bboxes == 0: + # No ground truth or boxes, return empty assignment + max_overlaps = overlaps.new_zeros((num_bboxes,)) + if num_gt == 0: + # No truth, assign everything to background + assigned_gt_inds[:] = 0 + if gt_labels is None: + assigned_labels = None + else: + assigned_labels = overlaps.new_full((num_bboxes,), -1, dtype=torch.long) + return AssignResult( + num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels + ) + + # compute center distance between all bbox and gt + gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0 + gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0 + gt_points = torch.stack((gt_cx, gt_cy), dim=1) + + bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0 + bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0 + bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1) + + distances = ( + (bboxes_points[:, None, :] - gt_points[None, :, :]).pow(2).sum(-1).sqrt() + ) + + # Selecting candidates based on the center distance + candidate_idxs = [] + start_idx = 0 + for level, bboxes_per_level in enumerate(num_level_bboxes): + # on each pyramid level, for each gt, + # select k bbox whose center are closest to the gt center + end_idx = start_idx + bboxes_per_level + distances_per_level = distances[start_idx:end_idx, :] + selectable_k = min(self.topk, bboxes_per_level) + _, topk_idxs_per_level = distances_per_level.topk( + selectable_k, dim=0, largest=False + ) + candidate_idxs.append(topk_idxs_per_level + start_idx) + start_idx = end_idx + candidate_idxs = torch.cat(candidate_idxs, dim=0) + + # get corresponding iou for the these candidates, and compute the + # mean and std, set mean + std as the iou threshold + candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)] + overlaps_mean_per_gt = candidate_overlaps.mean(0) + overlaps_std_per_gt = candidate_overlaps.std(0) + overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt + + is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :] + + # limit the positive sample's center in gt + for gt_idx in range(num_gt): + candidate_idxs[:, gt_idx] += gt_idx * num_bboxes + ep_bboxes_cx = ( + bboxes_cx.view(1, -1).expand(num_gt, num_bboxes).contiguous().view(-1) + ) + ep_bboxes_cy = ( + bboxes_cy.view(1, -1).expand(num_gt, num_bboxes).contiguous().view(-1) + ) + candidate_idxs = candidate_idxs.view(-1) + + # calculate the left, top, right, bottom distance between positive + # bbox center and gt side + l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0] + t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1] + r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt) + b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt) + is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01 + is_pos = is_pos & is_in_gts + + # if an anchor box is assigned to multiple gts, + # the one with the highest IoU will be selected. 
+ overlaps_inf = torch.full_like(overlaps, -INF).t().contiguous().view(-1) + index = candidate_idxs.view(-1)[is_pos.view(-1)] + overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index] + overlaps_inf = overlaps_inf.view(num_gt, -1).t() + + max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1) + assigned_gt_inds[max_overlaps != -INF] = ( + argmax_overlaps[max_overlaps != -INF] + 1 + ) + + if gt_labels is not None: + assigned_labels = assigned_gt_inds.new_full((num_bboxes,), -1) + pos_inds = torch.nonzero(assigned_gt_inds > 0, as_tuple=False).squeeze() + if pos_inds.numel() > 0: + assigned_labels[pos_inds] = gt_labels[assigned_gt_inds[pos_inds] - 1] + else: + assigned_labels = None + return AssignResult( + num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels + ) diff --git a/nanodet/model/head/assigner/base_assigner.py b/nanodet/model/head/assigner/base_assigner.py new file mode 100644 index 0000000..8a9094f --- /dev/null +++ b/nanodet/model/head/assigner/base_assigner.py @@ -0,0 +1,7 @@ +from abc import ABCMeta, abstractmethod + + +class BaseAssigner(metaclass=ABCMeta): + @abstractmethod + def assign(self, bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None): + pass diff --git a/nanodet/model/head/assigner/dsl_assigner.py b/nanodet/model/head/assigner/dsl_assigner.py new file mode 100644 index 0000000..e74dc08 --- /dev/null +++ b/nanodet/model/head/assigner/dsl_assigner.py @@ -0,0 +1,154 @@ +import torch +import torch.nn.functional as F + +from ...loss.iou_loss import bbox_overlaps +from .assign_result import AssignResult +from .base_assigner import BaseAssigner + + +class DynamicSoftLabelAssigner(BaseAssigner): + """Computes matching between predictions and ground truth with + dynamic soft label assignment. + + Args: + topk (int): Select top-k predictions to calculate dynamic k + best matchs for each gt. Default 13. + iou_factor (float): The scale factor of iou cost. Default 3.0. + """ + + def __init__(self, topk=13, iou_factor=3.0): + self.topk = topk + self.iou_factor = iou_factor + + def assign( + self, + pred_scores, + priors, + decoded_bboxes, + gt_bboxes, + gt_labels, + ): + """Assign gt to priors with dynamic soft label assignment. + Args: + pred_scores (Tensor): Classification scores of one image, + a 2D-Tensor with shape [num_priors, num_classes] + priors (Tensor): All priors of one image, a 2D-Tensor with shape + [num_priors, 4] in [cx, xy, stride_w, stride_y] format. + decoded_bboxes (Tensor): Predicted bboxes, a 2D-Tensor with shape + [num_priors, 4] in [tl_x, tl_y, br_x, br_y] format. + gt_bboxes (Tensor): Ground truth bboxes of one image, a 2D-Tensor + with shape [num_gts, 4] in [tl_x, tl_y, br_x, br_y] format. + gt_labels (Tensor): Ground truth labels of one image, a Tensor + with shape [num_gts]. + + Returns: + :obj:`AssignResult`: The assigned result. 
+ """ + INF = 100000000 + num_gt = gt_bboxes.size(0) + num_bboxes = decoded_bboxes.size(0) + + # assign 0 by default + assigned_gt_inds = decoded_bboxes.new_full((num_bboxes,), 0, dtype=torch.long) + + prior_center = priors[:, :2] + lt_ = prior_center[:, None] - gt_bboxes[:, :2] + rb_ = gt_bboxes[:, 2:] - prior_center[:, None] + + deltas = torch.cat([lt_, rb_], dim=-1) + is_in_gts = deltas.min(dim=-1).values > 0 + valid_mask = is_in_gts.sum(dim=1) > 0 + + valid_decoded_bbox = decoded_bboxes[valid_mask] + valid_pred_scores = pred_scores[valid_mask] + num_valid = valid_decoded_bbox.size(0) + + if num_gt == 0 or num_bboxes == 0 or num_valid == 0: + # No ground truth or boxes, return empty assignment + max_overlaps = decoded_bboxes.new_zeros((num_bboxes,)) + if num_gt == 0: + # No truth, assign everything to background + assigned_gt_inds[:] = 0 + if gt_labels is None: + assigned_labels = None + else: + assigned_labels = decoded_bboxes.new_full( + (num_bboxes,), -1, dtype=torch.long + ) + return AssignResult( + num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels + ) + + pairwise_ious = bbox_overlaps(valid_decoded_bbox, gt_bboxes) + iou_cost = -torch.log(pairwise_ious + 1e-7) + + gt_onehot_label = ( + F.one_hot(gt_labels.to(torch.int64), pred_scores.shape[-1]) + .float() + .unsqueeze(0) + .repeat(num_valid, 1, 1) + ) + valid_pred_scores = valid_pred_scores.unsqueeze(1).repeat(1, num_gt, 1) + + soft_label = gt_onehot_label * pairwise_ious[..., None] + scale_factor = soft_label - valid_pred_scores + + cls_cost = F.binary_cross_entropy( + valid_pred_scores, soft_label, reduction="none" + ) * scale_factor.abs().pow(2.0) + + cls_cost = cls_cost.sum(dim=-1) + + cost_matrix = cls_cost + iou_cost * self.iou_factor + + matched_pred_ious, matched_gt_inds = self.dynamic_k_matching( + cost_matrix, pairwise_ious, num_gt, valid_mask + ) + + # convert to AssignResult format + assigned_gt_inds[valid_mask] = matched_gt_inds + 1 + assigned_labels = assigned_gt_inds.new_full((num_bboxes,), -1) + assigned_labels[valid_mask] = gt_labels[matched_gt_inds].long() + max_overlaps = assigned_gt_inds.new_full( + (num_bboxes,), -INF, dtype=torch.float32 + ) + max_overlaps[valid_mask] = matched_pred_ious + return AssignResult( + num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels + ) + + def dynamic_k_matching(self, cost, pairwise_ious, num_gt, valid_mask): + """Use sum of topk pred iou as dynamic k. Refer from OTA and YOLOX. + + Args: + cost (Tensor): Cost matrix. + pairwise_ious (Tensor): Pairwise iou matrix. + num_gt (int): Number of gt. + valid_mask (Tensor): Mask for valid bboxes. 
+ """ + matching_matrix = torch.zeros_like(cost) + # select candidate topk ious for dynamic-k calculation + candidate_topk = min(self.topk, pairwise_ious.size(0)) + topk_ious, _ = torch.topk(pairwise_ious, candidate_topk, dim=0) + # calculate dynamic k for each gt + dynamic_ks = torch.clamp(topk_ious.sum(0).int(), min=1) + for gt_idx in range(num_gt): + _, pos_idx = torch.topk( + cost[:, gt_idx], k=dynamic_ks[gt_idx].item(), largest=False + ) + matching_matrix[:, gt_idx][pos_idx] = 1.0 + + del topk_ious, dynamic_ks, pos_idx + + prior_match_gt_mask = matching_matrix.sum(1) > 1 + if prior_match_gt_mask.sum() > 0: + cost_min, cost_argmin = torch.min(cost[prior_match_gt_mask, :], dim=1) + matching_matrix[prior_match_gt_mask, :] *= 0.0 + matching_matrix[prior_match_gt_mask, cost_argmin] = 1.0 + # get foreground mask inside box and center prior + fg_mask_inboxes = matching_matrix.sum(1) > 0.0 + valid_mask[valid_mask.clone()] = fg_mask_inboxes + + matched_gt_inds = matching_matrix[fg_mask_inboxes, :].argmax(1) + matched_pred_ious = (matching_matrix * pairwise_ious).sum(1)[fg_mask_inboxes] + return matched_pred_ious, matched_gt_inds diff --git a/nanodet/model/head/gfl_head.py b/nanodet/model/head/gfl_head.py new file mode 100644 index 0000000..ee5409c --- /dev/null +++ b/nanodet/model/head/gfl_head.py @@ -0,0 +1,708 @@ +import math + +import cv2 +import numpy as np +import torch +import torch.distributed as dist +import torch.nn as nn +import torch.nn.functional as F + +from nanodet.util import ( + bbox2distance, + distance2bbox, + images_to_levels, + multi_apply, + overlay_bbox_cv, +) + +from ...data.transform.warp import warp_boxes +from ..loss.gfocal_loss import DistributionFocalLoss, QualityFocalLoss +from ..loss.iou_loss import GIoULoss, bbox_overlaps +from ..module.conv import ConvModule +from ..module.init_weights import normal_init +from ..module.nms import multiclass_nms +from ..module.scale import Scale +from .assigner.atss_assigner import ATSSAssigner + + +def reduce_mean(tensor): + if not (dist.is_available() and dist.is_initialized()): + return tensor + tensor = tensor.clone() + dist.all_reduce(tensor.true_divide(dist.get_world_size()), op=dist.ReduceOp.SUM) + return tensor + + +class Integral(nn.Module): + """A fixed layer for calculating integral result from distribution. + This layer calculates the target location by :math: `sum{P(y_i) * y_i}`, + P(y_i) denotes the softmax vector that represents the discrete distribution + y_i denotes the discrete set, usually {0, 1, 2, ..., reg_max} + Args: + reg_max (int): The maximal value of the discrete set. Default: 16. You + may want to reset it according to your new dataset or related + settings. + """ + + def __init__(self, reg_max=16): + super(Integral, self).__init__() + self.reg_max = reg_max + self.register_buffer( + "project", torch.linspace(0, self.reg_max, self.reg_max + 1) + ) + + def forward(self, x): + """Forward feature from the regression head to get integral result of + bounding box location. + Args: + x (Tensor): Features of the regression head, shape (N, 4*(n+1)), + n is self.reg_max. + Returns: + x (Tensor): Integral result of box locations, i.e., distance + offsets from the box center in four directions, shape (N, 4). 
+ """ + shape = x.size() + x = F.softmax(x.reshape(*shape[:-1], 4, self.reg_max + 1), dim=-1) + x = F.linear(x, self.project.type_as(x)).reshape(*shape[:-1], 4) + return x + + +class GFLHead(nn.Module): + """Generalized Focal Loss: Learning Qualified and Distributed Bounding + Boxes for Dense Object Detection. + + GFL head structure is similar with ATSS, however GFL uses + 1) joint representation for classification and localization quality, and + 2) flexible General distribution for bounding box locations, + which are supervised by + Quality Focal Loss (QFL) and Distribution Focal Loss (DFL), respectively + + https://arxiv.org/abs/2006.04388 + + :param num_classes: Number of categories excluding the background category. + :param loss: Config of all loss functions. + :param input_channel: Number of channels in the input feature map. + :param feat_channels: Number of conv layers in cls and reg tower. Default: 4. + :param stacked_convs: Number of conv layers in cls and reg tower. Default: 4. + :param octave_base_scale: Scale factor of grid cells. + :param strides: Down sample strides of all level feature map + :param conv_cfg: Dictionary to construct and config conv layer. Default: None. + :param norm_cfg: Dictionary to construct and config norm layer. + :param reg_max: Max value of integral set :math: `{0, ..., reg_max}` + in QFL setting. Default: 16. + :param kwargs: + """ + + def __init__( + self, + num_classes, + loss, + input_channel, + feat_channels=256, + stacked_convs=4, + octave_base_scale=4, + strides=[8, 16, 32], + conv_cfg=None, + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + reg_max=16, + **kwargs + ): + super(GFLHead, self).__init__() + self.num_classes = num_classes + self.in_channels = input_channel + self.feat_channels = feat_channels + self.stacked_convs = stacked_convs + self.grid_cell_scale = octave_base_scale + self.strides = strides + self.reg_max = reg_max + + self.loss_cfg = loss + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.use_sigmoid = self.loss_cfg.loss_qfl.use_sigmoid + if self.use_sigmoid: + self.cls_out_channels = num_classes + else: + self.cls_out_channels = num_classes + 1 + + self.assigner = ATSSAssigner(topk=9) + self.distribution_project = Integral(self.reg_max) + + self.loss_qfl = QualityFocalLoss( + use_sigmoid=self.use_sigmoid, + beta=self.loss_cfg.loss_qfl.beta, + loss_weight=self.loss_cfg.loss_qfl.loss_weight, + ) + self.loss_dfl = DistributionFocalLoss( + loss_weight=self.loss_cfg.loss_dfl.loss_weight + ) + self.loss_bbox = GIoULoss(loss_weight=self.loss_cfg.loss_bbox.loss_weight) + self._init_layers() + self.init_weights() + + def _init_layers(self): + self.relu = nn.ReLU(inplace=True) + self.cls_convs = nn.ModuleList() + self.reg_convs = nn.ModuleList() + for i in range(self.stacked_convs): + chn = self.in_channels if i == 0 else self.feat_channels + self.cls_convs.append( + ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + ) + ) + self.reg_convs.append( + ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + ) + ) + self.gfl_cls = nn.Conv2d( + self.feat_channels, self.cls_out_channels, 3, padding=1 + ) + self.gfl_reg = nn.Conv2d( + self.feat_channels, 4 * (self.reg_max + 1), 3, padding=1 + ) + self.scales = nn.ModuleList([Scale(1.0) for _ in self.strides]) + + def init_weights(self): + for m in self.cls_convs: + normal_init(m.conv, std=0.01) + for m in self.reg_convs: + 
normal_init(m.conv, std=0.01) + bias_cls = -4.595 + normal_init(self.gfl_cls, std=0.01, bias=bias_cls) + normal_init(self.gfl_reg, std=0.01) + + def forward(self, feats): + if torch.onnx.is_in_onnx_export(): + return self._forward_onnx(feats) + outputs = [] + for x, scale in zip(feats, self.scales): + cls_feat = x + reg_feat = x + for cls_conv in self.cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in self.reg_convs: + reg_feat = reg_conv(reg_feat) + cls_score = self.gfl_cls(cls_feat) + bbox_pred = scale(self.gfl_reg(reg_feat)).float() + output = torch.cat([cls_score, bbox_pred], dim=1) + outputs.append(output.flatten(start_dim=2)) + outputs = torch.cat(outputs, dim=2).permute(0, 2, 1) + return outputs + + def loss(self, preds, gt_meta): + cls_scores, bbox_preds = preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + device = cls_scores.device + gt_bboxes = gt_meta["gt_bboxes"] + gt_labels = gt_meta["gt_labels"] + input_height, input_width = gt_meta["img"].shape[2:] + gt_bboxes_ignore = None + + featmap_sizes = [ + (math.ceil(input_height / stride), math.ceil(input_width) / stride) + for stride in self.strides + ] + + cls_reg_targets = self.target_assign( + cls_scores, + bbox_preds, + featmap_sizes, + gt_bboxes, + gt_bboxes_ignore, + gt_labels, + device=device, + ) + if cls_reg_targets is None: + return None + + ( + cls_preds_list, + reg_preds_list, + grid_cells_list, + labels_list, + label_weights_list, + bbox_targets_list, + bbox_weights_list, + num_total_pos, + num_total_neg, + ) = cls_reg_targets + + num_total_samples = reduce_mean(torch.tensor(num_total_pos).to(device)).item() + num_total_samples = max(num_total_samples, 1.0) + + losses_qfl, losses_bbox, losses_dfl, avg_factor = multi_apply( + self.loss_single, + grid_cells_list, + cls_preds_list, + reg_preds_list, + labels_list, + label_weights_list, + bbox_targets_list, + self.strides, + num_total_samples=num_total_samples, + ) + + avg_factor = sum(avg_factor) + avg_factor = reduce_mean(avg_factor).item() + if avg_factor <= 0: + loss_qfl = torch.tensor(0, dtype=torch.float32, requires_grad=True).to( + device + ) + loss_bbox = torch.tensor(0, dtype=torch.float32, requires_grad=True).to( + device + ) + loss_dfl = torch.tensor(0, dtype=torch.float32, requires_grad=True).to( + device + ) + else: + losses_bbox = list(map(lambda x: x / avg_factor, losses_bbox)) + losses_dfl = list(map(lambda x: x / avg_factor, losses_dfl)) + + loss_qfl = sum(losses_qfl) + loss_bbox = sum(losses_bbox) + loss_dfl = sum(losses_dfl) + + loss = loss_qfl + loss_bbox + loss_dfl + loss_states = dict(loss_qfl=loss_qfl, loss_bbox=loss_bbox, loss_dfl=loss_dfl) + + return loss, loss_states + + def loss_single( + self, + grid_cells, + cls_score, + bbox_pred, + labels, + label_weights, + bbox_targets, + stride, + num_total_samples, + ): + grid_cells = grid_cells.reshape(-1, 4) + cls_score = cls_score.reshape(-1, self.cls_out_channels) + bbox_pred = bbox_pred.reshape(-1, 4 * (self.reg_max + 1)) + bbox_targets = bbox_targets.reshape(-1, 4) + labels = labels.reshape(-1) + label_weights = label_weights.reshape(-1) + + # FG cat_id: [0, num_classes -1], BG cat_id: num_classes + bg_class_ind = self.num_classes + pos_inds = torch.nonzero( + (labels >= 0) & (labels < bg_class_ind), as_tuple=False + ).squeeze(1) + + score = label_weights.new_zeros(labels.shape) + + if len(pos_inds) > 0: + pos_bbox_targets = bbox_targets[pos_inds] + pos_bbox_pred = bbox_pred[pos_inds] # (n, 4 * (reg_max + 1)) + pos_grid_cells = grid_cells[pos_inds] + 
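+            # grid cell centers and bbox targets below are divided by the stride,
+            # so box regression runs in stride-normalized coordinates and the DFL
+            # target distances fall inside the [0, reg_max] range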
pos_grid_cell_centers = self.grid_cells_to_center(pos_grid_cells) / stride + + weight_targets = cls_score.detach().sigmoid() + weight_targets = weight_targets.max(dim=1)[0][pos_inds] + pos_bbox_pred_corners = self.distribution_project(pos_bbox_pred) + pos_decode_bbox_pred = distance2bbox( + pos_grid_cell_centers, pos_bbox_pred_corners + ) + pos_decode_bbox_targets = pos_bbox_targets / stride + score[pos_inds] = bbox_overlaps( + pos_decode_bbox_pred.detach(), pos_decode_bbox_targets, is_aligned=True + ) + pred_corners = pos_bbox_pred.reshape(-1, self.reg_max + 1) + target_corners = bbox2distance( + pos_grid_cell_centers, pos_decode_bbox_targets, self.reg_max + ).reshape(-1) + + # regression loss + loss_bbox = self.loss_bbox( + pos_decode_bbox_pred, + pos_decode_bbox_targets, + weight=weight_targets, + avg_factor=1.0, + ) + + # dfl loss + loss_dfl = self.loss_dfl( + pred_corners, + target_corners, + weight=weight_targets[:, None].expand(-1, 4).reshape(-1), + avg_factor=4.0, + ) + else: + loss_bbox = bbox_pred.sum() * 0 + loss_dfl = bbox_pred.sum() * 0 + weight_targets = torch.tensor(0).to(cls_score.device) + + # qfl loss + loss_qfl = self.loss_qfl( + cls_score, + (labels, score), + weight=label_weights, + avg_factor=num_total_samples, + ) + + return loss_qfl, loss_bbox, loss_dfl, weight_targets.sum() + + def target_assign( + self, + cls_preds, + reg_preds, + featmap_sizes, + gt_bboxes_list, + gt_bboxes_ignore_list, + gt_labels_list, + device, + ): + """ + Assign target for a batch of images. + :param batch_size: num of images in one batch + :param featmap_sizes: A list of all grid cell boxes in all image + :param gt_bboxes_list: A list of ground truth boxes in all image + :param gt_bboxes_ignore_list: A list of all ignored boxes in all image + :param gt_labels_list: A list of all ground truth label in all image + :param device: pytorch device + :return: Assign results of all images. 
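+        The returned tuple holds, per feature-map level, the class and box
+        predictions, grid cells, labels, label weights, bbox targets and
+        bbox weights, followed by the total number of positive and negative
+        samples; None is returned when any image has no valid cells.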
+ """ + batch_size = cls_preds.shape[0] + # get grid cells of one image + multi_level_grid_cells = [ + self.get_grid_cells( + featmap_sizes[i], + self.grid_cell_scale, + stride, + dtype=torch.float32, + device=device, + ) + for i, stride in enumerate(self.strides) + ] + mlvl_grid_cells_list = [multi_level_grid_cells for i in range(batch_size)] + + # pixel cell number of multi-level feature maps + num_level_cells = [grid_cells.size(0) for grid_cells in mlvl_grid_cells_list[0]] + num_level_cells_list = [num_level_cells] * batch_size + # concat all level cells and to a single tensor + for i in range(batch_size): + mlvl_grid_cells_list[i] = torch.cat(mlvl_grid_cells_list[i]) + # compute targets for each image + if gt_bboxes_ignore_list is None: + gt_bboxes_ignore_list = [None for _ in range(batch_size)] + if gt_labels_list is None: + gt_labels_list = [None for _ in range(batch_size)] + # target assign on all images, get list of tensors + # list length = batch size + # tensor first dim = num of all grid cell + ( + all_grid_cells, + all_labels, + all_label_weights, + all_bbox_targets, + all_bbox_weights, + pos_inds_list, + neg_inds_list, + ) = multi_apply( + self.target_assign_single_img, + mlvl_grid_cells_list, + num_level_cells_list, + gt_bboxes_list, + gt_bboxes_ignore_list, + gt_labels_list, + ) + # no valid cells + if any([labels is None for labels in all_labels]): + return None + # sampled cells of all images + num_total_pos = sum([max(inds.numel(), 1) for inds in pos_inds_list]) + num_total_neg = sum([max(inds.numel(), 1) for inds in neg_inds_list]) + # merge list of targets tensors into one batch then split to multi levels + mlvl_cls_preds = images_to_levels([c for c in cls_preds], num_level_cells) + mlvl_reg_preds = images_to_levels([r for r in reg_preds], num_level_cells) + mlvl_grid_cells = images_to_levels(all_grid_cells, num_level_cells) + mlvl_labels = images_to_levels(all_labels, num_level_cells) + mlvl_label_weights = images_to_levels(all_label_weights, num_level_cells) + mlvl_bbox_targets = images_to_levels(all_bbox_targets, num_level_cells) + mlvl_bbox_weights = images_to_levels(all_bbox_weights, num_level_cells) + return ( + mlvl_cls_preds, + mlvl_reg_preds, + mlvl_grid_cells, + mlvl_labels, + mlvl_label_weights, + mlvl_bbox_targets, + mlvl_bbox_weights, + num_total_pos, + num_total_neg, + ) + + def target_assign_single_img( + self, grid_cells, num_level_cells, gt_bboxes, gt_bboxes_ignore, gt_labels + ): + """ + Using ATSS Assigner to assign target on one image. 
+ :param grid_cells: Grid cell boxes of all pixels on feature map + :param num_level_cells: numbers of grid cells on each level's feature map + :param gt_bboxes: Ground truth boxes + :param gt_bboxes_ignore: Ground truths which are ignored + :param gt_labels: Ground truth labels + :return: Assign results of a single image + """ + device = grid_cells.device + gt_bboxes = torch.from_numpy(gt_bboxes).to(device) + gt_labels = torch.from_numpy(gt_labels).to(device) + + assign_result = self.assigner.assign( + grid_cells, num_level_cells, gt_bboxes, gt_bboxes_ignore, gt_labels + ) + + pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds = self.sample( + assign_result, gt_bboxes + ) + + num_cells = grid_cells.shape[0] + bbox_targets = torch.zeros_like(grid_cells) + bbox_weights = torch.zeros_like(grid_cells) + labels = grid_cells.new_full((num_cells,), self.num_classes, dtype=torch.long) + label_weights = grid_cells.new_zeros(num_cells, dtype=torch.float) + + if len(pos_inds) > 0: + pos_bbox_targets = pos_gt_bboxes + bbox_targets[pos_inds, :] = pos_bbox_targets + bbox_weights[pos_inds, :] = 1.0 + if gt_labels is None: + # Only rpn gives gt_labels as None + # Foreground is the first class + labels[pos_inds] = 0 + else: + labels[pos_inds] = gt_labels[pos_assigned_gt_inds] + + label_weights[pos_inds] = 1.0 + if len(neg_inds) > 0: + label_weights[neg_inds] = 1.0 + + return ( + grid_cells, + labels, + label_weights, + bbox_targets, + bbox_weights, + pos_inds, + neg_inds, + ) + + def sample(self, assign_result, gt_bboxes): + pos_inds = ( + torch.nonzero(assign_result.gt_inds > 0, as_tuple=False) + .squeeze(-1) + .unique() + ) + neg_inds = ( + torch.nonzero(assign_result.gt_inds == 0, as_tuple=False) + .squeeze(-1) + .unique() + ) + pos_assigned_gt_inds = assign_result.gt_inds[pos_inds] - 1 + + if gt_bboxes.numel() == 0: + # hack for index error case + assert pos_assigned_gt_inds.numel() == 0 + pos_gt_bboxes = torch.empty_like(gt_bboxes).view(-1, 4) + else: + if len(gt_bboxes.shape) < 2: + gt_bboxes = gt_bboxes.view(-1, 4) + pos_gt_bboxes = gt_bboxes[pos_assigned_gt_inds, :] + return pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds + + def post_process(self, preds, meta): + cls_scores, bbox_preds = preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + result_list = self.get_bboxes(cls_scores, bbox_preds, meta) + det_results = {} + warp_matrixes = ( + meta["warp_matrix"] + if isinstance(meta["warp_matrix"], list) + else meta["warp_matrix"] + ) + img_heights = ( + meta["img_info"]["height"].cpu().numpy() + if isinstance(meta["img_info"]["height"], torch.Tensor) + else meta["img_info"]["height"] + ) + img_widths = ( + meta["img_info"]["width"].cpu().numpy() + if isinstance(meta["img_info"]["width"], torch.Tensor) + else meta["img_info"]["width"] + ) + img_ids = ( + meta["img_info"]["id"].cpu().numpy() + if isinstance(meta["img_info"]["id"], torch.Tensor) + else meta["img_info"]["id"] + ) + + for result, img_width, img_height, img_id, warp_matrix in zip( + result_list, img_widths, img_heights, img_ids, warp_matrixes + ): + det_result = {} + det_bboxes, det_labels = result + det_bboxes = det_bboxes.detach().cpu().numpy() + det_bboxes[:, :4] = warp_boxes( + det_bboxes[:, :4], np.linalg.inv(warp_matrix), img_width, img_height + ) + classes = det_labels.detach().cpu().numpy() + for i in range(self.num_classes): + inds = classes == i + det_result[i] = np.concatenate( + [ + det_bboxes[inds, :4].astype(np.float32), + det_bboxes[inds, 4:5].astype(np.float32), + ], + axis=1, + ).tolist() + 
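+            # det_result maps class index -> [[x1, y1, x2, y2, score], ...] with
+            # boxes warped back to the original image resolution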
det_results[img_id] = det_result + return det_results + + def show_result( + self, img, dets, class_names, score_thres=0.3, show=True, save_path=None + ): + result = overlay_bbox_cv(img, dets, class_names, score_thresh=score_thres) + if show: + cv2.imshow("det", result) + return result + + def get_bboxes(self, cls_preds, reg_preds, img_metas): + """Decode the outputs to bboxes. + Args: + cls_preds (Tensor): Shape (num_imgs, num_points, num_classes). + reg_preds (Tensor): Shape (num_imgs, num_points, 4 * (regmax + 1)). + img_metas (dict): Dict of image info. + + Returns: + results_list (list[tuple]): List of detection bboxes and labels. + """ + device = cls_preds.device + b = cls_preds.shape[0] + input_height, input_width = img_metas["img"].shape[2:] + input_shape = (input_height, input_width) + + featmap_sizes = [ + (math.ceil(input_height / stride), math.ceil(input_width) / stride) + for stride in self.strides + ] + # get grid cells of one image + mlvl_center_priors = [] + for i, stride in enumerate(self.strides): + y, x = self.get_single_level_center_point( + featmap_sizes[i], stride, torch.float32, device + ) + strides = x.new_full((x.shape[0],), stride) + proiors = torch.stack([x, y, strides, strides], dim=-1) + mlvl_center_priors.append(proiors.unsqueeze(0).repeat(b, 1, 1)) + + center_priors = torch.cat(mlvl_center_priors, dim=1) + dis_preds = self.distribution_project(reg_preds) * center_priors[..., 2, None] + bboxes = distance2bbox(center_priors[..., :2], dis_preds, max_shape=input_shape) + scores = cls_preds.sigmoid() + result_list = [] + for i in range(b): + # add a dummy background class at the end of all labels + # same with mmdetection2.0 + score, bbox = scores[i], bboxes[i] + padding = score.new_zeros(score.shape[0], 1) + score = torch.cat([score, padding], dim=1) + results = multiclass_nms( + bbox, + score, + score_thr=0.05, + nms_cfg=dict(type="nms", iou_threshold=0.6), + max_num=100, + ) + result_list.append(results) + return result_list + + def get_single_level_center_point( + self, featmap_size, stride, dtype, device, flatten=True + ): + """ + Generate pixel centers of a single stage feature map. + :param featmap_size: height and width of the feature map + :param stride: down sample stride of the feature map + :param dtype: data type of the tensors + :param device: device of the tensors + :param flatten: flatten the x and y tensors + :return: y and x of the center points + """ + h, w = featmap_size + x_range = (torch.arange(w, dtype=dtype, device=device) + 0.5) * stride + y_range = (torch.arange(h, dtype=dtype, device=device) + 0.5) * stride + y, x = torch.meshgrid(y_range, x_range) + if flatten: + y = y.flatten() + x = x.flatten() + return y, x + + def get_grid_cells(self, featmap_size, scale, stride, dtype, device): + """ + Generate grid cells of a feature map for target assignment. + :param featmap_size: Size of a single level feature map. + :param scale: Grid cell scale. + :param stride: Down sample stride of the feature map. + :param dtype: Data type of the tensors. + :param device: Device of the tensors. + :return: Grid_cells xyxy position. 
Size should be [feat_w * feat_h, 4] + """ + cell_size = stride * scale + y, x = self.get_single_level_center_point( + featmap_size, stride, dtype, device, flatten=True + ) + grid_cells = torch.stack( + [ + x - 0.5 * cell_size, + y - 0.5 * cell_size, + x + 0.5 * cell_size, + y + 0.5 * cell_size, + ], + dim=-1, + ) + return grid_cells + + def grid_cells_to_center(self, grid_cells): + """ + Get center location of each gird cell + :param grid_cells: grid cells of a feature map + :return: center points + """ + cells_cx = (grid_cells[:, 2] + grid_cells[:, 0]) / 2 + cells_cy = (grid_cells[:, 3] + grid_cells[:, 1]) / 2 + return torch.stack([cells_cx, cells_cy], dim=-1) + + def _forward_onnx(self, feats): + """only used for onnx export""" + outputs = [] + for x, scale in zip(feats, self.scales): + cls_feat = x + reg_feat = x + for cls_conv in self.cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in self.reg_convs: + reg_feat = reg_conv(reg_feat) + cls_pred = self.gfl_cls(cls_feat) + reg_pred = scale(self.gfl_reg(reg_feat)) + cls_pred = cls_pred.sigmoid() + out = torch.cat([cls_pred, reg_pred], dim=1) + outputs.append(out.flatten(start_dim=2)) + return torch.cat(outputs, dim=2).permute(0, 2, 1) diff --git a/nanodet/model/head/nanodet_head.py b/nanodet/model/head/nanodet_head.py new file mode 100644 index 0000000..8e145d6 --- /dev/null +++ b/nanodet/model/head/nanodet_head.py @@ -0,0 +1,185 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
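+
+# NanoDetHead below is a lightweight variant of GFLHead: it keeps the same GFL
+# losses but (by default) uses depthwise-separable convolution towers and builds
+# a separate, non-shared head for every feature-map stride.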
+ +import torch +import torch.nn as nn + +from ..module.conv import ConvModule, DepthwiseConvModule +from ..module.init_weights import normal_init +from .gfl_head import GFLHead + + +class NanoDetHead(GFLHead): + """ + Modified from GFL, use same loss functions but much lightweight convolution heads + """ + + def __init__( + self, + num_classes, + loss, + input_channel, + stacked_convs=2, + octave_base_scale=5, + conv_type="DWConv", + conv_cfg=None, + norm_cfg=dict(type="BN"), + reg_max=16, + share_cls_reg=False, + activation="LeakyReLU", + feat_channels=256, + strides=[8, 16, 32], + **kwargs + ): + self.share_cls_reg = share_cls_reg + self.activation = activation + self.ConvModule = ConvModule if conv_type == "Conv" else DepthwiseConvModule + super(NanoDetHead, self).__init__( + num_classes, + loss, + input_channel, + feat_channels, + stacked_convs, + octave_base_scale, + strides, + conv_cfg, + norm_cfg, + reg_max, + **kwargs + ) + + def _init_layers(self): + self.cls_convs = nn.ModuleList() + self.reg_convs = nn.ModuleList() + for _ in self.strides: + cls_convs, reg_convs = self._buid_not_shared_head() + self.cls_convs.append(cls_convs) + self.reg_convs.append(reg_convs) + + self.gfl_cls = nn.ModuleList( + [ + nn.Conv2d( + self.feat_channels, + self.cls_out_channels + 4 * (self.reg_max + 1) + if self.share_cls_reg + else self.cls_out_channels, + 1, + padding=0, + ) + for _ in self.strides + ] + ) + # TODO: if + self.gfl_reg = nn.ModuleList( + [ + nn.Conv2d(self.feat_channels, 4 * (self.reg_max + 1), 1, padding=0) + for _ in self.strides + ] + ) + + def _buid_not_shared_head(self): + cls_convs = nn.ModuleList() + reg_convs = nn.ModuleList() + for i in range(self.stacked_convs): + chn = self.in_channels if i == 0 else self.feat_channels + cls_convs.append( + self.ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + norm_cfg=self.norm_cfg, + bias=self.norm_cfg is None, + activation=self.activation, + ) + ) + if not self.share_cls_reg: + reg_convs.append( + self.ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + norm_cfg=self.norm_cfg, + bias=self.norm_cfg is None, + activation=self.activation, + ) + ) + + return cls_convs, reg_convs + + def init_weights(self): + for m in self.cls_convs.modules(): + if isinstance(m, nn.Conv2d): + normal_init(m, std=0.01) + for m in self.reg_convs.modules(): + if isinstance(m, nn.Conv2d): + normal_init(m, std=0.01) + # init cls head with confidence = 0.01 + bias_cls = -4.595 + for i in range(len(self.strides)): + normal_init(self.gfl_cls[i], std=0.01, bias=bias_cls) + normal_init(self.gfl_reg[i], std=0.01) + print("Finish initialize NanoDet Head.") + + def forward(self, feats): + if torch.onnx.is_in_onnx_export(): + return self._forward_onnx(feats) + outputs = [] + for x, cls_convs, reg_convs, gfl_cls, gfl_reg in zip( + feats, self.cls_convs, self.reg_convs, self.gfl_cls, self.gfl_reg + ): + cls_feat = x + reg_feat = x + for cls_conv in cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in reg_convs: + reg_feat = reg_conv(reg_feat) + if self.share_cls_reg: + output = gfl_cls(cls_feat) + else: + cls_score = gfl_cls(cls_feat) + bbox_pred = gfl_reg(reg_feat) + output = torch.cat([cls_score, bbox_pred], dim=1) + outputs.append(output.flatten(start_dim=2)) + outputs = torch.cat(outputs, dim=2).permute(0, 2, 1) + return outputs + + def _forward_onnx(self, feats): + """only used for onnx export""" + outputs = [] + for x, cls_convs, reg_convs, gfl_cls, gfl_reg in zip( + feats, self.cls_convs, self.reg_convs, 
self.gfl_cls, self.gfl_reg + ): + cls_feat = x + reg_feat = x + for cls_conv in cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in reg_convs: + reg_feat = reg_conv(reg_feat) + if self.share_cls_reg: + output = gfl_cls(cls_feat) + cls_pred, reg_pred = output.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=1 + ) + else: + cls_pred = gfl_cls(cls_feat) + reg_pred = gfl_reg(reg_feat) + + cls_pred = cls_pred.sigmoid() + out = torch.cat([cls_pred, reg_pred], dim=1) + outputs.append(out.flatten(start_dim=2)) + return torch.cat(outputs, dim=2).permute(0, 2, 1) diff --git a/nanodet/model/head/nanodet_plus_head.py b/nanodet/model/head/nanodet_plus_head.py new file mode 100644 index 0000000..94bdf01 --- /dev/null +++ b/nanodet/model/head/nanodet_plus_head.py @@ -0,0 +1,518 @@ +import math + +import cv2 +import numpy as np +import torch +import torch.nn as nn + +from nanodet.util import bbox2distance, distance2bbox, multi_apply, overlay_bbox_cv + +from ...data.transform.warp import warp_boxes +from ..loss.gfocal_loss import DistributionFocalLoss, QualityFocalLoss +from ..loss.iou_loss import GIoULoss +from ..module.conv import ConvModule, DepthwiseConvModule +from ..module.init_weights import normal_init +from ..module.nms import multiclass_nms +from .assigner.dsl_assigner import DynamicSoftLabelAssigner +from .gfl_head import Integral, reduce_mean + + +class NanoDetPlusHead(nn.Module): + """Detection head used in NanoDet-Plus. + + Args: + num_classes (int): Number of categories excluding the background + category. + loss (dict): Loss config. + input_channel (int): Number of channels of the input feature. + feat_channels (int): Number of channels of the feature. + Default: 96. + stacked_convs (int): Number of conv layers in the stacked convs. + Default: 2. + kernel_size (int): Size of the convolving kernel. Default: 5. + strides (list[int]): Strides of input multi-level feature maps. + Default: [8, 16, 32]. + conv_type (str): Type of the convolution. + Default: "DWConv". + norm_cfg (dict): Dictionary to construct and config norm layer. + Default: dict(type='BN'). + reg_max (int): The maximal value of the discrete set. Default: 7. + activation (str): Type of activation function. Default: "LeakyReLU". + assigner_cfg (dict): Config dict of the assigner. Default: dict(topk=13). 
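+
+    For every prior the head predicts a vector of length
+    num_classes + 4 * (reg_max + 1): classification logits followed by the
+    discretized distance distribution that the Integral layer decodes into
+    box offsets.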
+ """ + + def __init__( + self, + num_classes, + loss, + input_channel, + feat_channels=96, + stacked_convs=2, + kernel_size=5, + strides=[8, 16, 32], + conv_type="DWConv", + norm_cfg=dict(type="BN"), + reg_max=7, + activation="LeakyReLU", + assigner_cfg=dict(topk=13), + **kwargs + ): + super(NanoDetPlusHead, self).__init__() + self.num_classes = num_classes + self.in_channels = input_channel + self.feat_channels = feat_channels + self.stacked_convs = stacked_convs + self.kernel_size = kernel_size + self.strides = strides + self.reg_max = reg_max + self.activation = activation + self.ConvModule = ConvModule if conv_type == "Conv" else DepthwiseConvModule + + self.loss_cfg = loss + self.norm_cfg = norm_cfg + + self.assigner = DynamicSoftLabelAssigner(**assigner_cfg) + self.distribution_project = Integral(self.reg_max) + + self.loss_qfl = QualityFocalLoss( + beta=self.loss_cfg.loss_qfl.beta, + loss_weight=self.loss_cfg.loss_qfl.loss_weight, + ) + self.loss_dfl = DistributionFocalLoss( + loss_weight=self.loss_cfg.loss_dfl.loss_weight + ) + self.loss_bbox = GIoULoss(loss_weight=self.loss_cfg.loss_bbox.loss_weight) + self._init_layers() + self.init_weights() + + def _init_layers(self): + self.cls_convs = nn.ModuleList() + for _ in self.strides: + cls_convs = self._buid_not_shared_head() + self.cls_convs.append(cls_convs) + + self.gfl_cls = nn.ModuleList( + [ + nn.Conv2d( + self.feat_channels, + self.num_classes + 4 * (self.reg_max + 1), + 1, + padding=0, + ) + for _ in self.strides + ] + ) + + def _buid_not_shared_head(self): + cls_convs = nn.ModuleList() + for i in range(self.stacked_convs): + chn = self.in_channels if i == 0 else self.feat_channels + cls_convs.append( + self.ConvModule( + chn, + self.feat_channels, + self.kernel_size, + stride=1, + padding=self.kernel_size // 2, + norm_cfg=self.norm_cfg, + bias=self.norm_cfg is None, + activation=self.activation, + ) + ) + return cls_convs + + def init_weights(self): + for m in self.cls_convs.modules(): + if isinstance(m, nn.Conv2d): + normal_init(m, std=0.01) + # init cls head with confidence = 0.01 + bias_cls = -4.595 + for i in range(len(self.strides)): + normal_init(self.gfl_cls[i], std=0.01, bias=bias_cls) + print("Finish initialize NanoDet-Plus Head.") + + def forward(self, feats): + if torch.onnx.is_in_onnx_export(): + return self._forward_onnx(feats) + outputs = [] + for feat, cls_convs, gfl_cls in zip( + feats, + self.cls_convs, + self.gfl_cls, + ): + for conv in cls_convs: + feat = conv(feat) + output = gfl_cls(feat) + outputs.append(output.flatten(start_dim=2)) + outputs = torch.cat(outputs, dim=2).permute(0, 2, 1) + return outputs + + def loss(self, preds, gt_meta, aux_preds=None): + """Compute losses. + Args: + preds (Tensor): Prediction output. + gt_meta (dict): Ground truth information. + aux_preds (tuple[Tensor], optional): Auxiliary head prediction output. + + Returns: + loss (Tensor): Loss tensor. + loss_states (dict): State dict of each loss. 
+ """ + gt_bboxes = gt_meta["gt_bboxes"] + gt_labels = gt_meta["gt_labels"] + device = preds.device + batch_size = preds.shape[0] + input_height, input_width = gt_meta["img"].shape[2:] + featmap_sizes = [ + (math.ceil(input_height / stride), math.ceil(input_width) / stride) + for stride in self.strides + ] + # get grid cells of one image + mlvl_center_priors = [ + self.get_single_level_center_priors( + batch_size, + featmap_sizes[i], + stride, + dtype=torch.float32, + device=device, + ) + for i, stride in enumerate(self.strides) + ] + center_priors = torch.cat(mlvl_center_priors, dim=1) + + cls_preds, reg_preds = preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + dis_preds = self.distribution_project(reg_preds) * center_priors[..., 2, None] + decoded_bboxes = distance2bbox(center_priors[..., :2], dis_preds) + + if aux_preds is not None: + # use auxiliary head to assign + aux_cls_preds, aux_reg_preds = aux_preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + aux_dis_preds = ( + self.distribution_project(aux_reg_preds) * center_priors[..., 2, None] + ) + aux_decoded_bboxes = distance2bbox(center_priors[..., :2], aux_dis_preds) + batch_assign_res = multi_apply( + self.target_assign_single_img, + aux_cls_preds.detach(), + center_priors, + aux_decoded_bboxes.detach(), + gt_bboxes, + gt_labels, + ) + else: + # use self prediction to assign + batch_assign_res = multi_apply( + self.target_assign_single_img, + cls_preds.detach(), + center_priors, + decoded_bboxes.detach(), + gt_bboxes, + gt_labels, + ) + + loss, loss_states = self._get_loss_from_assign( + cls_preds, reg_preds, decoded_bboxes, batch_assign_res + ) + + if aux_preds is not None: + aux_loss, aux_loss_states = self._get_loss_from_assign( + aux_cls_preds, aux_reg_preds, aux_decoded_bboxes, batch_assign_res + ) + loss = loss + aux_loss + for k, v in aux_loss_states.items(): + loss_states["aux_" + k] = v + return loss, loss_states + + def _get_loss_from_assign(self, cls_preds, reg_preds, decoded_bboxes, assign): + device = cls_preds.device + labels, label_scores, bbox_targets, dist_targets, num_pos = assign + num_total_samples = max( + reduce_mean(torch.tensor(sum(num_pos)).to(device)).item(), 1.0 + ) + + labels = torch.cat(labels, dim=0) + label_scores = torch.cat(label_scores, dim=0) + bbox_targets = torch.cat(bbox_targets, dim=0) + cls_preds = cls_preds.reshape(-1, self.num_classes) + reg_preds = reg_preds.reshape(-1, 4 * (self.reg_max + 1)) + decoded_bboxes = decoded_bboxes.reshape(-1, 4) + loss_qfl = self.loss_qfl( + cls_preds, (labels, label_scores), avg_factor=num_total_samples + ) + + pos_inds = torch.nonzero( + (labels >= 0) & (labels < self.num_classes), as_tuple=False + ).squeeze(1) + + if len(pos_inds) > 0: + weight_targets = cls_preds[pos_inds].detach().sigmoid().max(dim=1)[0] + bbox_avg_factor = max(reduce_mean(weight_targets.sum()).item(), 1.0) + + loss_bbox = self.loss_bbox( + decoded_bboxes[pos_inds], + bbox_targets[pos_inds], + weight=weight_targets, + avg_factor=bbox_avg_factor, + ) + + dist_targets = torch.cat(dist_targets, dim=0) + loss_dfl = self.loss_dfl( + reg_preds[pos_inds].reshape(-1, self.reg_max + 1), + dist_targets[pos_inds].reshape(-1), + weight=weight_targets[:, None].expand(-1, 4).reshape(-1), + avg_factor=4.0 * bbox_avg_factor, + ) + else: + loss_bbox = reg_preds.sum() * 0 + loss_dfl = reg_preds.sum() * 0 + + loss = loss_qfl + loss_bbox + loss_dfl + loss_states = dict(loss_qfl=loss_qfl, loss_bbox=loss_bbox, loss_dfl=loss_dfl) + return loss, loss_states + + 
@torch.no_grad() + def target_assign_single_img( + self, cls_preds, center_priors, decoded_bboxes, gt_bboxes, gt_labels + ): + """Compute classification, regression, and objectness targets for + priors in a single image. + Args: + cls_preds (Tensor): Classification predictions of one image, + a 2D-Tensor with shape [num_priors, num_classes] + center_priors (Tensor): All priors of one image, a 2D-Tensor with + shape [num_priors, 4] in [cx, xy, stride_w, stride_y] format. + decoded_bboxes (Tensor): Decoded bboxes predictions of one image, + a 2D-Tensor with shape [num_priors, 4] in [tl_x, tl_y, + br_x, br_y] format. + gt_bboxes (Tensor): Ground truth bboxes of one image, a 2D-Tensor + with shape [num_gts, 4] in [tl_x, tl_y, br_x, br_y] format. + gt_labels (Tensor): Ground truth labels of one image, a Tensor + with shape [num_gts]. + """ + + num_priors = center_priors.size(0) + device = center_priors.device + gt_bboxes = torch.from_numpy(gt_bboxes).to(device) + gt_labels = torch.from_numpy(gt_labels).to(device) + num_gts = gt_labels.size(0) + gt_bboxes = gt_bboxes.to(decoded_bboxes.dtype) + + bbox_targets = torch.zeros_like(center_priors) + dist_targets = torch.zeros_like(center_priors) + labels = center_priors.new_full( + (num_priors,), self.num_classes, dtype=torch.long + ) + label_scores = center_priors.new_zeros(labels.shape, dtype=torch.float) + # No target + if num_gts == 0: + return labels, label_scores, bbox_targets, dist_targets, 0 + + assign_result = self.assigner.assign( + cls_preds.sigmoid(), center_priors, decoded_bboxes, gt_bboxes, gt_labels + ) + pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds = self.sample( + assign_result, gt_bboxes + ) + num_pos_per_img = pos_inds.size(0) + pos_ious = assign_result.max_overlaps[pos_inds] + + if len(pos_inds) > 0: + bbox_targets[pos_inds, :] = pos_gt_bboxes + dist_targets[pos_inds, :] = ( + bbox2distance(center_priors[pos_inds, :2], pos_gt_bboxes) + / center_priors[pos_inds, None, 2] + ) + dist_targets = dist_targets.clamp(min=0, max=self.reg_max - 0.1) + labels[pos_inds] = gt_labels[pos_assigned_gt_inds] + label_scores[pos_inds] = pos_ious + return ( + labels, + label_scores, + bbox_targets, + dist_targets, + num_pos_per_img, + ) + + def sample(self, assign_result, gt_bboxes): + """Sample positive and negative bboxes.""" + pos_inds = ( + torch.nonzero(assign_result.gt_inds > 0, as_tuple=False) + .squeeze(-1) + .unique() + ) + neg_inds = ( + torch.nonzero(assign_result.gt_inds == 0, as_tuple=False) + .squeeze(-1) + .unique() + ) + pos_assigned_gt_inds = assign_result.gt_inds[pos_inds] - 1 + + if gt_bboxes.numel() == 0: + # hack for index error case + assert pos_assigned_gt_inds.numel() == 0 + pos_gt_bboxes = torch.empty_like(gt_bboxes).view(-1, 4) + else: + if len(gt_bboxes.shape) < 2: + gt_bboxes = gt_bboxes.view(-1, 4) + pos_gt_bboxes = gt_bboxes[pos_assigned_gt_inds, :] + return pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds + + def post_process(self, preds, meta): + """Prediction results post processing. Decode bboxes and rescale + to original image size. + Args: + preds (Tensor): Prediction output. + meta (dict): Meta info. 
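+
+        Returns:
+            det_results (dict): Mapping from image id to a per-class dict of
+                detections, each given as [x1, y1, x2, y2, score] in the
+                original image coordinates.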
+ """ + cls_scores, bbox_preds = preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + result_list = self.get_bboxes(cls_scores, bbox_preds, meta) + det_results = {} + warp_matrixes = ( + meta["warp_matrix"] + if isinstance(meta["warp_matrix"], list) + else meta["warp_matrix"] + ) + img_heights = ( + meta["img_info"]["height"].cpu().numpy() + if isinstance(meta["img_info"]["height"], torch.Tensor) + else meta["img_info"]["height"] + ) + img_widths = ( + meta["img_info"]["width"].cpu().numpy() + if isinstance(meta["img_info"]["width"], torch.Tensor) + else meta["img_info"]["width"] + ) + img_ids = ( + meta["img_info"]["id"].cpu().numpy() + if isinstance(meta["img_info"]["id"], torch.Tensor) + else meta["img_info"]["id"] + ) + + for result, img_width, img_height, img_id, warp_matrix in zip( + result_list, img_widths, img_heights, img_ids, warp_matrixes + ): + det_result = {} + det_bboxes, det_labels = result + det_bboxes = det_bboxes.detach().cpu().numpy() + det_bboxes[:, :4] = warp_boxes( + det_bboxes[:, :4], np.linalg.inv(warp_matrix), img_width, img_height + ) + classes = det_labels.detach().cpu().numpy() + for i in range(self.num_classes): + inds = classes == i + det_result[i] = np.concatenate( + [ + det_bboxes[inds, :4].astype(np.float32), + det_bboxes[inds, 4:5].astype(np.float32), + ], + axis=1, + ).tolist() + det_results[img_id] = det_result + return det_results + + def show_result( + self, img, dets, class_names, score_thres=0.3, show=True, save_path=None + ): + result, all_box = overlay_bbox_cv(img, dets, class_names, score_thresh=score_thres) + # if show: + # cv2.imshow("det", result) + return result,all_box + + def get_bboxes(self, cls_preds, reg_preds, img_metas): + """Decode the outputs to bboxes. + Args: + cls_preds (Tensor): Shape (num_imgs, num_points, num_classes). + reg_preds (Tensor): Shape (num_imgs, num_points, 4 * (regmax + 1)). + img_metas (dict): Dict of image info. + + Returns: + results_list (list[tuple]): List of detection bboxes and labels. + """ + device = cls_preds.device + b = cls_preds.shape[0] + input_height, input_width = img_metas["img"].shape[2:] + input_shape = (input_height, input_width) + + featmap_sizes = [ + (math.ceil(input_height / stride), math.ceil(input_width) / stride) + for stride in self.strides + ] + # get grid cells of one image + mlvl_center_priors = [ + self.get_single_level_center_priors( + b, + featmap_sizes[i], + stride, + dtype=torch.float32, + device=device, + ) + for i, stride in enumerate(self.strides) + ] + center_priors = torch.cat(mlvl_center_priors, dim=1) + dis_preds = self.distribution_project(reg_preds) * center_priors[..., 2, None] + bboxes = distance2bbox(center_priors[..., :2], dis_preds, max_shape=input_shape) + scores = cls_preds.sigmoid() + result_list = [] + for i in range(b): + # add a dummy background class at the end of all labels + # same with mmdetection2.0 + score, bbox = scores[i], bboxes[i] + padding = score.new_zeros(score.shape[0], 1) + score = torch.cat([score, padding], dim=1) + results = multiclass_nms( + bbox, + score, + score_thr=0.05, + nms_cfg=dict(type="nms", iou_threshold=0.6), + max_num=100, + ) + result_list.append(results) + return result_list + + def get_single_level_center_priors( + self, batch_size, featmap_size, stride, dtype, device + ): + """Generate centers of a single stage feature map. + Args: + batch_size (int): Number of images in one batch. 
+ featmap_size (tuple[int]): height and width of the feature map + stride (int): down sample stride of the feature map + dtype (obj:`torch.dtype`): data type of the tensors + device (obj:`torch.device`): device of the tensors + Return: + priors (Tensor): center priors of a single level feature map. + """ + h, w = featmap_size + x_range = (torch.arange(w, dtype=dtype, device=device)) * stride + y_range = (torch.arange(h, dtype=dtype, device=device)) * stride + y, x = torch.meshgrid(y_range, x_range) + y = y.flatten() + x = x.flatten() + strides = x.new_full((x.shape[0],), stride) + proiors = torch.stack([x, y, strides, strides], dim=-1) + return proiors.unsqueeze(0).repeat(batch_size, 1, 1) + + def _forward_onnx(self, feats): + """only used for onnx export""" + outputs = [] + for feat, cls_convs, gfl_cls in zip( + feats, + self.cls_convs, + self.gfl_cls, + ): + for conv in cls_convs: + feat = conv(feat) + output = gfl_cls(feat) + cls_pred, reg_pred = output.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=1 + ) + cls_pred = cls_pred.sigmoid() + out = torch.cat([cls_pred, reg_pred], dim=1) + outputs.append(out.flatten(start_dim=2)) + return torch.cat(outputs, dim=2).permute(0, 2, 1) diff --git a/nanodet/model/head/simple_conv_head.py b/nanodet/model/head/simple_conv_head.py new file mode 100644 index 0000000..cece6d8 --- /dev/null +++ b/nanodet/model/head/simple_conv_head.py @@ -0,0 +1,100 @@ +import torch +import torch.nn as nn + +from ..module.conv import ConvModule +from ..module.init_weights import normal_init +from ..module.scale import Scale + + +class SimpleConvHead(nn.Module): + def __init__( + self, + num_classes, + input_channel, + feat_channels=256, + stacked_convs=4, + strides=[8, 16, 32], + conv_cfg=None, + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + activation="LeakyReLU", + reg_max=16, + **kwargs + ): + super(SimpleConvHead, self).__init__() + self.num_classes = num_classes + self.in_channels = input_channel + self.feat_channels = feat_channels + self.stacked_convs = stacked_convs + self.strides = strides + self.reg_max = reg_max + + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.activation = activation + self.cls_out_channels = num_classes + + self._init_layers() + self.init_weights() + + def _init_layers(self): + self.relu = nn.ReLU(inplace=True) + self.cls_convs = nn.ModuleList() + self.reg_convs = nn.ModuleList() + for i in range(self.stacked_convs): + chn = self.in_channels if i == 0 else self.feat_channels + self.cls_convs.append( + ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + activation=self.activation, + ) + ) + self.reg_convs.append( + ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + activation=self.activation, + ) + ) + self.gfl_cls = nn.Conv2d( + self.feat_channels, self.cls_out_channels, 3, padding=1 + ) + self.gfl_reg = nn.Conv2d( + self.feat_channels, 4 * (self.reg_max + 1), 3, padding=1 + ) + self.scales = nn.ModuleList([Scale(1.0) for _ in self.strides]) + + def init_weights(self): + for m in self.cls_convs: + normal_init(m.conv, std=0.01) + for m in self.reg_convs: + normal_init(m.conv, std=0.01) + bias_cls = -4.595 + normal_init(self.gfl_cls, std=0.01, bias=bias_cls) + normal_init(self.gfl_reg, std=0.01) + + def forward(self, feats): + outputs = [] + for x, scale in zip(feats, self.scales): + cls_feat = x + reg_feat = x + for cls_conv in self.cls_convs: + cls_feat = 
cls_conv(cls_feat) + for reg_conv in self.reg_convs: + reg_feat = reg_conv(reg_feat) + cls_score = self.gfl_cls(cls_feat) + bbox_pred = scale(self.gfl_reg(reg_feat)).float() + output = torch.cat([cls_score, bbox_pred], dim=1) + outputs.append(output.flatten(start_dim=2)) + outputs = torch.cat(outputs, dim=2).permute(0, 2, 1) + return outputs diff --git a/nanodet/model/loss/gfocal_loss.py b/nanodet/model/loss/gfocal_loss.py new file mode 100644 index 0000000..6759e93 --- /dev/null +++ b/nanodet/model/loss/gfocal_loss.py @@ -0,0 +1,180 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F + +from .utils import weighted_loss + + +@weighted_loss +def quality_focal_loss(pred, target, beta=2.0): + r"""Quality Focal Loss (QFL) is from `Generalized Focal Loss: Learning + Qualified and Distributed Bounding Boxes for Dense Object Detection + `_. + + Args: + pred (torch.Tensor): Predicted joint representation of classification + and quality (IoU) estimation with shape (N, C), C is the number of + classes. + target (tuple([torch.Tensor])): Target category label with shape (N,) + and target quality label with shape (N,). + beta (float): The beta parameter for calculating the modulating factor. + Defaults to 2.0. + + Returns: + torch.Tensor: Loss tensor with shape (N,). + """ + assert ( + len(target) == 2 + ), """target for QFL must be a tuple of two elements, + including category label and quality label, respectively""" + # label denotes the category id, score denotes the quality score + label, score = target + + # negatives are supervised by 0 quality score + pred_sigmoid = pred.sigmoid() + scale_factor = pred_sigmoid + zerolabel = scale_factor.new_zeros(pred.shape) + loss = F.binary_cross_entropy_with_logits( + pred, zerolabel, reduction="none" + ) * scale_factor.pow(beta) + + # FG cat_id: [0, num_classes -1], BG cat_id: num_classes + bg_class_ind = pred.size(1) + pos = torch.nonzero((label >= 0) & (label < bg_class_ind), as_tuple=False).squeeze( + 1 + ) + pos_label = label[pos].long() + # positives are supervised by bbox quality (IoU) score + scale_factor = score[pos] - pred_sigmoid[pos, pos_label] + loss[pos, pos_label] = F.binary_cross_entropy_with_logits( + pred[pos, pos_label], score[pos], reduction="none" + ) * scale_factor.abs().pow(beta) + + loss = loss.sum(dim=1, keepdim=False) + return loss + + +@weighted_loss +def distribution_focal_loss(pred, label): + r"""Distribution Focal Loss (DFL) is from `Generalized Focal Loss: Learning + Qualified and Distributed Bounding Boxes for Dense Object Detection + `_. + + Args: + pred (torch.Tensor): Predicted general distribution of bounding boxes + (before softmax) with shape (N, n+1), n is the max value of the + integral set `{0, ..., n}` in paper. + label (torch.Tensor): Target distance label for bounding boxes with + shape (N,). + + Returns: + torch.Tensor: Loss tensor with shape (N,). + """ + dis_left = label.long() + dis_right = dis_left + 1 + weight_left = dis_right.float() - label + weight_right = label - dis_left.float() + loss = ( + F.cross_entropy(pred, dis_left, reduction="none") * weight_left + + F.cross_entropy(pred, dis_right, reduction="none") * weight_right + ) + return loss + + +class QualityFocalLoss(nn.Module): + r"""Quality Focal Loss (QFL) is a variant of `Generalized Focal Loss: + Learning Qualified and Distributed Bounding Boxes for Dense Object + Detection `_. + + Args: + use_sigmoid (bool): Whether sigmoid operation is conducted in QFL. + Defaults to True. 
+ beta (float): The beta parameter for calculating the modulating factor. + Defaults to 2.0. + reduction (str): Options are "none", "mean" and "sum". + loss_weight (float): Loss weight of current loss. + """ + + def __init__(self, use_sigmoid=True, beta=2.0, reduction="mean", loss_weight=1.0): + super(QualityFocalLoss, self).__init__() + assert use_sigmoid is True, "Only sigmoid in QFL supported now." + self.use_sigmoid = use_sigmoid + self.beta = beta + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, pred, target, weight=None, avg_factor=None, reduction_override=None + ): + """Forward function. + + Args: + pred (torch.Tensor): Predicted joint representation of + classification and quality (IoU) estimation with shape (N, C), + C is the number of classes. + target (tuple([torch.Tensor])): Target category label with shape + (N,) and target quality label with shape (N,). + weight (torch.Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. + """ + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + if self.use_sigmoid: + loss_cls = self.loss_weight * quality_focal_loss( + pred, + target, + weight, + beta=self.beta, + reduction=reduction, + avg_factor=avg_factor, + ) + else: + raise NotImplementedError + return loss_cls + + +class DistributionFocalLoss(nn.Module): + r"""Distribution Focal Loss (DFL) is a variant of `Generalized Focal Loss: + Learning Qualified and Distributed Bounding Boxes for Dense Object + Detection `_. + + Args: + reduction (str): Options are `'none'`, `'mean'` and `'sum'`. + loss_weight (float): Loss weight of current loss. + """ + + def __init__(self, reduction="mean", loss_weight=1.0): + super(DistributionFocalLoss, self).__init__() + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, pred, target, weight=None, avg_factor=None, reduction_override=None + ): + """Forward function. + + Args: + pred (torch.Tensor): Predicted general distribution of bounding + boxes (before softmax) with shape (N, n+1), n is the max value + of the integral set `{0, ..., n}` in paper. + target (torch.Tensor): Target distance label for bounding boxes + with shape (N,). + weight (torch.Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. + """ + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + loss_cls = self.loss_weight * distribution_focal_loss( + pred, target, weight, reduction=reduction, avg_factor=avg_factor + ) + return loss_cls diff --git a/nanodet/model/loss/iou_loss.py b/nanodet/model/loss/iou_loss.py new file mode 100644 index 0000000..f1f3e26 --- /dev/null +++ b/nanodet/model/loss/iou_loss.py @@ -0,0 +1,548 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math + +import torch +import torch.nn as nn + +from .utils import weighted_loss + + +def bbox_overlaps(bboxes1, bboxes2, mode="iou", is_aligned=False, eps=1e-6): + """Calculate overlap between two set of bboxes. + + If ``is_aligned `` is ``False``, then calculate the overlaps between each + bbox of bboxes1 and bboxes2, otherwise the overlaps between each aligned + pair of bboxes1 and bboxes2. + + Args: + bboxes1 (Tensor): shape (B, m, 4) in format or empty. + bboxes2 (Tensor): shape (B, n, 4) in format or empty. + B indicates the batch dim, in shape (B1, B2, ..., Bn). + If ``is_aligned `` is ``True``, then m and n must be equal. + mode (str): "iou" (intersection over union) or "iof" (intersection over + foreground). + is_aligned (bool, optional): If True, then m and n must be equal. + Default False. + eps (float, optional): A value added to the denominator for numerical + stability. Default 1e-6. + + Returns: + Tensor: shape (m, n) if ``is_aligned `` is False else shape (m,) + + Example: + >>> bboxes1 = torch.FloatTensor([ + >>> [0, 0, 10, 10], + >>> [10, 10, 20, 20], + >>> [32, 32, 38, 42], + >>> ]) + >>> bboxes2 = torch.FloatTensor([ + >>> [0, 0, 10, 20], + >>> [0, 10, 10, 19], + >>> [10, 10, 20, 20], + >>> ]) + >>> bbox_overlaps(bboxes1, bboxes2) + tensor([[0.5000, 0.0000, 0.0000], + [0.0000, 0.0000, 1.0000], + [0.0000, 0.0000, 0.0000]]) + >>> bbox_overlaps(bboxes1, bboxes2, mode='giou', eps=1e-7) + tensor([[0.5000, 0.0000, -0.5000], + [-0.2500, -0.0500, 1.0000], + [-0.8371, -0.8766, -0.8214]]) + + Example: + >>> empty = torch.FloatTensor([]) + >>> nonempty = torch.FloatTensor([ + >>> [0, 0, 10, 9], + >>> ]) + >>> assert tuple(bbox_overlaps(empty, nonempty).shape) == (0, 1) + >>> assert tuple(bbox_overlaps(nonempty, empty).shape) == (1, 0) + >>> assert tuple(bbox_overlaps(empty, empty).shape) == (0, 0) + """ + + assert mode in ["iou", "iof", "giou"], f"Unsupported mode {mode}" + # Either the boxes are empty or the length of boxes's last dimenstion is 4 + assert bboxes1.size(-1) == 4 or bboxes1.size(0) == 0 + assert bboxes2.size(-1) == 4 or bboxes2.size(0) == 0 + + # Batch dim must be the same + # Batch dim: (B1, B2, ... 
Bn) + assert bboxes1.shape[:-2] == bboxes2.shape[:-2] + batch_shape = bboxes1.shape[:-2] + + rows = bboxes1.size(-2) + cols = bboxes2.size(-2) + if is_aligned: + assert rows == cols + + if rows * cols == 0: + if is_aligned: + return bboxes1.new(batch_shape + (rows,)) + else: + return bboxes1.new(batch_shape + (rows, cols)) + + area1 = (bboxes1[..., 2] - bboxes1[..., 0]) * (bboxes1[..., 3] - bboxes1[..., 1]) + area2 = (bboxes2[..., 2] - bboxes2[..., 0]) * (bboxes2[..., 3] - bboxes2[..., 1]) + + if is_aligned: + lt = torch.max(bboxes1[..., :2], bboxes2[..., :2]) # [B, rows, 2] + rb = torch.min(bboxes1[..., 2:], bboxes2[..., 2:]) # [B, rows, 2] + + wh = (rb - lt).clamp(min=0) # [B, rows, 2] + overlap = wh[..., 0] * wh[..., 1] + + if mode in ["iou", "giou"]: + union = area1 + area2 - overlap + else: + union = area1 + if mode == "giou": + enclosed_lt = torch.min(bboxes1[..., :2], bboxes2[..., :2]) + enclosed_rb = torch.max(bboxes1[..., 2:], bboxes2[..., 2:]) + else: + lt = torch.max( + bboxes1[..., :, None, :2], bboxes2[..., None, :, :2] + ) # [B, rows, cols, 2] + rb = torch.min( + bboxes1[..., :, None, 2:], bboxes2[..., None, :, 2:] + ) # [B, rows, cols, 2] + + wh = (rb - lt).clamp(min=0) # [B, rows, cols, 2] + overlap = wh[..., 0] * wh[..., 1] + + if mode in ["iou", "giou"]: + union = area1[..., None] + area2[..., None, :] - overlap + else: + union = area1[..., None] + if mode == "giou": + enclosed_lt = torch.min( + bboxes1[..., :, None, :2], bboxes2[..., None, :, :2] + ) + enclosed_rb = torch.max( + bboxes1[..., :, None, 2:], bboxes2[..., None, :, 2:] + ) + + eps = union.new_tensor([eps]) + union = torch.max(union, eps) + ious = overlap / union + if mode in ["iou", "iof"]: + return ious + # calculate gious + enclose_wh = (enclosed_rb - enclosed_lt).clamp(min=0) + enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1] + enclose_area = torch.max(enclose_area, eps) + gious = ious - (enclose_area - union) / enclose_area + return gious + + +@weighted_loss +def iou_loss(pred, target, eps=1e-6): + """IoU loss. + + Computing the IoU loss between a set of predicted bboxes and target bboxes. + The loss is calculated as negative log of IoU. + + Args: + pred (torch.Tensor): Predicted bboxes of format (x1, y1, x2, y2), + shape (n, 4). + target (torch.Tensor): Corresponding gt bboxes, shape (n, 4). + eps (float): Eps to avoid log(0). + + Return: + torch.Tensor: Loss tensor. + """ + ious = bbox_overlaps(pred, target, is_aligned=True).clamp(min=eps) + loss = -ious.log() + return loss + + +@weighted_loss +def bounded_iou_loss(pred, target, beta=0.2, eps=1e-3): + """BIoULoss. + + This is an implementation of paper + `Improving Object Localization with Fitness NMS and Bounded IoU Loss. + `_. + + Args: + pred (torch.Tensor): Predicted bboxes. + target (torch.Tensor): Target bboxes. + beta (float): beta parameter in smoothl1. + eps (float): eps to avoid NaN. 
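+
+    Return:
+        Tensor: Loss tensor.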
+ """ + pred_ctrx = (pred[:, 0] + pred[:, 2]) * 0.5 + pred_ctry = (pred[:, 1] + pred[:, 3]) * 0.5 + pred_w = pred[:, 2] - pred[:, 0] + pred_h = pred[:, 3] - pred[:, 1] + with torch.no_grad(): + target_ctrx = (target[:, 0] + target[:, 2]) * 0.5 + target_ctry = (target[:, 1] + target[:, 3]) * 0.5 + target_w = target[:, 2] - target[:, 0] + target_h = target[:, 3] - target[:, 1] + + dx = target_ctrx - pred_ctrx + dy = target_ctry - pred_ctry + + loss_dx = 1 - torch.max( + (target_w - 2 * dx.abs()) / (target_w + 2 * dx.abs() + eps), + torch.zeros_like(dx), + ) + loss_dy = 1 - torch.max( + (target_h - 2 * dy.abs()) / (target_h + 2 * dy.abs() + eps), + torch.zeros_like(dy), + ) + loss_dw = 1 - torch.min(target_w / (pred_w + eps), pred_w / (target_w + eps)) + loss_dh = 1 - torch.min(target_h / (pred_h + eps), pred_h / (target_h + eps)) + loss_comb = torch.stack([loss_dx, loss_dy, loss_dw, loss_dh], dim=-1).view( + loss_dx.size(0), -1 + ) + + loss = torch.where( + loss_comb < beta, 0.5 * loss_comb * loss_comb / beta, loss_comb - 0.5 * beta + ).sum(dim=-1) + return loss + + +@weighted_loss +def giou_loss(pred, target, eps=1e-7): + r"""`Generalized Intersection over Union: A Metric and A Loss for Bounding + Box Regression `_. + + Args: + pred (torch.Tensor): Predicted bboxes of format (x1, y1, x2, y2), + shape (n, 4). + target (torch.Tensor): Corresponding gt bboxes, shape (n, 4). + eps (float): Eps to avoid log(0). + + Return: + Tensor: Loss tensor. + """ + gious = bbox_overlaps(pred, target, mode="giou", is_aligned=True, eps=eps) + loss = 1 - gious + return loss + + +@weighted_loss +def diou_loss(pred, target, eps=1e-7): + r"""`Implementation of Distance-IoU Loss: Faster and Better + Learning for Bounding Box Regression, https://arxiv.org/abs/1911.08287`_. + + Code is modified from https://github.com/Zzh-tju/DIoU. + + Args: + pred (Tensor): Predicted bboxes of format (x1, y1, x2, y2), + shape (n, 4). + target (Tensor): Corresponding gt bboxes, shape (n, 4). + eps (float): Eps to avoid log(0). + Return: + Tensor: Loss tensor. + """ + # overlap + lt = torch.max(pred[:, :2], target[:, :2]) + rb = torch.min(pred[:, 2:], target[:, 2:]) + wh = (rb - lt).clamp(min=0) + overlap = wh[:, 0] * wh[:, 1] + + # union + ap = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1]) + ag = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1]) + union = ap + ag - overlap + eps + + # IoU + ious = overlap / union + + # enclose area + enclose_x1y1 = torch.min(pred[:, :2], target[:, :2]) + enclose_x2y2 = torch.max(pred[:, 2:], target[:, 2:]) + enclose_wh = (enclose_x2y2 - enclose_x1y1).clamp(min=0) + + cw = enclose_wh[:, 0] + ch = enclose_wh[:, 1] + + c2 = cw**2 + ch**2 + eps + + b1_x1, b1_y1 = pred[:, 0], pred[:, 1] + b1_x2, b1_y2 = pred[:, 2], pred[:, 3] + b2_x1, b2_y1 = target[:, 0], target[:, 1] + b2_x2, b2_y2 = target[:, 2], target[:, 3] + + left = ((b2_x1 + b2_x2) - (b1_x1 + b1_x2)) ** 2 / 4 + right = ((b2_y1 + b2_y2) - (b1_y1 + b1_y2)) ** 2 / 4 + rho2 = left + right + + # DIoU + dious = ious - rho2 / c2 + loss = 1 - dious + return loss + + +@weighted_loss +def ciou_loss(pred, target, eps=1e-7): + r"""`Implementation of paper `Enhancing Geometric Factors into + Model Learning and Inference for Object Detection and Instance + Segmentation `_. + + Code is modified from https://github.com/Zzh-tju/CIoU. + + Args: + pred (Tensor): Predicted bboxes of format (x1, y1, x2, y2), + shape (n, 4). + target (Tensor): Corresponding gt bboxes, shape (n, 4). + eps (float): Eps to avoid log(0). 
+ Return: + Tensor: Loss tensor. + """ + # overlap + lt = torch.max(pred[:, :2], target[:, :2]) + rb = torch.min(pred[:, 2:], target[:, 2:]) + wh = (rb - lt).clamp(min=0) + overlap = wh[:, 0] * wh[:, 1] + + # union + ap = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1]) + ag = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1]) + union = ap + ag - overlap + eps + + # IoU + ious = overlap / union + + # enclose area + enclose_x1y1 = torch.min(pred[:, :2], target[:, :2]) + enclose_x2y2 = torch.max(pred[:, 2:], target[:, 2:]) + enclose_wh = (enclose_x2y2 - enclose_x1y1).clamp(min=0) + + cw = enclose_wh[:, 0] + ch = enclose_wh[:, 1] + + c2 = cw**2 + ch**2 + eps + + b1_x1, b1_y1 = pred[:, 0], pred[:, 1] + b1_x2, b1_y2 = pred[:, 2], pred[:, 3] + b2_x1, b2_y1 = target[:, 0], target[:, 1] + b2_x2, b2_y2 = target[:, 2], target[:, 3] + + w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps + w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps + + left = ((b2_x1 + b2_x2) - (b1_x1 + b1_x2)) ** 2 / 4 + right = ((b2_y1 + b2_y2) - (b1_y1 + b1_y2)) ** 2 / 4 + rho2 = left + right + + factor = 4 / math.pi**2 + v = factor * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2) + + # CIoU + cious = ious - (rho2 / c2 + v**2 / (1 - ious + v)) + loss = 1 - cious + return loss + + +class IoULoss(nn.Module): + """IoULoss. + + Computing the IoU loss between a set of predicted bboxes and target bboxes. + + Args: + eps (float): Eps to avoid log(0). + reduction (str): Options are "none", "mean" and "sum". + loss_weight (float): Weight of loss. + """ + + def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): + super(IoULoss, self).__init__() + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + """Forward function. + + Args: + pred (torch.Tensor): The prediction. + target (torch.Tensor): The learning target of the prediction. + weight (torch.Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. Options are "none", "mean" and "sum". 
+ """ + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + if ( + (weight is not None) + and (not torch.any(weight > 0)) + and (reduction != "none") + ): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + loss = self.loss_weight * iou_loss( + pred, + target, + weight, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss + + +class BoundedIoULoss(nn.Module): + def __init__(self, beta=0.2, eps=1e-3, reduction="mean", loss_weight=1.0): + super(BoundedIoULoss, self).__init__() + self.beta = beta + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + if weight is not None and not torch.any(weight > 0): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + loss = self.loss_weight * bounded_iou_loss( + pred, + target, + weight, + beta=self.beta, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss + + +class GIoULoss(nn.Module): + def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): + super(GIoULoss, self).__init__() + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + if weight is not None and not torch.any(weight > 0): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + loss = self.loss_weight * giou_loss( + pred, + target, + weight, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss + + +class DIoULoss(nn.Module): + def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): + super(DIoULoss, self).__init__() + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + if weight is not None and not torch.any(weight > 0): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + loss = self.loss_weight * diou_loss( + pred, + target, + weight, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss + + +class CIoULoss(nn.Module): + def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): + super(CIoULoss, self).__init__() + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + if weight is not None and not torch.any(weight > 0): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else 
self.reduction + loss = self.loss_weight * ciou_loss( + pred, + target, + weight, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss diff --git a/nanodet/model/loss/utils.py b/nanodet/model/loss/utils.py new file mode 100644 index 0000000..f8bae7d --- /dev/null +++ b/nanodet/model/loss/utils.py @@ -0,0 +1,93 @@ +import functools + +import torch.nn.functional as F + + +def reduce_loss(loss, reduction): + """Reduce loss as specified. + + Args: + loss (Tensor): Elementwise loss tensor. + reduction (str): Options are "none", "mean" and "sum". + + Return: + Tensor: Reduced loss tensor. + """ + reduction_enum = F._Reduction.get_enum(reduction) + # none: 0, elementwise_mean:1, sum: 2 + if reduction_enum == 0: + return loss + elif reduction_enum == 1: + return loss.mean() + elif reduction_enum == 2: + return loss.sum() + + +def weight_reduce_loss(loss, weight=None, reduction="mean", avg_factor=None): + """Apply element-wise weight and reduce loss. + + Args: + loss (Tensor): Element-wise loss. + weight (Tensor): Element-wise weights. + reduction (str): Same as built-in losses of PyTorch. + avg_factor (float): Avarage factor when computing the mean of losses. + + Returns: + Tensor: Processed loss values. + """ + # if weight is specified, apply element-wise weight + if weight is not None: + loss = loss * weight + + # if avg_factor is not specified, just reduce the loss + if avg_factor is None: + loss = reduce_loss(loss, reduction) + else: + # if reduction is mean, then average the loss by avg_factor + if reduction == "mean": + loss = loss.sum() / avg_factor + # if reduction is 'none', then do nothing, otherwise raise an error + elif reduction != "none": + raise ValueError('avg_factor can not be used with reduction="sum"') + return loss + + +def weighted_loss(loss_func): + """Create a weighted version of a given loss function. + + To use this decorator, the loss function must have the signature like + `loss_func(pred, target, **kwargs)`. The function only needs to compute + element-wise loss without any reduction. This decorator will add weight + and reduction arguments to the function. The decorated function will have + the signature like `loss_func(pred, target, weight=None, reduction='mean', + avg_factor=None, **kwargs)`. + + :Example: + + >>> import torch + >>> @weighted_loss + >>> def l1_loss(pred, target): + >>> return (pred - target).abs() + + >>> pred = torch.Tensor([0, 2, 3]) + >>> target = torch.Tensor([1, 1, 1]) + >>> weight = torch.Tensor([1, 0, 1]) + + >>> l1_loss(pred, target) + tensor(1.3333) + >>> l1_loss(pred, target, weight) + tensor(1.) + >>> l1_loss(pred, target, reduction='none') + tensor([1., 1., 2.]) + >>> l1_loss(pred, target, weight, avg_factor=2) + tensor(1.5000) + """ + + @functools.wraps(loss_func) + def wrapper(pred, target, weight=None, reduction="mean", avg_factor=None, **kwargs): + # get element-wise loss + loss = loss_func(pred, target, **kwargs) + loss = weight_reduce_loss(loss, weight, reduction, avg_factor) + return loss + + return wrapper diff --git a/nanodet/model/module/activation.py b/nanodet/model/module/activation.py new file mode 100644 index 0000000..8047fc8 --- /dev/null +++ b/nanodet/model/module/activation.py @@ -0,0 +1,41 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.nn as nn + +activations = { + "ReLU": nn.ReLU, + "LeakyReLU": nn.LeakyReLU, + "ReLU6": nn.ReLU6, + "SELU": nn.SELU, + "ELU": nn.ELU, + "GELU": nn.GELU, + "PReLU": nn.PReLU, + "SiLU": nn.SiLU, + "HardSwish": nn.Hardswish, + "Hardswish": nn.Hardswish, + None: nn.Identity, +} + + +def act_layers(name): + assert name in activations.keys() + if name == "LeakyReLU": + return nn.LeakyReLU(negative_slope=0.1, inplace=True) + elif name == "GELU": + return nn.GELU() + elif name == "PReLU": + return nn.PReLU() + else: + return activations[name](inplace=True) diff --git a/nanodet/model/module/conv.py b/nanodet/model/module/conv.py new file mode 100644 index 0000000..f35f0b6 --- /dev/null +++ b/nanodet/model/module/conv.py @@ -0,0 +1,392 @@ +""" +ConvModule refers from MMDetection +RepVGGConvModule refers from RepVGG: Making VGG-style ConvNets Great Again +""" +import warnings + +import numpy as np +import torch +import torch.nn as nn + +from .activation import act_layers +from .init_weights import constant_init, kaiming_init +from .norm import build_norm_layer + + +class ConvModule(nn.Module): + """A conv block that contains conv/norm/activation layers. + + Args: + in_channels (int): Same as nn.Conv2d. + out_channels (int): Same as nn.Conv2d. + kernel_size (int or tuple[int]): Same as nn.Conv2d. + stride (int or tuple[int]): Same as nn.Conv2d. + padding (int or tuple[int]): Same as nn.Conv2d. + dilation (int or tuple[int]): Same as nn.Conv2d. + groups (int): Same as nn.Conv2d. + bias (bool or str): If specified as `auto`, it will be decided by the + norm_cfg. Bias will be set as True if norm_cfg is None, otherwise + False. + conv_cfg (dict): Config dict for convolution layer. + norm_cfg (dict): Config dict for normalization layer. + activation (str): activation layer, "ReLU" by default. + inplace (bool): Whether to use inplace mode for activation. + order (tuple[str]): The order of conv/norm/activation layers. It is a + sequence of "conv", "norm" and "act". Examples are + ("conv", "norm", "act") and ("act", "conv", "norm"). + """ + + def __init__( + self, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + bias="auto", + conv_cfg=None, + norm_cfg=None, + activation="ReLU", + inplace=True, + order=("conv", "norm", "act"), + ): + super(ConvModule, self).__init__() + assert conv_cfg is None or isinstance(conv_cfg, dict) + assert norm_cfg is None or isinstance(norm_cfg, dict) + assert activation is None or isinstance(activation, str) + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.activation = activation + self.inplace = inplace + self.order = order + assert isinstance(self.order, tuple) and len(self.order) == 3 + assert set(order) == {"conv", "norm", "act"} + + self.with_norm = norm_cfg is not None + # if the conv layer is before a norm layer, bias is unnecessary. 
+ if bias == "auto": + bias = False if self.with_norm else True + self.with_bias = bias + + if self.with_norm and self.with_bias: + warnings.warn("ConvModule has norm and bias at the same time") + + # build convolution layer + self.conv = nn.Conv2d( # + in_channels, + out_channels, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + bias=bias, + ) + # export the attributes of self.conv to a higher level for convenience + self.in_channels = self.conv.in_channels + self.out_channels = self.conv.out_channels + self.kernel_size = self.conv.kernel_size + self.stride = self.conv.stride + self.padding = self.conv.padding + self.dilation = self.conv.dilation + self.transposed = self.conv.transposed + self.output_padding = self.conv.output_padding + self.groups = self.conv.groups + + # build normalization layers + if self.with_norm: + # norm layer is after conv layer + if order.index("norm") > order.index("conv"): + norm_channels = out_channels + else: + norm_channels = in_channels + self.norm_name, norm = build_norm_layer(norm_cfg, norm_channels) + self.add_module(self.norm_name, norm) + else: + self.norm_name = None + + # build activation layer + if self.activation: + self.act = act_layers(self.activation) + + # Use msra init by default + self.init_weights() + + @property + def norm(self): + if self.norm_name: + return getattr(self, self.norm_name) + else: + return None + + def init_weights(self): + if self.activation == "LeakyReLU": + nonlinearity = "leaky_relu" + else: + nonlinearity = "relu" + kaiming_init(self.conv, nonlinearity=nonlinearity) + if self.with_norm: + constant_init(self.norm, 1, bias=0) + + def forward(self, x, norm=True): + for layer in self.order: + if layer == "conv": + x = self.conv(x) + elif layer == "norm" and norm and self.with_norm: + x = self.norm(x) + elif layer == "act" and self.activation: + x = self.act(x) + return x + + +class DepthwiseConvModule(nn.Module): + def __init__( + self, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + bias="auto", + norm_cfg=dict(type="BN"), + activation="ReLU", + inplace=True, + order=("depthwise", "dwnorm", "act", "pointwise", "pwnorm", "act"), + ): + super(DepthwiseConvModule, self).__init__() + assert activation is None or isinstance(activation, str) + self.activation = activation + self.inplace = inplace + self.order = order + assert isinstance(self.order, tuple) and len(self.order) == 6 + assert set(order) == { + "depthwise", + "dwnorm", + "act", + "pointwise", + "pwnorm", + "act", + } + + self.with_norm = norm_cfg is not None + # if the conv layer is before a norm layer, bias is unnecessary. 
+ if bias == "auto": + bias = False if self.with_norm else True + self.with_bias = bias + + if self.with_norm and self.with_bias: + warnings.warn("ConvModule has norm and bias at the same time") + + # build convolution layer + self.depthwise = nn.Conv2d( + in_channels, + in_channels, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=in_channels, + bias=bias, + ) + self.pointwise = nn.Conv2d( + in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=bias + ) + + # export the attributes of self.conv to a higher level for convenience + self.in_channels = self.depthwise.in_channels + self.out_channels = self.pointwise.out_channels + self.kernel_size = self.depthwise.kernel_size + self.stride = self.depthwise.stride + self.padding = self.depthwise.padding + self.dilation = self.depthwise.dilation + self.transposed = self.depthwise.transposed + self.output_padding = self.depthwise.output_padding + + # build normalization layers + if self.with_norm: + # norm layer is after conv layer + _, self.dwnorm = build_norm_layer(norm_cfg, in_channels) + _, self.pwnorm = build_norm_layer(norm_cfg, out_channels) + + # build activation layer + if self.activation: + self.act = act_layers(self.activation) + + # Use msra init by default + self.init_weights() + + def init_weights(self): + if self.activation == "LeakyReLU": + nonlinearity = "leaky_relu" + else: + nonlinearity = "relu" + kaiming_init(self.depthwise, nonlinearity=nonlinearity) + kaiming_init(self.pointwise, nonlinearity=nonlinearity) + if self.with_norm: + constant_init(self.dwnorm, 1, bias=0) + constant_init(self.pwnorm, 1, bias=0) + + def forward(self, x, norm=True): + for layer_name in self.order: + if layer_name != "act": + layer = self.__getattr__(layer_name) + x = layer(x) + elif layer_name == "act" and self.activation: + x = self.act(x) + return x + + +class RepVGGConvModule(nn.Module): + """ + RepVGG Conv Block from paper RepVGG: Making VGG-style ConvNets Great Again + https://arxiv.org/abs/2101.03697 + https://github.com/DingXiaoH/RepVGG + """ + + def __init__( + self, + in_channels, + out_channels, + kernel_size=3, + stride=1, + padding=1, + dilation=1, + groups=1, + activation="ReLU", + padding_mode="zeros", + deploy=False, + **kwargs + ): + super(RepVGGConvModule, self).__init__() + assert activation is None or isinstance(activation, str) + self.activation = activation + + self.deploy = deploy + self.groups = groups + self.in_channels = in_channels + + assert kernel_size == 3 + assert padding == 1 + + padding_11 = padding - kernel_size // 2 + + # build activation layer + if self.activation: + self.act = act_layers(self.activation) + + if deploy: + self.rbr_reparam = nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + bias=True, + padding_mode=padding_mode, + ) + + else: + self.rbr_identity = ( + nn.BatchNorm2d(num_features=in_channels) + if out_channels == in_channels and stride == 1 + else None + ) + + self.rbr_dense = nn.Sequential( + nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=groups, + bias=False, + ), + nn.BatchNorm2d(num_features=out_channels), + ) + + self.rbr_1x1 = nn.Sequential( + nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=stride, + padding=padding_11, + groups=groups, + bias=False, + ), + 
nn.BatchNorm2d(num_features=out_channels), + ) + print("RepVGG Block, identity = ", self.rbr_identity) + + def forward(self, inputs): + if hasattr(self, "rbr_reparam"): + return self.act(self.rbr_reparam(inputs)) + + if self.rbr_identity is None: + id_out = 0 + else: + id_out = self.rbr_identity(inputs) + + return self.act(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out) + + # This func derives the equivalent kernel and bias in a DIFFERENTIABLE way. + # You can get the equivalent kernel and bias at any time and do whatever you want, + # for example, apply some penalties or constraints during training, just like you + # do to the other models. May be useful for quantization or pruning. + def get_equivalent_kernel_bias(self): + kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense) + kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1) + kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity) + return ( + kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, + bias3x3 + bias1x1 + biasid, + ) + + def _pad_1x1_to_3x3_tensor(self, kernel1x1): + if kernel1x1 is None: + return 0 + else: + return nn.functional.pad(kernel1x1, [1, 1, 1, 1]) + + def _fuse_bn_tensor(self, branch): + if branch is None: + return 0, 0 + if isinstance(branch, nn.Sequential): + kernel = branch[0].weight + running_mean = branch[1].running_mean + running_var = branch[1].running_var + gamma = branch[1].weight + beta = branch[1].bias + eps = branch[1].eps + else: + assert isinstance(branch, nn.BatchNorm2d) + if not hasattr(self, "id_tensor"): + input_dim = self.in_channels // self.groups + kernel_value = np.zeros( + (self.in_channels, input_dim, 3, 3), dtype=np.float32 + ) + for i in range(self.in_channels): + kernel_value[i, i % input_dim, 1, 1] = 1 + self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device) + kernel = self.id_tensor + running_mean = branch.running_mean + running_var = branch.running_var + gamma = branch.weight + beta = branch.bias + eps = branch.eps + std = (running_var + eps).sqrt() + t = (gamma / std).reshape(-1, 1, 1, 1) + return kernel * t, beta - running_mean * gamma / std + + def repvgg_convert(self): + kernel, bias = self.get_equivalent_kernel_bias() + return ( + kernel.detach().cpu().numpy(), + bias.detach().cpu().numpy(), + ) diff --git a/nanodet/model/module/init_weights.py b/nanodet/model/module/init_weights.py new file mode 100644 index 0000000..27da85c --- /dev/null +++ b/nanodet/model/module/init_weights.py @@ -0,0 +1,43 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. 
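+
+# Usage note (illustrative sketch, not part of the upstream file): these helpers
+# wrap torch.nn.init and are applied module-by-module, exactly as ConvModule and
+# DepthwiseConvModule above do in their init_weights(), e.g.
+#
+#   conv = nn.Conv2d(3, 16, kernel_size=3)
+#   kaiming_init(conv, nonlinearity="leaky_relu")  # Kaiming-normal weights, zero bias
+#   bn = nn.BatchNorm2d(16)
+#   constant_init(bn, 1, bias=0)                   # weight (gamma) = 1, bias (beta) = 0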
+ +import torch.nn as nn + + +def kaiming_init( + module, a=0, mode="fan_out", nonlinearity="relu", bias=0, distribution="normal" +): + assert distribution in ["uniform", "normal"] + if distribution == "uniform": + nn.init.kaiming_uniform_( + module.weight, a=a, mode=mode, nonlinearity=nonlinearity + ) + else: + nn.init.kaiming_normal_( + module.weight, a=a, mode=mode, nonlinearity=nonlinearity + ) + if hasattr(module, "bias") and module.bias is not None: + nn.init.constant_(module.bias, bias) + + +def xavier_init(module, gain=1, bias=0, distribution="normal"): + assert distribution in ["uniform", "normal"] + if distribution == "uniform": + nn.init.xavier_uniform_(module.weight, gain=gain) + else: + nn.init.xavier_normal_(module.weight, gain=gain) + if hasattr(module, "bias") and module.bias is not None: + nn.init.constant_(module.bias, bias) + + +def normal_init(module, mean=0, std=1, bias=0): + nn.init.normal_(module.weight, mean, std) + if hasattr(module, "bias") and module.bias is not None: + nn.init.constant_(module.bias, bias) + + +def constant_init(module, val, bias=0): + if hasattr(module, "weight") and module.weight is not None: + nn.init.constant_(module.weight, val) + if hasattr(module, "bias") and module.bias is not None: + nn.init.constant_(module.bias, bias) diff --git a/nanodet/model/module/nms.py b/nanodet/model/module/nms.py new file mode 100644 index 0000000..e5fa3e2 --- /dev/null +++ b/nanodet/model/module/nms.py @@ -0,0 +1,122 @@ +import torch +from torchvision.ops import nms + + +def multiclass_nms( + multi_bboxes, multi_scores, score_thr, nms_cfg, max_num=-1, score_factors=None +): + """NMS for multi-class bboxes. + + Args: + multi_bboxes (Tensor): shape (n, #class*4) or (n, 4) + multi_scores (Tensor): shape (n, #class), where the last column + contains scores of the background class, but this will be ignored. + score_thr (float): bbox threshold, bboxes with scores lower than it + will not be considered. + nms_thr (float): NMS IoU threshold + max_num (int): if there are more than max_num bboxes after NMS, + only top max_num will be kept. + score_factors (Tensor): The factors multiplied to scores before + applying NMS + + Returns: + tuple: (bboxes, labels), tensors of shape (k, 5) and (k, 1). Labels \ + are 0-based. 
+ """ + num_classes = multi_scores.size(1) - 1 + # exclude background category + if multi_bboxes.shape[1] > 4: + bboxes = multi_bboxes.view(multi_scores.size(0), -1, 4) + else: + bboxes = multi_bboxes[:, None].expand(multi_scores.size(0), num_classes, 4) + scores = multi_scores[:, :-1] + + # filter out boxes with low scores + valid_mask = scores > score_thr + + # We use masked_select for ONNX exporting purpose, + # which is equivalent to bboxes = bboxes[valid_mask] + # we have to use this ugly code + bboxes = torch.masked_select( + bboxes, torch.stack((valid_mask, valid_mask, valid_mask, valid_mask), -1) + ).view(-1, 4) + if score_factors is not None: + scores = scores * score_factors[:, None] + scores = torch.masked_select(scores, valid_mask) + labels = valid_mask.nonzero(as_tuple=False)[:, 1] + + if bboxes.numel() == 0: + bboxes = multi_bboxes.new_zeros((0, 5)) + labels = multi_bboxes.new_zeros((0,), dtype=torch.long) + + if torch.onnx.is_in_onnx_export(): + raise RuntimeError( + "[ONNX Error] Can not record NMS " + "as it has not been executed this time" + ) + return bboxes, labels + + dets, keep = batched_nms(bboxes, scores, labels, nms_cfg) + + if max_num > 0: + dets = dets[:max_num] + keep = keep[:max_num] + + return dets, labels[keep] + + +def batched_nms(boxes, scores, idxs, nms_cfg, class_agnostic=False): + """Performs non-maximum suppression in a batched fashion. + Modified from https://github.com/pytorch/vision/blob + /505cd6957711af790211896d32b40291bea1bc21/torchvision/ops/boxes.py#L39. + In order to perform NMS independently per class, we add an offset to all + the boxes. The offset is dependent only on the class idx, and is large + enough so that boxes from different classes do not overlap. + Arguments: + boxes (torch.Tensor): boxes in shape (N, 4). + scores (torch.Tensor): scores in shape (N, ). + idxs (torch.Tensor): each index value correspond to a bbox cluster, + and NMS will not be applied between elements of different idxs, + shape (N, ). + nms_cfg (dict): specify nms type and other parameters like iou_thr. + Possible keys includes the following. + - iou_thr (float): IoU threshold used for NMS. + - split_thr (float): threshold number of boxes. In some cases the + number of boxes is large (e.g., 200k). To avoid OOM during + training, the users could set `split_thr` to a small value. + If the number of boxes is greater than the threshold, it will + perform NMS on each group of boxes separately and sequentially. + Defaults to 10000. + class_agnostic (bool): if true, nms is class agnostic, + i.e. IoU thresholding happens over all boxes, + regardless of the predicted class. + Returns: + tuple: kept dets and indice. 
+ """ + nms_cfg_ = nms_cfg.copy() + class_agnostic = nms_cfg_.pop("class_agnostic", class_agnostic) + if class_agnostic: + boxes_for_nms = boxes + else: + max_coordinate = boxes.max() + offsets = idxs.to(boxes) * (max_coordinate + 1) + boxes_for_nms = boxes + offsets[:, None] + nms_cfg_.pop("type", "nms") + split_thr = nms_cfg_.pop("split_thr", 10000) + if len(boxes_for_nms) < split_thr: + keep = nms(boxes_for_nms, scores, **nms_cfg_) + boxes = boxes[keep] + scores = scores[keep] + else: + total_mask = scores.new_zeros(scores.size(), dtype=torch.bool) + for id in torch.unique(idxs): + mask = (idxs == id).nonzero(as_tuple=False).view(-1) + keep = nms(boxes_for_nms[mask], scores[mask], **nms_cfg_) + total_mask[mask[keep]] = True + + keep = total_mask.nonzero(as_tuple=False).view(-1) + keep = keep[scores[keep].argsort(descending=True)] + boxes = boxes[keep] + scores = scores[keep] + + return torch.cat([boxes, scores[:, None]], -1), keep diff --git a/nanodet/model/module/norm.py b/nanodet/model/module/norm.py new file mode 100644 index 0000000..b9dd8f4 --- /dev/null +++ b/nanodet/model/module/norm.py @@ -0,0 +1,55 @@ +import torch.nn as nn + +norm_cfg = { + # format: layer_type: (abbreviation, module) + "BN": ("bn", nn.BatchNorm2d), + "SyncBN": ("bn", nn.SyncBatchNorm), + "GN": ("gn", nn.GroupNorm), + # and potentially 'SN' +} + + +def build_norm_layer(cfg, num_features, postfix=""): + """Build normalization layer + + Args: + cfg (dict): cfg should contain: + type (str): identify norm layer type. + layer args: args needed to instantiate a norm layer. + requires_grad (bool): [optional] whether stop gradient updates + num_features (int): number of channels from input. + postfix (int, str): appended into norm abbreviation to + create named layer. + + Returns: + name (str): abbreviation + postfix + layer (nn.Module): created norm layer + """ + assert isinstance(cfg, dict) and "type" in cfg + cfg_ = cfg.copy() + + layer_type = cfg_.pop("type") + if layer_type not in norm_cfg: + raise KeyError("Unrecognized norm type {}".format(layer_type)) + else: + abbr, norm_layer = norm_cfg[layer_type] + if norm_layer is None: + raise NotImplementedError + + assert isinstance(postfix, (int, str)) + name = abbr + str(postfix) + + requires_grad = cfg_.pop("requires_grad", True) + cfg_.setdefault("eps", 1e-5) + if layer_type != "GN": + layer = norm_layer(num_features, **cfg_) + if layer_type == "SyncBN" and hasattr(layer, "_specify_ddp_gpu_num"): + layer._specify_ddp_gpu_num(1) + else: + assert "num_groups" in cfg_ + layer = norm_layer(num_channels=num_features, **cfg_) + + for param in layer.parameters(): + param.requires_grad = requires_grad + + return name, layer diff --git a/nanodet/model/module/scale.py b/nanodet/model/module/scale.py new file mode 100644 index 0000000..2461af8 --- /dev/null +++ b/nanodet/model/module/scale.py @@ -0,0 +1,15 @@ +import torch +import torch.nn as nn + + +class Scale(nn.Module): + """ + A learnable scale parameter + """ + + def __init__(self, scale=1.0): + super(Scale, self).__init__() + self.scale = nn.Parameter(torch.tensor(scale, dtype=torch.float)) + + def forward(self, x): + return x * self.scale diff --git a/nanodet/model/module/transformer.py b/nanodet/model/module/transformer.py new file mode 100644 index 0000000..2856df6 --- /dev/null +++ b/nanodet/model/module/transformer.py @@ -0,0 +1,138 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.nn as nn + +from nanodet.model.module.activation import act_layers +from nanodet.model.module.conv import ConvModule + + +class MLP(nn.Module): + def __init__( + self, in_dim, hidden_dim=None, out_dim=None, drop=0.0, activation="GELU" + ): + super(MLP, self).__init__() + out_dim = out_dim or in_dim + hidden_dim = hidden_dim or in_dim + self.fc1 = nn.Linear(in_dim, hidden_dim) + self.act = act_layers(activation) + self.fc2 = nn.Linear(hidden_dim, out_dim) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + x = self.drop(x) + x = self.fc2(x) + x = self.drop(x) + return x + + +class TransformerEncoder(nn.Module): + """ + Encoder layer of transformer + :param dim: feature dimension + :param num_heads: number of attention heads + :param mlp_ratio: hidden layer dimension expand ratio in MLP + :param dropout_ratio: probability of an element to be zeroed + :param activation: activation layer type + :param kv_bias: add bias on key and values + """ + + def __init__( + self, + dim, + num_heads, + mlp_ratio, + dropout_ratio=0.0, + activation="GELU", + kv_bias=False, + ): + super(TransformerEncoder, self).__init__() + self.norm1 = nn.LayerNorm(dim) + + # embed_dim must be divisible by num_heads + assert dim // num_heads * num_heads == dim + self.attn = nn.MultiheadAttention( + embed_dim=dim, + num_heads=num_heads, + dropout=dropout_ratio, + add_bias_kv=kv_bias, + ) + self.norm2 = nn.LayerNorm(dim) + self.mlp = MLP( + in_dim=dim, + hidden_dim=int(dim * mlp_ratio), + drop=dropout_ratio, + activation=activation, + ) + + def forward(self, x): + _x = self.norm1(x) + x = x + self.attn(_x, _x, _x)[0] + x = x + self.mlp(self.norm2(x)) + return x + + +class TransformerBlock(nn.Module): + """ + Block of transformer encoder layers. Used in vision task. 
+ :param in_channels: input channels + :param out_channels: output channels + :param num_heads: number of attention heads + :param num_encoders: number of transformer encoder layers + :param mlp_ratio: hidden layer dimension expand ratio in MLP + :param dropout_ratio: probability of an element to be zeroed + :param activation: activation layer type + :param kv_bias: add bias on key and values + """ + + def __init__( + self, + in_channels, + out_channels, + num_heads, + num_encoders=1, + mlp_ratio=1, + dropout_ratio=0.0, + kv_bias=False, + activation="GELU", + ): + super(TransformerBlock, self).__init__() + + # out_channels must be divisible by num_heads + assert out_channels // num_heads * num_heads == out_channels + + self.conv = ( + nn.Identity() + if in_channels == out_channels + else ConvModule(in_channels, out_channels, 1) + ) + self.linear = nn.Linear(out_channels, out_channels) + encoders = [ + TransformerEncoder( + out_channels, num_heads, mlp_ratio, dropout_ratio, activation, kv_bias + ) + for _ in range(num_encoders) + ] + self.encoders = nn.Sequential(*encoders) + + def forward(self, x, pos_embed): + b, _, h, w = x.shape + x = self.conv(x) + x = x.flatten(2).permute(2, 0, 1) + x = x + pos_embed + x = self.encoders(x) + x = x.permute(1, 2, 0).reshape(b, -1, h, w) + return x diff --git a/nanodet/model/weight_averager/__init__.py b/nanodet/model/weight_averager/__init__.py new file mode 100644 index 0000000..67d649d --- /dev/null +++ b/nanodet/model/weight_averager/__init__.py @@ -0,0 +1,26 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy + +from .ema import ExpMovingAverager + + +def build_weight_averager(cfg, device="cpu"): + cfg = copy.deepcopy(cfg) + name = cfg.pop("name") + if name == "ExpMovingAverager": + return ExpMovingAverager(**cfg, device=device) + else: + raise NotImplementedError(f"{name} is not implemented") diff --git a/nanodet/model/weight_averager/ema.py b/nanodet/model/weight_averager/ema.py new file mode 100644 index 0000000..a2c5fba --- /dev/null +++ b/nanodet/model/weight_averager/ema.py @@ -0,0 +1,80 @@ +# Copyright 2021 RangiLyu. All rights reserved. +# ===================================================================== +# Modified from: https://github.com/facebookresearch/d2go +# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +# Licensed under the Apache License, Version 2.0 (the "License") +import itertools +import math +from typing import Any, Dict, Optional + +import torch +import torch.nn as nn + + +class ExpMovingAverager(object): + """Exponential Moving Average. + + Args: + decay (float): EMA decay factor, should be in [0, 1]. A decay of 0 corresponds + to always using the latest value (no EMA) and a decay of 1 corresponds to + not updating weights after initialization. Default to 0.9998. + device (str): If not None, move EMA state to device. 
+ """ + + def __init__(self, decay: float = 0.9998, device: Optional[str] = None): + if decay < 0 or decay > 1.0: + raise ValueError(f"Decay should be in [0, 1], {decay} was given.") + self.decay: float = decay + self.state: Dict[str, Any] = {} + self.device: Optional[str] = device + + def load_from(self, model: nn.Module) -> None: + """Load state from the model.""" + self.state.clear() + for name, val in self._get_model_state_iterator(model): + val = val.detach().clone() + self.state[name] = val.to(self.device) if self.device else val + + def has_inited(self) -> bool: + return len(self.state) > 0 + + def apply_to(self, model: nn.Module) -> None: + """Apply EMA state to the model.""" + with torch.no_grad(): + for name, val in self._get_model_state_iterator(model): + assert ( + name in self.state + ), f"Name {name} not exist, available names are {self.state.keys()}" + val.copy_(self.state[name]) + + def state_dict(self) -> Dict[str, Any]: + return self.state + + def load_state_dict(self, state_dict: Dict[str, Any]) -> None: + self.state.clear() + for name, val in state_dict.items(): + self.state[name] = val.to(self.device) if self.device else val + + def to(self, device: torch.device) -> None: + """moves EMA state to device.""" + for name, val in self.state.items(): + self.state[name] = val.to(device) + + def _get_model_state_iterator(self, model: nn.Module): + param_iter = model.named_parameters() + # pyre-fixme[16]: `nn.Module` has no attribute `named_buffers`. + buffer_iter = model.named_buffers() + return itertools.chain(param_iter, buffer_iter) + + def calculate_dacay(self, iteration: int) -> float: + decay = (self.decay) * math.exp(-(1 + iteration) / 2000) + (1 - self.decay) + return decay + + def update(self, model: nn.Module, iteration: int) -> None: + decay = self.calculate_dacay(iteration) + with torch.no_grad(): + for name, val in self._get_model_state_iterator(model): + ema_val = self.state[name] + if self.device: + val = val.to(self.device) + ema_val.copy_(ema_val * (1 - decay) + val * decay) diff --git a/nanodet/optim/__init__.py b/nanodet/optim/__init__.py new file mode 100644 index 0000000..c4974b9 --- /dev/null +++ b/nanodet/optim/__init__.py @@ -0,0 +1,3 @@ +from .builder import build_optimizer + +__all__ = ["build_optimizer"] diff --git a/nanodet/optim/builder.py b/nanodet/optim/builder.py new file mode 100644 index 0000000..afcb114 --- /dev/null +++ b/nanodet/optim/builder.py @@ -0,0 +1,76 @@ +import copy +import logging + +import torch +from torch.nn import GroupNorm, LayerNorm +from torch.nn.modules.batchnorm import _BatchNorm + +NORMS = (GroupNorm, LayerNorm, _BatchNorm) + + +def build_optimizer(model, config): + """Build optimizer from config. + + Supports customised parameter-level hyperparameters. 
+ The config should be like: + >>> optimizer: + >>> name: AdamW + >>> lr: 0.001 + >>> weight_decay: 0.05 + >>> no_norm_decay: True + >>> param_level_cfg: # parameter-level config + >>> backbone: + >>> lr_mult: 0.1 + """ + config = copy.deepcopy(config) + param_dict = {} + no_norm_decay = config.pop("no_norm_decay", False) + no_bias_decay = config.pop("no_bias_decay", False) + param_level_cfg = config.pop("param_level_cfg", {}) + base_lr = config.get("lr", None) + base_wd = config.get("weight_decay", None) + + name = config.pop("name") + optim_cls = getattr(torch.optim, name) + + logger = logging.getLogger("NanoDet") + + # custom param-wise lr and weight_decay + for name, p in model.named_parameters(): + if not p.requires_grad: + continue + param_dict[p] = {"name": name} + + for key in param_level_cfg: + if key in name: + if "lr_mult" in param_level_cfg[key] and base_lr: + param_dict[p].update( + {"lr": base_lr * param_level_cfg[key]["lr_mult"]} + ) + if "decay_mult" in param_level_cfg[key] and base_wd: + param_dict[p].update( + {"weight_decay": base_wd * param_level_cfg[key]["decay_mult"]} + ) + break + if no_norm_decay: + # update norms decay + for name, m in model.named_modules(): + if isinstance(m, NORMS): + param_dict[m.bias].update({"weight_decay": 0}) + param_dict[m.weight].update({"weight_decay": 0}) + if no_bias_decay: + # update bias decay + for name, m in model.named_modules(): + if hasattr(m, "bias"): + param_dict[m.bias].update({"weight_decay": 0}) + + # convert param dict to optimizer's param groups + param_groups = [] + for p, pconfig in param_dict.items(): + name = pconfig.pop("name", None) + if "weight_decay" in pconfig or "lr" in pconfig: + logger.info(f"special optimizer hyperparameter: {name} - {pconfig}") + param_groups += [{"params": p, **pconfig}] + + optimizer = optim_cls(param_groups, **config) + return optimizer diff --git a/nanodet/trainer/__init__.py b/nanodet/trainer/__init__.py new file mode 100644 index 0000000..8eb73d1 --- /dev/null +++ b/nanodet/trainer/__init__.py @@ -0,0 +1,16 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from .task import TrainingTask + +__all__ = ["TrainingTask"] diff --git a/nanodet/trainer/task.py b/nanodet/trainer/task.py new file mode 100644 index 0000000..d6ca89c --- /dev/null +++ b/nanodet/trainer/task.py @@ -0,0 +1,351 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
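+
+# Usage note (illustrative sketch, not part of the upstream file): TrainingTask
+# below is a regular pytorch_lightning.LightningModule, so a training script can
+# drive it roughly as follows; `train_loader`, `val_loader` and `evaluator` are
+# assumed to be built elsewhere from `cfg`, and the exact Trainer arguments
+# depend on the installed Lightning version:
+#
+#   from pytorch_lightning import Trainer
+#   task = TrainingTask(cfg, evaluator)
+#   trainer = Trainer(max_epochs=cfg.schedule.total_epochs, accelerator="gpu", devices=1)
+#   trainer.fit(task, train_loader, val_loader)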
+ +import copy +import json +import os +import warnings +from typing import Any, Dict, List + +import torch +import torch.distributed as dist +from pytorch_lightning import LightningModule +from pytorch_lightning.utilities import rank_zero_only + +from nanodet.data.batch_process import stack_batch_img +from nanodet.optim import build_optimizer +from nanodet.util import convert_avg_params, gather_results, mkdir + +from ..model.arch import build_model +from ..model.weight_averager import build_weight_averager + + +class TrainingTask(LightningModule): + """ + Pytorch Lightning module of a general training task. + Including training, evaluating and testing. + Args: + cfg: Training configurations + evaluator: Evaluator for evaluating the model performance. + """ + + def __init__(self, cfg, evaluator=None): + super(TrainingTask, self).__init__() + self.cfg = cfg + self.model = build_model(cfg.model) + self.evaluator = evaluator + self.save_flag = -10 + self.log_style = "NanoDet" + self.weight_averager = None + if "weight_averager" in cfg.model: + self.weight_averager = build_weight_averager( + cfg.model.weight_averager, device=self.device + ) + self.avg_model = copy.deepcopy(self.model) + + def _preprocess_batch_input(self, batch): + batch_imgs = batch["img"] + if isinstance(batch_imgs, list): + batch_imgs = [img.to(self.device) for img in batch_imgs] + batch_img_tensor = stack_batch_img(batch_imgs, divisible=32) + batch["img"] = batch_img_tensor + return batch + + def forward(self, x): + x = self.model(x) + return x + + @torch.no_grad() + def predict(self, batch, batch_idx=None, dataloader_idx=None): + batch = self._preprocess_batch_input(batch) + preds = self.forward(batch["img"]) + results = self.model.head.post_process(preds, batch) + return results + + def training_step(self, batch, batch_idx): + batch = self._preprocess_batch_input(batch) + preds, loss, loss_states = self.model.forward_train(batch) + + # log train losses + if self.global_step % self.cfg.log.interval == 0: + memory = ( + torch.cuda.memory_reserved() / 1e9 if torch.cuda.is_available() else 0 + ) + lr = self.trainer.optimizers[0].param_groups[0]["lr"] + log_msg = "Train|Epoch{}/{}|Iter{}({}/{})| mem:{:.3g}G| lr:{:.2e}| ".format( + self.current_epoch + 1, + self.cfg.schedule.total_epochs, + self.global_step, + batch_idx + 1, + self.trainer.num_training_batches, + memory, + lr, + ) + self.scalar_summary("Train_loss/lr", "Train", lr, self.global_step) + for loss_name in loss_states: + log_msg += "{}:{:.4f}| ".format( + loss_name, loss_states[loss_name].mean().item() + ) + self.scalar_summary( + "Train_loss/" + loss_name, + "Train", + loss_states[loss_name].mean().item(), + self.global_step, + ) + self.logger.info(log_msg) + + return loss + + def training_epoch_end(self, outputs: List[Any]) -> None: + self.trainer.save_checkpoint(os.path.join(self.cfg.save_dir, "model_last.ckpt")) + + def validation_step(self, batch, batch_idx): + batch = self._preprocess_batch_input(batch) + if self.weight_averager is not None: + preds, loss, loss_states = self.avg_model.forward_train(batch) + else: + preds, loss, loss_states = self.model.forward_train(batch) + + if batch_idx % self.cfg.log.interval == 0: + memory = ( + torch.cuda.memory_reserved() / 1e9 if torch.cuda.is_available() else 0 + ) + lr = self.trainer.optimizers[0].param_groups[0]["lr"] + log_msg = "Val|Epoch{}/{}|Iter{}({}/{})| mem:{:.3g}G| lr:{:.2e}| ".format( + self.current_epoch + 1, + self.cfg.schedule.total_epochs, + self.global_step, + batch_idx + 1, + 
sum(self.trainer.num_val_batches), + memory, + lr, + ) + for loss_name in loss_states: + log_msg += "{}:{:.4f}| ".format( + loss_name, loss_states[loss_name].mean().item() + ) + self.logger.info(log_msg) + + dets = self.model.head.post_process(preds, batch) + return dets + + def validation_epoch_end(self, validation_step_outputs): + """ + Called at the end of the validation epoch with the + outputs of all validation steps.Evaluating results + and save best model. + Args: + validation_step_outputs: A list of val outputs + + """ + results = {} + for res in validation_step_outputs: + results.update(res) + all_results = ( + gather_results(results) + if dist.is_available() and dist.is_initialized() + else results + ) + if all_results: + eval_results = self.evaluator.evaluate( + all_results, self.cfg.save_dir, rank=self.local_rank + ) + metric = eval_results[self.cfg.evaluator.save_key] + # save best model + if metric > self.save_flag: + self.save_flag = metric + best_save_path = os.path.join(self.cfg.save_dir, "model_best") + mkdir(self.local_rank, best_save_path) + self.trainer.save_checkpoint( + os.path.join(best_save_path, "model_best.ckpt") + ) + self.save_model_state( + os.path.join(best_save_path, "nanodet_model_best.pth") + ) + txt_path = os.path.join(best_save_path, "eval_results.txt") + if self.local_rank < 1: + with open(txt_path, "a") as f: + f.write("Epoch:{}\n".format(self.current_epoch + 1)) + for k, v in eval_results.items(): + f.write("{}: {}\n".format(k, v)) + else: + warnings.warn( + "Warning! Save_key is not in eval results! Only save model last!" + ) + self.logger.log_metrics(eval_results, self.current_epoch + 1) + else: + self.logger.info("Skip val on rank {}".format(self.local_rank)) + + def test_step(self, batch, batch_idx): + dets = self.predict(batch, batch_idx) + return dets + + def test_epoch_end(self, test_step_outputs): + results = {} + for res in test_step_outputs: + results.update(res) + all_results = ( + gather_results(results) + if dist.is_available() and dist.is_initialized() + else results + ) + if all_results: + res_json = self.evaluator.results2json(all_results) + json_path = os.path.join(self.cfg.save_dir, "results.json") + json.dump(res_json, open(json_path, "w")) + + if self.cfg.test_mode == "val": + eval_results = self.evaluator.evaluate( + all_results, self.cfg.save_dir, rank=self.local_rank + ) + txt_path = os.path.join(self.cfg.save_dir, "eval_results.txt") + with open(txt_path, "a") as f: + for k, v in eval_results.items(): + f.write("{}: {}\n".format(k, v)) + else: + self.logger.info("Skip test on rank {}".format(self.local_rank)) + + def configure_optimizers(self): + """ + Prepare optimizer and learning-rate scheduler + to use in optimization. + + Returns: + optimizer + """ + optimizer_cfg = copy.deepcopy(self.cfg.schedule.optimizer) + optimizer = build_optimizer(self.model, optimizer_cfg) + + schedule_cfg = copy.deepcopy(self.cfg.schedule.lr_schedule) + name = schedule_cfg.pop("name") + build_scheduler = getattr(torch.optim.lr_scheduler, name) + scheduler = { + "scheduler": build_scheduler(optimizer=optimizer, **schedule_cfg), + "interval": "epoch", + "frequency": 1, + } + return dict(optimizer=optimizer, lr_scheduler=scheduler) + + def optimizer_step( + self, + epoch=None, + batch_idx=None, + optimizer=None, + optimizer_idx=None, + optimizer_closure=None, + on_tpu=None, + using_native_amp=None, + using_lbfgs=None, + ): + """ + Performs a single optimization step (parameter update). 
+ Args: + epoch: Current epoch + batch_idx: Index of current batch + optimizer: A PyTorch optimizer + optimizer_idx: If you used multiple optimizers this indexes into that list. + optimizer_closure: closure for all optimizers + on_tpu: true if TPU backward is required + using_native_amp: True if using native amp + using_lbfgs: True if the matching optimizer is lbfgs + """ + # warm up lr + if self.trainer.global_step <= self.cfg.schedule.warmup.steps: + if self.cfg.schedule.warmup.name == "constant": + k = self.cfg.schedule.warmup.ratio + elif self.cfg.schedule.warmup.name == "linear": + k = 1 - ( + 1 - self.trainer.global_step / self.cfg.schedule.warmup.steps + ) * (1 - self.cfg.schedule.warmup.ratio) + elif self.cfg.schedule.warmup.name == "exp": + k = self.cfg.schedule.warmup.ratio ** ( + 1 - self.trainer.global_step / self.cfg.schedule.warmup.steps + ) + else: + raise Exception("Unsupported warm up type!") + for pg in optimizer.param_groups: + pg["lr"] = pg["initial_lr"] * k + + # update params + optimizer.step(closure=optimizer_closure) + optimizer.zero_grad() + + def scalar_summary(self, tag, phase, value, step): + """ + Write Tensorboard scalar summary log. + Args: + tag: Name for the tag + phase: 'Train' or 'Val' + value: Value to record + step: Step value to record + + """ + if self.local_rank < 1: + self.logger.experiment.add_scalars(tag, {phase: value}, step) + + def info(self, string): + self.logger.info(string) + + @rank_zero_only + def save_model_state(self, path): + self.logger.info("Saving model to {}".format(path)) + state_dict = ( + self.weight_averager.state_dict() + if self.weight_averager + else self.model.state_dict() + ) + torch.save({"state_dict": state_dict}, path) + + # ------------Hooks----------------- + def on_fit_start(self) -> None: + if "weight_averager" in self.cfg.model: + self.logger.info("Weight Averaging is enabled") + if self.weight_averager and self.weight_averager.has_inited(): + self.weight_averager.to(self.weight_averager.device) + return + self.weight_averager = build_weight_averager( + self.cfg.model.weight_averager, device=self.device + ) + self.weight_averager.load_from(self.model) + + def on_train_epoch_start(self): + self.model.set_epoch(self.current_epoch) + + def on_train_batch_end(self, outputs, batch, batch_idx) -> None: + if self.weight_averager: + self.weight_averager.update(self.model, self.global_step) + + def on_validation_epoch_start(self): + if self.weight_averager: + self.weight_averager.apply_to(self.avg_model) + + def on_test_epoch_start(self) -> None: + if self.weight_averager: + self.on_load_checkpoint({"state_dict": self.state_dict()}) + self.weight_averager.apply_to(self.model) + + def on_load_checkpoint(self, checkpointed_state: Dict[str, Any]) -> None: + if self.weight_averager: + avg_params = convert_avg_params(checkpointed_state) + if len(avg_params) != len(self.model.state_dict()): + self.logger.info( + "Weight averaging is enabled but average state does not" + "match the model" + ) + else: + self.weight_averager = build_weight_averager( + self.cfg.model.weight_averager, device=self.device + ) + self.weight_averager.load_state_dict(avg_params) + self.logger.info("Loaded average state from checkpoint.") diff --git a/nanodet/util/__init__.py b/nanodet/util/__init__.py new file mode 100644 index 0000000..46ccfab --- /dev/null +++ b/nanodet/util/__init__.py @@ -0,0 +1,43 @@ +from .box_transform import bbox2distance, distance2bbox +from .check_point import ( + convert_avg_params, + convert_old_model, + load_model_weight, + 
save_model, +) +from .config import cfg, load_config +from .flops_counter import get_model_complexity_info +from .logger import AverageMeter, Logger, MovingAverage, NanoDetLightningLogger +from .misc import images_to_levels, multi_apply, unmap +from .path import collect_files, mkdir +from .rank_filter import rank_filter +from .scatter_gather import gather_results, scatter_kwargs +from .util_mixins import NiceRepr +from .visualization import Visualizer, overlay_bbox_cv + +__all__ = [ + "distance2bbox", + "bbox2distance", + "convert_old_model", + "load_model_weight", + "save_model", + "cfg", + "load_config", + "get_model_complexity_info", + "AverageMeter", + "Logger", + "MovingAverage", + "images_to_levels", + "multi_apply", + "unmap", + "mkdir", + "rank_filter", + "gather_results", + "scatter_kwargs", + "NiceRepr", + "Visualizer", + "overlay_bbox_cv", + "collect_files", + "NanoDetLightningLogger", + "convert_avg_params", +] diff --git a/nanodet/util/box_transform.py b/nanodet/util/box_transform.py new file mode 100644 index 0000000..4b82a8c --- /dev/null +++ b/nanodet/util/box_transform.py @@ -0,0 +1,49 @@ +import torch + + +def distance2bbox(points, distance, max_shape=None): + """Decode distance prediction to bounding box. + + Args: + points (Tensor): Shape (n, 2), [x, y]. + distance (Tensor): Distance from the given point to 4 + boundaries (left, top, right, bottom). + max_shape (tuple): Shape of the image. + + Returns: + Tensor: Decoded bboxes. + """ + x1 = points[..., 0] - distance[..., 0] + y1 = points[..., 1] - distance[..., 1] + x2 = points[..., 0] + distance[..., 2] + y2 = points[..., 1] + distance[..., 3] + if max_shape is not None: + x1 = x1.clamp(min=0, max=max_shape[1]) + y1 = y1.clamp(min=0, max=max_shape[0]) + x2 = x2.clamp(min=0, max=max_shape[1]) + y2 = y2.clamp(min=0, max=max_shape[0]) + return torch.stack([x1, y1, x2, y2], -1) + + +def bbox2distance(points, bbox, max_dis=None, eps=0.1): + """Decode bounding box based on distances. + + Args: + points (Tensor): Shape (n, 2), [x, y]. + bbox (Tensor): Shape (n, 4), "xyxy" format + max_dis (float): Upper bound of the distance. + eps (float): a small value to ensure target < max_dis, instead <= + + Returns: + Tensor: Decoded distances. + """ + left = points[:, 0] - bbox[:, 0] + top = points[:, 1] - bbox[:, 1] + right = bbox[:, 2] - points[:, 0] + bottom = bbox[:, 3] - points[:, 1] + if max_dis is not None: + left = left.clamp(min=0, max=max_dis - eps) + top = top.clamp(min=0, max=max_dis - eps) + right = right.clamp(min=0, max=max_dis - eps) + bottom = bottom.clamp(min=0, max=max_dis - eps) + return torch.stack([left, top, right, bottom], -1) diff --git a/nanodet/util/check_point.py b/nanodet/util/check_point.py new file mode 100644 index 0000000..d88c3fa --- /dev/null +++ b/nanodet/util/check_point.py @@ -0,0 +1,111 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
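+
+# Usage note (illustrative sketch, not part of the upstream file): a typical
+# warm-start path with the helpers below, assuming a NanoDet-style logger and a
+# checkpoint file (the path is a placeholder):
+#
+#   ckpt = torch.load("path/to/model_best.ckpt", map_location="cpu")
+#   load_model_weight(model, ckpt, logger)  # strips "model."/"module." prefixes and
+#                                           # skips shape-mismatched parameters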
+ +from collections import OrderedDict +from typing import Any, Dict + +import pytorch_lightning as pl +import torch + +from .rank_filter import rank_filter + + +def load_model_weight(model, checkpoint, logger): + state_dict = checkpoint["state_dict"].copy() + for k in checkpoint["state_dict"]: + # convert average model weights + if k.startswith("avg_model."): + v = state_dict.pop(k) + state_dict[k[4:]] = v + # strip prefix of state_dict + if list(state_dict.keys())[0].startswith("module."): + state_dict = {k[7:]: v for k, v in state_dict.items()} + if list(state_dict.keys())[0].startswith("model."): + state_dict = {k[6:]: v for k, v in state_dict.items()} + + model_state_dict = ( + model.module.state_dict() if hasattr(model, "module") else model.state_dict() + ) + + # check loaded parameters and created model parameters + for k in state_dict: + if k in model_state_dict: + if state_dict[k].shape != model_state_dict[k].shape: + logger.log( + "Skip loading parameter {}, required shape{}, " + "loaded shape{}.".format( + k, model_state_dict[k].shape, state_dict[k].shape + ) + ) + state_dict[k] = model_state_dict[k] + else: + logger.log("Drop parameter {}.".format(k)) + for k in model_state_dict: + if not (k in state_dict): + logger.log("No param {}.".format(k)) + state_dict[k] = model_state_dict[k] + model.load_state_dict(state_dict, strict=False) + + +@rank_filter +def save_model(model, path, epoch, iter, optimizer=None): + model_state_dict = ( + model.module.state_dict() if hasattr(model, "module") else model.state_dict() + ) + data = {"epoch": epoch, "state_dict": model_state_dict, "iter": iter} + if optimizer is not None: + data["optimizer"] = optimizer.state_dict() + + torch.save(data, path) + + +def convert_old_model(old_model_dict): + if "pytorch-lightning_version" in old_model_dict: + raise ValueError("This model is not old format. No need to convert!") + version = pl.__version__ + epoch = old_model_dict["epoch"] + global_step = old_model_dict["iter"] + state_dict = old_model_dict["state_dict"] + new_state_dict = OrderedDict() + for name, value in state_dict.items(): + new_state_dict["model." + name] = value + + new_checkpoint = { + "epoch": epoch, + "global_step": global_step, + "pytorch-lightning_version": version, + "state_dict": new_state_dict, + "lr_schedulers": [], + } + + if "optimizer" in old_model_dict: + optimizer_states = [old_model_dict["optimizer"]] + new_checkpoint["optimizer_states"] = optimizer_states + + return new_checkpoint + + +def convert_avg_params(checkpoint: Dict[str, Any]) -> Dict[str, Any]: + """Converts average state dict to the format that can be loaded to a model. + Args: + checkpoint: model. + Returns: + Converted average state dict. 
+ """ + state_dict = checkpoint["state_dict"] + avg_weights = {} + for k, v in state_dict.items(): + if "avg_model" in k: + avg_weights[k[10:]] = v + return avg_weights diff --git a/nanodet/util/config.py b/nanodet/util/config.py new file mode 100644 index 0000000..8ff104b --- /dev/null +++ b/nanodet/util/config.py @@ -0,0 +1,39 @@ +from .yacs import CfgNode + +cfg = CfgNode(new_allowed=True) +cfg.save_dir = "./" +# common params for NETWORK +cfg.model = CfgNode(new_allowed=True) +cfg.model.arch = CfgNode(new_allowed=True) +cfg.model.arch.backbone = CfgNode(new_allowed=True) +cfg.model.arch.fpn = CfgNode(new_allowed=True) +cfg.model.arch.head = CfgNode(new_allowed=True) + +# DATASET related params +cfg.data = CfgNode(new_allowed=True) +cfg.data.train = CfgNode(new_allowed=True) +cfg.data.val = CfgNode(new_allowed=True) +cfg.device = CfgNode(new_allowed=True) +# train +cfg.schedule = CfgNode(new_allowed=True) + +# logger +cfg.log = CfgNode() +cfg.log.interval = 50 + +# testing +cfg.test = CfgNode() +# size of images for each device + + +def load_config(cfg, args_cfg): + cfg.defrost() + cfg.merge_from_file(args_cfg) + cfg.freeze() + + +if __name__ == "__main__": + import sys + + with open(sys.argv[1], "w") as f: + print(cfg, file=f) diff --git a/nanodet/util/env_utils.py b/nanodet/util/env_utils.py new file mode 100644 index 0000000..ec332a9 --- /dev/null +++ b/nanodet/util/env_utils.py @@ -0,0 +1,65 @@ +import os +import platform +import warnings + +import torch.multiprocessing as mp + + +def set_multi_processing( + mp_start_method: str = "fork", opencv_num_threads: int = 0, distributed: bool = True +) -> None: + """Set multi-processing related environment. + + This function is refered from https://github.com/open-mmlab/mmengine/blob/main/mmengine/utils/dl_utils/setup_env.py + + Args: + mp_start_method (str): Set the method which should be used to start + child processes. Defaults to 'fork'. + opencv_num_threads (int): Number of threads for opencv. + Defaults to 0. + distributed (bool): True if distributed environment. + Defaults to False. + """ # noqa + # set multi-process start method as `fork` to speed up the training + if platform.system() != "Windows": + current_method = mp.get_start_method(allow_none=True) + if current_method is not None and current_method != mp_start_method: + warnings.warn( + f"Multi-processing start method `{mp_start_method}` is " + f"different from the previous setting `{current_method}`." + f"It will be force set to `{mp_start_method}`. You can " + "change this behavior by changing `mp_start_method` in " + "your config." + ) + mp.set_start_method(mp_start_method, force=True) + + try: + import cv2 + + # disable opencv multithreading to avoid system being overloaded + cv2.setNumThreads(opencv_num_threads) + except ImportError: + pass + + # setup OMP threads + # This code is referred from https://github.com/pytorch/pytorch/blob/master/torch/distributed/run.py # noqa + if "OMP_NUM_THREADS" not in os.environ and distributed: + omp_num_threads = 1 + warnings.warn( + "Setting OMP_NUM_THREADS environment variable for each process" + f" to be {omp_num_threads} in default, to avoid your system " + "being overloaded, please further tune the variable for " + "optimal performance in your application as needed." 
+ ) + os.environ["OMP_NUM_THREADS"] = str(omp_num_threads) + + # setup MKL threads + if "MKL_NUM_THREADS" not in os.environ and distributed: + mkl_num_threads = 1 + warnings.warn( + "Setting MKL_NUM_THREADS environment variable for each process" + f" to be {mkl_num_threads} in default, to avoid your system " + "being overloaded, please further tune the variable for " + "optimal performance in your application as needed." + ) + os.environ["MKL_NUM_THREADS"] = str(mkl_num_threads) diff --git a/nanodet/util/flops_counter.py b/nanodet/util/flops_counter.py new file mode 100644 index 0000000..baddd37 --- /dev/null +++ b/nanodet/util/flops_counter.py @@ -0,0 +1,575 @@ +# Modified from flops-counter.pytorch by Vladislav Sovrasov +# original repo: https://github.com/sovrasov/flops-counter.pytorch + +# MIT License + +# Copyright (c) 2018 Vladislav Sovrasov + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: + +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. + +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +import sys +from functools import partial + +import numpy as np +import torch +import torch.nn as nn + + +def get_model_complexity_info( + model, + input_shape, + print_per_layer_stat=True, + as_strings=True, + input_constructor=None, + flush=False, + ost=sys.stdout, +): + """Get complexity information of a model. + This method can calculate FLOPs and parameter counts of a model with + corresponding input shape. It can also print complexity information for + each layer in a model. + Supported layers are listed as below: + - Convolutions: ``nn.Conv1d``, ``nn.Conv2d``, ``nn.Conv3d``. + - Activations: ``nn.ReLU``, ``nn.PReLU``, ``nn.ELU``, ``nn.LeakyReLU``, + ``nn.ReLU6``. + - Poolings: ``nn.MaxPool1d``, ``nn.MaxPool2d``, ``nn.MaxPool3d``, + ``nn.AvgPool1d``, ``nn.AvgPool2d``, ``nn.AvgPool3d``, + ``nn.AdaptiveMaxPool1d``, ``nn.AdaptiveMaxPool2d``, + ``nn.AdaptiveMaxPool3d``, ``nn.AdaptiveAvgPool1d``, + ``nn.AdaptiveAvgPool2d``, ``nn.AdaptiveAvgPool3d``. + - BatchNorms: ``nn.BatchNorm1d``, ``nn.BatchNorm2d``, + ``nn.BatchNorm3d``. + - Linear: ``nn.Linear``. + - Deconvolution: ``nn.ConvTranspose2d``. + - Upsample: ``nn.Upsample``. + Args: + model (nn.Module): The model for complexity calculation. + input_shape (tuple): Input shape used for calculation. + print_per_layer_stat (bool): Whether to print complexity information + for each layer in a model. Default: True. + as_strings (bool): Output FLOPs and params counts in a string form. + Default: True. + input_constructor (None | callable): If specified, it takes a callable + method that generates input. 
otherwise, it will generate a random + tensor with input shape to calculate FLOPs. Default: None. + flush (bool): same as that in :func:`print`. Default: False. + ost (stream): same as ``file`` param in :func:`print`. + Default: sys.stdout. + Returns: + tuple[float | str]: If ``as_strings`` is set to True, it will return + FLOPs and parameter counts in a string format. otherwise, it will + return those in a float number format. + """ + assert type(input_shape) is tuple + assert len(input_shape) >= 1 + assert isinstance(model, nn.Module) + flops_model = add_flops_counting_methods(model) + flops_model.eval() + flops_model.start_flops_count() + if input_constructor: + input = input_constructor(input_shape) + _ = flops_model(**input) + else: + try: + batch = torch.ones(()).new_empty( + (1, *input_shape), + dtype=next(flops_model.parameters()).dtype, + device=next(flops_model.parameters()).device, + ) + except StopIteration: + # Avoid StopIteration for models which have no parameters, + # like `nn.Relu()`, `nn.AvgPool2d`, etc. + batch = torch.ones(()).new_empty((1, *input_shape)) + + _ = flops_model(batch) + + flops_count, params_count = flops_model.compute_average_flops_cost() + if print_per_layer_stat: + print_model_with_flops( + flops_model, flops_count, params_count, ost=ost, flush=flush + ) + flops_model.stop_flops_count() + + if as_strings: + return flops_to_string(flops_count), params_to_string(params_count) + + return flops_count, params_count + + +def flops_to_string(flops, units="GFLOPs", precision=2): + """Convert FLOPs number into a string. + Note that Here we take a multiply-add counts as one FLOP. + Args: + flops (float): FLOPs number to be converted. + units (str | None): Converted FLOPs units. Options are None, 'GFLOPs', + 'MFLOPs', 'KFLOPs', 'FLOPs'. If set to None, it will automatically + choose the most suitable unit for FLOPs. Default: 'GFLOPs'. + precision (int): Digit number after the decimal point. Default: 2. + Returns: + str: The converted FLOPs number with units. + Examples: + >>> flops_to_string(1e9) + '1.0 GFLOPs' + >>> flops_to_string(2e5, 'MFLOPs') + '0.2 MFLOPs' + >>> flops_to_string(3e-9, None) + '3e-09 FLOPs' + """ + if units is None: + if flops // 10**9 > 0: + return str(round(flops / 10.0**9, precision)) + " GFLOPs" + elif flops // 10**6 > 0: + return str(round(flops / 10.0**6, precision)) + " MFLOPs" + elif flops // 10**3 > 0: + return str(round(flops / 10.0**3, precision)) + " KFLOPs" + else: + return str(flops) + " FLOPs" + else: + if units == "GFLOPs": + return str(round(flops / 10.0**9, precision)) + " " + units + elif units == "MFLOPs": + return str(round(flops / 10.0**6, precision)) + " " + units + elif units == "KFLOPs": + return str(round(flops / 10.0**3, precision)) + " " + units + else: + return str(flops) + " FLOPs" + + +def params_to_string(num_params, units=None, precision=2): + """Convert parameter number into a string. + Args: + num_params (float): Parameter number to be converted. + units (str | None): Converted FLOPs units. Options are None, 'M', + 'K' and ''. If set to None, it will automatically choose the most + suitable unit for Parameter number. Default: None. + precision (int): Digit number after the decimal point. Default: 2. + Returns: + str: The converted parameter number with units. 
+ Examples: + >>> params_to_string(1e9) + '1000.0 M' + >>> params_to_string(2e5) + '200.0 k' + >>> params_to_string(3e-9) + '3e-09' + """ + if units is None: + if num_params // 10**6 > 0: + return str(round(num_params / 10**6, precision)) + " M" + elif num_params // 10**3: + return str(round(num_params / 10**3, precision)) + " k" + else: + return str(num_params) + else: + if units == "M": + return str(round(num_params / 10.0**6, precision)) + " " + units + elif units == "K": + return str(round(num_params / 10.0**3, precision)) + " " + units + else: + return str(num_params) + + +def print_model_with_flops( + model, + total_flops, + total_params, + units="GFLOPs", + precision=3, + ost=sys.stdout, + flush=False, +): + """Print a model with FLOPs for each layer. + Args: + model (nn.Module): The model to be printed. + total_flops (float): Total FLOPs of the model. + total_params (float): Total parameter counts of the model. + units (str | None): Converted FLOPs units. Default: 'GFLOPs'. + precision (int): Digit number after the decimal point. Default: 3. + ost (stream): same as `file` param in :func:`print`. + Default: sys.stdout. + flush (bool): same as that in :func:`print`. Default: False. + Example: + >>> class ExampleModel(nn.Module): + >>> def __init__(self): + >>> super().__init__() + >>> self.conv1 = nn.Conv2d(3, 8, 3) + >>> self.conv2 = nn.Conv2d(8, 256, 3) + >>> self.conv3 = nn.Conv2d(256, 8, 3) + >>> self.avg_pool = nn.AdaptiveAvgPool2d((1, 1)) + >>> self.flatten = nn.Flatten() + >>> self.fc = nn.Linear(8, 1) + >>> def forward(self, x): + >>> x = self.conv1(x) + >>> x = self.conv2(x) + >>> x = self.conv3(x) + >>> x = self.avg_pool(x) + >>> x = self.flatten(x) + >>> x = self.fc(x) + >>> return x + >>> model = ExampleModel() + >>> x = (3, 16, 16) + to print the complexity inforamtion state for each layer, you can use + >>> get_model_complexity_info(model, x) + or directly use + >>> print_model_with_flops(model, 4579784.0, 37361) + ExampleModel( + 0.037 M, 100.000% Params, 0.005 GFLOPs, 100.000% FLOPs, + (conv1): Conv2d(0.0 M, 0.600% Params, 0.0 GFLOPs, 0.959% FLOPs, 3, 8, kernel_size=(3, 3), stride=(1, 1)) # noqa: E501 + (conv2): Conv2d(0.019 M, 50.020% Params, 0.003 GFLOPs, 58.760% FLOPs, 8, 256, kernel_size=(3, 3), stride=(1, 1)) + (conv3): Conv2d(0.018 M, 49.356% Params, 0.002 GFLOPs, 40.264% FLOPs, 256, 8, kernel_size=(3, 3), stride=(1, 1)) + (avg_pool): AdaptiveAvgPool2d(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.017% FLOPs, output_size=(1, 1)) + (flatten): Flatten(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, ) + (fc): Linear(0.0 M, 0.024% Params, 0.0 GFLOPs, 0.000% FLOPs, in_features=8, out_features=1, bias=True) + ) + """ + + def accumulate_params(self): + if is_supported_instance(self): + return self.__params__ + else: + sum = 0 + for m in self.children(): + sum += m.accumulate_params() + return sum + + def accumulate_flops(self): + if is_supported_instance(self): + return self.__flops__ / model.__batch_counter__ + else: + sum = 0 + for m in self.children(): + sum += m.accumulate_flops() + return sum + + def flops_repr(self): + accumulated_num_params = self.accumulate_params() + accumulated_flops_cost = self.accumulate_flops() + return ", ".join( + [ + params_to_string( + accumulated_num_params, units="M", precision=precision + ), + "{:.3%} Params".format(accumulated_num_params / total_params), + flops_to_string( + accumulated_flops_cost, units=units, precision=precision + ), + "{:.3%} FLOPs".format(accumulated_flops_cost / total_flops), + self.original_extra_repr(), + ] + ) + + 
def add_extra_repr(m): + m.accumulate_flops = accumulate_flops.__get__(m) + m.accumulate_params = accumulate_params.__get__(m) + flops_extra_repr = flops_repr.__get__(m) + if m.extra_repr != flops_extra_repr: + m.original_extra_repr = m.extra_repr + m.extra_repr = flops_extra_repr + assert m.extra_repr != m.original_extra_repr + + def del_extra_repr(m): + if hasattr(m, "original_extra_repr"): + m.extra_repr = m.original_extra_repr + del m.original_extra_repr + if hasattr(m, "accumulate_flops"): + del m.accumulate_flops + + model.apply(add_extra_repr) + print(model, file=ost, flush=flush) + model.apply(del_extra_repr) + + +def get_model_parameters_number(model): + """Calculate parameter number of a model. + Args: + model (nn.module): The model for parameter number calculation. + Returns: + float: Parameter number of the model. + """ + num_params = sum(p.numel() for p in model.parameters() if p.requires_grad) + return num_params + + +def add_flops_counting_methods(net_main_module): + # adding additional methods to the existing module object, + # this is done this way so that each function has access to self object + net_main_module.start_flops_count = start_flops_count.__get__(net_main_module) + net_main_module.stop_flops_count = stop_flops_count.__get__(net_main_module) + net_main_module.reset_flops_count = reset_flops_count.__get__(net_main_module) + net_main_module.compute_average_flops_cost = compute_average_flops_cost.__get__( + net_main_module + ) # noqa: E501 + + net_main_module.reset_flops_count() + + return net_main_module + + +def compute_average_flops_cost(self): + """Compute average FLOPs cost. + A method to compute average FLOPs cost, which will be available after + `add_flops_counting_methods()` is called on a desired net object. + Returns: + float: Current mean flops consumption per image. + """ + batches_count = self.__batch_counter__ + flops_sum = 0 + for module in self.modules(): + if is_supported_instance(module): + flops_sum += module.__flops__ + params_sum = get_model_parameters_number(self) + return flops_sum / batches_count, params_sum + + +def start_flops_count(self): + """Activate the computation of mean flops consumption per image. + A method to activate the computation of mean flops consumption per image. + which will be available after ``add_flops_counting_methods()`` is called on + a desired net object. It should be called before running the network. + """ + add_batch_counter_hook_function(self) + + def add_flops_counter_hook_function(module): + if is_supported_instance(module): + if hasattr(module, "__flops_handle__"): + return + + else: + handle = module.register_forward_hook(MODULES_MAPPING[type(module)]) + + module.__flops_handle__ = handle + + self.apply(partial(add_flops_counter_hook_function)) + + +def stop_flops_count(self): + """Stop computing the mean flops consumption per image. + A method to stop computing the mean flops consumption per image, which will + be available after ``add_flops_counting_methods()`` is called on a desired + net object. It can be called to pause the computation whenever. + """ + remove_batch_counter_hook_function(self) + self.apply(remove_flops_counter_hook_function) + + +def reset_flops_count(self): + """Reset statistics computed so far. + A method to Reset computed statistics, which will be available after + `add_flops_counting_methods()` is called on a desired net object. 
+ """ + add_batch_counter_variables_or_reset(self) + self.apply(add_flops_counter_variable_or_reset) + + +# ---- Internal functions +def empty_flops_counter_hook(module, input, output): + module.__flops__ += 0 + + +def upsample_flops_counter_hook(module, input, output): + output_size = output[0] + batch_size = output_size.shape[0] + output_elements_count = batch_size + for val in output_size.shape[1:]: + output_elements_count *= val + module.__flops__ += int(output_elements_count) + + +def relu_flops_counter_hook(module, input, output): + active_elements_count = output.numel() + module.__flops__ += int(active_elements_count) + + +def linear_flops_counter_hook(module, input, output): + input = input[0] + output_last_dim = output.shape[ + -1 + ] # pytorch checks dimensions, so here we don't care much + module.__flops__ += int(np.prod(input.shape) * output_last_dim) + + +def pool_flops_counter_hook(module, input, output): + input = input[0] + module.__flops__ += int(np.prod(input.shape)) + + +def bn_flops_counter_hook(module, input, output): + input = input[0] + + batch_flops = np.prod(input.shape) + if module.affine: + batch_flops *= 2 + module.__flops__ += int(batch_flops) + + +def deconv_flops_counter_hook(conv_module, input, output): + # Can have multiple inputs, getting the first one + input = input[0] + + batch_size = input.shape[0] + input_height, input_width = input.shape[2:] + + kernel_height, kernel_width = conv_module.kernel_size + in_channels = conv_module.in_channels + out_channels = conv_module.out_channels + groups = conv_module.groups + + filters_per_channel = out_channels // groups + conv_per_position_flops = ( + kernel_height * kernel_width * in_channels * filters_per_channel + ) + + active_elements_count = batch_size * input_height * input_width + overall_conv_flops = conv_per_position_flops * active_elements_count + bias_flops = 0 + if conv_module.bias is not None: + output_height, output_width = output.shape[2:] + bias_flops = out_channels * batch_size * output_height * output_height + overall_flops = overall_conv_flops + bias_flops + + conv_module.__flops__ += int(overall_flops) + + +def conv_flops_counter_hook(conv_module, input, output): + # Can have multiple inputs, getting the first one + input = input[0] + + batch_size = input.shape[0] + output_dims = list(output.shape[2:]) + + kernel_dims = list(conv_module.kernel_size) + in_channels = conv_module.in_channels + out_channels = conv_module.out_channels + groups = conv_module.groups + + filters_per_channel = out_channels // groups + conv_per_position_flops = ( + int(np.prod(kernel_dims)) * in_channels * filters_per_channel + ) + + active_elements_count = batch_size * int(np.prod(output_dims)) + + overall_conv_flops = conv_per_position_flops * active_elements_count + + bias_flops = 0 + + if conv_module.bias is not None: + + bias_flops = out_channels * active_elements_count + + overall_flops = overall_conv_flops + bias_flops + + conv_module.__flops__ += int(overall_flops) + + +def batch_counter_hook(module, input, output): + batch_size = 1 + if len(input) > 0: + # Can have multiple inputs, getting the first one + input = input[0] + batch_size = len(input) + else: + pass + print( + "Warning! No positional inputs found for a module, " + "assuming batch size is 1." 
+ ) + module.__batch_counter__ += batch_size + + +def add_batch_counter_variables_or_reset(module): + + module.__batch_counter__ = 0 + + +def add_batch_counter_hook_function(module): + if hasattr(module, "__batch_counter_handle__"): + return + + handle = module.register_forward_hook(batch_counter_hook) + module.__batch_counter_handle__ = handle + + +def remove_batch_counter_hook_function(module): + if hasattr(module, "__batch_counter_handle__"): + module.__batch_counter_handle__.remove() + del module.__batch_counter_handle__ + + +def add_flops_counter_variable_or_reset(module): + if is_supported_instance(module): + if hasattr(module, "__flops__") or hasattr(module, "__params__"): + print( + "Warning: variables __flops__ or __params__ are already " + "defined for the module" + + type(module).__name__ + + " ptflops can affect your code!" + ) + module.__flops__ = 0 + module.__params__ = get_model_parameters_number(module) + + +def is_supported_instance(module): + if type(module) in MODULES_MAPPING: + return True + return False + + +def remove_flops_counter_hook_function(module): + if is_supported_instance(module): + if hasattr(module, "__flops_handle__"): + module.__flops_handle__.remove() + del module.__flops_handle__ + + +MODULES_MAPPING = { + # convolutions + nn.Conv1d: conv_flops_counter_hook, + nn.Conv2d: conv_flops_counter_hook, + nn.Conv3d: conv_flops_counter_hook, + # activations + nn.ReLU: relu_flops_counter_hook, + nn.PReLU: relu_flops_counter_hook, + nn.ELU: relu_flops_counter_hook, + nn.LeakyReLU: relu_flops_counter_hook, + nn.ReLU6: relu_flops_counter_hook, + # poolings + nn.MaxPool1d: pool_flops_counter_hook, + nn.AvgPool1d: pool_flops_counter_hook, + nn.AvgPool2d: pool_flops_counter_hook, + nn.MaxPool2d: pool_flops_counter_hook, + nn.MaxPool3d: pool_flops_counter_hook, + nn.AvgPool3d: pool_flops_counter_hook, + nn.AdaptiveMaxPool1d: pool_flops_counter_hook, + nn.AdaptiveAvgPool1d: pool_flops_counter_hook, + nn.AdaptiveMaxPool2d: pool_flops_counter_hook, + nn.AdaptiveAvgPool2d: pool_flops_counter_hook, + nn.AdaptiveMaxPool3d: pool_flops_counter_hook, + nn.AdaptiveAvgPool3d: pool_flops_counter_hook, + # BNs + nn.BatchNorm1d: bn_flops_counter_hook, + nn.BatchNorm2d: bn_flops_counter_hook, + nn.BatchNorm3d: bn_flops_counter_hook, + # FC + nn.Linear: linear_flops_counter_hook, + # Upscale + nn.Upsample: upsample_flops_counter_hook, + # Deconvolution + nn.ConvTranspose2d: deconv_flops_counter_hook, +} diff --git a/nanodet/util/logger.py b/nanodet/util/logger.py new file mode 100644 index 0000000..d726327 --- /dev/null +++ b/nanodet/util/logger.py @@ -0,0 +1,225 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
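To show how the complexity counter in `nanodet/util/flops_counter.py` above is typically driven, here is a small, hedged usage sketch; the toy network is an arbitrary stand-in and not part of this repository:

```python
# Usage sketch for get_model_complexity_info from flops_counter.py above.
import torch.nn as nn

from nanodet.util.flops_counter import get_model_complexity_info

# A throwaway model built mostly from layer types listed in MODULES_MAPPING
# (unsupported layers such as Flatten simply contribute no counted FLOPs).
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1),
)

# input_shape is (C, H, W); a batch dimension of 1 is added internally.
flops, params = get_model_complexity_info(
    toy, (3, 64, 64), print_per_layer_stat=False, as_strings=True
)
print(flops, params)  # string outputs, e.g. "0.0 GFLOPs"; values depend on the model
```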
+ +import logging +import os +import time + +import numpy as np +from pytorch_lightning.loggers import Logger as LightningLoggerBase +from pytorch_lightning.loggers.logger import rank_zero_experiment +from pytorch_lightning.utilities import rank_zero_only +from pytorch_lightning.utilities.cloud_io import get_filesystem +from termcolor import colored + +from .path import mkdir + + +class Logger: + def __init__(self, local_rank, save_dir="./", use_tensorboard=True): + mkdir(local_rank, save_dir) + self.rank = local_rank + fmt = ( + colored("[%(name)s]", "magenta", attrs=["bold"]) + + colored("[%(asctime)s]", "blue") + + colored("%(levelname)s:", "green") + + colored("%(message)s", "white") + ) + logging.basicConfig( + level=logging.INFO, + filename=os.path.join(save_dir, "logs.txt"), + filemode="w", + ) + self.log_dir = os.path.join(save_dir, "logs") + console = logging.StreamHandler() + console.setLevel(logging.INFO) + formatter = logging.Formatter(fmt, datefmt="%m-%d %H:%M:%S") + console.setFormatter(formatter) + logging.getLogger().addHandler(console) + if use_tensorboard: + try: + from torch.utils.tensorboard import SummaryWriter + except ImportError: + raise ImportError( + 'Please run "pip install future tensorboard" to install ' + "the dependencies to use torch.utils.tensorboard " + "(applicable to PyTorch 1.1 or higher)" + ) from None + if self.rank < 1: + logging.info( + "Using Tensorboard, logs will be saved in {}".format(self.log_dir) + ) + self.writer = SummaryWriter(log_dir=self.log_dir) + + def log(self, string): + if self.rank < 1: + logging.info(string) + + def scalar_summary(self, tag, phase, value, step): + if self.rank < 1: + self.writer.add_scalars(tag, {phase: value}, step) + + +class MovingAverage(object): + def __init__(self, val, window_size=50): + self.window_size = window_size + self.reset() + self.push(val) + + def reset(self): + self.queue = [] + + def push(self, val): + self.queue.append(val) + if len(self.queue) > self.window_size: + self.queue.pop(0) + + def avg(self): + return np.mean(self.queue) + + +class AverageMeter(object): + """Computes and stores the average and current value""" + + def __init__(self, val): + self.reset() + self.update(val) + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + if self.count > 0: + self.avg = self.sum / self.count + + +class NanoDetLightningLogger(LightningLoggerBase): + def __init__(self, save_dir="./", **kwargs): + super().__init__() + self._name = "NanoDet" + self._version = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + self.log_dir = os.path.join(save_dir, f"logs-{self._version}") + + self._fs = get_filesystem(save_dir) + self._fs.makedirs(self.log_dir, exist_ok=True) + self._init_logger() + + self._experiment = None + self._kwargs = kwargs + + @property + def name(self): + return self._name + + @property + @rank_zero_experiment + def experiment(self): + r""" + Actual tensorboard object. To use TensorBoard features in your + :class:`~pytorch_lightning.core.lightning.LightningModule` do the following. 
+ + Example:: + + self.logger.experiment.some_tensorboard_function() + + """ + if self._experiment is not None: + return self._experiment + + assert rank_zero_only.rank == 0, "tried to init log dirs in non global_rank=0" + + try: + from torch.utils.tensorboard import SummaryWriter + except ImportError: + raise ImportError( + 'Please run "pip install future tensorboard" to install ' + "the dependencies to use torch.utils.tensorboard " + "(applicable to PyTorch 1.1 or higher)" + ) from None + + self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs) + return self._experiment + + @property + def version(self): + return self._version + + @rank_zero_only + def _init_logger(self): + self.logger = logging.getLogger(name=self.name) + self.logger.setLevel(logging.INFO) + + # create file handler + fh = logging.FileHandler(os.path.join(self.log_dir, "logs.txt")) + fh.setLevel(logging.INFO) + # set file formatter + f_fmt = "[%(name)s][%(asctime)s]%(levelname)s: %(message)s" + file_formatter = logging.Formatter(f_fmt, datefmt="%m-%d %H:%M:%S") + fh.setFormatter(file_formatter) + + # create console handler + ch = logging.StreamHandler() + ch.setLevel(logging.INFO) + # set console formatter + c_fmt = ( + colored("[%(name)s]", "magenta", attrs=["bold"]) + + colored("[%(asctime)s]", "blue") + + colored("%(levelname)s:", "green") + + colored("%(message)s", "white") + ) + console_formatter = logging.Formatter(c_fmt, datefmt="%m-%d %H:%M:%S") + ch.setFormatter(console_formatter) + + # add the handlers to the logger + self.logger.addHandler(fh) + self.logger.addHandler(ch) + + @rank_zero_only + def info(self, string): + self.logger.info(string) + + @rank_zero_only + def log(self, string): + self.logger.info(string) + + @rank_zero_only + def dump_cfg(self, cfg_node): + with open(os.path.join(self.log_dir, "train_cfg.yml"), "w") as f: + cfg_node.dump(stream=f) + + @rank_zero_only + def log_hyperparams(self, params): + self.logger.info(f"hyperparams: {params}") + + @rank_zero_only + def log_metrics(self, metrics, step): + self.logger.info(f"Val_metrics: {metrics}") + for k, v in metrics.items(): + self.experiment.add_scalars("Val_metrics/" + k, {"Val": v}, step) + + @rank_zero_only + def save(self): + super().save() + + @rank_zero_only + def finalize(self, status): + self.experiment.flush() + self.experiment.close() + self.save() diff --git a/nanodet/util/misc.py b/nanodet/util/misc.py new file mode 100644 index 0000000..961b77b --- /dev/null +++ b/nanodet/util/misc.py @@ -0,0 +1,52 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from functools import partial + +import torch + + +def multi_apply(func, *args, **kwargs): + pfunc = partial(func, **kwargs) if kwargs else func + map_results = map(pfunc, *args) + return tuple(map(list, zip(*map_results))) + + +def images_to_levels(target, num_level_anchors): + """Convert targets by image to targets by feature level. + + [target_img0, target_img1] -> [target_level0, target_level1, ...] 
+ """ + target = torch.stack(target, 0) + level_targets = [] + start = 0 + for n in num_level_anchors: + end = start + n + level_targets.append(target[:, start:end].squeeze(0)) + start = end + return level_targets + + +def unmap(data, count, inds, fill=0): + """Unmap a subset of item (data) back to the original set of items (of + size count)""" + if data.dim() == 1: + ret = data.new_full((count,), fill) + ret[inds.type(torch.bool)] = data + else: + new_size = (count,) + data.size()[1:] + ret = data.new_full(new_size, fill) + ret[inds.type(torch.bool), :] = data + return ret diff --git a/nanodet/util/path.py b/nanodet/util/path.py new file mode 100644 index 0000000..85bfa69 --- /dev/null +++ b/nanodet/util/path.py @@ -0,0 +1,34 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os + +from .rank_filter import rank_filter + + +@rank_filter +def mkdir(path): + if not os.path.exists(path): + os.makedirs(path) + + +def collect_files(path, exts): + file_paths = [] + for maindir, subdir, filename_list in os.walk(path): + for filename in filename_list: + file_path = os.path.join(maindir, filename) + ext = os.path.splitext(file_path)[1] + if ext in exts: + file_paths.append(file_path) + return file_paths diff --git a/nanodet/util/rank_filter.py b/nanodet/util/rank_filter.py new file mode 100644 index 0000000..2316b2f --- /dev/null +++ b/nanodet/util/rank_filter.py @@ -0,0 +1,23 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +def rank_filter(func): + def func_filter(local_rank=-1, *args, **kwargs): + if local_rank < 1: + return func(*args, **kwargs) + else: + pass + + return func_filter diff --git a/nanodet/util/scatter_gather.py b/nanodet/util/scatter_gather.py new file mode 100644 index 0000000..5660a81 --- /dev/null +++ b/nanodet/util/scatter_gather.py @@ -0,0 +1,97 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
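Because `mkdir` in `nanodet/util/path.py` above is wrapped by the `rank_filter` decorator, its call signature gains a leading `local_rank` argument and the body only runs on rank 0. The sketch below (with placeholder paths) illustrates that behaviour together with `collect_files`:

```python
# Hypothetical usage of the path helpers above; paths are placeholders.
from nanodet.util.path import collect_files, mkdir

# mkdir is declared as mkdir(path) but decorated with @rank_filter, so the
# first positional argument becomes the process rank. Only ranks < 1 run it.
mkdir(0, "./workspace/lqd")   # rank 0 (or single process): directory is created
mkdir(1, "./workspace/lqd")   # rank 1 and above: the call is silently skipped

# collect_files is undecorated: walk a tree and keep files with given extensions.
configs = collect_files("./config", exts=[".yml", ".yaml"])
print(configs)
```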
+ +import pickle + +import torch +import torch.distributed as dist +from torch.autograd import Variable +from torch.nn.parallel._functions import Scatter + + +def list_scatter(input, target_gpus, chunk_sizes): + ret = [] + for idx, size in enumerate(chunk_sizes): + ret.append(input[:size]) + del input[:size] + return tuple(ret) + + +def scatter(inputs, target_gpus, dim=0, chunk_sizes=None): + """ + Slices variables into approximately equal chunks and + distributes them across given GPUs. Duplicates + references to objects that are not variables. Does not + support Tensors. + """ + + def scatter_map(obj): + if isinstance(obj, Variable): + return Scatter.apply(target_gpus, chunk_sizes, dim, obj) + assert not torch.is_tensor(obj), "Tensors not supported in scatter." + if isinstance(obj, list): + return list_scatter(obj, target_gpus, chunk_sizes) + if isinstance(obj, tuple): + return list(zip(*map(scatter_map, obj))) + if isinstance(obj, dict): + return list(map(type(obj), zip(*map(scatter_map, obj.items())))) + return [obj for targets in target_gpus] + + return scatter_map(inputs) + + +def scatter_kwargs(inputs, kwargs, target_gpus, dim=0, chunk_sizes=None): + r"""Scatter with support for kwargs dictionary""" + inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else [] + kwargs = scatter(kwargs, target_gpus, dim, chunk_sizes) if kwargs else [] + if len(inputs) < len(kwargs): + inputs.extend([() for _ in range(len(kwargs) - len(inputs))]) + elif len(kwargs) < len(inputs): + kwargs.extend([{} for _ in range(len(inputs) - len(kwargs))]) + inputs = tuple(inputs) + kwargs = tuple(kwargs) + return inputs, kwargs + + +def gather_results(result_part): + rank = -1 + world_size = 1 + if dist.is_available() and dist.is_initialized(): + rank = dist.get_rank() + world_size = dist.get_world_size() + + # dump result part to tensor with pickle + part_tensor = torch.tensor( + bytearray(pickle.dumps(result_part)), dtype=torch.uint8, device="cuda" + ) + + # gather all result part tensor shape + shape_tensor = torch.tensor(part_tensor.shape, device="cuda") + shape_list = [shape_tensor.clone() for _ in range(world_size)] + dist.all_gather(shape_list, shape_tensor) + + # padding result part tensor to max length + shape_max = torch.tensor(shape_list).max() + part_send = torch.zeros(shape_max, dtype=torch.uint8, device="cuda") + part_send[: shape_tensor[0]] = part_tensor + part_recv_list = [part_tensor.new_zeros(shape_max) for _ in range(world_size)] + + # gather all result dict + dist.all_gather(part_recv_list, part_send) + + if rank < 1: + all_res = {} + for recv, shape in zip(part_recv_list, shape_list): + all_res.update(pickle.loads(recv[: shape[0]].cpu().numpy().tobytes())) + return all_res diff --git a/nanodet/util/util_mixins.py b/nanodet/util/util_mixins.py new file mode 100644 index 0000000..278aa03 --- /dev/null +++ b/nanodet/util/util_mixins.py @@ -0,0 +1,105 @@ +"""This module defines the :class:`NiceRepr` mixin class, which defines a +``__repr__`` and ``__str__`` method that only depend on a custom ``__nice__`` +method, which you must define. This means you only have to overload one +function instead of two. Furthermore, if the object defines a ``__len__`` +method, then the ``__nice__`` method defaults to something sensible, otherwise +it is treated as abstract and raises ``NotImplementedError``. + +To use simply have your object inherit from :class:`NiceRepr` +(multi-inheritance should be ok). 
+ +This code was copied from the ubelt library: https://github.com/Erotemic/ubelt + +Example: + >>> # Objects that define __nice__ have a default __str__ and __repr__ + >>> class Student(NiceRepr): + ... def __init__(self, name): + ... self.name = name + ... def __nice__(self): + ... return self.name + >>> s1 = Student('Alice') + >>> s2 = Student('Bob') + >>> print(f's1 = {s1}') + >>> print(f's2 = {s2}') + s1 = + s2 = + +Example: + >>> # Objects that define __len__ have a default __nice__ + >>> class Group(NiceRepr): + ... def __init__(self, data): + ... self.data = data + ... def __len__(self): + ... return len(self.data) + >>> g = Group([1, 2, 3]) + >>> print(f'g = {g}') + g = +""" +import warnings + + +class NiceRepr(object): + """Inherit from this class and define ``__nice__`` to "nicely" print your + objects. + + Defines ``__str__`` and ``__repr__`` in terms of ``__nice__`` function + Classes that inherit from :class:`NiceRepr` should redefine ``__nice__``. + If the inheriting class has a ``__len__``, method then the default + ``__nice__`` method will return its length. + + Example: + >>> class Foo(NiceRepr): + ... def __nice__(self): + ... return 'info' + >>> foo = Foo() + >>> assert str(foo) == '' + >>> assert repr(foo).startswith('>> class Bar(NiceRepr): + ... pass + >>> bar = Bar() + >>> import pytest + >>> with pytest.warns(None) as record: + >>> assert 'object at' in str(bar) + >>> assert 'object at' in repr(bar) + + Example: + >>> class Baz(NiceRepr): + ... def __len__(self): + ... return 5 + >>> baz = Baz() + >>> assert str(baz) == '' + """ + + def __nice__(self): + """str: a "nice" summary string describing this module""" + if hasattr(self, "__len__"): + # It is a common pattern for objects to use __len__ in __nice__ + # As a convenience we define a default __nice__ for these objects + return str(len(self)) + else: + # In all other cases force the subclass to overload __nice__ + raise NotImplementedError( + f"Define the __nice__ method for {self.__class__!r}" + ) + + def __repr__(self): + """str: the string of the module""" + try: + nice = self.__nice__() + classname = self.__class__.__name__ + return f"<{classname}({nice}) at {hex(id(self))}>" + except NotImplementedError as ex: + warnings.warn(str(ex), category=RuntimeWarning) + return object.__repr__(self) + + def __str__(self): + """str: the string of the module""" + try: + classname = self.__class__.__name__ + nice = self.__nice__() + return f"<{classname}({nice})>" + except NotImplementedError as ex: + warnings.warn(str(ex), category=RuntimeWarning) + return object.__repr__(self) diff --git a/nanodet/util/visualization.py b/nanodet/util/visualization.py new file mode 100644 index 0000000..44badcd --- /dev/null +++ b/nanodet/util/visualization.py @@ -0,0 +1,742 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
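As a concrete, made-up illustration of the `NiceRepr` mixin above: any subclass that defines `__len__` gets a readable `str`/`repr` for free, roughly as follows:

```python
# Minimal sketch of NiceRepr; the Batch class is invented for illustration.
from nanodet.util.util_mixins import NiceRepr


class Batch(NiceRepr):
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)


b = Batch([1, 2, 3])
print(str(b))   # <Batch(3)>           -- default __nice__ falls back to len()
print(repr(b))  # <Batch(3) at 0x...>  -- repr additionally embeds the object id
```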
+ +import cv2 +import matplotlib as mpl +import matplotlib.figure as mplfigure +import numpy as np +import pycocotools.mask as mask_util +from matplotlib.backends.backend_agg import FigureCanvasAgg + +_SMALL_OBJECT_AREA_THRESH = 1000 + + +def overlay_bbox_cv(img, dets, class_names, score_thresh): + all_box = [] + for label in dets: + for bbox in dets[label]: + score = bbox[-1] + if score > score_thresh: + x0, y0, x1, y1 = [int(i) for i in bbox[:4]] + all_box.append([label, x0, y0, x1, y1, score]) + all_box.sort(key=lambda v: v[5]) + for box in all_box: + label, x0, y0, x1, y1, score = box + # color = self.cmap(i)[:3] + color = (_COLORS[label] * 255).astype(np.uint8).tolist() + text = "{}:{:.1f}%".format(class_names[label], score * 100) + txt_color = (0, 0, 0) if np.mean(_COLORS[label]) > 0.5 else (255, 255, 255) + font = cv2.FONT_HERSHEY_SIMPLEX + txt_size = cv2.getTextSize(text, font, 0.5, 2)[0] + cv2.rectangle(img, (x0, y0), (x1, y1), color, 2) + + cv2.rectangle( + img, + (x0, y0 - txt_size[1] - 1), + (x0 + txt_size[0] + txt_size[1], y0 - 1), + color, + -1, + ) + cv2.putText(img, text, (x0, y0 - 1), font, 0.5, txt_color, thickness=1) + return img,all_box + + +def rand_cmap( + nlabels, + type="bright", + first_color_black=False, + last_color_black=False, + verbose=False, +): + """ + Creates a random colormap to be used together with matplotlib. + Useful for segmentation tasks + :param nlabels: Number of labels (size of colormap) + :param type: 'bright' for strong colors, 'soft' for pastel colors + :param first_color_black: Option to use first color as black, True or False + :param last_color_black: Option to use last color as black, True or False + :param verbose: Prints the number of labels and shows the colormap. True or False + :return: colormap for matplotlib + """ + import colorsys + + import numpy as np + from matplotlib.colors import LinearSegmentedColormap + + if type not in ("bright", "soft"): + print('Please choose "bright" or "soft" for type') + return + + if verbose: + print("Number of labels: " + str(nlabels)) + + # Generate color map for bright colors, based on hsv + if type == "bright": + randHSVcolors = [ + ( + np.random.uniform(low=0.0, high=1), + np.random.uniform(low=0.2, high=1), + np.random.uniform(low=0.9, high=1), + ) + for i in range(nlabels) + ] + + # Convert HSV list to RGB + randRGBcolors = [] + for HSVcolor in randHSVcolors: + randRGBcolors.append( + colorsys.hsv_to_rgb(HSVcolor[0], HSVcolor[1], HSVcolor[2]) + ) + + if first_color_black: + randRGBcolors[0] = [0, 0, 0] + + if last_color_black: + randRGBcolors[-1] = [0, 0, 0] + + random_colormap = LinearSegmentedColormap.from_list( + "new_map", randRGBcolors, N=nlabels + ) + + # Generate soft pastel colors, by limiting the RGB spectrum + if type == "soft": + low = 0.6 + high = 0.95 + randRGBcolors = [ + ( + np.random.uniform(low=low, high=high), + np.random.uniform(low=low, high=high), + np.random.uniform(low=low, high=high), + ) + for i in range(nlabels) + ] + + if first_color_black: + randRGBcolors[0] = [0, 0, 0] + + if last_color_black: + randRGBcolors[-1] = [0, 0, 0] + random_colormap = LinearSegmentedColormap.from_list( + "new_map", randRGBcolors, N=nlabels + ) + + return random_colormap + + +class VisImage: + """ + Visualize detection results. 
+ + Modified from Detectron2 + https://github.com/facebookresearch/detectron2 + """ + + def __init__(self, img, scale=1.0): + self.img = img + self.scale = scale + self.width, self.height = img.shape[1], img.shape[0] + self._setup_figure(img) + + def _setup_figure(self, img): + """ + Args: + Same as in :meth:`__init__()`. + + Returns: + fig (matplotlib.pyplot.figure): top level container for all the + image plot elements. + ax (matplotlib.pyplot.Axes): contains figure elements and sets + the coordinate system. + """ + fig = mplfigure.Figure(frameon=False) + self.dpi = fig.get_dpi() + # add a small 1e-2 to avoid precision lost due to matplotlib's truncation + # (https://github.com/matplotlib/matplotlib/issues/15363) + fig.set_size_inches( + (self.width * self.scale + 1e-2) / self.dpi, + (self.height * self.scale + 1e-2) / self.dpi, + ) + self.canvas = FigureCanvasAgg(fig) + # self.canvas = mpl.backends.backend_cairo.FigureCanvasCairo(fig) + ax = fig.add_axes([0.0, 0.0, 1.0, 1.0]) + ax.axis("off") + ax.set_xlim(0.0, self.width) + ax.set_ylim(self.height) + + self.fig = fig + self.ax = ax + + def save(self, filepath): + """ + Args: + filepath (str): a string that contains the absolute path, including + the file name, where the visualized image will be saved. + """ + if filepath.lower().endswith(".jpg") or filepath.lower().endswith(".png"): + # faster than matplotlib's imshow + cv2.imwrite(filepath, self.get_image()[:, :, ::-1]) + else: + # support general formats (e.g. pdf) + self.ax.imshow(self.img, interpolation="nearest") + self.fig.savefig(filepath) + + def get_image(self): + """ + Returns: + ndarray: + the visualized image of shape (H, W, 3) (RGB) in uint8 type. + The shape is scaled w.r.t the input image using the given + `scale` argument. + """ + canvas = self.canvas + s, (width, height) = canvas.print_to_buffer() + if (self.width, self.height) != (width, height): + img = cv2.resize(self.img, (width, height)) + else: + img = self.img + + # buf = io.BytesIO() # works for cairo backend + # canvas.print_rgba(buf) + # width, height = self.width, self.height + # s = buf.getvalue() + + buffer = np.frombuffer(s, dtype="uint8") + + # imshow is slow. 
blend manually (still quite slow) + img_rgba = buffer.reshape(height, width, 4) + rgb, alpha = np.split(img_rgba, [3], axis=2) + + try: + import numexpr as ne # fuse them with numexpr + + visualized_image = ne.evaluate( + "img * (1 - alpha / 255.0) + rgb * (alpha / 255.0)" + ) + except ImportError: + alpha = alpha.astype("float32") / 255.0 + visualized_image = img * (1 - alpha) + rgb * alpha + + visualized_image = visualized_image.astype("uint8") + + return visualized_image + + +class Visualizer: + def __init__(self, img, dets, class_names, socre_thresh): + self.img = img + self.dets = dets + self.class_names = class_names + self.num_classes = len(self.class_names) + self.score_thresh = socre_thresh + self.viz = VisImage(img=self.img) + self._default_font_size = max( + np.sqrt(self.viz.height * self.viz.width) // 100, 10 + ) + + def mask_to_polygon(self, mask, need_binary=True): + res = cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE) + hierarchy = res[-1] + if hierarchy is None: # empty mask + return None, None, None + has_holes = (hierarchy.reshape(-1, 4)[:, 3] >= 0).sum() > 0 + res = res[-2] + res = [x.flatten() for x in res] + res = [x for x in res if len(x) >= 6] + + p = mask_util.frPyObjects(res, self.viz.height, self.viz.width) + p = mask_util.merge(p) + bbox = mask_util.toBbox(p) + bbox[2] += bbox[0] + bbox[3] += bbox[1] + + return res, bbox, has_holes + + def draw_box(self, box_coord, alpha=0.5, edge_color="g", line_style="-"): + x0, y0, x1, y1 = box_coord + width = x1 - x0 + height = y1 - y0 + linewidth = max(self._default_font_size / 6, 1) + self.viz.ax.add_patch( + mpl.patches.Rectangle( + (x0, y0), + width, + height, + fill=False, + edgecolor=edge_color, + linewidth=linewidth * self.viz.scale, + alpha=alpha, + linestyle=line_style, + ) + ) + return self.viz + + def draw_polycon(self, mask, color, edge_color, alpha=0.5): + if edge_color is None: + edge_color = color + edge_color = mpl.colors.to_rgb(edge_color) + (1,) + + polygon = mpl.patches.Polygon( + mask, + fill=False, + # facecolor=mpl.colors.to_rgb(color) + (alpha,), + edgecolor=edge_color, + linewidth=max(self._default_font_size // 15 * self.viz.scale, 1), + ) + self.viz.ax.add_patch(polygon) + return self.viz + + def draw_mask(self, mask, polys, color, edge_color, alpha=0.5): + if edge_color is None: + edge_color = color + edge_color = mpl.colors.to_rgb(edge_color) + (1,) + color_mask = np.ones((mask.shape[0], mask.shape[1], 3)) + for i in range(3): + color_mask[:, :, i] = color[i] + self.viz.ax.imshow(np.dstack((color_mask, mask * alpha))) + for ploy in polys: + self.draw_polycon(ploy.reshape(-1, 2), color, edge_color=None, alpha=alpha) + + def _jitter(self, color): + """ + Randomly modifies given color to produce a slightly different color than + the color given. + + Args: + color (tuple[double]): a tuple of 3 elements, containing the RGB + values of the color picked. The values in the list are in the + [0.0, 1.0] range. + + Returns: + jittered_color (tuple[double]): a tuple of 3 elements, containing + the RGB values of the color after being jittered. The values + in the list are in the [0.0, 1.0] range. 
+ """ + color = mpl.colors.to_rgb(color) + vec = np.random.rand(3) + # better to do it in another color space + vec = vec / np.linalg.norm(vec) * 0.5 + res = np.clip(vec + color, 0, 1) + return tuple(res) + + def overlay_bbox(self, alpha=1.0): + for label in self.dets: + for bbox in self.dets[label]: + x0, y0, x1, y1, score = bbox + if score >= self.score_thresh: + # color = self.cmap(i)[:3] + color = _COLORS[label] + text = "{}:{:.1f}%".format(self.class_names[label], score * 100) + self.draw_box(bbox[:4], alpha=1.0, edge_color=color, line_style="-") + text_pos = (x0, y0) + instance_area = (y1 - y0) * (x1 - x0) + if ( + instance_area < _SMALL_OBJECT_AREA_THRESH * self.viz.scale + or y1 - y0 < 40 * self.viz.scale + ): + if y1 >= self.viz.height - 5: + text_pos = (x1, y0) + else: + text_pos = (x0, y1) + + height_ratio = (y1 - y0) / np.sqrt(self.viz.height * self.viz.width) + font_size = ( + np.clip((height_ratio - 0.02) / 0.08 + 1, 1.2, 2) + * 0.5 + * self._default_font_size + ) + + self.draw_text( + text, + text_pos, + color="black", + horizontal_alignment="left", + font_size=font_size, + ) + out = self.viz.get_image() + return out + + def overlay_masks(self, alpha=0.5): + ov = self.img.copy() + im = self.img # .astype(np.float32) + total_ma = np.zeros([im.shape[0], im.shape[1]]) + total_contours = [] + for i, det in enumerate(self.dets[::-1]): + score = det["score"] + if score >= self.score_thresh: + ma = det["mask"] + _, ma = cv2.threshold( + ma, thresh=127, maxval=255, type=cv2.THRESH_BINARY + ) + fg = ( + im * alpha + + np.ones(im.shape) * (1 - alpha) * self.cmap(i)[:3] * 255 + ) + ov[ma == 255] = fg[ma == 255] + total_ma += ma + contours = cv2.findContours( + ma.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE + )[-2:] + total_contours.append(contours) + for cnt in total_contours: + cv2.drawContours(ov, cnt[0], -1, (0.0, 0.0, 0.0), 1) + ov[total_ma == 0] = im[total_ma == 0] + return ov + + def overlay_instance(self, alpha=0.4): + for i, det in enumerate(self.dets[::-1]): + score = det["score"] + if score >= self.score_thresh: + label = det["label"] + binary_mask = det["mask"] + # color = self.cmap(i)[:3] + color = _COLORS[label] + color = self._jitter(color) + contours, bbox, has_holes = self.mask_to_polygon(binary_mask.copy()) + if not contours: + continue + self.draw_mask( + binary_mask, contours, color, edge_color=None, alpha=alpha + ) + + x0, y0, x1, y1 = bbox + text = "{}:{:.1f}%".format(self.class_names[label], score * 100) + text_pos = np.median(binary_mask.nonzero(), axis=1)[::-1] + instance_area = (y1 - y0) * (x1 - x0) + if ( + instance_area < _SMALL_OBJECT_AREA_THRESH * self.viz.scale + or y1 - y0 < 40 * self.viz.scale + ): + if y1 >= self.viz.height - 5: + text_pos = (x1, y0) + else: + text_pos = (x0, y1) + + height_ratio = (y1 - y0) / np.sqrt(self.viz.height * self.viz.width) + font_size = ( + np.clip((height_ratio - 0.02) / 0.08 + 1, 1.2, 2) + * 0.5 + * self._default_font_size + ) + + self.draw_text( + text, + text_pos, + color="black", + horizontal_alignment="center", + font_size=font_size, + ) + out = self.viz.get_image() + return out + + def draw_text( + self, + text, + position, + *, + font_size=None, + color="g", + horizontal_alignment="center", + rotation=0 + ): + """ + Args: + text (str): class label + position (tuple): a tuple of the x and y coordinates to place text on image. + font_size (int, optional): font of the text. If not provided, a font size + proportional to the image width is calculated and used. + color: color of the text. 
Refer to `matplotlib.colors` for full list + of formats that are accepted. + horizontal_alignment (str): see `matplotlib.text.Text` + rotation: rotation angle in degrees CCW + + Returns: + output (VisImage): image object with text drawn. + """ + if not font_size: + font_size = self._default_font_size + + # since the text background is dark, we don't want the text to be dark + color = np.maximum(list(mpl.colors.to_rgb(color)), 0.2) + color[np.argmax(color)] = max(0.8, np.max(color)) + + x, y = position + self.viz.ax.text( + x, + y, + text, + size=font_size * self.viz.scale, + family="sans-serif", + bbox={ + "facecolor": (0.5, 0.5, 1.0), + "alpha": 0.8, + "pad": 0.7, + "edgecolor": (0.8, 0.8, 1.0), + }, + verticalalignment="top", + horizontalalignment=horizontal_alignment, + color=color, + zorder=10, + rotation=rotation, + ) + return self.viz + + +_COLORS = ( + np.array( + [ + 0.000, + 0.447, + 0.741, + 0.850, + 0.325, + 0.098, + 0.929, + 0.694, + 0.125, + 0.494, + 0.184, + 0.556, + 0.466, + 0.674, + 0.188, + 0.301, + 0.745, + 0.933, + 0.635, + 0.078, + 0.184, + 0.300, + 0.300, + 0.300, + 0.600, + 0.600, + 0.600, + 1.000, + 0.000, + 0.000, + 1.000, + 0.500, + 0.000, + 0.749, + 0.749, + 0.000, + 0.000, + 1.000, + 0.000, + 0.000, + 0.000, + 1.000, + 0.667, + 0.000, + 1.000, + 0.333, + 0.333, + 0.000, + 0.333, + 0.667, + 0.000, + 0.333, + 1.000, + 0.000, + 0.667, + 0.333, + 0.000, + 0.667, + 0.667, + 0.000, + 0.667, + 1.000, + 0.000, + 1.000, + 0.333, + 0.000, + 1.000, + 0.667, + 0.000, + 1.000, + 1.000, + 0.000, + 0.000, + 0.333, + 0.500, + 0.000, + 0.667, + 0.500, + 0.000, + 1.000, + 0.500, + 0.333, + 0.000, + 0.500, + 0.333, + 0.333, + 0.500, + 0.333, + 0.667, + 0.500, + 0.333, + 1.000, + 0.500, + 0.667, + 0.000, + 0.500, + 0.667, + 0.333, + 0.500, + 0.667, + 0.667, + 0.500, + 0.667, + 1.000, + 0.500, + 1.000, + 0.000, + 0.500, + 1.000, + 0.333, + 0.500, + 1.000, + 0.667, + 0.500, + 1.000, + 1.000, + 0.500, + 0.000, + 0.333, + 1.000, + 0.000, + 0.667, + 1.000, + 0.000, + 1.000, + 1.000, + 0.333, + 0.000, + 1.000, + 0.333, + 0.333, + 1.000, + 0.333, + 0.667, + 1.000, + 0.333, + 1.000, + 1.000, + 0.667, + 0.000, + 1.000, + 0.667, + 0.333, + 1.000, + 0.667, + 0.667, + 1.000, + 0.667, + 1.000, + 1.000, + 1.000, + 0.000, + 1.000, + 1.000, + 0.333, + 1.000, + 1.000, + 0.667, + 1.000, + 0.333, + 0.000, + 0.000, + 0.500, + 0.000, + 0.000, + 0.667, + 0.000, + 0.000, + 0.833, + 0.000, + 0.000, + 1.000, + 0.000, + 0.000, + 0.000, + 0.167, + 0.000, + 0.000, + 0.333, + 0.000, + 0.000, + 0.500, + 0.000, + 0.000, + 0.667, + 0.000, + 0.000, + 0.833, + 0.000, + 0.000, + 1.000, + 0.000, + 0.000, + 0.000, + 0.167, + 0.000, + 0.000, + 0.333, + 0.000, + 0.000, + 0.500, + 0.000, + 0.000, + 0.667, + 0.000, + 0.000, + 0.833, + 0.000, + 0.000, + 1.000, + 0.000, + 0.000, + 0.000, + 0.143, + 0.143, + 0.143, + 0.286, + 0.286, + 0.286, + 0.429, + 0.429, + 0.429, + 0.571, + 0.571, + 0.571, + 0.714, + 0.714, + 0.714, + 0.857, + 0.857, + 0.857, + 0.000, + 0.447, + 0.741, + 0.314, + 0.717, + 0.741, + 0.50, + 0.5, + 0, + ] + ) + .astype(np.float32) + .reshape(-1, 3) +) diff --git a/nanodet/util/yacs.py b/nanodet/util/yacs.py new file mode 100644 index 0000000..1cbe16c --- /dev/null +++ b/nanodet/util/yacs.py @@ -0,0 +1,531 @@ +# Copyright (c) 2018-present, Facebook, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +############################################################################## +"""YACS -- Yet Another Configuration System is designed to be a simple +configuration management system for academic and industrial research +projects. + +See README.md for usage and examples. +""" + +import copy +import io +import logging +import os +import sys +from ast import literal_eval + +import yaml + +# Flag for py2 and py3 compatibility to use when separate code paths are necessary +# When _PY2 is False, we assume Python 3 is in use +_PY2 = sys.version_info.major == 2 + +# Filename extensions for loading configs from files +_YAML_EXTS = {"", ".yaml", ".yml"} +_PY_EXTS = {".py"} + +_FILE_TYPES = (io.IOBase,) + +# CfgNodes can only contain a limited set of valid types +_VALID_TYPES = {tuple, list, str, int, float, bool, type(None)} +# py2 allow for str and unicode +if _PY2: + _VALID_TYPES = _VALID_TYPES.union({unicode}) # noqa: F821 + +# Utilities for importing modules from file paths +if _PY2: + # imp is available in both py2 and py3 for now, but is deprecated in py3 + import imp +else: + import importlib.util + +logger = logging.getLogger(__name__) + + +class CfgNode(dict): + """ + CfgNode represents an internal node in the configuration tree. It's a simple + dict-like container that allows for attribute-based access to keys. + """ + + IMMUTABLE = "__immutable__" + DEPRECATED_KEYS = "__deprecated_keys__" + RENAMED_KEYS = "__renamed_keys__" + NEW_ALLOWED = "__new_allowed__" + + def __init__(self, init_dict=None, key_list=None, new_allowed=False): + """ + Args: + init_dict (dict): the possibly-nested dictionary to initailize the + CfgNode. + key_list (list[str]): a list of names which index this CfgNode from + the root. + Currently only used for logging purposes. + new_allowed (bool): whether adding new key is allowed when merging with + other configs. + """ + # Recursively convert nested dictionaries in init_dict into CfgNodes + init_dict = {} if init_dict is None else init_dict + key_list = [] if key_list is None else key_list + init_dict = self._create_config_tree_from_dict(init_dict, key_list) + super(CfgNode, self).__init__(init_dict) + # Manage if the CfgNode is frozen or not + self.__dict__[CfgNode.IMMUTABLE] = False + # Deprecated options + # If an option is removed from the code and you don't want to break existing + # yaml configs, you can add the full config key as a string to the set below. + self.__dict__[CfgNode.DEPRECATED_KEYS] = set() + # Renamed options + # If you rename a config option, record the mapping from the old name to the + # new name in the dictionary below. Optionally, if the type also changed, you + # can make the value a tuple that specifies first the renamed key and then + # instructions for how to edit the config file. 
+ self.__dict__[CfgNode.RENAMED_KEYS] = { + # 'EXAMPLE.OLD.KEY': 'EXAMPLE.NEW.KEY', # Dummy example to follow + # 'EXAMPLE.OLD.KEY': ( # A more complex example to follow + # 'EXAMPLE.NEW.KEY', + # "Also convert to a tuple, e.g., 'foo' -> ('foo',) or " + # + "'foo:bar' -> ('foo', 'bar')" + # ), + } + + # Allow new attributes after initialisation + self.__dict__[CfgNode.NEW_ALLOWED] = new_allowed + + @classmethod + def _create_config_tree_from_dict(cls, dic, key_list): + """ + Create a configuration tree using the given dict. + Any dict-like objects inside dict will be treated as a new CfgNode. + + Args: + dic (dict): + key_list (list[str]): a list of names which index this CfgNode from + the root. Currently only used for logging purposes. + """ + dic = copy.deepcopy(dic) + for k, v in dic.items(): + if isinstance(v, dict): + # Convert dict to CfgNode + dic[k] = cls(v, key_list=key_list + [k]) + else: + # Check for valid leaf type or nested CfgNode + _assert_with_logging( + _valid_type(v, allow_cfg_node=False), + "Key {} with value {} is not a valid type; valid types: {}".format( + ".".join(key_list + [k]), type(v), _VALID_TYPES + ), + ) + return dic + + def __getattr__(self, name): + if name in self: + return self[name] + else: + raise AttributeError(name) + + def __setattr__(self, name, value): + if self.is_frozen(): + raise AttributeError( + "Attempted to set {} to {}, but CfgNode is immutable".format( + name, value + ) + ) + + _assert_with_logging( + name not in self.__dict__, + "Invalid attempt to modify internal CfgNode state: {}".format(name), + ) + _assert_with_logging( + _valid_type(value, allow_cfg_node=True), + "Invalid type {} for key {}; valid types = {}".format( + type(value), name, _VALID_TYPES + ), + ) + + self[name] = value + + def __str__(self): + def _indent(s_, num_spaces): + s = s_.split("\n") + if len(s) == 1: + return s_ + first = s.pop(0) + s = [(num_spaces * " ") + line for line in s] + s = "\n".join(s) + s = first + "\n" + s + return s + + r = "" + s = [] + for k, v in sorted(self.items()): + seperator = "\n" if isinstance(v, CfgNode) else " " + attr_str = "{}:{}{}".format(str(k), seperator, str(v)) + attr_str = _indent(attr_str, 2) + s.append(attr_str) + r += "\n".join(s) + return r + + def __repr__(self): + return "{}({})".format(self.__class__.__name__, super(CfgNode, self).__repr__()) + + def dump(self, **kwargs): + """Dump to a string.""" + + def convert_to_dict(cfg_node, key_list): + if not isinstance(cfg_node, CfgNode): + _assert_with_logging( + _valid_type(cfg_node), + "Key {} with value {} is not a valid type; valid types: {}".format( + ".".join(key_list), type(cfg_node), _VALID_TYPES + ), + ) + return cfg_node + else: + cfg_dict = dict(cfg_node) + for k, v in cfg_dict.items(): + cfg_dict[k] = convert_to_dict(v, key_list + [k]) + return cfg_dict + + self_as_dict = convert_to_dict(self, []) + return yaml.safe_dump(self_as_dict, **kwargs) + + def merge_from_file(self, cfg_filename): + """Load a yaml config file and merge it this CfgNode.""" + with open(cfg_filename, "r", encoding="utf-8") as f: + cfg = self.load_cfg(f) + self.merge_from_other_cfg(cfg) + + def merge_from_other_cfg(self, cfg_other): + """Merge `cfg_other` into this CfgNode.""" + _merge_a_into_b(cfg_other, self, self, []) + + def merge_from_list(self, cfg_list): + """Merge config (keys, values) in a list (e.g., from command line) into + this CfgNode. For example, `cfg_list = ['FOO.BAR', 0.5]`. 
+ """ + _assert_with_logging( + len(cfg_list) % 2 == 0, + "Override list has odd length: {}; it must be a list of pairs".format( + cfg_list + ), + ) + root = self + for full_key, v in zip(cfg_list[0::2], cfg_list[1::2]): + if root.key_is_deprecated(full_key): + continue + if root.key_is_renamed(full_key): + root.raise_key_rename_error(full_key) + key_list = full_key.split(".") + d = self + for subkey in key_list[:-1]: + _assert_with_logging( + subkey in d, "Non-existent key: {}".format(full_key) + ) + d = d[subkey] + subkey = key_list[-1] + _assert_with_logging(subkey in d, "Non-existent key: {}".format(full_key)) + value = self._decode_cfg_value(v) + value = _check_and_coerce_cfg_value_type(value, d[subkey], subkey, full_key) + d[subkey] = value + + def freeze(self): + """Make this CfgNode and all of its children immutable.""" + self._immutable(True) + + def defrost(self): + """Make this CfgNode and all of its children mutable.""" + self._immutable(False) + + def is_frozen(self): + """Return mutability.""" + return self.__dict__[CfgNode.IMMUTABLE] + + def _immutable(self, is_immutable): + """Set immutability to is_immutable and recursively apply the setting + to all nested CfgNodes. + """ + self.__dict__[CfgNode.IMMUTABLE] = is_immutable + # Recursively set immutable state + for v in self.__dict__.values(): + if isinstance(v, CfgNode): + v._immutable(is_immutable) + for v in self.values(): + if isinstance(v, CfgNode): + v._immutable(is_immutable) + + def clone(self): + """Recursively copy this CfgNode.""" + return copy.deepcopy(self) + + def register_deprecated_key(self, key): + """Register key (e.g. `FOO.BAR`) a deprecated option. When merging deprecated + keys a warning is generated and the key is ignored. + """ + _assert_with_logging( + key not in self.__dict__[CfgNode.DEPRECATED_KEYS], + "key {} is already registered as a deprecated key".format(key), + ) + self.__dict__[CfgNode.DEPRECATED_KEYS].add(key) + + def register_renamed_key(self, old_name, new_name, message=None): + """Register a key as having been renamed from `old_name` to `new_name`. + When merging a renamed key, an exception is thrown alerting to user to + the fact that the key has been renamed. + """ + _assert_with_logging( + old_name not in self.__dict__[CfgNode.RENAMED_KEYS], + "key {} is already registered as a renamed cfg key".format(old_name), + ) + value = new_name + if message: + value = (new_name, message) + self.__dict__[CfgNode.RENAMED_KEYS][old_name] = value + + def key_is_deprecated(self, full_key): + """Test if a key is deprecated.""" + if full_key in self.__dict__[CfgNode.DEPRECATED_KEYS]: + logger.warning("Deprecated config key (ignoring): {}".format(full_key)) + return True + return False + + def key_is_renamed(self, full_key): + """Test if a key is renamed.""" + return full_key in self.__dict__[CfgNode.RENAMED_KEYS] + + def raise_key_rename_error(self, full_key): + new_key = self.__dict__[CfgNode.RENAMED_KEYS][full_key] + if isinstance(new_key, tuple): + msg = " Note: " + new_key[1] + new_key = new_key[0] + else: + msg = "" + raise KeyError( + "Key {} was renamed to {}; please update your config.{}".format( + full_key, new_key, msg + ) + ) + + def is_new_allowed(self): + return self.__dict__[CfgNode.NEW_ALLOWED] + + @classmethod + def load_cfg(cls, cfg_file_obj_or_str): + """ + Load a cfg. 
+ Args: + cfg_file_obj_or_str (str or file): + Supports loading from: + - A file object backed by a YAML file + - A file object backed by a Python source file that exports an attribute + "cfg" that is either a dict or a CfgNode + - A string that can be parsed as valid YAML + """ + _assert_with_logging( + isinstance(cfg_file_obj_or_str, _FILE_TYPES + (str,)), + "Expected first argument to be of type {} or {}, but it was {}".format( + _FILE_TYPES, str, type(cfg_file_obj_or_str) + ), + ) + if isinstance(cfg_file_obj_or_str, str): + return cls._load_cfg_from_yaml_str(cfg_file_obj_or_str) + elif isinstance(cfg_file_obj_or_str, _FILE_TYPES): + return cls._load_cfg_from_file(cfg_file_obj_or_str) + else: + raise NotImplementedError("Impossible to reach here (unless there's a bug)") + + @classmethod + def _load_cfg_from_file(cls, file_obj): + """Load a config from a YAML file or a Python source file.""" + _, file_extension = os.path.splitext(file_obj.name) + if file_extension in _YAML_EXTS: + return cls._load_cfg_from_yaml_str(file_obj.read()) + elif file_extension in _PY_EXTS: + return cls._load_cfg_py_source(file_obj.name) + else: + raise Exception( + "Attempt to load from an unsupported file type {}; " + "only {} are supported".format(file_obj, _YAML_EXTS.union(_PY_EXTS)) + ) + + @classmethod + def _load_cfg_from_yaml_str(cls, str_obj): + """Load a config from a YAML string encoding.""" + cfg_as_dict = yaml.safe_load(str_obj) + return cls(cfg_as_dict) + + @classmethod + def _load_cfg_py_source(cls, filename): + """Load a config from a Python source file.""" + module = _load_module_from_file("yacs.config.override", filename) + _assert_with_logging( + hasattr(module, "cfg"), + "Python module from file {} must have 'cfg' attr".format(filename), + ) + VALID_ATTR_TYPES = {dict, CfgNode} + _assert_with_logging( + type(module.cfg) in VALID_ATTR_TYPES, + "Imported module 'cfg' attr must be in {} but is {} instead".format( + VALID_ATTR_TYPES, type(module.cfg) + ), + ) + return cls(module.cfg) + + @classmethod + def _decode_cfg_value(cls, value): + """ + Decodes a raw config value (e.g., from a yaml config files or command + line argument) into a Python object. + + If the value is a dict, it will be interpreted as a new CfgNode. + If the value is a str, it will be evaluated as literals. + Otherwise it is returned as-is. + """ + # Configs parsed from raw yaml will contain dictionary keys that need to be + # converted to CfgNode objects + if isinstance(value, dict): + return cls(value) + # All remaining processing is only applied to strings + if not isinstance(value, str): + return value + # Try to interpret `value` as a: + # string, number, tuple, list, dict, boolean, or None + try: + value = literal_eval(value) + # The following two excepts allow v to pass through when it represents a + # string. + # + # Longer explanation: + # The type of v is always a string (before calling literal_eval), but + # sometimes it *represents* a string and other times a data structure, like + # a list. In the case that v represents a string, what we got back from the + # yaml parser is 'foo' *without quotes* (so, not '"foo"'). literal_eval is + # ok with '"foo"', but will raise a ValueError if given 'foo'. In other + # cases, like paths (v = 'foo/bar' and not v = '"foo/bar"'), literal_eval + # will raise a SyntaxError. 
+ except ValueError: + pass + except SyntaxError: + pass + return value + + +load_cfg = ( + CfgNode.load_cfg +) # keep this function in global scope for backward compatibility + + +def _valid_type(value, allow_cfg_node=False): + return (type(value) in _VALID_TYPES) or ( + allow_cfg_node and isinstance(value, CfgNode) + ) + + +def _merge_a_into_b(a, b, root, key_list): + """Merge config dictionary a into config dictionary b, clobbering the + options in b whenever they are also specified in a. + """ + _assert_with_logging( + isinstance(a, CfgNode), + "`a` (cur type {}) must be an instance of {}".format(type(a), CfgNode), + ) + _assert_with_logging( + isinstance(b, CfgNode), + "`b` (cur type {}) must be an instance of {}".format(type(b), CfgNode), + ) + + for k, v_ in a.items(): + full_key = ".".join(key_list + [k]) + + v = copy.deepcopy(v_) + v = b._decode_cfg_value(v) + + if k in b: + v = _check_and_coerce_cfg_value_type(v, b[k], k, full_key) + # Recursively merge dicts + if isinstance(v, CfgNode): + try: + _merge_a_into_b(v, b[k], root, key_list + [k]) + except BaseException: + raise + else: + b[k] = v + elif b.is_new_allowed(): + b[k] = v + else: + if root.key_is_deprecated(full_key): + continue + elif root.key_is_renamed(full_key): + root.raise_key_rename_error(full_key) + else: + raise KeyError("Non-existent config key: {}".format(full_key)) + + +def _check_and_coerce_cfg_value_type(replacement, original, key, full_key): + """Checks that `replacement`, which is intended to replace `original` is of + the right type. The type is correct if it matches exactly or is one of a few + cases in which the type can be easily coerced. + """ + original_type = type(original) + replacement_type = type(replacement) + + # The types must match (with some exceptions) + if replacement_type == original_type: + return replacement + + # Cast replacement from from_type to to_type if the replacement and original + # types match from_type and to_type + def conditional_cast(from_type, to_type): + if replacement_type == from_type and original_type == to_type: + return True, to_type(replacement) + else: + return False, None + + # Conditionally casts + # list <-> tuple + casts = [(tuple, list), (list, tuple)] + # For py2: allow converting from str (bytes) to a unicode string + try: + casts.append((str, unicode)) # noqa: F821 + except Exception: + pass + + for (from_type, to_type) in casts: + converted, converted_value = conditional_cast(from_type, to_type) + if converted: + return converted_value + + raise ValueError( + "Type mismatch ({} vs. {}) with values ({} vs. 
{}) for config " + "key: {}".format( + original_type, replacement_type, original, replacement, full_key + ) + ) + + +def _assert_with_logging(cond, msg): + if not cond: + logger.debug(msg) + assert cond, msg + + +def _load_module_from_file(name, filename): + if _PY2: + module = imp.load_source(name, filename) + else: + spec = importlib.util.spec_from_file_location(name, filename) + module = importlib.util.module_from_spec(spec) + spec.loader.exec_module(module) + return module diff --git a/reference/demo.py b/reference/demo.py new file mode 100644 index 0000000..e38c8f3 --- /dev/null +++ b/reference/demo.py @@ -0,0 +1,157 @@ +import argparse +import os +import time + +import cv2 +import torch + +from nanodet.data.batch_process import stack_batch_img +from nanodet.data.collate import naive_collate +from nanodet.data.transform import Pipeline +from nanodet.model.arch import build_model +from nanodet.util import Logger, cfg, load_config, load_model_weight +from nanodet.util.path import mkdir + +image_ext = [".jpg", ".jpeg", ".webp", ".bmp", ".png"] +video_ext = ["mp4", "mov", "avi", "mkv"] + + +def parse_args(): + parser = argparse.ArgumentParser() + parser.add_argument( + "demo", default="image", help="demo type, eg. image, video and webcam" + ) + parser.add_argument("--config", default="config/00.yml",help="model config file path") + parser.add_argument("--model",default="", help="model file path") + parser.add_argument("--path", default="./demo", help="path to images or video") + parser.add_argument("--camid", type=int, default=0, help="webcam demo camera id") + parser.add_argument( + "--save_result", + action="store_true", + help="whether to save the inference result of image/video", + ) + args = parser.parse_args() + return args + + +class Predictor(object): + def __init__(self, cfg, model_path, logger, device="cpu:0"): + self.cfg = cfg + self.device = device + model = build_model(cfg.model) + ckpt = torch.load(model_path, map_location=lambda storage, loc: storage) + load_model_weight(model, ckpt, logger) + if cfg.model.arch.backbone.name == "RepVGG": + deploy_config = cfg.model + deploy_config.arch.backbone.update({"deploy": True}) + deploy_model = build_model(deploy_config) + from nanodet.model.backbone.repvgg import repvgg_det_model_convert + + model = repvgg_det_model_convert(model, deploy_model) + self.model = model.to(device).eval() + self.pipeline = Pipeline(cfg.data.val.pipeline, cfg.data.val.keep_ratio) + + def inference(self, img): + img_info = {"id": 0} + if isinstance(img, str): + img_info["file_name"] = os.path.basename(img) + img = cv2.imread(img) + else: + img_info["file_name"] = None + + height, width = img.shape[:2] + img_info["height"] = height + img_info["width"] = width + meta = dict(img_info=img_info, raw_img=img, img=img) + meta = self.pipeline(None, meta, self.cfg.data.val.input_size) + meta["img"] = torch.from_numpy(meta["img"].transpose(2, 0, 1)).to(self.device) + meta = naive_collate([meta]) + meta["img"] = stack_batch_img(meta["img"], divisible=32) + with torch.no_grad(): + results = self.model.inference(meta) + return meta, results + + def visualize(self, dets, meta, class_names, score_thres, wait=0): + time1 = time.time() + result_img = self.model.head.show_result( + meta["raw_img"][0], dets, class_names, score_thres=score_thres, show=True + ) + print("viz time: {:.3f}s".format(time.time() - time1)) + return result_img + + +def get_image_list(path): + image_names = [] + for maindir, subdir, file_name_list in os.walk(path): + for filename in file_name_list: + 
apath = os.path.join(maindir, filename) + ext = os.path.splitext(apath)[1] + if ext in image_ext: + image_names.append(apath) + return image_names + + +def main(): + args = parse_args() + local_rank = 0 + torch.backends.cudnn.enabled = True + torch.backends.cudnn.benchmark = True + + load_config(cfg, args.config) + logger = Logger(local_rank, use_tensorboard=False) + predictor = Predictor(cfg, args.model, logger, device="cpu:0") + logger.log('Press "Esc", "q" or "Q" to exit.') + current_time = time.localtime() + if args.demo == "image": + if os.path.isdir(args.path): + files = get_image_list(args.path) + else: + files = [args.path] + files.sort() + for image_name in files: + meta, res = predictor.inference(image_name) + result_image = predictor.visualize(res[0], meta, cfg.class_names, 0.35) + if args.save_result: + save_folder = os.path.join( + cfg.save_dir, time.strftime("%Y_%m_%d_%H_%M_%S", current_time) + ) + mkdir(local_rank, save_folder) + save_file_name = os.path.join(save_folder, os.path.basename(image_name)) + cv2.imwrite(save_file_name, result_image) + ch = cv2.waitKey(0) + if ch == 27 or ch == ord("q") or ch == ord("Q"): + break + elif args.demo == "video" or args.demo == "webcam": + cap = cv2.VideoCapture(args.path if args.demo == "video" else args.camid) + width = cap.get(cv2.CAP_PROP_FRAME_WIDTH) # float + height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT) # float + fps = cap.get(cv2.CAP_PROP_FPS) + save_folder = os.path.join( + cfg.save_dir, time.strftime("%Y_%m_%d_%H_%M_%S", current_time) + ) + mkdir(local_rank, save_folder) + save_path = ( + os.path.join(save_folder, args.path.replace("\\", "/").split("/")[-1]) + if args.demo == "video" + else os.path.join(save_folder, "camera.mp4") + ) + print(f"save_path is {save_path}") + vid_writer = cv2.VideoWriter( + save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (int(width), int(height)) + ) + while True: + ret_val, frame = cap.read() + if ret_val: + meta, res = predictor.inference(frame) + result_frame = predictor.visualize(res[0], meta, cfg.class_names, 0.35) + if args.save_result: + vid_writer.write(result_frame) + ch = cv2.waitKey(1) + if ch == 27 or ch == ord("q") or ch == ord("Q"): + break + else: + break + + +if __name__ == "__main__": + main() diff --git a/reference/inference.py b/reference/inference.py new file mode 100644 index 0000000..8f853f3 --- /dev/null +++ b/reference/inference.py @@ -0,0 +1,70 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import time + +import cv2 +import torch + +from nanodet.data.transform import Pipeline +from nanodet.model.arch import build_model +from nanodet.util import load_model_weight + + +class Predictor(object): + def __init__(self, cfg, model_path, logger, device="cuda:0"): + self.cfg = cfg + self.device = device + model = build_model(cfg.model) + ckpt = torch.load(model_path, map_location=lambda storage, loc: storage) + load_model_weight(model, ckpt, logger) + if cfg.model.arch.backbone.name == "RepVGG": + deploy_config = cfg.model + deploy_config.arch.backbone.update({"deploy": True}) + deploy_model = build_model(deploy_config) + from nanodet.model.backbone.repvgg import repvgg_det_model_convert + + model = repvgg_det_model_convert(model, deploy_model) + self.model = model.to(device).eval() + self.pipeline = Pipeline(cfg.data.val.pipeline, cfg.data.val.keep_ratio) + + def inference(self, img): + img_info = {} + if isinstance(img, str): + img_info["file_name"] = os.path.basename(img) + img = cv2.imread(img) + else: + img_info["file_name"] = None + + height, width = img.shape[:2] + img_info["height"] = height + img_info["width"] = width + meta = dict(img_info=img_info, raw_img=img, img=img) + meta = self.pipeline(meta, self.cfg.data.val.input_size) + meta["img"] = ( + torch.from_numpy(meta["img"].transpose(2, 0, 1)) + .unsqueeze(0) + .to(self.device) + ) + with torch.no_grad(): + results = self.model.inference(meta) + return meta, results + + def visualize(self, dets, meta, class_names, score_thres, wait=0): + time1 = time.time() + self.model.head.show_result( + meta["raw_img"], dets, class_names, score_thres=score_thres, show=True + ) + print("viz time: {:.3f}s".format(time.time() - time1)) diff --git a/requirements.txt b/requirements.txt index 5bfc357..dd7ce36 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,7 +1,23 @@ -numpy -scipy +Cython matplotlib +numpy +omegaconf>=2.0.1 +onnx +onnx-simplifier +opencv-python +pyaml +pycocotools +pytorch-lightning==1.7.0 +tabulate +tensorboard +termcolor +torch>=1.9 +torchmetrics +torchvision +tqdm + opencv-contrib-python +scipy pandas motmetrics setuptools diff --git a/setup_nanodet.py b/setup_nanodet.py new file mode 100644 index 0000000..d2dccb8 --- /dev/null +++ b/setup_nanodet.py @@ -0,0 +1,27 @@ +#!/usr/bin/env python +from setuptools import find_packages, setup + +from nanodet import __author__, __author_email__, __docs__, __homepage__, __version__ + +if __name__ == "__main__": + setup( + name="nanodet", + version=__version__, + description=__docs__, + url=__homepage__, + author=__author__, + author_email=__author_email__, + keywords="deep learning", + packages=find_packages(exclude=("config", "tools", "demo")), + classifiers=[ + "Development Status :: Beta", + "License :: OSI Approved :: Apache Software License", + "Operating System :: OS Independent", + "Programming Language :: Python :: 3.5", + "Programming Language :: Python :: 3.6", + "Programming Language :: Python :: 3.7", + "Programming Language :: Python :: 3.8", + ], + license="Apache License 2.0", + zip_safe=False, + ) diff --git a/test.avi b/test.avi new file mode 100644 index 0000000..eb22256 Binary files /dev/null and b/test.avi differ diff --git a/tool.py b/tool.py new file mode 100644 index 0000000..97e2e05 --- /dev/null +++ b/tool.py @@ -0,0 +1,8 @@ +def infotrans(all_box): + # 用液滴左上和右下边框上的点的坐标近似计算液滴中心点坐标 + bboxes, confidences, class_ids = [], [], [] + for i in range(len(all_box)): + 
bboxes.append([all_box[i][1], all_box[i][2], all_box[i][3] - all_box[i][1], all_box[i][4] - all_box[i][2]])
+        confidences.append(float(all_box[i][5]))
+        class_ids.append(all_box[i][0])
+    return bboxes, confidences, class_ids
diff --git a/weight/LiquidV5.pth b/weight/LiquidV5.pth
new file mode 100644
index 0000000..53477de
Binary files /dev/null and b/weight/LiquidV5.pth differ
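
The diff adds `tool.py` as glue between the detector output and the trackers, but the wiring itself is not shown. Below is a minimal sketch of that wiring under stated assumptions: the `(class_id, x1, y1, x2, y2, score)` row layout of `all_box` is inferred from `infotrans`, while the example boxes and the `CentroidTracker` settings are illustrative, not code from this repository.

```python
# Minimal sketch (not the repository's entry point): feeding NanoDet-style
# detections through tool.infotrans into a motrackers tracker.
# Assumption: each all_box row is (class_id, x1, y1, x2, y2, score), as implied
# by tool.py; the example boxes and max_lost value are made up for illustration.
import numpy as np
from motrackers import CentroidTracker

from tool import infotrans

tracker = CentroidTracker(max_lost=5)

# One frame's detections: two hypothetical droplets as corner-format boxes.
all_box = [
    [0, 100, 120, 140, 160, 0.91],  # class_id, x1, y1, x2, y2, score
    [0, 300, 310, 330, 345, 0.87],
]

# infotrans converts corner boxes to (left, top, width, height) boxes plus
# confidences and class ids, which is the format the tracker update expects.
bboxes, confidences, class_ids = infotrans(all_box)
tracks = tracker.update(np.array(bboxes), np.array(confidences), np.array(class_ids))

for track in tracks:
    print(track)
```

In the actual pipeline, `all_box` would come from the NanoDet predictor (see `reference/demo.py`) for each video frame rather than being hard-coded.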