diff --git a/README.md b/README.md index b3994c8..12ebd78 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,12 @@ -[cars-yolo-output]: examples/assets/cars.gif "Sample Output with YOLO" -[cows-tf-ssd-output]: examples/assets/cows.gif "Sample Output with SSD" +# Application Areas -# Multi-object trackers in Python -Easy to use implementation of various multi-object tracking algorithms. +This project is built around **applying acoustic levitation to droplet manipulation on superhydrophobic surfaces**, and on that basis constructs a three-axis droplet manipulation system assisted by **machine vision**. The goal is to use neural networks for droplet detection and tracking, so that droplet manipulation can be automated and its precision improved. A lightweight network makes it possible to run accurate droplet detection and tracking even on edge computing devices. -[![DOI](https://zenodo.org/badge/148338463.svg)](https://zenodo.org/badge/latestdoi/148338463) +## Available Object Detector - -`YOLOv3 + CentroidTracker` | `TF-MobileNetSSD + CentroidTracker` -:-------------------------:|:-------------------------: -![Cars with YOLO][cars-yolo-output] | ![Cows with tf-SSD][cows-tf-ssd-output] -Video source: [link](https://flic.kr/p/L6qyxj) | Video source: [link](https://flic.kr/p/26WeEWy) +``` +NanoDet-Plus +``` ## Available Multi Object Trackers @@ -21,84 +17,18 @@ CentroidKF_Tracker SORT ``` -## Available OpenCV-based object detectors: - -``` -detector.TF_SSDMobileNetV2 -detector.Caffe_SSDMobileNet -detector.YOLOv3 -``` - ## Installation -Pip install for OpenCV (version 3.4.3 or later) is available [here](https://pypi.org/project/opencv-python/) and can be done with the following command: - ``` -git clone https://github.com/adipandas/multi-object-tracker +git clone https://github.com/vvEverett/multi-object-tracker.git cd multi-object-tracker pip install -r requirements.txt -pip install -e . +# pip install -e . +python setup.py develop +python setup_nanodet.py develop ``` -**Note - for using neural network models with GPU** -For using the opencv `dnn`-based object detection modules provided in this repository with GPU, you may have to compile a CUDA enabled version of OpenCV from source. -* To build opencv from source, refer the following links: -[[link-1](https://docs.opencv.org/master/df/d65/tutorial_table_of_content_introduction.html)], -[[link-2](https://www.pyimagesearch.com/2020/02/03/how-to-use-opencvs-dnn-module-with-nvidia-gpus-cuda-and-cudnn/)] - -## How to use?: Examples +## How to use? -The interface for each tracker is simple and similar. Please refer the example template below. - -``` -from motrackers import CentroidTracker # or IOUTracker, CentroidKF_Tracker, SORT -input_data = ... -detector = ... -tracker = CentroidTracker(...) # or IOUTracker(...), CentroidKF_Tracker(...), SORT(...) -while True: - done, image = - if done: - break - detection_bboxes, detection_confidences, detection_class_ids = detector.detect(image) - # NOTE: - # * `detection_bboxes` are numpy.ndarray of shape (n, 4) with each row containing (bb_left, bb_top, bb_width, bb_height) - # * `detection_confidences` are numpy.ndarray of shape (n,); - # * `detection_class_ids` are numpy.ndarray of shape (n,). - output_tracks = tracker.update(detection_bboxes, detection_confidences, detection_class_ids) - # `output_tracks` is a list with each element containing tuple of - # (, , , , , , , , , ) - for track in output_tracks: - frame, id, bb_left, bb_top, bb_width, bb_height, confidence, x, y, z = track - assert len(track) == 10 - print(track) -``` - -Please refer [examples](https://github.com/adipandas/multi-object-tracker/tree/master/examples) folder of this repository for more details. You can clone and run the examples. - -## Pretrained object detection models - -You will have to download the pretrained weights for the neural-network models. -The shell scripts for downloading these are provided [here](https://github.com/adipandas/multi-object-tracker/tree/master/examples/pretrained_models) below respective folders. -Please refer [DOWNLOAD_WEIGHTS.md](https://github.com/adipandas/multi-object-tracker/blob/master/DOWNLOAD_WEIGHTS.md) for more details. - -### Notes -* There are some variations in implementations as compared to what appeared in papers of `SORT` and `IoU Tracker`. -* In case you find any bugs in the algorithm, I will be happy to accept your pull request or you can create an issue to point it out. - -## References, Credits and Contributions -Please see [REFERENCES.md](https://github.com/adipandas/multi-object-tracker/blob/master/docs/readme/REFERENCES.md) and [CONTRIBUTING.md](https://github.com/adipandas/multi-object-tracker/blob/master/docs/readme/CONTRIBUTING.md). - -## Citation - -If you use this repository in your work, please consider citing it with: -``` -@misc{multiobjtracker_amd2018, - author = {Deshpande, Aditya M.}, - title = {Multi-object trackers in Python}, - year = {2020}, - publisher = {GitHub}, - journal = {GitHub repository}, - howpublished = {\url{https://github.com/adipandas/multi-object-tracker}}, -} -``` +Run main.py to start droplet detection and tracking on test.avi.
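For reference, the detection-plus-tracking loop that main.py and the bundled notebook (examples/example_notebooks/mot_Nanodet.ipynb) implement looks roughly like the sketch below. This is a minimal sketch based on the API used in that notebook; the weight, config, and video paths shown here are placeholders to replace with your own, and the 0.43 score threshold is simply the value used in the notebook.

```
import cv2 as cv
from motrackers.detectors import Nanodet
from motrackers import CentroidTracker  # or CentroidKF_Tracker, SORT, IOUTracker
from motrackers.utils import draw_tracks
from nanodet.util import Logger, cfg, load_config

# Placeholder paths: point these at your own config, weights and video.
load_config(cfg, "config/LiquidDetect416.yml")
model = Nanodet(cfg, "weight/LiquidV4.pth", Logger(0, use_tensorboard=False), "cpu:0")
tracker = CentroidTracker(max_lost=0, tracker_output_format="mot_challenge")

cap = cv.VideoCapture("test.avi")
while True:
    ok, image = cap.read()
    if not ok:
        break
    # NanoDet-Plus inference, then thresholded boxes drawn on the frame.
    meta, res = model.inference(image)
    bboxes, confidences, class_ids, vis = model.visualize(res[0], meta, cfg.class_names, 0.43)
    # Feed the detections to the tracker and overlay the resulting track IDs.
    tracks = tracker.update(bboxes, confidences, class_ids)
    vis = draw_tracks(vis, tracks)
    cv.imshow("image", vis)
    if cv.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv.destroyAllWindows()
```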
diff --git a/config/LiquidDetect.yml b/config/LiquidDetect.yml new file mode 100644 index 0000000..3e3c3af --- /dev/null +++ b/config/LiquidDetect.yml @@ -0,0 +1,115 @@ +#Config File example +save_dir: workspace/lqd +model: + weight_averager: + name: ExpMovingAverager + decay: 0.9998 + arch: + name: NanoDetPlus + detach_epoch: 10 + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: GhostPAN + in_channels: [116, 232, 464] + out_channels: 96 + kernel_size: 5 + num_extra_level: 1 + use_depthwise: True + activation: LeakyReLU + head: + name: NanoDetPlusHead + num_classes: 1 + input_channel: 96 + feat_channels: 96 + stacked_convs: 2 + kernel_size: 5 + strides: [8, 16, 32, 64] + activation: LeakyReLU + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + # Auxiliary head, only use in training time.
+ aux_head: + name: SimpleConvHead + num_classes: 1 + input_channel: 192 + feat_channels: 192 + stacked_convs: 4 + strides: [8, 16, 32, 64] + activation: LeakyReLU + reg_max: 7 + +class_names: &class_names ['Liquid'] #Please fill in the category names (not include background category) +data: + train: + name: XMLDataset + class_names: *class_names + img_path: lq/train/img + ann_path: lq/train/an + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.8, 1.2] + saturation: [0.8, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: XMLDataset + class_names: *class_names + img_path: lq/valid/img + ann_path: lq/valid/an + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] # Set like [0, 1, 2, 3] if you have multi-GPUs + workers_per_gpu: 8 + batchsize_per_gpu: 4 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: AdamW + lr: 0.001 + weight_decay: 0.05 + warmup: + name: linear + steps: 500 + ratio: 0.0001 + total_epochs: 300 + lr_schedule: + name: CosineAnnealingLR + T_max: 300 + eta_min: 0.00005 + val_intervals: 10 +grad_clip: 35 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 diff --git a/config/LiquidDetect416.yml b/config/LiquidDetect416.yml new file mode 100644 index 0000000..1ca3ceb --- /dev/null +++ b/config/LiquidDetect416.yml @@ -0,0 +1,115 @@ +#Config File example +save_dir: workspace/lqd +model: + weight_averager: + name: ExpMovingAverager + decay: 0.9998 + arch: + name: NanoDetPlus + detach_epoch: 10 + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: GhostPAN + in_channels: [116, 232, 464] + out_channels: 96 + kernel_size: 5 + num_extra_level: 1 + use_depthwise: True + activation: LeakyReLU + head: + name: NanoDetPlusHead + num_classes: 1 + input_channel: 96 + feat_channels: 96 + stacked_convs: 2 + kernel_size: 5 + strides: [8, 16, 32, 64] + activation: LeakyReLU + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + # Auxiliary head, only use in training time. 
+ aux_head: + name: SimpleConvHead + num_classes: 1 + input_channel: 192 + feat_channels: 192 + stacked_convs: 4 + strides: [8, 16, 32, 64] + activation: LeakyReLU + reg_max: 7 + +class_names: &class_names ['Liquid'] #Please fill in the category names (not include background category) +data: + train: + name: XMLDataset + class_names: *class_names + img_path: lq/train/img + ann_path: lq/train/an + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.8, 1.2] + saturation: [0.8, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: XMLDataset + class_names: *class_names + img_path: lq/valid/img + ann_path: lq/valid/an + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] # Set like [0, 1, 2, 3] if you have multi-GPUs + workers_per_gpu: 8 + batchsize_per_gpu: 4 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: AdamW + lr: 0.001 + weight_decay: 0.05 + warmup: + name: linear + steps: 500 + ratio: 0.0001 + total_epochs: 300 + lr_schedule: + name: CosineAnnealingLR + T_max: 300 + eta_min: 0.00005 + val_intervals: 10 +grad_clip: 35 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 diff --git a/config/convnext/nanodet-plus_convnext-nano_640.yml b/config/convnext/nanodet-plus_convnext-nano_640.yml new file mode 100644 index 0000000..dfc0a85 --- /dev/null +++ b/config/convnext/nanodet-plus_convnext-nano_640.yml @@ -0,0 +1,130 @@ +save_dir: workspace/convnext/nanodet-plus_convnext-nano_640 +model: + weight_averager: + name: ExpMovingAverager + decay: 0.9998 + arch: + name: NanoDetPlus + detach_epoch: 10 + backbone: + name: TIMMWrapper + model_name: convnext_nano + features_only: True + pretrained: True + # output_stride: 32 + out_indices: [1, 2, 3] + fpn: + name: GhostPAN + in_channels: [160, 320, 640] + out_channels: 128 + kernel_size: 5 + num_extra_level: 1 + use_depthwise: True + activation: SiLU + head: + name: NanoDetPlusHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + kernel_size: 5 + strides: [8, 16, 32, 64] + activation: SiLU + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 + # Auxiliary head, only use in training time. 
+ aux_head: + name: SimpleConvHead + num_classes: 80 + input_channel: 256 + feat_channels: 256 + stacked_convs: 4 + strides: [8, 16, 32, 64] + activation: SiLU + reg_max: 7 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [640,640] #[w,h] + keep_ratio: False + pipeline: + perspective: 0.0 + scale: [0.1, 2.0] + stretch: [[0.8, 1.2], [0.8, 1.2]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [640,640] #[w,h] + keep_ratio: False + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0, 1, 2, 3] + workers_per_gpu: 8 + batchsize_per_gpu: 24 +schedule: +# resume: +# load_model: + optimizer: + name: AdamW + lr: 0.001 + weight_decay: 0.05 + no_norm_decay: True + param_level_cfg: + backbone: + lr_mult: 0.1 + warmup: + name: linear + steps: 500 + ratio: 0.0001 + total_epochs: 50 + lr_schedule: + name: CosineAnnealingLR + T_max: 50 + eta_min: 0.0005 + val_intervals: 5 +grad_clip: 35 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP +log: + interval: 50 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite0_320.yml b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite0_320.yml new file mode 100644 index 0000000..1e43f10 --- /dev/null +++ b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite0_320.yml @@ -0,0 +1,118 @@ +# nanodet-EfficientNet-Lite0_320 +# COCO mAP(0.5:0.95) = 0.247 +# AP_50 = 0.404 +# AP_75 = 0.250 +# AP_small = 0.079 +# AP_m = 0.243 +# AP_l = 0.406 +save_dir: workspace/efficient0_320 +model: + arch: + name: OneStageDetector + backbone: + name: EfficientNetLite + model_name: efficientnet_lite0 + out_stages: [2,4,6] + activation: ReLU6 + fpn: + name: PAN + in_channels: [40, 112, 320] + out_channels: 96 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 96 + feat_channels: 96 + activation: ReLU6 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: /coco/train2017 + 
ann_path: /coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] + val: + name: CocoDataset + img_path: /coco/val2017 + ann_path: /coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] +device: + gpu_ids: [0] + workers_per_gpu: 12 + batchsize_per_gpu: 150 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.15 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 190 + lr_schedule: + name: MultiStepLR + milestones: [140,170,180,185] + gamma: 0.1 + val_intervals: 1 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite1_416.yml b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite1_416.yml new file mode 100644 index 0000000..2e83ab3 --- /dev/null +++ b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite1_416.yml @@ -0,0 +1,119 @@ +# nanodet-EfficientNet-Lite1_416 +# COCO mAP(0.5:0.95) = 0.303 +# AP_50 = 0.471 +# AP_75 = 0.313 +# AP_small = 0.122 +# AP_m = 0.321 +# AP_l = 0.432 +save_dir: workspace/efficient1_416_SGD +model: + arch: + name: OneStageDetector + backbone: + name: EfficientNetLite + model_name: efficientnet_lite1 + out_stages: [2,4,6] + activation: ReLU6 + pretrain: True + fpn: + name: PAN + in_channels: [40, 112, 320] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 3 + activation: ReLU6 + share_cls_reg: True + octave_base_scale: 8 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 10 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: /coco/train2017 + ann_path: /coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.5] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + 
saturation: [0.5, 1.2] + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] + val: + name: CocoDataset + img_path: /coco/val2017 + ann_path: /coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] +device: + gpu_ids: [0] + workers_per_gpu: 12 + batchsize_per_gpu: 100 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.07 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 170 + lr_schedule: + name: MultiStepLR + milestones: [130,150,160,165] + gamma: 0.1 + val_intervals: 5 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite2_512.yml b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite2_512.yml new file mode 100644 index 0000000..62278a6 --- /dev/null +++ b/config/legacy_v0.x_configs/EfficientNet-Lite/nanodet-EfficientNet-Lite2_512.yml @@ -0,0 +1,119 @@ +# nanodet-EfficientNet-Lite2_512 +# COCO mAP(0.5:0.95) = 0.326 +# AP_50 = 0.501 +# AP_75 = 0.344 +# AP_small = 0.152 +# AP_m = 0.342 +# AP_l = 0.481 +save_dir: workspace/efficientlite2_512 +model: + arch: + name: OneStageDetector + backbone: + name: EfficientNetLite + model_name: efficientnet_lite2 + out_stages: [2,4,6] + activation: ReLU6 + pretrain: True + fpn: + name: PAN + in_channels: [48, 120, 352] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 4 + activation: ReLU6 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 10 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: /coco/train2017 + ann_path: /coco/annotations/instances_train2017.json + input_size: [512,512] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.5] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[127.0, 127.0, 127.0], [128.0, 128.0, 128.0]] + val: + name: CocoDataset + img_path: /coco/val2017 + ann_path: /coco/annotations/instances_val2017.json + input_size: [512,512] #[w,h] + keep_ratio: True + pipeline: + normalize: [[127.0, 127.0, 
127.0], [128.0, 128.0, 128.0]] +device: + gpu_ids: [0] + workers_per_gpu: 12 + batchsize_per_gpu: 60 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.06 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 135 + lr_schedule: + name: MultiStepLR + milestones: [90,110,120,130] + gamma: 0.1 + val_intervals: 5 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/RepVGG/nanodet-RepVGG-A0_416.yml b/config/legacy_v0.x_configs/RepVGG/nanodet-RepVGG-A0_416.yml new file mode 100644 index 0000000..6694512 --- /dev/null +++ b/config/legacy_v0.x_configs/RepVGG/nanodet-RepVGG-A0_416.yml @@ -0,0 +1,115 @@ +# nanodet-EfficientNet-Lite1_416 +save_dir: workspace/RepVGG-A0-416 +model: + arch: + name: OneStageDetector + backbone: + name: RepVGG + arch: A0 + out_stages: [2,3,4] + activation: ReLU + last_channel: 512 + deploy: False + fpn: + name: PAN + in_channels: [96, 192, 512] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + conv_type: Conv + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + activation: ReLU + share_cls_reg: True + octave_base_scale: 8 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 10 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: /coco/train2017 + ann_path: /coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.5] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: /coco/val2017 + ann_path: /coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 1 + batchsize_per_gpu: 100 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.07 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 170 + lr_schedule: + name: MultiStepLR + milestones: [130,150,160,165] + gamma: 0.1 + val_intervals: 5 +evaluator: + name: CocoDetectionEvaluator + 
save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/Transformer/nanodet-t.yml b/config/legacy_v0.x_configs/Transformer/nanodet-t.yml new file mode 100644 index 0000000..cc9748a --- /dev/null +++ b/config/legacy_v0.x_configs/Transformer/nanodet-t.yml @@ -0,0 +1,122 @@ +# NanoDet-m with transformer attention +# COCO mAP(0.5:0.95) = 0.217 +# AP_50 = 0.363 +# AP_75 = 0.218 +# AP_small = 0.069 +# AP_m = 0.214 +# AP_l = 0.364 + +save_dir: workspace/nanodet_t +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: TAN # transformer attention network + in_channels: [116, 232, 464] + out_channels: 128 + feature_hw: [20,20] # size for position embedding + num_heads: 8 + num_encoders: 1 + mlp_ratio: 4 + dropout_ratio: 0.1 + activation: LeakyReLU + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.8, 1.2] + saturation: [0.8, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 160 +schedule: + resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 190 + lr_schedule: + name: MultiStepLR + milestones: [140,170,180,185] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 
'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-g.yml b/config/legacy_v0.x_configs/nanodet-g.yml new file mode 100644 index 0000000..93cb982 --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-g.yml @@ -0,0 +1,122 @@ +# NanoDet-g-416 is designed for edge NPU, GPU or TPU with high parallel computing power but low memory bandwidth +# COCO mAP(0.5:0.95) = 22.9 +# Flops = 4.2B +# Params = 3.8M +# COCO pre-trained weight link: https://drive.google.com/file/d/10uW7oqZKw231l_tr4C1bJWkbCXgBf7av/view?usp=sharing +save_dir: workspace/nanodet_g +model: + arch: + name: OneStageDetector + backbone: + name: CustomCspNet + net_cfg: [[ 'Conv', 3, 32, 3, 2], # 1/2 + [ 'MaxPool', 3, 2 ], # 1/4 + [ 'CspBlock', 32, 1, 3, 1 ], # 1/4 + [ 'CspBlock', 64, 2, 3, 2 ], # 1/8 + [ 'CspBlock', 128, 2, 3, 2 ], # 1/16 + [ 'CspBlock', 256, 3, 3, 2 ]] # 1/32 + out_stages: [3,4,5] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [128, 256, 512] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + conv_type: Conv + activation: LeakyReLU + input_channel: 128 + feat_channels: 128 + stacked_convs: 1 + share_cls_reg: True + octave_base_scale: 8 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 10 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 10 + batchsize_per_gpu: 128 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.1 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 500 + ratio: 0.01 + total_epochs: 190 + lr_schedule: + name: MultiStepLR + milestones: [130,160,175,185] + gamma: 0.1 + val_intervals: 5 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 
'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m-0.5x.yml b/config/legacy_v0.x_configs/nanodet-m-0.5x.yml new file mode 100644 index 0000000..f5e6e85 --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m-0.5x.yml @@ -0,0 +1,117 @@ +# nanodet-m-0.5x +# COCO mAP(0.5:0.95) = 0.135 +# AP_50 = 0.245 +# AP_75 = 0.129 +# AP_small = 0.036 +# AP_m = 0.119 +# AP_l = 0.232 +save_dir: workspace/nanodet_m_0.5x +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 0.5x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [48, 96, 192] + out_channels: 96 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 96 + feat_channels: 96 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.5] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 96 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.07 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 1000 + ratio: 0.00001 + total_epochs: 180 + lr_schedule: + name: MultiStepLR + milestones: [130,160,175] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 50 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 
'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m-1.5x-416.yml b/config/legacy_v0.x_configs/nanodet-m-1.5x-416.yml new file mode 100644 index 0000000..f4ff310 --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m-1.5x-416.yml @@ -0,0 +1,117 @@ +#nanodet-m-1.5x-416 +# COCO mAP(0.5:0.95) = 0.268 +# AP_50 = 0.424 +# AP_75 = 0.276 +# AP_small = 0.098 +# AP_m = 0.277 +# AP_l = 0.420 +save_dir: workspace/nanodet_m_1.5x_416 +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.5x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [176, 352, 704] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 176 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 280 + lr_schedule: + name: MultiStepLR + milestones: [240,260,275] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m-1.5x.yml b/config/legacy_v0.x_configs/nanodet-m-1.5x.yml new file mode 100644 index 0000000..c622c2f --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m-1.5x.yml @@ -0,0 +1,117 @@ +#nanodet-m-1.5x +# COCO 
mAP(0.5:0.95) = 0.235 +# AP_50 = 0.384 +# AP_75 = 0.239 +# AP_small = 0.069 +# AP_m = 0.235 +# AP_l = 0.389 +save_dir: workspace/nanodet_m_1.5x +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.5x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [176, 352, 704] + out_channels: 128 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 128 + feat_channels: 128 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 192 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 280 + lr_schedule: + name: MultiStepLR + milestones: [240,260,275] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m-416.yml b/config/legacy_v0.x_configs/nanodet-m-416.yml new file mode 100644 index 0000000..58c84ad --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m-416.yml @@ -0,0 +1,117 @@ +#nanodet-m-416 +# COCO mAP(0.5:0.95) = 0.235 +# AP_50 = 0.384 +# AP_75 = 0.242 +# AP_small = 0.082 +# AP_m = 0.240 +# AP_l = 0.375 +save_dir: workspace/nanodet_m_416 +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [116, 232, 464] + out_channels: 96 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 96 + 
feat_channels: 96 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.5, 1.4] + stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [416,416] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 192 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 280 + lr_schedule: + name: MultiStepLR + milestones: [240,260,275] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/config/legacy_v0.x_configs/nanodet-m.yml b/config/legacy_v0.x_configs/nanodet-m.yml new file mode 100644 index 0000000..1c719fd --- /dev/null +++ b/config/legacy_v0.x_configs/nanodet-m.yml @@ -0,0 +1,111 @@ +#Config File example +save_dir: workspace/nanodet_m +model: + arch: + name: OneStageDetector + backbone: + name: ShuffleNetV2 + model_size: 1.0x + out_stages: [2,3,4] + activation: LeakyReLU + fpn: + name: PAN + in_channels: [116, 232, 464] + out_channels: 96 + start_level: 0 + num_outs: 3 + head: + name: NanoDetHead + num_classes: 80 + input_channel: 96 + feat_channels: 96 + stacked_convs: 2 + share_cls_reg: True + octave_base_scale: 5 + scales_per_octave: 1 + strides: [8, 16, 32] + reg_max: 7 + norm_cfg: + type: BN + loss: + loss_qfl: + name: QualityFocalLoss + use_sigmoid: True + beta: 2.0 + loss_weight: 1.0 + loss_dfl: + name: DistributionFocalLoss + loss_weight: 0.25 + loss_bbox: + name: GIoULoss + loss_weight: 2.0 +data: + train: + name: CocoDataset + img_path: coco/train2017 + ann_path: coco/annotations/instances_train2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + perspective: 0.0 + scale: [0.6, 1.4] 
+ stretch: [[1, 1], [1, 1]] + rotation: 0 + shear: 0 + translate: 0.2 + flip: 0.5 + brightness: 0.2 + contrast: [0.6, 1.4] + saturation: [0.5, 1.2] + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] + val: + name: CocoDataset + img_path: coco/val2017 + ann_path: coco/annotations/instances_val2017.json + input_size: [320,320] #[w,h] + keep_ratio: True + pipeline: + normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]] +device: + gpu_ids: [0] + workers_per_gpu: 8 + batchsize_per_gpu: 192 +schedule: +# resume: +# load_model: YOUR_MODEL_PATH + optimizer: + name: SGD + lr: 0.14 + momentum: 0.9 + weight_decay: 0.0001 + warmup: + name: linear + steps: 300 + ratio: 0.1 + total_epochs: 280 + lr_schedule: + name: MultiStepLR + milestones: [240,260,275] + gamma: 0.1 + val_intervals: 10 +evaluator: + name: CocoDetectionEvaluator + save_key: mAP + +log: + interval: 10 + +class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant', + 'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', + 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', + 'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat', + 'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket', + 'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', + 'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch', + 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop', + 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave', + 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', + 'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'] diff --git a/examples/example_notebooks/logs.txt b/examples/example_notebooks/logs.txt new file mode 100644 index 0000000..58d1c24 --- /dev/null +++ b/examples/example_notebooks/logs.txt @@ -0,0 +1,2 @@ +INFO:root:Press "Esc", "q" or "Q" to exit. +INFO:root:Press "Esc", "q" or "Q" to exit. 
diff --git a/examples/example_notebooks/mot_Nanodet.ipynb b/examples/example_notebooks/mot_Nanodet.ipynb new file mode 100644 index 0000000..8e4080d --- /dev/null +++ b/examples/example_notebooks/mot_Nanodet.ipynb @@ -0,0 +1,793 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multiple object tracking with Nanodet-based object detection" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import cv2 as cv\n", + "from motrackers.detectors import Nanodet\n", + "from motrackers import CentroidTracker, CentroidKF_Tracker, SORT, IOUTracker\n", + "from motrackers.utils import draw_tracks\n", + "from nanodet.util import Logger, cfg, load_config, load_model_weight\n", + "import ipywidgets as widgets" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "VIDEO_FILE = r\"D:\\shijue\\LiquidDrop\\22.avi\"\n", + "WEIGHTS_PATH = r'D:\\shijue\\multi-object-tracker\\weight\\LiquidV4.pth'\n", + "CONFIG_FILE_PATH = r'D:\\shijue\\multi-object-tracker\\config\\LiquidDetect416.yml'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ee9c2b6ebbb3476791fc9262227dce83", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Select(description='MOTracker:', options=('CentroidTracker', 'CentroidKF_Tracker', 'SORT', 'IOUTracker'), valu…" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chosen_tracker = widgets.Select(\n", + " options=[\"CentroidTracker\", \"CentroidKF_Tracker\", \"SORT\", \"IOUTracker\"],\n", + " value='CentroidTracker',\n", + " rows=5,\n", + " description='MOTracker:',\n", + " disabled=False\n", + ")\n", + "chosen_tracker" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "if chosen_tracker.value == 'CentroidTracker':\n", + " tracker = CentroidTracker(max_lost=0, tracker_output_format='mot_challenge')\n", + "elif chosen_tracker.value == 'CentroidKF_Tracker':\n", + " tracker = CentroidKF_Tracker(max_lost=0, tracker_output_format='mot_challenge')\n", + "elif chosen_tracker.value == 'SORT':\n", + " tracker = SORT(max_lost=3, tracker_output_format='mot_challenge', iou_threshold=0.3)\n", + "elif chosen_tracker.value == 'IOUTracker':\n", + " tracker = IOUTracker(max_lost=2, iou_threshold=0.5, min_detection_confidence=0.4, max_detection_confidence=0.7,\n", + " tracker_output_format='mot_challenge')\n", + "else:\n", + " print(\"Please choose one tracker from the above list.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "model size is 1.0x\n", + "init weights...\n", + "=> loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth\n", + "Finish initialize NanoDet-Plus Head.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[1m\u001b[35m[root]\u001b[0m\u001b[34m[04-10 23:38:06]\u001b[0m\u001b[32mINFO:\u001b[0m\u001b[37mPress \"Esc\", \"q\" or \"Q\" to exit.\u001b[0m\n" + ] + } + ], + "source": [ + "# 导入模型文件\n", + "local_rank = 0\n", + "modelpath = WEIGHTS_PATH\n", + "device = \"cpu:0\"\n", + "config = CONFIG_FILE_PATH\n", + "logger = Logger(local_rank, 
use_tensorboard=False)\n", + "load_config(cfg, config)\n", + "detmodel = Nanodet(cfg, modelpath, logger, device)\n", + "logger.log('Press \"Esc\", \"q\" or \"Q\" to exit.')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "def main(video_path, model, tracker):\n", + "\n", + " cap = cv.VideoCapture(video_path)\n", + " while True:\n", + " ok, image = cap.read()\n", + "\n", + " if not ok:\n", + " print(\"Cannot read the video feed.\")\n", + " break\n", + " \n", + " meta, res = model.inference(image)\n", + " bboxes,confidences,class_ids,updated_image = model.visualize(res[0], meta, cfg.class_names, 0.43)\n", + " \n", + " tracks = tracker.update(bboxes, confidences, class_ids)\n", + "\n", + " updated_image = draw_tracks(updated_image, tracks)\n", + "\n", + " cv.imshow(\"image\", updated_image)\n", + " if cv.waitKey(1) & 0xFF == ord('q'):\n", + " break\n", + "\n", + " cap.release()\n", + " cv.destroyAllWindows()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "forward time: 0.077s | decode time: 0.004s | viz time: 0.000s\n", + "forward time: 0.055s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.066s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.069s | decode time: 0.004s | viz time: 0.000s\n", + "forward time: 0.059s | decode time: 0.004s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.062s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.055s | decode time: 0.003s | viz time: 0.001s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.055s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.097s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.000s | viz time: 0.003s\n", + "forward time: 0.053s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.054s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.050s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.051s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.053s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.045s | decode time: 0.001s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.002s 
| viz time: 0.000s\n",
+ "forward time: 0.049s | decode time: 0.002s 
| viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.050s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.044s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.049s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.044s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.050s | decode time: 0.003s | viz time: 0.001s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.046s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.046s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.050s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.052s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.055s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.049s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.045s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.048s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.001s\n", + "forward time: 0.055s | decode time: 0.004s | viz time: 0.001s\n", + "forward time: 0.054s | decode time: 0.002s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.003s | viz time: 0.000s\n", + "forward time: 0.047s | decode time: 0.004s | viz time: 0.000s\n", + "forward time: 0.051s | decode time: 0.002s | viz time: 0.000s\n", + "Cannot read the video feed.\n" + ] + } + ], + "source": [ + "main(VIDEO_FILE, detmodel, tracker)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "ist", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.16" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/example_notebooks/mot_YOLOv3.ipynb 
b/examples/example_notebooks/mot_YOLOv3.ipynb index 248ee6d..f280059 100644 --- a/examples/example_notebooks/mot_YOLOv3.ipynb +++ b/examples/example_notebooks/mot_YOLOv3.ipynb @@ -46,7 +46,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "ae18feabad2649079498e476cb1cc240", + "model_id": "70c91504b2554928915ed6de8c9dfe63", "version_major": 2, "version_minor": 0 }, @@ -54,8 +54,9 @@ "Select(description='MOTracker:', options=('CentroidTracker', 'CentroidKF_Tracker', 'SORT', 'IOUTracker'), valu…" ] }, + "execution_count": 3, "metadata": {}, - "output_type": "display_data" + "output_type": "execute_result" } ], "source": [ @@ -145,7 +146,15 @@ "cell_type": "code", "execution_count": 7, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cannot read the video feed.\n" + ] + } + ], "source": [ "main(VIDEO_FILE, model, tracker)" ] @@ -160,9 +169,9 @@ ], "metadata": { "kernelspec": { - "display_name": "work_env", + "display_name": "ist", "language": "python", - "name": "work_env" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -174,7 +183,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.9" + "version": "3.8.16" } }, "nbformat": 4, diff --git a/logs.txt b/logs.txt new file mode 100644 index 0000000..c19fe0a --- /dev/null +++ b/logs.txt @@ -0,0 +1 @@ +INFO:root:Press "Esc", "q" or "Q" to exit. diff --git a/main.py b/main.py new file mode 100644 index 0000000..d578be3 --- /dev/null +++ b/main.py @@ -0,0 +1,64 @@ +import numpy as np +import cv2 as cv +from motrackers.detectors import Nanodet +from motrackers import CentroidTracker, CentroidKF_Tracker, SORT, IOUTracker +from motrackers.utils import draw_tracks +from nanodet.util import Logger, cfg, load_config, load_model_weight + +VIDEO_FILE = "test.avi" +WEIGHTS_PATH = 'weight/LiquidV5.pth' +CONFIG_FILE_PATH = 'config/LiquidDetect416.yml' +CHOSEN_TRACKER = 'SORT' +CONFIDENCE_THRESHOLD = 0.4 # 目标检测的置信度筛选 + + + +if CHOSEN_TRACKER == 'CentroidTracker': + tracker = CentroidTracker(max_lost=0, tracker_output_format='mot_challenge') +elif CHOSEN_TRACKER == 'CentroidKF_Tracker': + tracker = CentroidKF_Tracker(max_lost=0, tracker_output_format='mot_challenge') +elif CHOSEN_TRACKER == 'SORT': + tracker = SORT(max_lost=3, tracker_output_format='mot_challenge', iou_threshold=0.3) +elif CHOSEN_TRACKER == 'IOUTracker': + tracker = IOUTracker(max_lost=2, iou_threshold=0.5, min_detection_confidence=0.4, max_detection_confidence=0.7, + tracker_output_format='mot_challenge') +else: + print("Please choose one tracker from the above list.") + +# 导入模型文件 +local_rank = 0 +modelpath = WEIGHTS_PATH +device = "cpu:0" +config = CONFIG_FILE_PATH +logger = Logger(local_rank, use_tensorboard=False) +load_config(cfg, config) +detmodel = Nanodet(cfg, modelpath, logger, device) +logger.log('Press "Esc", "q" or "Q" to exit.') + +def main(video_path, model, tracker): + + cap = cv.VideoCapture(video_path) + while True: + ok, image = cap.read() + + if not ok: + print("Cannot read the video feed.") + break + + meta, res = model.inference(image) + bboxes,confidences,class_ids,updated_image = model.visualize(res[0], meta, cfg.class_names, CONFIDENCE_THRESHOLD) + + tracks = tracker.update(bboxes, confidences, class_ids) + + updated_image = draw_tracks(updated_image, tracks) + + cv.imshow("image", updated_image) + if cv.waitKey(1) & 0xFF == ord('q'): + break + + cap.release() + cv.destroyAllWindows() + + + +main(VIDEO_FILE, 
detmodel, tracker) \ No newline at end of file diff --git a/motrackers/detectors/__init__.py b/motrackers/detectors/__init__.py index eacd013..58350f4 100644 --- a/motrackers/detectors/__init__.py +++ b/motrackers/detectors/__init__.py @@ -1,3 +1,4 @@ from motrackers.detectors.tf import TF_SSDMobileNetV2 from motrackers.detectors.caffe import Caffe_SSDMobileNet from motrackers.detectors.yolo import YOLOv3 +from motrackers.detectors.nanodet import Nanodet diff --git a/motrackers/detectors/nanodet.py b/motrackers/detectors/nanodet.py new file mode 100644 index 0000000..ff78c9a --- /dev/null +++ b/motrackers/detectors/nanodet.py @@ -0,0 +1,80 @@ +import cv2 +import numpy as np +from nanodet.data.batch_process import stack_batch_img +from nanodet.data.collate import naive_collate +from nanodet.data.transform import Pipeline +from nanodet.model.arch import build_model +from nanodet.util import Logger, cfg, load_config, load_model_weight +from tool import infotrans +import numpy as np +import os +import time +import torch + +class Nanodet(object): + def __init__(self, cfg, model_path, logger, device="cpu:0"): + self.cfg = cfg + self.device = device + model = build_model(cfg.model) + ckpt = torch.load(model_path, map_location=lambda storage, loc: storage) + load_model_weight(model, ckpt, logger) + if cfg.model.arch.backbone.name == "RepVGG": + deploy_config = cfg.model + deploy_config.arch.backbone.update({"deploy": True}) + deploy_model = build_model(deploy_config) + from nanodet.model.backbone.repvgg import repvgg_det_model_convert + model = repvgg_det_model_convert(model, deploy_model) + self.model = model.to(device).eval() + self.pipeline = Pipeline(cfg.data.val.pipeline, cfg.data.val.keep_ratio) + + def inference(self, img): + self.image = img.copy() + img_info = {"id": 0} + if isinstance(img, str): + img_info["file_name"] = os.path.basename(img) + img = cv2.imread(img) + else: + img_info["file_name"] = None + + height, width = img.shape[:2] + img_info["height"] = height + img_info["width"] = width + meta = dict(img_info=img_info, raw_img=img, img=img) + meta = self.pipeline(None, meta, self.cfg.data.val.input_size) + meta["img"] = torch.from_numpy(meta["img"].transpose(2, 0, 1)).to(self.device) + meta = naive_collate([meta]) + meta["img"] = stack_batch_img(meta["img"], divisible=32) + with torch.no_grad(): + results = self.model.inference(meta) + return meta, results + + def visualize(self, dets, meta, class_names, score_thres, wait=0): + """ + 由可视化函数修改得的信息输出函数 + + Outputs: + bboxes (int): [x,y,w,h] + confidences (float): 置信度 + class_ids (int): 类别 + """ + time1 = time.time() + result_img, all_box = self.model.head.show_result( + meta["raw_img"][0], dets, class_names, score_thres=score_thres, show=True + ) + bboxes , confidences , class_ids = infotrans(all_box) + print("viz time: {:.3f}s".format(time.time() - time1)) + self.class_names = dict(zip(class_ids,class_names)) + np.random.seed(12345) + for bb, conf, cid in zip(bboxes, confidences, class_ids): + # bbox_colors = {key: np.random.randint(0, 255, size=(3,)).tolist() for key in self.class_names.keys()} + # clr = [int(c) for c in bbox_colors[cid]] + cv2.rectangle(self.image, (bb[0], bb[1]), (bb[0] + bb[2], bb[1] + bb[3]), (253,230,224), 2) + # label = "{}:{:.4f}".format(self.class_names[cid], conf) + # (label_width, label_height), baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2) + # y_label = max(bb[1], label_height) + # cv2.rectangle(self.image, (bb[0], y_label - label_height), (bb[0] + label_width, y_label + 
baseLine), + # (255, 255, 255), cv2.FILLED) + # cv2.putText(self.image, label, (bb[0], y_label), cv2.FONT_HERSHEY_SIMPLEX, 0.5, clr, 2) + bboxes = np.array(bboxes).astype('int') + confidences = np.array(confidences) + return bboxes , confidences , class_ids , self.image diff --git a/motrackers/detectors/yolo.py b/motrackers/detectors/yolo.py index d6fd24d..e153fa5 100644 --- a/motrackers/detectors/yolo.py +++ b/motrackers/detectors/yolo.py @@ -23,7 +23,7 @@ def __init__(self, weights_path, configfile_path, labels_path, confidence_thresh object_names = load_labelsjson(labels_path) layer_names = self.net.getLayerNames() - if cv2.__version__ == '4.6.0': + if cv.__version__ == '4.6.0': self.layer_names = [layer_names[i - 1] for i in self.net.getUnconnectedOutLayers()] else: self.layer_names = [layer_names[i[0] - 1] for i in self.net.getUnconnectedOutLayers()] diff --git a/nanodet/__about__.py b/nanodet/__about__.py new file mode 100644 index 0000000..57c1e20 --- /dev/null +++ b/nanodet/__about__.py @@ -0,0 +1,24 @@ +import time + +_this_year = time.strftime("%Y") +__version__ = "1.0.0-alpha" +__author__ = "RangiLyu" +__author_email__ = "lyuchqi@gmail.com" +__license__ = "Apache-2.0" +__copyright__ = f"Copyright (c) 2020-{_this_year}, {__author__}." +__homepage__ = "https://github.com/RangiLyu/nanodet" + +__docs__ = ( + "NanoDet: Deep learning object detection toolbox for super fast and " + "lightweight anchor-free object detection models." +) + +__all__ = [ + "__author__", + "__author_email__", + "__copyright__", + "__docs__", + "__homepage__", + "__license__", + "__version__", +] diff --git a/nanodet/__init__.py b/nanodet/__init__.py new file mode 100644 index 0000000..c0a320a --- /dev/null +++ b/nanodet/__init__.py @@ -0,0 +1,8 @@ +"""package info.""" + +import os + +from nanodet.__about__ import * # noqa: F401 F403 + +_PACKAGE_ROOT = os.path.dirname(__file__) +_PROJECT_ROOT = os.path.dirname(_PACKAGE_ROOT) diff --git a/nanodet/data/batch_process.py b/nanodet/data/batch_process.py new file mode 100644 index 0000000..f84170a --- /dev/null +++ b/nanodet/data/batch_process.py @@ -0,0 +1,37 @@ +from typing import Sequence + +import torch +import torch.nn.functional as F + + +def stack_batch_img( + img_tensors: Sequence[torch.Tensor], divisible: int = 0, pad_value: float = 0.0 +) -> torch.Tensor: + """ + Args: + img_tensors (Sequence[torch.Tensor]): + divisible (int): + pad_value (float): value to pad + + Returns: + torch.Tensor. + """ + assert len(img_tensors) > 0 + assert isinstance(img_tensors, (tuple, list)) + assert divisible >= 0 + img_heights = [] + img_widths = [] + for img in img_tensors: + assert img.shape[:-2] == img_tensors[0].shape[:-2] + img_heights.append(img.shape[-2]) + img_widths.append(img.shape[-1]) + max_h, max_w = max(img_heights), max(img_widths) + if divisible > 0: + max_h = (max_h + divisible - 1) // divisible * divisible + max_w = (max_w + divisible - 1) // divisible * divisible + + batch_imgs = [] + for img in img_tensors: + padding_size = [0, max_w - img.shape[-1], 0, max_h - img.shape[-2]] + batch_imgs.append(F.pad(img, padding_size, value=pad_value)) + return torch.stack(batch_imgs, dim=0).contiguous() diff --git a/nanodet/data/collate.py b/nanodet/data/collate.py new file mode 100644 index 0000000..b559c1a --- /dev/null +++ b/nanodet/data/collate.py @@ -0,0 +1,84 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import collections +import re + +import torch +# from torch._six import string_classes + +string_classes = (str, bytes) + +np_str_obj_array_pattern = re.compile(r"[SaUO]") + +default_collate_err_msg_format = ( + "default_collate: batch must contain tensors, numpy arrays, numbers, " + "dicts or lists; found {}" +) + + +def collate_function(batch): + r"""Puts each data field into a tensor with outer dimension batch size""" + + elem = batch[0] + elem_type = type(elem) + if isinstance(elem, torch.Tensor): + out = None + if torch.utils.data.get_worker_info() is not None: + # If we're in a background process, concatenate directly into a + # shared memory tensor to avoid an extra copy + numel = sum([x.numel() for x in batch]) + storage = elem.storage()._new_shared(numel) + out = elem.new(storage) + return torch.stack(batch, 0, out=out) + elif ( + elem_type.__module__ == "numpy" + and elem_type.__name__ != "str_" + and elem_type.__name__ != "string_" + ): + elem = batch[0] + if elem_type.__name__ == "ndarray": + # array of string classes and object + if np_str_obj_array_pattern.search(elem.dtype.str) is not None: + raise TypeError(default_collate_err_msg_format.format(elem.dtype)) + + return batch + elif elem.shape == (): # scalars + return batch + elif isinstance(elem, float): + return torch.tensor(batch, dtype=torch.float64) + elif isinstance(elem, int): + return torch.tensor(batch) + elif isinstance(elem, string_classes): + return batch + elif isinstance(elem, collections.abc.Mapping): + return {key: collate_function([d[key] for d in batch]) for key in elem} + elif isinstance(elem, tuple) and hasattr(elem, "_fields"): # namedtuple + return elem_type(*(collate_function(samples) for samples in zip(*batch))) + elif isinstance(elem, collections.abc.Sequence): + transposed = zip(*batch) + return [collate_function(samples) for samples in transposed] + + raise TypeError(default_collate_err_msg_format.format(elem_type)) + + +def naive_collate(batch): + """Only collate dict value in to a list. E.g. meta data dict and img_info + dict will be collated.""" + + elem = batch[0] + if isinstance(elem, dict): + return {key: naive_collate([d[key] for d in batch]) for key in elem} + else: + return batch diff --git a/nanodet/data/dataset/__init__.py b/nanodet/data/dataset/__init__.py new file mode 100644 index 0000000..92c405b --- /dev/null +++ b/nanodet/data/dataset/__init__.py @@ -0,0 +1,41 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import copy +import warnings + +from .coco import CocoDataset +from .xml_dataset import XMLDataset + + +def build_dataset(cfg, mode): + dataset_cfg = copy.deepcopy(cfg) + name = dataset_cfg.pop("name") + if name == "coco": + warnings.warn( + "Dataset name coco has been deprecated. Please use CocoDataset instead." + ) + return CocoDataset(mode=mode, **dataset_cfg) + elif name == "xml_dataset": + warnings.warn( + "Dataset name xml_dataset has been deprecated. " + "Please use XMLDataset instead." + ) + return XMLDataset(mode=mode, **dataset_cfg) + elif name == "CocoDataset": + return CocoDataset(mode=mode, **dataset_cfg) + elif name == "XMLDataset": + return XMLDataset(mode=mode, **dataset_cfg) + else: + raise NotImplementedError("Unknown dataset type!") diff --git a/nanodet/data/dataset/base.py b/nanodet/data/dataset/base.py new file mode 100644 index 0000000..c47d578 --- /dev/null +++ b/nanodet/data/dataset/base.py @@ -0,0 +1,123 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import random +from abc import ABCMeta, abstractmethod +from typing import Dict, Optional, Tuple + +import numpy as np +from torch.utils.data import Dataset + +from ..transform import Pipeline + + +class BaseDataset(Dataset, metaclass=ABCMeta): + """ + A base class of detection dataset. Referring from MMDetection. + A dataset should have images, annotations and preprocessing pipelines + NanoDet use [xmin, ymin, xmax, ymax] format for box and + [[x0,y0], [x1,y1] ... [xn,yn]] format for key points. + instance masks should decode into binary masks for each instance like + { + 'bbox': [xmin,ymin,xmax,ymax], + 'mask': mask + } + segmentation mask should decode into binary masks for each class. + Args: + img_path (str): image data folder + ann_path (str): annotation file path or folder + use_instance_mask (bool): load instance segmentation data + use_seg_mask (bool): load semantic segmentation data + use_keypoint (bool): load pose keypoint data + load_mosaic (bool): using mosaic data augmentation from yolov4 + mode (str): 'train' or 'val' or 'test' + multi_scale (Tuple[float, float]): Multi-scale factor range. 
+ """ + + def __init__( + self, + img_path: str, + ann_path: str, + input_size: Tuple[int, int], + pipeline: Dict, + keep_ratio: bool = True, + use_instance_mask: bool = False, + use_seg_mask: bool = False, + use_keypoint: bool = False, + load_mosaic: bool = False, + mode: str = "train", + multi_scale: Optional[Tuple[float, float]] = None, + ): + assert mode in ["train", "val", "test"] + self.img_path = img_path + self.ann_path = ann_path + self.input_size = input_size + self.pipeline = Pipeline(pipeline, keep_ratio) + self.keep_ratio = keep_ratio + self.use_instance_mask = use_instance_mask + self.use_seg_mask = use_seg_mask + self.use_keypoint = use_keypoint + self.load_mosaic = load_mosaic + self.multi_scale = multi_scale + self.mode = mode + + self.data_info = self.get_data_info(ann_path) + + def __len__(self): + return len(self.data_info) + + def __getitem__(self, idx): + if self.mode == "val" or self.mode == "test": + return self.get_val_data(idx) + else: + while True: + data = self.get_train_data(idx) + if data is None: + idx = self.get_another_id() + continue + return data + + @staticmethod + def get_random_size( + scale_range: Tuple[float, float], image_size: Tuple[int, int] + ) -> Tuple[int, int]: + """ + Get random image shape by multi-scale factor and image_size. + Args: + scale_range (Tuple[float, float]): Multi-scale factor range. + Format in [(width, height), (width, height)] + image_size (Tuple[int, int]): Image size. Format in (width, height). + + Returns: + Tuple[int, int] + """ + assert len(scale_range) == 2 + scale_factor = random.uniform(*scale_range) + width = int(image_size[0] * scale_factor) + height = int(image_size[1] * scale_factor) + return width, height + + @abstractmethod + def get_data_info(self, ann_path): + pass + + @abstractmethod + def get_train_data(self, idx): + pass + + @abstractmethod + def get_val_data(self, idx): + pass + + def get_another_id(self): + return np.random.random_integers(0, len(self.data_info) - 1) diff --git a/nanodet/data/dataset/coco.py b/nanodet/data/dataset/coco.py new file mode 100644 index 0000000..3c46b14 --- /dev/null +++ b/nanodet/data/dataset/coco.py @@ -0,0 +1,158 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os + +import cv2 +import numpy as np +import torch +from pycocotools.coco import COCO + +from .base import BaseDataset + + +class CocoDataset(BaseDataset): + def get_data_info(self, ann_path): + """ + Load basic information of dataset such as image path, label and so on. + :param ann_path: coco json file path + :return: image info: + [{'license': 2, + 'file_name': '000000000139.jpg', + 'coco_url': 'http://images.cocodataset.org/val2017/000000000139.jpg', + 'height': 426, + 'width': 640, + 'date_captured': '2013-11-21 01:34:01', + 'flickr_url': + 'http://farm9.staticflickr.com/8035/8024364858_9c41dc1666_z.jpg', + 'id': 139}, + ... 
+ ] + """ + self.coco_api = COCO(ann_path) + self.cat_ids = sorted(self.coco_api.getCatIds()) + self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)} + self.cats = self.coco_api.loadCats(self.cat_ids) + self.class_names = [cat["name"] for cat in self.cats] + self.img_ids = sorted(self.coco_api.imgs.keys()) + img_info = self.coco_api.loadImgs(self.img_ids) + return img_info + + def get_per_img_info(self, idx): + img_info = self.data_info[idx] + file_name = img_info["file_name"] + height = img_info["height"] + width = img_info["width"] + id = img_info["id"] + if not isinstance(id, int): + raise TypeError("Image id must be int.") + info = {"file_name": file_name, "height": height, "width": width, "id": id} + return info + + def get_img_annotation(self, idx): + """ + load per image annotation + :param idx: index in dataloader + :return: annotation dict + """ + img_id = self.img_ids[idx] + ann_ids = self.coco_api.getAnnIds([img_id]) + anns = self.coco_api.loadAnns(ann_ids) + gt_bboxes = [] + gt_labels = [] + gt_bboxes_ignore = [] + if self.use_instance_mask: + gt_masks = [] + if self.use_keypoint: + gt_keypoints = [] + for ann in anns: + if ann.get("ignore", False): + continue + x1, y1, w, h = ann["bbox"] + if ann["area"] <= 0 or w < 1 or h < 1: + continue + if ann["category_id"] not in self.cat_ids: + continue + bbox = [x1, y1, x1 + w, y1 + h] + if ann.get("iscrowd", False): + gt_bboxes_ignore.append(bbox) + else: + gt_bboxes.append(bbox) + gt_labels.append(self.cat2label[ann["category_id"]]) + if self.use_instance_mask: + gt_masks.append(self.coco_api.annToMask(ann)) + if self.use_keypoint: + gt_keypoints.append(ann["keypoints"]) + if gt_bboxes: + gt_bboxes = np.array(gt_bboxes, dtype=np.float32) + gt_labels = np.array(gt_labels, dtype=np.int64) + else: + gt_bboxes = np.zeros((0, 4), dtype=np.float32) + gt_labels = np.array([], dtype=np.int64) + if gt_bboxes_ignore: + gt_bboxes_ignore = np.array(gt_bboxes_ignore, dtype=np.float32) + else: + gt_bboxes_ignore = np.zeros((0, 4), dtype=np.float32) + annotation = dict( + bboxes=gt_bboxes, labels=gt_labels, bboxes_ignore=gt_bboxes_ignore + ) + if self.use_instance_mask: + annotation["masks"] = gt_masks + if self.use_keypoint: + if gt_keypoints: + annotation["keypoints"] = np.array(gt_keypoints, dtype=np.float32) + else: + annotation["keypoints"] = np.zeros((0, 51), dtype=np.float32) + return annotation + + def get_train_data(self, idx): + """ + Load image and annotation + :param idx: + :return: meta-data (a dict containing image, annotation and other information) + """ + img_info = self.get_per_img_info(idx) + file_name = img_info["file_name"] + image_path = os.path.join(self.img_path, file_name) + img = cv2.imread(image_path) + if img is None: + print("image {} read failed.".format(image_path)) + raise FileNotFoundError("Cant load image! Please check image path!") + ann = self.get_img_annotation(idx) + meta = dict( + img=img, img_info=img_info, gt_bboxes=ann["bboxes"], gt_labels=ann["labels"] + ) + if self.use_instance_mask: + meta["gt_masks"] = ann["masks"] + if self.use_keypoint: + meta["gt_keypoints"] = ann["keypoints"] + + input_size = self.input_size + if self.multi_scale: + input_size = self.get_random_size(self.multi_scale, input_size) + + meta = self.pipeline(self, meta, input_size) + + meta["img"] = torch.from_numpy(meta["img"].transpose(2, 0, 1)) + return meta + + def get_val_data(self, idx): + """ + Currently no difference from get_train_data. + Not support TTA(testing time augmentation) yet. 
+ :param idx: + :return: + """ + # TODO: support TTA + return self.get_train_data(idx) diff --git a/nanodet/data/dataset/xml_dataset.py b/nanodet/data/dataset/xml_dataset.py new file mode 100644 index 0000000..5300660 --- /dev/null +++ b/nanodet/data/dataset/xml_dataset.py @@ -0,0 +1,157 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import os +import time +import xml.etree.ElementTree as ET +from collections import defaultdict + +from pycocotools.coco import COCO + +from .coco import CocoDataset + + +def get_file_list(path, type=".xml"): + file_names = [] + for maindir, subdir, file_name_list in os.walk(path): + for filename in file_name_list: + apath = os.path.join(maindir, filename) + ext = os.path.splitext(apath)[1] + if ext == type: + file_names.append(filename) + return file_names + + +class CocoXML(COCO): + def __init__(self, annotation): + """ + Constructor of Microsoft COCO helper class for + reading and visualizing annotations. + :param annotation: annotation dict + :return: + """ + # load dataset + self.dataset, self.anns, self.cats, self.imgs = dict(), dict(), dict(), dict() + self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list) + dataset = annotation + assert type(dataset) == dict, "annotation file format {} not supported".format( + type(dataset) + ) + self.dataset = dataset + self.createIndex() + + +class XMLDataset(CocoDataset): + def __init__(self, class_names, **kwargs): + self.class_names = class_names + super(XMLDataset, self).__init__(**kwargs) + + def xml_to_coco(self, ann_path): + """ + convert xml annotations to coco_api + :param ann_path: + :return: + """ + logging.info("loading annotations into memory...") + tic = time.time() + ann_file_names = get_file_list(ann_path, type=".xml") + logging.info("Found {} annotation files.".format(len(ann_file_names))) + image_info = [] + categories = [] + annotations = [] + for idx, supercat in enumerate(self.class_names): + categories.append( + {"supercategory": supercat, "id": idx + 1, "name": supercat} + ) + ann_id = 1 + for idx, xml_name in enumerate(ann_file_names): + tree = ET.parse(os.path.join(ann_path, xml_name)) + root = tree.getroot() + file_name = root.find("filename").text + width = int(root.find("size").find("width").text) + height = int(root.find("size").find("height").text) + info = { + "file_name": file_name, + "height": height, + "width": width, + "id": idx + 1, + } + image_info.append(info) + for _object in root.findall("object"): + category = _object.find("name").text + if category not in self.class_names: + logging.warning( + "WARNING! {} is not in class_names! 
" + "Pass this box annotation.".format(category) + ) + continue + for cat in categories: + if category == cat["name"]: + cat_id = cat["id"] + xmin = int(_object.find("bndbox").find("xmin").text) + ymin = int(_object.find("bndbox").find("ymin").text) + xmax = int(_object.find("bndbox").find("xmax").text) + ymax = int(_object.find("bndbox").find("ymax").text) + w = xmax - xmin + h = ymax - ymin + if w < 0 or h < 0: + logging.warning( + "WARNING! Find error data in file {}! Box w and " + "h should > 0. Pass this box annotation.".format(xml_name) + ) + continue + coco_box = [max(xmin, 0), max(ymin, 0), min(w, width), min(h, height)] + ann = { + "image_id": idx + 1, + "bbox": coco_box, + "category_id": cat_id, + "iscrowd": 0, + "id": ann_id, + "area": coco_box[2] * coco_box[3], + } + annotations.append(ann) + ann_id += 1 + + coco_dict = { + "images": image_info, + "categories": categories, + "annotations": annotations, + } + logging.info( + "Load {} xml files and {} boxes".format(len(image_info), len(annotations)) + ) + logging.info("Done (t={:0.2f}s)".format(time.time() - tic)) + return coco_dict + + def get_data_info(self, ann_path): + """ + Load basic information of dataset such as image path, label and so on. + :param ann_path: coco json file path + :return: image info: + [{'file_name': '000000000139.jpg', + 'height': 426, + 'width': 640, + 'id': 139}, + ... + ] + """ + coco_dict = self.xml_to_coco(ann_path) + self.coco_api = CocoXML(coco_dict) + self.cat_ids = sorted(self.coco_api.getCatIds()) + self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)} + self.cats = self.coco_api.loadCats(self.cat_ids) + self.img_ids = sorted(self.coco_api.imgs.keys()) + img_info = self.coco_api.loadImgs(self.img_ids) + return img_info diff --git a/nanodet/data/transform/__init__.py b/nanodet/data/transform/__init__.py new file mode 100644 index 0000000..c30ae76 --- /dev/null +++ b/nanodet/data/transform/__init__.py @@ -0,0 +1,17 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .pipeline import Pipeline + +__all__ = ["Pipeline"] diff --git a/nanodet/data/transform/color.py b/nanodet/data/transform/color.py new file mode 100644 index 0000000..9eb0236 --- /dev/null +++ b/nanodet/data/transform/color.py @@ -0,0 +1,70 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import random + +import cv2 +import numpy as np + + +def random_brightness(img, delta): + img += random.uniform(-delta, delta) + return img + + +def random_contrast(img, alpha_low, alpha_up): + img *= random.uniform(alpha_low, alpha_up) + return img + + +def random_saturation(img, alpha_low, alpha_up): + hsv_img = cv2.cvtColor(img.astype(np.float32), cv2.COLOR_BGR2HSV) + hsv_img[..., 1] *= random.uniform(alpha_low, alpha_up) + img = cv2.cvtColor(hsv_img, cv2.COLOR_HSV2BGR) + return img + + +def normalize(meta, mean, std): + img = meta["img"].astype(np.float32) + mean = np.array(mean, dtype=np.float64).reshape(1, -1) + stdinv = 1 / np.array(std, dtype=np.float64).reshape(1, -1) + cv2.subtract(img, mean, img) + cv2.multiply(img, stdinv, img) + meta["img"] = img + return meta + + +def _normalize(img, mean, std): + mean = np.array(mean, dtype=np.float32).reshape(1, 1, 3) / 255 + std = np.array(std, dtype=np.float32).reshape(1, 1, 3) / 255 + img = (img - mean) / std + return img + + +def color_aug_and_norm(meta, kwargs): + img = meta["img"].astype(np.float32) / 255 + + if "brightness" in kwargs and random.randint(0, 1): + img = random_brightness(img, kwargs["brightness"]) + + if "contrast" in kwargs and random.randint(0, 1): + img = random_contrast(img, *kwargs["contrast"]) + + if "saturation" in kwargs and random.randint(0, 1): + img = random_saturation(img, *kwargs["saturation"]) + # cv2.imshow('trans', img) + # cv2.waitKey(0) + img = _normalize(img, *kwargs["normalize"]) + meta["img"] = img + return meta diff --git a/nanodet/data/transform/mosaic.py b/nanodet/data/transform/mosaic.py new file mode 100644 index 0000000..e69de29 diff --git a/nanodet/data/transform/pipeline.py b/nanodet/data/transform/pipeline.py new file mode 100644 index 0000000..71b8f7d --- /dev/null +++ b/nanodet/data/transform/pipeline.py @@ -0,0 +1,59 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import functools +import warnings +from typing import Dict, Tuple + +from torch.utils.data import Dataset + +from .color import color_aug_and_norm +from .warp import ShapeTransform, warp_and_resize + + +class LegacyPipeline: + def __init__(self, cfg, keep_ratio): + warnings.warn( + "Deprecated warning! Pipeline from nanodet v0.x has been deprecated," + "Please use new Pipeline and update your config!" + ) + self.warp = functools.partial( + warp_and_resize, warp_kwargs=cfg, keep_ratio=keep_ratio + ) + self.color = functools.partial(color_aug_and_norm, kwargs=cfg) + + def __call__(self, meta, dst_shape): + meta = self.warp(meta, dst_shape=dst_shape) + meta = self.color(meta=meta) + return meta + + +class Pipeline: + """Data process pipeline. Apply augmentation and pre-processing on + meta_data from dataset. + + Args: + cfg (Dict): Data pipeline config. + keep_ratio (bool): Whether to keep aspect ratio when resizing image. 
+ + """ + + def __init__(self, cfg: Dict, keep_ratio: bool): + self.shape_transform = ShapeTransform(keep_ratio, **cfg) + self.color = functools.partial(color_aug_and_norm, kwargs=cfg) + + def __call__(self, dataset: Dataset, meta: Dict, dst_shape: Tuple[int, int]): + meta = self.shape_transform(meta, dst_shape=dst_shape) + meta = self.color(meta=meta) + return meta diff --git a/nanodet/data/transform/warp.py b/nanodet/data/transform/warp.py new file mode 100644 index 0000000..a102348 --- /dev/null +++ b/nanodet/data/transform/warp.py @@ -0,0 +1,352 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import random +from typing import Dict, Optional, Tuple + +import cv2 +import numpy as np + + +def get_flip_matrix(prob=0.5): + F = np.eye(3) + if random.random() < prob: + F[0, 0] = -1 + return F + + +def get_perspective_matrix(perspective=0.0): + """ + + :param perspective: + :return: + """ + P = np.eye(3) + P[2, 0] = random.uniform(-perspective, perspective) # x perspective (about y) + P[2, 1] = random.uniform(-perspective, perspective) # y perspective (about x) + return P + + +def get_rotation_matrix(degree=0.0): + """ + + :param degree: + :return: + """ + R = np.eye(3) + a = random.uniform(-degree, degree) + R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=1) + return R + + +def get_scale_matrix(ratio=(1, 1)): + """ + + :param ratio: + """ + Scl = np.eye(3) + scale = random.uniform(*ratio) + Scl[0, 0] *= scale + Scl[1, 1] *= scale + return Scl + + +def get_stretch_matrix(width_ratio=(1, 1), height_ratio=(1, 1)): + """ + + :param width_ratio: + :param height_ratio: + """ + Str = np.eye(3) + Str[0, 0] *= random.uniform(*width_ratio) + Str[1, 1] *= random.uniform(*height_ratio) + return Str + + +def get_shear_matrix(degree): + """ + + :param degree: + :return: + """ + Sh = np.eye(3) + Sh[0, 1] = math.tan( + random.uniform(-degree, degree) * math.pi / 180 + ) # x shear (deg) + Sh[1, 0] = math.tan( + random.uniform(-degree, degree) * math.pi / 180 + ) # y shear (deg) + return Sh + + +def get_translate_matrix(translate, width, height): + """ + + :param translate: + :return: + """ + T = np.eye(3) + T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width # x translation + T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height # y translation + return T + + +def get_resize_matrix(raw_shape, dst_shape, keep_ratio): + """ + Get resize matrix for resizing raw img to input size + :param raw_shape: (width, height) of raw image + :param dst_shape: (width, height) of input image + :param keep_ratio: whether keep original ratio + :return: 3x3 Matrix + """ + r_w, r_h = raw_shape + d_w, d_h = dst_shape + Rs = np.eye(3) + if keep_ratio: + C = np.eye(3) + C[0, 2] = -r_w / 2 + C[1, 2] = -r_h / 2 + + if r_w / r_h < d_w / d_h: + ratio = d_h / r_h + else: + ratio = d_w / r_w + Rs[0, 0] *= ratio + Rs[1, 1] *= ratio + + T = np.eye(3) + T[0, 2] = 0.5 * d_w + T[1, 2] = 0.5 * d_h + return T @ Rs @ C + else: + Rs[0, 0] *= d_w / r_w + Rs[1, 1] 
*= d_h / r_h + return Rs + + +def warp_and_resize( + meta: Dict, + warp_kwargs: Dict, + dst_shape: Tuple[int, int], + keep_ratio: bool = True, +): + # TODO: background, type + raw_img = meta["img"] + height = raw_img.shape[0] # shape(h,w,c) + width = raw_img.shape[1] + + # center + C = np.eye(3) + C[0, 2] = -width / 2 + C[1, 2] = -height / 2 + + # do not change the order of mat mul + if "perspective" in warp_kwargs and random.randint(0, 1): + P = get_perspective_matrix(warp_kwargs["perspective"]) + C = P @ C + if "scale" in warp_kwargs and random.randint(0, 1): + Scl = get_scale_matrix(warp_kwargs["scale"]) + C = Scl @ C + if "stretch" in warp_kwargs and random.randint(0, 1): + Str = get_stretch_matrix(*warp_kwargs["stretch"]) + C = Str @ C + if "rotation" in warp_kwargs and random.randint(0, 1): + R = get_rotation_matrix(warp_kwargs["rotation"]) + C = R @ C + if "shear" in warp_kwargs and random.randint(0, 1): + Sh = get_shear_matrix(warp_kwargs["shear"]) + C = Sh @ C + if "flip" in warp_kwargs: + F = get_flip_matrix(warp_kwargs["flip"]) + C = F @ C + if "translate" in warp_kwargs and random.randint(0, 1): + T = get_translate_matrix(warp_kwargs["translate"], width, height) + else: + T = get_translate_matrix(0, width, height) + M = T @ C + # M = T @ Sh @ R @ Str @ P @ C + ResizeM = get_resize_matrix((width, height), dst_shape, keep_ratio) + M = ResizeM @ M + img = cv2.warpPerspective(raw_img, M, dsize=tuple(dst_shape)) + meta["img"] = img + meta["warp_matrix"] = M + if "gt_bboxes" in meta: + boxes = meta["gt_bboxes"] + meta["gt_bboxes"] = warp_boxes(boxes, M, dst_shape[0], dst_shape[1]) + if "gt_masks" in meta: + for i, mask in enumerate(meta["gt_masks"]): + meta["gt_masks"][i] = cv2.warpPerspective(mask, M, dsize=tuple(dst_shape)) + + # TODO: keypoints + # if 'gt_keypoints' in meta: + + return meta + + +def warp_boxes(boxes, M, width, height): + n = len(boxes) + if n: + # warp points + xy = np.ones((n * 4, 3)) + xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape( + n * 4, 2 + ) # x1y1, x2y2, x1y2, x2y1 + xy = xy @ M.T # transform + xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale + # create new boxes + x = xy[:, [0, 2, 4, 6]] + y = xy[:, [1, 3, 5, 7]] + xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T + # clip boxes + xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) + xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) + return xy.astype(np.float32) + else: + return boxes + + +# def warp_keypoints(keypoints, M, width, height): +# n = len(keypoints) +# if n: +# # warp points +# xy = np.ones((n * 4, 3)) +# # x1y1, x2y2, x1y2, x2y1 +# xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2) +# xy = xy @ M.T # transform +# xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale +# # create new boxes +# x = xy[:, [0, 2, 4, 6]] +# y = xy[:, [1, 3, 5, 7]] +# xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T +# # clip boxes +# xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) +# xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) +# return xy + + +def get_minimum_dst_shape( + src_shape: Tuple[int, int], + dst_shape: Tuple[int, int], + divisible: Optional[int] = None, +) -> Tuple[int, int]: + """Calculate minimum dst shape""" + src_w, src_h = src_shape + dst_w, dst_h = dst_shape + + if src_w / src_h < dst_w / dst_h: + ratio = dst_h / src_h + else: + ratio = dst_w / src_w + + dst_w = int(ratio * src_w) + dst_h = int(ratio * src_h) + + if divisible and divisible > 0: + dst_w = max(divisible, int((dst_w + divisible - 1) // divisible * divisible)) + 
dst_h = max(divisible, int((dst_h + divisible - 1) // divisible * divisible)) + return dst_w, dst_h + + +class ShapeTransform: + """Shape transforms including resize, random perspective, random scale, + random stretch, random rotation, random shear, random translate, + and random flip. + + Args: + keep_ratio: Whether to keep aspect ratio of the image. + divisible: Make image height and width is divisible by a number. + perspective: Random perspective factor. + scale: Random scale ratio. + stretch: Width and height stretch ratio range. + rotation: Random rotate degree. + shear: Random shear degree. + translate: Random translate ratio. + flip: Random flip probability. + """ + + def __init__( + self, + keep_ratio: bool, + divisible: int = 0, + perspective: float = 0.0, + scale: Tuple[int, int] = (1, 1), + stretch: Tuple = ((1, 1), (1, 1)), + rotation: float = 0.0, + shear: float = 0.0, + translate: float = 0.0, + flip: float = 0.0, + **kwargs + ): + self.keep_ratio = keep_ratio + self.divisible = divisible + self.perspective = perspective + self.scale_ratio = scale + self.stretch_ratio = stretch + self.rotation_degree = rotation + self.shear_degree = shear + self.flip_prob = flip + self.translate_ratio = translate + + def __call__(self, meta_data, dst_shape): + raw_img = meta_data["img"] + height = raw_img.shape[0] # shape(h,w,c) + width = raw_img.shape[1] + + # center + C = np.eye(3) + C[0, 2] = -width / 2 + C[1, 2] = -height / 2 + + P = get_perspective_matrix(self.perspective) + C = P @ C + + Scl = get_scale_matrix(self.scale_ratio) + C = Scl @ C + + Str = get_stretch_matrix(*self.stretch_ratio) + C = Str @ C + + R = get_rotation_matrix(self.rotation_degree) + C = R @ C + + Sh = get_shear_matrix(self.shear_degree) + C = Sh @ C + + F = get_flip_matrix(self.flip_prob) + C = F @ C + + T = get_translate_matrix(self.translate_ratio, width, height) + M = T @ C + + if self.keep_ratio: + dst_shape = get_minimum_dst_shape( + (width, height), dst_shape, self.divisible + ) + + ResizeM = get_resize_matrix((width, height), dst_shape, self.keep_ratio) + M = ResizeM @ M + img = cv2.warpPerspective(raw_img, M, dsize=tuple(dst_shape)) + meta_data["img"] = img + meta_data["warp_matrix"] = M + if "gt_bboxes" in meta_data: + boxes = meta_data["gt_bboxes"] + meta_data["gt_bboxes"] = warp_boxes(boxes, M, dst_shape[0], dst_shape[1]) + if "gt_masks" in meta_data: + for i, mask in enumerate(meta_data["gt_masks"]): + meta_data["gt_masks"][i] = cv2.warpPerspective( + mask, M, dsize=tuple(dst_shape) + ) + + return meta_data diff --git a/nanodet/evaluator/__init__.py b/nanodet/evaluator/__init__.py new file mode 100644 index 0000000..4285845 --- /dev/null +++ b/nanodet/evaluator/__init__.py @@ -0,0 +1,25 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
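+
+# Evaluator factory. `build_evaluator` pops the evaluator name from the config
+# and currently only supports "CocoDetectionEvaluator"; the dataset handed in
+# must expose `coco_api`, `class_names` and `cat_ids`, as the COCO/XML datasets
+# above do. Typical use (a sketch; `cfg.evaluator`, `val_dataset`, `results`
+# and `save_dir` are assumed to come from the surrounding training script):
+#     evaluator = build_evaluator(cfg.evaluator, val_dataset)
+#     eval_results = evaluator.evaluate(results, save_dir)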
+import copy + +from .coco_detection import CocoDetectionEvaluator + + +def build_evaluator(cfg, dataset): + evaluator_cfg = copy.deepcopy(cfg) + name = evaluator_cfg.pop("name") + if name == "CocoDetectionEvaluator": + return CocoDetectionEvaluator(dataset) + else: + raise NotImplementedError diff --git a/nanodet/evaluator/coco_detection.py b/nanodet/evaluator/coco_detection.py new file mode 100644 index 0000000..5b51d54 --- /dev/null +++ b/nanodet/evaluator/coco_detection.py @@ -0,0 +1,149 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import contextlib +import copy +import io +import itertools +import json +import logging +import os +import warnings + +import numpy as np +from pycocotools.cocoeval import COCOeval +from tabulate import tabulate + +logger = logging.getLogger("NanoDet") + + +def xyxy2xywh(bbox): + """ + change bbox to coco format + :param bbox: [x1, y1, x2, y2] + :return: [x, y, w, h] + """ + return [ + bbox[0], + bbox[1], + bbox[2] - bbox[0], + bbox[3] - bbox[1], + ] + + +class CocoDetectionEvaluator: + def __init__(self, dataset): + assert hasattr(dataset, "coco_api") + self.class_names = dataset.class_names + self.coco_api = dataset.coco_api + self.cat_ids = dataset.cat_ids + self.metric_names = ["mAP", "AP_50", "AP_75", "AP_small", "AP_m", "AP_l"] + + def results2json(self, results): + """ + results: {image_id: {label: [bboxes...] } } + :return coco json format: {image_id: + category_id: + bbox: + score: } + """ + json_results = [] + for image_id, dets in results.items(): + for label, bboxes in dets.items(): + category_id = self.cat_ids[label] + for bbox in bboxes: + score = float(bbox[4]) + detection = dict( + image_id=int(image_id), + category_id=int(category_id), + bbox=xyxy2xywh(bbox), + score=score, + ) + json_results.append(detection) + return json_results + + def evaluate(self, results, save_dir, rank=-1): + results_json = self.results2json(results) + if len(results_json) == 0: + warnings.warn( + "Detection result is empty! Please check whether " + "training set is too small (need to increase val_interval " + "in config and train more epochs). Or check annotation " + "correctness." 
+ ) + empty_eval_results = {} + for key in self.metric_names: + empty_eval_results[key] = 0 + return empty_eval_results + json_path = os.path.join(save_dir, "results{}.json".format(rank)) + json.dump(results_json, open(json_path, "w")) + coco_dets = self.coco_api.loadRes(json_path) + coco_eval = COCOeval( + copy.deepcopy(self.coco_api), copy.deepcopy(coco_dets), "bbox" + ) + coco_eval.evaluate() + coco_eval.accumulate() + + # use logger to log coco eval results + redirect_string = io.StringIO() + with contextlib.redirect_stdout(redirect_string): + coco_eval.summarize() + logger.info("\n" + redirect_string.getvalue()) + + # print per class AP + headers = ["class", "AP50", "mAP"] + colums = 6 + per_class_ap50s = [] + per_class_maps = [] + precisions = coco_eval.eval["precision"] + # dimension of precisions: [TxRxKxAxM] + # precision has dims (iou, recall, cls, area range, max dets) + assert len(self.class_names) == precisions.shape[2] + + for idx, name in enumerate(self.class_names): + # area range index 0: all area ranges + # max dets index -1: typically 100 per image + precision_50 = precisions[0, :, idx, 0, -1] + precision_50 = precision_50[precision_50 > -1] + ap50 = np.mean(precision_50) if precision_50.size else float("nan") + per_class_ap50s.append(float(ap50 * 100)) + + precision = precisions[:, :, idx, 0, -1] + precision = precision[precision > -1] + ap = np.mean(precision) if precision.size else float("nan") + per_class_maps.append(float(ap * 100)) + + num_cols = min(colums, len(self.class_names) * len(headers)) + flatten_results = [] + for name, ap50, mAP in zip(self.class_names, per_class_ap50s, per_class_maps): + flatten_results += [name, ap50, mAP] + + row_pair = itertools.zip_longest( + *[flatten_results[i::num_cols] for i in range(num_cols)] + ) + table_headers = headers * (num_cols // len(headers)) + table = tabulate( + row_pair, + tablefmt="pipe", + floatfmt=".1f", + headers=table_headers, + numalign="left", + ) + logger.info("\n" + table) + + aps = coco_eval.stats[:6] + eval_results = {} + for k, v in zip(self.metric_names, aps): + eval_results[k] = v + return eval_results diff --git a/nanodet/model/arch/__init__.py b/nanodet/model/arch/__init__.py new file mode 100644 index 0000000..c15509b --- /dev/null +++ b/nanodet/model/arch/__init__.py @@ -0,0 +1,42 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy +import warnings + +from .nanodet_plus import NanoDetPlus +from .one_stage_detector import OneStageDetector + + +def build_model(model_cfg): + model_cfg = copy.deepcopy(model_cfg) + name = model_cfg.arch.pop("name") + if name == "GFL": + warnings.warn( + "Model architecture name is changed to 'OneStageDetector'. " + "The name 'GFL' is deprecated, please change the model->arch->name " + "in your YAML config file to OneStageDetector." 
+ ) + model = OneStageDetector( + model_cfg.arch.backbone, model_cfg.arch.fpn, model_cfg.arch.head + ) + elif name == "OneStageDetector": + model = OneStageDetector( + model_cfg.arch.backbone, model_cfg.arch.fpn, model_cfg.arch.head + ) + elif name == "NanoDetPlus": + model = NanoDetPlus(**model_cfg.arch) + else: + raise NotImplementedError + return model diff --git a/nanodet/model/arch/nanodet_plus.py b/nanodet/model/arch/nanodet_plus.py new file mode 100644 index 0000000..0de099d --- /dev/null +++ b/nanodet/model/arch/nanodet_plus.py @@ -0,0 +1,57 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy + +import torch + +from ..head import build_head +from .one_stage_detector import OneStageDetector + + +class NanoDetPlus(OneStageDetector): + def __init__( + self, + backbone, + fpn, + aux_head, + head, + detach_epoch=0, + ): + super(NanoDetPlus, self).__init__( + backbone_cfg=backbone, fpn_cfg=fpn, head_cfg=head + ) + self.aux_fpn = copy.deepcopy(self.fpn) + self.aux_head = build_head(aux_head) + self.detach_epoch = detach_epoch + + def forward_train(self, gt_meta): + img = gt_meta["img"] + feat = self.backbone(img) + fpn_feat = self.fpn(feat) + if self.epoch >= self.detach_epoch: + aux_fpn_feat = self.aux_fpn([f.detach() for f in feat]) + dual_fpn_feat = ( + torch.cat([f.detach(), aux_f], dim=1) + for f, aux_f in zip(fpn_feat, aux_fpn_feat) + ) + else: + aux_fpn_feat = self.aux_fpn(feat) + dual_fpn_feat = ( + torch.cat([f, aux_f], dim=1) for f, aux_f in zip(fpn_feat, aux_fpn_feat) + ) + head_out = self.head(fpn_feat) + aux_head_out = self.aux_head(dual_fpn_feat) + loss, loss_states = self.head.loss(head_out, gt_meta, aux_preds=aux_head_out) + return head_out, loss, loss_states diff --git a/nanodet/model/arch/one_stage_detector.py b/nanodet/model/arch/one_stage_detector.py new file mode 100644 index 0000000..e791d9f --- /dev/null +++ b/nanodet/model/arch/one_stage_detector.py @@ -0,0 +1,68 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
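+
+# Generic one-stage detector: backbone -> optional FPN -> optional head.
+# `forward` returns the raw head output, `inference` additionally runs the
+# head's post-processing on a meta dict, and `forward_train` returns the
+# predictions together with the loss and loss-state dict computed by the head.
+# Training-time sketch (assuming `gt_meta` is a batch dict with an "img"
+# tensor, as produced by the data pipeline above):
+#     preds, loss, loss_states = model.forward_train(gt_meta)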
+ +import time + +import torch +import torch.nn as nn + +from ..backbone import build_backbone +from ..fpn import build_fpn +from ..head import build_head + + +class OneStageDetector(nn.Module): + def __init__( + self, + backbone_cfg, + fpn_cfg=None, + head_cfg=None, + ): + super(OneStageDetector, self).__init__() + self.backbone = build_backbone(backbone_cfg) + if fpn_cfg is not None: + self.fpn = build_fpn(fpn_cfg) + if head_cfg is not None: + self.head = build_head(head_cfg) + self.epoch = 0 + + def forward(self, x): + x = self.backbone(x) + if hasattr(self, "fpn"): + x = self.fpn(x) + if hasattr(self, "head"): + x = self.head(x) + return x + + def inference(self, meta): + with torch.no_grad(): + # torch.cuda.synchronize() + time1 = time.time() + preds = self(meta["img"]) + # torch.cuda.synchronize() + time2 = time.time() + print("forward time: {:.3f}s".format((time2 - time1)), end=" | ") + results = self.head.post_process(preds, meta) + # torch.cuda.synchronize() + print("decode time: {:.3f}s".format((time.time() - time2)), end=" | ") + return results + + def forward_train(self, gt_meta): + preds = self(gt_meta["img"]) + loss, loss_states = self.head.loss(preds, gt_meta) + + return preds, loss, loss_states + + def set_epoch(self, epoch): + self.epoch = epoch diff --git a/nanodet/model/backbone/__init__.py b/nanodet/model/backbone/__init__.py new file mode 100644 index 0000000..e66cdff --- /dev/null +++ b/nanodet/model/backbone/__init__.py @@ -0,0 +1,47 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy + +from .custom_csp import CustomCspNet +from .efficientnet_lite import EfficientNetLite +from .ghostnet import GhostNet +from .mobilenetv2 import MobileNetV2 +from .repvgg import RepVGG +from .resnet import ResNet +from .shufflenetv2 import ShuffleNetV2 +from .timm_wrapper import TIMMWrapper + + +def build_backbone(cfg): + backbone_cfg = copy.deepcopy(cfg) + name = backbone_cfg.pop("name") + if name == "ResNet": + return ResNet(**backbone_cfg) + elif name == "ShuffleNetV2": + return ShuffleNetV2(**backbone_cfg) + elif name == "GhostNet": + return GhostNet(**backbone_cfg) + elif name == "MobileNetV2": + return MobileNetV2(**backbone_cfg) + elif name == "EfficientNetLite": + return EfficientNetLite(**backbone_cfg) + elif name == "CustomCspNet": + return CustomCspNet(**backbone_cfg) + elif name == "RepVGG": + return RepVGG(**backbone_cfg) + elif name == "TIMMWrapper": + return TIMMWrapper(**backbone_cfg) + else: + raise NotImplementedError diff --git a/nanodet/model/backbone/custom_csp.py b/nanodet/model/backbone/custom_csp.py new file mode 100644 index 0000000..441d149 --- /dev/null +++ b/nanodet/model/backbone/custom_csp.py @@ -0,0 +1,168 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +import torch.nn as nn + +from ..module.conv import ConvModule + + +class TinyResBlock(nn.Module): + def __init__( + self, in_channels, kernel_size, norm_cfg, activation, res_type="concat" + ): + super(TinyResBlock, self).__init__() + assert in_channels % 2 == 0 + assert res_type in ["concat", "add"] + self.res_type = res_type + self.in_conv = ConvModule( + in_channels, + in_channels // 2, + kernel_size, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + self.mid_conv = ConvModule( + in_channels // 2, + in_channels // 2, + kernel_size, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + if res_type == "add": + self.out_conv = ConvModule( + in_channels // 2, + in_channels, + kernel_size, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + + def forward(self, x): + x = self.in_conv(x) + x1 = self.mid_conv(x) + if self.res_type == "add": + return self.out_conv(x + x1) + else: + return torch.cat((x1, x), dim=1) + + +class CspBlock(nn.Module): + def __init__( + self, + in_channels, + num_res, + kernel_size=3, + stride=0, + norm_cfg=dict(type="BN", requires_grad=True), + activation="LeakyReLU", + ): + super(CspBlock, self).__init__() + assert in_channels % 2 == 0 + self.in_conv = ConvModule( + in_channels, + in_channels, + kernel_size, + stride, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + res_blocks = [] + for i in range(num_res): + res_block = TinyResBlock(in_channels, kernel_size, norm_cfg, activation) + res_blocks.append(res_block) + self.res_blocks = nn.Sequential(*res_blocks) + self.res_out_conv = ConvModule( + in_channels, + in_channels, + kernel_size, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + + def forward(self, x): + x = self.in_conv(x) + x1 = self.res_blocks(x) + x1 = self.res_out_conv(x1) + out = torch.cat((x1, x), dim=1) + return out + + +class CustomCspNet(nn.Module): + def __init__( + self, + net_cfg, + out_stages, + norm_cfg=dict(type="BN", requires_grad=True), + activation="LeakyReLU", + ): + super(CustomCspNet, self).__init__() + assert isinstance(net_cfg, list) + assert set(out_stages).issubset(i for i in range(len(net_cfg))) + self.out_stages = out_stages + self.activation = activation + self.stages = nn.ModuleList() + for stage_cfg in net_cfg: + if stage_cfg[0] == "Conv": + in_channels, out_channels, kernel_size, stride = stage_cfg[1:] + stage = ConvModule( + in_channels, + out_channels, + kernel_size, + stride, + padding=(kernel_size - 1) // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + elif stage_cfg[0] == "CspBlock": + in_channels, num_res, kernel_size, stride = stage_cfg[1:] + stage = CspBlock( + in_channels, num_res, kernel_size, stride, norm_cfg, activation + ) + elif stage_cfg[0] == "MaxPool": + kernel_size, stride = stage_cfg[1:] + stage = nn.MaxPool2d( + kernel_size, stride, padding=(kernel_size - 1) // 2 + ) + else: + raise ModuleNotFoundError + self.stages.append(stage) + self._init_weight() + + def forward(self, x): + output = [] + for i, stage in 
enumerate(self.stages): + x = stage(x) + if i in self.out_stages: + output.append(x) + return tuple(output) + + def _init_weight(self): + for m in self.modules(): + if self.activation == "LeakyReLU": + nonlinearity = "leaky_relu" + else: + nonlinearity = "relu" + if isinstance(m, nn.Conv2d): + nn.init.kaiming_normal_( + m.weight, mode="fan_out", nonlinearity=nonlinearity + ) + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() diff --git a/nanodet/model/backbone/efficientnet_lite.py b/nanodet/model/backbone/efficientnet_lite.py new file mode 100644 index 0000000..090ab7c --- /dev/null +++ b/nanodet/model/backbone/efficientnet_lite.py @@ -0,0 +1,287 @@ +import math + +import torch +import torch.functional as F +import torch.utils.model_zoo as model_zoo +from torch import nn + +from ..module.activation import act_layers + +efficientnet_lite_params = { + # width_coefficient, depth_coefficient, image_size, dropout_rate + "efficientnet_lite0": [1.0, 1.0, 224, 0.2], + "efficientnet_lite1": [1.0, 1.1, 240, 0.2], + "efficientnet_lite2": [1.1, 1.2, 260, 0.3], + "efficientnet_lite3": [1.2, 1.4, 280, 0.3], + "efficientnet_lite4": [1.4, 1.8, 300, 0.3], +} + +model_urls = { + "efficientnet_lite0": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite0.pth", # noqa: E501 + "efficientnet_lite1": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite1.pth", # noqa: E501 + "efficientnet_lite2": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite2.pth", # noqa: E501 + "efficientnet_lite3": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite3.pth", # noqa: E501 + "efficientnet_lite4": "https://github.com/RangiLyu/EfficientNet-Lite/releases/download/v1.0/efficientnet_lite4.pth", # noqa: E501 +} + + +def round_filters(filters, multiplier, divisor=8, min_width=None): + """Calculate and round number of filters based on width multiplier.""" + if not multiplier: + return filters + filters *= multiplier + min_width = min_width or divisor + new_filters = max(min_width, int(filters + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%. 
+ if new_filters < 0.9 * filters: + new_filters += divisor + return int(new_filters) + + +def round_repeats(repeats, multiplier): + """Round number of filters based on depth multiplier.""" + if not multiplier: + return repeats + return int(math.ceil(multiplier * repeats)) + + +def drop_connect(x, drop_connect_rate, training): + if not training: + return x + keep_prob = 1.0 - drop_connect_rate + batch_size = x.shape[0] + random_tensor = keep_prob + random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype, device=x.device) + binary_mask = torch.floor(random_tensor) + x = (x / keep_prob) * binary_mask + return x + + +class MBConvBlock(nn.Module): + def __init__( + self, + inp, + final_oup, + k, + s, + expand_ratio, + se_ratio, + has_se=False, + activation="ReLU6", + ): + super(MBConvBlock, self).__init__() + + self._momentum = 0.01 + self._epsilon = 1e-3 + self.input_filters = inp + self.output_filters = final_oup + self.stride = s + self.expand_ratio = expand_ratio + self.has_se = has_se + self.id_skip = True # skip connection and drop connect + + # Expansion phase + oup = inp * expand_ratio # number of output channels + if expand_ratio != 1: + self._expand_conv = nn.Conv2d( + in_channels=inp, out_channels=oup, kernel_size=1, bias=False + ) + self._bn0 = nn.BatchNorm2d( + num_features=oup, momentum=self._momentum, eps=self._epsilon + ) + + # Depthwise convolution phase + self._depthwise_conv = nn.Conv2d( + in_channels=oup, + out_channels=oup, + groups=oup, # groups makes it depthwise + kernel_size=k, + padding=(k - 1) // 2, + stride=s, + bias=False, + ) + self._bn1 = nn.BatchNorm2d( + num_features=oup, momentum=self._momentum, eps=self._epsilon + ) + + # Squeeze and Excitation layer, if desired + if self.has_se: + num_squeezed_channels = max(1, int(inp * se_ratio)) + self._se_reduce = nn.Conv2d( + in_channels=oup, out_channels=num_squeezed_channels, kernel_size=1 + ) + self._se_expand = nn.Conv2d( + in_channels=num_squeezed_channels, out_channels=oup, kernel_size=1 + ) + + # Output phase + self._project_conv = nn.Conv2d( + in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False + ) + self._bn2 = nn.BatchNorm2d( + num_features=final_oup, momentum=self._momentum, eps=self._epsilon + ) + self._relu = act_layers(activation) + + def forward(self, x, drop_connect_rate=None): + """ + :param x: input tensor + :param drop_connect_rate: drop connect rate (float, between 0 and 1) + :return: output of block + """ + + # Expansion and Depthwise Convolution + identity = x + if self.expand_ratio != 1: + x = self._relu(self._bn0(self._expand_conv(x))) + x = self._relu(self._bn1(self._depthwise_conv(x))) + + # Squeeze and Excitation + if self.has_se: + x_squeezed = F.adaptive_avg_pool2d(x, 1) + x_squeezed = self._se_expand(self._relu(self._se_reduce(x_squeezed))) + x = torch.sigmoid(x_squeezed) * x + + x = self._bn2(self._project_conv(x)) + + # Skip connection and drop connect + if ( + self.id_skip + and self.stride == 1 + and self.input_filters == self.output_filters + ): + if drop_connect_rate: + x = drop_connect(x, drop_connect_rate, training=self.training) + x += identity # skip connection + return x + + +class EfficientNetLite(nn.Module): + def __init__( + self, model_name, out_stages=(2, 4, 6), activation="ReLU6", pretrain=True + ): + super(EfficientNetLite, self).__init__() + assert set(out_stages).issubset(i for i in range(0, 7)) + assert model_name in efficientnet_lite_params + + self.model_name = model_name + # Batch norm parameters + momentum = 0.01 + epsilon = 1e-3 + 
width_multiplier, depth_multiplier, _, dropout_rate = efficientnet_lite_params[ + model_name + ] + self.drop_connect_rate = 0.2 + self.out_stages = out_stages + + mb_block_settings = [ + # repeat|kernel_size|stride|expand|input|output|se_ratio + [1, 3, 1, 1, 32, 16, 0.25], # stage0 + [2, 3, 2, 6, 16, 24, 0.25], # stage1 - 1/4 + [2, 5, 2, 6, 24, 40, 0.25], # stage2 - 1/8 + [3, 3, 2, 6, 40, 80, 0.25], # stage3 + [3, 5, 1, 6, 80, 112, 0.25], # stage4 - 1/16 + [4, 5, 2, 6, 112, 192, 0.25], # stage5 + [1, 3, 1, 6, 192, 320, 0.25], # stage6 - 1/32 + ] + + # Stem + out_channels = 32 + self.stem = nn.Sequential( + nn.Conv2d(3, out_channels, kernel_size=3, stride=2, padding=1, bias=False), + nn.BatchNorm2d(num_features=out_channels, momentum=momentum, eps=epsilon), + act_layers(activation), + ) + + # Build blocks + self.blocks = nn.ModuleList([]) + for i, stage_setting in enumerate(mb_block_settings): + stage = nn.ModuleList([]) + ( + num_repeat, + kernal_size, + stride, + expand_ratio, + input_filters, + output_filters, + se_ratio, + ) = stage_setting + # Update block input and output filters based on width multiplier. + input_filters = ( + input_filters + if i == 0 + else round_filters(input_filters, width_multiplier) + ) + output_filters = round_filters(output_filters, width_multiplier) + num_repeat = ( + num_repeat + if i == 0 or i == len(mb_block_settings) - 1 + else round_repeats(num_repeat, depth_multiplier) + ) + + # The first block needs to take care of stride and filter size increase. + stage.append( + MBConvBlock( + input_filters, + output_filters, + kernal_size, + stride, + expand_ratio, + se_ratio, + has_se=False, + ) + ) + if num_repeat > 1: + input_filters = output_filters + stride = 1 + for _ in range(num_repeat - 1): + stage.append( + MBConvBlock( + input_filters, + output_filters, + kernal_size, + stride, + expand_ratio, + se_ratio, + has_se=False, + ) + ) + + self.blocks.append(stage) + self._initialize_weights(pretrain) + + def forward(self, x): + x = self.stem(x) + output = [] + idx = 0 + for j, stage in enumerate(self.blocks): + for block in stage: + drop_connect_rate = self.drop_connect_rate + if drop_connect_rate: + drop_connect_rate *= float(idx) / len(self.blocks) + x = block(x, drop_connect_rate) + idx += 1 + if j in self.out_stages: + output.append(x) + return output + + def _initialize_weights(self, pretrain=True): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels + m.weight.data.normal_(0, math.sqrt(2.0 / n)) + if m.bias is not None: + m.bias.data.zero_() + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() + if pretrain: + url = model_urls[self.model_name] + if url is not None: + pretrained_state_dict = model_zoo.load_url(url) + print("=> loading pretrained model {}".format(url)) + self.load_state_dict(pretrained_state_dict, strict=False) + + def load_pretrain(self, path): + state_dict = torch.load(path) + self.load_state_dict(state_dict, strict=True) diff --git a/nanodet/model/backbone/ghostnet.py b/nanodet/model/backbone/ghostnet.py new file mode 100644 index 0000000..06c7119 --- /dev/null +++ b/nanodet/model/backbone/ghostnet.py @@ -0,0 +1,348 @@ +""" +2020.06.09-Changed for building GhostNet +Huawei Technologies Co., Ltd. +Creates a GhostNet Model as defined in: +GhostNet: More Features from Cheap Operations By Kai Han, Yunhe Wang, +Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu. 
+https://arxiv.org/abs/1911.11907 +Modified from https://github.com/d-li14/mobilenetv3.pytorch +and https://github.com/rwightman/pytorch-image-models +""" +import logging +import math +import warnings + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..module.activation import act_layers + + +def get_url(width_mult=1.0): + if width_mult == 1.0: + return "https://raw.githubusercontent.com/huawei-noah/CV-Backbones/master/ghostnet_pytorch/models/state_dict_73.98.pth" # noqa E501 + else: + logging.info("GhostNet only has 1.0 pretrain model. ") + return None + + +def _make_divisible(v, divisor, min_value=None): + """ + This function is taken from the original tf repo. + It ensures that all layers have a channel number that is divisible by 8 + It can be seen here: + https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py + """ + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%. + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +def hard_sigmoid(x, inplace: bool = False): + if inplace: + return x.add_(3.0).clamp_(0.0, 6.0).div_(6.0) + else: + return F.relu6(x + 3.0) / 6.0 + + +class SqueezeExcite(nn.Module): + def __init__( + self, + in_chs, + se_ratio=0.25, + reduced_base_chs=None, + activation="ReLU", + gate_fn=hard_sigmoid, + divisor=4, + **_ + ): + super(SqueezeExcite, self).__init__() + self.gate_fn = gate_fn + reduced_chs = _make_divisible((reduced_base_chs or in_chs) * se_ratio, divisor) + self.avg_pool = nn.AdaptiveAvgPool2d(1) + self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True) + self.act1 = act_layers(activation) + self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True) + + def forward(self, x): + x_se = self.avg_pool(x) + x_se = self.conv_reduce(x_se) + x_se = self.act1(x_se) + x_se = self.conv_expand(x_se) + x = x * self.gate_fn(x_se) + return x + + +class ConvBnAct(nn.Module): + def __init__(self, in_chs, out_chs, kernel_size, stride=1, activation="ReLU"): + super(ConvBnAct, self).__init__() + self.conv = nn.Conv2d( + in_chs, out_chs, kernel_size, stride, kernel_size // 2, bias=False + ) + self.bn1 = nn.BatchNorm2d(out_chs) + self.act1 = act_layers(activation) + + def forward(self, x): + x = self.conv(x) + x = self.bn1(x) + x = self.act1(x) + return x + + +class GhostModule(nn.Module): + def __init__( + self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, activation="ReLU" + ): + super(GhostModule, self).__init__() + self.oup = oup + init_channels = math.ceil(oup / ratio) + new_channels = init_channels * (ratio - 1) + + self.primary_conv = nn.Sequential( + nn.Conv2d( + inp, init_channels, kernel_size, stride, kernel_size // 2, bias=False + ), + nn.BatchNorm2d(init_channels), + act_layers(activation) if activation else nn.Sequential(), + ) + + self.cheap_operation = nn.Sequential( + nn.Conv2d( + init_channels, + new_channels, + dw_size, + 1, + dw_size // 2, + groups=init_channels, + bias=False, + ), + nn.BatchNorm2d(new_channels), + act_layers(activation) if activation else nn.Sequential(), + ) + + def forward(self, x): + x1 = self.primary_conv(x) + x2 = self.cheap_operation(x1) + out = torch.cat([x1, x2], dim=1) + return out + + +class GhostBottleneck(nn.Module): + """Ghost bottleneck w/ optional SE""" + + def __init__( + self, + in_chs, + mid_chs, + out_chs, + dw_kernel_size=3, + stride=1, + activation="ReLU", + se_ratio=0.0, + ): + super(GhostBottleneck, 
self).__init__() + has_se = se_ratio is not None and se_ratio > 0.0 + self.stride = stride + + # Point-wise expansion + self.ghost1 = GhostModule(in_chs, mid_chs, activation=activation) + + # Depth-wise convolution + if self.stride > 1: + self.conv_dw = nn.Conv2d( + mid_chs, + mid_chs, + dw_kernel_size, + stride=stride, + padding=(dw_kernel_size - 1) // 2, + groups=mid_chs, + bias=False, + ) + self.bn_dw = nn.BatchNorm2d(mid_chs) + + # Squeeze-and-excitation + if has_se: + self.se = SqueezeExcite(mid_chs, se_ratio=se_ratio) + else: + self.se = None + + # Point-wise linear projection + self.ghost2 = GhostModule(mid_chs, out_chs, activation=None) + + # shortcut + if in_chs == out_chs and self.stride == 1: + self.shortcut = nn.Sequential() + else: + self.shortcut = nn.Sequential( + nn.Conv2d( + in_chs, + in_chs, + dw_kernel_size, + stride=stride, + padding=(dw_kernel_size - 1) // 2, + groups=in_chs, + bias=False, + ), + nn.BatchNorm2d(in_chs), + nn.Conv2d(in_chs, out_chs, 1, stride=1, padding=0, bias=False), + nn.BatchNorm2d(out_chs), + ) + + def forward(self, x): + residual = x + + # 1st ghost bottleneck + x = self.ghost1(x) + + # Depth-wise convolution + if self.stride > 1: + x = self.conv_dw(x) + x = self.bn_dw(x) + + # Squeeze-and-excitation + if self.se is not None: + x = self.se(x) + + # 2nd ghost bottleneck + x = self.ghost2(x) + + x += self.shortcut(residual) + return x + + +class GhostNet(nn.Module): + def __init__( + self, + width_mult=1.0, + out_stages=(4, 6, 9), + activation="ReLU", + pretrain=True, + act=None, + ): + super(GhostNet, self).__init__() + assert set(out_stages).issubset(i for i in range(10)) + self.width_mult = width_mult + self.out_stages = out_stages + # setting of inverted residual blocks + self.cfgs = [ + # k, t, c, SE, s + # stage1 + [[3, 16, 16, 0, 1]], # 0 + # stage2 + [[3, 48, 24, 0, 2]], # 1 + [[3, 72, 24, 0, 1]], # 2 1/4 + # stage3 + [[5, 72, 40, 0.25, 2]], # 3 + [[5, 120, 40, 0.25, 1]], # 4 1/8 + # stage4 + [[3, 240, 80, 0, 2]], # 5 + [ + [3, 200, 80, 0, 1], + [3, 184, 80, 0, 1], + [3, 184, 80, 0, 1], + [3, 480, 112, 0.25, 1], + [3, 672, 112, 0.25, 1], + ], # 6 1/16 + # stage5 + [[5, 672, 160, 0.25, 2]], # 7 + [ + [5, 960, 160, 0, 1], + [5, 960, 160, 0.25, 1], + [5, 960, 160, 0, 1], + [5, 960, 160, 0.25, 1], + ], # 8 + ] + # ------conv+bn+act----------# 9 1/32 + + self.activation = activation + if act is not None: + warnings.warn( + "Warning! act argument has been deprecated, " "use activation instead!" 
+ ) + self.activation = act + + # building first layer + output_channel = _make_divisible(16 * width_mult, 4) + self.conv_stem = nn.Conv2d(3, output_channel, 3, 2, 1, bias=False) + self.bn1 = nn.BatchNorm2d(output_channel) + self.act1 = act_layers(self.activation) + input_channel = output_channel + + # building inverted residual blocks + stages = [] + block = GhostBottleneck + for cfg in self.cfgs: + layers = [] + for k, exp_size, c, se_ratio, s in cfg: + output_channel = _make_divisible(c * width_mult, 4) + hidden_channel = _make_divisible(exp_size * width_mult, 4) + layers.append( + block( + input_channel, + hidden_channel, + output_channel, + k, + s, + activation=self.activation, + se_ratio=se_ratio, + ) + ) + input_channel = output_channel + stages.append(nn.Sequential(*layers)) + + output_channel = _make_divisible(exp_size * width_mult, 4) + stages.append( + nn.Sequential( + ConvBnAct(input_channel, output_channel, 1, activation=self.activation) + ) + ) # 9 + + self.blocks = nn.Sequential(*stages) + + self._initialize_weights(pretrain) + + def forward(self, x): + x = self.conv_stem(x) + x = self.bn1(x) + x = self.act1(x) + output = [] + for i in range(10): + x = self.blocks[i](x) + if i in self.out_stages: + output.append(x) + return tuple(output) + + def _initialize_weights(self, pretrain=True): + print("init weights...") + for name, m in self.named_modules(): + if isinstance(m, nn.Conv2d): + if "conv_stem" in name: + nn.init.normal_(m.weight, 0, 0.01) + else: + nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1]) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.BatchNorm2d): + nn.init.constant_(m.weight, 1) + if m.bias is not None: + nn.init.constant_(m.bias, 0.0001) + nn.init.constant_(m.running_mean, 0) + elif isinstance(m, nn.BatchNorm1d): + nn.init.constant_(m.weight, 1) + if m.bias is not None: + nn.init.constant_(m.bias, 0.0001) + nn.init.constant_(m.running_mean, 0) + elif isinstance(m, nn.Linear): + nn.init.normal_(m.weight, 0, 0.01) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + if pretrain: + url = get_url(self.width_mult) + if url is not None: + state_dict = torch.hub.load_state_dict_from_url(url, progress=True) + self.load_state_dict(state_dict, strict=False) diff --git a/nanodet/model/backbone/mobilenetv2.py b/nanodet/model/backbone/mobilenetv2.py new file mode 100644 index 0000000..11d7978 --- /dev/null +++ b/nanodet/model/backbone/mobilenetv2.py @@ -0,0 +1,176 @@ +from __future__ import absolute_import, division, print_function + +import warnings + +import torch.nn as nn + +from ..module.activation import act_layers + + +class ConvBNReLU(nn.Sequential): + def __init__( + self, + in_planes, + out_planes, + kernel_size=3, + stride=1, + groups=1, + activation="ReLU", + ): + padding = (kernel_size - 1) // 2 + super(ConvBNReLU, self).__init__( + nn.Conv2d( + in_planes, + out_planes, + kernel_size, + stride, + padding, + groups=groups, + bias=False, + ), + nn.BatchNorm2d(out_planes), + act_layers(activation), + ) + + +class InvertedResidual(nn.Module): + def __init__(self, inp, oup, stride, expand_ratio, activation="ReLU"): + super(InvertedResidual, self).__init__() + self.stride = stride + assert stride in [1, 2] + + hidden_dim = int(round(inp * expand_ratio)) + self.use_res_connect = self.stride == 1 and inp == oup + + layers = [] + if expand_ratio != 1: + # pw + layers.append( + ConvBNReLU(inp, hidden_dim, kernel_size=1, activation=activation) + ) + layers.extend( + [ + # dw + ConvBNReLU( + hidden_dim, + hidden_dim, + 
stride=stride, + groups=hidden_dim, + activation=activation, + ), + # pw-linear + nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), + nn.BatchNorm2d(oup), + ] + ) + self.conv = nn.Sequential(*layers) + + def forward(self, x): + if self.use_res_connect: + return x + self.conv(x) + else: + return self.conv(x) + + +class MobileNetV2(nn.Module): + def __init__( + self, + width_mult=1.0, + out_stages=(1, 2, 4, 6), + last_channel=1280, + activation="ReLU", + act=None, + ): + super(MobileNetV2, self).__init__() + # TODO: support load torchvison pretrained weight + assert set(out_stages).issubset(i for i in range(7)) + self.width_mult = width_mult + self.out_stages = out_stages + input_channel = 32 + self.last_channel = last_channel + self.activation = activation + if act is not None: + warnings.warn( + "Warning! act argument has been deprecated, " "use activation instead!" + ) + self.activation = act + self.interverted_residual_setting = [ + # t, c, n, s + [1, 16, 1, 1], + [6, 24, 2, 2], + [6, 32, 3, 2], + [6, 64, 4, 2], + [6, 96, 3, 1], + [6, 160, 3, 2], + [6, 320, 1, 1], + ] + + # building first layer + self.input_channel = int(input_channel * width_mult) + self.first_layer = ConvBNReLU( + 3, self.input_channel, stride=2, activation=self.activation + ) + # building inverted residual blocks + for i in range(7): + name = "stage{}".format(i) + setattr(self, name, self.build_mobilenet_stage(stage_num=i)) + + self._initialize_weights() + + def build_mobilenet_stage(self, stage_num): + stage = [] + t, c, n, s = self.interverted_residual_setting[stage_num] + output_channel = int(c * self.width_mult) + for i in range(n): + if i == 0: + stage.append( + InvertedResidual( + self.input_channel, + output_channel, + s, + expand_ratio=t, + activation=self.activation, + ) + ) + else: + stage.append( + InvertedResidual( + self.input_channel, + output_channel, + 1, + expand_ratio=t, + activation=self.activation, + ) + ) + self.input_channel = output_channel + if stage_num == 6: + last_layer = ConvBNReLU( + self.input_channel, + self.last_channel, + kernel_size=1, + activation=self.activation, + ) + stage.append(last_layer) + stage = nn.Sequential(*stage) + return stage + + def forward(self, x): + x = self.first_layer(x) + output = [] + for i in range(0, 7): + stage = getattr(self, "stage{}".format(i)) + x = stage(x) + if i in self.out_stages: + output.append(x) + + return tuple(output) + + def _initialize_weights(self): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + nn.init.normal_(m.weight, std=0.001) + if m.bias is not None: + m.bias.data.zero_() + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() diff --git a/nanodet/model/backbone/repvgg.py b/nanodet/model/backbone/repvgg.py new file mode 100644 index 0000000..8ae9634 --- /dev/null +++ b/nanodet/model/backbone/repvgg.py @@ -0,0 +1,234 @@ +""" +@article{ding2101repvgg, + title={RepVGG: Making VGG-style ConvNets Great Again}, + author={Ding, Xiaohan and Zhang, Xiangyu and Ma, Ningning and Han, + Jungong and Ding, Guiguang and Sun, Jian}, + journal={arXiv preprint arXiv:2101.03697}} +RepVGG Backbone from paper RepVGG: Making VGG-style ConvNets Great Again +Code from https://github.com/DingXiaoH/RepVGG +""" + +import numpy as np +import torch +import torch.nn as nn + +from nanodet.model.module.conv import RepVGGConvModule + +optional_groupwise_layers = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26] +g2_map = {layer: 2 for layer in optional_groupwise_layers} +g4_map = {layer: 4 for layer in optional_groupwise_layers} 
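+
+# The indices above are the layer positions where the RepVGG-B*g2 / B*g4
+# variants use grouped 3x3 convolutions (groups=2 or 4); all other variants
+# keep override_groups_map=None and use ordinary dense convolutions.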
+ +model_param = { + "RepVGG-A0": dict( + num_blocks=[2, 4, 14, 1], + width_multiplier=[0.75, 0.75, 0.75, 2.5], + override_groups_map=None, + ), + "RepVGG-A1": dict( + num_blocks=[2, 4, 14, 1], + width_multiplier=[1, 1, 1, 2.5], + override_groups_map=None, + ), + "RepVGG-A2": dict( + num_blocks=[2, 4, 14, 1], + width_multiplier=[1.5, 1.5, 1.5, 2.75], + override_groups_map=None, + ), + "RepVGG-B0": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[1, 1, 1, 2.5], + override_groups_map=None, + ), + "RepVGG-B1": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2, 2, 2, 4], + override_groups_map=None, + ), + "RepVGG-B1g2": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2, 2, 2, 4], + override_groups_map=g2_map, + ), + "RepVGG-B1g4": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2, 2, 2, 4], + override_groups_map=g4_map, + ), + "RepVGG-B2": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2.5, 2.5, 2.5, 5], + override_groups_map=None, + ), + "RepVGG-B2g2": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2.5, 2.5, 2.5, 5], + override_groups_map=g2_map, + ), + "RepVGG-B2g4": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[2.5, 2.5, 2.5, 5], + override_groups_map=g4_map, + ), + "RepVGG-B3": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[3, 3, 3, 5], + override_groups_map=None, + ), + "RepVGG-B3g2": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[3, 3, 3, 5], + override_groups_map=g2_map, + ), + "RepVGG-B3g4": dict( + num_blocks=[4, 6, 16, 1], + width_multiplier=[3, 3, 3, 5], + override_groups_map=g4_map, + ), +} + + +def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1): + result = nn.Sequential() + result.add_module( + "conv", + nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=groups, + bias=False, + ), + ) + result.add_module("bn", nn.BatchNorm2d(num_features=out_channels)) + return result + + +class RepVGG(nn.Module): + def __init__( + self, + arch, + out_stages=(1, 2, 3, 4), + activation="ReLU", + deploy=False, + last_channel=None, + ): + super(RepVGG, self).__init__() + # TODO: Update code to Xiaohan's repo + model_name = "RepVGG-" + arch + assert model_name in model_param + assert set(out_stages).issubset((1, 2, 3, 4)) + num_blocks = model_param[model_name]["num_blocks"] + width_multiplier = model_param[model_name]["width_multiplier"] + assert len(width_multiplier) == 4 + self.out_stages = out_stages + self.activation = activation + self.deploy = deploy + self.override_groups_map = ( + model_param[model_name]["override_groups_map"] or dict() + ) + + assert 0 not in self.override_groups_map + + self.in_planes = min(64, int(64 * width_multiplier[0])) + + self.stage0 = RepVGGConvModule( + in_channels=3, + out_channels=self.in_planes, + kernel_size=3, + stride=2, + padding=1, + activation=activation, + deploy=self.deploy, + ) + self.cur_layer_idx = 1 + self.stage1 = self._make_stage( + int(64 * width_multiplier[0]), num_blocks[0], stride=2 + ) + self.stage2 = self._make_stage( + int(128 * width_multiplier[1]), num_blocks[1], stride=2 + ) + self.stage3 = self._make_stage( + int(256 * width_multiplier[2]), num_blocks[2], stride=2 + ) + out_planes = last_channel if last_channel else int(512 * width_multiplier[3]) + self.stage4 = self._make_stage(out_planes, num_blocks[3], stride=2) + + def _make_stage(self, planes, num_blocks, stride): + strides = [stride] + [1] * (num_blocks - 1) + blocks = [] + for stride in strides: + cur_groups = 
self.override_groups_map.get(self.cur_layer_idx, 1) + blocks.append( + RepVGGConvModule( + in_channels=self.in_planes, + out_channels=planes, + kernel_size=3, + stride=stride, + padding=1, + groups=cur_groups, + activation=self.activation, + deploy=self.deploy, + ) + ) + self.in_planes = planes + self.cur_layer_idx += 1 + return nn.Sequential(*blocks) + + def forward(self, x): + x = self.stage0(x) + output = [] + for i in range(1, 5): + stage = getattr(self, "stage{}".format(i)) + x = stage(x) + if i in self.out_stages: + output.append(x) + return tuple(output) + + +def repvgg_model_convert(model, deploy_model, save_path=None): + """ + Examples: + >>> train_model = RepVGG(arch='A0', deploy=False) + >>> deploy_model = RepVGG(arch='A0', deploy=True) + >>> deploy_model = repvgg_model_convert( + >>> train_model, deploy_model, save_path='repvgg_deploy.pth') + """ + converted_weights = {} + for name, module in model.named_modules(): + if hasattr(module, "repvgg_convert"): + kernel, bias = module.repvgg_convert() + converted_weights[name + ".rbr_reparam.weight"] = kernel + converted_weights[name + ".rbr_reparam.bias"] = bias + elif isinstance(module, torch.nn.Linear): + converted_weights[name + ".weight"] = module.weight.detach().cpu().numpy() + converted_weights[name + ".bias"] = module.bias.detach().cpu().numpy() + del model + + for name, param in deploy_model.named_parameters(): + print("deploy param: ", name, param.size(), np.mean(converted_weights[name])) + param.data = torch.from_numpy(converted_weights[name]).float() + + if save_path is not None: + torch.save(deploy_model.state_dict(), save_path) + + return deploy_model + + +def repvgg_det_model_convert(model, deploy_model): + converted_weights = {} + deploy_model.load_state_dict(model.state_dict(), strict=False) + for name, module in model.backbone.named_modules(): + if hasattr(module, "repvgg_convert"): + kernel, bias = module.repvgg_convert() + converted_weights[name + ".rbr_reparam.weight"] = kernel + converted_weights[name + ".rbr_reparam.bias"] = bias + elif isinstance(module, torch.nn.Linear): + converted_weights[name + ".weight"] = module.weight.detach().cpu().numpy() + converted_weights[name + ".bias"] = module.bias.detach().cpu().numpy() + del model + for name, param in deploy_model.backbone.named_parameters(): + print("deploy param: ", name, param.size(), np.mean(converted_weights[name])) + param.data = torch.from_numpy(converted_weights[name]).float() + return deploy_model diff --git a/nanodet/model/backbone/resnet.py b/nanodet/model/backbone/resnet.py new file mode 100644 index 0000000..0a863c9 --- /dev/null +++ b/nanodet/model/backbone/resnet.py @@ -0,0 +1,196 @@ +from __future__ import absolute_import, division, print_function + +import torch.nn as nn +import torch.utils.model_zoo as model_zoo + +from ..module.activation import act_layers + +model_urls = { + "resnet18": "https://download.pytorch.org/models/resnet18-5c106cde.pth", + "resnet34": "https://download.pytorch.org/models/resnet34-333f7ec4.pth", + "resnet50": "https://download.pytorch.org/models/resnet50-19c8e357.pth", + "resnet101": "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth", + "resnet152": "https://download.pytorch.org/models/resnet152-b121ed2d.pth", +} + + +def conv3x3(in_planes, out_planes, stride=1): + """3x3 convolution with padding""" + return nn.Conv2d( + in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False + ) + + +class BasicBlock(nn.Module): + expansion = 1 + + def __init__(self, inplanes, planes, stride=1, 
downsample=None, activation="ReLU"): + super(BasicBlock, self).__init__() + self.conv1 = conv3x3(inplanes, planes, stride) + self.bn1 = nn.BatchNorm2d(planes) + self.act = act_layers(activation) + self.conv2 = conv3x3(planes, planes) + self.bn2 = nn.BatchNorm2d(planes) + self.downsample = downsample + self.stride = stride + + def forward(self, x): + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.act(out) + + out = self.conv2(out) + out = self.bn2(out) + + if self.downsample is not None: + residual = self.downsample(x) + + out += residual + out = self.act(out) + + return out + + +class Bottleneck(nn.Module): + expansion = 4 + + def __init__(self, inplanes, planes, stride=1, downsample=None, activation="ReLU"): + super(Bottleneck, self).__init__() + self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) + self.bn1 = nn.BatchNorm2d(planes) + self.conv2 = nn.Conv2d( + planes, planes, kernel_size=3, stride=stride, padding=1, bias=False + ) + self.bn2 = nn.BatchNorm2d(planes) + self.conv3 = nn.Conv2d( + planes, planes * self.expansion, kernel_size=1, bias=False + ) + self.bn3 = nn.BatchNorm2d(planes * self.expansion) + self.act = act_layers(activation) + self.downsample = downsample + self.stride = stride + + def forward(self, x): + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.act(out) + + out = self.conv2(out) + out = self.bn2(out) + out = self.act(out) + + out = self.conv3(out) + out = self.bn3(out) + + if self.downsample is not None: + residual = self.downsample(x) + + out += residual + out = self.act(out) + + return out + + +def fill_fc_weights(layers): + for m in layers.modules(): + if isinstance(m, nn.Conv2d): + nn.init.normal_(m.weight, std=0.001) + # torch.nn.init.kaiming_normal_(m.weight.data, nonlinearity='relu') + # torch.nn.init.xavier_normal_(m.weight.data) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + + +class ResNet(nn.Module): + resnet_spec = { + 18: (BasicBlock, [2, 2, 2, 2]), + 34: (BasicBlock, [3, 4, 6, 3]), + 50: (Bottleneck, [3, 4, 6, 3]), + 101: (Bottleneck, [3, 4, 23, 3]), + 152: (Bottleneck, [3, 8, 36, 3]), + } + + def __init__( + self, depth, out_stages=(1, 2, 3, 4), activation="ReLU", pretrain=True + ): + super(ResNet, self).__init__() + if depth not in self.resnet_spec: + raise KeyError("invalid resnet depth {}".format(depth)) + assert set(out_stages).issubset((1, 2, 3, 4)) + self.activation = activation + block, layers = self.resnet_spec[depth] + self.depth = depth + self.inplanes = 64 + self.out_stages = out_stages + + self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False) + self.bn1 = nn.BatchNorm2d(64) + self.act = act_layers(self.activation) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + self.layer1 = self._make_layer(block, 64, layers[0]) + self.layer2 = self._make_layer(block, 128, layers[1], stride=2) + self.layer3 = self._make_layer(block, 256, layers[2], stride=2) + self.layer4 = self._make_layer(block, 512, layers[3], stride=2) + self.init_weights(pretrain=pretrain) + + def _make_layer(self, block, planes, blocks, stride=1): + downsample = None + if stride != 1 or self.inplanes != planes * block.expansion: + downsample = nn.Sequential( + nn.Conv2d( + self.inplanes, + planes * block.expansion, + kernel_size=1, + stride=stride, + bias=False, + ), + nn.BatchNorm2d(planes * block.expansion), + ) + + layers = [] + layers.append( + block(self.inplanes, planes, stride, downsample, activation=self.activation) + ) + self.inplanes = planes * 
block.expansion + for i in range(1, blocks): + layers.append(block(self.inplanes, planes, activation=self.activation)) + + return nn.Sequential(*layers) + + def forward(self, x): + x = self.conv1(x) + x = self.bn1(x) + x = self.act(x) + x = self.maxpool(x) + output = [] + for i in range(1, 5): + res_layer = getattr(self, "layer{}".format(i)) + x = res_layer(x) + if i in self.out_stages: + output.append(x) + + return tuple(output) + + def init_weights(self, pretrain=True): + if pretrain: + url = model_urls["resnet{}".format(self.depth)] + pretrained_state_dict = model_zoo.load_url(url) + print("=> loading pretrained model {}".format(url)) + self.load_state_dict(pretrained_state_dict, strict=False) + else: + for m in self.modules(): + if self.activation == "LeakyReLU": + nonlinearity = "leaky_relu" + else: + nonlinearity = "relu" + if isinstance(m, nn.Conv2d): + nn.init.kaiming_normal_( + m.weight, mode="fan_out", nonlinearity=nonlinearity + ) + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() diff --git a/nanodet/model/backbone/shufflenetv2.py b/nanodet/model/backbone/shufflenetv2.py new file mode 100644 index 0000000..e821f41 --- /dev/null +++ b/nanodet/model/backbone/shufflenetv2.py @@ -0,0 +1,207 @@ +import torch +import torch.nn as nn +import torch.utils.model_zoo as model_zoo + +from ..module.activation import act_layers + +model_urls = { + "shufflenetv2_0.5x": "https://download.pytorch.org/models/shufflenetv2_x0.5-f707e7126e.pth", # noqa: E501 + "shufflenetv2_1.0x": "https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth", # noqa: E501 + "shufflenetv2_1.5x": None, + "shufflenetv2_2.0x": None, +} + + +def channel_shuffle(x, groups): + # type: (torch.Tensor, int) -> torch.Tensor + batchsize, num_channels, height, width = x.data.size() + channels_per_group = num_channels // groups + + # reshape + x = x.view(batchsize, groups, channels_per_group, height, width) + + x = torch.transpose(x, 1, 2).contiguous() + + # flatten + x = x.view(batchsize, -1, height, width) + + return x + + +class ShuffleV2Block(nn.Module): + def __init__(self, inp, oup, stride, activation="ReLU"): + super(ShuffleV2Block, self).__init__() + + if not (1 <= stride <= 3): + raise ValueError("illegal stride value") + self.stride = stride + + branch_features = oup // 2 + assert (self.stride != 1) or (inp == branch_features << 1) + + if self.stride > 1: + self.branch1 = nn.Sequential( + self.depthwise_conv( + inp, inp, kernel_size=3, stride=self.stride, padding=1 + ), + nn.BatchNorm2d(inp), + nn.Conv2d( + inp, branch_features, kernel_size=1, stride=1, padding=0, bias=False + ), + nn.BatchNorm2d(branch_features), + act_layers(activation), + ) + else: + self.branch1 = nn.Sequential() + + self.branch2 = nn.Sequential( + nn.Conv2d( + inp if (self.stride > 1) else branch_features, + branch_features, + kernel_size=1, + stride=1, + padding=0, + bias=False, + ), + nn.BatchNorm2d(branch_features), + act_layers(activation), + self.depthwise_conv( + branch_features, + branch_features, + kernel_size=3, + stride=self.stride, + padding=1, + ), + nn.BatchNorm2d(branch_features), + nn.Conv2d( + branch_features, + branch_features, + kernel_size=1, + stride=1, + padding=0, + bias=False, + ), + nn.BatchNorm2d(branch_features), + act_layers(activation), + ) + + @staticmethod + def depthwise_conv(i, o, kernel_size, stride=1, padding=0, bias=False): + return nn.Conv2d(i, o, kernel_size, stride, padding, bias=bias, groups=i) + + def forward(self, x): + if self.stride == 1: + x1, x2 = x.chunk(2, 
dim=1) + out = torch.cat((x1, self.branch2(x2)), dim=1) + else: + out = torch.cat((self.branch1(x), self.branch2(x)), dim=1) + + out = channel_shuffle(out, 2) + + return out + + +class ShuffleNetV2(nn.Module): + def __init__( + self, + model_size="1.5x", + out_stages=(2, 3, 4), + with_last_conv=False, + kernal_size=3, + activation="ReLU", + pretrain=True, + ): + super(ShuffleNetV2, self).__init__() + # out_stages can only be a subset of (2, 3, 4) + assert set(out_stages).issubset((2, 3, 4)) + + print("model size is ", model_size) + + self.stage_repeats = [4, 8, 4] + self.model_size = model_size + self.out_stages = out_stages + self.with_last_conv = with_last_conv + self.kernal_size = kernal_size + self.activation = activation + if model_size == "0.5x": + self._stage_out_channels = [24, 48, 96, 192, 1024] + elif model_size == "1.0x": + self._stage_out_channels = [24, 116, 232, 464, 1024] + elif model_size == "1.5x": + self._stage_out_channels = [24, 176, 352, 704, 1024] + elif model_size == "2.0x": + self._stage_out_channels = [24, 244, 488, 976, 2048] + else: + raise NotImplementedError + + # building first layer + input_channels = 3 + output_channels = self._stage_out_channels[0] + self.conv1 = nn.Sequential( + nn.Conv2d(input_channels, output_channels, 3, 2, 1, bias=False), + nn.BatchNorm2d(output_channels), + act_layers(activation), + ) + input_channels = output_channels + + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + + stage_names = ["stage{}".format(i) for i in [2, 3, 4]] + for name, repeats, output_channels in zip( + stage_names, self.stage_repeats, self._stage_out_channels[1:] + ): + seq = [ + ShuffleV2Block( + input_channels, output_channels, 2, activation=activation + ) + ] + for i in range(repeats - 1): + seq.append( + ShuffleV2Block( + output_channels, output_channels, 1, activation=activation + ) + ) + setattr(self, name, nn.Sequential(*seq)) + input_channels = output_channels + output_channels = self._stage_out_channels[-1] + if self.with_last_conv: + conv5 = nn.Sequential( + nn.Conv2d(input_channels, output_channels, 1, 1, 0, bias=False), + nn.BatchNorm2d(output_channels), + act_layers(activation), + ) + self.stage4.add_module("conv5", conv5) + self._initialize_weights(pretrain) + + def forward(self, x): + x = self.conv1(x) + x = self.maxpool(x) + output = [] + for i in range(2, 5): + stage = getattr(self, "stage{}".format(i)) + x = stage(x) + if i in self.out_stages: + output.append(x) + return tuple(output) + + def _initialize_weights(self, pretrain=True): + print("init weights...") + for name, m in self.named_modules(): + if isinstance(m, nn.Conv2d): + if "first" in name: + nn.init.normal_(m.weight, 0, 0.01) + else: + nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1]) + if m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.BatchNorm2d): + nn.init.constant_(m.weight, 1) + if m.bias is not None: + nn.init.constant_(m.bias, 0.0001) + nn.init.constant_(m.running_mean, 0) + if pretrain: + url = model_urls["shufflenetv2_{}".format(self.model_size)] + if url is not None: + pretrained_state_dict = model_zoo.load_url(url) + print("=> loading pretrained model {}".format(url)) + self.load_state_dict(pretrained_state_dict, strict=False) diff --git a/nanodet/model/backbone/timm_wrapper.py b/nanodet/model/backbone/timm_wrapper.py new file mode 100644 index 0000000..ccd2cd8 --- /dev/null +++ b/nanodet/model/backbone/timm_wrapper.py @@ -0,0 +1,66 @@ +# Copyright 2022 RangiLyu. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging + +import torch.nn as nn + +logger = logging.getLogger("NanoDet") + + +class TIMMWrapper(nn.Module): + """Wrapper to use backbones in timm + https://github.com/rwightman/pytorch-image-models.""" + + def __init__( + self, + model_name, + features_only=True, + pretrained=True, + checkpoint_path="", + in_channels=3, + **kwargs, + ): + try: + import timm + except ImportError as exc: + raise RuntimeError( + "timm is not installed, please install it first" + ) from exc + super(TIMMWrapper, self).__init__() + self.timm = timm.create_model( + model_name=model_name, + features_only=features_only, + pretrained=pretrained, + in_chans=in_channels, + checkpoint_path=checkpoint_path, + **kwargs, + ) + + # Remove unused layers + self.timm.global_pool = None + self.timm.fc = None + self.timm.classifier = None + + feature_info = getattr(self.timm, "feature_info", None) + if feature_info: + logger.info(f"TIMM backbone feature channels: {feature_info.channels()}") + + def forward(self, x): + outs = self.timm(x) + if isinstance(outs, (list, tuple)): + features = tuple(outs) + else: + features = (outs,) + return features diff --git a/nanodet/model/fpn/__init__.py b/nanodet/model/fpn/__init__.py new file mode 100644 index 0000000..e55e2f6 --- /dev/null +++ b/nanodet/model/fpn/__init__.py @@ -0,0 +1,35 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy + +from .fpn import FPN +from .ghost_pan import GhostPAN +from .pan import PAN +from .tan import TAN + + +def build_fpn(cfg): + fpn_cfg = copy.deepcopy(cfg) + name = fpn_cfg.pop("name") + if name == "FPN": + return FPN(**fpn_cfg) + elif name == "PAN": + return PAN(**fpn_cfg) + elif name == "TAN": + return TAN(**fpn_cfg) + elif name == "GhostPAN": + return GhostPAN(**fpn_cfg) + else: + raise NotImplementedError diff --git a/nanodet/model/fpn/fpn.py b/nanodet/model/fpn/fpn.py new file mode 100644 index 0000000..a163ca1 --- /dev/null +++ b/nanodet/model/fpn/fpn.py @@ -0,0 +1,100 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.nn as nn +import torch.nn.functional as F + +from ..module.conv import ConvModule +from ..module.init_weights import xavier_init + + +class FPN(nn.Module): + def __init__( + self, + in_channels, + out_channels, + num_outs, + start_level=0, + end_level=-1, + conv_cfg=None, + norm_cfg=None, + activation=None, + ): + super(FPN, self).__init__() + assert isinstance(in_channels, list) + self.in_channels = in_channels + self.out_channels = out_channels + self.num_ins = len(in_channels) + self.num_outs = num_outs + self.fp16_enabled = False + + if end_level == -1: + self.backbone_end_level = self.num_ins + assert num_outs >= self.num_ins - start_level + else: + # if end_level < inputs, no extra level is allowed + self.backbone_end_level = end_level + assert end_level <= len(in_channels) + assert num_outs == end_level - start_level + self.start_level = start_level + self.end_level = end_level + self.lateral_convs = nn.ModuleList() + + for i in range(self.start_level, self.backbone_end_level): + l_conv = ConvModule( + in_channels[i], + out_channels, + 1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + activation=activation, + inplace=False, + ) + + self.lateral_convs.append(l_conv) + self.init_weights() + + # default init_weights for conv(msra) and norm in ConvModule + def init_weights(self): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + xavier_init(m, distribution="uniform") + + def forward(self, inputs): + assert len(inputs) == len(self.in_channels) + + # build laterals + laterals = [ + lateral_conv(inputs[i + self.start_level]) + for i, lateral_conv in enumerate(self.lateral_convs) + ] + + # build top-down path + used_backbone_levels = len(laterals) + for i in range(used_backbone_levels - 1, 0, -1): + laterals[i - 1] += F.interpolate( + laterals[i], scale_factor=2, mode="bilinear" + ) + + # build outputs + outs = [ + # self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels) + laterals[i] + for i in range(used_backbone_levels) + ] + return tuple(outs) + + +# if __name__ == '__main__': diff --git a/nanodet/model/fpn/ghost_pan.py b/nanodet/model/fpn/ghost_pan.py new file mode 100644 index 0000000..0cb4740 --- /dev/null +++ b/nanodet/model/fpn/ghost_pan.py @@ -0,0 +1,244 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import torch +import torch.nn as nn + +from ..backbone.ghostnet import GhostBottleneck +from ..module.conv import ConvModule, DepthwiseConvModule + + +class GhostBlocks(nn.Module): + """Stack of GhostBottleneck used in GhostPAN. + + Args: + in_channels (int): Number of input channels. 
+ out_channels (int): Number of output channels. + expand (int): Expand ratio of GhostBottleneck. Default: 1. + kernel_size (int): Kernel size of depthwise convolution. Default: 5. + num_blocks (int): Number of GhostBottlecneck blocks. Default: 1. + use_res (bool): Whether to use residual connection. Default: False. + activation (str): Name of activation function. Default: LeakyReLU. + """ + + def __init__( + self, + in_channels, + out_channels, + expand=1, + kernel_size=5, + num_blocks=1, + use_res=False, + activation="LeakyReLU", + ): + super(GhostBlocks, self).__init__() + self.use_res = use_res + if use_res: + self.reduce_conv = ConvModule( + in_channels, + out_channels, + kernel_size=1, + stride=1, + padding=0, + activation=activation, + ) + blocks = [] + for _ in range(num_blocks): + blocks.append( + GhostBottleneck( + in_channels, + int(out_channels * expand), + out_channels, + dw_kernel_size=kernel_size, + activation=activation, + ) + ) + self.blocks = nn.Sequential(*blocks) + + def forward(self, x): + out = self.blocks(x) + if self.use_res: + out = out + self.reduce_conv(x) + return out + + +class GhostPAN(nn.Module): + """Path Aggregation Network with Ghost block. + + Args: + in_channels (List[int]): Number of input channels per scale. + out_channels (int): Number of output channels (used at each scale) + num_csp_blocks (int): Number of bottlenecks in CSPLayer. Default: 3 + use_depthwise (bool): Whether to depthwise separable convolution in + blocks. Default: False + kernel_size (int): Kernel size of depthwise convolution. Default: 5. + expand (int): Expand ratio of GhostBottleneck. Default: 1. + num_blocks (int): Number of GhostBottlecneck blocks. Default: 1. + use_res (bool): Whether to use residual connection. Default: False. + num_extra_level (int): Number of extra conv layers for more feature levels. + Default: 0. + upsample_cfg (dict): Config dict for interpolate layer. + Default: `dict(scale_factor=2, mode='nearest')` + norm_cfg (dict): Config dict for normalization layer. + Default: dict(type='BN') + activation (str): Activation layer name. + Default: LeakyReLU. 
+ """ + + def __init__( + self, + in_channels, + out_channels, + use_depthwise=False, + kernel_size=5, + expand=1, + num_blocks=1, + use_res=False, + num_extra_level=0, + upsample_cfg=dict(scale_factor=2, mode="bilinear"), + norm_cfg=dict(type="BN"), + activation="LeakyReLU", + ): + super(GhostPAN, self).__init__() + assert num_extra_level >= 0 + assert num_blocks >= 1 + self.in_channels = in_channels + self.out_channels = out_channels + + conv = DepthwiseConvModule if use_depthwise else ConvModule + + # build top-down blocks + self.upsample = nn.Upsample(**upsample_cfg) + self.reduce_layers = nn.ModuleList() + for idx in range(len(in_channels)): + self.reduce_layers.append( + ConvModule( + in_channels[idx], + out_channels, + 1, + norm_cfg=norm_cfg, + activation=activation, + ) + ) + self.top_down_blocks = nn.ModuleList() + for idx in range(len(in_channels) - 1, 0, -1): + self.top_down_blocks.append( + GhostBlocks( + out_channels * 2, + out_channels, + expand, + kernel_size=kernel_size, + num_blocks=num_blocks, + use_res=use_res, + activation=activation, + ) + ) + + # build bottom-up blocks + self.downsamples = nn.ModuleList() + self.bottom_up_blocks = nn.ModuleList() + for idx in range(len(in_channels) - 1): + self.downsamples.append( + conv( + out_channels, + out_channels, + kernel_size, + stride=2, + padding=kernel_size // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + ) + self.bottom_up_blocks.append( + GhostBlocks( + out_channels * 2, + out_channels, + expand, + kernel_size=kernel_size, + num_blocks=num_blocks, + use_res=use_res, + activation=activation, + ) + ) + + # extra layers + self.extra_lvl_in_conv = nn.ModuleList() + self.extra_lvl_out_conv = nn.ModuleList() + for i in range(num_extra_level): + self.extra_lvl_in_conv.append( + conv( + out_channels, + out_channels, + kernel_size, + stride=2, + padding=kernel_size // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + ) + self.extra_lvl_out_conv.append( + conv( + out_channels, + out_channels, + kernel_size, + stride=2, + padding=kernel_size // 2, + norm_cfg=norm_cfg, + activation=activation, + ) + ) + + def forward(self, inputs): + """ + Args: + inputs (tuple[Tensor]): input features. + Returns: + tuple[Tensor]: multi level features. + """ + assert len(inputs) == len(self.in_channels) + inputs = [ + reduce(input_x) for input_x, reduce in zip(inputs, self.reduce_layers) + ] + # top-down path + inner_outs = [inputs[-1]] + for idx in range(len(self.in_channels) - 1, 0, -1): + feat_heigh = inner_outs[0] + feat_low = inputs[idx - 1] + + inner_outs[0] = feat_heigh + + upsample_feat = self.upsample(feat_heigh) + + inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx]( + torch.cat([upsample_feat, feat_low], 1) + ) + inner_outs.insert(0, inner_out) + + # bottom-up path + outs = [inner_outs[0]] + for idx in range(len(self.in_channels) - 1): + feat_low = outs[-1] + feat_height = inner_outs[idx + 1] + downsample_feat = self.downsamples[idx](feat_low) + out = self.bottom_up_blocks[idx]( + torch.cat([downsample_feat, feat_height], 1) + ) + outs.append(out) + + # extra layers + for extra_in_layer, extra_out_layer in zip( + self.extra_lvl_in_conv, self.extra_lvl_out_conv + ): + outs.append(extra_in_layer(inputs[-1]) + extra_out_layer(outs[-1])) + + return tuple(outs) diff --git a/nanodet/model/fpn/pan.py b/nanodet/model/fpn/pan.py new file mode 100644 index 0000000..807ddf9 --- /dev/null +++ b/nanodet/model/fpn/pan.py @@ -0,0 +1,94 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. 
+ +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.nn.functional as F + +from .fpn import FPN + + +class PAN(FPN): + """Path Aggregation Network for Instance Segmentation. + + This is an implementation of the `PAN in Path Aggregation Network + `_. + + Args: + in_channels (List[int]): Number of input channels per scale. + out_channels (int): Number of output channels (used at each scale) + num_outs (int): Number of output scales. + start_level (int): Index of the start input backbone level used to + build the feature pyramid. Default: 0. + end_level (int): Index of the end input backbone level (exclusive) to + build the feature pyramid. Default: -1, which means the last level. + conv_cfg (dict): Config dict for convolution layer. Default: None. + norm_cfg (dict): Config dict for normalization layer. Default: None. + activation (str): Config dict for activation layer in ConvModule. + Default: None. + """ + + def __init__( + self, + in_channels, + out_channels, + num_outs, + start_level=0, + end_level=-1, + conv_cfg=None, + norm_cfg=None, + activation=None, + ): + super(PAN, self).__init__( + in_channels, + out_channels, + num_outs, + start_level, + end_level, + conv_cfg, + norm_cfg, + activation, + ) + self.init_weights() + + def forward(self, inputs): + """Forward function.""" + assert len(inputs) == len(self.in_channels) + + # build laterals + laterals = [ + lateral_conv(inputs[i + self.start_level]) + for i, lateral_conv in enumerate(self.lateral_convs) + ] + + # build top-down path + used_backbone_levels = len(laterals) + for i in range(used_backbone_levels - 1, 0, -1): + laterals[i - 1] += F.interpolate( + laterals[i], scale_factor=2, mode="bilinear" + ) + + # build outputs + # part 1: from original levels + inter_outs = [laterals[i] for i in range(used_backbone_levels)] + + # part 2: add bottom-up path + for i in range(0, used_backbone_levels - 1): + inter_outs[i + 1] += F.interpolate( + inter_outs[i], scale_factor=0.5, mode="bilinear" + ) + + outs = [] + outs.append(inter_outs[0]) + outs.extend([inter_outs[i] for i in range(1, used_backbone_levels)]) + return tuple(outs) diff --git a/nanodet/model/fpn/tan.py b/nanodet/model/fpn/tan.py new file mode 100644 index 0000000..6ffc305 --- /dev/null +++ b/nanodet/model/fpn/tan.py @@ -0,0 +1,123 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
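+# Usage sketch: like the other necks in this package, TAN is normally constructed through
+# `build_fpn` (see fpn/__init__.py), which pops the "name" key and forwards the remaining
+# config entries as keyword arguments. The channel and transformer sizes below are
+# illustrative values only:
+#
+#   from nanodet.model.fpn import build_fpn
+#   fpn_cfg = dict(name="TAN", in_channels=[116, 232, 464], out_channels=128,
+#                  feature_hw=[20, 20], num_heads=8, num_encoders=1,
+#                  mlp_ratio=4, dropout_ratio=0.1, activation="LeakyReLU")
+#   neck = build_fpn(fpn_cfg)  # equivalent to TAN(in_channels=[116, 232, 464], ...)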
+ +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..module.conv import ConvModule +from ..module.init_weights import normal_init +from ..module.transformer import TransformerBlock + + +class TAN(nn.Module): + """ + Transformer Attention Network. + + :param in_channels: Number of input channels per scale. + :param out_channels: Number of output channel. + :param feature_hw: Size of feature map input to transformer. + :param num_heads: Number of attention heads. + :param num_encoders: Number of transformer encoder layers. + :param mlp_ratio: Hidden layer dimension expand ratio in MLP. + :param dropout_ratio: Probability of an element to be zeroed. + :param activation: Activation layer type. + """ + + def __init__( + self, + in_channels, + out_channels, + feature_hw, + num_heads, + num_encoders, + mlp_ratio, + dropout_ratio, + activation="LeakyReLU", + ): + super(TAN, self).__init__() + assert isinstance(in_channels, list) + self.in_channels = in_channels + self.out_channels = out_channels + self.num_ins = len(in_channels) + assert self.num_ins == 3 + + self.lateral_convs = nn.ModuleList() + for i in range(self.num_ins): + l_conv = ConvModule( + in_channels[i], + out_channels, + 1, + norm_cfg=dict(type="BN"), + activation=activation, + inplace=False, + ) + self.lateral_convs.append(l_conv) + self.transformer = TransformerBlock( + out_channels * self.num_ins, + out_channels, + num_heads, + num_encoders, + mlp_ratio, + dropout_ratio, + activation=activation, + ) + self.pos_embed = nn.Parameter( + torch.zeros(feature_hw[0] * feature_hw[1], 1, out_channels) + ) + + self.init_weights() + + def init_weights(self): + torch.nn.init.trunc_normal_(self.pos_embed, std=0.02) + for m in self.modules(): + if isinstance(m, nn.Linear): + torch.nn.init.trunc_normal_(m.weight, std=0.02) + if isinstance(m, nn.Linear) and m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.LayerNorm): + nn.init.constant_(m.bias, 0) + nn.init.constant_(m.weight, 1.0) + elif isinstance(m, nn.Conv2d): + normal_init(m, 0.01) + + def forward(self, inputs): + assert len(inputs) == len(self.in_channels) + + # build laterals + laterals = [ + lateral_conv(inputs[i]) for i, lateral_conv in enumerate(self.lateral_convs) + ] + + # transformer attention + mid_shape = laterals[1].shape[2:] + mid_lvl = torch.cat( + ( + F.interpolate(laterals[0], size=mid_shape, mode="bilinear"), + laterals[1], + F.interpolate(laterals[2], size=mid_shape, mode="bilinear"), + ), + dim=1, + ) + mid_lvl = self.transformer(mid_lvl, self.pos_embed) + + # build outputs + outs = [ + laterals[0] + + F.interpolate(mid_lvl, size=laterals[0].shape[2:], mode="bilinear"), + laterals[1] + mid_lvl, + laterals[2] + + F.interpolate(mid_lvl, size=laterals[2].shape[2:], mode="bilinear"), + ] + return tuple(outs) diff --git a/nanodet/model/head/__init__.py b/nanodet/model/head/__init__.py new file mode 100644 index 0000000..d1ef2dd --- /dev/null +++ b/nanodet/model/head/__init__.py @@ -0,0 +1,21 @@ +import copy + +from .gfl_head import GFLHead +from .nanodet_head import NanoDetHead +from .nanodet_plus_head import NanoDetPlusHead +from .simple_conv_head import SimpleConvHead + + +def build_head(cfg): + head_cfg = copy.deepcopy(cfg) + name = head_cfg.pop("name") + if name == "GFLHead": + return GFLHead(**head_cfg) + elif name == "NanoDetHead": + return NanoDetHead(**head_cfg) + elif name == "NanoDetPlusHead": + return NanoDetPlusHead(**head_cfg) + elif name == "SimpleConvHead": + return SimpleConvHead(**head_cfg) + else: + raise 
NotImplementedError diff --git a/nanodet/model/head/assigner/assign_result.py b/nanodet/model/head/assigner/assign_result.py new file mode 100644 index 0000000..fb7c65e --- /dev/null +++ b/nanodet/model/head/assigner/assign_result.py @@ -0,0 +1,227 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch + +from nanodet.util import util_mixins + + +class AssignResult(util_mixins.NiceRepr): + """ + Stores assignments between predicted and truth boxes. + + Attributes: + num_gts (int): the number of truth boxes considered when computing this + assignment + + gt_inds (LongTensor): for each predicted box indicates the 1-based + index of the assigned truth box. 0 means unassigned and -1 means + ignore. + + max_overlaps (FloatTensor): the iou between the predicted box and its + assigned truth box. + + labels (None | LongTensor): If specified, for each predicted box + indicates the category label of the assigned truth box. + + Example: + >>> # An assign result between 4 predicted boxes and 9 true boxes + >>> # where only two boxes were assigned. + >>> num_gts = 9 + >>> max_overlaps = torch.LongTensor([0, .5, .9, 0]) + >>> gt_inds = torch.LongTensor([-1, 1, 2, 0]) + >>> labels = torch.LongTensor([0, 3, 4, 0]) + >>> self = AssignResult(num_gts, gt_inds, max_overlaps, labels) + >>> print(str(self)) # xdoctest: +IGNORE_WANT + + >>> # Force addition of gt labels (when adding gt as proposals) + >>> new_labels = torch.LongTensor([3, 4, 5]) + >>> self.add_gt_(new_labels) + >>> print(str(self)) # xdoctest: +IGNORE_WANT + + """ + + def __init__(self, num_gts, gt_inds, max_overlaps, labels=None): + self.num_gts = num_gts + self.gt_inds = gt_inds + self.max_overlaps = max_overlaps + self.labels = labels + # Interface for possible user-defined properties + self._extra_properties = {} + + @property + def num_preds(self): + """int: the number of predictions in this assignment""" + return len(self.gt_inds) + + def set_extra_property(self, key, value): + """Set user-defined new property.""" + assert key not in self.info + self._extra_properties[key] = value + + def get_extra_property(self, key): + """Get user-defined property.""" + return self._extra_properties.get(key, None) + + @property + def info(self): + """dict: a dictionary of info about the object""" + basic_info = { + "num_gts": self.num_gts, + "num_preds": self.num_preds, + "gt_inds": self.gt_inds, + "max_overlaps": self.max_overlaps, + "labels": self.labels, + } + basic_info.update(self._extra_properties) + return basic_info + + def __nice__(self): + """str: a "nice" summary string describing this assign result""" + parts = [] + parts.append(f"num_gts={self.num_gts!r}") + if self.gt_inds is None: + parts.append(f"gt_inds={self.gt_inds!r}") + else: + parts.append(f"gt_inds.shape={tuple(self.gt_inds.shape)!r}") + if self.max_overlaps is None: + parts.append(f"max_overlaps={self.max_overlaps!r}") + else: + parts.append("max_overlaps.shape=" f"{tuple(self.max_overlaps.shape)!r}") + if 
self.labels is None: + parts.append(f"labels={self.labels!r}") + else: + parts.append(f"labels.shape={tuple(self.labels.shape)!r}") + return ", ".join(parts) + + @classmethod + def random(cls, **kwargs): + """Create random AssignResult for tests or debugging. + + Args: + num_preds: number of predicted boxes + num_gts: number of true boxes + p_ignore (float): probability of a predicted box assinged to an + ignored truth + p_assigned (float): probability of a predicted box not being + assigned + p_use_label (float | bool): with labels or not + rng (None | int | numpy.random.RandomState): seed or state + + Returns: + :obj:`AssignResult`: Randomly generated assign results. + + Example: + >>> from nanodet.model.head.assigner.assign_result import AssignResult + >>> self = AssignResult.random() + >>> print(self.info) + """ + rng = kwargs.get("rng", None) + num_gts = kwargs.get("num_gts", None) + num_preds = kwargs.get("num_preds", None) + p_ignore = kwargs.get("p_ignore", 0.3) + p_assigned = kwargs.get("p_assigned", 0.7) + p_use_label = kwargs.get("p_use_label", 0.5) + num_classes = kwargs.get("p_use_label", 3) + + import numpy as np + + if rng is None: + rng = np.random.mtrand._rand + elif isinstance(rng, int): + rng = np.random.RandomState(rng) + else: + rng = rng + if num_gts is None: + num_gts = rng.randint(0, 8) + if num_preds is None: + num_preds = rng.randint(0, 16) + + if num_gts == 0: + max_overlaps = torch.zeros(num_preds, dtype=torch.float32) + gt_inds = torch.zeros(num_preds, dtype=torch.int64) + if p_use_label is True or p_use_label < rng.rand(): + labels = torch.zeros(num_preds, dtype=torch.int64) + else: + labels = None + else: + import numpy as np + + # Create an overlap for each predicted box + max_overlaps = torch.from_numpy(rng.rand(num_preds)) + + # Construct gt_inds for each predicted box + is_assigned = torch.from_numpy(rng.rand(num_preds) < p_assigned) + # maximum number of assignments constraints + n_assigned = min(num_preds, min(num_gts, is_assigned.sum())) + + assigned_idxs = np.where(is_assigned)[0] + rng.shuffle(assigned_idxs) + assigned_idxs = assigned_idxs[0:n_assigned] + assigned_idxs.sort() + + is_assigned[:] = 0 + is_assigned[assigned_idxs] = True + + is_ignore = torch.from_numpy(rng.rand(num_preds) < p_ignore) & is_assigned + + gt_inds = torch.zeros(num_preds, dtype=torch.int64) + + true_idxs = np.arange(num_gts) + rng.shuffle(true_idxs) + true_idxs = torch.from_numpy(true_idxs) + gt_inds[is_assigned] = true_idxs[:n_assigned] + + gt_inds = torch.from_numpy(rng.randint(1, num_gts + 1, size=num_preds)) + gt_inds[is_ignore] = -1 + gt_inds[~is_assigned] = 0 + max_overlaps[~is_assigned] = 0 + + if p_use_label is True or p_use_label < rng.rand(): + if num_classes == 0: + labels = torch.zeros(num_preds, dtype=torch.int64) + else: + labels = torch.from_numpy( + # remind that we set FG labels to [0, num_class-1] + # since mmdet v2.0 + # BG cat_id: num_class + rng.randint(0, num_classes, size=num_preds) + ) + labels[~is_assigned] = 0 + else: + labels = None + + self = cls(num_gts, gt_inds, max_overlaps, labels) + return self + + def add_gt_(self, gt_labels): + """Add ground truth as assigned results. 
+ + Args: + gt_labels (torch.Tensor): Labels of gt boxes + """ + self_inds = torch.arange( + 1, len(gt_labels) + 1, dtype=torch.long, device=gt_labels.device + ) + self.gt_inds = torch.cat([self_inds, self.gt_inds]) + + self.max_overlaps = torch.cat( + [self.max_overlaps.new_ones(len(gt_labels)), self.max_overlaps] + ) + + if self.labels is not None: + self.labels = torch.cat([gt_labels, self.labels]) diff --git a/nanodet/model/head/assigner/atss_assigner.py b/nanodet/model/head/assigner/atss_assigner.py new file mode 100644 index 0000000..c182bff --- /dev/null +++ b/nanodet/model/head/assigner/atss_assigner.py @@ -0,0 +1,174 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch + +from ...loss.iou_loss import bbox_overlaps +from .assign_result import AssignResult +from .base_assigner import BaseAssigner + + +class ATSSAssigner(BaseAssigner): + """Assign a corresponding gt bbox or background to each bbox. + + Each proposals will be assigned with `0` or a positive integer + indicating the ground truth index. + + - 0: negative sample, no assigned gt + - positive integer: positive sample, index (1-based) of assigned gt + + Args: + topk (float): number of bbox selected in each level + """ + + def __init__(self, topk): + self.topk = topk + + # https://github.com/sfzhang15/ATSS/blob/master/atss_core/modeling/rpn/atss/loss.py + + def assign( + self, bboxes, num_level_bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None + ): + """Assign gt to bboxes. + + The assignment is done in following steps + + 1. compute iou between all bbox (bbox of all pyramid levels) and gt + 2. compute center distance between all bbox and gt + 3. on each pyramid level, for each gt, select k bbox whose center + are closest to the gt center, so we total select k*l bbox as + candidates for each gt + 4. get corresponding iou for the these candidates, and compute the + mean and std, set mean + std as the iou threshold + 5. select these candidates whose iou are greater than or equal to + the threshold as postive + 6. limit the positive sample's center in gt + + + Args: + bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4). + num_level_bboxes (List): num of bboxes in each level + gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4). + gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are + labelled as `ignored`, e.g., crowd boxes in COCO. + gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ). + + Returns: + :obj:`AssignResult`: The assign result. 
+ """ + INF = 100000000 + bboxes = bboxes[:, :4] + num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0) + + # compute iou between all bbox and gt + overlaps = bbox_overlaps(bboxes, gt_bboxes) + + # assign 0 by default + assigned_gt_inds = overlaps.new_full((num_bboxes,), 0, dtype=torch.long) + + if num_gt == 0 or num_bboxes == 0: + # No ground truth or boxes, return empty assignment + max_overlaps = overlaps.new_zeros((num_bboxes,)) + if num_gt == 0: + # No truth, assign everything to background + assigned_gt_inds[:] = 0 + if gt_labels is None: + assigned_labels = None + else: + assigned_labels = overlaps.new_full((num_bboxes,), -1, dtype=torch.long) + return AssignResult( + num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels + ) + + # compute center distance between all bbox and gt + gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0 + gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0 + gt_points = torch.stack((gt_cx, gt_cy), dim=1) + + bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0 + bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0 + bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1) + + distances = ( + (bboxes_points[:, None, :] - gt_points[None, :, :]).pow(2).sum(-1).sqrt() + ) + + # Selecting candidates based on the center distance + candidate_idxs = [] + start_idx = 0 + for level, bboxes_per_level in enumerate(num_level_bboxes): + # on each pyramid level, for each gt, + # select k bbox whose center are closest to the gt center + end_idx = start_idx + bboxes_per_level + distances_per_level = distances[start_idx:end_idx, :] + selectable_k = min(self.topk, bboxes_per_level) + _, topk_idxs_per_level = distances_per_level.topk( + selectable_k, dim=0, largest=False + ) + candidate_idxs.append(topk_idxs_per_level + start_idx) + start_idx = end_idx + candidate_idxs = torch.cat(candidate_idxs, dim=0) + + # get corresponding iou for the these candidates, and compute the + # mean and std, set mean + std as the iou threshold + candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)] + overlaps_mean_per_gt = candidate_overlaps.mean(0) + overlaps_std_per_gt = candidate_overlaps.std(0) + overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt + + is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :] + + # limit the positive sample's center in gt + for gt_idx in range(num_gt): + candidate_idxs[:, gt_idx] += gt_idx * num_bboxes + ep_bboxes_cx = ( + bboxes_cx.view(1, -1).expand(num_gt, num_bboxes).contiguous().view(-1) + ) + ep_bboxes_cy = ( + bboxes_cy.view(1, -1).expand(num_gt, num_bboxes).contiguous().view(-1) + ) + candidate_idxs = candidate_idxs.view(-1) + + # calculate the left, top, right, bottom distance between positive + # bbox center and gt side + l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0] + t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1] + r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt) + b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt) + is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01 + is_pos = is_pos & is_in_gts + + # if an anchor box is assigned to multiple gts, + # the one with the highest IoU will be selected. 
+ overlaps_inf = torch.full_like(overlaps, -INF).t().contiguous().view(-1) + index = candidate_idxs.view(-1)[is_pos.view(-1)] + overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index] + overlaps_inf = overlaps_inf.view(num_gt, -1).t() + + max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1) + assigned_gt_inds[max_overlaps != -INF] = ( + argmax_overlaps[max_overlaps != -INF] + 1 + ) + + if gt_labels is not None: + assigned_labels = assigned_gt_inds.new_full((num_bboxes,), -1) + pos_inds = torch.nonzero(assigned_gt_inds > 0, as_tuple=False).squeeze() + if pos_inds.numel() > 0: + assigned_labels[pos_inds] = gt_labels[assigned_gt_inds[pos_inds] - 1] + else: + assigned_labels = None + return AssignResult( + num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels + ) diff --git a/nanodet/model/head/assigner/base_assigner.py b/nanodet/model/head/assigner/base_assigner.py new file mode 100644 index 0000000..8a9094f --- /dev/null +++ b/nanodet/model/head/assigner/base_assigner.py @@ -0,0 +1,7 @@ +from abc import ABCMeta, abstractmethod + + +class BaseAssigner(metaclass=ABCMeta): + @abstractmethod + def assign(self, bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None): + pass diff --git a/nanodet/model/head/assigner/dsl_assigner.py b/nanodet/model/head/assigner/dsl_assigner.py new file mode 100644 index 0000000..e74dc08 --- /dev/null +++ b/nanodet/model/head/assigner/dsl_assigner.py @@ -0,0 +1,154 @@ +import torch +import torch.nn.functional as F + +from ...loss.iou_loss import bbox_overlaps +from .assign_result import AssignResult +from .base_assigner import BaseAssigner + + +class DynamicSoftLabelAssigner(BaseAssigner): + """Computes matching between predictions and ground truth with + dynamic soft label assignment. + + Args: + topk (int): Select top-k predictions to calculate dynamic k + best matchs for each gt. Default 13. + iou_factor (float): The scale factor of iou cost. Default 3.0. + """ + + def __init__(self, topk=13, iou_factor=3.0): + self.topk = topk + self.iou_factor = iou_factor + + def assign( + self, + pred_scores, + priors, + decoded_bboxes, + gt_bboxes, + gt_labels, + ): + """Assign gt to priors with dynamic soft label assignment. + Args: + pred_scores (Tensor): Classification scores of one image, + a 2D-Tensor with shape [num_priors, num_classes] + priors (Tensor): All priors of one image, a 2D-Tensor with shape + [num_priors, 4] in [cx, xy, stride_w, stride_y] format. + decoded_bboxes (Tensor): Predicted bboxes, a 2D-Tensor with shape + [num_priors, 4] in [tl_x, tl_y, br_x, br_y] format. + gt_bboxes (Tensor): Ground truth bboxes of one image, a 2D-Tensor + with shape [num_gts, 4] in [tl_x, tl_y, br_x, br_y] format. + gt_labels (Tensor): Ground truth labels of one image, a Tensor + with shape [num_gts]. + + Returns: + :obj:`AssignResult`: The assigned result. 
+ """ + INF = 100000000 + num_gt = gt_bboxes.size(0) + num_bboxes = decoded_bboxes.size(0) + + # assign 0 by default + assigned_gt_inds = decoded_bboxes.new_full((num_bboxes,), 0, dtype=torch.long) + + prior_center = priors[:, :2] + lt_ = prior_center[:, None] - gt_bboxes[:, :2] + rb_ = gt_bboxes[:, 2:] - prior_center[:, None] + + deltas = torch.cat([lt_, rb_], dim=-1) + is_in_gts = deltas.min(dim=-1).values > 0 + valid_mask = is_in_gts.sum(dim=1) > 0 + + valid_decoded_bbox = decoded_bboxes[valid_mask] + valid_pred_scores = pred_scores[valid_mask] + num_valid = valid_decoded_bbox.size(0) + + if num_gt == 0 or num_bboxes == 0 or num_valid == 0: + # No ground truth or boxes, return empty assignment + max_overlaps = decoded_bboxes.new_zeros((num_bboxes,)) + if num_gt == 0: + # No truth, assign everything to background + assigned_gt_inds[:] = 0 + if gt_labels is None: + assigned_labels = None + else: + assigned_labels = decoded_bboxes.new_full( + (num_bboxes,), -1, dtype=torch.long + ) + return AssignResult( + num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels + ) + + pairwise_ious = bbox_overlaps(valid_decoded_bbox, gt_bboxes) + iou_cost = -torch.log(pairwise_ious + 1e-7) + + gt_onehot_label = ( + F.one_hot(gt_labels.to(torch.int64), pred_scores.shape[-1]) + .float() + .unsqueeze(0) + .repeat(num_valid, 1, 1) + ) + valid_pred_scores = valid_pred_scores.unsqueeze(1).repeat(1, num_gt, 1) + + soft_label = gt_onehot_label * pairwise_ious[..., None] + scale_factor = soft_label - valid_pred_scores + + cls_cost = F.binary_cross_entropy( + valid_pred_scores, soft_label, reduction="none" + ) * scale_factor.abs().pow(2.0) + + cls_cost = cls_cost.sum(dim=-1) + + cost_matrix = cls_cost + iou_cost * self.iou_factor + + matched_pred_ious, matched_gt_inds = self.dynamic_k_matching( + cost_matrix, pairwise_ious, num_gt, valid_mask + ) + + # convert to AssignResult format + assigned_gt_inds[valid_mask] = matched_gt_inds + 1 + assigned_labels = assigned_gt_inds.new_full((num_bboxes,), -1) + assigned_labels[valid_mask] = gt_labels[matched_gt_inds].long() + max_overlaps = assigned_gt_inds.new_full( + (num_bboxes,), -INF, dtype=torch.float32 + ) + max_overlaps[valid_mask] = matched_pred_ious + return AssignResult( + num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels + ) + + def dynamic_k_matching(self, cost, pairwise_ious, num_gt, valid_mask): + """Use sum of topk pred iou as dynamic k. Refer from OTA and YOLOX. + + Args: + cost (Tensor): Cost matrix. + pairwise_ious (Tensor): Pairwise iou matrix. + num_gt (int): Number of gt. + valid_mask (Tensor): Mask for valid bboxes. 
+ """ + matching_matrix = torch.zeros_like(cost) + # select candidate topk ious for dynamic-k calculation + candidate_topk = min(self.topk, pairwise_ious.size(0)) + topk_ious, _ = torch.topk(pairwise_ious, candidate_topk, dim=0) + # calculate dynamic k for each gt + dynamic_ks = torch.clamp(topk_ious.sum(0).int(), min=1) + for gt_idx in range(num_gt): + _, pos_idx = torch.topk( + cost[:, gt_idx], k=dynamic_ks[gt_idx].item(), largest=False + ) + matching_matrix[:, gt_idx][pos_idx] = 1.0 + + del topk_ious, dynamic_ks, pos_idx + + prior_match_gt_mask = matching_matrix.sum(1) > 1 + if prior_match_gt_mask.sum() > 0: + cost_min, cost_argmin = torch.min(cost[prior_match_gt_mask, :], dim=1) + matching_matrix[prior_match_gt_mask, :] *= 0.0 + matching_matrix[prior_match_gt_mask, cost_argmin] = 1.0 + # get foreground mask inside box and center prior + fg_mask_inboxes = matching_matrix.sum(1) > 0.0 + valid_mask[valid_mask.clone()] = fg_mask_inboxes + + matched_gt_inds = matching_matrix[fg_mask_inboxes, :].argmax(1) + matched_pred_ious = (matching_matrix * pairwise_ious).sum(1)[fg_mask_inboxes] + return matched_pred_ious, matched_gt_inds diff --git a/nanodet/model/head/gfl_head.py b/nanodet/model/head/gfl_head.py new file mode 100644 index 0000000..ee5409c --- /dev/null +++ b/nanodet/model/head/gfl_head.py @@ -0,0 +1,708 @@ +import math + +import cv2 +import numpy as np +import torch +import torch.distributed as dist +import torch.nn as nn +import torch.nn.functional as F + +from nanodet.util import ( + bbox2distance, + distance2bbox, + images_to_levels, + multi_apply, + overlay_bbox_cv, +) + +from ...data.transform.warp import warp_boxes +from ..loss.gfocal_loss import DistributionFocalLoss, QualityFocalLoss +from ..loss.iou_loss import GIoULoss, bbox_overlaps +from ..module.conv import ConvModule +from ..module.init_weights import normal_init +from ..module.nms import multiclass_nms +from ..module.scale import Scale +from .assigner.atss_assigner import ATSSAssigner + + +def reduce_mean(tensor): + if not (dist.is_available() and dist.is_initialized()): + return tensor + tensor = tensor.clone() + dist.all_reduce(tensor.true_divide(dist.get_world_size()), op=dist.ReduceOp.SUM) + return tensor + + +class Integral(nn.Module): + """A fixed layer for calculating integral result from distribution. + This layer calculates the target location by :math: `sum{P(y_i) * y_i}`, + P(y_i) denotes the softmax vector that represents the discrete distribution + y_i denotes the discrete set, usually {0, 1, 2, ..., reg_max} + Args: + reg_max (int): The maximal value of the discrete set. Default: 16. You + may want to reset it according to your new dataset or related + settings. + """ + + def __init__(self, reg_max=16): + super(Integral, self).__init__() + self.reg_max = reg_max + self.register_buffer( + "project", torch.linspace(0, self.reg_max, self.reg_max + 1) + ) + + def forward(self, x): + """Forward feature from the regression head to get integral result of + bounding box location. + Args: + x (Tensor): Features of the regression head, shape (N, 4*(n+1)), + n is self.reg_max. + Returns: + x (Tensor): Integral result of box locations, i.e., distance + offsets from the box center in four directions, shape (N, 4). 
+ """ + shape = x.size() + x = F.softmax(x.reshape(*shape[:-1], 4, self.reg_max + 1), dim=-1) + x = F.linear(x, self.project.type_as(x)).reshape(*shape[:-1], 4) + return x + + +class GFLHead(nn.Module): + """Generalized Focal Loss: Learning Qualified and Distributed Bounding + Boxes for Dense Object Detection. + + GFL head structure is similar with ATSS, however GFL uses + 1) joint representation for classification and localization quality, and + 2) flexible General distribution for bounding box locations, + which are supervised by + Quality Focal Loss (QFL) and Distribution Focal Loss (DFL), respectively + + https://arxiv.org/abs/2006.04388 + + :param num_classes: Number of categories excluding the background category. + :param loss: Config of all loss functions. + :param input_channel: Number of channels in the input feature map. + :param feat_channels: Number of conv layers in cls and reg tower. Default: 4. + :param stacked_convs: Number of conv layers in cls and reg tower. Default: 4. + :param octave_base_scale: Scale factor of grid cells. + :param strides: Down sample strides of all level feature map + :param conv_cfg: Dictionary to construct and config conv layer. Default: None. + :param norm_cfg: Dictionary to construct and config norm layer. + :param reg_max: Max value of integral set :math: `{0, ..., reg_max}` + in QFL setting. Default: 16. + :param kwargs: + """ + + def __init__( + self, + num_classes, + loss, + input_channel, + feat_channels=256, + stacked_convs=4, + octave_base_scale=4, + strides=[8, 16, 32], + conv_cfg=None, + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + reg_max=16, + **kwargs + ): + super(GFLHead, self).__init__() + self.num_classes = num_classes + self.in_channels = input_channel + self.feat_channels = feat_channels + self.stacked_convs = stacked_convs + self.grid_cell_scale = octave_base_scale + self.strides = strides + self.reg_max = reg_max + + self.loss_cfg = loss + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.use_sigmoid = self.loss_cfg.loss_qfl.use_sigmoid + if self.use_sigmoid: + self.cls_out_channels = num_classes + else: + self.cls_out_channels = num_classes + 1 + + self.assigner = ATSSAssigner(topk=9) + self.distribution_project = Integral(self.reg_max) + + self.loss_qfl = QualityFocalLoss( + use_sigmoid=self.use_sigmoid, + beta=self.loss_cfg.loss_qfl.beta, + loss_weight=self.loss_cfg.loss_qfl.loss_weight, + ) + self.loss_dfl = DistributionFocalLoss( + loss_weight=self.loss_cfg.loss_dfl.loss_weight + ) + self.loss_bbox = GIoULoss(loss_weight=self.loss_cfg.loss_bbox.loss_weight) + self._init_layers() + self.init_weights() + + def _init_layers(self): + self.relu = nn.ReLU(inplace=True) + self.cls_convs = nn.ModuleList() + self.reg_convs = nn.ModuleList() + for i in range(self.stacked_convs): + chn = self.in_channels if i == 0 else self.feat_channels + self.cls_convs.append( + ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + ) + ) + self.reg_convs.append( + ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + ) + ) + self.gfl_cls = nn.Conv2d( + self.feat_channels, self.cls_out_channels, 3, padding=1 + ) + self.gfl_reg = nn.Conv2d( + self.feat_channels, 4 * (self.reg_max + 1), 3, padding=1 + ) + self.scales = nn.ModuleList([Scale(1.0) for _ in self.strides]) + + def init_weights(self): + for m in self.cls_convs: + normal_init(m.conv, std=0.01) + for m in self.reg_convs: + 
normal_init(m.conv, std=0.01) + bias_cls = -4.595 + normal_init(self.gfl_cls, std=0.01, bias=bias_cls) + normal_init(self.gfl_reg, std=0.01) + + def forward(self, feats): + if torch.onnx.is_in_onnx_export(): + return self._forward_onnx(feats) + outputs = [] + for x, scale in zip(feats, self.scales): + cls_feat = x + reg_feat = x + for cls_conv in self.cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in self.reg_convs: + reg_feat = reg_conv(reg_feat) + cls_score = self.gfl_cls(cls_feat) + bbox_pred = scale(self.gfl_reg(reg_feat)).float() + output = torch.cat([cls_score, bbox_pred], dim=1) + outputs.append(output.flatten(start_dim=2)) + outputs = torch.cat(outputs, dim=2).permute(0, 2, 1) + return outputs + + def loss(self, preds, gt_meta): + cls_scores, bbox_preds = preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + device = cls_scores.device + gt_bboxes = gt_meta["gt_bboxes"] + gt_labels = gt_meta["gt_labels"] + input_height, input_width = gt_meta["img"].shape[2:] + gt_bboxes_ignore = None + + featmap_sizes = [ + (math.ceil(input_height / stride), math.ceil(input_width) / stride) + for stride in self.strides + ] + + cls_reg_targets = self.target_assign( + cls_scores, + bbox_preds, + featmap_sizes, + gt_bboxes, + gt_bboxes_ignore, + gt_labels, + device=device, + ) + if cls_reg_targets is None: + return None + + ( + cls_preds_list, + reg_preds_list, + grid_cells_list, + labels_list, + label_weights_list, + bbox_targets_list, + bbox_weights_list, + num_total_pos, + num_total_neg, + ) = cls_reg_targets + + num_total_samples = reduce_mean(torch.tensor(num_total_pos).to(device)).item() + num_total_samples = max(num_total_samples, 1.0) + + losses_qfl, losses_bbox, losses_dfl, avg_factor = multi_apply( + self.loss_single, + grid_cells_list, + cls_preds_list, + reg_preds_list, + labels_list, + label_weights_list, + bbox_targets_list, + self.strides, + num_total_samples=num_total_samples, + ) + + avg_factor = sum(avg_factor) + avg_factor = reduce_mean(avg_factor).item() + if avg_factor <= 0: + loss_qfl = torch.tensor(0, dtype=torch.float32, requires_grad=True).to( + device + ) + loss_bbox = torch.tensor(0, dtype=torch.float32, requires_grad=True).to( + device + ) + loss_dfl = torch.tensor(0, dtype=torch.float32, requires_grad=True).to( + device + ) + else: + losses_bbox = list(map(lambda x: x / avg_factor, losses_bbox)) + losses_dfl = list(map(lambda x: x / avg_factor, losses_dfl)) + + loss_qfl = sum(losses_qfl) + loss_bbox = sum(losses_bbox) + loss_dfl = sum(losses_dfl) + + loss = loss_qfl + loss_bbox + loss_dfl + loss_states = dict(loss_qfl=loss_qfl, loss_bbox=loss_bbox, loss_dfl=loss_dfl) + + return loss, loss_states + + def loss_single( + self, + grid_cells, + cls_score, + bbox_pred, + labels, + label_weights, + bbox_targets, + stride, + num_total_samples, + ): + grid_cells = grid_cells.reshape(-1, 4) + cls_score = cls_score.reshape(-1, self.cls_out_channels) + bbox_pred = bbox_pred.reshape(-1, 4 * (self.reg_max + 1)) + bbox_targets = bbox_targets.reshape(-1, 4) + labels = labels.reshape(-1) + label_weights = label_weights.reshape(-1) + + # FG cat_id: [0, num_classes -1], BG cat_id: num_classes + bg_class_ind = self.num_classes + pos_inds = torch.nonzero( + (labels >= 0) & (labels < bg_class_ind), as_tuple=False + ).squeeze(1) + + score = label_weights.new_zeros(labels.shape) + + if len(pos_inds) > 0: + pos_bbox_targets = bbox_targets[pos_inds] + pos_bbox_pred = bbox_pred[pos_inds] # (n, 4 * (reg_max + 1)) + pos_grid_cells = grid_cells[pos_inds] + 
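+            # grid cell centers and bbox targets below are divided by the stride,
+            # so box regression runs in stride-normalized coordinates and the DFL
+            # target distances fall inside the [0, reg_max] range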
pos_grid_cell_centers = self.grid_cells_to_center(pos_grid_cells) / stride + + weight_targets = cls_score.detach().sigmoid() + weight_targets = weight_targets.max(dim=1)[0][pos_inds] + pos_bbox_pred_corners = self.distribution_project(pos_bbox_pred) + pos_decode_bbox_pred = distance2bbox( + pos_grid_cell_centers, pos_bbox_pred_corners + ) + pos_decode_bbox_targets = pos_bbox_targets / stride + score[pos_inds] = bbox_overlaps( + pos_decode_bbox_pred.detach(), pos_decode_bbox_targets, is_aligned=True + ) + pred_corners = pos_bbox_pred.reshape(-1, self.reg_max + 1) + target_corners = bbox2distance( + pos_grid_cell_centers, pos_decode_bbox_targets, self.reg_max + ).reshape(-1) + + # regression loss + loss_bbox = self.loss_bbox( + pos_decode_bbox_pred, + pos_decode_bbox_targets, + weight=weight_targets, + avg_factor=1.0, + ) + + # dfl loss + loss_dfl = self.loss_dfl( + pred_corners, + target_corners, + weight=weight_targets[:, None].expand(-1, 4).reshape(-1), + avg_factor=4.0, + ) + else: + loss_bbox = bbox_pred.sum() * 0 + loss_dfl = bbox_pred.sum() * 0 + weight_targets = torch.tensor(0).to(cls_score.device) + + # qfl loss + loss_qfl = self.loss_qfl( + cls_score, + (labels, score), + weight=label_weights, + avg_factor=num_total_samples, + ) + + return loss_qfl, loss_bbox, loss_dfl, weight_targets.sum() + + def target_assign( + self, + cls_preds, + reg_preds, + featmap_sizes, + gt_bboxes_list, + gt_bboxes_ignore_list, + gt_labels_list, + device, + ): + """ + Assign target for a batch of images. + :param batch_size: num of images in one batch + :param featmap_sizes: A list of all grid cell boxes in all image + :param gt_bboxes_list: A list of ground truth boxes in all image + :param gt_bboxes_ignore_list: A list of all ignored boxes in all image + :param gt_labels_list: A list of all ground truth label in all image + :param device: pytorch device + :return: Assign results of all images. 
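+        The returned tuple holds, per feature-map level, the class and box
+        predictions, grid cells, labels, label weights, bbox targets and
+        bbox weights, followed by the total number of positive and negative
+        samples; None is returned when any image has no valid cells.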
+ """ + batch_size = cls_preds.shape[0] + # get grid cells of one image + multi_level_grid_cells = [ + self.get_grid_cells( + featmap_sizes[i], + self.grid_cell_scale, + stride, + dtype=torch.float32, + device=device, + ) + for i, stride in enumerate(self.strides) + ] + mlvl_grid_cells_list = [multi_level_grid_cells for i in range(batch_size)] + + # pixel cell number of multi-level feature maps + num_level_cells = [grid_cells.size(0) for grid_cells in mlvl_grid_cells_list[0]] + num_level_cells_list = [num_level_cells] * batch_size + # concat all level cells and to a single tensor + for i in range(batch_size): + mlvl_grid_cells_list[i] = torch.cat(mlvl_grid_cells_list[i]) + # compute targets for each image + if gt_bboxes_ignore_list is None: + gt_bboxes_ignore_list = [None for _ in range(batch_size)] + if gt_labels_list is None: + gt_labels_list = [None for _ in range(batch_size)] + # target assign on all images, get list of tensors + # list length = batch size + # tensor first dim = num of all grid cell + ( + all_grid_cells, + all_labels, + all_label_weights, + all_bbox_targets, + all_bbox_weights, + pos_inds_list, + neg_inds_list, + ) = multi_apply( + self.target_assign_single_img, + mlvl_grid_cells_list, + num_level_cells_list, + gt_bboxes_list, + gt_bboxes_ignore_list, + gt_labels_list, + ) + # no valid cells + if any([labels is None for labels in all_labels]): + return None + # sampled cells of all images + num_total_pos = sum([max(inds.numel(), 1) for inds in pos_inds_list]) + num_total_neg = sum([max(inds.numel(), 1) for inds in neg_inds_list]) + # merge list of targets tensors into one batch then split to multi levels + mlvl_cls_preds = images_to_levels([c for c in cls_preds], num_level_cells) + mlvl_reg_preds = images_to_levels([r for r in reg_preds], num_level_cells) + mlvl_grid_cells = images_to_levels(all_grid_cells, num_level_cells) + mlvl_labels = images_to_levels(all_labels, num_level_cells) + mlvl_label_weights = images_to_levels(all_label_weights, num_level_cells) + mlvl_bbox_targets = images_to_levels(all_bbox_targets, num_level_cells) + mlvl_bbox_weights = images_to_levels(all_bbox_weights, num_level_cells) + return ( + mlvl_cls_preds, + mlvl_reg_preds, + mlvl_grid_cells, + mlvl_labels, + mlvl_label_weights, + mlvl_bbox_targets, + mlvl_bbox_weights, + num_total_pos, + num_total_neg, + ) + + def target_assign_single_img( + self, grid_cells, num_level_cells, gt_bboxes, gt_bboxes_ignore, gt_labels + ): + """ + Using ATSS Assigner to assign target on one image. 
+ :param grid_cells: Grid cell boxes of all pixels on feature map + :param num_level_cells: numbers of grid cells on each level's feature map + :param gt_bboxes: Ground truth boxes + :param gt_bboxes_ignore: Ground truths which are ignored + :param gt_labels: Ground truth labels + :return: Assign results of a single image + """ + device = grid_cells.device + gt_bboxes = torch.from_numpy(gt_bboxes).to(device) + gt_labels = torch.from_numpy(gt_labels).to(device) + + assign_result = self.assigner.assign( + grid_cells, num_level_cells, gt_bboxes, gt_bboxes_ignore, gt_labels + ) + + pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds = self.sample( + assign_result, gt_bboxes + ) + + num_cells = grid_cells.shape[0] + bbox_targets = torch.zeros_like(grid_cells) + bbox_weights = torch.zeros_like(grid_cells) + labels = grid_cells.new_full((num_cells,), self.num_classes, dtype=torch.long) + label_weights = grid_cells.new_zeros(num_cells, dtype=torch.float) + + if len(pos_inds) > 0: + pos_bbox_targets = pos_gt_bboxes + bbox_targets[pos_inds, :] = pos_bbox_targets + bbox_weights[pos_inds, :] = 1.0 + if gt_labels is None: + # Only rpn gives gt_labels as None + # Foreground is the first class + labels[pos_inds] = 0 + else: + labels[pos_inds] = gt_labels[pos_assigned_gt_inds] + + label_weights[pos_inds] = 1.0 + if len(neg_inds) > 0: + label_weights[neg_inds] = 1.0 + + return ( + grid_cells, + labels, + label_weights, + bbox_targets, + bbox_weights, + pos_inds, + neg_inds, + ) + + def sample(self, assign_result, gt_bboxes): + pos_inds = ( + torch.nonzero(assign_result.gt_inds > 0, as_tuple=False) + .squeeze(-1) + .unique() + ) + neg_inds = ( + torch.nonzero(assign_result.gt_inds == 0, as_tuple=False) + .squeeze(-1) + .unique() + ) + pos_assigned_gt_inds = assign_result.gt_inds[pos_inds] - 1 + + if gt_bboxes.numel() == 0: + # hack for index error case + assert pos_assigned_gt_inds.numel() == 0 + pos_gt_bboxes = torch.empty_like(gt_bboxes).view(-1, 4) + else: + if len(gt_bboxes.shape) < 2: + gt_bboxes = gt_bboxes.view(-1, 4) + pos_gt_bboxes = gt_bboxes[pos_assigned_gt_inds, :] + return pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds + + def post_process(self, preds, meta): + cls_scores, bbox_preds = preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + result_list = self.get_bboxes(cls_scores, bbox_preds, meta) + det_results = {} + warp_matrixes = ( + meta["warp_matrix"] + if isinstance(meta["warp_matrix"], list) + else meta["warp_matrix"] + ) + img_heights = ( + meta["img_info"]["height"].cpu().numpy() + if isinstance(meta["img_info"]["height"], torch.Tensor) + else meta["img_info"]["height"] + ) + img_widths = ( + meta["img_info"]["width"].cpu().numpy() + if isinstance(meta["img_info"]["width"], torch.Tensor) + else meta["img_info"]["width"] + ) + img_ids = ( + meta["img_info"]["id"].cpu().numpy() + if isinstance(meta["img_info"]["id"], torch.Tensor) + else meta["img_info"]["id"] + ) + + for result, img_width, img_height, img_id, warp_matrix in zip( + result_list, img_widths, img_heights, img_ids, warp_matrixes + ): + det_result = {} + det_bboxes, det_labels = result + det_bboxes = det_bboxes.detach().cpu().numpy() + det_bboxes[:, :4] = warp_boxes( + det_bboxes[:, :4], np.linalg.inv(warp_matrix), img_width, img_height + ) + classes = det_labels.detach().cpu().numpy() + for i in range(self.num_classes): + inds = classes == i + det_result[i] = np.concatenate( + [ + det_bboxes[inds, :4].astype(np.float32), + det_bboxes[inds, 4:5].astype(np.float32), + ], + axis=1, + ).tolist() + 
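+            # det_result maps class index -> [[x1, y1, x2, y2, score], ...] with
+            # boxes warped back to the original image resolution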
det_results[img_id] = det_result + return det_results + + def show_result( + self, img, dets, class_names, score_thres=0.3, show=True, save_path=None + ): + result = overlay_bbox_cv(img, dets, class_names, score_thresh=score_thres) + if show: + cv2.imshow("det", result) + return result + + def get_bboxes(self, cls_preds, reg_preds, img_metas): + """Decode the outputs to bboxes. + Args: + cls_preds (Tensor): Shape (num_imgs, num_points, num_classes). + reg_preds (Tensor): Shape (num_imgs, num_points, 4 * (regmax + 1)). + img_metas (dict): Dict of image info. + + Returns: + results_list (list[tuple]): List of detection bboxes and labels. + """ + device = cls_preds.device + b = cls_preds.shape[0] + input_height, input_width = img_metas["img"].shape[2:] + input_shape = (input_height, input_width) + + featmap_sizes = [ + (math.ceil(input_height / stride), math.ceil(input_width) / stride) + for stride in self.strides + ] + # get grid cells of one image + mlvl_center_priors = [] + for i, stride in enumerate(self.strides): + y, x = self.get_single_level_center_point( + featmap_sizes[i], stride, torch.float32, device + ) + strides = x.new_full((x.shape[0],), stride) + proiors = torch.stack([x, y, strides, strides], dim=-1) + mlvl_center_priors.append(proiors.unsqueeze(0).repeat(b, 1, 1)) + + center_priors = torch.cat(mlvl_center_priors, dim=1) + dis_preds = self.distribution_project(reg_preds) * center_priors[..., 2, None] + bboxes = distance2bbox(center_priors[..., :2], dis_preds, max_shape=input_shape) + scores = cls_preds.sigmoid() + result_list = [] + for i in range(b): + # add a dummy background class at the end of all labels + # same with mmdetection2.0 + score, bbox = scores[i], bboxes[i] + padding = score.new_zeros(score.shape[0], 1) + score = torch.cat([score, padding], dim=1) + results = multiclass_nms( + bbox, + score, + score_thr=0.05, + nms_cfg=dict(type="nms", iou_threshold=0.6), + max_num=100, + ) + result_list.append(results) + return result_list + + def get_single_level_center_point( + self, featmap_size, stride, dtype, device, flatten=True + ): + """ + Generate pixel centers of a single stage feature map. + :param featmap_size: height and width of the feature map + :param stride: down sample stride of the feature map + :param dtype: data type of the tensors + :param device: device of the tensors + :param flatten: flatten the x and y tensors + :return: y and x of the center points + """ + h, w = featmap_size + x_range = (torch.arange(w, dtype=dtype, device=device) + 0.5) * stride + y_range = (torch.arange(h, dtype=dtype, device=device) + 0.5) * stride + y, x = torch.meshgrid(y_range, x_range) + if flatten: + y = y.flatten() + x = x.flatten() + return y, x + + def get_grid_cells(self, featmap_size, scale, stride, dtype, device): + """ + Generate grid cells of a feature map for target assignment. + :param featmap_size: Size of a single level feature map. + :param scale: Grid cell scale. + :param stride: Down sample stride of the feature map. + :param dtype: Data type of the tensors. + :param device: Device of the tensors. + :return: Grid_cells xyxy position. 
Size should be [feat_w * feat_h, 4] + """ + cell_size = stride * scale + y, x = self.get_single_level_center_point( + featmap_size, stride, dtype, device, flatten=True + ) + grid_cells = torch.stack( + [ + x - 0.5 * cell_size, + y - 0.5 * cell_size, + x + 0.5 * cell_size, + y + 0.5 * cell_size, + ], + dim=-1, + ) + return grid_cells + + def grid_cells_to_center(self, grid_cells): + """ + Get center location of each gird cell + :param grid_cells: grid cells of a feature map + :return: center points + """ + cells_cx = (grid_cells[:, 2] + grid_cells[:, 0]) / 2 + cells_cy = (grid_cells[:, 3] + grid_cells[:, 1]) / 2 + return torch.stack([cells_cx, cells_cy], dim=-1) + + def _forward_onnx(self, feats): + """only used for onnx export""" + outputs = [] + for x, scale in zip(feats, self.scales): + cls_feat = x + reg_feat = x + for cls_conv in self.cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in self.reg_convs: + reg_feat = reg_conv(reg_feat) + cls_pred = self.gfl_cls(cls_feat) + reg_pred = scale(self.gfl_reg(reg_feat)) + cls_pred = cls_pred.sigmoid() + out = torch.cat([cls_pred, reg_pred], dim=1) + outputs.append(out.flatten(start_dim=2)) + return torch.cat(outputs, dim=2).permute(0, 2, 1) diff --git a/nanodet/model/head/nanodet_head.py b/nanodet/model/head/nanodet_head.py new file mode 100644 index 0000000..8e145d6 --- /dev/null +++ b/nanodet/model/head/nanodet_head.py @@ -0,0 +1,185 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
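+
+# NanoDetHead below is a lightweight variant of GFLHead: it keeps the same GFL
+# losses but (by default) uses depthwise-separable convolution towers and builds
+# a separate, non-shared head for every feature-map stride.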
+ +import torch +import torch.nn as nn + +from ..module.conv import ConvModule, DepthwiseConvModule +from ..module.init_weights import normal_init +from .gfl_head import GFLHead + + +class NanoDetHead(GFLHead): + """ + Modified from GFL, use same loss functions but much lightweight convolution heads + """ + + def __init__( + self, + num_classes, + loss, + input_channel, + stacked_convs=2, + octave_base_scale=5, + conv_type="DWConv", + conv_cfg=None, + norm_cfg=dict(type="BN"), + reg_max=16, + share_cls_reg=False, + activation="LeakyReLU", + feat_channels=256, + strides=[8, 16, 32], + **kwargs + ): + self.share_cls_reg = share_cls_reg + self.activation = activation + self.ConvModule = ConvModule if conv_type == "Conv" else DepthwiseConvModule + super(NanoDetHead, self).__init__( + num_classes, + loss, + input_channel, + feat_channels, + stacked_convs, + octave_base_scale, + strides, + conv_cfg, + norm_cfg, + reg_max, + **kwargs + ) + + def _init_layers(self): + self.cls_convs = nn.ModuleList() + self.reg_convs = nn.ModuleList() + for _ in self.strides: + cls_convs, reg_convs = self._buid_not_shared_head() + self.cls_convs.append(cls_convs) + self.reg_convs.append(reg_convs) + + self.gfl_cls = nn.ModuleList( + [ + nn.Conv2d( + self.feat_channels, + self.cls_out_channels + 4 * (self.reg_max + 1) + if self.share_cls_reg + else self.cls_out_channels, + 1, + padding=0, + ) + for _ in self.strides + ] + ) + # TODO: if + self.gfl_reg = nn.ModuleList( + [ + nn.Conv2d(self.feat_channels, 4 * (self.reg_max + 1), 1, padding=0) + for _ in self.strides + ] + ) + + def _buid_not_shared_head(self): + cls_convs = nn.ModuleList() + reg_convs = nn.ModuleList() + for i in range(self.stacked_convs): + chn = self.in_channels if i == 0 else self.feat_channels + cls_convs.append( + self.ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + norm_cfg=self.norm_cfg, + bias=self.norm_cfg is None, + activation=self.activation, + ) + ) + if not self.share_cls_reg: + reg_convs.append( + self.ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + norm_cfg=self.norm_cfg, + bias=self.norm_cfg is None, + activation=self.activation, + ) + ) + + return cls_convs, reg_convs + + def init_weights(self): + for m in self.cls_convs.modules(): + if isinstance(m, nn.Conv2d): + normal_init(m, std=0.01) + for m in self.reg_convs.modules(): + if isinstance(m, nn.Conv2d): + normal_init(m, std=0.01) + # init cls head with confidence = 0.01 + bias_cls = -4.595 + for i in range(len(self.strides)): + normal_init(self.gfl_cls[i], std=0.01, bias=bias_cls) + normal_init(self.gfl_reg[i], std=0.01) + print("Finish initialize NanoDet Head.") + + def forward(self, feats): + if torch.onnx.is_in_onnx_export(): + return self._forward_onnx(feats) + outputs = [] + for x, cls_convs, reg_convs, gfl_cls, gfl_reg in zip( + feats, self.cls_convs, self.reg_convs, self.gfl_cls, self.gfl_reg + ): + cls_feat = x + reg_feat = x + for cls_conv in cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in reg_convs: + reg_feat = reg_conv(reg_feat) + if self.share_cls_reg: + output = gfl_cls(cls_feat) + else: + cls_score = gfl_cls(cls_feat) + bbox_pred = gfl_reg(reg_feat) + output = torch.cat([cls_score, bbox_pred], dim=1) + outputs.append(output.flatten(start_dim=2)) + outputs = torch.cat(outputs, dim=2).permute(0, 2, 1) + return outputs + + def _forward_onnx(self, feats): + """only used for onnx export""" + outputs = [] + for x, cls_convs, reg_convs, gfl_cls, gfl_reg in zip( + feats, self.cls_convs, self.reg_convs, 
self.gfl_cls, self.gfl_reg + ): + cls_feat = x + reg_feat = x + for cls_conv in cls_convs: + cls_feat = cls_conv(cls_feat) + for reg_conv in reg_convs: + reg_feat = reg_conv(reg_feat) + if self.share_cls_reg: + output = gfl_cls(cls_feat) + cls_pred, reg_pred = output.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=1 + ) + else: + cls_pred = gfl_cls(cls_feat) + reg_pred = gfl_reg(reg_feat) + + cls_pred = cls_pred.sigmoid() + out = torch.cat([cls_pred, reg_pred], dim=1) + outputs.append(out.flatten(start_dim=2)) + return torch.cat(outputs, dim=2).permute(0, 2, 1) diff --git a/nanodet/model/head/nanodet_plus_head.py b/nanodet/model/head/nanodet_plus_head.py new file mode 100644 index 0000000..94bdf01 --- /dev/null +++ b/nanodet/model/head/nanodet_plus_head.py @@ -0,0 +1,518 @@ +import math + +import cv2 +import numpy as np +import torch +import torch.nn as nn + +from nanodet.util import bbox2distance, distance2bbox, multi_apply, overlay_bbox_cv + +from ...data.transform.warp import warp_boxes +from ..loss.gfocal_loss import DistributionFocalLoss, QualityFocalLoss +from ..loss.iou_loss import GIoULoss +from ..module.conv import ConvModule, DepthwiseConvModule +from ..module.init_weights import normal_init +from ..module.nms import multiclass_nms +from .assigner.dsl_assigner import DynamicSoftLabelAssigner +from .gfl_head import Integral, reduce_mean + + +class NanoDetPlusHead(nn.Module): + """Detection head used in NanoDet-Plus. + + Args: + num_classes (int): Number of categories excluding the background + category. + loss (dict): Loss config. + input_channel (int): Number of channels of the input feature. + feat_channels (int): Number of channels of the feature. + Default: 96. + stacked_convs (int): Number of conv layers in the stacked convs. + Default: 2. + kernel_size (int): Size of the convolving kernel. Default: 5. + strides (list[int]): Strides of input multi-level feature maps. + Default: [8, 16, 32]. + conv_type (str): Type of the convolution. + Default: "DWConv". + norm_cfg (dict): Dictionary to construct and config norm layer. + Default: dict(type='BN'). + reg_max (int): The maximal value of the discrete set. Default: 7. + activation (str): Type of activation function. Default: "LeakyReLU". + assigner_cfg (dict): Config dict of the assigner. Default: dict(topk=13). 
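+
+    For every prior the head predicts a vector of length
+    num_classes + 4 * (reg_max + 1): classification logits followed by the
+    discretized distance distribution that the Integral layer decodes into
+    box offsets.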
+ """ + + def __init__( + self, + num_classes, + loss, + input_channel, + feat_channels=96, + stacked_convs=2, + kernel_size=5, + strides=[8, 16, 32], + conv_type="DWConv", + norm_cfg=dict(type="BN"), + reg_max=7, + activation="LeakyReLU", + assigner_cfg=dict(topk=13), + **kwargs + ): + super(NanoDetPlusHead, self).__init__() + self.num_classes = num_classes + self.in_channels = input_channel + self.feat_channels = feat_channels + self.stacked_convs = stacked_convs + self.kernel_size = kernel_size + self.strides = strides + self.reg_max = reg_max + self.activation = activation + self.ConvModule = ConvModule if conv_type == "Conv" else DepthwiseConvModule + + self.loss_cfg = loss + self.norm_cfg = norm_cfg + + self.assigner = DynamicSoftLabelAssigner(**assigner_cfg) + self.distribution_project = Integral(self.reg_max) + + self.loss_qfl = QualityFocalLoss( + beta=self.loss_cfg.loss_qfl.beta, + loss_weight=self.loss_cfg.loss_qfl.loss_weight, + ) + self.loss_dfl = DistributionFocalLoss( + loss_weight=self.loss_cfg.loss_dfl.loss_weight + ) + self.loss_bbox = GIoULoss(loss_weight=self.loss_cfg.loss_bbox.loss_weight) + self._init_layers() + self.init_weights() + + def _init_layers(self): + self.cls_convs = nn.ModuleList() + for _ in self.strides: + cls_convs = self._buid_not_shared_head() + self.cls_convs.append(cls_convs) + + self.gfl_cls = nn.ModuleList( + [ + nn.Conv2d( + self.feat_channels, + self.num_classes + 4 * (self.reg_max + 1), + 1, + padding=0, + ) + for _ in self.strides + ] + ) + + def _buid_not_shared_head(self): + cls_convs = nn.ModuleList() + for i in range(self.stacked_convs): + chn = self.in_channels if i == 0 else self.feat_channels + cls_convs.append( + self.ConvModule( + chn, + self.feat_channels, + self.kernel_size, + stride=1, + padding=self.kernel_size // 2, + norm_cfg=self.norm_cfg, + bias=self.norm_cfg is None, + activation=self.activation, + ) + ) + return cls_convs + + def init_weights(self): + for m in self.cls_convs.modules(): + if isinstance(m, nn.Conv2d): + normal_init(m, std=0.01) + # init cls head with confidence = 0.01 + bias_cls = -4.595 + for i in range(len(self.strides)): + normal_init(self.gfl_cls[i], std=0.01, bias=bias_cls) + print("Finish initialize NanoDet-Plus Head.") + + def forward(self, feats): + if torch.onnx.is_in_onnx_export(): + return self._forward_onnx(feats) + outputs = [] + for feat, cls_convs, gfl_cls in zip( + feats, + self.cls_convs, + self.gfl_cls, + ): + for conv in cls_convs: + feat = conv(feat) + output = gfl_cls(feat) + outputs.append(output.flatten(start_dim=2)) + outputs = torch.cat(outputs, dim=2).permute(0, 2, 1) + return outputs + + def loss(self, preds, gt_meta, aux_preds=None): + """Compute losses. + Args: + preds (Tensor): Prediction output. + gt_meta (dict): Ground truth information. + aux_preds (tuple[Tensor], optional): Auxiliary head prediction output. + + Returns: + loss (Tensor): Loss tensor. + loss_states (dict): State dict of each loss. 
+ """ + gt_bboxes = gt_meta["gt_bboxes"] + gt_labels = gt_meta["gt_labels"] + device = preds.device + batch_size = preds.shape[0] + input_height, input_width = gt_meta["img"].shape[2:] + featmap_sizes = [ + (math.ceil(input_height / stride), math.ceil(input_width) / stride) + for stride in self.strides + ] + # get grid cells of one image + mlvl_center_priors = [ + self.get_single_level_center_priors( + batch_size, + featmap_sizes[i], + stride, + dtype=torch.float32, + device=device, + ) + for i, stride in enumerate(self.strides) + ] + center_priors = torch.cat(mlvl_center_priors, dim=1) + + cls_preds, reg_preds = preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + dis_preds = self.distribution_project(reg_preds) * center_priors[..., 2, None] + decoded_bboxes = distance2bbox(center_priors[..., :2], dis_preds) + + if aux_preds is not None: + # use auxiliary head to assign + aux_cls_preds, aux_reg_preds = aux_preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + aux_dis_preds = ( + self.distribution_project(aux_reg_preds) * center_priors[..., 2, None] + ) + aux_decoded_bboxes = distance2bbox(center_priors[..., :2], aux_dis_preds) + batch_assign_res = multi_apply( + self.target_assign_single_img, + aux_cls_preds.detach(), + center_priors, + aux_decoded_bboxes.detach(), + gt_bboxes, + gt_labels, + ) + else: + # use self prediction to assign + batch_assign_res = multi_apply( + self.target_assign_single_img, + cls_preds.detach(), + center_priors, + decoded_bboxes.detach(), + gt_bboxes, + gt_labels, + ) + + loss, loss_states = self._get_loss_from_assign( + cls_preds, reg_preds, decoded_bboxes, batch_assign_res + ) + + if aux_preds is not None: + aux_loss, aux_loss_states = self._get_loss_from_assign( + aux_cls_preds, aux_reg_preds, aux_decoded_bboxes, batch_assign_res + ) + loss = loss + aux_loss + for k, v in aux_loss_states.items(): + loss_states["aux_" + k] = v + return loss, loss_states + + def _get_loss_from_assign(self, cls_preds, reg_preds, decoded_bboxes, assign): + device = cls_preds.device + labels, label_scores, bbox_targets, dist_targets, num_pos = assign + num_total_samples = max( + reduce_mean(torch.tensor(sum(num_pos)).to(device)).item(), 1.0 + ) + + labels = torch.cat(labels, dim=0) + label_scores = torch.cat(label_scores, dim=0) + bbox_targets = torch.cat(bbox_targets, dim=0) + cls_preds = cls_preds.reshape(-1, self.num_classes) + reg_preds = reg_preds.reshape(-1, 4 * (self.reg_max + 1)) + decoded_bboxes = decoded_bboxes.reshape(-1, 4) + loss_qfl = self.loss_qfl( + cls_preds, (labels, label_scores), avg_factor=num_total_samples + ) + + pos_inds = torch.nonzero( + (labels >= 0) & (labels < self.num_classes), as_tuple=False + ).squeeze(1) + + if len(pos_inds) > 0: + weight_targets = cls_preds[pos_inds].detach().sigmoid().max(dim=1)[0] + bbox_avg_factor = max(reduce_mean(weight_targets.sum()).item(), 1.0) + + loss_bbox = self.loss_bbox( + decoded_bboxes[pos_inds], + bbox_targets[pos_inds], + weight=weight_targets, + avg_factor=bbox_avg_factor, + ) + + dist_targets = torch.cat(dist_targets, dim=0) + loss_dfl = self.loss_dfl( + reg_preds[pos_inds].reshape(-1, self.reg_max + 1), + dist_targets[pos_inds].reshape(-1), + weight=weight_targets[:, None].expand(-1, 4).reshape(-1), + avg_factor=4.0 * bbox_avg_factor, + ) + else: + loss_bbox = reg_preds.sum() * 0 + loss_dfl = reg_preds.sum() * 0 + + loss = loss_qfl + loss_bbox + loss_dfl + loss_states = dict(loss_qfl=loss_qfl, loss_bbox=loss_bbox, loss_dfl=loss_dfl) + return loss, loss_states + + 
@torch.no_grad() + def target_assign_single_img( + self, cls_preds, center_priors, decoded_bboxes, gt_bboxes, gt_labels + ): + """Compute classification, regression, and objectness targets for + priors in a single image. + Args: + cls_preds (Tensor): Classification predictions of one image, + a 2D-Tensor with shape [num_priors, num_classes] + center_priors (Tensor): All priors of one image, a 2D-Tensor with + shape [num_priors, 4] in [cx, xy, stride_w, stride_y] format. + decoded_bboxes (Tensor): Decoded bboxes predictions of one image, + a 2D-Tensor with shape [num_priors, 4] in [tl_x, tl_y, + br_x, br_y] format. + gt_bboxes (Tensor): Ground truth bboxes of one image, a 2D-Tensor + with shape [num_gts, 4] in [tl_x, tl_y, br_x, br_y] format. + gt_labels (Tensor): Ground truth labels of one image, a Tensor + with shape [num_gts]. + """ + + num_priors = center_priors.size(0) + device = center_priors.device + gt_bboxes = torch.from_numpy(gt_bboxes).to(device) + gt_labels = torch.from_numpy(gt_labels).to(device) + num_gts = gt_labels.size(0) + gt_bboxes = gt_bboxes.to(decoded_bboxes.dtype) + + bbox_targets = torch.zeros_like(center_priors) + dist_targets = torch.zeros_like(center_priors) + labels = center_priors.new_full( + (num_priors,), self.num_classes, dtype=torch.long + ) + label_scores = center_priors.new_zeros(labels.shape, dtype=torch.float) + # No target + if num_gts == 0: + return labels, label_scores, bbox_targets, dist_targets, 0 + + assign_result = self.assigner.assign( + cls_preds.sigmoid(), center_priors, decoded_bboxes, gt_bboxes, gt_labels + ) + pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds = self.sample( + assign_result, gt_bboxes + ) + num_pos_per_img = pos_inds.size(0) + pos_ious = assign_result.max_overlaps[pos_inds] + + if len(pos_inds) > 0: + bbox_targets[pos_inds, :] = pos_gt_bboxes + dist_targets[pos_inds, :] = ( + bbox2distance(center_priors[pos_inds, :2], pos_gt_bboxes) + / center_priors[pos_inds, None, 2] + ) + dist_targets = dist_targets.clamp(min=0, max=self.reg_max - 0.1) + labels[pos_inds] = gt_labels[pos_assigned_gt_inds] + label_scores[pos_inds] = pos_ious + return ( + labels, + label_scores, + bbox_targets, + dist_targets, + num_pos_per_img, + ) + + def sample(self, assign_result, gt_bboxes): + """Sample positive and negative bboxes.""" + pos_inds = ( + torch.nonzero(assign_result.gt_inds > 0, as_tuple=False) + .squeeze(-1) + .unique() + ) + neg_inds = ( + torch.nonzero(assign_result.gt_inds == 0, as_tuple=False) + .squeeze(-1) + .unique() + ) + pos_assigned_gt_inds = assign_result.gt_inds[pos_inds] - 1 + + if gt_bboxes.numel() == 0: + # hack for index error case + assert pos_assigned_gt_inds.numel() == 0 + pos_gt_bboxes = torch.empty_like(gt_bboxes).view(-1, 4) + else: + if len(gt_bboxes.shape) < 2: + gt_bboxes = gt_bboxes.view(-1, 4) + pos_gt_bboxes = gt_bboxes[pos_assigned_gt_inds, :] + return pos_inds, neg_inds, pos_gt_bboxes, pos_assigned_gt_inds + + def post_process(self, preds, meta): + """Prediction results post processing. Decode bboxes and rescale + to original image size. + Args: + preds (Tensor): Prediction output. + meta (dict): Meta info. 
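+
+        Returns:
+            det_results (dict): Mapping from image id to a per-class dict of
+                detections, each given as [x1, y1, x2, y2, score] in the
+                original image coordinates.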
+ """ + cls_scores, bbox_preds = preds.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=-1 + ) + result_list = self.get_bboxes(cls_scores, bbox_preds, meta) + det_results = {} + warp_matrixes = ( + meta["warp_matrix"] + if isinstance(meta["warp_matrix"], list) + else meta["warp_matrix"] + ) + img_heights = ( + meta["img_info"]["height"].cpu().numpy() + if isinstance(meta["img_info"]["height"], torch.Tensor) + else meta["img_info"]["height"] + ) + img_widths = ( + meta["img_info"]["width"].cpu().numpy() + if isinstance(meta["img_info"]["width"], torch.Tensor) + else meta["img_info"]["width"] + ) + img_ids = ( + meta["img_info"]["id"].cpu().numpy() + if isinstance(meta["img_info"]["id"], torch.Tensor) + else meta["img_info"]["id"] + ) + + for result, img_width, img_height, img_id, warp_matrix in zip( + result_list, img_widths, img_heights, img_ids, warp_matrixes + ): + det_result = {} + det_bboxes, det_labels = result + det_bboxes = det_bboxes.detach().cpu().numpy() + det_bboxes[:, :4] = warp_boxes( + det_bboxes[:, :4], np.linalg.inv(warp_matrix), img_width, img_height + ) + classes = det_labels.detach().cpu().numpy() + for i in range(self.num_classes): + inds = classes == i + det_result[i] = np.concatenate( + [ + det_bboxes[inds, :4].astype(np.float32), + det_bboxes[inds, 4:5].astype(np.float32), + ], + axis=1, + ).tolist() + det_results[img_id] = det_result + return det_results + + def show_result( + self, img, dets, class_names, score_thres=0.3, show=True, save_path=None + ): + result, all_box = overlay_bbox_cv(img, dets, class_names, score_thresh=score_thres) + # if show: + # cv2.imshow("det", result) + return result,all_box + + def get_bboxes(self, cls_preds, reg_preds, img_metas): + """Decode the outputs to bboxes. + Args: + cls_preds (Tensor): Shape (num_imgs, num_points, num_classes). + reg_preds (Tensor): Shape (num_imgs, num_points, 4 * (regmax + 1)). + img_metas (dict): Dict of image info. + + Returns: + results_list (list[tuple]): List of detection bboxes and labels. + """ + device = cls_preds.device + b = cls_preds.shape[0] + input_height, input_width = img_metas["img"].shape[2:] + input_shape = (input_height, input_width) + + featmap_sizes = [ + (math.ceil(input_height / stride), math.ceil(input_width) / stride) + for stride in self.strides + ] + # get grid cells of one image + mlvl_center_priors = [ + self.get_single_level_center_priors( + b, + featmap_sizes[i], + stride, + dtype=torch.float32, + device=device, + ) + for i, stride in enumerate(self.strides) + ] + center_priors = torch.cat(mlvl_center_priors, dim=1) + dis_preds = self.distribution_project(reg_preds) * center_priors[..., 2, None] + bboxes = distance2bbox(center_priors[..., :2], dis_preds, max_shape=input_shape) + scores = cls_preds.sigmoid() + result_list = [] + for i in range(b): + # add a dummy background class at the end of all labels + # same with mmdetection2.0 + score, bbox = scores[i], bboxes[i] + padding = score.new_zeros(score.shape[0], 1) + score = torch.cat([score, padding], dim=1) + results = multiclass_nms( + bbox, + score, + score_thr=0.05, + nms_cfg=dict(type="nms", iou_threshold=0.6), + max_num=100, + ) + result_list.append(results) + return result_list + + def get_single_level_center_priors( + self, batch_size, featmap_size, stride, dtype, device + ): + """Generate centers of a single stage feature map. + Args: + batch_size (int): Number of images in one batch. 
+ featmap_size (tuple[int]): height and width of the feature map + stride (int): down sample stride of the feature map + dtype (obj:`torch.dtype`): data type of the tensors + device (obj:`torch.device`): device of the tensors + Return: + priors (Tensor): center priors of a single level feature map. + """ + h, w = featmap_size + x_range = (torch.arange(w, dtype=dtype, device=device)) * stride + y_range = (torch.arange(h, dtype=dtype, device=device)) * stride + y, x = torch.meshgrid(y_range, x_range) + y = y.flatten() + x = x.flatten() + strides = x.new_full((x.shape[0],), stride) + proiors = torch.stack([x, y, strides, strides], dim=-1) + return proiors.unsqueeze(0).repeat(batch_size, 1, 1) + + def _forward_onnx(self, feats): + """only used for onnx export""" + outputs = [] + for feat, cls_convs, gfl_cls in zip( + feats, + self.cls_convs, + self.gfl_cls, + ): + for conv in cls_convs: + feat = conv(feat) + output = gfl_cls(feat) + cls_pred, reg_pred = output.split( + [self.num_classes, 4 * (self.reg_max + 1)], dim=1 + ) + cls_pred = cls_pred.sigmoid() + out = torch.cat([cls_pred, reg_pred], dim=1) + outputs.append(out.flatten(start_dim=2)) + return torch.cat(outputs, dim=2).permute(0, 2, 1) diff --git a/nanodet/model/head/simple_conv_head.py b/nanodet/model/head/simple_conv_head.py new file mode 100644 index 0000000..cece6d8 --- /dev/null +++ b/nanodet/model/head/simple_conv_head.py @@ -0,0 +1,100 @@ +import torch +import torch.nn as nn + +from ..module.conv import ConvModule +from ..module.init_weights import normal_init +from ..module.scale import Scale + + +class SimpleConvHead(nn.Module): + def __init__( + self, + num_classes, + input_channel, + feat_channels=256, + stacked_convs=4, + strides=[8, 16, 32], + conv_cfg=None, + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + activation="LeakyReLU", + reg_max=16, + **kwargs + ): + super(SimpleConvHead, self).__init__() + self.num_classes = num_classes + self.in_channels = input_channel + self.feat_channels = feat_channels + self.stacked_convs = stacked_convs + self.strides = strides + self.reg_max = reg_max + + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.activation = activation + self.cls_out_channels = num_classes + + self._init_layers() + self.init_weights() + + def _init_layers(self): + self.relu = nn.ReLU(inplace=True) + self.cls_convs = nn.ModuleList() + self.reg_convs = nn.ModuleList() + for i in range(self.stacked_convs): + chn = self.in_channels if i == 0 else self.feat_channels + self.cls_convs.append( + ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + activation=self.activation, + ) + ) + self.reg_convs.append( + ConvModule( + chn, + self.feat_channels, + 3, + stride=1, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + activation=self.activation, + ) + ) + self.gfl_cls = nn.Conv2d( + self.feat_channels, self.cls_out_channels, 3, padding=1 + ) + self.gfl_reg = nn.Conv2d( + self.feat_channels, 4 * (self.reg_max + 1), 3, padding=1 + ) + self.scales = nn.ModuleList([Scale(1.0) for _ in self.strides]) + + def init_weights(self): + for m in self.cls_convs: + normal_init(m.conv, std=0.01) + for m in self.reg_convs: + normal_init(m.conv, std=0.01) + bias_cls = -4.595 + normal_init(self.gfl_cls, std=0.01, bias=bias_cls) + normal_init(self.gfl_reg, std=0.01) + + def forward(self, feats): + outputs = [] + for x, scale in zip(feats, self.scales): + cls_feat = x + reg_feat = x + for cls_conv in self.cls_convs: + cls_feat = 
cls_conv(cls_feat) + for reg_conv in self.reg_convs: + reg_feat = reg_conv(reg_feat) + cls_score = self.gfl_cls(cls_feat) + bbox_pred = scale(self.gfl_reg(reg_feat)).float() + output = torch.cat([cls_score, bbox_pred], dim=1) + outputs.append(output.flatten(start_dim=2)) + outputs = torch.cat(outputs, dim=2).permute(0, 2, 1) + return outputs diff --git a/nanodet/model/loss/gfocal_loss.py b/nanodet/model/loss/gfocal_loss.py new file mode 100644 index 0000000..6759e93 --- /dev/null +++ b/nanodet/model/loss/gfocal_loss.py @@ -0,0 +1,180 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F + +from .utils import weighted_loss + + +@weighted_loss +def quality_focal_loss(pred, target, beta=2.0): + r"""Quality Focal Loss (QFL) is from `Generalized Focal Loss: Learning + Qualified and Distributed Bounding Boxes for Dense Object Detection + `_. + + Args: + pred (torch.Tensor): Predicted joint representation of classification + and quality (IoU) estimation with shape (N, C), C is the number of + classes. + target (tuple([torch.Tensor])): Target category label with shape (N,) + and target quality label with shape (N,). + beta (float): The beta parameter for calculating the modulating factor. + Defaults to 2.0. + + Returns: + torch.Tensor: Loss tensor with shape (N,). + """ + assert ( + len(target) == 2 + ), """target for QFL must be a tuple of two elements, + including category label and quality label, respectively""" + # label denotes the category id, score denotes the quality score + label, score = target + + # negatives are supervised by 0 quality score + pred_sigmoid = pred.sigmoid() + scale_factor = pred_sigmoid + zerolabel = scale_factor.new_zeros(pred.shape) + loss = F.binary_cross_entropy_with_logits( + pred, zerolabel, reduction="none" + ) * scale_factor.pow(beta) + + # FG cat_id: [0, num_classes -1], BG cat_id: num_classes + bg_class_ind = pred.size(1) + pos = torch.nonzero((label >= 0) & (label < bg_class_ind), as_tuple=False).squeeze( + 1 + ) + pos_label = label[pos].long() + # positives are supervised by bbox quality (IoU) score + scale_factor = score[pos] - pred_sigmoid[pos, pos_label] + loss[pos, pos_label] = F.binary_cross_entropy_with_logits( + pred[pos, pos_label], score[pos], reduction="none" + ) * scale_factor.abs().pow(beta) + + loss = loss.sum(dim=1, keepdim=False) + return loss + + +@weighted_loss +def distribution_focal_loss(pred, label): + r"""Distribution Focal Loss (DFL) is from `Generalized Focal Loss: Learning + Qualified and Distributed Bounding Boxes for Dense Object Detection + `_. + + Args: + pred (torch.Tensor): Predicted general distribution of bounding boxes + (before softmax) with shape (N, n+1), n is the max value of the + integral set `{0, ..., n}` in paper. + label (torch.Tensor): Target distance label for bounding boxes with + shape (N,). + + Returns: + torch.Tensor: Loss tensor with shape (N,). + """ + dis_left = label.long() + dis_right = dis_left + 1 + weight_left = dis_right.float() - label + weight_right = label - dis_left.float() + loss = ( + F.cross_entropy(pred, dis_left, reduction="none") * weight_left + + F.cross_entropy(pred, dis_right, reduction="none") * weight_right + ) + return loss + + +class QualityFocalLoss(nn.Module): + r"""Quality Focal Loss (QFL) is a variant of `Generalized Focal Loss: + Learning Qualified and Distributed Bounding Boxes for Dense Object + Detection `_. + + Args: + use_sigmoid (bool): Whether sigmoid operation is conducted in QFL. + Defaults to True. 
+ beta (float): The beta parameter for calculating the modulating factor. + Defaults to 2.0. + reduction (str): Options are "none", "mean" and "sum". + loss_weight (float): Loss weight of current loss. + """ + + def __init__(self, use_sigmoid=True, beta=2.0, reduction="mean", loss_weight=1.0): + super(QualityFocalLoss, self).__init__() + assert use_sigmoid is True, "Only sigmoid in QFL supported now." + self.use_sigmoid = use_sigmoid + self.beta = beta + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, pred, target, weight=None, avg_factor=None, reduction_override=None + ): + """Forward function. + + Args: + pred (torch.Tensor): Predicted joint representation of + classification and quality (IoU) estimation with shape (N, C), + C is the number of classes. + target (tuple([torch.Tensor])): Target category label with shape + (N,) and target quality label with shape (N,). + weight (torch.Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. + """ + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + if self.use_sigmoid: + loss_cls = self.loss_weight * quality_focal_loss( + pred, + target, + weight, + beta=self.beta, + reduction=reduction, + avg_factor=avg_factor, + ) + else: + raise NotImplementedError + return loss_cls + + +class DistributionFocalLoss(nn.Module): + r"""Distribution Focal Loss (DFL) is a variant of `Generalized Focal Loss: + Learning Qualified and Distributed Bounding Boxes for Dense Object + Detection `_. + + Args: + reduction (str): Options are `'none'`, `'mean'` and `'sum'`. + loss_weight (float): Loss weight of current loss. + """ + + def __init__(self, reduction="mean", loss_weight=1.0): + super(DistributionFocalLoss, self).__init__() + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, pred, target, weight=None, avg_factor=None, reduction_override=None + ): + """Forward function. + + Args: + pred (torch.Tensor): Predicted general distribution of bounding + boxes (before softmax) with shape (N, n+1), n is the max value + of the integral set `{0, ..., n}` in paper. + target (torch.Tensor): Target distance label for bounding boxes + with shape (N,). + weight (torch.Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. + """ + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + loss_cls = self.loss_weight * distribution_focal_loss( + pred, target, weight, reduction=reduction, avg_factor=avg_factor + ) + return loss_cls diff --git a/nanodet/model/loss/iou_loss.py b/nanodet/model/loss/iou_loss.py new file mode 100644 index 0000000..f1f3e26 --- /dev/null +++ b/nanodet/model/loss/iou_loss.py @@ -0,0 +1,548 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math + +import torch +import torch.nn as nn + +from .utils import weighted_loss + + +def bbox_overlaps(bboxes1, bboxes2, mode="iou", is_aligned=False, eps=1e-6): + """Calculate overlap between two set of bboxes. + + If ``is_aligned `` is ``False``, then calculate the overlaps between each + bbox of bboxes1 and bboxes2, otherwise the overlaps between each aligned + pair of bboxes1 and bboxes2. + + Args: + bboxes1 (Tensor): shape (B, m, 4) in format or empty. + bboxes2 (Tensor): shape (B, n, 4) in format or empty. + B indicates the batch dim, in shape (B1, B2, ..., Bn). + If ``is_aligned `` is ``True``, then m and n must be equal. + mode (str): "iou" (intersection over union) or "iof" (intersection over + foreground). + is_aligned (bool, optional): If True, then m and n must be equal. + Default False. + eps (float, optional): A value added to the denominator for numerical + stability. Default 1e-6. + + Returns: + Tensor: shape (m, n) if ``is_aligned `` is False else shape (m,) + + Example: + >>> bboxes1 = torch.FloatTensor([ + >>> [0, 0, 10, 10], + >>> [10, 10, 20, 20], + >>> [32, 32, 38, 42], + >>> ]) + >>> bboxes2 = torch.FloatTensor([ + >>> [0, 0, 10, 20], + >>> [0, 10, 10, 19], + >>> [10, 10, 20, 20], + >>> ]) + >>> bbox_overlaps(bboxes1, bboxes2) + tensor([[0.5000, 0.0000, 0.0000], + [0.0000, 0.0000, 1.0000], + [0.0000, 0.0000, 0.0000]]) + >>> bbox_overlaps(bboxes1, bboxes2, mode='giou', eps=1e-7) + tensor([[0.5000, 0.0000, -0.5000], + [-0.2500, -0.0500, 1.0000], + [-0.8371, -0.8766, -0.8214]]) + + Example: + >>> empty = torch.FloatTensor([]) + >>> nonempty = torch.FloatTensor([ + >>> [0, 0, 10, 9], + >>> ]) + >>> assert tuple(bbox_overlaps(empty, nonempty).shape) == (0, 1) + >>> assert tuple(bbox_overlaps(nonempty, empty).shape) == (1, 0) + >>> assert tuple(bbox_overlaps(empty, empty).shape) == (0, 0) + """ + + assert mode in ["iou", "iof", "giou"], f"Unsupported mode {mode}" + # Either the boxes are empty or the length of boxes's last dimenstion is 4 + assert bboxes1.size(-1) == 4 or bboxes1.size(0) == 0 + assert bboxes2.size(-1) == 4 or bboxes2.size(0) == 0 + + # Batch dim must be the same + # Batch dim: (B1, B2, ... 
Bn) + assert bboxes1.shape[:-2] == bboxes2.shape[:-2] + batch_shape = bboxes1.shape[:-2] + + rows = bboxes1.size(-2) + cols = bboxes2.size(-2) + if is_aligned: + assert rows == cols + + if rows * cols == 0: + if is_aligned: + return bboxes1.new(batch_shape + (rows,)) + else: + return bboxes1.new(batch_shape + (rows, cols)) + + area1 = (bboxes1[..., 2] - bboxes1[..., 0]) * (bboxes1[..., 3] - bboxes1[..., 1]) + area2 = (bboxes2[..., 2] - bboxes2[..., 0]) * (bboxes2[..., 3] - bboxes2[..., 1]) + + if is_aligned: + lt = torch.max(bboxes1[..., :2], bboxes2[..., :2]) # [B, rows, 2] + rb = torch.min(bboxes1[..., 2:], bboxes2[..., 2:]) # [B, rows, 2] + + wh = (rb - lt).clamp(min=0) # [B, rows, 2] + overlap = wh[..., 0] * wh[..., 1] + + if mode in ["iou", "giou"]: + union = area1 + area2 - overlap + else: + union = area1 + if mode == "giou": + enclosed_lt = torch.min(bboxes1[..., :2], bboxes2[..., :2]) + enclosed_rb = torch.max(bboxes1[..., 2:], bboxes2[..., 2:]) + else: + lt = torch.max( + bboxes1[..., :, None, :2], bboxes2[..., None, :, :2] + ) # [B, rows, cols, 2] + rb = torch.min( + bboxes1[..., :, None, 2:], bboxes2[..., None, :, 2:] + ) # [B, rows, cols, 2] + + wh = (rb - lt).clamp(min=0) # [B, rows, cols, 2] + overlap = wh[..., 0] * wh[..., 1] + + if mode in ["iou", "giou"]: + union = area1[..., None] + area2[..., None, :] - overlap + else: + union = area1[..., None] + if mode == "giou": + enclosed_lt = torch.min( + bboxes1[..., :, None, :2], bboxes2[..., None, :, :2] + ) + enclosed_rb = torch.max( + bboxes1[..., :, None, 2:], bboxes2[..., None, :, 2:] + ) + + eps = union.new_tensor([eps]) + union = torch.max(union, eps) + ious = overlap / union + if mode in ["iou", "iof"]: + return ious + # calculate gious + enclose_wh = (enclosed_rb - enclosed_lt).clamp(min=0) + enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1] + enclose_area = torch.max(enclose_area, eps) + gious = ious - (enclose_area - union) / enclose_area + return gious + + +@weighted_loss +def iou_loss(pred, target, eps=1e-6): + """IoU loss. + + Computing the IoU loss between a set of predicted bboxes and target bboxes. + The loss is calculated as negative log of IoU. + + Args: + pred (torch.Tensor): Predicted bboxes of format (x1, y1, x2, y2), + shape (n, 4). + target (torch.Tensor): Corresponding gt bboxes, shape (n, 4). + eps (float): Eps to avoid log(0). + + Return: + torch.Tensor: Loss tensor. + """ + ious = bbox_overlaps(pred, target, is_aligned=True).clamp(min=eps) + loss = -ious.log() + return loss + + +@weighted_loss +def bounded_iou_loss(pred, target, beta=0.2, eps=1e-3): + """BIoULoss. + + This is an implementation of paper + `Improving Object Localization with Fitness NMS and Bounded IoU Loss. + `_. + + Args: + pred (torch.Tensor): Predicted bboxes. + target (torch.Tensor): Target bboxes. + beta (float): beta parameter in smoothl1. + eps (float): eps to avoid NaN. 
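+
+    Return:
+        Tensor: Loss tensor.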
+ """ + pred_ctrx = (pred[:, 0] + pred[:, 2]) * 0.5 + pred_ctry = (pred[:, 1] + pred[:, 3]) * 0.5 + pred_w = pred[:, 2] - pred[:, 0] + pred_h = pred[:, 3] - pred[:, 1] + with torch.no_grad(): + target_ctrx = (target[:, 0] + target[:, 2]) * 0.5 + target_ctry = (target[:, 1] + target[:, 3]) * 0.5 + target_w = target[:, 2] - target[:, 0] + target_h = target[:, 3] - target[:, 1] + + dx = target_ctrx - pred_ctrx + dy = target_ctry - pred_ctry + + loss_dx = 1 - torch.max( + (target_w - 2 * dx.abs()) / (target_w + 2 * dx.abs() + eps), + torch.zeros_like(dx), + ) + loss_dy = 1 - torch.max( + (target_h - 2 * dy.abs()) / (target_h + 2 * dy.abs() + eps), + torch.zeros_like(dy), + ) + loss_dw = 1 - torch.min(target_w / (pred_w + eps), pred_w / (target_w + eps)) + loss_dh = 1 - torch.min(target_h / (pred_h + eps), pred_h / (target_h + eps)) + loss_comb = torch.stack([loss_dx, loss_dy, loss_dw, loss_dh], dim=-1).view( + loss_dx.size(0), -1 + ) + + loss = torch.where( + loss_comb < beta, 0.5 * loss_comb * loss_comb / beta, loss_comb - 0.5 * beta + ).sum(dim=-1) + return loss + + +@weighted_loss +def giou_loss(pred, target, eps=1e-7): + r"""`Generalized Intersection over Union: A Metric and A Loss for Bounding + Box Regression `_. + + Args: + pred (torch.Tensor): Predicted bboxes of format (x1, y1, x2, y2), + shape (n, 4). + target (torch.Tensor): Corresponding gt bboxes, shape (n, 4). + eps (float): Eps to avoid log(0). + + Return: + Tensor: Loss tensor. + """ + gious = bbox_overlaps(pred, target, mode="giou", is_aligned=True, eps=eps) + loss = 1 - gious + return loss + + +@weighted_loss +def diou_loss(pred, target, eps=1e-7): + r"""`Implementation of Distance-IoU Loss: Faster and Better + Learning for Bounding Box Regression, https://arxiv.org/abs/1911.08287`_. + + Code is modified from https://github.com/Zzh-tju/DIoU. + + Args: + pred (Tensor): Predicted bboxes of format (x1, y1, x2, y2), + shape (n, 4). + target (Tensor): Corresponding gt bboxes, shape (n, 4). + eps (float): Eps to avoid log(0). + Return: + Tensor: Loss tensor. + """ + # overlap + lt = torch.max(pred[:, :2], target[:, :2]) + rb = torch.min(pred[:, 2:], target[:, 2:]) + wh = (rb - lt).clamp(min=0) + overlap = wh[:, 0] * wh[:, 1] + + # union + ap = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1]) + ag = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1]) + union = ap + ag - overlap + eps + + # IoU + ious = overlap / union + + # enclose area + enclose_x1y1 = torch.min(pred[:, :2], target[:, :2]) + enclose_x2y2 = torch.max(pred[:, 2:], target[:, 2:]) + enclose_wh = (enclose_x2y2 - enclose_x1y1).clamp(min=0) + + cw = enclose_wh[:, 0] + ch = enclose_wh[:, 1] + + c2 = cw**2 + ch**2 + eps + + b1_x1, b1_y1 = pred[:, 0], pred[:, 1] + b1_x2, b1_y2 = pred[:, 2], pred[:, 3] + b2_x1, b2_y1 = target[:, 0], target[:, 1] + b2_x2, b2_y2 = target[:, 2], target[:, 3] + + left = ((b2_x1 + b2_x2) - (b1_x1 + b1_x2)) ** 2 / 4 + right = ((b2_y1 + b2_y2) - (b1_y1 + b1_y2)) ** 2 / 4 + rho2 = left + right + + # DIoU + dious = ious - rho2 / c2 + loss = 1 - dious + return loss + + +@weighted_loss +def ciou_loss(pred, target, eps=1e-7): + r"""`Implementation of paper `Enhancing Geometric Factors into + Model Learning and Inference for Object Detection and Instance + Segmentation `_. + + Code is modified from https://github.com/Zzh-tju/CIoU. + + Args: + pred (Tensor): Predicted bboxes of format (x1, y1, x2, y2), + shape (n, 4). + target (Tensor): Corresponding gt bboxes, shape (n, 4). + eps (float): Eps to avoid log(0). 
+ Return: + Tensor: Loss tensor. + """ + # overlap + lt = torch.max(pred[:, :2], target[:, :2]) + rb = torch.min(pred[:, 2:], target[:, 2:]) + wh = (rb - lt).clamp(min=0) + overlap = wh[:, 0] * wh[:, 1] + + # union + ap = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1]) + ag = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1]) + union = ap + ag - overlap + eps + + # IoU + ious = overlap / union + + # enclose area + enclose_x1y1 = torch.min(pred[:, :2], target[:, :2]) + enclose_x2y2 = torch.max(pred[:, 2:], target[:, 2:]) + enclose_wh = (enclose_x2y2 - enclose_x1y1).clamp(min=0) + + cw = enclose_wh[:, 0] + ch = enclose_wh[:, 1] + + c2 = cw**2 + ch**2 + eps + + b1_x1, b1_y1 = pred[:, 0], pred[:, 1] + b1_x2, b1_y2 = pred[:, 2], pred[:, 3] + b2_x1, b2_y1 = target[:, 0], target[:, 1] + b2_x2, b2_y2 = target[:, 2], target[:, 3] + + w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps + w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps + + left = ((b2_x1 + b2_x2) - (b1_x1 + b1_x2)) ** 2 / 4 + right = ((b2_y1 + b2_y2) - (b1_y1 + b1_y2)) ** 2 / 4 + rho2 = left + right + + factor = 4 / math.pi**2 + v = factor * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2) + + # CIoU + cious = ious - (rho2 / c2 + v**2 / (1 - ious + v)) + loss = 1 - cious + return loss + + +class IoULoss(nn.Module): + """IoULoss. + + Computing the IoU loss between a set of predicted bboxes and target bboxes. + + Args: + eps (float): Eps to avoid log(0). + reduction (str): Options are "none", "mean" and "sum". + loss_weight (float): Weight of loss. + """ + + def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): + super(IoULoss, self).__init__() + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + """Forward function. + + Args: + pred (torch.Tensor): The prediction. + target (torch.Tensor): The learning target of the prediction. + weight (torch.Tensor, optional): The weight of loss for each + prediction. Defaults to None. + avg_factor (int, optional): Average factor that is used to average + the loss. Defaults to None. + reduction_override (str, optional): The reduction method used to + override the original reduction method of the loss. + Defaults to None. Options are "none", "mean" and "sum". 
+ """ + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + if ( + (weight is not None) + and (not torch.any(weight > 0)) + and (reduction != "none") + ): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + loss = self.loss_weight * iou_loss( + pred, + target, + weight, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss + + +class BoundedIoULoss(nn.Module): + def __init__(self, beta=0.2, eps=1e-3, reduction="mean", loss_weight=1.0): + super(BoundedIoULoss, self).__init__() + self.beta = beta + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + if weight is not None and not torch.any(weight > 0): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + loss = self.loss_weight * bounded_iou_loss( + pred, + target, + weight, + beta=self.beta, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss + + +class GIoULoss(nn.Module): + def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): + super(GIoULoss, self).__init__() + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + if weight is not None and not torch.any(weight > 0): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + loss = self.loss_weight * giou_loss( + pred, + target, + weight, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss + + +class DIoULoss(nn.Module): + def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): + super(DIoULoss, self).__init__() + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + if weight is not None and not torch.any(weight > 0): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else self.reduction + loss = self.loss_weight * diou_loss( + pred, + target, + weight, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss + + +class CIoULoss(nn.Module): + def __init__(self, eps=1e-6, reduction="mean", loss_weight=1.0): + super(CIoULoss, self).__init__() + self.eps = eps + self.reduction = reduction + self.loss_weight = loss_weight + + def forward( + self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None, + **kwargs, + ): + if weight is not None and not torch.any(weight > 0): + if pred.dim() == weight.dim() + 1: + weight = weight.unsqueeze(1) + return (pred * weight).sum() # 0 + assert reduction_override in (None, "none", "mean", "sum") + reduction = reduction_override if reduction_override else 
self.reduction + loss = self.loss_weight * ciou_loss( + pred, + target, + weight, + eps=self.eps, + reduction=reduction, + avg_factor=avg_factor, + **kwargs, + ) + return loss diff --git a/nanodet/model/loss/utils.py b/nanodet/model/loss/utils.py new file mode 100644 index 0000000..f8bae7d --- /dev/null +++ b/nanodet/model/loss/utils.py @@ -0,0 +1,93 @@ +import functools + +import torch.nn.functional as F + + +def reduce_loss(loss, reduction): + """Reduce loss as specified. + + Args: + loss (Tensor): Elementwise loss tensor. + reduction (str): Options are "none", "mean" and "sum". + + Return: + Tensor: Reduced loss tensor. + """ + reduction_enum = F._Reduction.get_enum(reduction) + # none: 0, elementwise_mean:1, sum: 2 + if reduction_enum == 0: + return loss + elif reduction_enum == 1: + return loss.mean() + elif reduction_enum == 2: + return loss.sum() + + +def weight_reduce_loss(loss, weight=None, reduction="mean", avg_factor=None): + """Apply element-wise weight and reduce loss. + + Args: + loss (Tensor): Element-wise loss. + weight (Tensor): Element-wise weights. + reduction (str): Same as built-in losses of PyTorch. + avg_factor (float): Avarage factor when computing the mean of losses. + + Returns: + Tensor: Processed loss values. + """ + # if weight is specified, apply element-wise weight + if weight is not None: + loss = loss * weight + + # if avg_factor is not specified, just reduce the loss + if avg_factor is None: + loss = reduce_loss(loss, reduction) + else: + # if reduction is mean, then average the loss by avg_factor + if reduction == "mean": + loss = loss.sum() / avg_factor + # if reduction is 'none', then do nothing, otherwise raise an error + elif reduction != "none": + raise ValueError('avg_factor can not be used with reduction="sum"') + return loss + + +def weighted_loss(loss_func): + """Create a weighted version of a given loss function. + + To use this decorator, the loss function must have the signature like + `loss_func(pred, target, **kwargs)`. The function only needs to compute + element-wise loss without any reduction. This decorator will add weight + and reduction arguments to the function. The decorated function will have + the signature like `loss_func(pred, target, weight=None, reduction='mean', + avg_factor=None, **kwargs)`. + + :Example: + + >>> import torch + >>> @weighted_loss + >>> def l1_loss(pred, target): + >>> return (pred - target).abs() + + >>> pred = torch.Tensor([0, 2, 3]) + >>> target = torch.Tensor([1, 1, 1]) + >>> weight = torch.Tensor([1, 0, 1]) + + >>> l1_loss(pred, target) + tensor(1.3333) + >>> l1_loss(pred, target, weight) + tensor(1.) + >>> l1_loss(pred, target, reduction='none') + tensor([1., 1., 2.]) + >>> l1_loss(pred, target, weight, avg_factor=2) + tensor(1.5000) + """ + + @functools.wraps(loss_func) + def wrapper(pred, target, weight=None, reduction="mean", avg_factor=None, **kwargs): + # get element-wise loss + loss = loss_func(pred, target, **kwargs) + loss = weight_reduce_loss(loss, weight, reduction, avg_factor) + return loss + + return wrapper diff --git a/nanodet/model/module/activation.py b/nanodet/model/module/activation.py new file mode 100644 index 0000000..8047fc8 --- /dev/null +++ b/nanodet/model/module/activation.py @@ -0,0 +1,41 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.nn as nn + +activations = { + "ReLU": nn.ReLU, + "LeakyReLU": nn.LeakyReLU, + "ReLU6": nn.ReLU6, + "SELU": nn.SELU, + "ELU": nn.ELU, + "GELU": nn.GELU, + "PReLU": nn.PReLU, + "SiLU": nn.SiLU, + "HardSwish": nn.Hardswish, + "Hardswish": nn.Hardswish, + None: nn.Identity, +} + + +def act_layers(name): + assert name in activations.keys() + if name == "LeakyReLU": + return nn.LeakyReLU(negative_slope=0.1, inplace=True) + elif name == "GELU": + return nn.GELU() + elif name == "PReLU": + return nn.PReLU() + else: + return activations[name](inplace=True) diff --git a/nanodet/model/module/conv.py b/nanodet/model/module/conv.py new file mode 100644 index 0000000..f35f0b6 --- /dev/null +++ b/nanodet/model/module/conv.py @@ -0,0 +1,392 @@ +""" +ConvModule refers from MMDetection +RepVGGConvModule refers from RepVGG: Making VGG-style ConvNets Great Again +""" +import warnings + +import numpy as np +import torch +import torch.nn as nn + +from .activation import act_layers +from .init_weights import constant_init, kaiming_init +from .norm import build_norm_layer + + +class ConvModule(nn.Module): + """A conv block that contains conv/norm/activation layers. + + Args: + in_channels (int): Same as nn.Conv2d. + out_channels (int): Same as nn.Conv2d. + kernel_size (int or tuple[int]): Same as nn.Conv2d. + stride (int or tuple[int]): Same as nn.Conv2d. + padding (int or tuple[int]): Same as nn.Conv2d. + dilation (int or tuple[int]): Same as nn.Conv2d. + groups (int): Same as nn.Conv2d. + bias (bool or str): If specified as `auto`, it will be decided by the + norm_cfg. Bias will be set as True if norm_cfg is None, otherwise + False. + conv_cfg (dict): Config dict for convolution layer. + norm_cfg (dict): Config dict for normalization layer. + activation (str): activation layer, "ReLU" by default. + inplace (bool): Whether to use inplace mode for activation. + order (tuple[str]): The order of conv/norm/activation layers. It is a + sequence of "conv", "norm" and "act". Examples are + ("conv", "norm", "act") and ("act", "conv", "norm"). + """ + + def __init__( + self, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + bias="auto", + conv_cfg=None, + norm_cfg=None, + activation="ReLU", + inplace=True, + order=("conv", "norm", "act"), + ): + super(ConvModule, self).__init__() + assert conv_cfg is None or isinstance(conv_cfg, dict) + assert norm_cfg is None or isinstance(norm_cfg, dict) + assert activation is None or isinstance(activation, str) + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.activation = activation + self.inplace = inplace + self.order = order + assert isinstance(self.order, tuple) and len(self.order) == 3 + assert set(order) == {"conv", "norm", "act"} + + self.with_norm = norm_cfg is not None + # if the conv layer is before a norm layer, bias is unnecessary. 
+ if bias == "auto": + bias = False if self.with_norm else True + self.with_bias = bias + + if self.with_norm and self.with_bias: + warnings.warn("ConvModule has norm and bias at the same time") + + # build convolution layer + self.conv = nn.Conv2d( # + in_channels, + out_channels, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + bias=bias, + ) + # export the attributes of self.conv to a higher level for convenience + self.in_channels = self.conv.in_channels + self.out_channels = self.conv.out_channels + self.kernel_size = self.conv.kernel_size + self.stride = self.conv.stride + self.padding = self.conv.padding + self.dilation = self.conv.dilation + self.transposed = self.conv.transposed + self.output_padding = self.conv.output_padding + self.groups = self.conv.groups + + # build normalization layers + if self.with_norm: + # norm layer is after conv layer + if order.index("norm") > order.index("conv"): + norm_channels = out_channels + else: + norm_channels = in_channels + self.norm_name, norm = build_norm_layer(norm_cfg, norm_channels) + self.add_module(self.norm_name, norm) + else: + self.norm_name = None + + # build activation layer + if self.activation: + self.act = act_layers(self.activation) + + # Use msra init by default + self.init_weights() + + @property + def norm(self): + if self.norm_name: + return getattr(self, self.norm_name) + else: + return None + + def init_weights(self): + if self.activation == "LeakyReLU": + nonlinearity = "leaky_relu" + else: + nonlinearity = "relu" + kaiming_init(self.conv, nonlinearity=nonlinearity) + if self.with_norm: + constant_init(self.norm, 1, bias=0) + + def forward(self, x, norm=True): + for layer in self.order: + if layer == "conv": + x = self.conv(x) + elif layer == "norm" and norm and self.with_norm: + x = self.norm(x) + elif layer == "act" and self.activation: + x = self.act(x) + return x + + +class DepthwiseConvModule(nn.Module): + def __init__( + self, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + bias="auto", + norm_cfg=dict(type="BN"), + activation="ReLU", + inplace=True, + order=("depthwise", "dwnorm", "act", "pointwise", "pwnorm", "act"), + ): + super(DepthwiseConvModule, self).__init__() + assert activation is None or isinstance(activation, str) + self.activation = activation + self.inplace = inplace + self.order = order + assert isinstance(self.order, tuple) and len(self.order) == 6 + assert set(order) == { + "depthwise", + "dwnorm", + "act", + "pointwise", + "pwnorm", + "act", + } + + self.with_norm = norm_cfg is not None + # if the conv layer is before a norm layer, bias is unnecessary. 
+ if bias == "auto": + bias = False if self.with_norm else True + self.with_bias = bias + + if self.with_norm and self.with_bias: + warnings.warn("ConvModule has norm and bias at the same time") + + # build convolution layer + self.depthwise = nn.Conv2d( + in_channels, + in_channels, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=in_channels, + bias=bias, + ) + self.pointwise = nn.Conv2d( + in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=bias + ) + + # export the attributes of self.conv to a higher level for convenience + self.in_channels = self.depthwise.in_channels + self.out_channels = self.pointwise.out_channels + self.kernel_size = self.depthwise.kernel_size + self.stride = self.depthwise.stride + self.padding = self.depthwise.padding + self.dilation = self.depthwise.dilation + self.transposed = self.depthwise.transposed + self.output_padding = self.depthwise.output_padding + + # build normalization layers + if self.with_norm: + # norm layer is after conv layer + _, self.dwnorm = build_norm_layer(norm_cfg, in_channels) + _, self.pwnorm = build_norm_layer(norm_cfg, out_channels) + + # build activation layer + if self.activation: + self.act = act_layers(self.activation) + + # Use msra init by default + self.init_weights() + + def init_weights(self): + if self.activation == "LeakyReLU": + nonlinearity = "leaky_relu" + else: + nonlinearity = "relu" + kaiming_init(self.depthwise, nonlinearity=nonlinearity) + kaiming_init(self.pointwise, nonlinearity=nonlinearity) + if self.with_norm: + constant_init(self.dwnorm, 1, bias=0) + constant_init(self.pwnorm, 1, bias=0) + + def forward(self, x, norm=True): + for layer_name in self.order: + if layer_name != "act": + layer = self.__getattr__(layer_name) + x = layer(x) + elif layer_name == "act" and self.activation: + x = self.act(x) + return x + + +class RepVGGConvModule(nn.Module): + """ + RepVGG Conv Block from paper RepVGG: Making VGG-style ConvNets Great Again + https://arxiv.org/abs/2101.03697 + https://github.com/DingXiaoH/RepVGG + """ + + def __init__( + self, + in_channels, + out_channels, + kernel_size=3, + stride=1, + padding=1, + dilation=1, + groups=1, + activation="ReLU", + padding_mode="zeros", + deploy=False, + **kwargs + ): + super(RepVGGConvModule, self).__init__() + assert activation is None or isinstance(activation, str) + self.activation = activation + + self.deploy = deploy + self.groups = groups + self.in_channels = in_channels + + assert kernel_size == 3 + assert padding == 1 + + padding_11 = padding - kernel_size // 2 + + # build activation layer + if self.activation: + self.act = act_layers(self.activation) + + if deploy: + self.rbr_reparam = nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + bias=True, + padding_mode=padding_mode, + ) + + else: + self.rbr_identity = ( + nn.BatchNorm2d(num_features=in_channels) + if out_channels == in_channels and stride == 1 + else None + ) + + self.rbr_dense = nn.Sequential( + nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=groups, + bias=False, + ), + nn.BatchNorm2d(num_features=out_channels), + ) + + self.rbr_1x1 = nn.Sequential( + nn.Conv2d( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=stride, + padding=padding_11, + groups=groups, + bias=False, + ), + 
nn.BatchNorm2d(num_features=out_channels), + ) + print("RepVGG Block, identity = ", self.rbr_identity) + + def forward(self, inputs): + if hasattr(self, "rbr_reparam"): + return self.act(self.rbr_reparam(inputs)) + + if self.rbr_identity is None: + id_out = 0 + else: + id_out = self.rbr_identity(inputs) + + return self.act(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out) + + # This func derives the equivalent kernel and bias in a DIFFERENTIABLE way. + # You can get the equivalent kernel and bias at any time and do whatever you want, + # for example, apply some penalties or constraints during training, just like you + # do to the other models. May be useful for quantization or pruning. + def get_equivalent_kernel_bias(self): + kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense) + kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1) + kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity) + return ( + kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, + bias3x3 + bias1x1 + biasid, + ) + + def _pad_1x1_to_3x3_tensor(self, kernel1x1): + if kernel1x1 is None: + return 0 + else: + return nn.functional.pad(kernel1x1, [1, 1, 1, 1]) + + def _fuse_bn_tensor(self, branch): + if branch is None: + return 0, 0 + if isinstance(branch, nn.Sequential): + kernel = branch[0].weight + running_mean = branch[1].running_mean + running_var = branch[1].running_var + gamma = branch[1].weight + beta = branch[1].bias + eps = branch[1].eps + else: + assert isinstance(branch, nn.BatchNorm2d) + if not hasattr(self, "id_tensor"): + input_dim = self.in_channels // self.groups + kernel_value = np.zeros( + (self.in_channels, input_dim, 3, 3), dtype=np.float32 + ) + for i in range(self.in_channels): + kernel_value[i, i % input_dim, 1, 1] = 1 + self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device) + kernel = self.id_tensor + running_mean = branch.running_mean + running_var = branch.running_var + gamma = branch.weight + beta = branch.bias + eps = branch.eps + std = (running_var + eps).sqrt() + t = (gamma / std).reshape(-1, 1, 1, 1) + return kernel * t, beta - running_mean * gamma / std + + def repvgg_convert(self): + kernel, bias = self.get_equivalent_kernel_bias() + return ( + kernel.detach().cpu().numpy(), + bias.detach().cpu().numpy(), + ) diff --git a/nanodet/model/module/init_weights.py b/nanodet/model/module/init_weights.py new file mode 100644 index 0000000..27da85c --- /dev/null +++ b/nanodet/model/module/init_weights.py @@ -0,0 +1,43 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. 
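+
+# Usage note (illustrative sketch, not part of the upstream file): these helpers
+# wrap torch.nn.init and are applied module-by-module, exactly as ConvModule and
+# DepthwiseConvModule above do in their init_weights(), e.g.
+#
+#   conv = nn.Conv2d(3, 16, kernel_size=3)
+#   kaiming_init(conv, nonlinearity="leaky_relu")  # Kaiming-normal weights, zero bias
+#   bn = nn.BatchNorm2d(16)
+#   constant_init(bn, 1, bias=0)                   # weight (gamma) = 1, bias (beta) = 0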
+ +import torch.nn as nn + + +def kaiming_init( + module, a=0, mode="fan_out", nonlinearity="relu", bias=0, distribution="normal" +): + assert distribution in ["uniform", "normal"] + if distribution == "uniform": + nn.init.kaiming_uniform_( + module.weight, a=a, mode=mode, nonlinearity=nonlinearity + ) + else: + nn.init.kaiming_normal_( + module.weight, a=a, mode=mode, nonlinearity=nonlinearity + ) + if hasattr(module, "bias") and module.bias is not None: + nn.init.constant_(module.bias, bias) + + +def xavier_init(module, gain=1, bias=0, distribution="normal"): + assert distribution in ["uniform", "normal"] + if distribution == "uniform": + nn.init.xavier_uniform_(module.weight, gain=gain) + else: + nn.init.xavier_normal_(module.weight, gain=gain) + if hasattr(module, "bias") and module.bias is not None: + nn.init.constant_(module.bias, bias) + + +def normal_init(module, mean=0, std=1, bias=0): + nn.init.normal_(module.weight, mean, std) + if hasattr(module, "bias") and module.bias is not None: + nn.init.constant_(module.bias, bias) + + +def constant_init(module, val, bias=0): + if hasattr(module, "weight") and module.weight is not None: + nn.init.constant_(module.weight, val) + if hasattr(module, "bias") and module.bias is not None: + nn.init.constant_(module.bias, bias) diff --git a/nanodet/model/module/nms.py b/nanodet/model/module/nms.py new file mode 100644 index 0000000..e5fa3e2 --- /dev/null +++ b/nanodet/model/module/nms.py @@ -0,0 +1,122 @@ +import torch +from torchvision.ops import nms + + +def multiclass_nms( + multi_bboxes, multi_scores, score_thr, nms_cfg, max_num=-1, score_factors=None +): + """NMS for multi-class bboxes. + + Args: + multi_bboxes (Tensor): shape (n, #class*4) or (n, 4) + multi_scores (Tensor): shape (n, #class), where the last column + contains scores of the background class, but this will be ignored. + score_thr (float): bbox threshold, bboxes with scores lower than it + will not be considered. + nms_thr (float): NMS IoU threshold + max_num (int): if there are more than max_num bboxes after NMS, + only top max_num will be kept. + score_factors (Tensor): The factors multiplied to scores before + applying NMS + + Returns: + tuple: (bboxes, labels), tensors of shape (k, 5) and (k, 1). Labels \ + are 0-based. 
+ """ + num_classes = multi_scores.size(1) - 1 + # exclude background category + if multi_bboxes.shape[1] > 4: + bboxes = multi_bboxes.view(multi_scores.size(0), -1, 4) + else: + bboxes = multi_bboxes[:, None].expand(multi_scores.size(0), num_classes, 4) + scores = multi_scores[:, :-1] + + # filter out boxes with low scores + valid_mask = scores > score_thr + + # We use masked_select for ONNX exporting purpose, + # which is equivalent to bboxes = bboxes[valid_mask] + # we have to use this ugly code + bboxes = torch.masked_select( + bboxes, torch.stack((valid_mask, valid_mask, valid_mask, valid_mask), -1) + ).view(-1, 4) + if score_factors is not None: + scores = scores * score_factors[:, None] + scores = torch.masked_select(scores, valid_mask) + labels = valid_mask.nonzero(as_tuple=False)[:, 1] + + if bboxes.numel() == 0: + bboxes = multi_bboxes.new_zeros((0, 5)) + labels = multi_bboxes.new_zeros((0,), dtype=torch.long) + + if torch.onnx.is_in_onnx_export(): + raise RuntimeError( + "[ONNX Error] Can not record NMS " + "as it has not been executed this time" + ) + return bboxes, labels + + dets, keep = batched_nms(bboxes, scores, labels, nms_cfg) + + if max_num > 0: + dets = dets[:max_num] + keep = keep[:max_num] + + return dets, labels[keep] + + +def batched_nms(boxes, scores, idxs, nms_cfg, class_agnostic=False): + """Performs non-maximum suppression in a batched fashion. + Modified from https://github.com/pytorch/vision/blob + /505cd6957711af790211896d32b40291bea1bc21/torchvision/ops/boxes.py#L39. + In order to perform NMS independently per class, we add an offset to all + the boxes. The offset is dependent only on the class idx, and is large + enough so that boxes from different classes do not overlap. + Arguments: + boxes (torch.Tensor): boxes in shape (N, 4). + scores (torch.Tensor): scores in shape (N, ). + idxs (torch.Tensor): each index value correspond to a bbox cluster, + and NMS will not be applied between elements of different idxs, + shape (N, ). + nms_cfg (dict): specify nms type and other parameters like iou_thr. + Possible keys includes the following. + - iou_thr (float): IoU threshold used for NMS. + - split_thr (float): threshold number of boxes. In some cases the + number of boxes is large (e.g., 200k). To avoid OOM during + training, the users could set `split_thr` to a small value. + If the number of boxes is greater than the threshold, it will + perform NMS on each group of boxes separately and sequentially. + Defaults to 10000. + class_agnostic (bool): if true, nms is class agnostic, + i.e. IoU thresholding happens over all boxes, + regardless of the predicted class. + Returns: + tuple: kept dets and indice. 
+ """ + nms_cfg_ = nms_cfg.copy() + class_agnostic = nms_cfg_.pop("class_agnostic", class_agnostic) + if class_agnostic: + boxes_for_nms = boxes + else: + max_coordinate = boxes.max() + offsets = idxs.to(boxes) * (max_coordinate + 1) + boxes_for_nms = boxes + offsets[:, None] + nms_cfg_.pop("type", "nms") + split_thr = nms_cfg_.pop("split_thr", 10000) + if len(boxes_for_nms) < split_thr: + keep = nms(boxes_for_nms, scores, **nms_cfg_) + boxes = boxes[keep] + scores = scores[keep] + else: + total_mask = scores.new_zeros(scores.size(), dtype=torch.bool) + for id in torch.unique(idxs): + mask = (idxs == id).nonzero(as_tuple=False).view(-1) + keep = nms(boxes_for_nms[mask], scores[mask], **nms_cfg_) + total_mask[mask[keep]] = True + + keep = total_mask.nonzero(as_tuple=False).view(-1) + keep = keep[scores[keep].argsort(descending=True)] + boxes = boxes[keep] + scores = scores[keep] + + return torch.cat([boxes, scores[:, None]], -1), keep diff --git a/nanodet/model/module/norm.py b/nanodet/model/module/norm.py new file mode 100644 index 0000000..b9dd8f4 --- /dev/null +++ b/nanodet/model/module/norm.py @@ -0,0 +1,55 @@ +import torch.nn as nn + +norm_cfg = { + # format: layer_type: (abbreviation, module) + "BN": ("bn", nn.BatchNorm2d), + "SyncBN": ("bn", nn.SyncBatchNorm), + "GN": ("gn", nn.GroupNorm), + # and potentially 'SN' +} + + +def build_norm_layer(cfg, num_features, postfix=""): + """Build normalization layer + + Args: + cfg (dict): cfg should contain: + type (str): identify norm layer type. + layer args: args needed to instantiate a norm layer. + requires_grad (bool): [optional] whether stop gradient updates + num_features (int): number of channels from input. + postfix (int, str): appended into norm abbreviation to + create named layer. + + Returns: + name (str): abbreviation + postfix + layer (nn.Module): created norm layer + """ + assert isinstance(cfg, dict) and "type" in cfg + cfg_ = cfg.copy() + + layer_type = cfg_.pop("type") + if layer_type not in norm_cfg: + raise KeyError("Unrecognized norm type {}".format(layer_type)) + else: + abbr, norm_layer = norm_cfg[layer_type] + if norm_layer is None: + raise NotImplementedError + + assert isinstance(postfix, (int, str)) + name = abbr + str(postfix) + + requires_grad = cfg_.pop("requires_grad", True) + cfg_.setdefault("eps", 1e-5) + if layer_type != "GN": + layer = norm_layer(num_features, **cfg_) + if layer_type == "SyncBN" and hasattr(layer, "_specify_ddp_gpu_num"): + layer._specify_ddp_gpu_num(1) + else: + assert "num_groups" in cfg_ + layer = norm_layer(num_channels=num_features, **cfg_) + + for param in layer.parameters(): + param.requires_grad = requires_grad + + return name, layer diff --git a/nanodet/model/module/scale.py b/nanodet/model/module/scale.py new file mode 100644 index 0000000..2461af8 --- /dev/null +++ b/nanodet/model/module/scale.py @@ -0,0 +1,15 @@ +import torch +import torch.nn as nn + + +class Scale(nn.Module): + """ + A learnable scale parameter + """ + + def __init__(self, scale=1.0): + super(Scale, self).__init__() + self.scale = nn.Parameter(torch.tensor(scale, dtype=torch.float)) + + def forward(self, x): + return x * self.scale diff --git a/nanodet/model/module/transformer.py b/nanodet/model/module/transformer.py new file mode 100644 index 0000000..2856df6 --- /dev/null +++ b/nanodet/model/module/transformer.py @@ -0,0 +1,138 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch.nn as nn + +from nanodet.model.module.activation import act_layers +from nanodet.model.module.conv import ConvModule + + +class MLP(nn.Module): + def __init__( + self, in_dim, hidden_dim=None, out_dim=None, drop=0.0, activation="GELU" + ): + super(MLP, self).__init__() + out_dim = out_dim or in_dim + hidden_dim = hidden_dim or in_dim + self.fc1 = nn.Linear(in_dim, hidden_dim) + self.act = act_layers(activation) + self.fc2 = nn.Linear(hidden_dim, out_dim) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + x = self.drop(x) + x = self.fc2(x) + x = self.drop(x) + return x + + +class TransformerEncoder(nn.Module): + """ + Encoder layer of transformer + :param dim: feature dimension + :param num_heads: number of attention heads + :param mlp_ratio: hidden layer dimension expand ratio in MLP + :param dropout_ratio: probability of an element to be zeroed + :param activation: activation layer type + :param kv_bias: add bias on key and values + """ + + def __init__( + self, + dim, + num_heads, + mlp_ratio, + dropout_ratio=0.0, + activation="GELU", + kv_bias=False, + ): + super(TransformerEncoder, self).__init__() + self.norm1 = nn.LayerNorm(dim) + + # embed_dim must be divisible by num_heads + assert dim // num_heads * num_heads == dim + self.attn = nn.MultiheadAttention( + embed_dim=dim, + num_heads=num_heads, + dropout=dropout_ratio, + add_bias_kv=kv_bias, + ) + self.norm2 = nn.LayerNorm(dim) + self.mlp = MLP( + in_dim=dim, + hidden_dim=int(dim * mlp_ratio), + drop=dropout_ratio, + activation=activation, + ) + + def forward(self, x): + _x = self.norm1(x) + x = x + self.attn(_x, _x, _x)[0] + x = x + self.mlp(self.norm2(x)) + return x + + +class TransformerBlock(nn.Module): + """ + Block of transformer encoder layers. Used in vision task. 
+ :param in_channels: input channels + :param out_channels: output channels + :param num_heads: number of attention heads + :param num_encoders: number of transformer encoder layers + :param mlp_ratio: hidden layer dimension expand ratio in MLP + :param dropout_ratio: probability of an element to be zeroed + :param activation: activation layer type + :param kv_bias: add bias on key and values + """ + + def __init__( + self, + in_channels, + out_channels, + num_heads, + num_encoders=1, + mlp_ratio=1, + dropout_ratio=0.0, + kv_bias=False, + activation="GELU", + ): + super(TransformerBlock, self).__init__() + + # out_channels must be divisible by num_heads + assert out_channels // num_heads * num_heads == out_channels + + self.conv = ( + nn.Identity() + if in_channels == out_channels + else ConvModule(in_channels, out_channels, 1) + ) + self.linear = nn.Linear(out_channels, out_channels) + encoders = [ + TransformerEncoder( + out_channels, num_heads, mlp_ratio, dropout_ratio, activation, kv_bias + ) + for _ in range(num_encoders) + ] + self.encoders = nn.Sequential(*encoders) + + def forward(self, x, pos_embed): + b, _, h, w = x.shape + x = self.conv(x) + x = x.flatten(2).permute(2, 0, 1) + x = x + pos_embed + x = self.encoders(x) + x = x.permute(1, 2, 0).reshape(b, -1, h, w) + return x diff --git a/nanodet/model/weight_averager/__init__.py b/nanodet/model/weight_averager/__init__.py new file mode 100644 index 0000000..67d649d --- /dev/null +++ b/nanodet/model/weight_averager/__init__.py @@ -0,0 +1,26 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import copy + +from .ema import ExpMovingAverager + + +def build_weight_averager(cfg, device="cpu"): + cfg = copy.deepcopy(cfg) + name = cfg.pop("name") + if name == "ExpMovingAverager": + return ExpMovingAverager(**cfg, device=device) + else: + raise NotImplementedError(f"{name} is not implemented") diff --git a/nanodet/model/weight_averager/ema.py b/nanodet/model/weight_averager/ema.py new file mode 100644 index 0000000..a2c5fba --- /dev/null +++ b/nanodet/model/weight_averager/ema.py @@ -0,0 +1,80 @@ +# Copyright 2021 RangiLyu. All rights reserved. +# ===================================================================== +# Modified from: https://github.com/facebookresearch/d2go +# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +# Licensed under the Apache License, Version 2.0 (the "License") +import itertools +import math +from typing import Any, Dict, Optional + +import torch +import torch.nn as nn + + +class ExpMovingAverager(object): + """Exponential Moving Average. + + Args: + decay (float): EMA decay factor, should be in [0, 1]. A decay of 0 corresponds + to always using the latest value (no EMA) and a decay of 1 corresponds to + not updating weights after initialization. Default to 0.9998. + device (str): If not None, move EMA state to device. 
+ """ + + def __init__(self, decay: float = 0.9998, device: Optional[str] = None): + if decay < 0 or decay > 1.0: + raise ValueError(f"Decay should be in [0, 1], {decay} was given.") + self.decay: float = decay + self.state: Dict[str, Any] = {} + self.device: Optional[str] = device + + def load_from(self, model: nn.Module) -> None: + """Load state from the model.""" + self.state.clear() + for name, val in self._get_model_state_iterator(model): + val = val.detach().clone() + self.state[name] = val.to(self.device) if self.device else val + + def has_inited(self) -> bool: + return len(self.state) > 0 + + def apply_to(self, model: nn.Module) -> None: + """Apply EMA state to the model.""" + with torch.no_grad(): + for name, val in self._get_model_state_iterator(model): + assert ( + name in self.state + ), f"Name {name} not exist, available names are {self.state.keys()}" + val.copy_(self.state[name]) + + def state_dict(self) -> Dict[str, Any]: + return self.state + + def load_state_dict(self, state_dict: Dict[str, Any]) -> None: + self.state.clear() + for name, val in state_dict.items(): + self.state[name] = val.to(self.device) if self.device else val + + def to(self, device: torch.device) -> None: + """moves EMA state to device.""" + for name, val in self.state.items(): + self.state[name] = val.to(device) + + def _get_model_state_iterator(self, model: nn.Module): + param_iter = model.named_parameters() + # pyre-fixme[16]: `nn.Module` has no attribute `named_buffers`. + buffer_iter = model.named_buffers() + return itertools.chain(param_iter, buffer_iter) + + def calculate_dacay(self, iteration: int) -> float: + decay = (self.decay) * math.exp(-(1 + iteration) / 2000) + (1 - self.decay) + return decay + + def update(self, model: nn.Module, iteration: int) -> None: + decay = self.calculate_dacay(iteration) + with torch.no_grad(): + for name, val in self._get_model_state_iterator(model): + ema_val = self.state[name] + if self.device: + val = val.to(self.device) + ema_val.copy_(ema_val * (1 - decay) + val * decay) diff --git a/nanodet/optim/__init__.py b/nanodet/optim/__init__.py new file mode 100644 index 0000000..c4974b9 --- /dev/null +++ b/nanodet/optim/__init__.py @@ -0,0 +1,3 @@ +from .builder import build_optimizer + +__all__ = ["build_optimizer"] diff --git a/nanodet/optim/builder.py b/nanodet/optim/builder.py new file mode 100644 index 0000000..afcb114 --- /dev/null +++ b/nanodet/optim/builder.py @@ -0,0 +1,76 @@ +import copy +import logging + +import torch +from torch.nn import GroupNorm, LayerNorm +from torch.nn.modules.batchnorm import _BatchNorm + +NORMS = (GroupNorm, LayerNorm, _BatchNorm) + + +def build_optimizer(model, config): + """Build optimizer from config. + + Supports customised parameter-level hyperparameters. 
+ The config should be like: + >>> optimizer: + >>> name: AdamW + >>> lr: 0.001 + >>> weight_decay: 0.05 + >>> no_norm_decay: True + >>> param_level_cfg: # parameter-level config + >>> backbone: + >>> lr_mult: 0.1 + """ + config = copy.deepcopy(config) + param_dict = {} + no_norm_decay = config.pop("no_norm_decay", False) + no_bias_decay = config.pop("no_bias_decay", False) + param_level_cfg = config.pop("param_level_cfg", {}) + base_lr = config.get("lr", None) + base_wd = config.get("weight_decay", None) + + name = config.pop("name") + optim_cls = getattr(torch.optim, name) + + logger = logging.getLogger("NanoDet") + + # custom param-wise lr and weight_decay + for name, p in model.named_parameters(): + if not p.requires_grad: + continue + param_dict[p] = {"name": name} + + for key in param_level_cfg: + if key in name: + if "lr_mult" in param_level_cfg[key] and base_lr: + param_dict[p].update( + {"lr": base_lr * param_level_cfg[key]["lr_mult"]} + ) + if "decay_mult" in param_level_cfg[key] and base_wd: + param_dict[p].update( + {"weight_decay": base_wd * param_level_cfg[key]["decay_mult"]} + ) + break + if no_norm_decay: + # update norms decay + for name, m in model.named_modules(): + if isinstance(m, NORMS): + param_dict[m.bias].update({"weight_decay": 0}) + param_dict[m.weight].update({"weight_decay": 0}) + if no_bias_decay: + # update bias decay + for name, m in model.named_modules(): + if hasattr(m, "bias"): + param_dict[m.bias].update({"weight_decay": 0}) + + # convert param dict to optimizer's param groups + param_groups = [] + for p, pconfig in param_dict.items(): + name = pconfig.pop("name", None) + if "weight_decay" in pconfig or "lr" in pconfig: + logger.info(f"special optimizer hyperparameter: {name} - {pconfig}") + param_groups += [{"params": p, **pconfig}] + + optimizer = optim_cls(param_groups, **config) + return optimizer diff --git a/nanodet/trainer/__init__.py b/nanodet/trainer/__init__.py new file mode 100644 index 0000000..8eb73d1 --- /dev/null +++ b/nanodet/trainer/__init__.py @@ -0,0 +1,16 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from .task import TrainingTask + +__all__ = ["TrainingTask"] diff --git a/nanodet/trainer/task.py b/nanodet/trainer/task.py new file mode 100644 index 0000000..d6ca89c --- /dev/null +++ b/nanodet/trainer/task.py @@ -0,0 +1,351 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
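+
+# Usage note (illustrative sketch, not part of the upstream file): TrainingTask
+# below is a regular pytorch_lightning.LightningModule, so a training script can
+# drive it roughly as follows; `train_loader`, `val_loader` and `evaluator` are
+# assumed to be built elsewhere from `cfg`, and the exact Trainer arguments
+# depend on the installed Lightning version:
+#
+#   from pytorch_lightning import Trainer
+#   task = TrainingTask(cfg, evaluator)
+#   trainer = Trainer(max_epochs=cfg.schedule.total_epochs, accelerator="gpu", devices=1)
+#   trainer.fit(task, train_loader, val_loader)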
+ +import copy +import json +import os +import warnings +from typing import Any, Dict, List + +import torch +import torch.distributed as dist +from pytorch_lightning import LightningModule +from pytorch_lightning.utilities import rank_zero_only + +from nanodet.data.batch_process import stack_batch_img +from nanodet.optim import build_optimizer +from nanodet.util import convert_avg_params, gather_results, mkdir + +from ..model.arch import build_model +from ..model.weight_averager import build_weight_averager + + +class TrainingTask(LightningModule): + """ + Pytorch Lightning module of a general training task. + Including training, evaluating and testing. + Args: + cfg: Training configurations + evaluator: Evaluator for evaluating the model performance. + """ + + def __init__(self, cfg, evaluator=None): + super(TrainingTask, self).__init__() + self.cfg = cfg + self.model = build_model(cfg.model) + self.evaluator = evaluator + self.save_flag = -10 + self.log_style = "NanoDet" + self.weight_averager = None + if "weight_averager" in cfg.model: + self.weight_averager = build_weight_averager( + cfg.model.weight_averager, device=self.device + ) + self.avg_model = copy.deepcopy(self.model) + + def _preprocess_batch_input(self, batch): + batch_imgs = batch["img"] + if isinstance(batch_imgs, list): + batch_imgs = [img.to(self.device) for img in batch_imgs] + batch_img_tensor = stack_batch_img(batch_imgs, divisible=32) + batch["img"] = batch_img_tensor + return batch + + def forward(self, x): + x = self.model(x) + return x + + @torch.no_grad() + def predict(self, batch, batch_idx=None, dataloader_idx=None): + batch = self._preprocess_batch_input(batch) + preds = self.forward(batch["img"]) + results = self.model.head.post_process(preds, batch) + return results + + def training_step(self, batch, batch_idx): + batch = self._preprocess_batch_input(batch) + preds, loss, loss_states = self.model.forward_train(batch) + + # log train losses + if self.global_step % self.cfg.log.interval == 0: + memory = ( + torch.cuda.memory_reserved() / 1e9 if torch.cuda.is_available() else 0 + ) + lr = self.trainer.optimizers[0].param_groups[0]["lr"] + log_msg = "Train|Epoch{}/{}|Iter{}({}/{})| mem:{:.3g}G| lr:{:.2e}| ".format( + self.current_epoch + 1, + self.cfg.schedule.total_epochs, + self.global_step, + batch_idx + 1, + self.trainer.num_training_batches, + memory, + lr, + ) + self.scalar_summary("Train_loss/lr", "Train", lr, self.global_step) + for loss_name in loss_states: + log_msg += "{}:{:.4f}| ".format( + loss_name, loss_states[loss_name].mean().item() + ) + self.scalar_summary( + "Train_loss/" + loss_name, + "Train", + loss_states[loss_name].mean().item(), + self.global_step, + ) + self.logger.info(log_msg) + + return loss + + def training_epoch_end(self, outputs: List[Any]) -> None: + self.trainer.save_checkpoint(os.path.join(self.cfg.save_dir, "model_last.ckpt")) + + def validation_step(self, batch, batch_idx): + batch = self._preprocess_batch_input(batch) + if self.weight_averager is not None: + preds, loss, loss_states = self.avg_model.forward_train(batch) + else: + preds, loss, loss_states = self.model.forward_train(batch) + + if batch_idx % self.cfg.log.interval == 0: + memory = ( + torch.cuda.memory_reserved() / 1e9 if torch.cuda.is_available() else 0 + ) + lr = self.trainer.optimizers[0].param_groups[0]["lr"] + log_msg = "Val|Epoch{}/{}|Iter{}({}/{})| mem:{:.3g}G| lr:{:.2e}| ".format( + self.current_epoch + 1, + self.cfg.schedule.total_epochs, + self.global_step, + batch_idx + 1, + 
sum(self.trainer.num_val_batches), + memory, + lr, + ) + for loss_name in loss_states: + log_msg += "{}:{:.4f}| ".format( + loss_name, loss_states[loss_name].mean().item() + ) + self.logger.info(log_msg) + + dets = self.model.head.post_process(preds, batch) + return dets + + def validation_epoch_end(self, validation_step_outputs): + """ + Called at the end of the validation epoch with the + outputs of all validation steps.Evaluating results + and save best model. + Args: + validation_step_outputs: A list of val outputs + + """ + results = {} + for res in validation_step_outputs: + results.update(res) + all_results = ( + gather_results(results) + if dist.is_available() and dist.is_initialized() + else results + ) + if all_results: + eval_results = self.evaluator.evaluate( + all_results, self.cfg.save_dir, rank=self.local_rank + ) + metric = eval_results[self.cfg.evaluator.save_key] + # save best model + if metric > self.save_flag: + self.save_flag = metric + best_save_path = os.path.join(self.cfg.save_dir, "model_best") + mkdir(self.local_rank, best_save_path) + self.trainer.save_checkpoint( + os.path.join(best_save_path, "model_best.ckpt") + ) + self.save_model_state( + os.path.join(best_save_path, "nanodet_model_best.pth") + ) + txt_path = os.path.join(best_save_path, "eval_results.txt") + if self.local_rank < 1: + with open(txt_path, "a") as f: + f.write("Epoch:{}\n".format(self.current_epoch + 1)) + for k, v in eval_results.items(): + f.write("{}: {}\n".format(k, v)) + else: + warnings.warn( + "Warning! Save_key is not in eval results! Only save model last!" + ) + self.logger.log_metrics(eval_results, self.current_epoch + 1) + else: + self.logger.info("Skip val on rank {}".format(self.local_rank)) + + def test_step(self, batch, batch_idx): + dets = self.predict(batch, batch_idx) + return dets + + def test_epoch_end(self, test_step_outputs): + results = {} + for res in test_step_outputs: + results.update(res) + all_results = ( + gather_results(results) + if dist.is_available() and dist.is_initialized() + else results + ) + if all_results: + res_json = self.evaluator.results2json(all_results) + json_path = os.path.join(self.cfg.save_dir, "results.json") + json.dump(res_json, open(json_path, "w")) + + if self.cfg.test_mode == "val": + eval_results = self.evaluator.evaluate( + all_results, self.cfg.save_dir, rank=self.local_rank + ) + txt_path = os.path.join(self.cfg.save_dir, "eval_results.txt") + with open(txt_path, "a") as f: + for k, v in eval_results.items(): + f.write("{}: {}\n".format(k, v)) + else: + self.logger.info("Skip test on rank {}".format(self.local_rank)) + + def configure_optimizers(self): + """ + Prepare optimizer and learning-rate scheduler + to use in optimization. + + Returns: + optimizer + """ + optimizer_cfg = copy.deepcopy(self.cfg.schedule.optimizer) + optimizer = build_optimizer(self.model, optimizer_cfg) + + schedule_cfg = copy.deepcopy(self.cfg.schedule.lr_schedule) + name = schedule_cfg.pop("name") + build_scheduler = getattr(torch.optim.lr_scheduler, name) + scheduler = { + "scheduler": build_scheduler(optimizer=optimizer, **schedule_cfg), + "interval": "epoch", + "frequency": 1, + } + return dict(optimizer=optimizer, lr_scheduler=scheduler) + + def optimizer_step( + self, + epoch=None, + batch_idx=None, + optimizer=None, + optimizer_idx=None, + optimizer_closure=None, + on_tpu=None, + using_native_amp=None, + using_lbfgs=None, + ): + """ + Performs a single optimization step (parameter update). 
+ Args: + epoch: Current epoch + batch_idx: Index of current batch + optimizer: A PyTorch optimizer + optimizer_idx: If you used multiple optimizers this indexes into that list. + optimizer_closure: closure for all optimizers + on_tpu: true if TPU backward is required + using_native_amp: True if using native amp + using_lbfgs: True if the matching optimizer is lbfgs + """ + # warm up lr + if self.trainer.global_step <= self.cfg.schedule.warmup.steps: + if self.cfg.schedule.warmup.name == "constant": + k = self.cfg.schedule.warmup.ratio + elif self.cfg.schedule.warmup.name == "linear": + k = 1 - ( + 1 - self.trainer.global_step / self.cfg.schedule.warmup.steps + ) * (1 - self.cfg.schedule.warmup.ratio) + elif self.cfg.schedule.warmup.name == "exp": + k = self.cfg.schedule.warmup.ratio ** ( + 1 - self.trainer.global_step / self.cfg.schedule.warmup.steps + ) + else: + raise Exception("Unsupported warm up type!") + for pg in optimizer.param_groups: + pg["lr"] = pg["initial_lr"] * k + + # update params + optimizer.step(closure=optimizer_closure) + optimizer.zero_grad() + + def scalar_summary(self, tag, phase, value, step): + """ + Write Tensorboard scalar summary log. + Args: + tag: Name for the tag + phase: 'Train' or 'Val' + value: Value to record + step: Step value to record + + """ + if self.local_rank < 1: + self.logger.experiment.add_scalars(tag, {phase: value}, step) + + def info(self, string): + self.logger.info(string) + + @rank_zero_only + def save_model_state(self, path): + self.logger.info("Saving model to {}".format(path)) + state_dict = ( + self.weight_averager.state_dict() + if self.weight_averager + else self.model.state_dict() + ) + torch.save({"state_dict": state_dict}, path) + + # ------------Hooks----------------- + def on_fit_start(self) -> None: + if "weight_averager" in self.cfg.model: + self.logger.info("Weight Averaging is enabled") + if self.weight_averager and self.weight_averager.has_inited(): + self.weight_averager.to(self.weight_averager.device) + return + self.weight_averager = build_weight_averager( + self.cfg.model.weight_averager, device=self.device + ) + self.weight_averager.load_from(self.model) + + def on_train_epoch_start(self): + self.model.set_epoch(self.current_epoch) + + def on_train_batch_end(self, outputs, batch, batch_idx) -> None: + if self.weight_averager: + self.weight_averager.update(self.model, self.global_step) + + def on_validation_epoch_start(self): + if self.weight_averager: + self.weight_averager.apply_to(self.avg_model) + + def on_test_epoch_start(self) -> None: + if self.weight_averager: + self.on_load_checkpoint({"state_dict": self.state_dict()}) + self.weight_averager.apply_to(self.model) + + def on_load_checkpoint(self, checkpointed_state: Dict[str, Any]) -> None: + if self.weight_averager: + avg_params = convert_avg_params(checkpointed_state) + if len(avg_params) != len(self.model.state_dict()): + self.logger.info( + "Weight averaging is enabled but average state does not" + "match the model" + ) + else: + self.weight_averager = build_weight_averager( + self.cfg.model.weight_averager, device=self.device + ) + self.weight_averager.load_state_dict(avg_params) + self.logger.info("Loaded average state from checkpoint.") diff --git a/nanodet/util/__init__.py b/nanodet/util/__init__.py new file mode 100644 index 0000000..46ccfab --- /dev/null +++ b/nanodet/util/__init__.py @@ -0,0 +1,43 @@ +from .box_transform import bbox2distance, distance2bbox +from .check_point import ( + convert_avg_params, + convert_old_model, + load_model_weight, + 
save_model, +) +from .config import cfg, load_config +from .flops_counter import get_model_complexity_info +from .logger import AverageMeter, Logger, MovingAverage, NanoDetLightningLogger +from .misc import images_to_levels, multi_apply, unmap +from .path import collect_files, mkdir +from .rank_filter import rank_filter +from .scatter_gather import gather_results, scatter_kwargs +from .util_mixins import NiceRepr +from .visualization import Visualizer, overlay_bbox_cv + +__all__ = [ + "distance2bbox", + "bbox2distance", + "convert_old_model", + "load_model_weight", + "save_model", + "cfg", + "load_config", + "get_model_complexity_info", + "AverageMeter", + "Logger", + "MovingAverage", + "images_to_levels", + "multi_apply", + "unmap", + "mkdir", + "rank_filter", + "gather_results", + "scatter_kwargs", + "NiceRepr", + "Visualizer", + "overlay_bbox_cv", + "collect_files", + "NanoDetLightningLogger", + "convert_avg_params", +] diff --git a/nanodet/util/box_transform.py b/nanodet/util/box_transform.py new file mode 100644 index 0000000..4b82a8c --- /dev/null +++ b/nanodet/util/box_transform.py @@ -0,0 +1,49 @@ +import torch + + +def distance2bbox(points, distance, max_shape=None): + """Decode distance prediction to bounding box. + + Args: + points (Tensor): Shape (n, 2), [x, y]. + distance (Tensor): Distance from the given point to 4 + boundaries (left, top, right, bottom). + max_shape (tuple): Shape of the image. + + Returns: + Tensor: Decoded bboxes. + """ + x1 = points[..., 0] - distance[..., 0] + y1 = points[..., 1] - distance[..., 1] + x2 = points[..., 0] + distance[..., 2] + y2 = points[..., 1] + distance[..., 3] + if max_shape is not None: + x1 = x1.clamp(min=0, max=max_shape[1]) + y1 = y1.clamp(min=0, max=max_shape[0]) + x2 = x2.clamp(min=0, max=max_shape[1]) + y2 = y2.clamp(min=0, max=max_shape[0]) + return torch.stack([x1, y1, x2, y2], -1) + + +def bbox2distance(points, bbox, max_dis=None, eps=0.1): + """Decode bounding box based on distances. + + Args: + points (Tensor): Shape (n, 2), [x, y]. + bbox (Tensor): Shape (n, 4), "xyxy" format + max_dis (float): Upper bound of the distance. + eps (float): a small value to ensure target < max_dis, instead <= + + Returns: + Tensor: Decoded distances. + """ + left = points[:, 0] - bbox[:, 0] + top = points[:, 1] - bbox[:, 1] + right = bbox[:, 2] - points[:, 0] + bottom = bbox[:, 3] - points[:, 1] + if max_dis is not None: + left = left.clamp(min=0, max=max_dis - eps) + top = top.clamp(min=0, max=max_dis - eps) + right = right.clamp(min=0, max=max_dis - eps) + bottom = bottom.clamp(min=0, max=max_dis - eps) + return torch.stack([left, top, right, bottom], -1) diff --git a/nanodet/util/check_point.py b/nanodet/util/check_point.py new file mode 100644 index 0000000..d88c3fa --- /dev/null +++ b/nanodet/util/check_point.py @@ -0,0 +1,111 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
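+
+# Usage note (illustrative sketch, not part of the upstream file): a typical
+# warm-start path with the helpers below, assuming a NanoDet-style logger and a
+# checkpoint file (the path is a placeholder):
+#
+#   ckpt = torch.load("path/to/model_best.ckpt", map_location="cpu")
+#   load_model_weight(model, ckpt, logger)  # strips "model."/"module." prefixes and
+#                                           # skips shape-mismatched parameters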
+ +from collections import OrderedDict +from typing import Any, Dict + +import pytorch_lightning as pl +import torch + +from .rank_filter import rank_filter + + +def load_model_weight(model, checkpoint, logger): + state_dict = checkpoint["state_dict"].copy() + for k in checkpoint["state_dict"]: + # convert average model weights + if k.startswith("avg_model."): + v = state_dict.pop(k) + state_dict[k[4:]] = v + # strip prefix of state_dict + if list(state_dict.keys())[0].startswith("module."): + state_dict = {k[7:]: v for k, v in state_dict.items()} + if list(state_dict.keys())[0].startswith("model."): + state_dict = {k[6:]: v for k, v in state_dict.items()} + + model_state_dict = ( + model.module.state_dict() if hasattr(model, "module") else model.state_dict() + ) + + # check loaded parameters and created model parameters + for k in state_dict: + if k in model_state_dict: + if state_dict[k].shape != model_state_dict[k].shape: + logger.log( + "Skip loading parameter {}, required shape{}, " + "loaded shape{}.".format( + k, model_state_dict[k].shape, state_dict[k].shape + ) + ) + state_dict[k] = model_state_dict[k] + else: + logger.log("Drop parameter {}.".format(k)) + for k in model_state_dict: + if not (k in state_dict): + logger.log("No param {}.".format(k)) + state_dict[k] = model_state_dict[k] + model.load_state_dict(state_dict, strict=False) + + +@rank_filter +def save_model(model, path, epoch, iter, optimizer=None): + model_state_dict = ( + model.module.state_dict() if hasattr(model, "module") else model.state_dict() + ) + data = {"epoch": epoch, "state_dict": model_state_dict, "iter": iter} + if optimizer is not None: + data["optimizer"] = optimizer.state_dict() + + torch.save(data, path) + + +def convert_old_model(old_model_dict): + if "pytorch-lightning_version" in old_model_dict: + raise ValueError("This model is not old format. No need to convert!") + version = pl.__version__ + epoch = old_model_dict["epoch"] + global_step = old_model_dict["iter"] + state_dict = old_model_dict["state_dict"] + new_state_dict = OrderedDict() + for name, value in state_dict.items(): + new_state_dict["model." + name] = value + + new_checkpoint = { + "epoch": epoch, + "global_step": global_step, + "pytorch-lightning_version": version, + "state_dict": new_state_dict, + "lr_schedulers": [], + } + + if "optimizer" in old_model_dict: + optimizer_states = [old_model_dict["optimizer"]] + new_checkpoint["optimizer_states"] = optimizer_states + + return new_checkpoint + + +def convert_avg_params(checkpoint: Dict[str, Any]) -> Dict[str, Any]: + """Converts average state dict to the format that can be loaded to a model. + Args: + checkpoint: model. + Returns: + Converted average state dict. 
+ """ + state_dict = checkpoint["state_dict"] + avg_weights = {} + for k, v in state_dict.items(): + if "avg_model" in k: + avg_weights[k[10:]] = v + return avg_weights diff --git a/nanodet/util/config.py b/nanodet/util/config.py new file mode 100644 index 0000000..8ff104b --- /dev/null +++ b/nanodet/util/config.py @@ -0,0 +1,39 @@ +from .yacs import CfgNode + +cfg = CfgNode(new_allowed=True) +cfg.save_dir = "./" +# common params for NETWORK +cfg.model = CfgNode(new_allowed=True) +cfg.model.arch = CfgNode(new_allowed=True) +cfg.model.arch.backbone = CfgNode(new_allowed=True) +cfg.model.arch.fpn = CfgNode(new_allowed=True) +cfg.model.arch.head = CfgNode(new_allowed=True) + +# DATASET related params +cfg.data = CfgNode(new_allowed=True) +cfg.data.train = CfgNode(new_allowed=True) +cfg.data.val = CfgNode(new_allowed=True) +cfg.device = CfgNode(new_allowed=True) +# train +cfg.schedule = CfgNode(new_allowed=True) + +# logger +cfg.log = CfgNode() +cfg.log.interval = 50 + +# testing +cfg.test = CfgNode() +# size of images for each device + + +def load_config(cfg, args_cfg): + cfg.defrost() + cfg.merge_from_file(args_cfg) + cfg.freeze() + + +if __name__ == "__main__": + import sys + + with open(sys.argv[1], "w") as f: + print(cfg, file=f) diff --git a/nanodet/util/env_utils.py b/nanodet/util/env_utils.py new file mode 100644 index 0000000..ec332a9 --- /dev/null +++ b/nanodet/util/env_utils.py @@ -0,0 +1,65 @@ +import os +import platform +import warnings + +import torch.multiprocessing as mp + + +def set_multi_processing( + mp_start_method: str = "fork", opencv_num_threads: int = 0, distributed: bool = True +) -> None: + """Set multi-processing related environment. + + This function is refered from https://github.com/open-mmlab/mmengine/blob/main/mmengine/utils/dl_utils/setup_env.py + + Args: + mp_start_method (str): Set the method which should be used to start + child processes. Defaults to 'fork'. + opencv_num_threads (int): Number of threads for opencv. + Defaults to 0. + distributed (bool): True if distributed environment. + Defaults to False. + """ # noqa + # set multi-process start method as `fork` to speed up the training + if platform.system() != "Windows": + current_method = mp.get_start_method(allow_none=True) + if current_method is not None and current_method != mp_start_method: + warnings.warn( + f"Multi-processing start method `{mp_start_method}` is " + f"different from the previous setting `{current_method}`." + f"It will be force set to `{mp_start_method}`. You can " + "change this behavior by changing `mp_start_method` in " + "your config." + ) + mp.set_start_method(mp_start_method, force=True) + + try: + import cv2 + + # disable opencv multithreading to avoid system being overloaded + cv2.setNumThreads(opencv_num_threads) + except ImportError: + pass + + # setup OMP threads + # This code is referred from https://github.com/pytorch/pytorch/blob/master/torch/distributed/run.py # noqa + if "OMP_NUM_THREADS" not in os.environ and distributed: + omp_num_threads = 1 + warnings.warn( + "Setting OMP_NUM_THREADS environment variable for each process" + f" to be {omp_num_threads} in default, to avoid your system " + "being overloaded, please further tune the variable for " + "optimal performance in your application as needed." 
+ ) + os.environ["OMP_NUM_THREADS"] = str(omp_num_threads) + + # setup MKL threads + if "MKL_NUM_THREADS" not in os.environ and distributed: + mkl_num_threads = 1 + warnings.warn( + "Setting MKL_NUM_THREADS environment variable for each process" + f" to be {mkl_num_threads} in default, to avoid your system " + "being overloaded, please further tune the variable for " + "optimal performance in your application as needed." + ) + os.environ["MKL_NUM_THREADS"] = str(mkl_num_threads) diff --git a/nanodet/util/flops_counter.py b/nanodet/util/flops_counter.py new file mode 100644 index 0000000..baddd37 --- /dev/null +++ b/nanodet/util/flops_counter.py @@ -0,0 +1,575 @@ +# Modified from flops-counter.pytorch by Vladislav Sovrasov +# original repo: https://github.com/sovrasov/flops-counter.pytorch + +# MIT License + +# Copyright (c) 2018 Vladislav Sovrasov + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: + +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. + +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +import sys +from functools import partial + +import numpy as np +import torch +import torch.nn as nn + + +def get_model_complexity_info( + model, + input_shape, + print_per_layer_stat=True, + as_strings=True, + input_constructor=None, + flush=False, + ost=sys.stdout, +): + """Get complexity information of a model. + This method can calculate FLOPs and parameter counts of a model with + corresponding input shape. It can also print complexity information for + each layer in a model. + Supported layers are listed as below: + - Convolutions: ``nn.Conv1d``, ``nn.Conv2d``, ``nn.Conv3d``. + - Activations: ``nn.ReLU``, ``nn.PReLU``, ``nn.ELU``, ``nn.LeakyReLU``, + ``nn.ReLU6``. + - Poolings: ``nn.MaxPool1d``, ``nn.MaxPool2d``, ``nn.MaxPool3d``, + ``nn.AvgPool1d``, ``nn.AvgPool2d``, ``nn.AvgPool3d``, + ``nn.AdaptiveMaxPool1d``, ``nn.AdaptiveMaxPool2d``, + ``nn.AdaptiveMaxPool3d``, ``nn.AdaptiveAvgPool1d``, + ``nn.AdaptiveAvgPool2d``, ``nn.AdaptiveAvgPool3d``. + - BatchNorms: ``nn.BatchNorm1d``, ``nn.BatchNorm2d``, + ``nn.BatchNorm3d``. + - Linear: ``nn.Linear``. + - Deconvolution: ``nn.ConvTranspose2d``. + - Upsample: ``nn.Upsample``. + Args: + model (nn.Module): The model for complexity calculation. + input_shape (tuple): Input shape used for calculation. + print_per_layer_stat (bool): Whether to print complexity information + for each layer in a model. Default: True. + as_strings (bool): Output FLOPs and params counts in a string form. + Default: True. + input_constructor (None | callable): If specified, it takes a callable + method that generates input. 
otherwise, it will generate a random + tensor with input shape to calculate FLOPs. Default: None. + flush (bool): same as that in :func:`print`. Default: False. + ost (stream): same as ``file`` param in :func:`print`. + Default: sys.stdout. + Returns: + tuple[float | str]: If ``as_strings`` is set to True, it will return + FLOPs and parameter counts in a string format. otherwise, it will + return those in a float number format. + """ + assert type(input_shape) is tuple + assert len(input_shape) >= 1 + assert isinstance(model, nn.Module) + flops_model = add_flops_counting_methods(model) + flops_model.eval() + flops_model.start_flops_count() + if input_constructor: + input = input_constructor(input_shape) + _ = flops_model(**input) + else: + try: + batch = torch.ones(()).new_empty( + (1, *input_shape), + dtype=next(flops_model.parameters()).dtype, + device=next(flops_model.parameters()).device, + ) + except StopIteration: + # Avoid StopIteration for models which have no parameters, + # like `nn.Relu()`, `nn.AvgPool2d`, etc. + batch = torch.ones(()).new_empty((1, *input_shape)) + + _ = flops_model(batch) + + flops_count, params_count = flops_model.compute_average_flops_cost() + if print_per_layer_stat: + print_model_with_flops( + flops_model, flops_count, params_count, ost=ost, flush=flush + ) + flops_model.stop_flops_count() + + if as_strings: + return flops_to_string(flops_count), params_to_string(params_count) + + return flops_count, params_count + + +def flops_to_string(flops, units="GFLOPs", precision=2): + """Convert FLOPs number into a string. + Note that Here we take a multiply-add counts as one FLOP. + Args: + flops (float): FLOPs number to be converted. + units (str | None): Converted FLOPs units. Options are None, 'GFLOPs', + 'MFLOPs', 'KFLOPs', 'FLOPs'. If set to None, it will automatically + choose the most suitable unit for FLOPs. Default: 'GFLOPs'. + precision (int): Digit number after the decimal point. Default: 2. + Returns: + str: The converted FLOPs number with units. + Examples: + >>> flops_to_string(1e9) + '1.0 GFLOPs' + >>> flops_to_string(2e5, 'MFLOPs') + '0.2 MFLOPs' + >>> flops_to_string(3e-9, None) + '3e-09 FLOPs' + """ + if units is None: + if flops // 10**9 > 0: + return str(round(flops / 10.0**9, precision)) + " GFLOPs" + elif flops // 10**6 > 0: + return str(round(flops / 10.0**6, precision)) + " MFLOPs" + elif flops // 10**3 > 0: + return str(round(flops / 10.0**3, precision)) + " KFLOPs" + else: + return str(flops) + " FLOPs" + else: + if units == "GFLOPs": + return str(round(flops / 10.0**9, precision)) + " " + units + elif units == "MFLOPs": + return str(round(flops / 10.0**6, precision)) + " " + units + elif units == "KFLOPs": + return str(round(flops / 10.0**3, precision)) + " " + units + else: + return str(flops) + " FLOPs" + + +def params_to_string(num_params, units=None, precision=2): + """Convert parameter number into a string. + Args: + num_params (float): Parameter number to be converted. + units (str | None): Converted FLOPs units. Options are None, 'M', + 'K' and ''. If set to None, it will automatically choose the most + suitable unit for Parameter number. Default: None. + precision (int): Digit number after the decimal point. Default: 2. + Returns: + str: The converted parameter number with units. 
+ Examples: + >>> params_to_string(1e9) + '1000.0 M' + >>> params_to_string(2e5) + '200.0 k' + >>> params_to_string(3e-9) + '3e-09' + """ + if units is None: + if num_params // 10**6 > 0: + return str(round(num_params / 10**6, precision)) + " M" + elif num_params // 10**3: + return str(round(num_params / 10**3, precision)) + " k" + else: + return str(num_params) + else: + if units == "M": + return str(round(num_params / 10.0**6, precision)) + " " + units + elif units == "K": + return str(round(num_params / 10.0**3, precision)) + " " + units + else: + return str(num_params) + + +def print_model_with_flops( + model, + total_flops, + total_params, + units="GFLOPs", + precision=3, + ost=sys.stdout, + flush=False, +): + """Print a model with FLOPs for each layer. + Args: + model (nn.Module): The model to be printed. + total_flops (float): Total FLOPs of the model. + total_params (float): Total parameter counts of the model. + units (str | None): Converted FLOPs units. Default: 'GFLOPs'. + precision (int): Digit number after the decimal point. Default: 3. + ost (stream): same as `file` param in :func:`print`. + Default: sys.stdout. + flush (bool): same as that in :func:`print`. Default: False. + Example: + >>> class ExampleModel(nn.Module): + >>> def __init__(self): + >>> super().__init__() + >>> self.conv1 = nn.Conv2d(3, 8, 3) + >>> self.conv2 = nn.Conv2d(8, 256, 3) + >>> self.conv3 = nn.Conv2d(256, 8, 3) + >>> self.avg_pool = nn.AdaptiveAvgPool2d((1, 1)) + >>> self.flatten = nn.Flatten() + >>> self.fc = nn.Linear(8, 1) + >>> def forward(self, x): + >>> x = self.conv1(x) + >>> x = self.conv2(x) + >>> x = self.conv3(x) + >>> x = self.avg_pool(x) + >>> x = self.flatten(x) + >>> x = self.fc(x) + >>> return x + >>> model = ExampleModel() + >>> x = (3, 16, 16) + to print the complexity inforamtion state for each layer, you can use + >>> get_model_complexity_info(model, x) + or directly use + >>> print_model_with_flops(model, 4579784.0, 37361) + ExampleModel( + 0.037 M, 100.000% Params, 0.005 GFLOPs, 100.000% FLOPs, + (conv1): Conv2d(0.0 M, 0.600% Params, 0.0 GFLOPs, 0.959% FLOPs, 3, 8, kernel_size=(3, 3), stride=(1, 1)) # noqa: E501 + (conv2): Conv2d(0.019 M, 50.020% Params, 0.003 GFLOPs, 58.760% FLOPs, 8, 256, kernel_size=(3, 3), stride=(1, 1)) + (conv3): Conv2d(0.018 M, 49.356% Params, 0.002 GFLOPs, 40.264% FLOPs, 256, 8, kernel_size=(3, 3), stride=(1, 1)) + (avg_pool): AdaptiveAvgPool2d(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.017% FLOPs, output_size=(1, 1)) + (flatten): Flatten(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, ) + (fc): Linear(0.0 M, 0.024% Params, 0.0 GFLOPs, 0.000% FLOPs, in_features=8, out_features=1, bias=True) + ) + """ + + def accumulate_params(self): + if is_supported_instance(self): + return self.__params__ + else: + sum = 0 + for m in self.children(): + sum += m.accumulate_params() + return sum + + def accumulate_flops(self): + if is_supported_instance(self): + return self.__flops__ / model.__batch_counter__ + else: + sum = 0 + for m in self.children(): + sum += m.accumulate_flops() + return sum + + def flops_repr(self): + accumulated_num_params = self.accumulate_params() + accumulated_flops_cost = self.accumulate_flops() + return ", ".join( + [ + params_to_string( + accumulated_num_params, units="M", precision=precision + ), + "{:.3%} Params".format(accumulated_num_params / total_params), + flops_to_string( + accumulated_flops_cost, units=units, precision=precision + ), + "{:.3%} FLOPs".format(accumulated_flops_cost / total_flops), + self.original_extra_repr(), + ] + ) + + 
def add_extra_repr(m): + m.accumulate_flops = accumulate_flops.__get__(m) + m.accumulate_params = accumulate_params.__get__(m) + flops_extra_repr = flops_repr.__get__(m) + if m.extra_repr != flops_extra_repr: + m.original_extra_repr = m.extra_repr + m.extra_repr = flops_extra_repr + assert m.extra_repr != m.original_extra_repr + + def del_extra_repr(m): + if hasattr(m, "original_extra_repr"): + m.extra_repr = m.original_extra_repr + del m.original_extra_repr + if hasattr(m, "accumulate_flops"): + del m.accumulate_flops + + model.apply(add_extra_repr) + print(model, file=ost, flush=flush) + model.apply(del_extra_repr) + + +def get_model_parameters_number(model): + """Calculate parameter number of a model. + Args: + model (nn.module): The model for parameter number calculation. + Returns: + float: Parameter number of the model. + """ + num_params = sum(p.numel() for p in model.parameters() if p.requires_grad) + return num_params + + +def add_flops_counting_methods(net_main_module): + # adding additional methods to the existing module object, + # this is done this way so that each function has access to self object + net_main_module.start_flops_count = start_flops_count.__get__(net_main_module) + net_main_module.stop_flops_count = stop_flops_count.__get__(net_main_module) + net_main_module.reset_flops_count = reset_flops_count.__get__(net_main_module) + net_main_module.compute_average_flops_cost = compute_average_flops_cost.__get__( + net_main_module + ) # noqa: E501 + + net_main_module.reset_flops_count() + + return net_main_module + + +def compute_average_flops_cost(self): + """Compute average FLOPs cost. + A method to compute average FLOPs cost, which will be available after + `add_flops_counting_methods()` is called on a desired net object. + Returns: + float: Current mean flops consumption per image. + """ + batches_count = self.__batch_counter__ + flops_sum = 0 + for module in self.modules(): + if is_supported_instance(module): + flops_sum += module.__flops__ + params_sum = get_model_parameters_number(self) + return flops_sum / batches_count, params_sum + + +def start_flops_count(self): + """Activate the computation of mean flops consumption per image. + A method to activate the computation of mean flops consumption per image. + which will be available after ``add_flops_counting_methods()`` is called on + a desired net object. It should be called before running the network. + """ + add_batch_counter_hook_function(self) + + def add_flops_counter_hook_function(module): + if is_supported_instance(module): + if hasattr(module, "__flops_handle__"): + return + + else: + handle = module.register_forward_hook(MODULES_MAPPING[type(module)]) + + module.__flops_handle__ = handle + + self.apply(partial(add_flops_counter_hook_function)) + + +def stop_flops_count(self): + """Stop computing the mean flops consumption per image. + A method to stop computing the mean flops consumption per image, which will + be available after ``add_flops_counting_methods()`` is called on a desired + net object. It can be called to pause the computation whenever. + """ + remove_batch_counter_hook_function(self) + self.apply(remove_flops_counter_hook_function) + + +def reset_flops_count(self): + """Reset statistics computed so far. + A method to Reset computed statistics, which will be available after + `add_flops_counting_methods()` is called on a desired net object. 
+ """ + add_batch_counter_variables_or_reset(self) + self.apply(add_flops_counter_variable_or_reset) + + +# ---- Internal functions +def empty_flops_counter_hook(module, input, output): + module.__flops__ += 0 + + +def upsample_flops_counter_hook(module, input, output): + output_size = output[0] + batch_size = output_size.shape[0] + output_elements_count = batch_size + for val in output_size.shape[1:]: + output_elements_count *= val + module.__flops__ += int(output_elements_count) + + +def relu_flops_counter_hook(module, input, output): + active_elements_count = output.numel() + module.__flops__ += int(active_elements_count) + + +def linear_flops_counter_hook(module, input, output): + input = input[0] + output_last_dim = output.shape[ + -1 + ] # pytorch checks dimensions, so here we don't care much + module.__flops__ += int(np.prod(input.shape) * output_last_dim) + + +def pool_flops_counter_hook(module, input, output): + input = input[0] + module.__flops__ += int(np.prod(input.shape)) + + +def bn_flops_counter_hook(module, input, output): + input = input[0] + + batch_flops = np.prod(input.shape) + if module.affine: + batch_flops *= 2 + module.__flops__ += int(batch_flops) + + +def deconv_flops_counter_hook(conv_module, input, output): + # Can have multiple inputs, getting the first one + input = input[0] + + batch_size = input.shape[0] + input_height, input_width = input.shape[2:] + + kernel_height, kernel_width = conv_module.kernel_size + in_channels = conv_module.in_channels + out_channels = conv_module.out_channels + groups = conv_module.groups + + filters_per_channel = out_channels // groups + conv_per_position_flops = ( + kernel_height * kernel_width * in_channels * filters_per_channel + ) + + active_elements_count = batch_size * input_height * input_width + overall_conv_flops = conv_per_position_flops * active_elements_count + bias_flops = 0 + if conv_module.bias is not None: + output_height, output_width = output.shape[2:] + bias_flops = out_channels * batch_size * output_height * output_height + overall_flops = overall_conv_flops + bias_flops + + conv_module.__flops__ += int(overall_flops) + + +def conv_flops_counter_hook(conv_module, input, output): + # Can have multiple inputs, getting the first one + input = input[0] + + batch_size = input.shape[0] + output_dims = list(output.shape[2:]) + + kernel_dims = list(conv_module.kernel_size) + in_channels = conv_module.in_channels + out_channels = conv_module.out_channels + groups = conv_module.groups + + filters_per_channel = out_channels // groups + conv_per_position_flops = ( + int(np.prod(kernel_dims)) * in_channels * filters_per_channel + ) + + active_elements_count = batch_size * int(np.prod(output_dims)) + + overall_conv_flops = conv_per_position_flops * active_elements_count + + bias_flops = 0 + + if conv_module.bias is not None: + + bias_flops = out_channels * active_elements_count + + overall_flops = overall_conv_flops + bias_flops + + conv_module.__flops__ += int(overall_flops) + + +def batch_counter_hook(module, input, output): + batch_size = 1 + if len(input) > 0: + # Can have multiple inputs, getting the first one + input = input[0] + batch_size = len(input) + else: + pass + print( + "Warning! No positional inputs found for a module, " + "assuming batch size is 1." 
+ ) + module.__batch_counter__ += batch_size + + +def add_batch_counter_variables_or_reset(module): + + module.__batch_counter__ = 0 + + +def add_batch_counter_hook_function(module): + if hasattr(module, "__batch_counter_handle__"): + return + + handle = module.register_forward_hook(batch_counter_hook) + module.__batch_counter_handle__ = handle + + +def remove_batch_counter_hook_function(module): + if hasattr(module, "__batch_counter_handle__"): + module.__batch_counter_handle__.remove() + del module.__batch_counter_handle__ + + +def add_flops_counter_variable_or_reset(module): + if is_supported_instance(module): + if hasattr(module, "__flops__") or hasattr(module, "__params__"): + print( + "Warning: variables __flops__ or __params__ are already " + "defined for the module" + + type(module).__name__ + + " ptflops can affect your code!" + ) + module.__flops__ = 0 + module.__params__ = get_model_parameters_number(module) + + +def is_supported_instance(module): + if type(module) in MODULES_MAPPING: + return True + return False + + +def remove_flops_counter_hook_function(module): + if is_supported_instance(module): + if hasattr(module, "__flops_handle__"): + module.__flops_handle__.remove() + del module.__flops_handle__ + + +MODULES_MAPPING = { + # convolutions + nn.Conv1d: conv_flops_counter_hook, + nn.Conv2d: conv_flops_counter_hook, + nn.Conv3d: conv_flops_counter_hook, + # activations + nn.ReLU: relu_flops_counter_hook, + nn.PReLU: relu_flops_counter_hook, + nn.ELU: relu_flops_counter_hook, + nn.LeakyReLU: relu_flops_counter_hook, + nn.ReLU6: relu_flops_counter_hook, + # poolings + nn.MaxPool1d: pool_flops_counter_hook, + nn.AvgPool1d: pool_flops_counter_hook, + nn.AvgPool2d: pool_flops_counter_hook, + nn.MaxPool2d: pool_flops_counter_hook, + nn.MaxPool3d: pool_flops_counter_hook, + nn.AvgPool3d: pool_flops_counter_hook, + nn.AdaptiveMaxPool1d: pool_flops_counter_hook, + nn.AdaptiveAvgPool1d: pool_flops_counter_hook, + nn.AdaptiveMaxPool2d: pool_flops_counter_hook, + nn.AdaptiveAvgPool2d: pool_flops_counter_hook, + nn.AdaptiveMaxPool3d: pool_flops_counter_hook, + nn.AdaptiveAvgPool3d: pool_flops_counter_hook, + # BNs + nn.BatchNorm1d: bn_flops_counter_hook, + nn.BatchNorm2d: bn_flops_counter_hook, + nn.BatchNorm3d: bn_flops_counter_hook, + # FC + nn.Linear: linear_flops_counter_hook, + # Upscale + nn.Upsample: upsample_flops_counter_hook, + # Deconvolution + nn.ConvTranspose2d: deconv_flops_counter_hook, +} diff --git a/nanodet/util/logger.py b/nanodet/util/logger.py new file mode 100644 index 0000000..d726327 --- /dev/null +++ b/nanodet/util/logger.py @@ -0,0 +1,225 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
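To show how the complexity counter in `nanodet/util/flops_counter.py` above is typically driven, here is a small, hedged usage sketch; the toy network is an arbitrary stand-in and not part of this repository:

```python
# Usage sketch for get_model_complexity_info from flops_counter.py above.
import torch.nn as nn

from nanodet.util.flops_counter import get_model_complexity_info

# A throwaway model built mostly from layer types listed in MODULES_MAPPING
# (unsupported layers such as Flatten simply contribute no counted FLOPs).
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1),
)

# input_shape is (C, H, W); a batch dimension of 1 is added internally.
flops, params = get_model_complexity_info(
    toy, (3, 64, 64), print_per_layer_stat=False, as_strings=True
)
print(flops, params)  # string outputs, e.g. "0.0 GFLOPs"; values depend on the model
```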
+ +import logging +import os +import time + +import numpy as np +from pytorch_lightning.loggers import Logger as LightningLoggerBase +from pytorch_lightning.loggers.logger import rank_zero_experiment +from pytorch_lightning.utilities import rank_zero_only +from pytorch_lightning.utilities.cloud_io import get_filesystem +from termcolor import colored + +from .path import mkdir + + +class Logger: + def __init__(self, local_rank, save_dir="./", use_tensorboard=True): + mkdir(local_rank, save_dir) + self.rank = local_rank + fmt = ( + colored("[%(name)s]", "magenta", attrs=["bold"]) + + colored("[%(asctime)s]", "blue") + + colored("%(levelname)s:", "green") + + colored("%(message)s", "white") + ) + logging.basicConfig( + level=logging.INFO, + filename=os.path.join(save_dir, "logs.txt"), + filemode="w", + ) + self.log_dir = os.path.join(save_dir, "logs") + console = logging.StreamHandler() + console.setLevel(logging.INFO) + formatter = logging.Formatter(fmt, datefmt="%m-%d %H:%M:%S") + console.setFormatter(formatter) + logging.getLogger().addHandler(console) + if use_tensorboard: + try: + from torch.utils.tensorboard import SummaryWriter + except ImportError: + raise ImportError( + 'Please run "pip install future tensorboard" to install ' + "the dependencies to use torch.utils.tensorboard " + "(applicable to PyTorch 1.1 or higher)" + ) from None + if self.rank < 1: + logging.info( + "Using Tensorboard, logs will be saved in {}".format(self.log_dir) + ) + self.writer = SummaryWriter(log_dir=self.log_dir) + + def log(self, string): + if self.rank < 1: + logging.info(string) + + def scalar_summary(self, tag, phase, value, step): + if self.rank < 1: + self.writer.add_scalars(tag, {phase: value}, step) + + +class MovingAverage(object): + def __init__(self, val, window_size=50): + self.window_size = window_size + self.reset() + self.push(val) + + def reset(self): + self.queue = [] + + def push(self, val): + self.queue.append(val) + if len(self.queue) > self.window_size: + self.queue.pop(0) + + def avg(self): + return np.mean(self.queue) + + +class AverageMeter(object): + """Computes and stores the average and current value""" + + def __init__(self, val): + self.reset() + self.update(val) + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + if self.count > 0: + self.avg = self.sum / self.count + + +class NanoDetLightningLogger(LightningLoggerBase): + def __init__(self, save_dir="./", **kwargs): + super().__init__() + self._name = "NanoDet" + self._version = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + self.log_dir = os.path.join(save_dir, f"logs-{self._version}") + + self._fs = get_filesystem(save_dir) + self._fs.makedirs(self.log_dir, exist_ok=True) + self._init_logger() + + self._experiment = None + self._kwargs = kwargs + + @property + def name(self): + return self._name + + @property + @rank_zero_experiment + def experiment(self): + r""" + Actual tensorboard object. To use TensorBoard features in your + :class:`~pytorch_lightning.core.lightning.LightningModule` do the following. 
+ + Example:: + + self.logger.experiment.some_tensorboard_function() + + """ + if self._experiment is not None: + return self._experiment + + assert rank_zero_only.rank == 0, "tried to init log dirs in non global_rank=0" + + try: + from torch.utils.tensorboard import SummaryWriter + except ImportError: + raise ImportError( + 'Please run "pip install future tensorboard" to install ' + "the dependencies to use torch.utils.tensorboard " + "(applicable to PyTorch 1.1 or higher)" + ) from None + + self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs) + return self._experiment + + @property + def version(self): + return self._version + + @rank_zero_only + def _init_logger(self): + self.logger = logging.getLogger(name=self.name) + self.logger.setLevel(logging.INFO) + + # create file handler + fh = logging.FileHandler(os.path.join(self.log_dir, "logs.txt")) + fh.setLevel(logging.INFO) + # set file formatter + f_fmt = "[%(name)s][%(asctime)s]%(levelname)s: %(message)s" + file_formatter = logging.Formatter(f_fmt, datefmt="%m-%d %H:%M:%S") + fh.setFormatter(file_formatter) + + # create console handler + ch = logging.StreamHandler() + ch.setLevel(logging.INFO) + # set console formatter + c_fmt = ( + colored("[%(name)s]", "magenta", attrs=["bold"]) + + colored("[%(asctime)s]", "blue") + + colored("%(levelname)s:", "green") + + colored("%(message)s", "white") + ) + console_formatter = logging.Formatter(c_fmt, datefmt="%m-%d %H:%M:%S") + ch.setFormatter(console_formatter) + + # add the handlers to the logger + self.logger.addHandler(fh) + self.logger.addHandler(ch) + + @rank_zero_only + def info(self, string): + self.logger.info(string) + + @rank_zero_only + def log(self, string): + self.logger.info(string) + + @rank_zero_only + def dump_cfg(self, cfg_node): + with open(os.path.join(self.log_dir, "train_cfg.yml"), "w") as f: + cfg_node.dump(stream=f) + + @rank_zero_only + def log_hyperparams(self, params): + self.logger.info(f"hyperparams: {params}") + + @rank_zero_only + def log_metrics(self, metrics, step): + self.logger.info(f"Val_metrics: {metrics}") + for k, v in metrics.items(): + self.experiment.add_scalars("Val_metrics/" + k, {"Val": v}, step) + + @rank_zero_only + def save(self): + super().save() + + @rank_zero_only + def finalize(self, status): + self.experiment.flush() + self.experiment.close() + self.save() diff --git a/nanodet/util/misc.py b/nanodet/util/misc.py new file mode 100644 index 0000000..961b77b --- /dev/null +++ b/nanodet/util/misc.py @@ -0,0 +1,52 @@ +# Modification 2020 RangiLyu +# Copyright 2018-2019 Open-MMLab. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from functools import partial + +import torch + + +def multi_apply(func, *args, **kwargs): + pfunc = partial(func, **kwargs) if kwargs else func + map_results = map(pfunc, *args) + return tuple(map(list, zip(*map_results))) + + +def images_to_levels(target, num_level_anchors): + """Convert targets by image to targets by feature level. + + [target_img0, target_img1] -> [target_level0, target_level1, ...] 
+ """ + target = torch.stack(target, 0) + level_targets = [] + start = 0 + for n in num_level_anchors: + end = start + n + level_targets.append(target[:, start:end].squeeze(0)) + start = end + return level_targets + + +def unmap(data, count, inds, fill=0): + """Unmap a subset of item (data) back to the original set of items (of + size count)""" + if data.dim() == 1: + ret = data.new_full((count,), fill) + ret[inds.type(torch.bool)] = data + else: + new_size = (count,) + data.size()[1:] + ret = data.new_full(new_size, fill) + ret[inds.type(torch.bool), :] = data + return ret diff --git a/nanodet/util/path.py b/nanodet/util/path.py new file mode 100644 index 0000000..85bfa69 --- /dev/null +++ b/nanodet/util/path.py @@ -0,0 +1,34 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os + +from .rank_filter import rank_filter + + +@rank_filter +def mkdir(path): + if not os.path.exists(path): + os.makedirs(path) + + +def collect_files(path, exts): + file_paths = [] + for maindir, subdir, filename_list in os.walk(path): + for filename in filename_list: + file_path = os.path.join(maindir, filename) + ext = os.path.splitext(file_path)[1] + if ext in exts: + file_paths.append(file_path) + return file_paths diff --git a/nanodet/util/rank_filter.py b/nanodet/util/rank_filter.py new file mode 100644 index 0000000..2316b2f --- /dev/null +++ b/nanodet/util/rank_filter.py @@ -0,0 +1,23 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +def rank_filter(func): + def func_filter(local_rank=-1, *args, **kwargs): + if local_rank < 1: + return func(*args, **kwargs) + else: + pass + + return func_filter diff --git a/nanodet/util/scatter_gather.py b/nanodet/util/scatter_gather.py new file mode 100644 index 0000000..5660a81 --- /dev/null +++ b/nanodet/util/scatter_gather.py @@ -0,0 +1,97 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
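Because `mkdir` in `nanodet/util/path.py` above is wrapped by the `rank_filter` decorator, its call signature gains a leading `local_rank` argument and the body only runs on rank 0. The sketch below (with placeholder paths) illustrates that behaviour together with `collect_files`:

```python
# Hypothetical usage of the path helpers above; paths are placeholders.
from nanodet.util.path import collect_files, mkdir

# mkdir is declared as mkdir(path) but decorated with @rank_filter, so the
# first positional argument becomes the process rank. Only ranks < 1 run it.
mkdir(0, "./workspace/lqd")   # rank 0 (or single process): directory is created
mkdir(1, "./workspace/lqd")   # rank 1 and above: the call is silently skipped

# collect_files is undecorated: walk a tree and keep files with given extensions.
configs = collect_files("./config", exts=[".yml", ".yaml"])
print(configs)
```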
+ +import pickle + +import torch +import torch.distributed as dist +from torch.autograd import Variable +from torch.nn.parallel._functions import Scatter + + +def list_scatter(input, target_gpus, chunk_sizes): + ret = [] + for idx, size in enumerate(chunk_sizes): + ret.append(input[:size]) + del input[:size] + return tuple(ret) + + +def scatter(inputs, target_gpus, dim=0, chunk_sizes=None): + """ + Slices variables into approximately equal chunks and + distributes them across given GPUs. Duplicates + references to objects that are not variables. Does not + support Tensors. + """ + + def scatter_map(obj): + if isinstance(obj, Variable): + return Scatter.apply(target_gpus, chunk_sizes, dim, obj) + assert not torch.is_tensor(obj), "Tensors not supported in scatter." + if isinstance(obj, list): + return list_scatter(obj, target_gpus, chunk_sizes) + if isinstance(obj, tuple): + return list(zip(*map(scatter_map, obj))) + if isinstance(obj, dict): + return list(map(type(obj), zip(*map(scatter_map, obj.items())))) + return [obj for targets in target_gpus] + + return scatter_map(inputs) + + +def scatter_kwargs(inputs, kwargs, target_gpus, dim=0, chunk_sizes=None): + r"""Scatter with support for kwargs dictionary""" + inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else [] + kwargs = scatter(kwargs, target_gpus, dim, chunk_sizes) if kwargs else [] + if len(inputs) < len(kwargs): + inputs.extend([() for _ in range(len(kwargs) - len(inputs))]) + elif len(kwargs) < len(inputs): + kwargs.extend([{} for _ in range(len(inputs) - len(kwargs))]) + inputs = tuple(inputs) + kwargs = tuple(kwargs) + return inputs, kwargs + + +def gather_results(result_part): + rank = -1 + world_size = 1 + if dist.is_available() and dist.is_initialized(): + rank = dist.get_rank() + world_size = dist.get_world_size() + + # dump result part to tensor with pickle + part_tensor = torch.tensor( + bytearray(pickle.dumps(result_part)), dtype=torch.uint8, device="cuda" + ) + + # gather all result part tensor shape + shape_tensor = torch.tensor(part_tensor.shape, device="cuda") + shape_list = [shape_tensor.clone() for _ in range(world_size)] + dist.all_gather(shape_list, shape_tensor) + + # padding result part tensor to max length + shape_max = torch.tensor(shape_list).max() + part_send = torch.zeros(shape_max, dtype=torch.uint8, device="cuda") + part_send[: shape_tensor[0]] = part_tensor + part_recv_list = [part_tensor.new_zeros(shape_max) for _ in range(world_size)] + + # gather all result dict + dist.all_gather(part_recv_list, part_send) + + if rank < 1: + all_res = {} + for recv, shape in zip(part_recv_list, shape_list): + all_res.update(pickle.loads(recv[: shape[0]].cpu().numpy().tobytes())) + return all_res diff --git a/nanodet/util/util_mixins.py b/nanodet/util/util_mixins.py new file mode 100644 index 0000000..278aa03 --- /dev/null +++ b/nanodet/util/util_mixins.py @@ -0,0 +1,105 @@ +"""This module defines the :class:`NiceRepr` mixin class, which defines a +``__repr__`` and ``__str__`` method that only depend on a custom ``__nice__`` +method, which you must define. This means you only have to overload one +function instead of two. Furthermore, if the object defines a ``__len__`` +method, then the ``__nice__`` method defaults to something sensible, otherwise +it is treated as abstract and raises ``NotImplementedError``. + +To use simply have your object inherit from :class:`NiceRepr` +(multi-inheritance should be ok). 
+ +This code was copied from the ubelt library: https://github.com/Erotemic/ubelt + +Example: + >>> # Objects that define __nice__ have a default __str__ and __repr__ + >>> class Student(NiceRepr): + ... def __init__(self, name): + ... self.name = name + ... def __nice__(self): + ... return self.name + >>> s1 = Student('Alice') + >>> s2 = Student('Bob') + >>> print(f's1 = {s1}') + >>> print(f's2 = {s2}') + s1 = + s2 = + +Example: + >>> # Objects that define __len__ have a default __nice__ + >>> class Group(NiceRepr): + ... def __init__(self, data): + ... self.data = data + ... def __len__(self): + ... return len(self.data) + >>> g = Group([1, 2, 3]) + >>> print(f'g = {g}') + g = +""" +import warnings + + +class NiceRepr(object): + """Inherit from this class and define ``__nice__`` to "nicely" print your + objects. + + Defines ``__str__`` and ``__repr__`` in terms of ``__nice__`` function + Classes that inherit from :class:`NiceRepr` should redefine ``__nice__``. + If the inheriting class has a ``__len__``, method then the default + ``__nice__`` method will return its length. + + Example: + >>> class Foo(NiceRepr): + ... def __nice__(self): + ... return 'info' + >>> foo = Foo() + >>> assert str(foo) == '' + >>> assert repr(foo).startswith('>> class Bar(NiceRepr): + ... pass + >>> bar = Bar() + >>> import pytest + >>> with pytest.warns(None) as record: + >>> assert 'object at' in str(bar) + >>> assert 'object at' in repr(bar) + + Example: + >>> class Baz(NiceRepr): + ... def __len__(self): + ... return 5 + >>> baz = Baz() + >>> assert str(baz) == '' + """ + + def __nice__(self): + """str: a "nice" summary string describing this module""" + if hasattr(self, "__len__"): + # It is a common pattern for objects to use __len__ in __nice__ + # As a convenience we define a default __nice__ for these objects + return str(len(self)) + else: + # In all other cases force the subclass to overload __nice__ + raise NotImplementedError( + f"Define the __nice__ method for {self.__class__!r}" + ) + + def __repr__(self): + """str: the string of the module""" + try: + nice = self.__nice__() + classname = self.__class__.__name__ + return f"<{classname}({nice}) at {hex(id(self))}>" + except NotImplementedError as ex: + warnings.warn(str(ex), category=RuntimeWarning) + return object.__repr__(self) + + def __str__(self): + """str: the string of the module""" + try: + classname = self.__class__.__name__ + nice = self.__nice__() + return f"<{classname}({nice})>" + except NotImplementedError as ex: + warnings.warn(str(ex), category=RuntimeWarning) + return object.__repr__(self) diff --git a/nanodet/util/visualization.py b/nanodet/util/visualization.py new file mode 100644 index 0000000..44badcd --- /dev/null +++ b/nanodet/util/visualization.py @@ -0,0 +1,742 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
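As a concrete, made-up illustration of the `NiceRepr` mixin above: any subclass that defines `__len__` gets a readable `str`/`repr` for free, roughly as follows:

```python
# Minimal sketch of NiceRepr; the Batch class is invented for illustration.
from nanodet.util.util_mixins import NiceRepr


class Batch(NiceRepr):
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)


b = Batch([1, 2, 3])
print(str(b))   # <Batch(3)>           -- default __nice__ falls back to len()
print(repr(b))  # <Batch(3) at 0x...>  -- repr additionally embeds the object id
```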
+ +import cv2 +import matplotlib as mpl +import matplotlib.figure as mplfigure +import numpy as np +import pycocotools.mask as mask_util +from matplotlib.backends.backend_agg import FigureCanvasAgg + +_SMALL_OBJECT_AREA_THRESH = 1000 + + +def overlay_bbox_cv(img, dets, class_names, score_thresh): + all_box = [] + for label in dets: + for bbox in dets[label]: + score = bbox[-1] + if score > score_thresh: + x0, y0, x1, y1 = [int(i) for i in bbox[:4]] + all_box.append([label, x0, y0, x1, y1, score]) + all_box.sort(key=lambda v: v[5]) + for box in all_box: + label, x0, y0, x1, y1, score = box + # color = self.cmap(i)[:3] + color = (_COLORS[label] * 255).astype(np.uint8).tolist() + text = "{}:{:.1f}%".format(class_names[label], score * 100) + txt_color = (0, 0, 0) if np.mean(_COLORS[label]) > 0.5 else (255, 255, 255) + font = cv2.FONT_HERSHEY_SIMPLEX + txt_size = cv2.getTextSize(text, font, 0.5, 2)[0] + cv2.rectangle(img, (x0, y0), (x1, y1), color, 2) + + cv2.rectangle( + img, + (x0, y0 - txt_size[1] - 1), + (x0 + txt_size[0] + txt_size[1], y0 - 1), + color, + -1, + ) + cv2.putText(img, text, (x0, y0 - 1), font, 0.5, txt_color, thickness=1) + return img,all_box + + +def rand_cmap( + nlabels, + type="bright", + first_color_black=False, + last_color_black=False, + verbose=False, +): + """ + Creates a random colormap to be used together with matplotlib. + Useful for segmentation tasks + :param nlabels: Number of labels (size of colormap) + :param type: 'bright' for strong colors, 'soft' for pastel colors + :param first_color_black: Option to use first color as black, True or False + :param last_color_black: Option to use last color as black, True or False + :param verbose: Prints the number of labels and shows the colormap. True or False + :return: colormap for matplotlib + """ + import colorsys + + import numpy as np + from matplotlib.colors import LinearSegmentedColormap + + if type not in ("bright", "soft"): + print('Please choose "bright" or "soft" for type') + return + + if verbose: + print("Number of labels: " + str(nlabels)) + + # Generate color map for bright colors, based on hsv + if type == "bright": + randHSVcolors = [ + ( + np.random.uniform(low=0.0, high=1), + np.random.uniform(low=0.2, high=1), + np.random.uniform(low=0.9, high=1), + ) + for i in range(nlabels) + ] + + # Convert HSV list to RGB + randRGBcolors = [] + for HSVcolor in randHSVcolors: + randRGBcolors.append( + colorsys.hsv_to_rgb(HSVcolor[0], HSVcolor[1], HSVcolor[2]) + ) + + if first_color_black: + randRGBcolors[0] = [0, 0, 0] + + if last_color_black: + randRGBcolors[-1] = [0, 0, 0] + + random_colormap = LinearSegmentedColormap.from_list( + "new_map", randRGBcolors, N=nlabels + ) + + # Generate soft pastel colors, by limiting the RGB spectrum + if type == "soft": + low = 0.6 + high = 0.95 + randRGBcolors = [ + ( + np.random.uniform(low=low, high=high), + np.random.uniform(low=low, high=high), + np.random.uniform(low=low, high=high), + ) + for i in range(nlabels) + ] + + if first_color_black: + randRGBcolors[0] = [0, 0, 0] + + if last_color_black: + randRGBcolors[-1] = [0, 0, 0] + random_colormap = LinearSegmentedColormap.from_list( + "new_map", randRGBcolors, N=nlabels + ) + + return random_colormap + + +class VisImage: + """ + Visualize detection results. 
+ + Modified from Detectron2 + https://github.com/facebookresearch/detectron2 + """ + + def __init__(self, img, scale=1.0): + self.img = img + self.scale = scale + self.width, self.height = img.shape[1], img.shape[0] + self._setup_figure(img) + + def _setup_figure(self, img): + """ + Args: + Same as in :meth:`__init__()`. + + Returns: + fig (matplotlib.pyplot.figure): top level container for all the + image plot elements. + ax (matplotlib.pyplot.Axes): contains figure elements and sets + the coordinate system. + """ + fig = mplfigure.Figure(frameon=False) + self.dpi = fig.get_dpi() + # add a small 1e-2 to avoid precision lost due to matplotlib's truncation + # (https://github.com/matplotlib/matplotlib/issues/15363) + fig.set_size_inches( + (self.width * self.scale + 1e-2) / self.dpi, + (self.height * self.scale + 1e-2) / self.dpi, + ) + self.canvas = FigureCanvasAgg(fig) + # self.canvas = mpl.backends.backend_cairo.FigureCanvasCairo(fig) + ax = fig.add_axes([0.0, 0.0, 1.0, 1.0]) + ax.axis("off") + ax.set_xlim(0.0, self.width) + ax.set_ylim(self.height) + + self.fig = fig + self.ax = ax + + def save(self, filepath): + """ + Args: + filepath (str): a string that contains the absolute path, including + the file name, where the visualized image will be saved. + """ + if filepath.lower().endswith(".jpg") or filepath.lower().endswith(".png"): + # faster than matplotlib's imshow + cv2.imwrite(filepath, self.get_image()[:, :, ::-1]) + else: + # support general formats (e.g. pdf) + self.ax.imshow(self.img, interpolation="nearest") + self.fig.savefig(filepath) + + def get_image(self): + """ + Returns: + ndarray: + the visualized image of shape (H, W, 3) (RGB) in uint8 type. + The shape is scaled w.r.t the input image using the given + `scale` argument. + """ + canvas = self.canvas + s, (width, height) = canvas.print_to_buffer() + if (self.width, self.height) != (width, height): + img = cv2.resize(self.img, (width, height)) + else: + img = self.img + + # buf = io.BytesIO() # works for cairo backend + # canvas.print_rgba(buf) + # width, height = self.width, self.height + # s = buf.getvalue() + + buffer = np.frombuffer(s, dtype="uint8") + + # imshow is slow. 
blend manually (still quite slow) + img_rgba = buffer.reshape(height, width, 4) + rgb, alpha = np.split(img_rgba, [3], axis=2) + + try: + import numexpr as ne # fuse them with numexpr + + visualized_image = ne.evaluate( + "img * (1 - alpha / 255.0) + rgb * (alpha / 255.0)" + ) + except ImportError: + alpha = alpha.astype("float32") / 255.0 + visualized_image = img * (1 - alpha) + rgb * alpha + + visualized_image = visualized_image.astype("uint8") + + return visualized_image + + +class Visualizer: + def __init__(self, img, dets, class_names, socre_thresh): + self.img = img + self.dets = dets + self.class_names = class_names + self.num_classes = len(self.class_names) + self.score_thresh = socre_thresh + self.viz = VisImage(img=self.img) + self._default_font_size = max( + np.sqrt(self.viz.height * self.viz.width) // 100, 10 + ) + + def mask_to_polygon(self, mask, need_binary=True): + res = cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE) + hierarchy = res[-1] + if hierarchy is None: # empty mask + return None, None, None + has_holes = (hierarchy.reshape(-1, 4)[:, 3] >= 0).sum() > 0 + res = res[-2] + res = [x.flatten() for x in res] + res = [x for x in res if len(x) >= 6] + + p = mask_util.frPyObjects(res, self.viz.height, self.viz.width) + p = mask_util.merge(p) + bbox = mask_util.toBbox(p) + bbox[2] += bbox[0] + bbox[3] += bbox[1] + + return res, bbox, has_holes + + def draw_box(self, box_coord, alpha=0.5, edge_color="g", line_style="-"): + x0, y0, x1, y1 = box_coord + width = x1 - x0 + height = y1 - y0 + linewidth = max(self._default_font_size / 6, 1) + self.viz.ax.add_patch( + mpl.patches.Rectangle( + (x0, y0), + width, + height, + fill=False, + edgecolor=edge_color, + linewidth=linewidth * self.viz.scale, + alpha=alpha, + linestyle=line_style, + ) + ) + return self.viz + + def draw_polycon(self, mask, color, edge_color, alpha=0.5): + if edge_color is None: + edge_color = color + edge_color = mpl.colors.to_rgb(edge_color) + (1,) + + polygon = mpl.patches.Polygon( + mask, + fill=False, + # facecolor=mpl.colors.to_rgb(color) + (alpha,), + edgecolor=edge_color, + linewidth=max(self._default_font_size // 15 * self.viz.scale, 1), + ) + self.viz.ax.add_patch(polygon) + return self.viz + + def draw_mask(self, mask, polys, color, edge_color, alpha=0.5): + if edge_color is None: + edge_color = color + edge_color = mpl.colors.to_rgb(edge_color) + (1,) + color_mask = np.ones((mask.shape[0], mask.shape[1], 3)) + for i in range(3): + color_mask[:, :, i] = color[i] + self.viz.ax.imshow(np.dstack((color_mask, mask * alpha))) + for ploy in polys: + self.draw_polycon(ploy.reshape(-1, 2), color, edge_color=None, alpha=alpha) + + def _jitter(self, color): + """ + Randomly modifies given color to produce a slightly different color than + the color given. + + Args: + color (tuple[double]): a tuple of 3 elements, containing the RGB + values of the color picked. The values in the list are in the + [0.0, 1.0] range. + + Returns: + jittered_color (tuple[double]): a tuple of 3 elements, containing + the RGB values of the color after being jittered. The values + in the list are in the [0.0, 1.0] range. 
+ """ + color = mpl.colors.to_rgb(color) + vec = np.random.rand(3) + # better to do it in another color space + vec = vec / np.linalg.norm(vec) * 0.5 + res = np.clip(vec + color, 0, 1) + return tuple(res) + + def overlay_bbox(self, alpha=1.0): + for label in self.dets: + for bbox in self.dets[label]: + x0, y0, x1, y1, score = bbox + if score >= self.score_thresh: + # color = self.cmap(i)[:3] + color = _COLORS[label] + text = "{}:{:.1f}%".format(self.class_names[label], score * 100) + self.draw_box(bbox[:4], alpha=1.0, edge_color=color, line_style="-") + text_pos = (x0, y0) + instance_area = (y1 - y0) * (x1 - x0) + if ( + instance_area < _SMALL_OBJECT_AREA_THRESH * self.viz.scale + or y1 - y0 < 40 * self.viz.scale + ): + if y1 >= self.viz.height - 5: + text_pos = (x1, y0) + else: + text_pos = (x0, y1) + + height_ratio = (y1 - y0) / np.sqrt(self.viz.height * self.viz.width) + font_size = ( + np.clip((height_ratio - 0.02) / 0.08 + 1, 1.2, 2) + * 0.5 + * self._default_font_size + ) + + self.draw_text( + text, + text_pos, + color="black", + horizontal_alignment="left", + font_size=font_size, + ) + out = self.viz.get_image() + return out + + def overlay_masks(self, alpha=0.5): + ov = self.img.copy() + im = self.img # .astype(np.float32) + total_ma = np.zeros([im.shape[0], im.shape[1]]) + total_contours = [] + for i, det in enumerate(self.dets[::-1]): + score = det["score"] + if score >= self.score_thresh: + ma = det["mask"] + _, ma = cv2.threshold( + ma, thresh=127, maxval=255, type=cv2.THRESH_BINARY + ) + fg = ( + im * alpha + + np.ones(im.shape) * (1 - alpha) * self.cmap(i)[:3] * 255 + ) + ov[ma == 255] = fg[ma == 255] + total_ma += ma + contours = cv2.findContours( + ma.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE + )[-2:] + total_contours.append(contours) + for cnt in total_contours: + cv2.drawContours(ov, cnt[0], -1, (0.0, 0.0, 0.0), 1) + ov[total_ma == 0] = im[total_ma == 0] + return ov + + def overlay_instance(self, alpha=0.4): + for i, det in enumerate(self.dets[::-1]): + score = det["score"] + if score >= self.score_thresh: + label = det["label"] + binary_mask = det["mask"] + # color = self.cmap(i)[:3] + color = _COLORS[label] + color = self._jitter(color) + contours, bbox, has_holes = self.mask_to_polygon(binary_mask.copy()) + if not contours: + continue + self.draw_mask( + binary_mask, contours, color, edge_color=None, alpha=alpha + ) + + x0, y0, x1, y1 = bbox + text = "{}:{:.1f}%".format(self.class_names[label], score * 100) + text_pos = np.median(binary_mask.nonzero(), axis=1)[::-1] + instance_area = (y1 - y0) * (x1 - x0) + if ( + instance_area < _SMALL_OBJECT_AREA_THRESH * self.viz.scale + or y1 - y0 < 40 * self.viz.scale + ): + if y1 >= self.viz.height - 5: + text_pos = (x1, y0) + else: + text_pos = (x0, y1) + + height_ratio = (y1 - y0) / np.sqrt(self.viz.height * self.viz.width) + font_size = ( + np.clip((height_ratio - 0.02) / 0.08 + 1, 1.2, 2) + * 0.5 + * self._default_font_size + ) + + self.draw_text( + text, + text_pos, + color="black", + horizontal_alignment="center", + font_size=font_size, + ) + out = self.viz.get_image() + return out + + def draw_text( + self, + text, + position, + *, + font_size=None, + color="g", + horizontal_alignment="center", + rotation=0 + ): + """ + Args: + text (str): class label + position (tuple): a tuple of the x and y coordinates to place text on image. + font_size (int, optional): font of the text. If not provided, a font size + proportional to the image width is calculated and used. + color: color of the text. 
Refer to `matplotlib.colors` for full list + of formats that are accepted. + horizontal_alignment (str): see `matplotlib.text.Text` + rotation: rotation angle in degrees CCW + + Returns: + output (VisImage): image object with text drawn. + """ + if not font_size: + font_size = self._default_font_size + + # since the text background is dark, we don't want the text to be dark + color = np.maximum(list(mpl.colors.to_rgb(color)), 0.2) + color[np.argmax(color)] = max(0.8, np.max(color)) + + x, y = position + self.viz.ax.text( + x, + y, + text, + size=font_size * self.viz.scale, + family="sans-serif", + bbox={ + "facecolor": (0.5, 0.5, 1.0), + "alpha": 0.8, + "pad": 0.7, + "edgecolor": (0.8, 0.8, 1.0), + }, + verticalalignment="top", + horizontalalignment=horizontal_alignment, + color=color, + zorder=10, + rotation=rotation, + ) + return self.viz + + +_COLORS = ( + np.array( + [ + 0.000, + 0.447, + 0.741, + 0.850, + 0.325, + 0.098, + 0.929, + 0.694, + 0.125, + 0.494, + 0.184, + 0.556, + 0.466, + 0.674, + 0.188, + 0.301, + 0.745, + 0.933, + 0.635, + 0.078, + 0.184, + 0.300, + 0.300, + 0.300, + 0.600, + 0.600, + 0.600, + 1.000, + 0.000, + 0.000, + 1.000, + 0.500, + 0.000, + 0.749, + 0.749, + 0.000, + 0.000, + 1.000, + 0.000, + 0.000, + 0.000, + 1.000, + 0.667, + 0.000, + 1.000, + 0.333, + 0.333, + 0.000, + 0.333, + 0.667, + 0.000, + 0.333, + 1.000, + 0.000, + 0.667, + 0.333, + 0.000, + 0.667, + 0.667, + 0.000, + 0.667, + 1.000, + 0.000, + 1.000, + 0.333, + 0.000, + 1.000, + 0.667, + 0.000, + 1.000, + 1.000, + 0.000, + 0.000, + 0.333, + 0.500, + 0.000, + 0.667, + 0.500, + 0.000, + 1.000, + 0.500, + 0.333, + 0.000, + 0.500, + 0.333, + 0.333, + 0.500, + 0.333, + 0.667, + 0.500, + 0.333, + 1.000, + 0.500, + 0.667, + 0.000, + 0.500, + 0.667, + 0.333, + 0.500, + 0.667, + 0.667, + 0.500, + 0.667, + 1.000, + 0.500, + 1.000, + 0.000, + 0.500, + 1.000, + 0.333, + 0.500, + 1.000, + 0.667, + 0.500, + 1.000, + 1.000, + 0.500, + 0.000, + 0.333, + 1.000, + 0.000, + 0.667, + 1.000, + 0.000, + 1.000, + 1.000, + 0.333, + 0.000, + 1.000, + 0.333, + 0.333, + 1.000, + 0.333, + 0.667, + 1.000, + 0.333, + 1.000, + 1.000, + 0.667, + 0.000, + 1.000, + 0.667, + 0.333, + 1.000, + 0.667, + 0.667, + 1.000, + 0.667, + 1.000, + 1.000, + 1.000, + 0.000, + 1.000, + 1.000, + 0.333, + 1.000, + 1.000, + 0.667, + 1.000, + 0.333, + 0.000, + 0.000, + 0.500, + 0.000, + 0.000, + 0.667, + 0.000, + 0.000, + 0.833, + 0.000, + 0.000, + 1.000, + 0.000, + 0.000, + 0.000, + 0.167, + 0.000, + 0.000, + 0.333, + 0.000, + 0.000, + 0.500, + 0.000, + 0.000, + 0.667, + 0.000, + 0.000, + 0.833, + 0.000, + 0.000, + 1.000, + 0.000, + 0.000, + 0.000, + 0.167, + 0.000, + 0.000, + 0.333, + 0.000, + 0.000, + 0.500, + 0.000, + 0.000, + 0.667, + 0.000, + 0.000, + 0.833, + 0.000, + 0.000, + 1.000, + 0.000, + 0.000, + 0.000, + 0.143, + 0.143, + 0.143, + 0.286, + 0.286, + 0.286, + 0.429, + 0.429, + 0.429, + 0.571, + 0.571, + 0.571, + 0.714, + 0.714, + 0.714, + 0.857, + 0.857, + 0.857, + 0.000, + 0.447, + 0.741, + 0.314, + 0.717, + 0.741, + 0.50, + 0.5, + 0, + ] + ) + .astype(np.float32) + .reshape(-1, 3) +) diff --git a/nanodet/util/yacs.py b/nanodet/util/yacs.py new file mode 100644 index 0000000..1cbe16c --- /dev/null +++ b/nanodet/util/yacs.py @@ -0,0 +1,531 @@ +# Copyright (c) 2018-present, Facebook, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +############################################################################## +"""YACS -- Yet Another Configuration System is designed to be a simple +configuration management system for academic and industrial research +projects. + +See README.md for usage and examples. +""" + +import copy +import io +import logging +import os +import sys +from ast import literal_eval + +import yaml + +# Flag for py2 and py3 compatibility to use when separate code paths are necessary +# When _PY2 is False, we assume Python 3 is in use +_PY2 = sys.version_info.major == 2 + +# Filename extensions for loading configs from files +_YAML_EXTS = {"", ".yaml", ".yml"} +_PY_EXTS = {".py"} + +_FILE_TYPES = (io.IOBase,) + +# CfgNodes can only contain a limited set of valid types +_VALID_TYPES = {tuple, list, str, int, float, bool, type(None)} +# py2 allow for str and unicode +if _PY2: + _VALID_TYPES = _VALID_TYPES.union({unicode}) # noqa: F821 + +# Utilities for importing modules from file paths +if _PY2: + # imp is available in both py2 and py3 for now, but is deprecated in py3 + import imp +else: + import importlib.util + +logger = logging.getLogger(__name__) + + +class CfgNode(dict): + """ + CfgNode represents an internal node in the configuration tree. It's a simple + dict-like container that allows for attribute-based access to keys. + """ + + IMMUTABLE = "__immutable__" + DEPRECATED_KEYS = "__deprecated_keys__" + RENAMED_KEYS = "__renamed_keys__" + NEW_ALLOWED = "__new_allowed__" + + def __init__(self, init_dict=None, key_list=None, new_allowed=False): + """ + Args: + init_dict (dict): the possibly-nested dictionary to initailize the + CfgNode. + key_list (list[str]): a list of names which index this CfgNode from + the root. + Currently only used for logging purposes. + new_allowed (bool): whether adding new key is allowed when merging with + other configs. + """ + # Recursively convert nested dictionaries in init_dict into CfgNodes + init_dict = {} if init_dict is None else init_dict + key_list = [] if key_list is None else key_list + init_dict = self._create_config_tree_from_dict(init_dict, key_list) + super(CfgNode, self).__init__(init_dict) + # Manage if the CfgNode is frozen or not + self.__dict__[CfgNode.IMMUTABLE] = False + # Deprecated options + # If an option is removed from the code and you don't want to break existing + # yaml configs, you can add the full config key as a string to the set below. + self.__dict__[CfgNode.DEPRECATED_KEYS] = set() + # Renamed options + # If you rename a config option, record the mapping from the old name to the + # new name in the dictionary below. Optionally, if the type also changed, you + # can make the value a tuple that specifies first the renamed key and then + # instructions for how to edit the config file. 
+ self.__dict__[CfgNode.RENAMED_KEYS] = { + # 'EXAMPLE.OLD.KEY': 'EXAMPLE.NEW.KEY', # Dummy example to follow + # 'EXAMPLE.OLD.KEY': ( # A more complex example to follow + # 'EXAMPLE.NEW.KEY', + # "Also convert to a tuple, e.g., 'foo' -> ('foo',) or " + # + "'foo:bar' -> ('foo', 'bar')" + # ), + } + + # Allow new attributes after initialisation + self.__dict__[CfgNode.NEW_ALLOWED] = new_allowed + + @classmethod + def _create_config_tree_from_dict(cls, dic, key_list): + """ + Create a configuration tree using the given dict. + Any dict-like objects inside dict will be treated as a new CfgNode. + + Args: + dic (dict): + key_list (list[str]): a list of names which index this CfgNode from + the root. Currently only used for logging purposes. + """ + dic = copy.deepcopy(dic) + for k, v in dic.items(): + if isinstance(v, dict): + # Convert dict to CfgNode + dic[k] = cls(v, key_list=key_list + [k]) + else: + # Check for valid leaf type or nested CfgNode + _assert_with_logging( + _valid_type(v, allow_cfg_node=False), + "Key {} with value {} is not a valid type; valid types: {}".format( + ".".join(key_list + [k]), type(v), _VALID_TYPES + ), + ) + return dic + + def __getattr__(self, name): + if name in self: + return self[name] + else: + raise AttributeError(name) + + def __setattr__(self, name, value): + if self.is_frozen(): + raise AttributeError( + "Attempted to set {} to {}, but CfgNode is immutable".format( + name, value + ) + ) + + _assert_with_logging( + name not in self.__dict__, + "Invalid attempt to modify internal CfgNode state: {}".format(name), + ) + _assert_with_logging( + _valid_type(value, allow_cfg_node=True), + "Invalid type {} for key {}; valid types = {}".format( + type(value), name, _VALID_TYPES + ), + ) + + self[name] = value + + def __str__(self): + def _indent(s_, num_spaces): + s = s_.split("\n") + if len(s) == 1: + return s_ + first = s.pop(0) + s = [(num_spaces * " ") + line for line in s] + s = "\n".join(s) + s = first + "\n" + s + return s + + r = "" + s = [] + for k, v in sorted(self.items()): + seperator = "\n" if isinstance(v, CfgNode) else " " + attr_str = "{}:{}{}".format(str(k), seperator, str(v)) + attr_str = _indent(attr_str, 2) + s.append(attr_str) + r += "\n".join(s) + return r + + def __repr__(self): + return "{}({})".format(self.__class__.__name__, super(CfgNode, self).__repr__()) + + def dump(self, **kwargs): + """Dump to a string.""" + + def convert_to_dict(cfg_node, key_list): + if not isinstance(cfg_node, CfgNode): + _assert_with_logging( + _valid_type(cfg_node), + "Key {} with value {} is not a valid type; valid types: {}".format( + ".".join(key_list), type(cfg_node), _VALID_TYPES + ), + ) + return cfg_node + else: + cfg_dict = dict(cfg_node) + for k, v in cfg_dict.items(): + cfg_dict[k] = convert_to_dict(v, key_list + [k]) + return cfg_dict + + self_as_dict = convert_to_dict(self, []) + return yaml.safe_dump(self_as_dict, **kwargs) + + def merge_from_file(self, cfg_filename): + """Load a yaml config file and merge it this CfgNode.""" + with open(cfg_filename, "r", encoding="utf-8") as f: + cfg = self.load_cfg(f) + self.merge_from_other_cfg(cfg) + + def merge_from_other_cfg(self, cfg_other): + """Merge `cfg_other` into this CfgNode.""" + _merge_a_into_b(cfg_other, self, self, []) + + def merge_from_list(self, cfg_list): + """Merge config (keys, values) in a list (e.g., from command line) into + this CfgNode. For example, `cfg_list = ['FOO.BAR', 0.5]`. 
+ """ + _assert_with_logging( + len(cfg_list) % 2 == 0, + "Override list has odd length: {}; it must be a list of pairs".format( + cfg_list + ), + ) + root = self + for full_key, v in zip(cfg_list[0::2], cfg_list[1::2]): + if root.key_is_deprecated(full_key): + continue + if root.key_is_renamed(full_key): + root.raise_key_rename_error(full_key) + key_list = full_key.split(".") + d = self + for subkey in key_list[:-1]: + _assert_with_logging( + subkey in d, "Non-existent key: {}".format(full_key) + ) + d = d[subkey] + subkey = key_list[-1] + _assert_with_logging(subkey in d, "Non-existent key: {}".format(full_key)) + value = self._decode_cfg_value(v) + value = _check_and_coerce_cfg_value_type(value, d[subkey], subkey, full_key) + d[subkey] = value + + def freeze(self): + """Make this CfgNode and all of its children immutable.""" + self._immutable(True) + + def defrost(self): + """Make this CfgNode and all of its children mutable.""" + self._immutable(False) + + def is_frozen(self): + """Return mutability.""" + return self.__dict__[CfgNode.IMMUTABLE] + + def _immutable(self, is_immutable): + """Set immutability to is_immutable and recursively apply the setting + to all nested CfgNodes. + """ + self.__dict__[CfgNode.IMMUTABLE] = is_immutable + # Recursively set immutable state + for v in self.__dict__.values(): + if isinstance(v, CfgNode): + v._immutable(is_immutable) + for v in self.values(): + if isinstance(v, CfgNode): + v._immutable(is_immutable) + + def clone(self): + """Recursively copy this CfgNode.""" + return copy.deepcopy(self) + + def register_deprecated_key(self, key): + """Register key (e.g. `FOO.BAR`) a deprecated option. When merging deprecated + keys a warning is generated and the key is ignored. + """ + _assert_with_logging( + key not in self.__dict__[CfgNode.DEPRECATED_KEYS], + "key {} is already registered as a deprecated key".format(key), + ) + self.__dict__[CfgNode.DEPRECATED_KEYS].add(key) + + def register_renamed_key(self, old_name, new_name, message=None): + """Register a key as having been renamed from `old_name` to `new_name`. + When merging a renamed key, an exception is thrown alerting to user to + the fact that the key has been renamed. + """ + _assert_with_logging( + old_name not in self.__dict__[CfgNode.RENAMED_KEYS], + "key {} is already registered as a renamed cfg key".format(old_name), + ) + value = new_name + if message: + value = (new_name, message) + self.__dict__[CfgNode.RENAMED_KEYS][old_name] = value + + def key_is_deprecated(self, full_key): + """Test if a key is deprecated.""" + if full_key in self.__dict__[CfgNode.DEPRECATED_KEYS]: + logger.warning("Deprecated config key (ignoring): {}".format(full_key)) + return True + return False + + def key_is_renamed(self, full_key): + """Test if a key is renamed.""" + return full_key in self.__dict__[CfgNode.RENAMED_KEYS] + + def raise_key_rename_error(self, full_key): + new_key = self.__dict__[CfgNode.RENAMED_KEYS][full_key] + if isinstance(new_key, tuple): + msg = " Note: " + new_key[1] + new_key = new_key[0] + else: + msg = "" + raise KeyError( + "Key {} was renamed to {}; please update your config.{}".format( + full_key, new_key, msg + ) + ) + + def is_new_allowed(self): + return self.__dict__[CfgNode.NEW_ALLOWED] + + @classmethod + def load_cfg(cls, cfg_file_obj_or_str): + """ + Load a cfg. 
+ Args: + cfg_file_obj_or_str (str or file): + Supports loading from: + - A file object backed by a YAML file + - A file object backed by a Python source file that exports an attribute + "cfg" that is either a dict or a CfgNode + - A string that can be parsed as valid YAML + """ + _assert_with_logging( + isinstance(cfg_file_obj_or_str, _FILE_TYPES + (str,)), + "Expected first argument to be of type {} or {}, but it was {}".format( + _FILE_TYPES, str, type(cfg_file_obj_or_str) + ), + ) + if isinstance(cfg_file_obj_or_str, str): + return cls._load_cfg_from_yaml_str(cfg_file_obj_or_str) + elif isinstance(cfg_file_obj_or_str, _FILE_TYPES): + return cls._load_cfg_from_file(cfg_file_obj_or_str) + else: + raise NotImplementedError("Impossible to reach here (unless there's a bug)") + + @classmethod + def _load_cfg_from_file(cls, file_obj): + """Load a config from a YAML file or a Python source file.""" + _, file_extension = os.path.splitext(file_obj.name) + if file_extension in _YAML_EXTS: + return cls._load_cfg_from_yaml_str(file_obj.read()) + elif file_extension in _PY_EXTS: + return cls._load_cfg_py_source(file_obj.name) + else: + raise Exception( + "Attempt to load from an unsupported file type {}; " + "only {} are supported".format(file_obj, _YAML_EXTS.union(_PY_EXTS)) + ) + + @classmethod + def _load_cfg_from_yaml_str(cls, str_obj): + """Load a config from a YAML string encoding.""" + cfg_as_dict = yaml.safe_load(str_obj) + return cls(cfg_as_dict) + + @classmethod + def _load_cfg_py_source(cls, filename): + """Load a config from a Python source file.""" + module = _load_module_from_file("yacs.config.override", filename) + _assert_with_logging( + hasattr(module, "cfg"), + "Python module from file {} must have 'cfg' attr".format(filename), + ) + VALID_ATTR_TYPES = {dict, CfgNode} + _assert_with_logging( + type(module.cfg) in VALID_ATTR_TYPES, + "Imported module 'cfg' attr must be in {} but is {} instead".format( + VALID_ATTR_TYPES, type(module.cfg) + ), + ) + return cls(module.cfg) + + @classmethod + def _decode_cfg_value(cls, value): + """ + Decodes a raw config value (e.g., from a yaml config files or command + line argument) into a Python object. + + If the value is a dict, it will be interpreted as a new CfgNode. + If the value is a str, it will be evaluated as literals. + Otherwise it is returned as-is. + """ + # Configs parsed from raw yaml will contain dictionary keys that need to be + # converted to CfgNode objects + if isinstance(value, dict): + return cls(value) + # All remaining processing is only applied to strings + if not isinstance(value, str): + return value + # Try to interpret `value` as a: + # string, number, tuple, list, dict, boolean, or None + try: + value = literal_eval(value) + # The following two excepts allow v to pass through when it represents a + # string. + # + # Longer explanation: + # The type of v is always a string (before calling literal_eval), but + # sometimes it *represents* a string and other times a data structure, like + # a list. In the case that v represents a string, what we got back from the + # yaml parser is 'foo' *without quotes* (so, not '"foo"'). literal_eval is + # ok with '"foo"', but will raise a ValueError if given 'foo'. In other + # cases, like paths (v = 'foo/bar' and not v = '"foo/bar"'), literal_eval + # will raise a SyntaxError. 
+ except ValueError: + pass + except SyntaxError: + pass + return value + + +load_cfg = ( + CfgNode.load_cfg +) # keep this function in global scope for backward compatibility + + +def _valid_type(value, allow_cfg_node=False): + return (type(value) in _VALID_TYPES) or ( + allow_cfg_node and isinstance(value, CfgNode) + ) + + +def _merge_a_into_b(a, b, root, key_list): + """Merge config dictionary a into config dictionary b, clobbering the + options in b whenever they are also specified in a. + """ + _assert_with_logging( + isinstance(a, CfgNode), + "`a` (cur type {}) must be an instance of {}".format(type(a), CfgNode), + ) + _assert_with_logging( + isinstance(b, CfgNode), + "`b` (cur type {}) must be an instance of {}".format(type(b), CfgNode), + ) + + for k, v_ in a.items(): + full_key = ".".join(key_list + [k]) + + v = copy.deepcopy(v_) + v = b._decode_cfg_value(v) + + if k in b: + v = _check_and_coerce_cfg_value_type(v, b[k], k, full_key) + # Recursively merge dicts + if isinstance(v, CfgNode): + try: + _merge_a_into_b(v, b[k], root, key_list + [k]) + except BaseException: + raise + else: + b[k] = v + elif b.is_new_allowed(): + b[k] = v + else: + if root.key_is_deprecated(full_key): + continue + elif root.key_is_renamed(full_key): + root.raise_key_rename_error(full_key) + else: + raise KeyError("Non-existent config key: {}".format(full_key)) + + +def _check_and_coerce_cfg_value_type(replacement, original, key, full_key): + """Checks that `replacement`, which is intended to replace `original` is of + the right type. The type is correct if it matches exactly or is one of a few + cases in which the type can be easily coerced. + """ + original_type = type(original) + replacement_type = type(replacement) + + # The types must match (with some exceptions) + if replacement_type == original_type: + return replacement + + # Cast replacement from from_type to to_type if the replacement and original + # types match from_type and to_type + def conditional_cast(from_type, to_type): + if replacement_type == from_type and original_type == to_type: + return True, to_type(replacement) + else: + return False, None + + # Conditionally casts + # list <-> tuple + casts = [(tuple, list), (list, tuple)] + # For py2: allow converting from str (bytes) to a unicode string + try: + casts.append((str, unicode)) # noqa: F821 + except Exception: + pass + + for (from_type, to_type) in casts: + converted, converted_value = conditional_cast(from_type, to_type) + if converted: + return converted_value + + raise ValueError( + "Type mismatch ({} vs. {}) with values ({} vs. 
{}) for config " + "key: {}".format( + original_type, replacement_type, original, replacement, full_key + ) + ) + + +def _assert_with_logging(cond, msg): + if not cond: + logger.debug(msg) + assert cond, msg + + +def _load_module_from_file(name, filename): + if _PY2: + module = imp.load_source(name, filename) + else: + spec = importlib.util.spec_from_file_location(name, filename) + module = importlib.util.module_from_spec(spec) + spec.loader.exec_module(module) + return module diff --git a/reference/demo.py b/reference/demo.py new file mode 100644 index 0000000..e38c8f3 --- /dev/null +++ b/reference/demo.py @@ -0,0 +1,157 @@ +import argparse +import os +import time + +import cv2 +import torch + +from nanodet.data.batch_process import stack_batch_img +from nanodet.data.collate import naive_collate +from nanodet.data.transform import Pipeline +from nanodet.model.arch import build_model +from nanodet.util import Logger, cfg, load_config, load_model_weight +from nanodet.util.path import mkdir + +image_ext = [".jpg", ".jpeg", ".webp", ".bmp", ".png"] +video_ext = ["mp4", "mov", "avi", "mkv"] + + +def parse_args(): + parser = argparse.ArgumentParser() + parser.add_argument( + "demo", default="image", help="demo type, eg. image, video and webcam" + ) + parser.add_argument("--config", default="config/00.yml",help="model config file path") + parser.add_argument("--model",default="", help="model file path") + parser.add_argument("--path", default="./demo", help="path to images or video") + parser.add_argument("--camid", type=int, default=0, help="webcam demo camera id") + parser.add_argument( + "--save_result", + action="store_true", + help="whether to save the inference result of image/video", + ) + args = parser.parse_args() + return args + + +class Predictor(object): + def __init__(self, cfg, model_path, logger, device="cpu:0"): + self.cfg = cfg + self.device = device + model = build_model(cfg.model) + ckpt = torch.load(model_path, map_location=lambda storage, loc: storage) + load_model_weight(model, ckpt, logger) + if cfg.model.arch.backbone.name == "RepVGG": + deploy_config = cfg.model + deploy_config.arch.backbone.update({"deploy": True}) + deploy_model = build_model(deploy_config) + from nanodet.model.backbone.repvgg import repvgg_det_model_convert + + model = repvgg_det_model_convert(model, deploy_model) + self.model = model.to(device).eval() + self.pipeline = Pipeline(cfg.data.val.pipeline, cfg.data.val.keep_ratio) + + def inference(self, img): + img_info = {"id": 0} + if isinstance(img, str): + img_info["file_name"] = os.path.basename(img) + img = cv2.imread(img) + else: + img_info["file_name"] = None + + height, width = img.shape[:2] + img_info["height"] = height + img_info["width"] = width + meta = dict(img_info=img_info, raw_img=img, img=img) + meta = self.pipeline(None, meta, self.cfg.data.val.input_size) + meta["img"] = torch.from_numpy(meta["img"].transpose(2, 0, 1)).to(self.device) + meta = naive_collate([meta]) + meta["img"] = stack_batch_img(meta["img"], divisible=32) + with torch.no_grad(): + results = self.model.inference(meta) + return meta, results + + def visualize(self, dets, meta, class_names, score_thres, wait=0): + time1 = time.time() + result_img = self.model.head.show_result( + meta["raw_img"][0], dets, class_names, score_thres=score_thres, show=True + ) + print("viz time: {:.3f}s".format(time.time() - time1)) + return result_img + + +def get_image_list(path): + image_names = [] + for maindir, subdir, file_name_list in os.walk(path): + for filename in file_name_list: + 
apath = os.path.join(maindir, filename) + ext = os.path.splitext(apath)[1] + if ext in image_ext: + image_names.append(apath) + return image_names + + +def main(): + args = parse_args() + local_rank = 0 + torch.backends.cudnn.enabled = True + torch.backends.cudnn.benchmark = True + + load_config(cfg, args.config) + logger = Logger(local_rank, use_tensorboard=False) + predictor = Predictor(cfg, args.model, logger, device="cpu:0") + logger.log('Press "Esc", "q" or "Q" to exit.') + current_time = time.localtime() + if args.demo == "image": + if os.path.isdir(args.path): + files = get_image_list(args.path) + else: + files = [args.path] + files.sort() + for image_name in files: + meta, res = predictor.inference(image_name) + result_image = predictor.visualize(res[0], meta, cfg.class_names, 0.35) + if args.save_result: + save_folder = os.path.join( + cfg.save_dir, time.strftime("%Y_%m_%d_%H_%M_%S", current_time) + ) + mkdir(local_rank, save_folder) + save_file_name = os.path.join(save_folder, os.path.basename(image_name)) + cv2.imwrite(save_file_name, result_image) + ch = cv2.waitKey(0) + if ch == 27 or ch == ord("q") or ch == ord("Q"): + break + elif args.demo == "video" or args.demo == "webcam": + cap = cv2.VideoCapture(args.path if args.demo == "video" else args.camid) + width = cap.get(cv2.CAP_PROP_FRAME_WIDTH) # float + height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT) # float + fps = cap.get(cv2.CAP_PROP_FPS) + save_folder = os.path.join( + cfg.save_dir, time.strftime("%Y_%m_%d_%H_%M_%S", current_time) + ) + mkdir(local_rank, save_folder) + save_path = ( + os.path.join(save_folder, args.path.replace("\\", "/").split("/")[-1]) + if args.demo == "video" + else os.path.join(save_folder, "camera.mp4") + ) + print(f"save_path is {save_path}") + vid_writer = cv2.VideoWriter( + save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (int(width), int(height)) + ) + while True: + ret_val, frame = cap.read() + if ret_val: + meta, res = predictor.inference(frame) + result_frame = predictor.visualize(res[0], meta, cfg.class_names, 0.35) + if args.save_result: + vid_writer.write(result_frame) + ch = cv2.waitKey(1) + if ch == 27 or ch == ord("q") or ch == ord("Q"): + break + else: + break + + +if __name__ == "__main__": + main() diff --git a/reference/inference.py b/reference/inference.py new file mode 100644 index 0000000..8f853f3 --- /dev/null +++ b/reference/inference.py @@ -0,0 +1,70 @@ +# Copyright 2021 RangiLyu. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import time + +import cv2 +import torch + +from nanodet.data.transform import Pipeline +from nanodet.model.arch import build_model +from nanodet.util import load_model_weight + + +class Predictor(object): + def __init__(self, cfg, model_path, logger, device="cuda:0"): + self.cfg = cfg + self.device = device + model = build_model(cfg.model) + ckpt = torch.load(model_path, map_location=lambda storage, loc: storage) + load_model_weight(model, ckpt, logger) + if cfg.model.arch.backbone.name == "RepVGG": + deploy_config = cfg.model + deploy_config.arch.backbone.update({"deploy": True}) + deploy_model = build_model(deploy_config) + from nanodet.model.backbone.repvgg import repvgg_det_model_convert + + model = repvgg_det_model_convert(model, deploy_model) + self.model = model.to(device).eval() + self.pipeline = Pipeline(cfg.data.val.pipeline, cfg.data.val.keep_ratio) + + def inference(self, img): + img_info = {} + if isinstance(img, str): + img_info["file_name"] = os.path.basename(img) + img = cv2.imread(img) + else: + img_info["file_name"] = None + + height, width = img.shape[:2] + img_info["height"] = height + img_info["width"] = width + meta = dict(img_info=img_info, raw_img=img, img=img) + meta = self.pipeline(meta, self.cfg.data.val.input_size) + meta["img"] = ( + torch.from_numpy(meta["img"].transpose(2, 0, 1)) + .unsqueeze(0) + .to(self.device) + ) + with torch.no_grad(): + results = self.model.inference(meta) + return meta, results + + def visualize(self, dets, meta, class_names, score_thres, wait=0): + time1 = time.time() + self.model.head.show_result( + meta["raw_img"], dets, class_names, score_thres=score_thres, show=True + ) + print("viz time: {:.3f}s".format(time.time() - time1)) diff --git a/requirements.txt b/requirements.txt index 5bfc357..dd7ce36 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,7 +1,23 @@ -numpy -scipy +Cython matplotlib +numpy +omegaconf>=2.0.1 +onnx +onnx-simplifier +opencv-python +pyaml +pycocotools +pytorch-lightning==1.7.0 +tabulate +tensorboard +termcolor +torch>=1.9 +torchmetrics +torchvision +tqdm + opencv-contrib-python +scipy pandas motmetrics setuptools diff --git a/setup_nanodet.py b/setup_nanodet.py new file mode 100644 index 0000000..d2dccb8 --- /dev/null +++ b/setup_nanodet.py @@ -0,0 +1,27 @@ +#!/usr/bin/env python +from setuptools import find_packages, setup + +from nanodet import __author__, __author_email__, __docs__, __homepage__, __version__ + +if __name__ == "__main__": + setup( + name="nanodet", + version=__version__, + description=__docs__, + url=__homepage__, + author=__author__, + author_email=__author_email__, + keywords="deep learning", + packages=find_packages(exclude=("config", "tools", "demo")), + classifiers=[ + "Development Status :: Beta", + "License :: OSI Approved :: Apache Software License", + "Operating System :: OS Independent", + "Programming Language :: Python :: 3.5", + "Programming Language :: Python :: 3.6", + "Programming Language :: Python :: 3.7", + "Programming Language :: Python :: 3.8", + ], + license="Apache License 2.0", + zip_safe=False, + ) diff --git a/test.avi b/test.avi new file mode 100644 index 0000000..eb22256 Binary files /dev/null and b/test.avi differ diff --git a/tool.py b/tool.py new file mode 100644 index 0000000..97e2e05 --- /dev/null +++ b/tool.py @@ -0,0 +1,8 @@ +def infotrans(all_box): + # 用液滴左上和右下边框上的点的坐标近似计算液滴中心点坐标 + bboxes, confidences, class_ids = [], [], [] + for i in range(len(all_box)): + 
bboxes.append([all_box[i][1], all_box[i][2], all_box[i][3] - all_box[i][1], all_box[i][4] - all_box[i][2]])
+        confidences.append(float(all_box[i][5]))
+        class_ids.append(all_box[i][0])
+    return bboxes, confidences, class_ids
diff --git a/weight/LiquidV5.pth b/weight/LiquidV5.pth
new file mode 100644
index 0000000..53477de
Binary files /dev/null and b/weight/LiquidV5.pth differ
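
The diff adds `tool.py` as glue between the detector output and the trackers, but the wiring itself is not shown. Below is a minimal sketch of that wiring under stated assumptions: the `(class_id, x1, y1, x2, y2, score)` row layout of `all_box` is inferred from `infotrans`, while the example boxes and the `CentroidTracker` settings are illustrative, not code from this repository.

```python
# Minimal sketch (not the repository's entry point): feeding NanoDet-style
# detections through tool.infotrans into a motrackers tracker.
# Assumption: each all_box row is (class_id, x1, y1, x2, y2, score), as implied
# by tool.py; the example boxes and max_lost value are made up for illustration.
import numpy as np
from motrackers import CentroidTracker

from tool import infotrans

tracker = CentroidTracker(max_lost=5)

# One frame's detections: two hypothetical droplets as corner-format boxes.
all_box = [
    [0, 100, 120, 140, 160, 0.91],  # class_id, x1, y1, x2, y2, score
    [0, 300, 310, 330, 345, 0.87],
]

# infotrans converts corner boxes to (left, top, width, height) boxes plus
# confidences and class ids, which is the format the tracker update expects.
bboxes, confidences, class_ids = infotrans(all_box)
tracks = tracker.update(np.array(bboxes), np.array(confidences), np.array(class_ids))

for track in tracks:
    print(track)
```

In the actual pipeline, `all_box` would come from the NanoDet predictor (see `reference/demo.py`) for each video frame rather than being hard-coded.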