[Object Detection] 3. MMDetection


  • Object Detection Library 💡 MMDetection is a useful library that lets you engineer object detection pipelines through config settings. (For reference, solver is a config group you set in Detectron2, not here.)

    (figure: overall MMDetection model composition: Backbone → Neck → DenseHead → RoIHead)

    • Backbone: transforms the input image into feature maps (e.g., ResNet, VGG).
    • Neck: connects the backbone and the heads, and reconfigures the feature maps.
    • DenseHead: performs dense prediction over locations on the feature maps.
    • RoIHead: takes RoI features as input and predicts box classification, coordinate regression, etc.

    In Detectron2, the model-related configs are SOLVER, ROI_BOX_HEAD, ROI_HEADS, and the anchor generator. Neck is a config you set in MMDetection; in Detectron2 the module playing the same role is managed in the config as FPN.
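
    To make the mapping concrete, here is a rough side-by-side sketch (the MMDetection dict mirrors the dump below; the Detectron2 keys exist in the default get_cfg() tree, assuming detectron2 is installed):

        # MMDetection: model structure as nested dicts in a Python config file
        model = dict(
            backbone=dict(type='ResNet', depth=50),
            neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048], out_channels=256),
        )

        # Detectron2: a CfgNode with UPPER_CASE groups; there is no explicit
        # "neck" group -- MODEL.FPN plays that role
        from detectron2.config import get_cfg
        cfg = get_cfg()
        cfg.MODEL.FPN.OUT_CHANNELS = 256
        cfg.MODEL.ROI_HEADS.NUM_CLASSES = 80
        cfg.SOLVER.BASE_LR = 0.02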

    • MMDetection analysis


      • The DenseHead is the RPN network → the localization part; this is where the RoIs come from.
      • The RoIHead consists of the box head and cls head that the RoIs from the RPN pass through.
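
      To see these pieces assembled, the detector can be built straight from the config analyzed below (a minimal sketch using the MMDetection 2.x API):

          from mmcv import Config
          from mmdet.models import build_detector

          cfg = Config.fromfile('./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py')
          # backbone -> neck -> rpn_head (DenseHead) -> roi_head are all instantiated here
          model = build_detector(
              cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
          print(type(model).__name__)  # FasterRCNN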


      • config
        • ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py

            {
              "model": {
                "type": "fasterRCn",
                "backbone": {
                  "type": "Resnet",
                  "depth": 50,
                  "num_stages": 4,
                  "out_indices": "(0, 1, 2, 3)",
                  "frozen_stages": 1,
                  "norm_cfg": {
                    "type": "Bn",
                    "requires_grad": true
                  },
                  "norm_eval": true,
                  "style": "pytorch",
                  "init_cfg": {
                    "type": "Pretrained",
                    "checkpoint": "torchvision resnet50"
                  }
                },
                "neck": {
                  "type": "fPn",
                  "in_channels": [
                    256,
                    512,
                    1024,
                    2048
                  ],
                  "out_channels": 256,
                  "num_outs": 5
                },
                "rpn_head": {
                  "type": "RPnHead",
                  "in_channels": 256,
                  "feat_channels": 256,
                  "anchor_generator": {
                    "type": "AnchorGenerator",
                    "scales": [
                      8
                    ],
                    "ratios": [
                      0.5,
                      1,
                      2
                    ],
                    "strides": [
                      4,
                      8,
                      16,
                      32,
                      64
                    ]
                  },
                  "bbox_coder": {
                    "type": "DeltaXYWHBBoxCoder",
                    "target_means": [
                      0,
                      0,
                      0,
                      0
                    ],
                    "target_stds": [
                      1,
                      1,
                      1,
                      1
                    ]
                  },
                  "loss_cls": {
                    "type": "CrossEntropyLoss",
                    "use_sigmoid": true,
                    "loss_weight": 1
                  },
                  "loss_bbox": {
                    "type": "L1Loss",
                    "loss_weight": 1
                  }
                },
                "roi_head": {
                  "type": "StandardRoIHead",
                  "bbox_roi_extractor": {
                    "type": "SingleRoIExtractor",
                    "roi_layer": {
                      "type": "RoIAlign",
                      "output_size": 7,
                      "sampling_ratio": 0
                    },
                    "out_channels": 256,
                    "featmap_strides": [
                      4,
                      8,
                      16,
                      32
                    ]
                  },
                  "bbox_head": {
                    "type": "Shared2fCBBoxHead",
                    "in_channels": 256,
                    "fc_out_channels": 1024,
                    "roi_feat_size": 7,
                    "num_classes": 80,
                    "bbox_coder": {
                      "type": "DeltaXYWHBBoxCoder",
                      "target_means": [
                        0,
                        0,
                        0,
                        0
                      ],
                      "target_stds": [
                        0.1,
                        0.1,
                        0.2,
                        0.2
                      ]
                    },
                    "reg_class_agnostic": false,
                    "loss_cls": {
                      "type": "CrossEntropyLoss",
                      "use_sigmoid": false,
                      "loss_weight": 1
                    },
                    "loss_bbox": {
                      "type": "L1Loss",
                      "loss_weight": 1
                    }
                  }
                },
                "train_cfg": {
                  "rpn": {
                    "assigner": {
                      "type": "MaxIoUAssigner",
                      "pos_iou_thr": 0.7,
                      "neg_iou_thr": 0.3,
                      "min_pos_iou": 0.3,
                      "match_low_quality": true,
                      "ignore_iof_thr": -1
                    },
                    "sampler": {
                      "type": "RandomSampler",
                      "num": 256,
                      "pos_fraction": 0.5,
                      "neg_pos_ub": -1,
                      "add_gt_as_proposals": false
                    },
                    "allowed_border": -1,
                    "pos_weight": -1,
                    "debug": false
                  },
                  "rpn_proposal": {
                    "nms_pre": 2000,
                    "max_per_img": 1000,
                    "nms": {
                      "type": "nms",
                      "iou_threshold": 0.7
                    },
                    "min_bbox_size": 0
                  },
                  "rcnn": {
                    "assigner": {
                      "type": "MaxIoUAssigner",
                      "pos_iou_thr": 0.5,
                      "neg_iou_thr": 0.5,
                      "min_pos_iou": 0.5,
                      "match_low_quality": false,
                      "ignore_iof_thr": -1
                    },
                    "sampler": {
                      "type": "RandomSampler",
                      "num": 512,
                      "pos_fraction": 0.25,
                      "neg_pos_ub": -1,
                      "add_gt_as_proposals": true
                    },
                    "pos_weight": -1,
                    "debug": false
                  }
                },
                "test_cfg": {
                  "rpn": {
                    "nms_pre": 1000,
                    "max_per_img": 1000,
                    "nms": {
                      "type": "nms",
                      "iou_threshold": 0.7
                    },
                    "min_bbox_size": 0
                  },
                  "rcnn": {
                    "score_thr": 0.05,
                    "nms": {
                      "type": "nms",
                      "iou_threshold": 0.5
                    },
                    "max_per_img": 100
                  }
                }
              },
              "dataset_type": "CocoDataset",
              "data_root": "data/coco/",
              "img_norm_cfg": {
                "mean": [
                  123.675,
                  116.28,
                  103.53
                ],
                "std": [
                  58.395,
                  57.12,
                  57.375
                ],
                "to_rgb": true
              },
              "train_pipeline": [
                {
                  "type": "LoadImagefromfile"
                },
                {
                  "type": "LoadAnnotations",
                  "with_bbox": true
                },
                {
                  "type": "Resize",
                  "img_scale": "(1333, 800)",
                  "keep_ratio": true
                },
                {
                  "type": "Randomflip",
                  "flip_ratio": 0.5
                },
                {
                  "type": "normalize",
                  "mean": [
                    123.675,
                    116.28,
                    103.53
                  ],
                  "std": [
                    58.395,
                    57.12,
                    57.375
                  ],
                  "to_rgb": true
                },
                {
                  "type": "Pad",
                  "size_divisor": 32
                },
                {
                  "type": "DefaultformatBundle"
                },
                {
                  "type": "Collect",
                  "keys": [
                    "img",
                    "gt_bboxes",
                    "gt_labels"
                  ]
                }
              ],
              "test_pipeline": [
                {
                  "type": "LoadImagefromfile"
                },
                {
                  "type": "MultiScaleflipAug",
                  "img_scale": "(1333, 800)",
                  "flip": false,
                  "transforms": [
                    {
                      "type": "Resize",
                      "keep_ratio": true
                    },
                    {
                      "type": "Randomflip"
                    },
                    {
                      "type": "normalize",
                      "mean": [
                        123.675,
                        116.28,
                        103.53
                      ],
                      "std": [
                        58.395,
                        57.12,
                        57.375
                      ],
                      "to_rgb": true
                    },
                    {
                      "type": "Pad",
                      "size_divisor": 32
                    },
                    {
                      "type": "Imagetotensor",
                      "keys": [
                        "img"
                      ]
                    },
                    {
                      "type": "Collect",
                      "keys": [
                        "img"
                      ]
                    }
                  ]
                }
              ],
              "data": {
                "samples_per_gpu": 2,
                "workers_per_gpu": 2,
                "train": {
                  "type": "CocoDataset",
                  "ann_file": "data/coco/annotations/instances_train2017.json",
                  "img_prefix": "data/coco/train2017/",
                  "pipeline": [
                    {
                      "type": "LoadImagefromfile"
                    },
                    {
                      "type": "LoadAnnotations",
                      "with_bbox": true
                    },
                    {
                      "type": "Resize",
                      "img_scale": "(1333, 800)",
                      "keep_ratio": true
                    },
                    {
                      "type": "Randomflip",
                      "flip_ratio": 0.5
                    },
                    {
                      "type": "normalize",
                      "mean": [
                        123.675,
                        116.28,
                        103.53
                      ],
                      "std": [
                        58.395,
                        57.12,
                        57.375
                      ],
                      "to_rgb": true
                    },
                    {
                      "type": "Pad",
                      "size_divisor": 32
                    },
                    {
                      "type": "DefaultformatBundle"
                    },
                    {
                      "type": "Collect",
                      "keys": [
                        "img",
                        "gt_bboxes",
                        "gt_labels"
                      ]
                    }
                  ]
                },
                "val": {
                  "type": "CocoDataset",
                  "ann_file": "data/coco/annotations/instances_val2017.json",
                  "img_prefix": "data/coco/val2017/",
                  "pipeline": [
                    {
                      "type": "LoadImagefromfile"
                    },
                    {
                      "type": "MultiScaleflipAug",
                      "img_scale": "(1333, 800)",
                      "flip": false,
                      "transforms": [
                        {
                          "type": "Resize",
                          "keep_ratio": true
                        },
                        {
                          "type": "Randomflip"
                        },
                        {
                          "type": "normalize",
                          "mean": [
                            123.675,
                            116.28,
                            103.53
                          ],
                          "std": [
                            58.395,
                            57.12,
                            57.375
                          ],
                          "to_rgb": true
                        },
                        {
                          "type": "Pad",
                          "size_divisor": 32
                        },
                        {
                          "type": "Imagetotensor",
                          "keys": [
                            "img"
                          ]
                        },
                        {
                          "type": "Collect",
                          "keys": [
                            "img"
                          ]
                        }
                      ]
                    }
                  ]
                },
                "test": {
                  "type": "CocoDataset",
                  "ann_file": "data/coco/annotations/instances_val2017.json",
                  "img_prefix": "data/coco/val2017/",
                  "pipeline": [
                    {
                      "type": "LoadImagefromfile"
                    },
                    {
                      "type": "MultiScaleflipAug",
                      "img_scale": "(1333, 800)",
                      "flip": false,
                      "transforms": [
                        {
                          "type": "Resize",
                          "keep_ratio": true
                        },
                        {
                          "type": "Randomflip"
                        },
                        {
                          "type": "normalize",
                          "mean": [
                            123.675,
                            116.28,
                            103.53
                          ],
                          "std": [
                            58.395,
                            57.12,
                            57.375
                          ],
                          "to_rgb": true
                        },
                        {
                          "type": "Pad",
                          "size_divisor": 32
                        },
                        {
                          "type": "Imagetotensor",
                          "keys": [
                            "img"
                          ]
                        },
                        {
                          "type": "Collect",
                          "keys": [
                            "img"
                          ]
                        }
                      ]
                    }
                  ]
                }
              },
              "evaluation": {
                "interval": 1,
                "metric": "bbox"
              },
              "optimizer": {
                "type": "SGD",
                "lr": 0.02,
                "momentum": 0.9,
                "weight_decay": 0.0001
              },
              "optimizer_config": {
                "grad_clip": "none"
              },
              "lr_config": {
                "policy": "step",
                "warmup": "linear",
                "warmup_iters": 500,
                "warmup_ratio": 0.001,
                "step": [
                  8,
                  11
                ]
              },
              "runner": {
                "type": "EpochBasedRunner",
                "max_epochs": 12
              },
              "checkpoint_config": {
                "interval": 1
              },
              "log_config": {
                "interval": 50,
                "hooks": [
                  {
                    "type": "textLoggerHook"
                  }
                ]
              },
              "custom_hooks": [
                {
                  "type": "numClassCheckHook"
                }
              ],
              "dist_params": {
                "backend": "nccl"
              },
              "log_level": "InfO",
              "load_from": "none",
              "resume_from": "none",
              "workflow": [
                "(train, 1)"
              ],
              "opencv_num_threads": 0,
              "mp_start_method": "fork",
              "auto_scale_lr": {
                "enable": false,
                "base_batch_size": 16
              }
            }
          
        • ./configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_1x_coco.py

            {
              "model": {
                "type": "FasterRCNN",
                "backbone": {
                  "type": "ResNeXt",
                  "depth": 101,
                  "num_stages": 4,
                  "out_indices": "(0, 1, 2, 3)",
                  "frozen_stages": 1,
                  "norm_cfg": {
                    "type": "BN",
                    "requires_grad": true
                  },
                  "norm_eval": true,
                  "style": "pytorch",
                  "init_cfg": {
                    "type": "Pretrained",
                    "checkpoint": "open-mmlab://resnext101_64x4d"
                  },
                  "groups": 64,
                  "base_width": 4
                },
                "neck": {
                  "type": "FPN",
                  "in_channels": [
                    256,
                    512,
                    1024,
                    2048
                  ],
                  "out_channels": 256,
                  "num_outs": 5
                },
                "rpn_head": {
                  "type": "RPNHead",
                  "in_channels": 256,
                  "feat_channels": 256,
                  "anchor_generator": {
                    "type": "AnchorGenerator",
                    "scales": [
                      8
                    ],
                    "ratios": [
                      0.5,
                      1,
                      2
                    ],
                    "strides": [
                      4,
                      8,
                      16,
                      32,
                      64
                    ]
                  },
                  "bbox_coder": {
                    "type": "DeltaXYWHBBoxCoder",
                    "target_means": [
                      0,
                      0,
                      0,
                      0
                    ],
                    "target_stds": [
                      1,
                      1,
                      1,
                      1
                    ]
                  },
                  "loss_cls": {
                    "type": "CrossEntropyLoss",
                    "use_sigmoid": true,
                    "loss_weight": 1
                  },
                  "loss_bbox": {
                    "type": "L1Loss",
                    "loss_weight": 1
                  }
                },
                "roi_head": {
                  "type": "StandardRoIHead",
                  "bbox_roi_extractor": {
                    "type": "SingleRoIExtractor",
                    "roi_layer": {
                      "type": "RoIAlign",
                      "output_size": 7,
                      "sampling_ratio": 0
                    },
                    "out_channels": 256,
                    "featmap_strides": [
                      4,
                      8,
                      16,
                      32
                    ]
                  },
                  "bbox_head": {
                    "type": "Shared2FCBBoxHead",
                    "in_channels": 256,
                    "fc_out_channels": 1024,
                    "roi_feat_size": 7,
                    "num_classes": 10,
                    "bbox_coder": {
                      "type": "DeltaXYWHBBoxCoder",
                      "target_means": [
                        0,
                        0,
                        0,
                        0
                      ],
                      "target_stds": [
                        0.1,
                        0.1,
                        0.2,
                        0.2
                      ]
                    },
                    "reg_class_agnostic": false,
                    "loss_cls": {
                      "type": "CrossEntropyLoss",
                      "use_sigmoid": false,
                      "loss_weight": 1
                    },
                    "loss_bbox": {
                      "type": "L1Loss",
                      "loss_weight": 1
                    }
                  }
                },
                "train_cfg": {
                  "rpn": {
                    "assigner": {
                      "type": "MaxIoUAssigner",
                      "pos_iou_thr": 0.7,
                      "neg_iou_thr": 0.3,
                      "min_pos_iou": 0.3,
                      "match_low_quality": true,
                      "ignore_iof_thr": -1
                    },
                    "sampler": {
                      "type": "RandomSampler",
                      "num": 256,
                      "pos_fraction": 0.5,
                      "neg_pos_ub": -1,
                      "add_gt_as_proposals": false
                    },
                    "allowed_border": -1,
                    "pos_weight": -1,
                    "debug": false
                  },
                  "rpn_proposal": {
                    "nms_pre": 2000,
                    "max_per_img": 1000,
                    "nms": {
                      "type": "nms",
                      "iou_threshold": 0.7
                    },
                    "min_bbox_size": 0
                  },
                  "rcnn": {
                    "assigner": {
                      "type": "MaxIoUAssigner",
                      "pos_iou_thr": 0.5,
                      "neg_iou_thr": 0.5,
                      "min_pos_iou": 0.5,
                      "match_low_quality": false,
                      "ignore_iof_thr": -1
                    },
                    "sampler": {
                      "type": "RandomSampler",
                      "num": 512,
                      "pos_fraction": 0.25,
                      "neg_pos_ub": -1,
                      "add_gt_as_proposals": true
                    },
                    "pos_weight": -1,
                    "debug": false
                  }
                },
                "test_cfg": {
                  "rpn": {
                    "nms_pre": 1000,
                    "max_per_img": 1000,
                    "nms": {
                      "type": "nms",
                      "iou_threshold": 0.7
                    },
                    "min_bbox_size": 0
                  },
                  "rcnn": {
                    "score_thr": 0.05,
                    "nms": {
                      "type": "nms",
                      "iou_threshold": 0.5
                    },
                    "max_per_img": 100
                  }
                }
              },
              "dataset_type": "CocoDataset",
              "data_root": "data/coco/",
              "img_norm_cfg": {
                "mean": [
                  123.675,
                  116.28,
                  103.53
                ],
                "std": [
                  58.395,
                  57.12,
                  57.375
                ],
                "to_rgb": true
              },
              "train_pipeline": [
                {
                  "type": "LoadImageFromFile"
                },
                {
                  "type": "LoadAnnotations",
                  "with_bbox": true
                },
                {
                  "type": "Resize",
                  "img_scale": "(1333, 800)",
                  "keep_ratio": true
                },
                {
                  "type": "RandomFlip",
                  "flip_ratio": 0.5
                },
                {
                  "type": "Normalize",
                  "mean": [
                    123.675,
                    116.28,
                    103.53
                  ],
                  "std": [
                    58.395,
                    57.12,
                    57.375
                  ],
                  "to_rgb": true
                },
                {
                  "type": "Pad",
                  "size_divisor": 32
                },
                {
                  "type": "DefaultFormatBundle"
                },
                {
                  "type": "Collect",
                  "keys": [
                    "img",
                    "gt_bboxes",
                    "gt_labels"
                  ]
                }
              ],
              "test_pipeline": [
                {
                  "type": "LoadImageFromFile"
                },
                {
                  "type": "MultiScaleFlipAug",
                  "img_scale": "(1333, 800)",
                  "flip": false,
                  "transforms": [
                    {
                      "type": "Resize",
                      "keep_ratio": true
                    },
                    {
                      "type": "RandomFlip"
                    },
                    {
                      "type": "Normalize",
                      "mean": [
                        123.675,
                        116.28,
                        103.53
                      ],
                      "std": [
                        58.395,
                        57.12,
                        57.375
                      ],
                      "to_rgb": true
                    },
                    {
                      "type": "Pad",
                      "size_divisor": 32
                    },
                    {
                      "type": "ImageToTensor",
                      "keys": [
                        "img"
                      ]
                    },
                    {
                      "type": "Collect",
                      "keys": [
                        "img"
                      ]
                    }
                  ]
                }
              ],
              "data": {
                "samples_per_gpu": 4,
                "workers_per_gpu": 2,
                "train": {
                  "type": "CocoDataset",
                  "ann_file": "../../dataset/cleaned_train.json",
                  "img_prefix": "../../dataset/",
                  "pipeline": [
                    {
                      "type": "LoadImageFromFile"
                    },
                    {
                      "type": "LoadAnnotations",
                      "with_bbox": true
                    },
                    {
                      "type": "Resize",
                      "img_scale": "(512, 512)",
                      "keep_ratio": true
                    },
                    {
                      "type": "RandomFlip",
                      "flip_ratio": 0.5
                    },
                    {
                      "type": "Normalize",
                      "mean": [
                        123.675,
                        116.28,
                        103.53
                      ],
                      "std": [
                        58.395,
                        57.12,
                        57.375
                      ],
                      "to_rgb": true
                    },
                    {
                      "type": "Pad",
                      "size_divisor": 32
                    },
                    {
                      "type": "DefaultFormatBundle"
                    },
                    {
                      "type": "Collect",
                      "keys": [
                        "img",
                        "gt_bboxes",
                        "gt_labels"
                      ]
                    }
                  ],
                  "classes": [
                    "General trash",
                    "Paper",
                    "Paper pack",
                    "Metal",
                    "Glass",
                    "Plastic",
                    "Styrofoam",
                    "Plastic bag",
                    "Battery",
                    "Clothing"
                  ]
                },
                "val": {
                  "type": "CocoDataset",
                  "ann_file": "data/coco/annotations/instances_val2017.json",
                  "img_prefix": "data/coco/val2017/",
                  "pipeline": [
                    {
                      "type": "LoadImageFromFile"
                    },
                    {
                      "type": "MultiScaleFlipAug",
                      "img_scale": "(1333, 800)",
                      "flip": false,
                      "transforms": [
                        {
                          "type": "Resize",
                          "keep_ratio": true
                        },
                        {
                          "type": "RandomFlip"
                        },
                        {
                          "type": "Normalize",
                          "mean": [
                            123.675,
                            116.28,
                            103.53
                          ],
                          "std": [
                            58.395,
                            57.12,
                            57.375
                          ],
                          "to_rgb": true
                        },
                        {
                          "type": "Pad",
                          "size_divisor": 32
                        },
                        {
                          "type": "ImageToTensor",
                          "keys": [
                            "img"
                          ]
                        },
                        {
                          "type": "Collect",
                          "keys": [
                            "img"
                          ]
                        }
                      ]
                    }
                  ]
                },
                "test": {
                  "type": "CocoDataset",
                  "ann_file": "../../dataset/test.json",
                  "img_prefix": "../../dataset/",
                  "pipeline": [
                    {
                      "type": "LoadImageFromFile"
                    },
                    {
                      "type": "MultiScaleFlipAug",
                      "img_scale": "(512, 512)",
                      "flip": false,
                      "transforms": [
                        {
                          "type": "Resize",
                          "keep_ratio": true
                        },
                        {
                          "type": "RandomFlip"
                        },
                        {
                          "type": "Normalize",
                          "mean": [
                            123.675,
                            116.28,
                            103.53
                          ],
                          "std": [
                            58.395,
                            57.12,
                            57.375
                          ],
                          "to_rgb": true
                        },
                        {
                          "type": "Pad",
                          "size_divisor": 32
                        },
                        {
                          "type": "ImageToTensor",
                          "keys": [
                            "img"
                          ]
                        },
                        {
                          "type": "Collect",
                          "keys": [
                            "img"
                          ]
                        }
                      ]
                    }
                  ],
                  "classes": [
                    "General trash",
                    "Paper",
                    "Paper pack",
                    "Metal",
                    "Glass",
                    "Plastic",
                    "Styrofoam",
                    "Plastic bag",
                    "Battery",
                    "Clothing"
                  ]
                }
              },
              "evaluation": {
                "interval": 1,
                "metric": "bbox"
              },
              "optimizer": {
                "type": "SGD",
                "lr": 0.02,
                "momentum": 0.9,
                "weight_decay": 0.0001
              },
              "optimizer_config": {
                "grad_clip": {
                  "max_norm": 35,
                  "norm_type": 2
                }
              },
              "lr_config": {
                "policy": "step",
                "warmup": "linear",
                "warmup_iters": 500,
                "warmup_ratio": 0.001,
                "step": [
                  8,
                  11
                ]
              },
              "runner": {
                "type": "EpochBasedRunner",
                "max_epochs": 12
              },
              "checkpoint_config": {
                "max_keep_ckpts": 3,
                "interval": 1
              },
              "log_config": {
                "interval": 50,
                "hooks": [
                  {
                    "type": "TextLoggerHook"
                  }
                ]
              },
              "custom_hooks": [
                {
                  "type": "NumClassCheckHook"
                }
              ],
              "dist_params": {
                "backend": "nccl"
              },
              "log_level": "INFO",
              "load_from": "none",
              "resume_from": "none",
              "workflow": [
                "train",
                1
              ],
              "opencv_num_threads": 0,
              "mp_start_method": "fork",
              "auto_scale_lr": {
                "enable": false,
                "base_batch_size": 16
              },
              "seed": 2022,
              "gpu_ids": [
                0
              ],
              "work_dir": "./work_dirs/faster_rcnn_x101_64x4d_fpn_1x_trash",
              "device": "cuda"
            }
          
      • Analyzing & customizing MMDetection's config system

        Learn about Configs — MMDetection 3.3.0 documentation

        • A config is loaded as a single .py file, e.g. cfg = Config.fromfile('./configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_1x_coco.py').
        • A config file (.py) is structured as follows:

            _base_ = [
                '../_base_/models/faster_rcnn_r50_fpn.py',
                '../_base_/datasets/coco_detection.py',
                '../_base_/schedules/schedule_1x.py',
                '../_base_/default_runtime.py'
            ]
          
        • This .py config file pulls several config files together into one place.
        • It is composed of four main config groups: model, dataset, schedule, and runtime (see the override sketch below).
          • model - change the model structure
          • datasets - specify information about the data to use
          • schedules - adjust epochs, learning rate, etc.
          • runtime - adjust things like the checkpoint save interval
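
        The merge is a recursive dict update, so a child config only restates the leaves it wants to change; everything else comes in from _base_. A minimal sketch of such a child config:

            _base_ = './faster_rcnn_r50_fpn_1x_coco.py'

            # only this leaf is overridden; the rest of the model, dataset,
            # schedule, and runtime settings are merged in from _base_
            model = dict(
                roi_head=dict(
                    bbox_head=dict(num_classes=10)))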
        • cfg.keys()

        dict_keys(['model', 'dataset_type', 'data_root', 'img_norm_cfg', 'train_pipeline', 'test_pipeline', 'data', 'evaluation', 'optimizer', 'optimizer_config', 'lr_config', 'runner', 'checkpoint_config', 'log_config', 'custom_hooks', 'dist_params', 'log_level', 'load_from', 'resume_from', 'workflow', 'opencv_num_threads', 'mp_start_method', 'auto_scale_lr'])
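
        Loading and inspecting a config from code looks like this (a sketch; mmcv 1.x Config API):

            from mmcv import Config

            cfg = Config.fromfile('./configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_1x_coco.py')
            print(cfg.keys())               # the dict_keys listed above
            print(cfg.model.backbone.type)  # 'ResNeXt' -- attribute-style access works too
            cfg.data.samples_per_gpu = 4    # any field can be overridden in code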

        • model
          • If you want to customize the model, you can do it as below.
            _base_ = './faster_rcnn_r50_fpn_1x_coco.py'  # brings in the model, dataset, schedule, and runtime defined in faster_rcnn_r50_fpn_1x_coco.py
            # Redefine the model here, or define it as a .py file under configs/_base_/models (as a dict, like below)
            model = dict(
                backbone=dict(
                    type='ResNeXt',
                    depth=101,
                    groups=64,
                    base_width=4,
                    num_stages=4,
                    out_indices=(0, 1, 2, 3),
                    frozen_stages=1,
                    norm_cfg=dict(type='BN', requires_grad=True),
                    style='pytorch',
                    init_cfg=dict(
                        type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
            # Important: the customized model .py must also contain train_cfg and test_cfg (for tuning RPN and other hyperparameters)
            train_cfg = dict(  # Config of training hyperparameters for rpn and rcnn
                    rpn=dict(  # Training config of rpn
                        assigner=dict(  # Config of assigner
                            type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for many common detectors. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 for more details.
                            pos_iou_thr=0.7,  # IoU >= threshold 0.7 will be taken as positive samples
                            neg_iou_thr=0.3,  # IoU < threshold 0.3 will be taken as negative samples
                            min_pos_iou=0.3,  # The minimal IoU threshold to take boxes as positive samples
                            match_low_quality=True,  # Whether to match the boxes under low quality (see API doc for more details).
                            ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes
                        sampler=dict(  # Config of positive/negative sampler
                            type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/samplers/random_sampler.py#L14 for implementation details.
                            num=256,  # Number of samples
                            pos_fraction=0.5,  # The ratio of positive samples in the total samples.
                            neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.
                            add_gt_as_proposals=False),  # Whether add GT as proposals after sampling.
                        allowed_border=-1,  # The border allowed after padding for valid anchors.
                        pos_weight=-1,  # The weight of positive samples during training.
                        debug=False),  # Whether to set the debug mode
                    rpn_proposal=dict(  # The config to generate proposals during training
                        nms_across_levels=False,  # Whether to do NMS for boxes across levels. Only work in `GARPNHead`, naive rpn does not support do nms cross levels.
                        nms_pre=2000,  # The number of boxes before NMS
                        nms_post=1000,  # The number of boxes to be kept by NMS. Only work in `GARPNHead`.
                        max_per_img=1000,  # The number of boxes to be kept after NMS.
                        nms=dict( # Config of NMS
                            type='nms',  # Type of NMS
                            iou_threshold=0.7 # NMS threshold
                            ),
                        min_bbox_size=0),  # The allowed minimal box size
                    rcnn=dict(  # The config for the roi heads.
                        assigner=dict(  # Config of assigner for second stage, this is different for that in rpn
                            type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for all roi_heads for now. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/assigners/max_iou_assigner.py#L14 for more details.
                            pos_iou_thr=0.5,  # IoU >= threshold 0.5 will be taken as positive samples
                            neg_iou_thr=0.5,  # IoU < threshold 0.5 will be taken as negative samples
                            min_pos_iou=0.5,  # The minimal IoU threshold to take boxes as positive samples
                            match_low_quality=False,  # Whether to match the boxes under low quality (see API doc for more details).
                            ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes
                        sampler=dict(
                            type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/samplers/random_sampler.py#L14 for implementation details.
                            num=512,  # Number of samples
                            pos_fraction=0.25,  # The ratio of positive samples in the total samples.
                            neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.
                            add_gt_as_proposals=True
                        ),  # Whether add GT as proposals after sampling.
                        mask_size=28,  # Size of mask
                        pos_weight=-1,  # The weight of positive samples during training.
                        debug=False)),  # Whether to set the debug mode
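
          Since the dumped configs above keep train_cfg and test_cfg inside model, the same hyperparameters can also be tweaked on a loaded cfg (a small sketch):

              # the assigner/sampler/NMS settings above are plain cfg fields
              cfg.model.train_cfg.rpn.assigner.pos_iou_thr = 0.7
              cfg.model.train_cfg.rcnn.sampler.num = 512
              cfg.model.test_cfg.rcnn.score_thr = 0.05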
          
        • dataset

          The keys 'dataset_type', 'data_root', 'img_norm_cfg', 'train_pipeline', 'test_pipeline', and 'data' belong to this group.

          • The part that builds the dataset:

              datasets = [build_dataset(cfg.data.train)]
                                
              "data": {
                  "samples_per_gpu": 2,
                  "workers_per_gpu": 2,
                  "train": {
                    "type": "CocoDataset",
                    "ann_file": "data/coco/annotations/instances_train2017.json",
                    "img_prefix": "data/coco/train2017/",
                    "pipeline": [
                      {
                        "type": "LoadImagefromfile"
                      },
                      {
                        "type": "LoadAnnotations",
                        "with_bbox": true
                      },
                      {
                        "type": "Resize",
                        "img_scale": "(1333, 800)",
                        "keep_ratio": true
                      },
                      {
                        "type": "Randomflip",
                        "flip_ratio": 0.5
                      },
                      {
                        "type": "normalize",
                        "mean": [
                          123.675,
                          116.28,
                          103.53
                        ],
                        "std": [
                          58.395,
                          57.12,
                          57.375
                        ],
                        "to_rgb": true
                      },
                      {
                        "type": "Pad",
                        "size_divisor": 32
                      },
                      {
                        "type": "DefaultformatBundle"
                      },
                      {
                        "type": "Collect",
                        "keys": [
                          "img",
                          "gt_bboxes",
                          "gt_labels"
                        ]
                      }
                    ]
                  },
                                
            
              # mmdet/datasets/builder.py (excerpt): copy and build_from_cfg are imported,
              # and DATASETS/_concat_dataset are defined, at module level in that file
              def build_dataset(cfg, default_args=None):
                  from .dataset_wrappers import (ClassBalancedDataset, ConcatDataset,
                                                 MultiImageMixDataset, RepeatDataset)
                  if isinstance(cfg, (list, tuple)):
                      dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg])
                  elif cfg['type'] == 'ConcatDataset':
                      dataset = ConcatDataset(
                          [build_dataset(c, default_args) for c in cfg['datasets']],
                          cfg.get('separate_eval', True))
                  elif cfg['type'] == 'RepeatDataset':
                      dataset = RepeatDataset(
                          build_dataset(cfg['dataset'], default_args), cfg['times'])
                  elif cfg['type'] == 'ClassBalancedDataset':
                      dataset = ClassBalancedDataset(
                          build_dataset(cfg['dataset'], default_args), cfg['oversample_thr'])
                  elif cfg['type'] == 'MultiImageMixDataset':
                      cp_cfg = copy.deepcopy(cfg)
                      cp_cfg['dataset'] = build_dataset(cp_cfg['dataset'])
                      cp_cfg.pop('type')
                      dataset = MultiImageMixDataset(**cp_cfg)
                  elif isinstance(cfg.get('ann_file'), (list, tuple)):
                      dataset = _concat_dataset(cfg, default_args)
                  else:
                      dataset = build_from_cfg(cfg, DATASETS, default_args)
                                
                  return dataset
            

          Inside mmdetection/mmdet/datasets/builder.py, build_from_cfg is imported from mmcv.utils.

          mmcv uses a Registry, which maps a given string to the corresponding class.

          Thanks to DATASETS = Registry('dataset'), any class decorated with @DATASETS.register_module() can be looked up, as in the sketch below.
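
          The mechanism in one self-contained sketch (TOY_DATASETS and MyDataset are hypothetical stand-ins; the real DATASETS registry works the same way):

              from mmcv.utils import Registry, build_from_cfg

              TOY_DATASETS = Registry('dataset')

              @TOY_DATASETS.register_module()
              class MyDataset:
                  def __init__(self, ann_file):
                      self.ann_file = ann_file

              # build_from_cfg pops 'type', looks the string up in the registry,
              # and instantiates that class with the remaining keys as kwargs
              ds = build_from_cfg(dict(type='MyDataset', ann_file='train.json'), TOY_DATASETS)
              print(ds.ann_file)  # train.json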


          Since type in the cfg above was CocoDataset, the lookup lands in mmdetection/mmdet/datasets/coco.py, where CocoDataset inherits from CustomDataset.


          CustomDataset in turn inherits from torch.utils.data's Dataset.


            # filter images too small and containing no annotations
            if not test_mode:
                valid_inds = self._filter_imgs()   # filter out images that are too small
                self.data_infos = [self.data_infos[i] for i in valid_inds]   # update the image info and annotation info here
                if self.proposals is not None:
                    self.proposals = [self.proposals[i] for i in valid_inds]
                # set group flag for the sampler
                self._set_group_flag()
                          
            # processing pipeline
            self.pipeline = Compose(pipeline)
          
          • Compose (defined in mmdet.datasets.pipelines; it builds each transform with mmcv's build_from_cfg)

              class Compose:
                  """Compose multiple transforms sequentially.
                                
                  Args:
                      transforms (Sequence[dict | callable]): Sequence of transform object or
                          config dict to be composed.
                  """
                                
                  def __init__(self, transforms):
                      assert isinstance(transforms, collections.abc.Sequence)
                      self.transforms = []
                      for transform in transforms:
                          if isinstance(transform, dict):
                              transform = build_from_cfg(transform, PIPELINES)
                              self.transforms.append(transform)
                          elif callable(transform):
                              self.transforms.append(transform)
                          else:
                              raise TypeError('transform must be callable or a dict')
                                
                  def __call__(self, data):
                      """Call function to apply transforms sequentially.
                                
                      Args:
                          data (dict): A result dict contains the data to transform.
                                
                      Returns:
                         dict: Transformed data.
                      """
                                
                      for t in self.transforms:
                          data = t(data)
                          if data is None:
                              return None
                      return data
                                
                  def __repr__(self):
                      format_string = self.__class__.__name__ + '('
                      for t in self.transforms:
                          str_ = t.__repr__()
                          if 'Compose(' in str_:
                              str_ = str_.replace('\n', '\n    ')
                          format_string += '\n'
                          format_string += f'    {str_}'
                      format_string += '\n)'
                      return format_string
            

          → Data augmentation is applied through this process. See the pipeline entries in the cfg above.
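
          For example, a pipeline config can be turned into callable transforms directly (a sketch; mmdet 2.x import path, with a hypothetical file name):

            from mmdet.datasets.pipelines import Compose

            pipeline = Compose([
                dict(type='LoadImageFromFile'),
                dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.5),
            ])
            # pre_pipeline (below) normally seeds these bookkeeping fields
            results = dict(img_info=dict(filename='0001.jpg'),
                           img_prefix='data/coco/train2017/',
                           bbox_fields=[], mask_fields=[], seg_fields=[])
            results = pipeline(results)  # each transform updates the dict in turn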

            def pre_pipeline(self, results):
                """Prepare results dict for pipeline."""
                results['img_prefix'] = self.img_prefix
                results['seg_prefix'] = self.seg_prefix
                results['proposal_file'] = self.proposal_file
                results['bbox_fields'] = []
                results['mask_fields'] = []
                results['seg_fields'] = []
                          
            def __getitem__(self, idx):
                """Get training/test data after pipeline.
                          
                Args:
                    idx (int): Index of data.
                          
                Returns:
                    dict: Training/test data (with annotation if `test_mode` is set \
                        True).
                """
                          
                if self.test_mode:
                    return self.prepare_test_img(idx)
                while True:
                    data = self.prepare_train_img(idx)
                    if data is None:
                        idx = self._rand_another(idx)
                        continue
                    return data
                          
            def prepare_train_img(self, idx):
                """Get training data and annotations after pipeline.
                          
                Args:
                    idx (int): Index of data.
                          
                Returns:
                    dict: Training data and annotation after pipeline with new keys \
                        introduced by pipeline.
                """
                          
                img_info = self.data_infos[idx]
                ann_info = self.get_ann_info(idx)
                results = dict(img_info=img_info, ann_info=ann_info)
                if self.proposals is not None:
                    results['proposals'] = self.proposals[idx]
                self.pre_pipeline(results)
                return self.pipeline(results)
          
          • You can customize either by editing mmdetection/configs/_base_/datasets/coco_detection.py, or by creating a train.py and overriding the cfg.data.train arguments there.
            • coco_detection.py

                # dataset settings
                dataset_type = 'CocoDataset'
                data_root = 'data/coco/'
                img_norm_cfg = dict(
                    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
                train_pipeline = [
                    dict(type='LoadImageFromFile'),
                    dict(type='LoadAnnotations', with_bbox=True),
                    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
                    dict(type='RandomFlip', flip_ratio=0.5),
                    dict(type='Normalize', **img_norm_cfg),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
                ]
                test_pipeline = [
                    dict(type='LoadImageFromFile'),
                    dict(
                        type='MultiScaleFlipAug',
                        img_scale=(1333, 800),
                        flip=False,
                        transforms=[
                            dict(type='Resize', keep_ratio=True),
                            dict(type='RandomFlip'),
                            dict(type='Normalize', **img_norm_cfg),
                            dict(type='Pad', size_divisor=32),
                            dict(type='ImageToTensor', keys=['img']),
                            dict(type='Collect', keys=['img']),
                        ])
                ]
                data = dict(
                    samples_per_gpu=2,
                    workers_per_gpu=2,
                    train=dict(
                        type=dataset_type,
                        ann_file=data_root + 'annotations/instances_train2017.json',
                        img_prefix=data_root + 'train2017/',
                        pipeline=train_pipeline),
                    val=dict(
                        type=dataset_type,
                        ann_file=data_root + 'annotations/instances_val2017.json',
                        img_prefix=data_root + 'val2017/',
                        pipeline=test_pipeline),
                    test=dict(
                        type=dataset_type,
                        ann_file=data_root + 'annotations/instances_val2017.json',
                        img_prefix=data_root + 'val2017/',
                        pipeline=test_pipeline))
                evaluation = dict(interval=1, metric='bbox')
              
            • Modifying the dataset config

                # modify the dataset config
                # (`classes` and `root` are assumed to match the dumped config above)
                classes = ("General trash", "Paper", "Paper pack", "Metal", "Glass",
                           "Plastic", "Styrofoam", "Plastic bag", "Battery", "Clothing")
                root = '../../dataset/'

                cfg.data.train.classes = classes
                cfg.data.train.img_prefix = root
                cfg.data.train.ann_file = root + 'cleaned_train.json'  # train json info
                cfg.data.train.pipeline[2]['img_scale'] = (512, 512)  # Resize

                cfg.data.test.classes = classes
                cfg.data.test.img_prefix = root
                cfg.data.test.ann_file = root + 'test.json'  # test json info
                cfg.data.test.pipeline[1]['img_scale'] = (512, 512)  # Resize (inside MultiScaleFlipAug)
              
          • Data processing pipeline

          (figure: the MMDetection data-processing pipeline, where each transform receives a results dict and adds or updates keys in it)

        • dataloader
          • The baseline doesn't explicitly specify a loader.
          • from mmdet.apis import train_detector pulls in build_dataloader via from mmdet.datasets import (build_dataloader, build_dataset, replace_ImageToTensor):
            • build_dataloader

                def build_dataloader(dataset,
                                     samples_per_gpu,
                                     workers_per_gpu,
                                     num_gpus=1,
                                     dist=True,
                                     shuffle=True,
                                     seed=None,
                                     runner_type='EpochBasedRunner',
                                     persistent_workers=False,
                                     class_aware_sampler=None,
                                     **kwargs):
                    """Build PyTorch DataLoader.
                                      
                    In distributed training, each GPU/process has a dataloader.
                    In non-distributed training, there is only one dataloader for all GPUs.
                                      
                    Args:
                        dataset (Dataset): A PyTorch dataset.
                        samples_per_gpu (int): Number of training samples on each GPU, i.e.,
                            batch size of each GPU.
                        workers_per_gpu (int): How many subprocesses to use for data loading
                            for each GPU.
                        num_gpus (int): Number of GPUs. Only used in non-distributed training.
                        dist (bool): Distributed training/test or not. Default: True.
                        shuffle (bool): Whether to shuffle the data at every epoch.
                            Default: True.
                        seed (int, Optional): Seed to be used. Default: None.
                        runner_type (str): Type of runner. Default: `EpochBasedRunner`
                        persistent_workers (bool): If True, the data loader will not shutdown
                            the worker processes after a dataset has been consumed once.
                            This allows to maintain the workers `Dataset` instances alive.
                            This argument is only valid when PyTorch>=1.7.0. Default: False.
                        class_aware_sampler (dict): Whether to use `ClassAwareSampler`
                            during training. Default: None.
                        kwargs: any keyword argument to be used to initialize DataLoader
                                      
                    Returns:
                        DataLoader: A PyTorch dataloader.
                    """
                    rank, world_size = get_dist_info()
                                      
                    if dist:
                        # When model is :obj:`DistributedDataParallel`,
                        # `batch_size` of :obj:`dataloader` is the
                        # number of training samples on each GPU.
                        batch_size = samples_per_gpu
                        num_workers = workers_per_gpu
                    else:
                        # When model is obj:`DataParallel`
                        # the batch size is samples on all the GPUS
                        batch_size = num_gpus * samples_per_gpu
                        num_workers = num_gpus * workers_per_gpu
                                      
                    if runner_type == 'IterBasedRunner':
                        # this is a batch sampler, which can yield
                        # a mini-batch indices each time.
                        # it can be used in both `DataParallel` and
                        # `DistributedDataParallel`
                        if shuffle:
                            batch_sampler = InfiniteGroupBatchSampler(
                                dataset, batch_size, world_size, rank, seed=seed)
                        else:
                            batch_sampler = InfiniteBatchSampler(
                                dataset,
                                batch_size,
                                world_size,
                                rank,
                                seed=seed,
                                shuffle=False)
                        batch_size = 1
                        sampler = None
                    else:
                        if class_aware_sampler is not None:
                            # ClassAwareSampler can be used in both distributed and
                            # non-distributed training.
                            num_sample_class = class_aware_sampler.get('num_sample_class', 1)
                            sampler = ClassAwareSampler(
                                dataset,
                                samples_per_gpu,
                                world_size,
                                rank,
                                seed=seed,
                                num_sample_class=num_sample_class)
                        elif dist:
                            # DistributedGroupSampler will definitely shuffle the data to
                            # satisfy that images on each GPU are in the same group
                            if shuffle:
                                sampler = DistributedGroupSampler(
                                    dataset, samples_per_gpu, world_size, rank, seed=seed)
                            else:
                                sampler = DistributedSampler(
                                    dataset, world_size, rank, shuffle=False, seed=seed)
                        else:
                            sampler = GroupSampler(dataset,
                                                   samples_per_gpu) if shuffle else None
                        batch_sampler = None
                                      
                    init_fn = partial(
                        worker_init_fn, num_workers=num_workers, rank=rank,
                        seed=seed) if seed is not None else None
                                      
                    if (TORCH_VERSION != 'parrots'
                            and digit_version(TORCH_VERSION) >= digit_version('1.7.0')):
                        kwargs['persistent_workers'] = persistent_workers
                    elif persistent_workers is True:
                        warnings.warn('persistent_workers is invalid because your pytorch '
                                      'version is lower than 1.7.0')
                                      
                    data_loader = DataLoader(
                        dataset,
                        batch_size=batch_size,
                        sampler=sampler,
                        num_workers=num_workers,
                        batch_sampler=batch_sampler,
                        collate_fn=partial(collate, samples_per_gpu=samples_per_gpu),
                        pin_memory=kwargs.pop('pin_memory', False),
                        worker_init_fn=init_fn,
                        **kwargs)
                                      
                    return data_loader
              

            Loaded just like a regular PyTorch DataLoader. The baseline does not specify a sampler or any of these options; it only sets cfg.data.samples_per_gpu = 4, which gives a batch size of 4 (batch_size = num_gpus * samples_per_gpu in the non-distributed case).
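            As a minimal sketch (assuming the mmdet 2.x API shown above, with cfg already loaded), the baseline's loader construction amounts to:

                from mmdet.datasets import build_dataset, build_dataloader

                # build the dataset from the config, then wrap it the way the baseline does
                dataset = build_dataset(cfg.data.train)
                data_loader = build_dataloader(
                    dataset,
                    samples_per_gpu=cfg.data.samples_per_gpu,  # 4 in the baseline
                    workers_per_gpu=cfg.data.workers_per_gpu,
                    num_gpus=1,
                    dist=False,   # single GPU: batch_size = num_gpus * samples_per_gpu
                    shuffle=True,
                    seed=cfg.get('seed', None))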

          • custom
            • Can be customized together with the Dataset above; again, everything must be given in dict form (the example below uses the MMDetection 3.x-style config):
              dataset_type = 'CocoDataset'  # Dataset type, this will be used to define the dataset
              data_root = 'data/coco/'  # Root path of data
              backend_args = None # Arguments to instantiate the corresponding file backend
                                
              train_pipeline = [  # Training data processing pipeline
                  dict(type='LoadImageFromFile', backend_args=backend_args),  # First pipeline to load images from file path
                  dict(
                      type='LoadAnnotations',  # Second pipeline to load annotations for current image
                      with_bbox=True,  # Whether to use bounding box, True for detection
                      with_mask=True,  # Whether to use instance mask, True for instance segmentation
                      poly2mask=True),  # Whether to convert the polygon mask to instance mask, set False for acceleration and to save memory
                  dict(
                      type='Resize',  # Pipeline that resizes the images and their annotations
                      scale=(1333, 800),  # The largest scale of the images
                      keep_ratio=True  # Whether to keep the ratio between height and width
                      ),
                  dict(
                      type='RandomFlip',  # Augmentation pipeline that flips the images and their annotations
                      prob=0.5),  # The probability to flip
                  dict(type='PackDetInputs')  # Pipeline that formats the annotation data and decides which keys in the data should be packed into data_samples
              ]
              test_pipeline = [  # Testing data processing pipeline
                  dict(type='LoadImageFromFile', backend_args=backend_args),  # First pipeline to load images from file path
                  dict(type='Resize', scale=(1333, 800), keep_ratio=True),  # Pipeline that resizes the images
                  dict(
                      type='PackDetInputs',  # Pipeline that formats the annotation data and decides which keys in the data should be packed into data_samples
                      meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                                 'scale_factor'))
              ]
              train_dataloader = dict(   # Train dataloader config
                  batch_size=2,  # Batch size of a single GPU
                  num_workers=2,  # Worker to pre-fetch data for each single GPU
                  persistent_workers=True,  # If ``True``, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed.
                  sampler=dict(  # training data sampler
                      type='DefaultSampler',  # DefaultSampler which supports both distributed and non-distributed training. Refer to https://mmengine.readthedocs.io/en/latest/api/generated/mmengine.dataset.DefaultSampler.html#mmengine.dataset.DefaultSampler
                      shuffle=True),  # randomly shuffle the training data in each epoch
                  batch_sampler=dict(type='AspectRatioBatchSampler'),  # Batch sampler for grouping images with similar aspect ratio into a same batch. It can reduce GPU memory cost.
                  dataset=dict(  # Train dataset config
                      type=dataset_type,
                      data_root=data_root,
                      ann_file='annotations/instances_train2017.json',  # Path of annotation file
                      data_prefix=dict(img='train2017/'),  # Prefix of image path
                      filter_cfg=dict(filter_empty_gt=True, min_size=32),  # Config of filtering images and annotations
                      pipeline=train_pipeline,
                      backend_args=backend_args))
              val_dataloader = dict(  # Validation dataloader config
                  batch_size=1,  # Batch size of a single GPU. If batch-size > 1, the extra padding area may influence the performance.
                  num_workers=2,  # Worker to pre-fetch data for each single GPU
                  persistent_workers=True,  # If ``True``, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed.
                  drop_last=False,  # Whether to drop the last incomplete batch, if the dataset size is not divisible by the batch size
                  sampler=dict(
                      type='DefaultSampler',
                      shuffle=False),  # not shuffle during validation and testing
                  dataset=dict(
                      type=dataset_type,
                      data_root=data_root,
                      ann_file='annotations/instances_val2017.json',
                      data_prefix=dict(img='val2017/'),
                      test_mode=True,  # Turn on the test mode of the dataset to avoid filtering annotations or images
                      pipeline=test_pipeline,
                      backend_args=backend_args))
              test_dataloader = val_dataloader  # Testing dataloader config
            
        • evaluation
          • A metric can be attached to the validation stage:
            val_evaluator = dict(  # Validation evaluator config
                type='CocoMetric',  # The coco metric used to evaluate AR, AP, and mAP for detection and instance segmentation
                ann_file=data_root + 'annotations/instances_val2017.json',  # Annotation file path
                metric=['bbox', 'segm'],  # Metrics to be evaluated, `bbox` for detection and `segm` for instance segmentation
                format_only=False,
                backend_args=backend_args)
            test_evaluator = val_evaluator  # Testing evaluator config
          
          • In the test stage, evaluation goes through the same steps as validation, and the results can be saved to a file:
            # inference on test dataset and
            # format the output results for submission.
            test_dataloader = dict(
                batch_size=1,
                num_workers=2,
                persistent_workers=True,
                drop_last=False,
                sampler=dict(type='DefaultSampler', shuffle=False),
                dataset=dict(
                    type=dataset_type,
                    data_root=data_root,
                    ann_file=data_root + 'annotations/image_info_test-dev2017.json',
                    data_prefix=dict(img='test2017/'),
                    test_mode=True,
                    pipeline=test_pipeline))
            test_evaluator = dict(
                type='CocoMetric',
                ann_file=data_root + 'annotations/image_info_test-dev2017.json',
                metric=['bbox', 'segm'],  # Metrics to be evaluated
                format_only=True,  # Only format and save the results to coco json file
                outfile_prefix='./work_dirs/coco_detection/test')  # The prefix of output json files
          
        • train

          Executed in mmdet.apis.train.py; this is where the dataset, dataloader, optimizer, scheduler, etc. are all loaded.

          Training itself is run by mmcv's runner.
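          As a hedged sketch of that 2.x entry point (assuming cfg is already loaded and modified as above):

            from mmdet.models import build_detector
            from mmdet.datasets import build_dataset
            from mmdet.apis import train_detector

            # build model and dataset from the config, then hand off to
            # train_detector, which wires up the dataloader, optimizer and
            # scheduler and runs everything through mmcv's runner
            model = build_detector(cfg.model)
            datasets = [build_dataset(cfg.data.train)]
            train_detector(model, datasets, cfg, distributed=False, validate=True)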

        • optimizer, scheduler
          • Defined in schedule_1x.py:
            # optimizer
            optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
            optimizer_config = dict(grad_clip=None)
            # learning policy
            lr_config = dict(
                policy='step',
                warmup='linear',
                warmup_iters=500,
                warmup_ratio=0.001,
                step=[8, 11])
            runner = dict(type='EpochBasedRunner', max_epochs=12)
          
          • 1x means 12 epochs and 2x means 24 epochs; the lr is cut to 1/10 at epochs 8 and 11 (1x) or 16 and 22 (2x). 20e is used for the cascade models: 20 epochs, with decay points at epochs 16 and 19.
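          • For comparison, the 2x schedule (schedule_2x.py) keeps the same step policy and just doubles the decay points and epoch count:

            # schedule_2x.py: same policy as 1x with step/max_epochs doubled
            optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
            optimizer_config = dict(grad_clip=None)
            lr_config = dict(
                policy='step',
                warmup='linear',
                warmup_iters=500,
                warmup_ratio=0.001,
                step=[16, 22])
            runner = dict(type='EpochBasedRunner', max_epochs=24)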
          • It can be customized as below:
            optim_wrapper = dict(  # Optimizer wrapper config
                type='OptimWrapper',  # Optimizer wrapper type, switch to AmpOptimWrapper to enable mixed precision training.
                optimizer=dict(  # Optimizer config. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
                    type='SGD',  # Stochastic gradient descent optimizer
                    lr=0.02,  # The base learning rate
                    momentum=0.9,  # Stochastic gradient descent with momentum
                    weight_decay=0.0001),  # Weight decay of SGD
                clip_grad=None,  # Gradient clip option. Set None to disable gradient clip. Find usage in https://mmengine.readthedocs.io/en/latest/tutorials/optimizer.html
                )
            # there is also a gradient clipping option, which looks worth using
          
          • Looking into optim_wrapper, this too is built through mmcv:
            # Copyright (c) OpenMMLab. All rights reserved.
            import copy
                          
            from mmcv.runner.optimizer import OPTIMIZER_BUILDERS as MMCV_OPTIMIZER_BUILDERS
            from mmcv.utils import Registry, build_from_cfg
                          
            OPTIMIZER_BUILDERS = Registry(
                'optimizer builder', parent=MMCV_OPTIMIZER_BUILDERS)
                          
            def build_optimizer_constructor(cfg):
                constructor_type = cfg.get('type')
                if constructor_type in OPTIMIZER_BUILDERS:
                    return build_from_cfg(cfg, OPTIMIZER_BUILDERS)
                elif constructor_type in MMCV_OPTIMIZER_BUILDERS:
                    return build_from_cfg(cfg, MMCV_OPTIMIZER_BUILDERS)
                else:
                    raise KeyError(f'{constructor_type} is not registered '
                                   'in the optimizer builder registry.')
                          
            def build_optimizer(model, cfg):
                optimizer_cfg = copy.deepcopy(cfg)
                constructor_type = optimizer_cfg.pop('constructor',
                                                     'DefaultOptimizerConstructor')
                paramwise_cfg = optimizer_cfg.pop('paramwise_cfg', None)
                optim_constructor = build_optimizer_constructor(
                    dict(
                        type=constructor_type,
                        optimizer_cfg=optimizer_cfg,
                        paramwise_cfg=paramwise_cfg))
                optimizer = optim_constructor(model)
                return optimizer
          
          • mmcv runner optimizer

              # Copyright (c) OpenMMLab. All rights reserved.
              import copy
              import inspect
              from typing import Dict, List
                                
              import torch
                                
              from ...utils import Registry, build_from_cfg
                                
              OPTIMIZERS = Registry('optimizer')
              OPTIMIZER_BUILDERS = Registry('optimizer builder')
                                
              def register_torch_optimizers() -> List:
                  torch_optimizers = []
                  for module_name in dir(torch.optim):
                      if module_name.startswith('__'):
                          continue
                      _optim = getattr(torch.optim, module_name)
                      if inspect.isclass(_optim) and issubclass(_optim,
                                                                torch.optim.Optimizer):
                          OPTIMIZERS.register_module()(_optim)
                          torch_optimizers.append(module_name)
                  return torch_optimizers
                                
              TORCH_OPTIMIZERS = register_torch_optimizers()
                                
              def build_optimizer_constructor(cfg: Dict):
                  return build_from_cfg(cfg, OPTIMIZER_BUILDERS)
                                
              def build_optimizer(model, cfg: Dict):
                  optimizer_cfg = copy.deepcopy(cfg)
                  constructor_type = optimizer_cfg.pop('constructor',
                                                       'DefaultOptimizerConstructor')
                  paramwise_cfg = optimizer_cfg.pop('paramwise_cfg', None)
                  optim_constructor = build_optimizer_constructor(
                      dict(
                          type=constructor_type,
                          optimizer_cfg=optimizer_cfg,
                          paramwise_cfg=paramwise_cfg))
                  optimizer = optim_constructor(model)
                  return optimizer
            

          Looking at the code above, the optimizers come straight from torch. When experimenting with several optimizers, it should be enough to pass the arguments that each torch Optimizer accepts.
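          For example, since register_torch_optimizers registers every torch.optim class, switching to AdamW is purely a config change (values below are illustrative, not tuned):

            # any torch.optim class name works as `type`; its constructor
            # arguments are passed through as-is (illustrative values)
            optimizer = dict(type='AdamW', lr=1e-4, betas=(0.9, 0.999), weight_decay=0.05)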

          • The scheduler is likewise defined in mmcv's build_runner (on top of PyTorch).
          • Customization:
            param_scheduler = [
                # Linear learning rate warm-up scheduler
                dict(
                    type='LinearLR',  # Use linear policy to warmup learning rate
                    start_factor=0.001, # The ratio of the starting learning rate used for warmup
                    by_epoch=False,  # The warmup learning rate is updated by iteration
                    begin=0,  # Start from the first iteration
                    end=500),  # End the warmup at the 500th iteration
                # The main LRScheduler
                dict(
                    type='MultiStepLR',  # Use multi-step learning rate policy during training
                    by_epoch=True,  # The learning rate is updated by epoch
                    begin=0,   # Start from the first epoch
                    end=12,  # End at the 12th epoch
                    milestones=[8, 11],  # Epochs to decay the learning rate
                    gamma=0.1)  # The learning rate decay ratio
            ]
          
        • custom_hooks
          • If no hooks are given explicitly, default_hooks is applied.
            default_hooks = dict(
                timer=dict(type='IterTimerHook'),  # Update the time spent during iteration into message hub
                logger=dict(type='LoggerHook', interval=50),  # Collect logs from different components of Runner and write them to terminal, JSON file, tensorboard, wandb, etc.
                param_scheduler=dict(type='ParamSchedulerHook'), # update some hyper-parameters of optimizer
                checkpoint=dict(type='CheckpointHook', interval=1), # Save checkpoints periodically
                sampler_seed=dict(type='DistSamplerSeedHook'),  # Ensure distributed Sampler shuffle is active
                visualization=dict(type='DetVisualizationHook'))  # Detection Visualization Hook. Used to visualize validation and testing process prediction results
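          • A hedged sketch of adding a hook of one's own (mmcv 1.x registry pattern; the hook name and behavior are made up for illustration):

            from mmcv.runner import HOOKS, Hook

            @HOOKS.register_module()
            class SimpleEpochLogHook(Hook):  # hypothetical hook, illustration only
                """Print a short message at the end of every training epoch."""

                def after_train_epoch(self, runner):
                    print(f'finished epoch {runner.epoch + 1}')

            # enabled from the config, in dict form like everything else
            custom_hooks = [dict(type='SimpleEpochLogHook')]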
          
          • If you add custom hooks and then print(cfg.text), you can inspect the resulting config cleanly.
          • log_config is modified in default_runtime:
            checkpoint_config = dict(interval=1)
            # yapf:disable
            log_config = dict(
                interval=50,
                hooks=[
                    dict(type='TextLoggerHook'),
                    # dict(type='TensorboardLoggerHook')
                ])
            # yapf:enable
            custom_hooks = [dict(type='NumClassCheckHook')]
                          
            dist_params = dict(backend='nccl')
            log_level = 'INFO'
            load_from = None
            resume_from = None
            workflow = [('train', 1)]
                          
            # disable opencv multithreading to avoid system being overloaded
            opencv_num_threads = 0
            # set multi-process start method as `fork` to speed up the training
            mp_start_method = 'fork'
                          
            # Default setting for scaling LR automatically
            #   - `enable` means enable scaling LR automatically
            #       or not by default.
            #   - `base_batch_size` = (8 GPUs) x (2 samples per GPU).
            auto_scale_lr = dict(enable=False, base_batch_size=16)
          
        • Ignore some fields in the base configs
          • Passing _delete_=True lets you overwrite a chosen part of the base config:
            model = dict(
                type='MaskRCNN',
                backbone=dict(
                    type='ResNet',
                    depth=50,
                    num_stages=4,
                    out_indices=(0, 1, 2, 3),
                    frozen_stages=1,
                    norm_cfg=dict(type='BN', requires_grad=True),
                    norm_eval=True,
                    style='pytorch',
                    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
                neck=dict(...),
                rpn_head=dict(...),
                roi_head=dict(...))
          

            _base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py'
            model = dict(
                backbone=dict(
                    _delete_=True,
                    type='HRNet',
                    extra=dict(
                        stage1=dict(
                            num_modules=1,
                            num_branches=1,
                            block='BOTTLENECK',
                            num_blocks=(4, ),
                            num_channels=(64, )),
                        stage2=dict(
                            num_modules=1,
                            num_branches=2,
                            block='BASIC',
                            num_blocks=(4, 4),
                            num_channels=(32, 64)),
                        stage3=dict(
                            num_modules=4,
                            num_branches=3,
                            block='BASIC',
                            num_blocks=(4, 4, 4),
                            num_channels=(32, 64, 128)),
                        stage4=dict(
                            num_modules=3,
                            num_branches=4,
                            block='BASIC',
                            num_blocks=(4, 4, 4, 4),
                            num_channels=(32, 64, 128, 256))),
                    init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w32')),
                neck=dict(...))
          

          This really does overwrite it! In other words, although _base_ pulls in mask-rcnn_r50_fpn_1x, the backbone is swapped wholesale for HRNet: with _delete_=True the base ResNet keys are dropped instead of being merged.
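          A toy illustration of why _delete_ matters (plain Python dicts, not mmcv internals):

            # without _delete_, base and override are merged key by key, so stale
            # ResNet keys would leak into the HRNet backbone config
            base = dict(type='ResNet', depth=50, num_stages=4)
            override = dict(type='HRNet', extra=dict())

            merged = {**base, **override}
            print(merged)     # {'type': 'HRNet', 'depth': 50, 'num_stages': 4, 'extra': {}}
                              # depth/num_stages are leftovers HRNet does not expect

            replaced = override   # effect of _delete_=True: base dict is discarded first
            print(replaced)       # {'type': 'HRNet', 'extra': {}}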

        • Model fine-tuning

            # optimizer
            # lr is set for a batch size of 8
            optim_wrapper = dict(optimizer=dict(lr=0.01))
                          
            # learning rate
            param_scheduler = [
                dict(
                    type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
                dict(
                    type='MultiStepLR',
                    begin=0,
                    end=8,
                    by_epoch=True,
                    milestones=[7],
                    gamma=0.1)
            ]
                          
            # max_epochs
            train_cfg = dict(max_epochs=8)
                          
            # log config
            default_hooks = dict(logger=dict(interval=100))
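          One thing the snippet above leaves out: fine-tuning normally also initializes from a pretrained checkpoint via load_from (the URL below is a placeholder; the real one comes from the MMDetection model zoo):

            # initialize from pretrained weights instead of training from scratch
            # (placeholder URL; substitute the actual model-zoo checkpoint)
            load_from = 'https://download.openmmlab.com/mmdetection/<model_zoo_checkpoint>.pth'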
          

          → The config block above is from the official docs. Giving two entries in param_scheduler like that does not split the LR between backbone and head: the first entry is a linear warmup and the second is the main MultiStepLR, applied in sequence. Per-module learning rates are a separate mechanism; see below.

                Example 1:
                    >>> model = torch.nn.modules.Conv1d(1, 1, 1)
                    >>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
                    >>>                      weight_decay=0.0001)
                    >>> paramwise_cfg = dict(norm_decay_mult=0.)
                    >>> optim_builder = DefaultOptimizerConstructor(
                    >>>     optimizer_cfg, paramwise_cfg)
                    >>> optimizer = optim_builder(model)
                Example 2:
                    >>> # assume model have attribute model.backbone and model.cls_head
                    >>> optimizer_cfg = dict(type='SGD', lr=0.01, weight_decay=0.95)
                    >>> paramwise_cfg = dict(custom_keys={
                            '.backbone': dict(lr_mult=0.1, decay_mult=0.9)})
                    >>> optim_builder = DefaultOptimizerConstructor(
                    >>>     optimizer_cfg, paramwise_cfg)
                    >>> optimizer = optim_builder(model)
                    >>> # Then the `lr` and `weight_decay` for model.backbone is
                    >>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for
                    >>> # model.cls_head is (0.01, 0.95).
          

          → This is the docstring of mmcv.runner.optimizer.default_constructor.py. Example 2 with custom_keys is what lets the backbone and head be given different lrs.
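          Applied to a config, that looks roughly like this (a sketch assuming the mmdet 2.x build_optimizer above, which pops paramwise_cfg out of the optimizer dict; the multiplier is illustrative):

            # train the backbone at 1/10 of the base lr; the heads keep the base lr
            optimizer = dict(
                type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001,
                paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)}))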

      • Config file structure


      • register module
    • Detectron2
