Object Detection Base Model
- Faster R-CNN FPN ResNet50: A two-stage object detection model that uses a Feature Pyramid Network (FPN) with a ResNet50 backbone for multi-scale feature extraction, delivering high accuracy via refined region proposals and robust object classification.
- Faster R-CNN FPN ResNet101: Similar to the ResNet50 variant but with a deeper backbone (ResNet101), providing improved accuracy at the cost of additional computational overhead.
- RetinaNet R50 NAS: A single-stage detector using a ResNet50 backbone and focal loss to address class imbalance, enhanced with Neural Architecture Search (NAS) for optimized performance.
- FCOS R50: A fully convolutional, anchor-free object detector that predicts object locations at each pixel, removing the need for anchor-box design and performing well in dense detection tasks.
- RTMDet-large: A real-time object detector optimized for high accuracy and throughput, with an efficient architecture suitable for large-scale deployments.
- RTMDet-x-large: An extended version of RTMDet with increased capacity, offering superior accuracy for large-scale object detection tasks where compute resources are sufficient.
- RTMDet-tiny: A lightweight version of RTMDet, ideal for edge devices or low-resource environments, balancing speed and precision.
- YOLOX-large: An advanced single-stage model based on the YOLO architecture, incorporating decoupled heads and dynamic label assignment for improved training stability and detection accuracy.
- VFNet R50: A dense object detector with a novel IoU-aware classification score and improved feature refinement, enhancing both localization and classification quality.
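The RetinaNet entry above credits focal loss with addressing class imbalance. As a rough illustration of why, the sketch below implements the binary focal loss for a single prediction in plain Python. The function names and the defaults gamma = 2.0 and alpha = 0.25 (the values commonly cited from the original focal loss paper) are assumptions for this example, not settings taken from the models listed here.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Standard binary cross-entropy for one prediction.

    p is the predicted probability of the positive class, y is 0 or 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha defaults are illustrative assumptions.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):  # easy, uncertain, and hard positive examples
        print(f"p={p}: CE={cross_entropy(p, 1):.4f}  FL={focal_loss(p, 1):.4f}")
```

Relative to cross-entropy, a confidently correct prediction (p = 0.9) contributes far less loss than a badly wrong one (p = 0.1), so the many easy background examples are less likely to dominate training.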
Best Overall Model (Top Performer)
RTMDet-x-large
- Use when: You have a large dataset and want the best accuracy.
- Why: Offers excellent performance across COCO-style benchmarks; optimized for both speed and scale.
Recommendations by Dataset Size
- Large Dataset (≥ 10k images)
  ✅ RTMDet-x-large
  ✅ YOLOX-large
  ✅ VFNet R50
  ✅ Faster R-CNN FPN ResNet101
  ✅ RetinaNet R50 NAS
- Why: These models scale well with data and benefit from large training sets.
- Medium Dataset (2k–10k images)
  ✅ RTMDet-large
  ✅ YOLOX-large
  ✅ VFNet R50
  ✅ Faster R-CNN FPN ResNet50
- Why: Balanced performance and resource efficiency. RTMDet-large is a sweet spot.
- Small Dataset (≤ 2k images)
  ✅ RTMDet-tiny
  ✅ FCOS R50
  ✅ Faster R-CNN FPN ResNet50
- Why: These are lightweight models or stable two-stage models that tend to be less prone to overfitting and work with limited data.
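The size-based recommendations above amount to a simple threshold lookup, sketched below. The names `RECOMMENDATIONS`, `dataset_tier`, and `recommend_models` are hypothetical helpers for this example. The ranges in the text overlap at exactly 2k and 10k images; this sketch assigns those boundary values to the larger tier, which is an assumption.

```python
from typing import Dict, List

# Model lists copied from the recommendations in the text.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "large": [
        "RTMDet-x-large",
        "YOLOX-large",
        "VFNet R50",
        "Faster R-CNN FPN ResNet101",
        "RetinaNet R50 NAS",
    ],
    "medium": [
        "RTMDet-large",
        "YOLOX-large",
        "VFNet R50",
        "Faster R-CNN FPN ResNet50",
    ],
    "small": [
        "RTMDet-tiny",
        "FCOS R50",
        "Faster R-CNN FPN ResNet50",
    ],
}


def dataset_tier(num_images: int) -> str:
    """Map a training-set size (in images) to a tier name.

    Assumption: exactly 10,000 counts as large and exactly 2,000 as medium.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if num_images >= 10_000:
        return "large"
    if num_images >= 2_000:
        return "medium"
    return "small"


def recommend_models(num_images: int) -> List[str]:
    """Return a copy of the recommended model names for this dataset size."""
    return list(RECOMMENDATIONS[dataset_tier(num_images)])


if __name__ == "__main__":
    for n in (500, 5_000, 50_000):
        print(n, dataset_tier(n), recommend_models(n))
```

A lookup like this only encodes the dataset-size rule of thumb; compute budget and latency targets would still need to be weighed separately.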