Object Detection Base Model

Faster R-CNN FPN ResNet50: A two-stage object detection model that pairs a ResNet50 backbone with a Feature Pyramid Network (FPN) for multi-scale feature extraction. Region proposals are refined and classified in the second stage, which delivers high accuracy.
Faster R-CNN FPN ResNet101: The same architecture as the ResNet50 variant with a deeper ResNet101 backbone, trading additional computational overhead for improved accuracy.
RetinaNet R50 NAS: A single-stage detector with a ResNet50 backbone that uses focal loss to address class imbalance, enhanced with Neural Architecture Search (NAS) for optimized performance.
FCOS R50: A fully convolutional, anchor-free object detector with a ResNet50 backbone that predicts object locations per pixel, removing anchor-box design and improving performance in dense detection tasks.
RTMDet-large: A real-time object detector optimized for high accuracy and throughput, with an efficient architecture suited to large-scale deployments.
RTMDet-x-large: A higher-capacity version of RTMDet that offers better accuracy on large-scale object detection tasks when sufficient compute is available.
RTMDet-tiny: A lightweight version of RTMDet for edge devices or low-resource environments, balancing speed and precision.
YOLOX-large: An advanced single-stage model based on the YOLO architecture, incorporating a decoupled head and dynamic label assignment for improved training stability and detection accuracy.
VFNet R50: A dense object detector with a ResNet50 backbone, an IoU-aware classification score, and improved feature refinement, enhancing both localization and classification quality.
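
The focal loss mentioned for RetinaNet can be illustrated with a minimal sketch. It shows the binary form for a single prediction, using the commonly cited defaults alpha = 0.25 and gamma = 2.0. The function names are hypothetical and this is not the implementation used by any of the models listed above; it only demonstrates how well-classified (easy) examples are down-weighted relative to plain cross-entropy.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Plain binary cross-entropy for one prediction (illustrative helper)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one prediction (illustrative helper).

    p: predicted probability of the positive class, 0 < p < 1
    y: ground-truth label, 1 (object) or 0 (background)
    """
    # Probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Class-balancing weight: alpha for positives, 1 - alpha for negatives.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of easy examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With these defaults, a confident correct prediction (p = 0.9) contributes far less loss than a hard one (p = 0.1), so the many easy background locations do not dominate training.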

Best Overall Model (Top Performer)

RTMDet-x-large

Use when: You have a large dataset and want the best accuracy.
Why: It performs strongly on COCO-style benchmarks and is optimized for both speed and scale.

Recommendations by Dataset Size

Large Dataset (≥ 10k images)

✅ RTMDet-x-large

✅ YOLOX-large

✅ VFNet R50

✅ Faster R-CNN FPN ResNet101

✅ RetinaNet R50 NAS

Why: These models scale well with data and benefit from large training sets.

Medium Dataset (2k–10k images)

✅ RTMDet-large

✅ YOLOX-large

✅ VFNet R50

✅ Faster R-CNN FPN ResNet50

Why: These models balance performance and resource efficiency. RTMDet-large is the sweet spot.

Small Dataset (≤ 2k images)

✅ RTMDet-tiny

✅ FCOS R50

✅ Faster R-CNN FPN ResNet50

Why: These are lightweight (RTMDet-tiny), anchor-free (FCOS R50), or stable two-stage (Faster R-CNN FPN ResNet50) models that are less prone to overfitting and work with limited data.
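
The size-based recommendations above can be sketched as a small lookup. The names `RECOMMENDATIONS`, `dataset_tier`, and `recommend_models` are hypothetical and not part of any library. The thresholds follow the explicit "≥ 10k" and "≤ 2k" bounds in the headings, so exactly 10,000 images counts as large and exactly 2,000 as small; everything in between is treated as medium.

```python
# Model lists copied from the recommendations above, in the order given.
# All names in this sketch are illustrative, not an existing API.
RECOMMENDATIONS = {
    "large": [
        "RTMDet-x-large",
        "YOLOX-large",
        "VFNet R50",
        "Faster R-CNN FPN ResNet101",
        "RetinaNet R50 NAS",
    ],
    "medium": [
        "RTMDet-large",
        "YOLOX-large",
        "VFNet R50",
        "Faster R-CNN FPN ResNet50",
    ],
    "small": [
        "RTMDet-tiny",
        "FCOS R50",
        "Faster R-CNN FPN ResNet50",
    ],
}


def dataset_tier(num_images: int) -> str:
    """Map an image count to the 'small', 'medium', or 'large' tier."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if num_images >= 10_000:
        return "large"
    if num_images <= 2_000:
        return "small"
    return "medium"


def recommend_models(num_images: int) -> list[str]:
    """Return the recommended base models for a dataset of this size."""
    return list(RECOMMENDATIONS[dataset_tier(num_images)])


if __name__ == "__main__":
    for n in (500, 5_000, 50_000):
        print(n, dataset_tier(n), recommend_models(n))
```

Dataset size is only one input; available compute and latency requirements may still favor a smaller or larger model within a tier.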