Semantic Segmentation Base Models
- DeepLabV3+ (ResNet-50 backbone)
A robust baseline model for semantic segmentation, DeepLabV3+ with a ResNet-50 backbone strikes a good balance between speed and accuracy. Suitable for medium-scale applications where both performance and resource efficiency are important.
- SegFormer (MiT-B0 backbone)
A transformer-based semantic segmentation model that integrates hierarchical attention mechanisms and lightweight decoders, achieving efficient and accurate segmentation across diverse image resolutions and scales.
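A minimal sketch of instantiating this variant with Hugging Face `transformers`, building the model from a local config so no checkpoint download is needed. The encoder depths and channel widths below mirror the published MiT-B0 settings (and match the library defaults); `num_labels=19` is an assumption (e.g. Cityscapes) to replace with your own label count.

```python
import torch
from transformers import SegformerConfig, SegformerForSemanticSegmentation

# MiT-B0-scale config built locally (no pretrained weights).
config = SegformerConfig(
    num_labels=19,                     # assumed label count -- adjust
    depths=[2, 2, 2, 2],               # MiT-B0 encoder depths
    hidden_sizes=[32, 64, 160, 256],   # MiT-B0 channel widths per stage
    decoder_hidden_size=256,           # lightweight all-MLP decoder width
)
model = SegformerForSemanticSegmentation(config)
model.eval()

x = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    logits = model(pixel_values=x).logits

# SegFormer predicts at 1/4 of the input resolution; upsample
# (e.g. torch.nn.functional.interpolate) for full-size masks.
print(logits.shape)  # torch.Size([1, 19, 128, 128])
```

Swapping in the B4/B5 depths and widths (or loading a pretrained checkpoint with `from_pretrained`) gives the larger variants discussed below.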
- SegFormer-large (MiT-B4 backbone)
Offers significantly better accuracy than MiT-B0 by using a deeper backbone. A great choice for scenarios requiring high-quality segmentation results on complex scenes, while still being manageable on standard GPUs (12–16 GB).
- SegFormer-x-large (MiT-B5 backbone)
The most accurate SegFormer variant, especially effective on diverse, densely annotated datasets. Best suited for large-scale training or inference tasks where precision is critical. Requires high-end GPUs with larger memory (≥24 GB).
Best Overall Model (Top Performer)
SegFormer-x-large (MiT-B5):
- Use when: You have a large dataset and need the best possible accuracy.
- Why: Delivers top accuracy on large-scale benchmarks such as ADE20K and scales well with both data and model size.
Recommendations by Dataset Size
- Large Dataset (≥ 10k images)
✅ SegFormer-x-large (MiT-B5)
✅ SegFormer-large (MiT-B4)
✅ DeepLabV3+ (ResNet-101)
Why: These models scale well with large datasets and offer high accuracy on complex multi-class segmentation like ADE20K.
- Medium Dataset (2k–10k images)
✅ SegFormer-large (MiT-B4)
✅ DeepLabV3+ (ResNet-50)
Why: Balanced models offering a good trade-off between accuracy and training time on medium-scale datasets.
- Small Dataset (≤ 2k images)
✅ SegFormer (MiT-B0)
✅ DeepLabV3+ (ResNet-50)
Why: These are lightweight and stable models that avoid overfitting and work well with limited data and compute.
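The dataset-size recommendations above can be expressed as a small helper function. This is purely illustrative; the function name and thresholds simply encode the buckets from the lists above.

```python
def recommend_models(num_images: int) -> list[str]:
    """Return the model shortlist for a dataset size, per the guide above."""
    if num_images >= 10_000:      # Large dataset
        return [
            "SegFormer-x-large (MiT-B5)",
            "SegFormer-large (MiT-B4)",
            "DeepLabV3+ (ResNet-101)",
        ]
    if num_images > 2_000:        # Medium dataset (2k-10k)
        return ["SegFormer-large (MiT-B4)", "DeepLabV3+ (ResNet-50)"]
    # Small dataset (<= 2k): lightweight models that resist overfitting
    return ["SegFormer (MiT-B0)", "DeepLabV3+ (ResNet-50)"]

print(recommend_models(15_000)[0])  # SegFormer-x-large (MiT-B5)
```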