Auto Annotation

Grounding DINO – Object Detection (OD)
- Used for automated object detection annotations.
- Provides bounding box generation based on visual grounding and textual prompts.
- Supports multi-object detection in complex scenes.
- Improves annotation speed and accuracy with minimal manual intervention.
- Integrates seamlessly with QpiAI’s auto-annotation pipeline.
- Ideal for datasets requiring precise localization of multiple objects.
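As a rough illustration of what text-prompted box annotations can look like downstream, the sketch below filters detections by confidence and converts them to COCO-style records. It does not call QpiAI's pipeline or Grounding DINO itself. The `Detection` class, the `to_coco_annotations` helper, the 0.35 threshold, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one text-grounded detection."""
    label: str        # prompt phrase the box was matched to
    score: float      # model confidence in [0, 1]
    box_xyxy: tuple   # (x_min, y_min, x_max, y_max) in pixels


def to_coco_annotations(detections, category_ids, score_threshold=0.35):
    """Keep confident detections and convert them to COCO-style dicts."""
    annotations = []
    for det in detections:
        # Skip low-confidence boxes and labels outside the target vocabulary.
        if det.score < score_threshold or det.label not in category_ids:
            continue
        x_min, y_min, x_max, y_max = det.box_xyxy
        width, height = x_max - x_min, y_max - y_min
        annotations.append({
            "id": len(annotations) + 1,
            "category_id": category_ids[det.label],
            "bbox": [x_min, y_min, width, height],  # COCO uses x, y, w, h
            "area": width * height,
            "score": det.score,
        })
    return annotations


# Made-up detections for the prompts "helmet" and "vest".
detections = [
    Detection("helmet", 0.82, (40, 30, 120, 110)),
    Detection("helmet", 0.20, (300, 40, 340, 90)),   # dropped: low score
    Detection("vest", 0.67, (35, 120, 140, 300)),
]
annotations = to_coco_annotations(detections, {"helmet": 1, "vest": 2})
```

Boxes that fall below the threshold would typically be left for manual review rather than discarded outright; the right threshold depends on the dataset.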
Grounding SAM – Segmentation (SAM)
- Used for instance segmentation annotations.
- Converts object boundaries into pixel-level masks for detailed labeling.
- Enables high-quality annotations for tasks like medical imaging, autonomous vision, and industrial inspection.
- Reduces manual polygon drawing effort significantly.
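To make the idea of a pixel-level mask concrete, here is a minimal sketch that derives a tight bounding box and pixel area from a binary mask. The mask is a plain nested list and the helper name `mask_to_bbox_and_area` is hypothetical; it is not part of SAM or the QpiAI pipeline, which would normally produce much larger masks as arrays.

```python
def mask_to_bbox_and_area(mask):
    """Return ([x, y, width, height], pixel_area) for a binary mask.

    `mask` is a list of equal-length rows of 0/1 values. An empty mask
    yields (None, 0).
    """
    # Rows and columns that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None, 0
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]

    x_min, y_min = cols[0], rows[0]
    width = cols[-1] - cols[0] + 1
    height = rows[-1] - rows[0] + 1
    area = sum(1 for row in mask for value in row if value)
    return [x_min, y_min, width, height], area


# A tiny made-up 4x5 mask with an L-shaped object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
bbox, area = mask_to_bbox_and_area(mask)
```

The mask area (5 pixels here) is smaller than the box area (6 pixels), which is the extra detail a segmentation annotation carries over a bounding box.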
Grounding DINO-Large – Advanced Object Detection (OD)
- A more powerful version of Grounding DINO for complex and large-scale object detection tasks.
- Handles dense scenes, small objects, and multiple overlapping instances effectively.
- Suitable for datasets where the standard Grounding DINO model might underperform due to scene complexity.
- Improves recall and precision for fine-grained categories.
- Provides robust support for domain adaptation and custom vocabulary grounding.
- Enables pose estimation and landmark detection tasks.
- Supports body, hand, and facial keypoints depending on dataset requirements.
- Can be used alongside DINO or SAM outputs for composite annotation workflows.
- Facilitates downstream applications such as gesture recognition, motion tracking, and activity analysis.
- Upcoming integration with hybrid annotation pipelines in QpiAI Pro.
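One way to picture a composite workflow that combines keypoints with detection boxes is to group each keypoint under the box that contains it. The sketch below assumes boxes in `(x_min, y_min, x_max, y_max)` pixel format and keypoints as `(name, x, y)` tuples. The helper `group_keypoints_by_box` and the sample data are hypothetical, and the first-match rule is a simplification, not a description of how QpiAI merges outputs.

```python
def group_keypoints_by_box(boxes, keypoints):
    """Assign each keypoint to the first box that contains it.

    Returns (grouped, unassigned), where `grouped` maps a box index to
    its keypoints and `unassigned` lists keypoints outside every box.
    """
    grouped = {index: [] for index in range(len(boxes))}
    unassigned = []
    for name, x, y in keypoints:
        for index, (x_min, y_min, x_max, y_max) in enumerate(boxes):
            if x_min <= x <= x_max and y_min <= y <= y_max:
                grouped[index].append((name, x, y))
                break
        else:
            # No box contained this keypoint.
            unassigned.append((name, x, y))
    return grouped, unassigned


# Two made-up person boxes and three keypoints.
boxes = [(0, 0, 100, 200), (150, 0, 250, 200)]
keypoints = [("nose", 50, 20), ("left_wrist", 160, 120), ("nose", 300, 20)]
grouped, unassigned = group_keypoints_by_box(boxes, keypoints)
```

With overlapping boxes, a first-match rule can attach a keypoint to the wrong instance; a real workflow would likely need a stronger association rule, such as matching against segmentation masks.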
Qwen3-VL-4B (OD)
- Integrates object recognition with unified multimodal analysis.
- Handles dense captioning, referring expression comprehension, and visual question answering (VQA).
- Supports complex scene parsing and alignment across images and textual descriptions.
- Enables interactive multimodal chat, annotation guidance, and context-aware labeling within QpiAI Pro.
- Compatible with Grounding DINO, SAM, and RT-DETR for composite detection–captioning pipelines.
- Upcoming extensions include video-grounded dialogue, temporal reasoning, and dynamic visual tracking capabilities.