A computer vision task that identifies and locates multiple objects within an image or video, typically outputting both a class label and a bounding box for each detected instance.
In Depth
Object Detection goes beyond image classification — instead of assigning one label to an entire image, it answers: what objects are in this image, and exactly where are they? For each detected object, the model outputs a class label (car, person, dog) and a bounding box specifying the object's location and size. This spatial, instance-level understanding is what makes object detection essential for applications that need to reason about the physical layout of a scene.
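The per-object output described above can be pictured as a simple record; this minimal sketch (field names are illustrative, not any particular library's API) shows the typical label-plus-box-plus-confidence structure a detector emits for each instance:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str                        # class name, e.g. "car"
    score: float                      # model confidence in [0, 1]
    box: tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels

# A detector's output for one image is a list of such records,
# one per detected instance (values here are made up for illustration):
detections = [
    Detection("car", 0.92, (34, 50, 210, 160)),
    Detection("person", 0.81, (220, 40, 270, 180)),
]
```

Real frameworks vary in box convention (corner coordinates vs. center-plus-size, absolute pixels vs. normalized), but the label/score/box triple is the common denominator.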
Two dominant paradigms exist. Two-stage detectors (Faster R-CNN, Mask R-CNN) first propose candidate regions that might contain objects, then classify and refine each proposal — accurate but relatively slow. Single-stage detectors (YOLO, SSD, DETR) predict class labels and bounding boxes directly in a single pass over the image — faster and suitable for real-time applications. DETR (DEtection TRansformer) goes further, replacing hand-designed components such as anchor boxes and non-maximum suppression with a Transformer encoder-decoder that predicts a set of boxes directly, demonstrating that attention-based architectures apply to detection.
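Most convolutional detectors in both paradigms produce many overlapping candidate boxes per object and rely on non-maximum suppression (NMS) to merge them — the hand-designed step DETR dispenses with. A minimal greedy NMS sketch, assuming corner-format boxes and a simple score-sorted loop:

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and
    discard any remaining box that overlaps it above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two near-duplicate boxes and one distant box:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the duplicate of box 0 is suppressed
```

Production detectors use vectorized, often class-wise variants of this loop, but the greedy keep-and-suppress logic is the same.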
Object detection accuracy is measured using metrics like mAP (mean Average Precision), which evaluates both classification accuracy and localization precision across multiple IoU (Intersection over Union) thresholds. Modern models like YOLOv8 and DINO achieve real-time performance on standard hardware while handling complex scenes with dozens of overlapping objects. The frontier challenge is open-vocabulary detection — identifying objects of any class based on text descriptions, even categories unseen during training.
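The IoU threshold mentioned above decides whether a predicted box counts as a correct localization: IoU is the overlap area divided by the combined area of the two boxes. A short worked sketch, assuming corner-format boxes:

```python
def iou(pred, gt):
    """IoU between predicted and ground-truth (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    return inter / union if union > 0 else 0.0

# A 10x10 prediction shifted 2px from a 10x10 ground truth:
# intersection = 8 * 8 = 64, union = 100 + 100 - 64 = 136
score = iou((2, 2, 12, 12), (0, 0, 10, 10))  # 64/136 ≈ 0.47
```

At the common 0.5 threshold this visually close prediction already counts as a miss, which is why mAP averages over a range of thresholds (e.g. COCO's 0.5 to 0.95) to reward progressively tighter localization.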
Object detection transforms images from visual scenes into structured inventories — telling a machine not just what is present, but where each object is, enabling spatial awareness in AI systems.

