Deep Learning · Intermediate · Also: CNN, ConvNet

Convolutional Neural Network (CNN)

Definition

A neural network architecture specialized for grid-structured data (especially images) that uses learned filters to detect local features — edges, textures, shapes — in a hierarchical, spatially aware manner.

In Depth

A Convolutional Neural Network is purpose-built for data with spatial structure — most famously images. Its core operation is the convolution: a small learned filter (kernel) slides across the input, computing a dot product at each position. This produces a feature map that highlights where that filter's pattern appears in the image. Early filters detect simple features like horizontal or vertical edges; deeper layers combine these into progressively complex patterns — textures, parts, objects.
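The sliding dot product described above can be sketched in a few lines of numpy. This is a minimal illustration (stride 1, no padding), not a production implementation; the image and the vertical-edge filter are made up for demonstration:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding),
    computing a dot product at each position. This "valid"
    cross-correlation is what deep-learning frameworks call convolution."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 6x6 toy image with a vertical edge: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-crafted vertical-edge filter (in a CNN these weights are learned).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

feature_map = conv2d(image, kernel)
print(feature_map)  # nonzero only in the columns where the edge sits
```

The resulting feature map is strongest exactly where the filter's pattern (a dark-to-bright transition) occurs, which is what "highlights where that filter's pattern appears" means concretely.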

CNNs have three key architectural advantages over plain neural networks for image tasks. First, weight sharing: the same filter is applied everywhere in the image, dramatically reducing the number of parameters compared to a fully connected network. Second, translation invariance (strictly speaking, the convolution itself is translation-equivariant, and pooling adds a degree of invariance): a filter that detects an eye fires whether the eye appears in the top-left or bottom-right of the image. Third, local connectivity: neurons connect only to a small region of the input, reflecting the local structure of visual patterns.
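The first two advantages can be demonstrated directly: one small shared filter responds identically to a pattern wherever it appears, only the location of the response shifts. A sketch with a random "eye" patch placed in two corners of otherwise blank images (all sizes and values here are invented for illustration):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation, stride 1 (what frameworks call convolution)."""
    kh, kw = kernel.shape
    return np.array([[np.sum(image[i:i+kh, j:j+kw] * kernel)
                      for j in range(image.shape[1] - kw + 1)]
                     for i in range(image.shape[0] - kh + 1)])

rng = np.random.default_rng(0)

# One shared 3x3 filter: 9 weights, regardless of image size (weight sharing).
kernel = rng.standard_normal((3, 3))

# The same 4x4 patch placed in two corners of blank 10x10 images.
patch = rng.standard_normal((4, 4))
top_left = np.zeros((10, 10));     top_left[0:4, 0:4] = patch
bottom_right = np.zeros((10, 10)); bottom_right[6:10, 6:10] = patch

a = conv2d(top_left, kernel)
b = conv2d(bottom_right, kernel)

# Identical responses, shifted to match the patch's position
# (translation equivariance).
assert np.allclose(a[0:2, 0:2], b[6:8, 6:8])
```

Note the design consequence: because the filter is shared, moving the eye does not require new weights, only the feature map's peak moves with it.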

The CNN revolution began with AlexNet's ImageNet victory in 2012, and architectures like VGG, ResNet, and EfficientNet have pushed the boundaries ever since. While Transformers (specifically Vision Transformers) are increasingly competitive for image tasks, CNNs remain the dominant choice for embedded systems, mobile applications, and scenarios where computational efficiency is critical. They also remain central to video analysis, medical imaging, satellite imagery, and any domain with spatial data.

Key Takeaway

CNNs process images in a manner loosely analogous to human vision — hierarchically, detecting local features first and combining them into complex objects — making them extraordinarily efficient at tasks involving spatial or visual data.

Real-World Applications

01 Image classification: identifying objects, scenes, and species in photos with human-level accuracy (ImageNet benchmark).
02 Medical imaging: detecting tumors in MRI scans, diabetic retinopathy in fundus images, and COVID-19 in chest X-rays.
03 Autonomous vehicles: real-time detection and tracking of pedestrians, vehicles, signs, and lane markings from camera feeds.
04 Quality control in manufacturing: spotting defects, scratches, or assembly errors in product images at production-line speed.
05 Satellite imagery analysis: detecting deforestation, flood damage, crop health, and infrastructure changes from aerial photos.

Frequently Asked Questions

How does a CNN process an image?

A CNN applies learned filters (small grids of weights) that slide across the image, detecting patterns at each position. Early layers detect simple features like edges and corners. Middle layers combine these into textures and shapes. Deep layers recognize complex objects like faces or cars. Pooling layers progressively reduce spatial dimensions, and a final classification layer maps features to predictions.
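The pooling step mentioned above can be shown in isolation: non-overlapping max pooling keeps the strongest activation in each block, halving each spatial dimension. A minimal sketch with a made-up 4×4 feature map:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: keep the maximum in each
    size x size block, shrinking each spatial dimension by `size`."""
    h, w = x.shape
    return (x[:h - h % size, :w - w % size]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 3, 7, 8]])

print(max_pool2d(feature_map))
# [[4 2]
#  [3 8]]
```

Each pooled value says "this feature was present somewhere in this 2×2 region", which is how pooling trades exact position for robustness and reduced computation.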

What is the difference between a CNN and a regular neural network?

A regular (fully connected) neural network connects every neuron to every neuron in the next layer. For a 1000×1000 image, that's a million inputs per neuron, so even a modest hidden layer of 1,000 neurons needs a billion weights. CNNs use weight sharing (the same filter applied everywhere) and local connectivity (each neuron sees only a small region), reducing parameters by orders of magnitude while preserving spatial structure.
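The arithmetic behind this comparison, spelled out (the hidden-layer width and filter count are illustrative assumptions, not from any specific model):

```python
# Parameter counts for a 1000x1000 grayscale image.
inputs = 1000 * 1000

# Fully connected: assume a hidden layer of 1,000 neurons,
# each connected to every pixel.
fc_params = inputs * 1000        # 1,000,000,000 weights (plus biases)

# Convolutional: assume 64 filters of size 3x3, each shared
# across the entire image.
conv_params = 64 * 3 * 3         # 576 weights (plus 64 biases)

print(fc_params, conv_params)    # 1000000000 576
```

Six orders of magnitude fewer weights, which is the practical meaning of "reducing parameters by orders of magnitude".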

Are CNNs still relevant with Vision Transformers?

Yes. Vision Transformers (ViTs) have shown competitive or superior accuracy on large datasets, but CNNs remain preferred for edge deployment (mobile, IoT) due to their efficiency, for small datasets where they generalize better, and in many production systems already built around CNN architectures. Many state-of-the-art systems are hybrids combining CNN and Transformer elements.