Deep Learning · Intermediate · Also: Multidimensional Array, N-dimensional Array

Tensor

Definition

The fundamental data structure of deep learning — a multidimensional array of numbers that generalizes scalars, vectors, and matrices to arbitrary dimensions, and on which all neural network computations operate.

In Depth

In the context of deep learning, a tensor is a multidimensional array of numbers — the container that holds all data flowing through a neural network. A scalar (single number) is a 0-dimensional tensor. A vector is a 1D tensor. A matrix is a 2D tensor. An RGB image is a 3D tensor (height × width × 3 color channels). A batch of images is a 4D tensor (batch size × height × width × channels). Neural networks process data by performing mathematical operations — addition, multiplication, activation functions — on these tensors as they flow through layers.
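As an illustrative sketch (assuming PyTorch; TensorFlow is analogous), the following shows how each of these cases is simply a tensor with a different number of dimensions. The concrete sizes, such as 224 × 224, are placeholder assumptions, not values from this entry:

```python
import torch

# A scalar (single number) is a 0-dimensional tensor.
scalar = torch.tensor(3.14)
print(scalar.ndim, scalar.shape)   # 0 torch.Size([])

# A vector is a 1D tensor.
vector = torch.tensor([1.0, 2.0, 3.0])
print(vector.ndim, vector.shape)   # 1 torch.Size([3])

# A matrix is a 2D tensor.
matrix = torch.zeros(2, 3)
print(matrix.ndim, matrix.shape)   # 2 torch.Size([2, 3])

# An RGB image as a 3D tensor (height x width x channels).
# Note: PyTorch's convolution layers actually expect channels-first
# (N, C, H, W); H x W x C is used here only to mirror the text.
image = torch.rand(224, 224, 3)
print(image.ndim, image.shape)     # 3 torch.Size([224, 224, 3])

# A batch of 32 such images as a 4D tensor.
batch = torch.rand(32, 224, 224, 3)
print(batch.ndim, batch.shape)     # 4 torch.Size([32, 224, 224, 3])
```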

The two dominant deep learning frameworks — TensorFlow (which takes its name from this concept) and PyTorch — are fundamentally tensor computation libraries. They provide optimized operations for creating, transforming, and computing on tensors, with automatic differentiation (the ability to compute gradients for backpropagation) built in. Critically, both frameworks execute tensor operations on GPUs, which are designed for massively parallel numerical computation — this is what makes training large neural networks feasible.
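The sketch below, again assuming PyTorch, illustrates the two framework features mentioned here: automatic differentiation and placing tensors on a GPU when one is available. The toy computation is an assumption chosen purely for illustration:

```python
import torch

# Tensors created with requires_grad=True are tracked by autograd.
w = torch.tensor([2.0, -1.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

# A tiny computation: y = sum(w * x)
y = (w * x).sum()

# Backpropagation: compute dy/dw automatically.
y.backward()
print(w.grad)   # tensor([3., 4.]) (dy/dw equals x)

# The same tensor operations run on a GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
w_on_device = w.detach().to(device)
print(w_on_device.device)
```

Calling .backward() walks the recorded graph of tensor operations in reverse, which is exactly the gradient computation backpropagation requires.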

Understanding tensor shapes and dimensions is essential for deep learning practitioners. A common class of bugs involves shape mismatches: trying to multiply tensors with incompatible dimensions. Broadcasting rules govern how tensors of different shapes can interact. Reshaping, transposing, and slicing tensors are everyday operations. While the mathematical concept of a tensor in physics and differential geometry is more abstract, in deep learning the term simply means 'a multidimensional grid of numbers that flows through computational graphs.'
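Here is a minimal sketch of these everyday shape operations in PyTorch; the tensor sizes are arbitrary illustrative choices:

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 4)

# Matrix multiply requires inner dimensions to match: (2,3) @ (3,4) -> (2,4).
c = a @ b
print(c.shape)                 # torch.Size([2, 4])

# A shape mismatch raises a RuntimeError.
try:
    bad = a @ torch.rand(5, 4)
except RuntimeError as e:
    print("shape mismatch:", e)

# Broadcasting: a (2,3) tensor plus a (3,) vector adds the vector to each row.
row_bias = torch.tensor([10.0, 20.0, 30.0])
print((a + row_bias).shape)    # torch.Size([2, 3])

# Everyday reshaping operations.
flat = a.reshape(6)            # (2,3) -> (6,)
t = a.transpose(0, 1)          # (2,3) -> (3,2)
first_row = a[0]               # slicing: shape (3,)
```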

Key Takeaway

Tensors are multidimensional arrays of numbers — the universal data structure of deep learning. Every piece of data and every computation in neural networks is expressed as tensor operations.

Real-World Applications

01 Image processing: images are represented as 3D tensors (height × width × channels) and batches of images as 4D tensors for efficient parallel processing (representative shapes for this and the items below are sketched in the example after this list).
02 Natural language processing: text is tokenized and embedded into 2D tensors (sequence length × embedding dimension) for input to Transformer models.
03 Video analysis: video data is represented as 5D tensors (batch × time × height × width × channels) for temporal and spatial processing.
04 GPU-accelerated training: tensor operations are parallelized across thousands of GPU cores, enabling deep learning models to train on massive datasets.
05 Model weights: all learned parameters of a neural network are stored as tensors — a language model with billions of parameters stores them across many weight tensors.
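As a closing illustration tying these applications together, the hypothetical shapes below show how each use case is just a tensor of a different rank. The specific numbers (batch size 32, 512 tokens, 768-dimensional embeddings, and so on) are assumed values, not figures from this entry:

```python
import torch

# Representative shapes for the applications above (illustrative values).
image_batch  = torch.rand(32, 224, 224, 3)    # 01: batch x height x width x channels
token_embeds = torch.rand(512, 768)           # 02: sequence length x embedding dim
video_clip   = torch.rand(8, 16, 224, 224, 3) # 03: batch x time x height x width x channels
weight       = torch.rand(768, 768)           # 05: one weight tensor of a model layer

for name, t in [("images", image_batch), ("tokens", token_embeds),
                ("video", video_clip), ("weights", weight)]:
    print(f"{name}: rank {t.ndim}, shape {tuple(t.shape)}")
```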