The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on the one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provides a principled way to build future architectures yet to be invented.