We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities.
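
As an illustration of the two-variable case, the following is a minimal sketch of one way a continuous structural parameter could be trained from adaptation speed. It rests on assumptions not stated above: the belief in the hypothesis A → B is taken to be sigmoid(gamma), each hypothesis is summarized by the log-likelihood it accumulates online while adapting to a modified distribution, and the regret is the negative log of the resulting mixture. All names (`structure_regret`, `regret_grad`, `fit_gamma`) and the numeric log-likelihoods are hypothetical placeholders, not values from experiments.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """Stable log(1 + exp(x)); note log(sigmoid(x)) == -softplus(-x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def structure_regret(gamma: float, ll_a_to_b: float, ll_b_to_a: float) -> float:
    """Assumed regret: -log[sigmoid(gamma) * L_AB + (1 - sigmoid(gamma)) * L_BA].

    ll_a_to_b and ll_b_to_a are the online log-likelihoods accumulated by
    each hypothesis during adaptation (higher means faster adaptation).
    """
    a = -softplus(-gamma) + ll_a_to_b  # log(sigmoid(gamma)) + log L_AB
    b = -softplus(gamma) + ll_b_to_a   # log(1 - sigmoid(gamma)) + log L_BA
    m = max(a, b)
    return -(m + math.log(math.exp(a - m) + math.exp(b - m)))


def regret_grad(gamma: float, ll_a_to_b: float, ll_b_to_a: float) -> float:
    """Closed-form derivative of structure_regret with respect to gamma."""
    delta = ll_a_to_b - ll_b_to_a
    return sigmoid(gamma) - sigmoid(gamma + delta)


def fit_gamma(episode_logliks, lr: float = 0.5, gamma: float = 0.0) -> float:
    """Plain gradient descent on gamma over a sequence of adaptation episodes."""
    for ll_ab, ll_ba in episode_logliks:
        gamma -= lr * regret_grad(gamma, ll_ab, ll_ba)
    return gamma


# Hypothetical episodes in which the A -> B model adapts faster each time.
episodes = [(-10.0, -12.0)] * 50
gamma_final = fit_gamma(episodes)
print("belief in A -> B:", sigmoid(gamma_final))
```

Under these assumptions, the gradient is negative whenever the A → B hypothesis accumulates the higher online likelihood, so gradient descent moves the belief toward the structure that adapts faster; when both hypotheses adapt equally well the gradient vanishes and the belief is left unchanged.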