In this theory paper, we investigate training deep neural networks (DNNs) forclassification via minimizing the information bottleneck (IB) functional. Weshow that the resulting optimization problem suffers from two severe issues:First, for deterministic DNNs, either the IB functional is infinite for almostall values of network parameters, making the optimization problem ill-posed, orit is piecewise constant, hence not admitting gradient-based optimizationmethods. Second, the invariance of the IB functional under bijections preventsit from capturing properties of the learned representation that are desirablefor classification, such as robustness and simplicity. We argue that theseissues are partly resolved for stochastic DNNs, DNNs that include a (hard orsoft) decision rule, or by replacing the IB functional with related, but morewell-behaved cost functions. We conclude that recent successes reported abouttraining DNNs using the IB framework must be attributed to such solutions. As aside effect, our results indicate limitations of the IB framework for theanalysis of DNNs. We also note that rather than trying to repair the inherentproblems in the IB functional, a better approach may be to design regularizerson latent representation enforcing the desired properties directly.