Structural information is important in natural language understanding. Although some current neural network-based models have a limited ability to capture local syntactic information, they fail to use the high-level, large-scale structure of documents. This information is valuable for text understanding because it reflects the author's strategy for expressing information, and it helps in building an effective representation and forming appropriate output modes. We propose a neural network-based model, Zooming Network (ZN), capable of representing and leveraging the text structure of long documents and of developing its own analysis rhythm to extract critical information. ZN consists of an encoding neural network that builds a hierarchical representation of a document, and an interpreting neural model that reads the information at multiple levels and issues labeling actions through a policy network. Our model is trained with a hybrid paradigm of supervised learning (distinguishing right from wrong decisions) and reinforcement learning (determining which of multiple right paths is better). We applied the proposed model to long-text sequence labeling tasks, where it exceeded the baseline model (biLSTM-CRF) by 10 points of F1-measure.
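
As a rough illustration of what a hierarchical document representation can look like, the sketch below builds sentence-, paragraph-, and document-level vectors from word vectors. It is only a toy under stated assumptions: mean pooling stands in for the learned encoder, and the names `mean_pool` and `encode_document` are hypothetical rather than taken from the model. The policy network and the training procedure are not shown.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors.

    Assumption: this is a stand-in for a learned encoder, not the paper's method.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_document(doc: List[List[List[Vector]]]) -> Dict[str, object]:
    """Build a multi-level representation of a document.

    `doc` is nested as paragraphs -> sentences -> word vectors.
    Each level is pooled from the level directly below it.
    """
    sentence_level = [[mean_pool(sentence) for sentence in paragraph]
                      for paragraph in doc]
    paragraph_level = [mean_pool(sentences) for sentences in sentence_level]
    document_level = mean_pool(paragraph_level)
    return {
        "word": doc,
        "sentence": sentence_level,
        "paragraph": paragraph_level,
        "document": document_level,
    }
```

A reader that can move between the `"word"`, `"sentence"`, and `"paragraph"` entries is, loosely, what reading at multiple levels would require. How ZN actually chooses the level at each step is decided by its policy network and is not modeled here.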