Weakly-supervised semantic segmentation under image-tag supervision is a challenging task because it directly associates high-level semantics with low-level appearance. To bridge this gap, we propose an iterative bottom-up and top-down framework that alternately expands object regions and optimizes the segmentation network. We start from the initial localization produced by classification networks. Although classification networks respond only to small, coarse discriminative object regions, we argue that these regions contain significant common features of the objects. In the bottom-up step, we therefore mine common object features from the initial localization and expand the object regions with the mined features. To supplement non-discriminative regions, saliency maps are then incorporated under a Bayesian framework to refine the object regions. In the top-down step, the refined object regions are used as supervision to train the segmentation network and to predict object masks. These object masks provide more accurate localization and cover more of each object. We then take these object masks as the new initial localization and mine common object features from them. These steps are repeated iteratively to progressively produce finer object masks and optimize the segmentation network. Experimental results on the Pascal VOC 2012 dataset demonstrate that the proposed method outperforms previous state-of-the-art methods by a large margin.
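The alternation described above can be sketched as a simple control loop. This is only an illustrative outline of the iteration order, not the authors' implementation: the function name `iterate_mining`, the three callables, and the `num_rounds` parameter are all hypothetical placeholders, and the actual feature mining, Bayesian saliency refinement, and network training are left abstract.

```python
from typing import Callable, List, TypeVar

# Opaque type for whatever represents object regions / masks
# (e.g. per-image label maps). Hypothetical; not specified by the abstract.
Regions = TypeVar("Regions")


def iterate_mining(
    initial_localization: Regions,
    mine_and_expand: Callable[[Regions], Regions],
    refine_with_saliency: Callable[[Regions], Regions],
    train_and_predict: Callable[[Regions], Regions],
    num_rounds: int = 3,  # assumed; the abstract does not state a round count
) -> List[Regions]:
    """Run the bottom-up / top-down alternation and return the masks per round."""
    localization = initial_localization
    history: List[Regions] = []
    for _ in range(num_rounds):
        # Bottom-up: mine common object features and expand the object regions.
        expanded = mine_and_expand(localization)
        # Bottom-up: supplement non-discriminative regions using saliency.
        refined = refine_with_saliency(expanded)
        # Top-down: train the segmentation network on the refined regions
        # and predict object masks.
        masks = train_and_predict(refined)
        history.append(masks)
        # The predicted masks become the localization for the next round.
        localization = masks
    return history
```

With toy stand-ins (for example, sets of pixel indices and callables that each grow the set slightly), the loop shows how every round starts from the masks predicted by the previous one.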