We study the problem of recovering an underlying 3D shape from a set of images. Existing learning-based approaches usually resort to recurrent neural nets, e.g., GRU, or intuitive pooling operations, e.g., max/mean pooling, to fuse multiple deep features encoded from input images. However, GRU-based approaches are unable to consistently estimate 3D shapes given the same set of input images, as the recurrent unit is permutation variant. They are also unlikely to refine the 3D shape given more images, due to the long-term memory loss of GRU. The widely used pooling approaches are limited to capturing only the first order/moment information, ignoring other valuable features. In this paper, we present a new feed-forward neural module, named AttSets, together with a dedicated training algorithm, named JTSO, to attentionally aggregate an arbitrary-sized deep feature set for multi-view 3D reconstruction. AttSets is permutation invariant, computationally efficient, flexible and robust to multiple input images. We thoroughly evaluate various properties of AttSets on large public datasets. Extensive experiments show that AttSets together with the JTSO algorithm significantly outperforms existing aggregation approaches.
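
To make the notion of permutation-invariant attentional aggregation concrete, the following is a minimal sketch of one generic form of attention-weighted pooling over a feature set. It is an illustration under stated assumptions, not the AttSets module itself: the function name `attention_pool`, the square score matrix `weights`, and the per-channel softmax across the set are all hypothetical choices made here for the example. The sketch shows that such an aggregator accepts a set of any size and returns the same result regardless of the order of its inputs.

```python
import math
import random


def attention_pool(features, weights):
    """Fuse N feature vectors (each of length D) into a single length-D vector.

    Hypothetical sketch: `weights` is an assumed D x D score matrix. Scores are
    softmax-normalised across the set, separately for each feature channel,
    and then used as weights in a sum over the set.
    """
    n, d = len(features), len(features[0])
    # scores[i][j] = sum_k features[i][k] * weights[k][j]
    scores = [
        [sum(f[k] * weights[k][j] for k in range(d)) for j in range(d)]
        for f in features
    ]
    pooled = []
    for j in range(d):
        column = [scores[i][j] for i in range(n)]
        peak = max(column)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in column]
        total = math.fsum(exps)
        # Weighted sum over the set: the order of elements does not matter.
        pooled.append(math.fsum(e / total * features[i][j] for i, e in enumerate(exps)))
    return pooled


rng = random.Random(0)
D = 4
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
views = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]

pooled = attention_pool(views, W)
pooled_shuffled = attention_pool(views[::-1], W)  # same set, reversed order
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(pooled, pooled_shuffled))
print(same)
```

Unlike max or mean pooling, the weights in this sketch depend on the features themselves, so different elements of the set can contribute differently to each channel. Unlike a recurrent unit, no element is processed before another, which is why reordering the inputs leaves the output unchanged.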