Fine-grained temporal action parsing is important in many applications, such as daily activity understanding, human motion analysis, surgical robotics, and others that require subtle and precise operations over long time periods. In this paper, we propose a novel bilinear pooling operation that is used in the intermediate layers of a temporal convolutional encoder-decoder network. In contrast to other work, our proposed bilinear pooling is learnable and hence can capture more complex local statistics than its conventional counterpart. In addition, we introduce exact lower-dimensional representations of our bilinear forms, so that the dimensionality is reduced with neither information loss nor extra computation. We perform extensive experiments to quantitatively analyze our model and show superior performance over other state-of-the-art work on various datasets.
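
For orientation, the conventional (non-learnable) counterpart mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the proposed method: it averages the outer products of frame features within a sliding temporal window, then keeps only the upper triangle of the resulting symmetric matrix, which is one well-known way to reduce dimensionality from d*d to d(d+1)/2 without losing information. The learnable pooling and the specific lower-dimensional forms of this work are not specified here, and the function names (`bilinear_pool`, `upper_triangle`, `sliding_bilinear_pool`), the window size, and the toy features are all hypothetical.

```python
from typing import List

Vector = List[float]


def bilinear_pool(window: List[Vector]) -> List[Vector]:
    """Average the outer products x_t x_t^T over a temporal window.

    Returns a d x d symmetric matrix of second-order statistics.
    This is the conventional, non-learnable form (an assumption made
    for illustration, not the learnable operation of the paper).
    """
    d = len(window[0])
    pooled = [[0.0] * d for _ in range(d)]
    for x in window:
        for i in range(d):
            for j in range(d):
                pooled[i][j] += x[i] * x[j]
    n = len(window)
    return [[value / n for value in row] for row in pooled]


def upper_triangle(matrix: List[Vector]) -> Vector:
    """Flatten the upper triangle (diagonal included) row by row.

    The pooled matrix is symmetric, so these d(d+1)/2 values determine
    all d*d entries and the reduction loses nothing.
    """
    d = len(matrix)
    return [matrix[i][j] for i in range(d) for j in range(i, d)]


def sliding_bilinear_pool(features: List[Vector], size: int) -> List[Vector]:
    """Pool every length-`size` window of frames (stride 1, no padding)."""
    return [
        upper_triangle(bilinear_pool(features[t:t + size]))
        for t in range(len(features) - size + 1)
    ]


# Toy input: three frames with two-dimensional features (hypothetical data).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = sliding_bilinear_pool(features, size=2)
print(pooled)  # [[5.0, 7.0, 10.0], [17.0, 21.0, 26.0]]
```

Each output row holds three values instead of four, because the two off-diagonal entries of a 2 x 2 symmetric matrix are identical. A learnable variant would replace the fixed outer product with parameterized bilinear forms.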