TL;DR: A self-supervised capsule architecture that canonicalizes data while simultaneously decomposing point clouds into parts to perform unsupervised representation learning.
Weiwei Sun, Andrea Tagliasacchi, Boyang Deng, Sara Sabour, Soroosh Yazdani, Geoffrey Hinton, Kwang Moo Yi
Conference on Neural Information Processing Systems (NeurIPS), 2021.
@InProceedings{sun2021canonicalcapsules,
  title     = {Canonical Capsules: Self-Supervised Capsules in Canonical Pose},
  author    = {Weiwei Sun and Andrea Tagliasacchi and Boyang Deng and Sara Sabour and Soroosh Yazdani and Geoffrey Hinton and Kwang Moo Yi},
  booktitle = {Neural Information Processing Systems},
  year      = {2021}
}
We propose an unsupervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.
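To make the keypoint idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that each capsule's keypoint is the attention-weighted mean of the input points, and it shows why such keypoints are convenient for self-supervision with rotated pairs: if the per-point attention stays the same under rotation, the keypoints rotate exactly with the object, and they are unaffected by reordering the points. The attention here is random stand-in data, and `attention_keypoints`, `random_rotation` and `pair_equivariance_loss` are hypothetical helpers; the loss is only an illustration of an equivariance constraint, not the paper's training objective.

```python
import numpy as np


def random_rotation(rng):
    """Sample a uniformly random 3x3 rotation matrix (det = +1) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the sample is uniform
    if np.linalg.det(q) < 0:  # flip one column to turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q


def attention_keypoints(points, attention):
    """Aggregate per-point attention into one keypoint per capsule (assumed scheme).

    points:    (N, 3) point cloud
    attention: (N, K) nonnegative weights, one column per capsule
    returns:   (K, 3) keypoints, each an attention-weighted mean of the points
    """
    weights = attention / attention.sum(axis=0, keepdims=True)  # normalize over points
    return weights.T @ points


def pair_equivariance_loss(kp_a, kp_b, R_a, R_b):
    """Hypothetical loss: keypoints of view b should equal the keypoints of
    view a mapped by the relative rotation R_b @ R_a.T."""
    R_rel = R_b @ R_a.T
    return float(np.mean(np.sum((kp_b - kp_a @ R_rel.T) ** 2, axis=1)))


rng = np.random.default_rng(0)
N, K = 1024, 10
points = rng.normal(size=(N, 3))  # stand-in for one object in some reference pose
# Stand-in attention: softmax of random logits. In a real model it would be
# predicted by the network, which would have to keep it rotation-invariant.
logits = rng.normal(size=(N, K))
attention = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Two randomly rotated copies of the same object, as in pair-wise training.
R_a, R_b = random_rotation(rng), random_rotation(rng)
kp_a = attention_keypoints(points @ R_a.T, attention)
kp_b = attention_keypoints(points @ R_b.T, attention)
print(pair_equivariance_loss(kp_a, kp_b, R_a, R_b) < 1e-12)  # True

# Permuting the points (and their attention rows) leaves the keypoints unchanged.
perm = rng.permutation(N)
print(np.allclose(attention_keypoints(points[perm], attention[perm]),
                  attention_keypoints(points, attention)))  # True
```

Because the same attention matrix is reused for both views, the loss is zero by construction. In a learned model the attention comes from the network, so a constraint of this kind is only satisfied if the network learns a decomposition that tracks the object's parts under rotation.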
We show qualitative highlights in which we decompose 3D point clouds and auto-encode them using Canonical Capsules. We color each Canonical Capsule with a unique color, and similarly color the "patches" produced by the reconstruction heads of 3D-PointCapsNet and AtlasNetV2. Canonical Capsules provide a semantically consistent decomposition that is aligned in the canonical frame, leading to improved reconstruction quality and unsupervised classification performance.
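The page does not say exactly how points are assigned to capsules for these visualizations. A common choice, assumed in the sketch below, is a hard assignment of each point to the capsule with the highest attention weight, followed by a lookup into a per-capsule color palette. `color_by_capsule` and the toy attention values are hypothetical.

```python
import numpy as np


def color_by_capsule(attention, palette):
    """Hard-assign each point to its highest-attention capsule and color it.

    attention: (N, K) per-point capsule weights
    palette:   (K, 3) one RGB color in [0, 1] per capsule
    returns:   (N, 3) per-point colors and (N,) capsule labels
    """
    labels = np.argmax(attention, axis=1)
    return palette[labels], labels


# Toy attention for three points and three capsules (made-up values).
attention = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4]])
palette = np.array([[1.0, 0.0, 0.0],   # capsule 0: red
                    [0.0, 1.0, 0.0],   # capsule 1: green
                    [0.0, 0.0, 1.0]])  # capsule 2: blue
colors, labels = color_by_capsule(attention, palette)
print(labels)  # [0 1 2]
```

With a fixed palette, a semantically consistent decomposition shows up visually as the same part (for example, an airplane wing) receiving the same color across different objects.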
Results with the single-category Canonical Capsules:
Input | Decomposition | Our reconstruction (canonical frame) | Our reconstruction (input frame) | 3D-PointCapsNet reconstruction | AtlasNetV2 reconstruction
Results with the multi-category Canonical Capsules:
Input | Decomposition | Our reconstruction (canonical frame) | Our reconstruction (input frame) | 3D-PointCapsNet reconstruction | AtlasNetV2 reconstruction
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant, NSERC Collaborative Research and Development Grant, Google, Compute Canada, and Advanced Research Computing at the University of British Columbia.
This template was originally made by Phillip Isola and Richard Zhang for a colorful project, and inherits the modifications made by Shangzhe Wu.
The code is publicly available.