Canonical Capsules: Self-Supervised Capsules in Canonical Pose

Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

Weiwei Sun ¹ ⁴ *

Andrea Tagliasacchi ² ³ *

Boyang Deng ³

Sara Sabour ² ³

Soroosh Yazdani ³

Geoffrey Hinton ² ³

Kwang Moo Yi ¹ ⁴

¹ University of British Columbia

² University of Toronto

³ Google Research

⁴ University of Victoria

* Equal contributions

TL;DR: A self-supervised capsule architecture that canonicalizes data while simultaneously decomposing point clouds into parts to perform unsupervised representation learning.

Paper

Canonical Capsules: Self-Supervised Capsules in Canonical Pose

Weiwei Sun, Andrea Tagliasacchi, Boyang Deng, Sara Sabour, Soroosh Yazdani, Geoffrey Hinton, Kwang Moo Yi

In NeurIPS 2021.

@InProceedings{sun2021canonicalcapsules,
  title = {Canonical Capsules: Self-Supervised Capsules in Canonical Pose},
  author = {Weiwei Sun and Andrea Tagliasacchi and Boyang Deng and Sara Sabour and Soroosh Yazdani and Geoffrey Hinton and Kwang Moo Yi},
  booktitle = {Neural Information Processing Systems},
  year = {2021}
}

Abstract

We propose an unsupervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.
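The keypoint aggregation step described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each keypoint is taken to be the attention-weighted mean of the input points, the attention weights are a random stand-in for a network output (normalized with a softmax over capsules), and the rotation is restricted to the z-axis for brevity. The helper names `aggregate_keypoints` and `rotate_z` are hypothetical. Under these assumptions, the sketch checks that if the attention is unchanged by a rotation of the input (invariance), the resulting keypoints rotate with the object (equivariance).

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_keypoints(points, attention):
    """Return one keypoint per capsule as an attention-weighted mean.

    points: list of N (x, y, z) tuples.
    attention: list of N rows, each holding K non-negative weights.
    Assumption: the weighted mean is one plausible aggregation rule.
    """
    num_capsules = len(attention[0])
    keypoints = []
    for k in range(num_capsules):
        weight_sum = sum(row[k] for row in attention)
        keypoints.append(tuple(
            sum(row[k] * p[axis] for row, p in zip(attention, points)) / weight_sum
            for axis in range(3)
        ))
    return keypoints


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


random.seed(0)
points = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(64)]
# Hypothetical stand-in for the network's attention: random logits, K = 4 capsules.
attention = [softmax([random.gauss(0, 1) for _ in range(4)]) for _ in points]

angle = 0.7
keypoints = aggregate_keypoints(points, attention)
# Same attention on the rotated cloud models a rotation-invariant attention map.
rotated_keypoints = aggregate_keypoints([rotate_z(p, angle) for p in points], attention)

# Equivariance check: rotating the keypoints should match the keypoints of the rotated cloud.
max_error = max(
    abs(a - b)
    for kp, rkp in zip(keypoints, rotated_keypoints)
    for a, b in zip(rotate_z(kp, angle), rkp)
)
print(f"max equivariance error: {max_error:.2e}")
```

Because a weighted mean is linear in the point coordinates, the equivariance here holds by construction once the attention is invariant. In the method itself, that invariance is not given for free; it is what training on pairs of randomly rotated objects is meant to encourage.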


Results

We show qualitative highlights in which we decompose 3D point clouds and auto-encode them using Canonical Capsules. We color each Canonical Capsule with a unique color, and similarly color the "patches" from the reconstruction heads of 3D-PointCapsNet and AtlasNetV2. Canonical Capsules provide a semantically consistent decomposition that is aligned in the canonical frame, leading to improved reconstruction quality and unsupervised classification performance.

Results with the single-category Canonical Capsules

Input	Decomposition	Our reconstruction in canonical frame (animated)	Our reconstruction in input frame	3D-PointCapsNet reconstruction	AtlasNetV2 reconstruction

Results with the multi-category Canonical Capsules

Input	Decomposition	Our reconstruction in canonical frame	Our reconstruction in input frame	3D-PointCapsNet reconstruction	AtlasNetV2 reconstruction

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant, NSERC Collaborative Research and Development Grant, Google, Compute Canada, and Advanced Research Computing at the University of British Columbia.
