LiDAR and Semantic Segmentation in 3D Point Clouds

prash
3 min read · May 23, 2021

Introduction

Problem Domain

Capturing a 3D world

3D data is crucial for robotics, autonomous vehicles, 3D scale models, and virtual reality.

Approaches

Computed from 2D

Convert 2D images into 3D representations, e.g. stereo vision, Structure from Motion (SfM), or SLAM (cheap, but not precise).
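As a minimal sketch of this 2D-to-3D route, here is OpenCV block matching on a rectified stereo pair. The images, focal length f, and baseline B below are stand-in values, not real calibration data:

    import cv2
    import numpy as np

    # Stand-in rectified grayscale pair (real use: two rectified camera views).
    left = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
    right = np.random.randint(0, 255, (480, 640), dtype=np.uint8)

    # Block matching: StereoBM returns fixed-point disparity scaled by 16.
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0

    # Depth from disparity: Z = f * B / d, with f the focal length in pixels
    # and B the camera baseline in meters (both assumed here).
    f, B = 700.0, 0.12
    depth = np.where(disparity > 0, f * B / np.maximum(disparity, 1e-6), 0.0)

The depth map can then be back-projected through the camera intrinsics to obtain a 3D point cloud.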

Captured directly in 3D

LiDAR (expensive, but precise).

High-level tasks in 2D/3D input analysis

At a high level, there are four analysis tasks: classification, partitioning (object detection), semantic segmentation, and instance segmentation. A small array-shape sketch follows the four definitions below.

Classification: predicting the class of a single object in an image. For a point cloud, the task is to classify the entire cloud P into one of K classes.

P → [1, · · · , K]

Partitioning / Object Localization / Object Detection: object localization identifies the location of one or more objects in an image and draws a bounding box around their extent; object detection additionally classifies each localized object. In 3D, the task is to cluster the point cloud into parts and objects.

Pi → [1, · · · , C]

Semantic Segmentation: assigning a class label to every pixel in an image. In 3D, classify each point of the point cloud into one of K classes.

Pi → [1, · · ·, K]

Instance Segmentation: the main difference is that semantic segmentation performs a pixel-level classification directly, while instance segmentation needs an additional object detection step to separate the individual instances of each class. In 3D, cluster the point cloud into semantically labeled objects.

Pi → [1, · · · , C], then [1, · · · , C] → [1, · · · , K]
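To make the four mappings concrete, here is a minimal array-shape sketch in NumPy. N, K, C and all arrays below are random stand-ins, not real data or real predictions:

    import numpy as np

    N, K, C = 5000, 8, 20        # points, semantic classes, object/instance ids
    P = np.random.rand(N, 3)     # one point cloud: N points with xyz coordinates

    cls_label = 3                              # classification: one label for P
    part_ids = np.random.randint(C, size=N)    # partitioning: object id per point
    sem_labels = np.random.randint(K, size=N)  # semantic seg.: class per point
    inst_cls = np.random.randint(K, size=C)    # instance seg.: class per object id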

Different analysis levels in computer vision

Why is 3D analysis harder than 2D?

What makes direct 3D point cloud analysis harder than 2D (or even pseudo-3D) analysis? First of all, there is considerably more data to handle, which complicates both acquisition and processing. Moreover, 3D point cloud data is by nature sparse and unstructured: there can be occlusions along the sensor path, the point density varies wildly across the scene, and the lack of a grid structure means standard convolutions do not apply directly. Luckily, there are still good ways to handle these issues, covered in the later sections of this article.

In summary, the key issues that make 3D analysis difficult are:

  1. Data volume
  2. Highly variable density
  3. Permutation-invariance (points have no canonical order)
  4. Sparsity
  5. Occlusions
  6. Acquisition artifacts
  7. Lack of grid structure

The trade-off while scaling up!

The best approaches are generally very memory-hungry, and the data volumes are huge. If we optimize with aggressive subsampling, we lose a lot of information; if we instead process the cloud with sliding windows, we lose the global structure.
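To illustrate the subsampling side of this trade-off, here is a minimal voxel-grid downsampling sketch in NumPy, keeping one surviving point per occupied voxel. The voxel size and the random cloud are assumptions for illustration:

    import numpy as np

    def voxel_downsample(cloud, voxel_size=0.1):
        # Assign each point to a voxel cell; keep one point per occupied cell.
        keys = np.floor(cloud / voxel_size).astype(np.int64)
        _, keep = np.unique(keys, axis=0, return_index=True)
        return cloud[np.sort(keep)]

    cloud = np.random.rand(100_000, 3)
    small = voxel_downsample(cloud, voxel_size=0.05)
    print(cloud.shape, "->", small.shape)  # all detail below 5 cm is lost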

Image-Based Methods

The idea is to render the point cloud onto 2D image planes (multi-view projections) and apply standard 2D CNNs.

Voxel-Based Methods

The idea is to generalize 2D convolutions to regular 3D grids.

Voxelization + 3D ConvNets
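A minimal sketch of that pipeline, assuming a unit-cube cloud, an assumed 32³ resolution, and an untrained Conv3d just to show the shapes:

    import numpy as np
    import torch

    cloud = np.random.rand(5000, 3)  # stand-in points in the unit cube

    # Voxelize: mark occupied cells of a 32 x 32 x 32 occupancy grid.
    R = 32
    idx = np.clip((cloud * R).astype(int), 0, R - 1)
    grid = np.zeros((R, R, R), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0

    # 3D convolution over the regular grid (batch and channel dims added).
    conv = torch.nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3)
    out = conv(torch.from_numpy(grid)[None, None])  # (1, 8, 30, 30, 30)

Note the cost of regularity: memory grows cubically with the resolution R, which is exactly the scaling trade-off described above.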

3D Convolution-Based Methods

The idea is to generalize 2D convolutions to 3D point clouds treated as unordered data.

Tangent Convolution: 2D convolution in the tangent space of each point.

PointCNN: χ-convolutions, i.e. generalized convolutions for unordered inputs.

Principle: the network learns how to permute the unordered inputs into a canonical order. The invariance is learnt!
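For contrast, here is a minimal PointNet-style sketch where the invariance is built in by construction (a shared pointwise MLP followed by a symmetric max-pool) rather than learnt. The layer sizes are arbitrary assumptions:

    import torch
    import torch.nn as nn

    class TinyPointNet(nn.Module):
        def __init__(self, k=8):
            super().__init__()
            # Shared MLP applied to every point independently.
            self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
            self.head = nn.Linear(64, k)

        def forward(self, pts):               # pts: (batch, n_points, 3)
            feats = self.mlp(pts)             # per-point features
            pooled = feats.max(dim=1).values  # symmetric op -> order-invariant
            return self.head(pooled)          # per-cloud class scores

    net = TinyPointNet()
    pts = torch.rand(1, 1024, 3)
    perm = torch.randperm(1024)
    print(torch.allclose(net(pts), net(pts[:, perm])))  # True: invariant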

Segmentation in 3D

Segmentation in the context of real-time autonomous vehicles

This is a three-stage pipeline; a schematic code sketch follows the three steps below.

Geometric Partition: into simple shapes.

  • Complexity: very high (clouds of 10⁸ points)
  • Algorithm: ℓ₀-cut pursuit

Superpoint embedding: learning shape descriptors

  • Complexity: low (subsampling: ∼1000 superpoints × 128 points each)
  • Algorithm: PointNet

Contextual Segmentation: using the global structure

  • Complexity: very low (superpoint graph of ∼1000 superpoints)
  • Algorithm: ECC with Gated Recurrent Unit (GRU)
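Here is a schematic sketch of how the three stages chain together, under loudly labeled stand-ins: k-means replaces ℓ₀-cut pursuit, a shared MLP with max-pooling replaces the full PointNet embedding, and a plain GRU over the superpoint sequence replaces ECC message passing on the superpoint graph. This only shows the data flow, not the real algorithms:

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans

    cloud = np.random.rand(5000, 3).astype(np.float32)
    n_sp = 50  # number of superpoints (tiny, for illustration)

    # Step 1 -- geometric partition (stand-in: k-means, not l0-cut pursuit).
    sp_ids = KMeans(n_clusters=n_sp, n_init=10).fit_predict(cloud)

    # Step 2 -- superpoint embedding (stand-in: shared MLP + max-pool,
    # a much-reduced PointNet).
    mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
    emb = torch.stack([
        mlp(torch.from_numpy(cloud[sp_ids == k])).max(dim=0).values
        for k in range(n_sp)
    ])                                               # (n_sp, 64)

    # Step 3 -- contextual segmentation (stand-in: a plain GRU over the
    # superpoint sequence instead of ECC + GRU on the superpoint graph).
    gru = nn.GRU(input_size=64, hidden_size=64, batch_first=True)
    context, _ = gru(emb.unsqueeze(0))
    logits = nn.Linear(64, 8)(context.squeeze(0))    # class scores per superpoint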


Summary

Point clouds captured by LiDAR give a precise 3D view of the world, but their volume, sparsity, variable density, and lack of grid structure make them harder to analyze than images. Voxel grids bring back regular convolutions at the cost of resolution, point convolutions (tangent convolutions, PointCNN) operate directly on unordered points, and superpoint graphs make large-scale contextual segmentation tractable for applications like autonomous driving.
