Isaac's Blog

Prepare Your Own Data for PointNet


PointNet is a deep network architecture that is invariant to the order of its input, so it can consume unordered point clouds directly, which makes it promising for geometry processing. At present, the most popular implementation of PointNet is based on TensorFlow and takes HDF5 as its standard input format. Converting point clouds to HDF5 files can be a bit confusing, so this article explains how to build HDF5 datasets for training PointNet.

PTS Data

We can download raw data from 3D data repositories, for instance the ShapeNetPart dataset. The data obtained from those repositories is mostly in the PTS file format, which is a set of unordered point coordinates with no headers or trailers. This makes things easier, as we can read the PTS file line by line and store the point cloud in a list, lines. Before generating HDF5 datasets, we want every point cloud to have the same number of points, so a simple subsampler can be applied to the PTS files. The following code snippet shows a random sampler that subsamples the point cloud to 2048 points.

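import random
with open(file, 'r') as f:  # read-only access is enough; the file is closed automatically
    lines = [line.rstrip() for line in f]
# random.sample raises ValueError if the file has fewer than 2048 lines
sampled = random.sample(lines, 2048)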
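The sampled lines are still strings, so they need to be parsed before they can go into a numeric array. The following is a minimal sketch of that step, assuming each line of the PTS file holds whitespace-separated x y z values. The helper names parse_pts_lines and sample_points are illustrative and not part of any library. Unlike a bare random.sample call, the sampler also copes with clouds that have fewer points than the target by padding with randomly repeated points.

```python
import random


def parse_pts_lines(lines):
    """Convert text lines of 'x y z' into a list of (x, y, z) float tuples."""
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        points.append(tuple(float(v) for v in fields[:3]))
    return points


def sample_points(points, num_points=2048, seed=None):
    """Return exactly num_points points drawn at random from points."""
    if not points:
        raise ValueError("empty point cloud")
    rng = random.Random(seed)
    if len(points) >= num_points:
        return rng.sample(points, num_points)  # sampling without replacement
    # Too few points: keep them all and pad with randomly repeated points.
    return points + rng.choices(points, k=num_points - len(points))
```

Padding with duplicates is only one way to handle small clouds; dropping such clouds from the dataset is another reasonable choice.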

PLY Data

PLY is a widely used file format for storing 3D data. It has a header that specifies the format variant and the elements of the PLY file, so it can be a bit more complicated to deal with than PTS data. Luckily, there are ready-made tools for reading PLY files, e.g., plyfile, which reads the numerical data from a PLY file as a NumPy structured array. Installing this tool is easy, as we can get it directly via pip.

pip install plyfile

Before this, we should also have NumPy installed.

pip install numpy

The deserialization and serialization of PLY file data are done through PlyData and PlyElement instances, so we first have to import them. The NumPy module also needs to be loaded.

from plyfile import PlyData, PlyElement
import numpy as np

Then we can start to read a PLY file. Concretely, the code is as follows.

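# filename is a placeholder for the path of the PLY file without its extension
plydata = PlyData.read(filename + '.ply')
num_points = plydata.elements[0].count  # assumes the first element is 'vertex'
points = np.zeros((num_points, 3), dtype=np.float32)
for i in range(num_points):
    points[i] = [plydata['vertex']['x'][i], plydata['vertex']['y'][i], plydata['vertex']['z'][i]]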
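Copying the coordinates point by point in a Python loop is slow for large clouds. Since plyfile exposes the vertex data as a NumPy structured array, the three coordinate fields can be stacked in one call instead. The sketch below assumes the vertex element has fields named x, y and z; the helper name vertices_to_xyz is illustrative, and a small hand-built structured array stands in for the PLY data so the example runs without a file.

```python
import numpy as np


def vertices_to_xyz(vertex):
    """Stack the 'x', 'y', 'z' fields into an (N, 3) float32 array."""
    xyz = np.stack([vertex['x'], vertex['y'], vertex['z']], axis=-1)
    return xyz.astype(np.float32)


# Stand-in for the vertex element of a PLY file (an assumption for this demo).
vertex = np.array(
    [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)],
    dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')],
)
xyz = vertices_to_xyz(vertex)  # shape (2, 3)
```

With a real file, the same helper could be called as vertices_to_xyz(plydata['vertex']), since it only relies on indexing by field name.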

Write HDF5 File

We use the h5py package as the interface to the HDF5 data format.

sudo apt-get install libhdf5-dev
pip install h5py

We first import this package.

import h5py

To create an HDF5 file, we use the h5py.File() function, which takes two arguments: the first gives the filename and location, the second the mode. We are writing the file, so we pass 'w' for write access.

f = h5py.File('data.h5', 'w')

Then we need to define the shape and type of the data to write to the HDF5 file.

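# 128 point clouds of 2048 xyz points each; float32 keeps fractional coordinates (uint8 would truncate them)
data = np.zeros((128, 2048, 3), dtype=np.float32)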

After filling data with the point cloud information read from the PTS or PLY files, we can write it to the HDF5 file f using its create_dataset method, to which we provide a name for the dataset and the NumPy array.

f.create_dataset('data', data=data)
f.close()  # flush the contents to disk and release the file
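PointNet training also needs a class label for each point cloud. The sketch below puts the pieces together under a few assumptions: coordinates are stored as float32, labels as uint8 with shape (N, 1), and the labels go into a second dataset named 'label' next to 'data'. That layout is an assumption about what the PointNet data loader expects, so check it against the loader you actually use. The helper names build_arrays and write_h5 are illustrative.

```python
import numpy as np


def build_arrays(clouds, labels):
    """Turn equally sized clouds into (N, P, 3) float32 and labels into (N, 1) uint8."""
    data = np.asarray(clouds, dtype=np.float32)
    if data.ndim != 3 or data.shape[2] != 3:
        raise ValueError("expected clouds of shape (N, P, 3)")
    label = np.asarray(labels, dtype=np.uint8).reshape(-1, 1)
    if label.shape[0] != data.shape[0]:
        raise ValueError("need exactly one label per cloud")
    return data, label


def write_h5(path, data, label):
    """Write both arrays to an HDF5 file; the with block closes the file."""
    import h5py  # imported here so build_arrays works without h5py installed
    with h5py.File(path, 'w') as f:
        f.create_dataset('data', data=data)
        f.create_dataset('label', data=label)


# Two toy clouds of four points each, with hypothetical class ids 0 and 1.
clouds = np.random.rand(2, 4, 3)
data, label = build_arrays(clouds, [0, 1])
```

Calling write_h5('data.h5', data, label) then produces the file, and opening it again with h5py.File('data.h5', 'r') is a quick way to confirm that both datasets have the expected shapes.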

I have a very concrete example of providing data for PointNet in my GitHub repository IsaacGuan/PointNet-Plane-Detection. We can take the script here as a reference.
