PointNet is a deep network architecture that is invariant to the order of its input, so it can consume unordered point clouds directly, which makes it promising for geometry processing. At present, the most popular implementation of PointNet is based on TensorFlow and takes HDF5 as its standard input format. Converting point clouds to HDF5 files can be a bit confusing, so this article explains how to build HDF5 datasets for PointNet training.
We can download raw data from 3D data repositories, for instance the ShapeNetPart dataset. The data obtained from those repositories is typically in the PTS file format, which is a set of unordered point coordinates with no headers or trailers. This actually makes things easier, as we can read the PTS file line by line and store the point cloud in an array, lines. Before generating HDF5 datasets, for example, we want every point cloud to have the same number of points, so a simple subsampler can be applied to the PTS files. A random sampler, for instance, can subsample each point cloud to 2048 points.
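The reading and subsampling steps described above can be sketched as follows. This is a minimal illustration, not the original snippet: the function names read_pts and random_subsample are hypothetical, and it assumes each line of the PTS file holds whitespace-separated coordinates. Falling back to sampling with replacement for clouds smaller than 2048 points is also an assumption.

```python
import numpy as np

NUM_POINTS = 2048  # target size named in the text


def read_pts(path):
    """Read a headerless PTS file: one point per line, whitespace-separated."""
    with open(path, "r") as fh:
        # Skip blank lines and convert every coordinate to a float.
        lines = [[float(v) for v in line.split()] for line in fh if line.strip()]
    return np.array(lines, dtype=np.float32)


def random_subsample(points, num_points=NUM_POINTS, rng=None):
    """Return num_points rows of points, chosen uniformly at random."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample without replacement when the cloud is large enough; otherwise
    # (assumption) sample with replacement so the output size is still fixed.
    replace = points.shape[0] < num_points
    idx = rng.choice(points.shape[0], size=num_points, replace=replace)
    return points[idx]
```

Passing a seeded generator, e.g. rng=np.random.default_rng(0), makes the subsampling reproducible.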
PLY is a widely used file format for storing 3D data. It has a header that specifies the format variant and the elements of the PLY file, so handling such data can be a bit more complicated than handling PTS data. Luckily, there are ready-made tools for reading PLY files, e.g., plyfile, which reads the numerical data from a PLY file as a NumPy structured array. Installing this tool is easy: we can get it directly via pip.
Of course, NumPy should be installed beforehand.
The deserialization and serialization of PLY file data is done through PlyData and PlyElement instances, so we first have to import them. The NumPy module also needs to be loaded.
Then we can start reading a PLY file.
We use the h5py package as the interface to the HDF5 data format.
We first import this package.
To create an HDF5 file, we initialize it with h5py.File, passing two arguments. The first argument provides the filename and location, the second the mode. We’re writing the file, so we provide w for write access.
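As a small illustration of the call described above, the sketch below creates an empty HDF5 file in a temporary directory. The filename point_clouds.h5 is a placeholder chosen for this example.

```python
import os
import tempfile

import h5py

# Hypothetical output location, used only for this illustration.
filename = os.path.join(tempfile.mkdtemp(), "point_clouds.h5")

# "w" creates the file, truncating it if it already exists.
f = h5py.File(filename, "w")
print(f.filename)

# Close the file so that its contents are flushed to disk.
f.close()
```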
Then we need to define the shape and type of the data to write to the HDF5 file.
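One way to fix the shape and type is to preallocate a NumPy array, as sketched below. The layout (number of clouds, points per cloud, 3 coordinates) and the float32 type are assumptions made for this example, and NUM_CLOUDS is a hypothetical value.

```python
import numpy as np

NUM_CLOUDS = 4     # hypothetical number of point clouds in the dataset
NUM_POINTS = 2048  # points per cloud after subsampling

# One row of (x, y, z) coordinates per point, one slab per point cloud.
data = np.zeros((NUM_CLOUDS, NUM_POINTS, 3), dtype=np.float32)
print(data.shape, data.dtype)
```

Each subsampled point cloud can then be copied into its own slot, e.g. data[i] = points.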
Once the array data is filled with the point cloud information read from the PTS or PLY files, we can write it to the HDF5 file f using its create_dataset function, to which we pass a name for the dataset and the NumPy array.
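Put together, the writing step could look like the following sketch. Random values stand in for real point clouds, and both the dataset name data and the file path are assumptions made for illustration. The file is read back at the end to check the round trip.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in for point clouds read from PTS or PLY files (random values here).
data = np.random.default_rng(0).random((4, 2048, 3)).astype(np.float32)

# Hypothetical output location, used only for this illustration.
filename = os.path.join(tempfile.mkdtemp(), "point_clouds.h5")

# Write the array under the dataset name "data".
with h5py.File(filename, "w") as f:
    f.create_dataset("data", data=data)

# Read the dataset back into memory to verify what was stored.
with h5py.File(filename, "r") as f:
    restored = f["data"][:]

print(restored.shape, restored.dtype)
```

Using the file as a context manager closes it automatically, which avoids leaving a partially written file behind if an error occurs.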