3D Deep Learning and Its Applications

A few years ago, breakthroughs in convolutional neural networks (CNNs), a type of deep learning model generally applied to 2D images, took the AI world by storm and spurred the development of machine vision applications such as self-driving cars, autonomous drones and state-of-the-art facial recognition systems. While CNNs have made immense progress in representing and understanding our world through images, their abilities are inherently limited by the two-dimensional nature of images, in particular the lack of three-dimensional depth information. Since then, there have been growing efforts to extend deep learning to 3D models, which represent the three-dimensional physical world we experience more accurately.

Thanks in part to an estimated 30 million depth cameras capturing 3D images, large 3D datasets have become available and have facilitated rapid advances in 3D deep learning applications. Frameworks such as Nvidia’s Kaolin, built on top of existing deep learning frameworks like PyTorch, allow a 3D model to be brought into the realm of neural networks in just a few steps. Inspired by these recent advances, we at Trabeya have been excited to apply novel algorithms built on these technologies to real-world business problems, including ones in agriculture, health care and manufacturing.

Figure 1: 3D deep learning functionalities provided by Nvidia’s Kaolin

3D object detection is one of the most common applications of 3D deep learning. For instance, self-driving cars use it to analyze both camera and LiDAR data for 3D perception [1]. Breakthroughs in this area have also extended to medicine, for instance the identification of small abnormal growths (early-stage tumors) in MRI scans, which facilitates early diagnosis. Going a step beyond 2D computer vision, 3D object detection techniques can visualize, model and even predict the three-dimensional growth of cancerous tumors [3].

Whereas object detection locates objects and distinguishes, for example, a chair from a person, segmentation identifies the exact pixels (or, in 3D, voxels) that belong to a particular object. For instance, self-driving cars have to identify the specific voxels that represent a tree versus a person. In computer graphics, 3D part segmentation automatically identifies the different parts of a 3D model, making it easy to rig a character for animation or to customize the model to generate variants of an object.
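To make the pixel-versus-voxel distinction concrete, here is a minimal sketch, assuming NumPy and a point cloud in which every point already carries a class label (for instance, predicted by a segmentation network). The hypothetical helper `voxelize_labels` groups points into cubic voxels of side `voxel_size` and gives each occupied voxel the majority label of the points inside it. The grid origin (the cloud's minimum corner) and the tie-breaking rule are illustrative choices, not part of any particular library.

```python
import numpy as np

def voxelize_labels(points, labels, voxel_size, num_classes):
    """Hypothetical helper: map a labeled point cloud to a sparse voxel grid.

    Each occupied voxel receives the most frequent label among the points
    that fall inside it (ties go to the smallest class id).
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Integer voxel coordinates, measured from the cloud's minimum corner.
    coords = np.floor((points - points.min(axis=0)) / voxel_size).astype(int)
    # Keep one row per occupied voxel; `inverse` maps each point to its voxel row.
    voxels, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against differing inverse shapes across NumPy versions
    # Count one vote per point for its label, then take the winning class per voxel.
    votes = np.zeros((len(voxels), num_classes), dtype=int)
    np.add.at(votes, (inverse, labels), 1)
    return voxels, votes.argmax(axis=1)
```

For example, with `voxel_size=1.0`, three points near the origin labelled 0, 0 and 1 collapse into a single voxel labelled 0. Storing only occupied voxels, as here, keeps the representation small, since most of a 3D scene is empty space.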

Figure 2: 3D part segmentation and 3D object detection

Another major application of 3D deep learning is 3D reconstruction. Traditionally, modelling 3D objects has required 3D scanning techniques such as structure from motion and triangulation, which impose constraints on the motion of the object, the camera or both [5]. The camera setups these techniques rely on can be expensive, and challenging to deploy in remote and hostile outdoor environments [4].
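The triangulation step of such traditional pipelines can be sketched with the classic linear (DLT) method, assuming NumPy and two cameras whose 3×4 projection matrices are already known. That assumption is precisely the kind of constraint described above: camera poses must be calibrated or recovered before any depth can be computed. The helper names `triangulate` and `project` are hypothetical.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point seen in two views.

    P1, P2: known 3x4 projection matrices (an assumption of this sketch).
    x1, x2: the point's pixel coordinates (u, v) in each view.
    """
    # Each view contributes two linear equations in the homogeneous point X:
    #   u * (P[2] @ X) - P[0] @ X = 0  and  v * (P[2] @ X) - P[1] @ X = 0
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution of A X = 0 (with |X| = 1) is the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # convert from homogeneous to Euclidean coordinates
```

With two cameras one unit apart along the x-axis, projecting a known point into both views and triangulating the two pixel positions recovers the original point up to floating-point error. With noisy detections, the SVD returns the algebraic least-squares estimate instead.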

In contrast, deep-learning-based 3D reconstruction methods can reconstruct 3D models using only 2D images, in some cases a single 2D image, which greatly simplifies the process of modelling 3D objects. For instance, image-to-3D reconstruction now makes it possible to estimate your body weight from a 2D image taken with your mobile phone [8]. This shows how 3D deep learning can remove the need for sophisticated depth-measuring cameras and setups to build accurate 3D models.

Figure 3: Mesh R-CNN [7], which converts a 2D image to a 3D model (left), and DeepHuman: 3D Human Reconstruction from a Single Image [8] (right)

Even when 3D scans of an object are already available, the resulting point clouds are usually incomplete or noisy. In such cases, deep-learning-based shape completion algorithms can be applied to reliably fill in the missing parts of the 3D model [6].

Figure 4: 3D shape completion using deep neural networks

The completed models can then be used to calculate metrics such as surface area, volume and (given the material's density) mass, which are unavailable in 2D images and difficult to derive reliably from them, to drive deeper analysis. In addition to the fitness application mentioned above, mass estimation has huge potential in entertainment, agriculture and even real estate [5].
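As an illustration of how these metrics follow from a 3D model, the sketch below computes surface area, enclosed volume and mass for a closed triangle mesh. It assumes NumPy, a watertight mesh with consistently oriented faces, and a user-supplied density; the function name `mesh_metrics` is hypothetical. Each triangle's area is half the norm of its edge cross product, and the volume is the sum of the signed volumes a · (b × c) / 6 of the tetrahedra formed by the origin and each face, which by the divergence theorem equals the enclosed volume.

```python
import numpy as np

def mesh_metrics(vertices, faces, density=None):
    """Hypothetical helper: surface area, volume and optional mass of a closed triangle mesh.

    vertices: (V, 3) array of coordinates; faces: (F, 3) vertex indices,
    assumed consistently oriented so the signed volumes add up correctly.
    """
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    a, b, c = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    # Triangle area = half the length of the cross product of two edges.
    area = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1).sum()
    # Signed tetrahedron volumes (origin, a, b, c) = a . (b x c) / 6, summed over faces.
    signed = np.einsum("ij,ij->i", a, np.cross(b, c)).sum() / 6.0
    volume = abs(signed)  # abs() tolerates meshes whose faces all point inward
    mass = None if density is None else density * volume
    return area, volume, mass
```

For a unit cube made of 12 triangles, this gives an area of 6 and a volume of 1, and a density of 2.5 (in matching units) gives a mass of 2.5. Estimating body weight from a reconstructed body model would follow the same pattern, with the result only as reliable as the mesh is complete.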

Another exciting application of 3D deep learning is 3D shape generation: generating novel 3D representations of objects. For example, 3D-GAN from MIT CSAIL [9] shows how a 3D Generative Adversarial Network, in which a generator network produces fake representations from random noise while a discriminator network tries to tell real and fake representations apart, can learn 3D representations of objects such as chairs and tables and then generate new chair and table designs. Such models are likely to enhance the creative productivity of designers. These methods could also be extended to the point where ‘intuitive AI’ automatically suggests better designs, and there are already commercial applications of this approach in the design industry [10][11].
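The adversarial game between the two networks is usually written as the standard GAN minimax objective (Goodfellow et al., 2014). Here x is a real 3D shape (a voxel grid, in the case of 3D-GAN), z is a random noise vector, G is the generator and D is the discriminator, which outputs the probability that its input is a real shape:

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes this value by scoring real shapes high and generated ones low, while the generator minimizes it by producing shapes that D mistakes for real ones.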

Figure 5: Project Dreamcatcher’s purpose-built design synthesis methods, which algorithmically generate designs of different types from 3D CAD models [11]


[1] J. Zhang, X. Zhao, Z. Chen, and Z. Lu, “A Review of Deep Learning-Based Semantic Segmentation for Point Cloud,” IEEE Access, vol. 7, pp. 179118–179133, 2019, doi: 10.1109/ACCESS.2019.2958671.

[2] A. Zeng et al., “Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge,” in Proceedings – IEEE International Conference on Robotics and Automation, 2017, pp. 1386–1393, doi: 10.1109/ICRA.2017.7989165.

[3] S. Somasundaram and R. Gobinath, “Current Trends on Deep Learning Models for Brain Tumor Segmentation and Detection – A Review,” in Proceedings of the International Conference on Machine Learning, Big Data, Cloud and Parallel Computing: Trends, Perspectives and Prospects, COMITCon 2019, 2019, pp. 217–221, doi: 10.1109/COMITCon.2019.8862209.

[4] H. Q. Xie and H. H. Jia, “The development of 3D laser scanning technique and its application in land reclamation,” in 2010 2nd International Symposium on Information Engineering and Electronic Commerce, IEEC 2010, 2010, pp. 230–233, doi: 10.1109/IEEC.2010.5533250.

[5] “3D reconstruction of a scene from multiple 2D images,” ResearchGate. [Online]. Available: https://www.researchgate.net/publication/322298308_3D_reconstruction_of_a_scene_from_multiple_2D_images. [Accessed: 11-Mar-2020].

[6] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu, “High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference.”

[7] G. Gkioxari, J. Malik, and J. Johnson, “Mesh R-CNN.”

[8] “DeepHuman: 3D Human Reconstruction from a Single Image.” [Online]. Available:

[9] “3D Generative Adversarial Network.” [Online]. Available: http://3dgan.csail.mit.edu/. [Accessed: 11-Mar-2020].

[10] “The incredible inventions of intuitive AI.” [Online]. Available: https://ieet.org/index.php/IEET2/more/Conti20170222. [Accessed: 11-Mar-2020].

[11] “Project Dreamcatcher.” [Online]. Available: https://autodeskresearch.com/projects/dreamcatcher
