POINT-E neural network that generates 3D models from text descriptions is unveiled

January 03, 2023

OpenAI, already famous for its DALL-E generator, which creates images from text descriptions, has released a striking new product. The company announced its latest development, POINT-E, which builds 3D figures as "clouds" of points, also from text descriptions. While existing systems such as Google's DreamFusion take hours and substantial GPU resources for each attempt, POINT-E needs only minimal hardware and a minute or two of time.

It is quite possible that such tools will soon face pushback not only from creators of ordinary digital images but also from artists working in 3D modeling, which today is used virtually everywhere in the media sphere. CGI effects appear in movies, video games, AR and VR, and even in maps of lunar craters produced by organizations like NASA. Google actively uses the technology, and essentially the entire concept of Meta's metaverse is built on 3D graphics. However, creating three-dimensional imagery remains a very resource-intensive and time-consuming process, despite the efforts of companies like NVIDIA and Epic Games to speed the industry up.

Recently, image generators based on text descriptions have become very popular: OpenAI's DALL-E 2, Craiyon, DeepAI, Prisma Labs' Lensa, and Stability AI's Stable Diffusion. Converting text to 3D is a promising offshoot of such developments.

According to OpenAI, to create a three-dimensional object from a text description, POINT-E first generates a text-conditioned image and then produces a 3D point cloud based on that image.

It all happens in a minute or two and doesn't require expensive procedures. For example, given the description "a cat eating a burrito," POINT-E first generates a synthetic rendered view of the scene and then runs a series of models to build the 3D object, first as a coarse cloud of 1,024 points and then upsampled to 4,096 points. The object itself is never created directly from the text description.
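The coarse-to-fine step described above can be sketched as follows. This is only an illustrative toy, not POINT-E's actual API: the real system uses diffusion models conditioned on the rendered image, while here the two stages are stood in for by simple NumPy placeholders (random points on a sphere, then jittered copies) just to show the 1,024 → 4,096 structure of the pipeline.

```python
import numpy as np

def coarse_point_cloud(rng, n_points=1024):
    """Stand-in for the first stage: in POINT-E this is a diffusion model
    conditioned on the synthetic rendered view. Here we just draw random
    points on a unit sphere for illustration."""
    pts = rng.normal(size=(n_points, 3))
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

def upsample_point_cloud(rng, coarse, factor=4, noise=0.02):
    """Stand-in for the second (upsampler) stage: densify the coarse cloud
    by jittering copies of its points."""
    dense = np.repeat(coarse, factor, axis=0)
    return dense + rng.normal(scale=noise, size=dense.shape)

rng = np.random.default_rng(0)
coarse = coarse_point_cloud(rng)           # 1,024-point coarse cloud
dense = upsample_point_cloud(rng, coarse)  # upsampled to 4,096 points
print(coarse.shape, dense.shape)           # (1024, 3) (4096, 3)
```

The key design idea this mirrors is that the expensive generation happens at low resolution, and a cheaper second model only refines it, which is part of why the whole pipeline runs in minutes rather than hours.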

The neural network was trained to create 3D objects by analyzing "millions" of three-dimensional images. The developers note that although the quality of the output is inferior to some competing technologies, samples can be generated very quickly.