Gsgen: Text-to-3D using Gaussian Splatting

Gsgen generates multi-view consistent and highly detailed 3D assets using Gaussian Splatting and a 3D geometric prior.

Abstract

We present Gaussian Splatting based text-to-3D GENeration (Gsgen), the first approach that generates multi-view consistent and delicate 3D assets using 3D Gaussian Splatting.

Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of a 3D prior and a proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address these shortcomings by exploiting its explicit nature, which enables the incorporation of a 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components.
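To make the idea of compactness-based densification more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the Gsgen implementation: this page does not spell out the exact rule, so the sketch assumes that each Gaussian can be approximated by a sphere with a single radius, and that a new Gaussian is inserted wherever a gap remains between a Gaussian and one of its nearest neighbours. The function name `densify_by_compactness` and the parameter `k` are hypothetical.

```python
import math


def densify_by_compactness(centers, radii, k=2):
    """Propose new Gaussians that fill gaps between neighbouring ones.

    Assumption (not taken from the page): every Gaussian is treated as a
    sphere with one radius, and a gap exists when the distance between two
    centers exceeds the sum of their radii.

    centers: list of (x, y, z) tuples
    radii:   list of floats, one per center
    k:       number of nearest neighbours checked per Gaussian
    Returns (new_centers, new_radii).
    """
    new_centers, new_radii = [], []
    seen_pairs = set()  # avoid filling the same gap twice

    for i, ci in enumerate(centers):
        # Brute-force k nearest neighbours; fine for a small illustration.
        neighbours = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.dist(ci, centers[j]),
        )[:k]

        for j in neighbours:
            pair = (min(i, j), max(i, j))
            if pair in seen_pairs:
                continue
            seen_pairs.add(pair)

            cj = centers[j]
            d = math.dist(ci, cj)
            gap = d - (radii[i] + radii[j])
            if gap <= 0:
                continue  # the two spheres already touch or overlap

            # Put the new Gaussian in the middle of the gap, sized to span it.
            t = (radii[i] + gap / 2) / d
            new_centers.append(tuple(a + t * (b - a) for a, b in zip(ci, cj)))
            new_radii.append(gap / 2)

    return new_centers, new_radii
```

For example, calling `densify_by_compactness([(0, 0, 0), (4, 0, 0)], [1.0, 1.0], k=1)` proposes a single new Gaussian halfway between the two inputs, while two overlapping Gaussians produce no additions. A practical version would use a spatial index for the neighbour search and would also initialize colour, opacity, and anisotropic covariance for each new Gaussian, none of which is modelled here.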

Video

Text-to-3D Results

Gsgen can generate 3D assets with accurate geometry and delicate details. (More videos are coming soon.)

Click a caption to view the asset in a WebGL-based viewer. It runs at more than 40 FPS on my Mac with an M1 Pro chip.

[...] a furry corgi

[...] car made of sushi

A plate of delicious tacos

An airplane made out of wood

[...] a pineapple

A ficus planted in a pot

[...] a plush toy dragon

A completely destroyed car

[...] a blue tulip

A bunch of blue rose, highly detailed

[...] a durian

A vase with sunflowers

Related Research or Resources

Our approach is based on the following research or resources.

Stable Dreamfusion introduces an idea similar to our windowed position encoding for coarse-to-fine optimization.

threestudio provides a great baseline for text-to-3D generation.

3D Gaussian Splatting, the pioneering work, achieves superior performance and enables real-time rendering.

splat is a WebGL-based Gaussian renderer.