Abstract
Vector graphics are essential in design, providing artists with a versatile medium for creating resolution-independent and highly editable visual content. Recent advancements in vision-language and diffusion models have fueled interest in text-to-vector graphics generation. However, existing approaches often suffer from over-parameterized outputs or treat the layered structure - a core feature of vector graphics - as a secondary goal, diminishing their practical use. Recognizing the importance of layered SVG representations, we propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts. Inspired by Neural Radiance Fields (NeRFs), NeuralSVG encodes the entire scene into the weights of a small MLP network, optimized using Score Distillation Sampling (SDS). To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique that strengthens the standalone meaning of each shape. We additionally demonstrate that utilizing a neural representation provides an added benefit of inference-time control, enabling users to dynamically adapt the generated SVG based on user-provided inputs, all with a single learned representation. Through extensive qualitative and quantitative evaluations, we demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVG.
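
As a rough illustration of the representation described above, the following minimal sketch shows how a small MLP might map a shape index to that shape's control points and fill color, and how a dropout-style step might render only a random prefix of the shapes during optimization. It is not the NeuralSVG implementation: the function names (`init_mlp`, `decode_shapes`, `sample_prefix_length`), the one-hot index encoding, the layer sizes, and the prefix-truncation form of dropout are all assumptions made for illustration, and the differentiable rasterizer and SDS loss are omitted.

```python
import numpy as np


def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass: tanh hidden layers, linear output layer."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_shapes(params, num_shapes, points_per_shape):
    """Query the network once per shape index (one-hot encoded, an assumption).

    Returns control points in [0, 1]^2 with shape (num_shapes, points_per_shape, 2)
    and RGB fill colors in [0, 1] with shape (num_shapes, 3).
    """
    out = mlp(params, np.eye(num_shapes))
    split = 2 * points_per_shape
    points = sigmoid(out[:, :split]).reshape(num_shapes, points_per_shape, 2)
    colors = sigmoid(out[:, split:])
    return points, colors


def sample_prefix_length(rng, num_shapes):
    """Assumed dropout scheme: keep only the first k shapes, with k drawn
    uniformly from 1..num_shapes, so early shapes must be meaningful alone."""
    return int(rng.integers(1, num_shapes + 1))


rng = np.random.default_rng(0)
num_shapes, points_per_shape = 8, 4
params = init_mlp(rng, [num_shapes, 32, 2 * points_per_shape + 3])

points, colors = decode_shapes(params, num_shapes, points_per_shape)
k = sample_prefix_length(rng, num_shapes)
kept_points, kept_colors = points[:k], colors[:k]  # shapes passed to a renderer
```

In a full pipeline, `kept_points` and `kept_colors` would be rasterized and scored against the text prompt, with gradients flowing back into `params`; because every shape is produced by the same network, changing the query at inference time is one plausible route to the control the abstract mentions.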