GenCAD: Turning Images into Editable CAD Programs

GenCAD introduces an image‑conditional pipeline that produces full parametric CAD command histories instead of coarse meshes, promising higher fidelity and downstream editability for engineering workflows.

The problem GenCAD tackles

Traditional AI approaches to 3‑D generation treat geometry as a collection of points, voxels, or triangle meshes. Those formats are easy to source and train on, but they strip away the information that makes a CAD model truly useful: a structured, parameter‑driven description of faces, edges, and construction steps. Without that structure, downstream tasks such as tolerance analysis, feature‑based modifications, or direct export to a manufacturing pipeline become cumbersome or impossible.

Boundary‑representation (B‑rep) models, the backbone of most professional CAD systems, encode each solid as a graph of faces, edges, and vertices. Parametric CAD systems pair that geometry with the sequence of construction commands (extrude, fillet, pattern, etc.) that produced it. Engineers rely on the richness of that combined representation, yet its complexity has kept most deep‑learning research away from it. As a result, most generative tools output only a visual approximation, forcing designers to rebuild the model from scratch if they need to tweak dimensions or add features.

How GenCAD works

GenCAD proposes a four‑stage architecture that bridges the gap between raw images and full CAD programs:

Autoregressive transformer encoder – The system first learns a latent encoding of existing CAD command sequences. By treating a CAD program as a token stream (much like source code), the transformer captures long‑range dependencies between operations.
Contrastive multimodal alignment – A contrastive learning head aligns the latent space of CAD command sequences with that of rendered CAD images. This joint embedding lets the model understand how visual cues map to specific construction steps.
Latent diffusion conditioned on images – Using a diffusion process in the latent domain, GenCAD samples a latent representation of a CAD command sequence that matches the input image. Diffusion provides a controllable way to explore the space of plausible designs while respecting the visual constraints.
Command decoder – Finally, a decoder translates the sampled latent back into a concrete series of parametric commands. The output can be fed directly into a geometry kernel (e.g., OpenCASCADE) to reconstruct a B‑rep solid.
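To make the "CAD program as a token stream" idea concrete, here is a minimal Python sketch of how a command sequence might be flattened into integer tokens and recovered again. The command vocabulary, the 8-bit parameter quantization, and the helper names (`quantize`, `tokenize`, `detokenize`) are illustrative assumptions, not GenCAD's actual encoding.

```python
from typing import List, Tuple

# Hypothetical command vocabulary; the real command set may differ.
COMMANDS = ["SOL", "LINE", "ARC", "CIRCLE", "EXTRUDE", "EOS"]
CMD_TO_ID = {name: i for i, name in enumerate(COMMANDS)}
N_LEVELS = 256                 # assumed 8-bit quantization of parameters
PARAM_OFFSET = len(COMMANDS)   # parameter tokens are numbered after command tokens

Command = Tuple[str, List[float]]  # (name, parameters normalized to [0, 1])


def quantize(value: float) -> int:
    """Map a value in [0, 1] to an integer level in [0, N_LEVELS - 1]."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * (N_LEVELS - 1))


def dequantize(level: int) -> float:
    """Inverse of quantize, up to quantization error."""
    return level / (N_LEVELS - 1)


def tokenize(program: List[Command]) -> List[int]:
    """Flatten a command sequence into one stream of integer tokens."""
    tokens: List[int] = []
    for name, params in program:
        tokens.append(CMD_TO_ID[name])
        tokens.extend(PARAM_OFFSET + quantize(p) for p in params)
    return tokens


def detokenize(tokens: List[int]) -> List[Command]:
    """Rebuild commands: a command token opens a command, parameter tokens fill it."""
    program: List[Command] = []
    for t in tokens:
        if t < PARAM_OFFSET:
            program.append((COMMANDS[t], []))
        else:
            program[-1][1].append(dequantize(t - PARAM_OFFSET))
    return program


program = [
    ("SOL", []),             # start of a sketch loop
    ("LINE", [0.0, 0.0]),
    ("LINE", [1.0, 0.0]),
    ("EXTRUDE", [0.5]),
    ("EOS", []),             # end of sequence
]
tokens = tokenize(program)
recovered = detokenize(tokens)
```

The round trip is lossy only in the parameters, which come back within one quantization step of their original values. A sequence model trained on such streams sees commands and parameters in a single vocabulary, which is what lets it treat a CAD history much like source code.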

The key novelty is the full command history, not just the final solid. Because the output is a true CAD program, designers can load it into a CAD package that supports its command set, then adjust dimensions, add constraints, or run simulations without re‑modeling.
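A small sketch shows why a command history is easier to edit than a finished solid. The `Rectangle` and `Extrude` classes and the `volume` helper below are hypothetical stand-ins for a real CAD command set and geometry kernel; they only illustrate that changing one parameter and replaying the history updates the result.

```python
from dataclasses import dataclass, replace
from typing import List, Union


@dataclass(frozen=True)
class Rectangle:
    """Hypothetical sketch command: an axis-aligned rectangular profile."""
    width: float
    height: float


@dataclass(frozen=True)
class Extrude:
    """Hypothetical feature command: sweep the latest profile by a distance."""
    distance: float


Command = Union[Rectangle, Extrude]


def volume(history: List[Command]) -> float:
    """Replay the history; each Extrude adds the volume of the latest profile."""
    total = 0.0
    profile_area = 0.0
    for cmd in history:
        if isinstance(cmd, Rectangle):
            profile_area = cmd.width * cmd.height
        elif isinstance(cmd, Extrude):
            total += profile_area * cmd.distance
    return total


history: List[Command] = [Rectangle(width=20.0, height=10.0), Extrude(distance=5.0)]
original = volume(history)   # 20 * 10 * 5 = 1000.0

# Edit a single parameter in the history, then replay it.
edited = list(history)
edited[1] = replace(edited[1], distance=8.0)
updated = volume(edited)     # 20 * 10 * 8 = 1600.0
```

With a mesh, the same change would mean moving vertices by hand. Here the edit is one field in one command, and everything downstream is recomputed.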

Why the approach matters

Precision – Mesh‑based generators often introduce surface artifacts and lose exact dimensions. GenCAD’s parametric output states dimensions explicitly in the generated commands, so the geometry is defined exactly rather than approximated by a tessellation.
Editability – Engineers can modify a generated design by editing the command list, something a static mesh does not support directly.
Design space exploration – By sampling different latents conditioned on the same image, the diffusion step can propose multiple viable design variations, each fully editable.
Data efficiency – The contrastive alignment means the model can learn from relatively small paired datasets of CAD images and command logs, avoiding the need for massive synthetic mesh collections.
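The contrastive alignment mentioned above is commonly trained with a symmetric InfoNCE-style loss, as popularized by CLIP. The sketch below assumes that form and a temperature of 0.07; both are assumptions for illustration, and the source does not state which objective GenCAD actually uses. It takes a batch of image latents and CAD latents where row i of each belongs to the same part.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def contrastive_loss(image_latents: List[Vector],
                     cad_latents: List[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: image-to-CAD and CAD-to-image losses, averaged."""
    imgs = [normalize(v) for v in image_latents]
    cads = [normalize(v) for v in cad_latents]
    logits = [[dot(i, c) / temperature for c in cads] for i in imgs]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])    # matching pairs
shuffled = contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])   # mismatched pairs
```

The loss is near zero when each image latent is closest to its own CAD latent and large when the pairs are swapped. That pressure is what pulls the two embedding spaces together during training.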

Early results and next steps

In the authors’ benchmark, GenCAD reproduced over 85 % of the original command sequence for a test set of mechanical parts, compared with less than 30 % for a mesh‑to‑CAD baseline. Geometric fidelity, measured by Chamfer distance (where lower is better), improved by roughly 40 %.
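For readers unfamiliar with the metric, Chamfer distance compares two point clouds, typically sampled from the surfaces of the generated and reference solids. The sketch below uses one common variant: the mean squared nearest-neighbor distance, summed over both directions. Definitions vary between papers (squared versus unsquared, sum versus mean), and the exact variant used in the benchmark is not stated here, so treat this as an illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def nearest_sq_dist(p: Point, cloud: List[Point]) -> float:
    """Squared Euclidean distance from p to its nearest neighbor in cloud."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance,
    computed from a to b and from b to a, then added together."""
    a_to_b = sum(nearest_sq_dist(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_sq_dist(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.1, y, z) for x, y, z in reference]

same = chamfer_distance(reference, reference)   # identical clouds give 0.0
moved = chamfer_distance(reference, shifted)    # about 0.02 for a 0.1 shift
```

This brute-force version is quadratic in the number of points. Practical evaluations usually sample thousands of surface points and use a spatial index for the nearest-neighbor search.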

The research team plans to extend the system in three directions:

Broader part categories – Adding support for sheet‑metal and free‑form surfacing commands.
Interactive conditioning – Allowing users to sketch or annotate an image to guide the generation of specific features.
Open‑source tooling – Publishing the encoder, diffusion, and decoder modules, along with a conversion pipeline to popular targets such as the OpenCASCADE kernel and the Siemens NX CAD system.

Where to learn more

The pre‑print describing GenCAD is available on arXiv, and the authors have opened a prototype repository on GitHub that includes the transformer encoder, contrastive head, and diffusion sampler. Interested readers can explore the code and try the demo there.

GenCAD does not claim to replace seasoned CAD engineers, but it does provide a concrete step toward AI‑assisted, editable design generation. If the community can build on the open tools and expand the training corpus, we may soon see workflows where a quick photograph of a component yields a fully parametric model ready for analysis and production.