Benchmarks for AI Models and Agents on CAD Tasks

gNucleus-CAD-Bench is a comprehensive suite of benchmarks for evaluating AI models and agents on CAD design and 3D modeling tasks.

[Image: Gear box CAD assembly]
Rank  Name             Geometry  Parametric  Spec Consistency  Overall
 1    Claude Opus 4.7      89.2        86.5              87.0     87.6
 2    GPT-5.3-Codex        86.8        84.2              83.5     85.0
 3    Claude Opus 4.5      82.5        80.4              79.8     80.9
 4    Claude Opus 4.6      82.1        80.5              79.6     80.8
 5    Gemini 3.1 Pro       81.9        80.8              79.0     80.6
 6    MiniMax M2.5         81.0        79.8              79.3     80.2
 7    GPT-5.2              80.7        80.1              78.5     80.0
 8    Qwen3.6 Plus         80.0        78.4              77.5     78.8
 9    GLM-5                78.8        77.6              76.9     77.8
10    Muse Spark           78.5        77.0              76.5     77.4

Scores last refreshed May 12, 2024. Higher is better.

Benchmark Tasks

3D Parametric Part Generation

Generate editable, parametric 3D CAD part models from natural-language prompts and reference inputs.
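
For instance, a task might request a drilled mounting plate. The sketch below uses the open-source CadQuery library purely as an illustration of what an editable, script-based answer can look like; the benchmark's actual output format is not specified here. Dimensions stay as named parameters, so the part remains editable:

    import cadquery as cq

    # Named parameters keep the generated part editable.
    length, width, thickness = 80.0, 60.0, 8.0
    hole_dia, margin = 6.0, 10.0

    plate = (
        cq.Workplane("XY")
        .box(length, width, thickness)
        .faces(">Z")
        .workplane()
        .rect(length - 2 * margin, width - 2 * margin, forConstruction=True)
        .vertices()
        .hole(hole_dia)  # one clearance hole per corner
    )

    cq.exporters.export(plate, "plate.step")  # STEP preserves the B-rep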

Assembly Generation

Generate multi-part assemblies with proper mates, constraints, and component hierarchy.
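
As a hedged illustration of what "proper mates" means in practice, the sketch below builds a two-part stack with CadQuery's assembly constraints (chosen here for illustration; the benchmark does not mandate a particular CAD kernel). A Fixed constraint anchors the housing and a Plane constraint mates the shaft's bottom face to the housing's top face:

    import cadquery as cq

    housing = cq.Workplane("XY").box(30, 30, 10)
    shaft = cq.Workplane("XY").circle(5).extrude(40)

    # Mates are expressed as constraints between selected faces;
    # the solver positions components so every constraint holds.
    assy = (
        cq.Assembly()
        .add(housing, name="housing")
        .add(shaft, name="shaft")
        .constrain("housing", "Fixed")
        .constrain("shaft@faces@<Z", "housing@faces@>Z", "Plane")
    )
    assy.solve()
    assy.save("stack.step")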

Complex CAD Workflow

Execute multi-step CAD workflows that generate, iteratively edit, and verify designs until the result meets the target spec.
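
A minimal sketch of such a loop is below. generate_cad, check_spec, and apply_edit are hypothetical placeholders (stubbed here so the example runs); a real harness would call the model and a CAD verifier at those points:

    from dataclasses import dataclass, field

    @dataclass
    class Report:
        passed: bool
        failures: list = field(default_factory=list)

    # Hypothetical hooks -- a real harness calls the model / CAD kernel here.
    def generate_cad(prompt): return {"prompt": prompt, "edits": 0}
    def apply_edit(model, failures): return {**model, "edits": model["edits"] + 1}
    def check_spec(model, spec): return Report(model["edits"] >= spec["min_edits"])

    def design_until_spec(prompt, spec, max_rounds=5):
        """Generate, then iteratively edit until verification passes."""
        model = generate_cad(prompt)
        for _ in range(max_rounds):
            report = check_spec(model, spec)
            if report.passed:
                return model
            model = apply_edit(model, report.failures)
        raise RuntimeError("spec not met within the round budget")

    print(design_until_spec("L-bracket", {"min_edits": 2}))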

Evaluation Methods

Geometry Accuracy

Measures how closely the generated geometry matches the reference part.
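
Geometry comparison is commonly done with point-cloud distances. The sketch below implements a symmetric Chamfer distance with SciPy as one plausible instance of such a metric; the leaderboard's exact formula is not published here:

    import numpy as np
    from scipy.spatial import cKDTree

    def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
        """Symmetric mean nearest-neighbour distance between two point clouds."""
        d_ab, _ = cKDTree(pts_b).query(pts_a)  # each A point -> nearest B point
        d_ba, _ = cKDTree(pts_a).query(pts_b)  # each B point -> nearest A point
        return float(d_ab.mean() + d_ba.mean())

    # Identical clouds sampled from the same surface score exactly 0.
    cloud = np.random.rand(1000, 3)
    assert chamfer_distance(cloud, cloud) == 0.0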

Constraint & Assembly Correctness

Checks constraint satisfaction, mating validity, and assembly stability.
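
One automated check in this family is interference: correctly mated parts should touch but not interpenetrate. The sketch below uses trimesh's collision manager (an illustrative choice; it needs the optional python-fcl dependency) to flag overlapping components:

    import trimesh
    from trimesh.collision import CollisionManager

    def parts_interfere(meshes: dict) -> bool:
        """True if any pair of assembly components overlaps (invalid mating)."""
        manager = CollisionManager()
        for name, mesh in meshes.items():
            manager.add_object(name, mesh)
        return manager.in_collision_internal()

    # Two unit cubes with a 2-unit gap should not collide.
    a = trimesh.creation.box(extents=(1, 1, 1))
    b = trimesh.creation.box(extents=(1, 1, 1))
    b.apply_translation((3, 0, 0))
    print(parts_interfere({"a": a, "b": b}))  # False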

Parametric Correctness

Verifies that the generated model's parameters are consistent with the specification.
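
A sketch of that check, assuming spec parameters and dimensions measured from the generated model both arrive as plain name/value maps (the parameter names below are hypothetical):

    def spec_violations(measured: dict, spec: dict, rel_tol: float = 1e-3) -> list:
        """Names of spec parameters the generated model fails to honour."""
        return [
            name
            for name, target in spec.items()
            if abs(measured.get(name, float("inf")) - target) > rel_tol * abs(target)
        ]

    spec = {"bore_dia": 10.0, "plate_thickness": 8.0}
    measured = {"bore_dia": 10.002, "plate_thickness": 7.6}
    print(spec_violations(measured, spec))  # ['plate_thickness']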

Topology / Structure

Evaluates topological validity and part structure correctness.
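
For tessellated output, basic topological validity is checkable with off-the-shelf tools; the sketch below uses trimesh (again only as an illustration) to test for a closed, consistently wound shell:

    import trimesh

    def topology_report(mesh: trimesh.Trimesh) -> dict:
        """Basic validity checks on a tessellated part."""
        return {
            "watertight": mesh.is_watertight,             # closed manifold shell
            "winding_consistent": mesh.is_winding_consistent,
            "euler_number": mesh.euler_number,            # 2 for a genus-0 solid
        }

    print(topology_report(trimesh.creation.box()))
    # {'watertight': True, 'winding_consistent': True, 'euler_number': 2}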

Agent Workflow Success

Assesses task completion rate and workflow correctness.
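
Completion rate itself is simple bookkeeping; a sketch, assuming each run records whether it finished plus a list of per-step verification results (field names hypothetical):

    def workflow_success_rate(runs: list) -> float:
        """Fraction of runs that finished and passed every verification step."""
        passed = sum(1 for r in runs if r["completed"] and all(r["checks"]))
        return passed / len(runs) if runs else 0.0

    runs = [
        {"completed": True, "checks": [True, True]},   # success
        {"completed": True, "checks": [True, False]},  # failed a check
        {"completed": False, "checks": []},            # never finished
    ]
    print(round(workflow_success_rate(runs), 3))  # 0.333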

Efficiency

Measures token usage, execution time, and resource efficiency.
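
A sketch of how a harness might capture those numbers per task; the tokens_used counter is hypothetical and stands in for whatever usage accounting the model API actually returns:

    import time

    def measure_run(task_fn, *args) -> dict:
        """Wall-clock time plus whatever usage counters the run reports."""
        start = time.perf_counter()
        result = task_fn(*args)                      # one benchmark task
        elapsed = time.perf_counter() - start
        return {
            "seconds": elapsed,
            "tokens": result.get("tokens_used", 0),  # hypothetical counter
            "result": result,
        }

    print(measure_run(lambda: {"tokens_used": 512}))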

Latest Updates

May 12, 2024

Benchmark v1.2 release

Added 12 new assembly tasks and refreshed the leaderboard.


May 5, 2024

New evaluation tool: TopoCheck

Improved topology evaluation now supports curved surfaces.


Apr 14, 2024

Open contributions

External submissions are now accepted via the GitHub repo.
