CRUST-bench is a benchmark that measures model performance on the C-to-Rust translation task.
Please see our blog post for a more detailed description.

| Model | Pass@1 (Build) | Pass@1 (Test) | Compiler repair (Build) | Compiler repair (Test) | Test repair (Build) | Test repair (Test) |
|---|---|---|---|---|---|---|
| gpt-5 (high) | 48 | 26 | 92 | 43 | 85 | 70 |
| claude-opus-4-20250514 | 43 | 22 | 78 | 29 | 65 | 40 |
| o3-2025-04-16 | 35 | 19 | 68 | 31 | 63 | 48 |
| o1-preview-2024-09-12 | 32 | 15 | 69 | 28 | 54 | 37 |
| claude-3.7-sonnet-20250219 | 26 | 13 | 54 | 23 | 49 | 32 |
| claude-3.5-sonnet-20240620 | 26 | 11 | 49 | 21 | 38 | 24 |
| o1-mini-2024-09-12 | 19 | 9 | 47 | 16 | 27 | 21 |
| gpt-4o | 18 | 7 | 52 | 18 | 42 | 22 |
| gemini-1.5-pro | 11 | 3 | 35 | 11 | 30 | 14 |
| arcee-ai/Virtuoso-Medium-v2 | 2 | 2 | 21 | 6 | 10 | 6 |
| Qwen/Qwen-Coder-32B | 0 | 0 | 0 | 0 | 1 | 0 |
| DeepSeek/DeepSeek-Coder-33B | 1 | 0 | 2 | 0 | 1 | 0 |
| Qwen/QwQ-32B-Preview | 1 | 0 | 1 | 0 | 1 | 0 |
| Adapted SWE-agent (claude-3-7-sonnet-20250219) | 41 | 32 | – | – | – | – |