Claude Opus 4.5 vs ChatGPT 5.2: Performance on Hard Valuation Tasks
By Danial A.
The latest IB-Bench run shows clear differences in how Claude Opus 4.5 and ChatGPT 5.2 handle investment banking tasks. Both models do well on basic financial calculations, but the hard tasks tell a different story.
Key Findings
Our evaluation across 36 tasks shows that model performance diverges significantly as task complexity increases. Here’s what we observed:
LBO Modeling
Both models struggled with our most complex LBO task: building a full leveraged buyout model from scratch given only a confidential information memorandum (CIM) and a term sheet. Claude Opus 4.5 scored 72% and ChatGPT 5.2 scored 68%.
The primary failure modes were the same for both models: sensitivity analysis tables were frequently miscalculated, and debt paydown schedules often contained circular reference errors.
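The circularity usually comes from a standard modeling choice: when interest is charged on the average debt balance, interest depends on the year's paydown, and the paydown depends on the cash left after interest. A minimal sketch of resolving that loop by fixed-point iteration (similar in spirit to enabling iterative calculation in Excel) is below. It assumes a single debt tranche, a constant amount of cash available before interest each year, a 100% cash sweep, no revolver, and no interest tax shield; the function name and all inputs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DebtYear:
    year: int
    opening: float
    interest: float
    paydown: float
    closing: float


def sweep_schedule(opening_debt: float, cash_before_interest: float, rate: float,
                   years: int, tol: float = 1e-10, max_iter: int = 200) -> list[DebtYear]:
    """Hypothetical single-tranche debt schedule with a 100% cash sweep.

    Interest is charged on the average of opening and closing balances, so
    interest and paydown depend on each other. Each year's circularity is
    resolved by fixed-point iteration instead of a circular cell reference.
    """
    schedule = []
    debt = opening_debt
    for year in range(1, years + 1):
        interest = rate * debt  # first guess: interest on the opening balance
        for _ in range(max_iter):
            # Cash left after interest repays debt, never below zero or above the balance.
            paydown = min(debt, max(cash_before_interest - interest, 0.0))
            closing = debt - paydown
            updated = rate * (debt + closing) / 2  # interest on the average balance
            converged = abs(updated - interest) < tol
            interest = updated
            if converged:
                break
        else:
            raise RuntimeError(f"interest did not converge in year {year}")
        # Recompute the paydown with the converged interest figure.
        paydown = min(debt, max(cash_before_interest - interest, 0.0))
        closing = debt - paydown
        schedule.append(DebtYear(year, debt, interest, paydown, closing))
        debt = closing
    return schedule


for row in sweep_schedule(opening_debt=500.0, cash_before_interest=100.0, rate=0.08, years=3):
    print(f"Y{row.year}: open {row.opening:7.2f}  int {row.interest:6.2f}  "
          f"paydown {row.paydown:6.2f}  close {row.closing:7.2f}")
```

In this setup the loop converges quickly: while the paydown is between zero and the full balance, each pass changes the interest estimate by only rate / 2 times the previous change.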
DCF Analysis
On DCF tasks, each model had distinct strengths. Claude performed notably better on WACC calculations, particularly when dealing with:
- Unlevering and relevering beta
- Country risk premium adjustments
- Size premium considerations
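For reference, the WACC steps listed above can be sketched as follows. The sketch assumes the Hamada relation for unlevering and relevering beta, CAPM for the cost of equity, and adds the country risk premium and size premium directly to the cost of equity. That additive treatment is one common convention, not necessarily the one IB-Bench's answer keys use, and all inputs are hypothetical.

```python
def unlever_beta(levered_beta: float, debt_to_equity: float, tax_rate: float) -> float:
    """Hamada: remove the effect of a peer's capital structure from its beta."""
    return levered_beta / (1 + (1 - tax_rate) * debt_to_equity)


def relever_beta(unlevered_beta: float, debt_to_equity: float, tax_rate: float) -> float:
    """Hamada: apply the target's capital structure to an asset (unlevered) beta."""
    return unlevered_beta * (1 + (1 - tax_rate) * debt_to_equity)


def cost_of_equity(risk_free: float, beta: float, equity_risk_premium: float,
                   country_risk_premium: float = 0.0, size_premium: float = 0.0) -> float:
    """CAPM plus additive country risk and size premiums (one convention among several)."""
    return risk_free + beta * equity_risk_premium + country_risk_premium + size_premium


def wacc(cost_equity: float, pre_tax_cost_debt: float, tax_rate: float,
         debt_to_equity: float) -> float:
    """Weighted average cost of capital from a target debt-to-equity ratio."""
    equity_weight = 1 / (1 + debt_to_equity)
    debt_weight = debt_to_equity / (1 + debt_to_equity)
    return equity_weight * cost_equity + debt_weight * pre_tax_cost_debt * (1 - tax_rate)


# Hypothetical inputs: peer beta 1.2 at D/E 0.5, target D/E 0.25, 25% tax rate.
asset_beta = unlever_beta(1.2, debt_to_equity=0.5, tax_rate=0.25)
target_beta = relever_beta(asset_beta, debt_to_equity=0.25, tax_rate=0.25)
ke = cost_of_equity(0.04, target_beta, 0.055, country_risk_premium=0.02, size_premium=0.01)
print(f"relevered beta {target_beta:.3f}, cost of equity {ke:.2%}, "
      f"WACC {wacc(ke, 0.06, 0.25, debt_to_equity=0.25):.2%}")
```

A frequent error this guards against is mixing the peer's leverage with the target's: the beta must be unlevered at the peer's D/E and relevered at the target's D/E.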
ChatGPT, however, showed stronger performance on terminal value calculations and growth rate assumptions.
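As a reference point for the terminal value comparison, here is a minimal sketch of the Gordon growth (perpetuity growth) method. It assumes end-of-year discounting, a constant perpetual growth rate after the last forecast year, and a hypothetical five-year forecast; the exit-multiple method, a common alternative, is not shown.

```python
def terminal_value(final_fcf: float, discount_rate: float, growth: float) -> float:
    """Gordon growth value at the end of the forecast, based on next year's FCF."""
    if growth >= discount_rate:
        raise ValueError("perpetual growth must be below the discount rate")
    return final_fcf * (1 + growth) / (discount_rate - growth)


def enterprise_value(fcfs: list[float], discount_rate: float, growth: float) -> float:
    """PV of explicit free cash flows plus PV of the terminal value (end-of-year discounting)."""
    pv_fcfs = sum(fcf / (1 + discount_rate) ** t for t, fcf in enumerate(fcfs, start=1))
    tv = terminal_value(fcfs[-1], discount_rate, growth)
    return pv_fcfs + tv / (1 + discount_rate) ** len(fcfs)


# Hypothetical forecast; compare how far EV moves for half-point changes in growth.
forecast = [100.0, 108.0, 115.0, 121.0, 126.0]
for g in (0.020, 0.025, 0.030):
    print(f"g = {g:.1%}: EV = {enterprise_value(forecast, discount_rate=0.10, growth=g):,.1f}")
```

Because the denominator is the discount rate minus growth, small changes in the growth assumption move the terminal value, and often most of enterprise value, substantially.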
Document Parsing
Perhaps the most significant gap appeared in document parsing tasks. When asked to extract key metrics from 10-K filings, Claude’s structured approach led to 15% higher accuracy on average.
What This Means for Practitioners
These results suggest that neither model is a clear winner for all investment banking tasks. The optimal choice depends on your specific use case:
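- For model building: consider Claude for complex Excel work
- For DCF work: Claude was stronger on WACC, while ChatGPT was stronger on terminal value and growth rate assumptions
- For document analysis: Claude’s parsing capabilities edge ahead
- For quick calculations: both models perform comparably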
Methodology Notes
All evaluations were conducted with default API parameters. Models were given identical prompts and materials. Scoring was performed using a combination of automated verification and expert human review.
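The post does not detail how the automated verification works. As one hypothetical illustration, numeric answers can be checked field by field against an answer key within a relative tolerance. The function name, field names, and the 1% tolerance below are assumptions for illustration, not IB-Bench's actual scorer.

```python
import math


def score_numeric_answers(predicted: dict[str, float], expected: dict[str, float],
                          rel_tol: float = 0.01) -> float:
    """Hypothetical scorer: fraction of answer-key fields matched within rel_tol.

    Fields the model omitted count as wrong.
    """
    if not expected:
        raise ValueError("answer key must not be empty")
    correct = 0
    for field, truth in expected.items():
        value = predicted.get(field)
        if value is not None and math.isclose(value, truth, rel_tol=rel_tol, abs_tol=1e-9):
            correct += 1
    return correct / len(expected)


answer_key = {"revenue": 1000.0, "ebitda": 250.0, "net_debt": 400.0}
model_output = {"revenue": 1004.0, "ebitda": 240.0}  # net_debt missing
print(score_numeric_answers(model_output, answer_key))
```

A relative tolerance keeps the check fair across line items of very different magnitudes; judgment-heavy outputs such as assumptions or commentary would still need human review.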
Stay tuned for our next analysis, where we’ll dive into how these models handle M&A transaction analysis.