IB-bench
Can Large Language Models Replace Investment Banking Analysts?
33 public tasks just launched!
Here's an early look at Opus 4.5 and ChatGPT 5.2
Leaderboard
2 models evaluated · 33 total tasks
| # | Model | Provider | Easy | Medium | Hard | Overall |
|---|---|---|---|---|---|---|
| 1 | claude-opus-4-5-20251101 | Anthropic | 45.0 | 40.0 | 16.7 | 30.5 |
| 2 | gpt-5.2-2025-12-11 | OpenAI | 37.5 | 5.0 | 0.0 | 9.2 |
#1 Anthropic: claude-opus-4-5-20251101 (overall 30.5)
Easy: 45.0 (20/20 tasks)
Medium: 40.0 (10/10 tasks)
Hard: 16.7 (3/3 tasks)
#2 OpenAI: gpt-5.2-2025-12-11 (overall 9.2)
Easy: 37.5 (20/20 tasks)
Medium: 5.0 (10/10 tasks)
Hard: 0.0 (3/3 tasks)
Results are preliminary: IB-bench is in active development, and eval results may change.
Scoring: Overall score is weighted 20% Easy, 35% Medium, 45% Hard.
Difficulty levels: Easy (<1 hour), Medium (few hours), Hard (>1 day) - based on time a human analyst would need.
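The weighting in the scoring note can be checked against the leaderboard figures. The sketch below is an illustration only, not IB-bench's actual scoring code: the names `WEIGHTS`, `overall_score`, and `leaderboard` are hypothetical, and it assumes the overall score is a plain weighted sum of the three per-difficulty scores.

```python
# Weights as stated in the scoring note: 20% Easy, 35% Medium, 45% Hard.
WEIGHTS = {"easy": 0.20, "medium": 0.35, "hard": 0.45}


def overall_score(scores):
    """Weighted sum of per-difficulty scores (assumed aggregation)."""
    return sum(WEIGHTS[level] * scores[level] for level in WEIGHTS)


# Per-difficulty scores copied from the leaderboard.
leaderboard = {
    "claude-opus-4-5-20251101": {"easy": 45.0, "medium": 40.0, "hard": 16.7},
    "gpt-5.2-2025-12-11": {"easy": 37.5, "medium": 5.0, "hard": 0.0},
}

for model, scores in leaderboard.items():
    print(f"{model}: {overall_score(scores):.3f}")
```

Under this assumption the weighted sums come to roughly 30.5 and 9.25, which agrees with the published overall scores of 30.5 and 9.2 up to rounding. Because Hard carries 45% of the weight across only 3 tasks, a single Hard task moves the overall score far more than a single Easy task does.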