IB-bench
Can Large Language Models Replace Investment Banking Analysts?
33 public tasks just launched!
Here's an early look at Opus 4.5 and ChatGPT 5.2
Leaderboard
2 models evaluated · 33 total tasks
| # | Model | Provider | Easy | Medium | Hard | Overall |
|---|---|---|---|---|---|---|
| 1 | claude-opus-4-5-20251101 | Anthropic | 50.0 | 43.8 | 0.0 | 46.0 |
| 2 | gpt-5.2-2025-12-11 | OpenAI | 35.0 | 0.0 | 0.0 | 12.2 |
#1 Anthropic: claude-opus-4-5-20251101 (overall 46.0)
Easy: 50.0 (10/20)
Medium: 43.8 (8/10)
Hard: 0.0 (0/3)

#2 OpenAI: gpt-5.2-2025-12-11 (overall 12.2)
Easy: 35.0 (10/20)
Medium: 0.0 (5/10)
Hard: 0.0 (0/3)

Scoring: Overall score is weighted 20% Easy, 35% Medium, 45% Hard.
Difficulty levels are based on the time a human analyst would need: Easy (<1 hour), Medium (a few hours), Hard (>1 day).
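As a rough illustration of the stated weighting, here is a minimal Python sketch that applies the 20% / 35% / 45% weights directly to per-tier scores on a 0-100 scale. The function name `overall_score`, the dictionary keys, and the example inputs are assumptions made for illustration. The page does not spell out the exact aggregation (for example, whether weights apply per tier or per task), so the published overall figures may be computed differently.

```python
# Weights as stated in the scoring note: 20% Easy, 35% Medium, 45% Hard.
WEIGHTS = {"easy": 0.20, "medium": 0.35, "hard": 0.45}


def overall_score(tier_scores: dict[str, float]) -> float:
    """Return the weighted sum of per-tier scores (each on a 0-100 scale).

    Hypothetical helper: a literal reading of the stated weights,
    not necessarily the leaderboard's own calculation.
    """
    missing = WEIGHTS.keys() - tier_scores.keys()
    if missing:
        raise ValueError(f"missing tier scores: {sorted(missing)}")
    return sum(WEIGHTS[tier] * tier_scores[tier] for tier in WEIGHTS)


# Hypothetical tier scores, not taken from the leaderboard.
example = {"easy": 80.0, "medium": 60.0, "hard": 40.0}
print(round(overall_score(example), 1))  # 0.2*80 + 0.35*60 + 0.45*40 = 55.0
```

Because Hard carries the largest weight, a model that scores nothing on Hard tasks is capped at 55 under this reading, even with perfect Easy and Medium scores.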