IB-bench

Can Large Language Models Replace Investment Banking Analysts?

Check out how Claude Opus 4.5 and ChatGPT 5.2 performed on 18 out of 36 public tasks below!

Leaderboard

6 models evaluated · 36 total tasks

Scoring: Overall score is weighted 20% Easy, 35% Medium, 45% Hard.
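As a rough illustration of the weighting above, the overall score can be sketched as a weighted sum of per-tier scores. The weights come from the scoring note. The function name `overall_score`, the dictionary layout, and the assumption that each tier's score is a single number on a common scale (for example 0 to 100) are hypothetical, since the page does not say how per-tier scores are computed.

```python
# Weights taken from the leaderboard's scoring note.
WEIGHTS = {"easy": 0.20, "medium": 0.35, "hard": 0.45}


def overall_score(tier_scores):
    """Combine per-tier scores into one weighted overall score.

    tier_scores maps "easy", "medium", and "hard" to that tier's score.
    Assumption (not stated on the page): each tier score is already
    aggregated and all three use the same scale.
    """
    missing = set(WEIGHTS) - set(tier_scores)
    if missing:
        raise ValueError(f"missing tier scores: {sorted(missing)}")
    return sum(WEIGHTS[tier] * tier_scores[tier] for tier in WEIGHTS)


# Made-up tier scores, for illustration only.
example = overall_score({"easy": 80.0, "medium": 60.0, "hard": 40.0})
print(round(example, 2))
```

With these made-up inputs the result is 0.20 × 80 + 0.35 × 60 + 0.45 × 40 = 55, up to floating-point rounding. Because Hard carries the largest weight, a model's Hard-tier performance moves the overall score more than the same change on Easy tasks.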

Difficulty levels: Easy (<1 hour), Medium (a few hours), Hard (>1 day), based on the time a human analyst would need.

© 2026 IB-bench. All rights reserved.