Tasks
IB-bench evaluates LLMs on real-world investment banking tasks.
Overview
Most finance benchmarks test financial concepts and CFA-style trivia. That is useful for measuring knowledge, but it does not reflect real investment banking work. IB-bench tests the tasks a junior analyst would actually encounter in their day-to-day work.
Task materials are derived either from real work product created by industry professionals or from synthetic data generated under the supervision of IB practitioners. The benchmark comprises 33 tasks across three difficulty levels, weighted to emphasize harder, more complex work.
Examples
Public Set
Explore the dataset by difficulty tier. Drill down into individual tasks for performance analysis, or inspect the source code on GitHub.
- Easy: tasks that would take an analyst less than 1 hour to complete.
- Medium: tasks that would take an analyst a few hours, but less than a day, to complete.
- Hard: tasks that would take an analyst more than 1 day to complete.
Scoring
Each task is scored from 0 to 100 against verified ground truth, by an LLM judge, by a human judge, or by a hybrid of these. Credit is awarded based on the score:
Difficulty Score
Easy = (credits earned on Easy tasks / total credits possible on Easy tasks) × 100
The same formula applies to Medium and Hard.
Overall Score
Overall = (0.2 × Easy) + (0.35 × Medium) + (0.45 × Hard)
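
For illustration, here is a minimal Python sketch of how these formulas could be computed. The field names, task records, and helper functions are assumptions made for the example, not IB-bench's actual implementation; only the weights and formulas mirror the definitions above.

```python
# Sketch of IB-bench-style scoring, assuming each task record carries a
# difficulty tier and a 0-100 score. Field names and data are hypothetical.

WEIGHTS = {"easy": 0.20, "medium": 0.35, "hard": 0.45}

def difficulty_score(tasks, tier):
    """(credits earned / total credits possible) × 100 for one tier."""
    tier_tasks = [t for t in tasks if t["difficulty"] == tier]
    if not tier_tasks:
        return 0.0
    earned = sum(t["score"] for t in tier_tasks)   # credits earned
    possible = 100 * len(tier_tasks)                # total credits possible
    return earned / possible * 100

def overall_score(tasks):
    """Weighted combination of the per-tier difficulty scores."""
    return sum(w * difficulty_score(tasks, tier) for tier, w in WEIGHTS.items())

# Hypothetical example: one graded task per tier.
tasks = [
    {"difficulty": "easy",   "score": 90},
    {"difficulty": "medium", "score": 70},
    {"difficulty": "hard",   "score": 40},
]
print(overall_score(tasks))  # 0.2*90 + 0.35*70 + 0.45*40 = 60.5
```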
Limitations
- Results reflect isolated task performance, not end-to-end workflows
- Some task prompts were blocked by API providers due to over-refusal, which is scored as an immediate failure
- v1 of IB-bench does not include slide-building workflows
Get in Touch
Interested in private evaluations or training data? Want your model benchmarked on IB-bench? Reach out on X or GitHub.
Check out the repository for full details on tasks, prompts, and scoring rubrics. If you find IB-bench useful, consider giving us a star!