Methodology
The Challenge
The Travelling Salesman Problem (TSP) is a classic algorithmic challenge: given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the starting city?
In this benchmark, we use a portion of London's road network, with distances computed using Dijkstra's algorithm. The foundation models are called using Vercel AI Gateway and must determine the optimal visit order given these distance constraints.
How It Works
1. Graph Construction
We build a graph of the road network. Intersections become nodes, and roads become edges with weights corresponding to their length.
2. Pre-computation Phase
Before each race, we run Dijkstra's algorithm on the server to find the shortest road path between every pair of POIs. This creates a distance matrix and stores the actual node-by-node paths for route visualization.
3. The Foundation Model Task
The model receives only the distance matrix and POIs, and its job is to determine the visit order that minimizes the total distance. We prompt the model (with temperature set to 0) to use the Held-Karp algorithm, but because LLMs are probabilistic by nature, they don't always return the same route order—rather, they predict what the optimal solution might be based on patterns learned during training.
4. Route Visualization
Once the model returns the visit order, we combine it with the pre-computed Dijkstra paths to reconstruct the full route. The route is then animated on an OpenLayers map, drawing each segment in sequence to visualize the complete journey through London's road network.
Leaderboard Ranking
Models are ranked using a weighted scoring system that balances speed, route quality, and cost-efficiency:
Fastest (33%)
Races where the model achieved the lowest response time.
Shortest (33%)
Races where the model found the shortest total route.
Cheapest (33%)
Races where the model had the lowest API cost.
When scores are tied, models are ranked by their total number of runs.