Methodology

The Challenge

The Travelling Salesman Problem (TSP) is a classic algorithmic challenge: given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the starting city?

In this benchmark, we use a portion of London's road network, with distances computed using Dijkstra's algorithm. The foundation models are called using Vercel AI Gateway and must determine the optimal visit order given these distance constraints.

How It Works

1. Graph Construction

We build a graph of the road network. Intersections become nodes, and roads become edges with weights corresponding to their length.

2. Pre-computation Phase

Before each race, we run Dijkstra's algorithm on the server to find the shortest road path between every pair of POIs. This creates a distance matrix and stores the actual node-by-node paths for route visualization.

3. The Foundation Model Task

The model receives only the distance matrix and POIs, and its job is to determine the visit order that minimizes the total distance. We prompt the model (with temperature set to 0) to use the Held-Karp algorithm, but because LLMs are probabilistic by nature, they don't always return the same route order—rather, they predict what the optimal solution might be based on patterns learned during training.

4. Route Visualization

Once the model returns the visit order, we combine it with the pre-computed Dijkstra paths to reconstruct the full route. The route is then animated on an OpenLayers map, drawing each segment in sequence to visualize the complete journey through London's road network.

Leaderboard Ranking

Models are ranked using a weighted scoring system that balances speed, route quality, and cost-efficiency:

Score = ⅓ × Fastest + ⅓ × Shortest + ⅓ × Cheapest

Fastest (33%)

Races where the model achieved the lowest response time.

Shortest (33%)

Races where the model found the shortest total route.

Cheapest (33%)

Races where the model had the lowest API cost.

When scores are tied, models are ranked by their total number of runs.