PlanarBench: Evaluating LLM Spatial Reasoning via Planar Graph Drawing

Abstract Snapshot

Compressed abstract

Main idea

Method signal

We evaluate 91 models on the 199 simplest non-isomorphic connected planar graphs (2-7 vertices). Edge count is the dominant difficulty predictor (r = -0.85), a finding not reported in prior LLM graph benchmarks, which use only node count as the difficulty axis.
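The reported r is a correlation coefficient. As a rough illustration of how such a value could be computed, here is a minimal sketch of Pearson's r between a per-graph edge count and a per-graph solve rate. The function name `pearson_r`, the variable names, and the toy numbers are all assumptions made for illustration. They are not data or code from the paper, and the paper may use a different correlation measure or aggregation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy placeholder values (NOT results from the paper): a solve rate that
# falls linearly as the edge count grows, so r is -1 up to rounding.
edge_counts = [1, 2, 3, 4, 5]
solve_rates = [0.9, 0.8, 0.7, 0.6, 0.5]
r = pearson_r(edge_counts, solve_rates)
print(round(r, 6))
```

A strongly negative r, as in the abstract, would mean that graphs with more edges tend to be solved less often. The sketch gives no indication of how the paper groups graphs or models before correlating.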

Contribution signal

Contribution details should be verified from the main paper.

Original Abstract

PlanarBench tests whether LLMs can draw planar graphs as ASCII art given only an edge list -- a spatial reasoning task that resists memorization because edge order, edge orientation, and node labels are all permutable. We evaluate 91 models on the 199 simplest non-isomorphic connected planar graphs (2-7 vertices). Edge count is the dominant difficulty predictor (r = -0.85) -- a finding not reported in prior LLM graph benchmarks, which use only node count as the difficulty axis.
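The memorization argument rests on the fact that one graph has many equivalent edge-list presentations. The sketch below shows one way to generate such a presentation by permuting node labels, flipping edge orientation, and shuffling edge order. The helper names `permute_edge_list` and `degree_sequence` are hypothetical and the procedure is an assumption. The abstract does not describe how the benchmark builds its prompts.

```python
import random


def permute_edge_list(edges, rng):
    """Return an equivalent presentation of the same undirected graph.

    Applies the three permutations named in the abstract: node labels,
    edge orientation, and edge order. Illustrative only; not the
    benchmark's own generator.
    """
    nodes = sorted({v for edge in edges for v in edge})
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(nodes, shuffled))  # random bijection on the labels

    variant = []
    for u, v in edges:
        a, b = relabel[u], relabel[v]
        if rng.random() < 0.5:  # flip the written orientation of the edge
            a, b = b, a
        variant.append((a, b))
    rng.shuffle(variant)  # change the order in which edges are listed
    return variant


def degree_sequence(edges):
    """Sorted vertex degrees, an invariant under all three permutations."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sorted(degree.values())


# A 4-cycle with one chord: a small planar graph used as a toy input.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
variant = permute_edge_list(edges, random.Random(0))
print(variant)
```

The variant differs textually from the input but has the same number of edges and the same degree sequence, so a model cannot rely on having seen one particular listing of the graph.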
