Representation Without Reward: A JEPA Audit for LLM Fine-Tuning

Abstract Snapshot

Compressed abstract

Main idea

Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs.

Method signal

For autoregressive language-model fine-tuning the principle entails a stricter requirement: the induced hidden-state geometry must reach the language-model head and improve the decoded task metric. We test that requirement under a fixed Llama-3.2-1B-Instruct LoRA harness on natural-language-to-regex generation, comparing twenty-two training-time auxiliaries across trajectory-shape regularisation, distributional constraints, predictor/target asymmetry, Fisher-metric Jacobi residuals, and a decoder-visible JEPA objective constructed to lie in cross-entropy's positive cone.

Contribution signal

The empirical answer is a structured null: several auxiliaries clear the single-cell paired threshold α = 0.10 without correction (T3-Local, at Δ = +2.53 pp, p = 0.003, being the strongest), but none survives Bonferroni or Holm-Bonferroni correction at the relevant family-wise threshold, even though many change curvature, anisotropy, variance, and gradient direction. Decoder-visible JEPA yields the first positive gradient cosine between auxiliary and cross-entropy in the study, yet exact match remains inside seed noise. A full-fine-tuning replication of the same auxiliary at n = 5 seeds reproduces the null on both benchmarks (TURK: Δ = +0.04 pp, p_paired = 0.96; SYNTH: Δ = +0.52 pp, p_paired = 0.28), so the null is robust across LoRA and full fine-tuning for the decoder-visible construction.
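As a rough illustration of why an uncorrected p = 0.003 can fail family-wise correction, the sketch below implements Bonferroni and Holm step-down rejection in plain Python. The family size of 22 (one test per auxiliary) and the family-wise level of 0.05 are assumptions made for illustration; the abstract states neither. Every p-value other than 0.003 is a placeholder set to 1.0, not a result from the study.

```python
def bonferroni_reject(p_values, alpha):
    """Reject hypothesis i when p_i <= alpha / m, with m the family size."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first comparison that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] > alpha / (m - k):
            break
        reject[i] = True
    return reject


# Assumed values, not taken from the paper.
FAMILY_SIZE = 22   # assumption: one test per auxiliary
FAMILY_ALPHA = 0.05  # assumption: family-wise level

# Only 0.003 comes from the abstract; the rest are 1.0 placeholders.
p_values = [0.003] + [1.0] * (FAMILY_SIZE - 1)

bonferroni_result = bonferroni_reject(p_values, FAMILY_ALPHA)
holm_result = holm_reject(p_values, FAMILY_ALPHA)
print(any(bonferroni_result), any(holm_result))
```

Holm compares the smallest p-value against the same alpha / m bound that Bonferroni uses, so a family whose strongest cell misses that bound yields no rejections under either procedure.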

Original Abstract

Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning the principle entails a stricter requirement: the induced hidden-state geometry must reach the language-model head and improve the decoded task metric. We test that requirement under a fixed Llama-3.2-1B-Instruct LoRA harness on natural-language-to-regex generation, comparing twenty-two training-time auxiliaries across trajectory-shape regularisation, distributional constraints, predictor/target asymmetry, Fisher-metric Jacobi residuals, and a decoder-visible JEPA objective constructed to lie in cross-entropy's positive cone. The empirical answer is a structured null: several auxiliaries clear the single-cell paired threshold α = 0.10 without correction (T3-Local, at Δ = +2.53 pp, p = 0.003, being the strongest), but none survives Bonferroni or Holm-Bonferroni correction at the relevant family-wise threshold, even though many change curvature, anisotropy, variance, and gradient direction. Decoder-visible JEPA yields the first positive gradient cosine between auxiliary and cross-entropy in the study, yet exact match remains inside seed noise. A full-fine-tuning replication of the same auxiliary at n = 5 seeds reproduces the null on both benchmarks (TURK: Δ = +0.04 pp, p_paired = 0.96; SYNTH: Δ = +0.52 pp, p_paired = 0.28), so the null is robust across LoRA and full fine-tuning for the decoder-visible construction. Hidden-state representation work and decoded-task accuracy are therefore weakly coupled in this regime; we accordingly reframe LLM-domain JEPA evaluation as a coupling problem, in which the operative question is under which metrics useful hidden geometry becomes decoder-visible task signal.
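The gradient-cosine diagnostic mentioned above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `gradient_cosine` and the two-dimensional toy gradients are hypothetical, and reading "positive cone" as a non-negative inner product with the cross-entropy gradient is an assumption. A positive cosine means that a small step along the auxiliary's descent direction also lowers cross-entropy to first order.

```python
import math


def gradient_cosine(g_aux, g_ce):
    """Cosine similarity between two flattened gradient vectors.

    Returns 0.0 when either vector has zero norm, so the value is
    always defined.
    """
    dot = sum(a * b for a, b in zip(g_aux, g_ce))
    norm_aux = math.sqrt(sum(a * a for a in g_aux))
    norm_ce = math.sqrt(sum(b * b for b in g_ce))
    if norm_aux == 0.0 or norm_ce == 0.0:
        return 0.0
    return dot / (norm_aux * norm_ce)


# Toy gradients for illustration only (not measured values).
g_ce = [1.0, 0.0]          # stand-in cross-entropy gradient
g_aligned = [1.0, 1.0]     # auxiliary partly aligned with cross-entropy
g_orthogonal = [0.0, 1.0]  # auxiliary orthogonal to cross-entropy
g_opposed = [-1.0, 1.0]    # auxiliary partly opposed to cross-entropy

cos_aligned = gradient_cosine(g_aligned, g_ce)
cos_orthogonal = gradient_cosine(g_orthogonal, g_ce)
cos_opposed = gradient_cosine(g_opposed, g_ce)
print(cos_aligned, cos_orthogonal, cos_opposed)
```

In practice the two vectors would be the flattened gradients of the auxiliary and cross-entropy losses with respect to the same trainable parameters. As the abstract reports, a positive cosine alone did not translate into an exact-match gain.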
