#10 KARMA: Knowledge-Action Regularized Multimodal Alignment for Personalized Search at Taobao

Score: 24.8 | Matched keywords: alignment, fine-tuning, large language models, llm, multimodal, token

Detailed Summary (EN)

Problem definition

In personalized search systems, ranking models are predominantly optimized using post-hoc user feedback.
This optimization paradigm inevitably privileges post-hoc memorization features (item IDs, co-occurrence statistics, etc.).
This naturally motivates leveraging the rich, pre-trained semantic knowledge of LLMs to strengthen the system’s generalization capabilities [6, 9, 11, 15].
Although integrating LLMs is intuitively appealing, directly fine-tuning LLMs for personalized ranking often yields limited performance gains.

Core idea & method

We propose KARMA (Knowledge–Action Regularized Multimodal Alignment), a unified framework that treats semantic reconstruction as a train-only regularizer.
KARMA optimizes a next-interest embedding for retrieval (Action) while enforcing semantic decodability (Knowledge) through two complementary objectives: (i) history-conditioned semantic generation, which anchors optimization to the LLM’s native next-token distribution, and (ii) embedding-conditioned semantic reconstruction, which constrains the interest embedding to remain semantically recoverable.
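The two-part objective described above can be sketched as a weighted sum of an action loss and two semantic token losses. This is a minimal illustration only, not the paper's implementation: the InfoNCE-style form of the action loss, the temperature, the weights lambda_gen and lambda_rec, and every function name are assumptions introduced here.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of one target index under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def action_loss(interest, candidates, positive, temperature=0.1):
    """Assumed InfoNCE-style retrieval loss: the next-interest embedding
    should score the clicked item above the other candidates."""
    logits = [dot(interest, c) / temperature for c in candidates]
    return softmax_cross_entropy(logits, positive)

def token_loss(step_logits, target_tokens):
    """Mean next-token cross-entropy over a target token sequence."""
    losses = [softmax_cross_entropy(l, t)
              for l, t in zip(step_logits, target_tokens)]
    return sum(losses) / len(losses)

def karma_style_loss(interest, candidates, positive,
                     gen_logits, rec_logits, target_tokens,
                     lambda_gen=1.0, lambda_rec=1.0):
    """Action loss plus two semantic regularizers (weights are hypothetical).

    gen_logits: token logits conditioned on the user history (objective i).
    rec_logits: token logits conditioned on the interest embedding (objective ii).
    """
    loss = action_loss(interest, candidates, positive)
    loss += lambda_gen * token_loss(gen_logits, target_tokens)
    loss += lambda_rec * token_loss(rec_logits, target_tokens)
    return loss
```

Because the semantic terms act only as train-time regularizers, a serving path under this sketch would use the interest embedding and the dot-product scores alone; the token logits would never be computed at inference.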
On the Taobao search system, KARMA mitigates semantic collapse (as shown by attention-sink analysis) and improves both action metrics and semantic fidelity.
With KARMA, we achieve +0.25 CTR AUC in ranking, +1.86 HR in pre-ranking, and +2.51 HR in recall.
KARMA is also deployed online.
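For readers less familiar with the reported metrics, the sketch below shows one common way to compute them. It assumes that HR denotes hit rate at a cutoff K and that CTR AUC is the standard pairwise ranking AUC over click labels; the summary does not define either, and the function names are illustrative.

```python
def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of queries whose target item appears in the top-k results."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def auc(scores, labels):
    """Probability that a random clicked item outscores a random unclicked one.

    Ties count as half a win. Labels are 1 (click) or 0 (no click).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The quadratic loop in auc is chosen for clarity; a production evaluation would typically use a sort-based computation instead.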

Experimental setup & results

We attribute the limited gains from direct fine-tuning to a critical Knowledge–Action Gap: the inherent conflict between preserving pre-trained semantic knowledge and aligning with specific personalized actions through discriminative objectives.
Empirically, action-only training objectives induce Semantic Collapse, with symptoms such as attention “sinks”.
This degradation severely cripples the LLM’s generalization, so the model fails to bring improvements to personalized search systems.
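One simple way to quantify an attention "sink" is to measure how much attention mass each query position places on a small set of sink positions, such as the first token. The summary does not describe the paper's actual analysis, so the diagnostic below is an assumed illustration; the function name and the choice of position 0 as the sink are hypothetical.

```python
def sink_mass(attention, sink_positions=(0,)):
    """Average share of attention that each query row puts on sink positions.

    attention: list of rows, each a probability distribution over key positions.
    """
    total = 0.0
    for row in attention:
        total += sum(row[j] for j in sink_positions)
    return total / len(attention)

# Toy causal attention maps (each row sums to 1).
collapsed = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.1, 0.1]]
spread = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]
```

Under this measure, a value close to 1 means most queries attend almost exclusively to the sink positions, which is the pattern associated with collapse; the collapsed map above scores higher than the spread one.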

Limitations & risks

In practice, we find that directly fine-tuning LLMs on industrial personalized tasks often yields limited performance gains.

Read-like-fullpaper digest

This paper addresses personalized search systems whose ranking models are predominantly optimized using post-hoc user feedback. The core method, KARMA, treats semantic reconstruction as a train-only regularizer. A key finding is the Knowledge–Action Gap: the inherent conflict between preserving pre-trained semantic knowledge and aligning with specific personalized actions through discriminative objectives.

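Detailed Summary (KO)

Problem definition

In personalized search systems, ranking models are predominantly optimized using post-hoc user feedback.
This optimization paradigm inevitably privileges post-hoc memorization features (item IDs, co-occurrence statistics, etc.).
This naturally motivates leveraging the rich, pre-trained semantic knowledge of LLMs to strengthen the system’s generalization capabilities [6, 9, 11, 15].
Although integrating LLMs is intuitively appealing, directly fine-tuning LLMs for personalized ranking often yields limited performance gains.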

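Core idea & method

We propose KARMA (Knowledge–Action Regularized Multimodal Alignment), a unified framework that treats semantic reconstruction as a train-only regularizer.
KARMA optimizes a next-interest embedding for retrieval (Action) while enforcing semantic decodability (Knowledge) through two complementary objectives: (i) history-conditioned semantic generation, which anchors optimization to the LLM’s native next-token distribution, and (ii) embedding-conditioned semantic reconstruction, which constrains the interest embedding to remain semantically recoverable.
On the Taobao search system, KARMA mitigates semantic collapse (as shown by attention-sink analysis) and improves both action metrics and semantic fidelity.
With KARMA, we achieve +0.25 CTR AUC in ranking, +1.86 HR in pre-ranking, and +2.51 HR in recall.
KARMA is also deployed online.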

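Experimental setup & results

We attribute the limited gains from direct fine-tuning to a critical Knowledge–Action Gap: the inherent conflict between preserving pre-trained semantic knowledge and aligning with specific personalized actions through discriminative objectives.
Empirically, action-only training objectives induce Semantic Collapse, with symptoms such as attention “sinks”.
This degradation severely cripples the LLM’s generalization, so the model fails to bring improvements to personalized search systems.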

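Limitations & risks

In practice, we find that directly fine-tuning LLMs on industrial personalized tasks often yields limited performance gains.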

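Read-like-fullpaper digest

This paper addresses personalized search systems whose ranking models are predominantly optimized using post-hoc user feedback. The core method, KARMA, treats semantic reconstruction as a train-only regularizer. A key finding is the Knowledge–Action Gap: the inherent conflict between preserving pre-trained semantic knowledge and aligning with specific personalized actions through discriminative objectives.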