#7 Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics?

Score: 26.5 | Matched keywords: ai, alignment, benchmark, large language models, llm, machine learning, transformer

Detailed Summary (EN)

Problem definition

Generative models powered by large language models (LLMs) have significantly impacted fields ranging from academic publishing to creative writing (Castillo 2025; Wu et al.).
Their ability to produce text with human-like fluency has opened new opportunities for co-creation while simultaneously raising concerns regarding authorship, originality, and the potential for misuse in digital environments (Wu et al.).
Beyond legal frameworks, AI-generated text also carries risks of deception, such as impersonating individuals or spreading misinformation, underscoring the need for critical evaluation of these technologies in creative domains (Mitchell et al.).
From a technical perspective, state-of-the-art models such as GPT-4 and Gemini Pro demonstrate high proficiency in generating grammatically coherent and contextually relevant content.

Core idea & method

With strict thematic alignment, we generated synthetic corpora and evaluated them through a complementary framework combining transformer-based classification (BERT) and interpretable machine learning (XGBoost).
Our methodology integrates Linguistic Inquiry and Word Count (LIWC) markers, perplexity, and readability indices to assess the divergence between AI-generated and human-authored text.
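
As a rough illustration of what a readability index measures, the sketch below computes two standard formulas, Flesch Reading Ease and Flesch-Kincaid Grade Level. The summary does not say which indices the authors used, so the choice of these two is an assumption, and the function names `count_syllables` and `readability` are hypothetical. The syllable counter is a crude vowel-group heuristic, not a dictionary lookup.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per run of consecutive vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text: str) -> dict:
    """Return sentence/word statistics and two Flesch-family readability scores."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        # Higher score = easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade level.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    print(readability("The cat sat. The dog ran."))
```

Scores like these could be placed alongside LIWC and perplexity values in a single feature vector per text. LIWC itself is a proprietary dictionary and is not reproduced here.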

Experimental setup & results

The results demonstrate that AI-generated mimicry remains highly detectable, with XGBoost models trained on a restricted set of eight stylometric features achieving accuracy comparable to high-dimensional neural classifiers.
Feature importance analyses identify perplexity as the primary discriminative metric, revealing a significant divergence in the stochastic regularity of AI outputs compared to the higher variability of human writing.
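
For readers unfamiliar with the metric, perplexity is the exponential of the negative mean log-probability a language model assigns to each token. The minimal sketch below assumes the per-token natural-log probabilities have already been obtained from some scoring model. Which model the authors used is not stated in this summary, and the helper names `perplexity` and `perplexity_profile` are hypothetical. The second helper summarizes the spread of perplexity across a set of texts, which is one simple way to look at the regularity versus variability contrast described above.

```python
import math
from statistics import mean, pstdev


def perplexity(token_logprobs):
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-mean(token_logprobs))


def perplexity_profile(docs):
    """Mean and population standard deviation of per-document perplexity.

    `docs` is a list of documents, each given as a list of token log-probs.
    A small spread means the documents are uniformly predictable.
    """
    ppls = [perplexity(d) for d in docs]
    return {"mean": mean(ppls), "spread": pstdev(ppls)}


if __name__ == "__main__":
    # Illustrative inputs only: every token has probability 0.25.
    uniform_doc = [math.log(0.25)] * 4
    print(perplexity(uniform_doc))
```

A text whose tokens each receive probability 0.25 has perplexity of about 4, which is why the metric is often read as the model's effective number of choices per token.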
While LLMs exhibit distributional convergence with human authors on low-dimensional heuristic features, such as syntactic complexity and readability, they do not yet fully replicate the nuanced affective density and stylistic variance inherent in the human-authored corpus.
By isolating the specific statistical gaps in current generative mimicry, this study provides a comprehensive benchmark for LLM stylistic behavior and offers critical insights for authorship attribution in the digital humanities and social media.
Keywords: authorship detection, linguistic analysis, NLP, psycholinguistic analysis, stylometry, LLMs

Limitations & risks

This section reports the findings of our investigation into the extent to which large language models can reproduce stable, author-specific stylistic patterns beyond surface-level fluency.
We first analyze linguistic and psycholinguistic features, including LIWC dimensions, readability indices, and perplexity, to examine how AI-generated texts compare to human-authored baselines in terms of stylistic and distributional properties.
We then report the performance of predictive models across three complementary settings: (i) BERT classifiers trained on raw contextual embeddings, (ii) XGBoost classifiers trained on interpretable stylistic and psycholinguistic features, and (iii) XGBoost classifiers trained on TF–IDF representations to establish a vocabulary-based baseline.
Together, these analyses evaluate stylistic convergence and divergence across poetic and political domains.
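
To make the vocabulary-based baseline in setting (iii) concrete, here is a minimal sketch of a textbook TF-IDF weighting in plain Python. It uses whitespace tokenization, term frequency normalized by document length, and an unsmoothed `log(N / df)` inverse document frequency. These are all assumptions for illustration, since the summary does not describe the authors' exact preprocessing or weighting variant, and libraries typically apply smoothed variants. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return vectors


if __name__ == "__main__":
    for vec in tfidf(["the sea the sky", "the vote"]):
        print(vec)
```

Terms that occur in every document (here "the") receive weight zero, so the resulting vectors emphasize vocabulary that distinguishes one text from the others. A classifier such as XGBoost would then be trained on these sparse vectors.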

Read-like-fullpaper digest

This paper examines whether large language models (LLMs), which have significantly impacted fields ranging from academic publishing to creative writing (Castillo 2025; Wu et al.), can reproduce author-specific style. The core method generates synthetic corpora with strict thematic alignment and evaluates them through a complementary framework combining transformer-based classification (BERT) and interpretable machine learning (XGBoost). The key empirical finding is that AI-generated mimicry remains highly detectable: XGBoost models trained on a restricted set of eight stylometric features achieve accuracy comparable to high-dimensional neural classifiers.
