Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text

Authors: 

Mazal Bethany, The University of Texas at San Antonio and Secure AI and Autonomy Lab; Brandon Wherry, The University of Texas at San Antonio, Secure AI and Autonomy Lab, and Peraton Labs; Emet Bethany, The University of Texas at San Antonio and Secure AI and Autonomy Lab; Nishant Vishwamitra and Anthony Rios, The University of Texas at San Antonio; Peyman Najafirad, The University of Texas at San Antonio and Secure AI and Autonomy Lab

Abstract: 

With the recent proliferation of Large Language Models (LLMs), there has been an increasing demand for tools to detect machine-generated text. Effective detection of machine-generated text faces two pertinent problems: First, existing detectors are severely limited in generalizing to real-world scenarios, where machine-generated text is produced by a variety of generators and spans diverse domains. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs, each of which exhibits distinctive stylistic and structural elements. In this work, we undertake a systematic study of the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that they cannot reliably distinguish between human and machine-generated text. Based on our findings, we introduce a novel system, T5LLMCipher, for detecting machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering to address the text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that it provides state-of-the-art generalization ability, with an average increase in F1 score on machine-generated text of 11.9% on unseen generators and domains compared to the top-performing supervised learning approaches, and correctly attributes the generator of a text with an accuracy of 93.6%. We make the code for our proposed approach publicly available at https://github.com/SecureAIAutonomyLab/LLM-Cipher
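The sub-clustering idea described above can be sketched as follows. This is our reading of the abstract, not the authors' implementation (see the linked repository for that): embeddings of text from each generator are partitioned into sub-clusters, and a new text is attributed to the generator whose nearest sub-cluster centroid is closest. Synthetic Gaussian blobs stand in for T5 encoder embeddings; the generator names, dimensionality, and cluster counts are illustrative assumptions.

```python
# Hedged sketch of per-generator embedding sub-clustering and nearest-centroid
# attribution. Synthetic embeddings are used in place of T5 encoder outputs.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Simulated 768-dim "encoder embeddings" for three hypothetical sources,
# placed in well-separated regions of the embedding space.
generators = ["gpt", "llama", "human"]
train = {g: rng.normal(loc=i * 5.0, scale=1.0, size=(200, 768))
         for i, g in enumerate(generators)}

# Fit a small KMeans per generator to capture stylistic sub-clusters.
centroids, labels = [], []
for g, X in train.items():
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    centroids.append(km.cluster_centers_)
    labels.extend([g] * 3)
centroids = np.vstack(centroids)  # shape: (9, 768)

def attribute(x):
    """Return the generator whose nearest sub-cluster centroid is closest."""
    dists = np.linalg.norm(centroids - x, axis=1)
    return labels[int(np.argmin(dists))]

# A query drawn near the second generator's region attributes to it.
query = rng.normal(loc=5.0, scale=1.0, size=768)
print(attribute(query))  # → llama
```

In practice the embeddings would come from a pretrained T5 encoder rather than synthetic draws; the sub-clustering step is what lets the detector represent the distinct stylistic artifacts of each generator instead of a single binary boundary.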


BibTeX
@inproceedings{299561,
author = {Mazal Bethany and Brandon Wherry and Emet Bethany and Nishant Vishwamitra and Anthony Rios and Peyman Najafirad},
title = {Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. {Machine-Generated} Text},
booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
year = {2024},
isbn = {978-1-939133-44-1},
address = {Philadelphia, PA},
pages = {5805--5822},
url = {https://www.usenix.org/conference/usenixsecurity24/presentation/bethany},
publisher = {USENIX Association},
month = aug
}