Hidden Prompts Expose AI-Generated Peer Reviews: Turning Vulnerabilities into Verification Tools
Share this article
The explosive growth of Large Language Models (LLMs) like ChatGPT in academic publishing has introduced a dangerous paradox: while these tools accelerate research dissemination, they also enable sophisticated fraud that threatens scientific credibility. A groundbreaking preprint paper titled "ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected" exposes how attackers can exploit LLM-powered peer review systems—and proposes a clever defensive strategy that repurposes the same vulnerability for detection.
The Vulnerability: Jailbreaking Peer Review with Stealth Prompts
Researchers Kanchon Gharami, Sanjiv Kumar Sarkar, Yongxin Liu, and Shafika Showkat Moni detail how authors can embed hidden prompts within PDF submissions. These prompts "jailbreak" LLM reviewers, coercing them into generating artificially positive assessments that overlook flaws or fabrications. The consequences are severe: biased acceptances could propagate erroneous research into critical domains like medicine or infrastructure, wasting resources and endangering lives.
"Papers written or reviewed by LLMs may lack real novelty, contain fabricated or biased results, or mislead downstream research that others depend on," the authors warn, highlighting how this undermines the entire scientific ecosystem.
The Defense: Inject-and-Detect as Cryptographic Countermeasure
The team proposes flipping the script with an "inject-and-detect" framework. Editors embed invisible trigger prompts into submitted manuscripts. If a reviewer’s response echoes or reacts to these triggers, it signals LLM involvement—unlike human reviewers who wouldn’t process such hidden cues. This method transforms prompt injection from an attack vector into a verification tool, requiring no specialized AI-detection software.
Crucially, the approach includes ethical safeguards:
- Triggers must be non-disruptive to human readers
- Detection must preserve reviewer anonymity
- Positive detections trigger human re-evaluation, not automatic rejection
Why This Matters Beyond Academia
This research illuminates broader cybersecurity principles:
1. Supply Chain Risks: Compromised peer review creates poisoned datasets for future AI training.
2. Adversarial Adaptation: Defenses must evolve as rapidly as attacks in the LLM arms race.
3. Trust Architecture: Scientific publishing relies on fragile authentication—human expertise—now easily spoofed.
As LLMs become embedded in technical workflows, from code review to compliance audits, this study underscores an urgent truth: security in AI-assisted systems demands proactive, context-aware countermeasures rather than reactive patching. The inject-and-detect paradigm offers a template for turning adversarial weaknesses into defensive strengths across multiple domains.
Source: Gharami, K., Sarkar, S. K., Liu, Y., & Moni, S. S. (2025). ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected. arXiv preprint arXiv:2512.20405.