IndicJR: Judge-Free Indic Jailbreak Benchmark
💡 New benchmark reveals jailbreak vulnerabilities in Indic-language LLMs that English-only tests miss, a gap critical for multilingual safety
⚡ 30-Second TL;DR
What Changed
Covers 12 Indic languages (2.1B speakers) with 45,216 prompts across two tracks: contract-bound JSON and naturalistic Free
Why It Matters
Exposes multilingual LLM vulnerabilities hidden by English-centric evaluations, critical for safety in code-switching regions like South Asia. Urges reevaluation of alignment strategies for global deployment.
What To Do Next
Download IndicJR prompts from arXiv:2602.16832 and test your LLM's jailbreak robustness in Indic languages.
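Testing jailbreak robustness with IndicJR boils down to running your model over the prompts and scoring responses with an automatic, judge-free detector. A minimal sketch of that loop, assuming a simple refusal-marker detector; the paper's actual detector rules and dataset schema are not reproduced here, and all names below are illustrative:

```python
# Hypothetical sketch of a judge-free jailbreak evaluation loop.
# IndicJR's real file layout, field names, and detector are assumptions
# here -- consult the released dataset for the actual schema.

# Surface markers that signal a refusal (illustrative, English-only;
# a real multilingual detector would need per-language patterns).
REFUSAL_MARKERS = [
    "i can't help", "i cannot help", "i'm sorry", "cannot assist",
]

def is_refusal(response: str) -> bool:
    """Judge-free check: flag responses containing refusal phrases."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def jailbreak_success_rate(responses):
    """JSR = fraction of responses that are NOT refusals."""
    if not responses:
        return 0.0
    return sum(not is_refusal(r) for r in responses) / len(responses)

# Toy responses standing in for real model outputs.
outputs = [
    "Sure, here are the steps...",
    "I'm sorry, I cannot assist with that request.",
    "Here is the information you asked for.",
    "I can't help with that.",
]
print(jailbreak_success_rate(outputs))  # 0.5
```

In practice `outputs` would come from querying your model with each benchmark prompt; the detector is the only scoring component, which is what makes the evaluation reproducible without an LLM judge.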
🔑 Enhanced Key Takeaways
- IndicJR is the first regional jailbreak benchmark to combine multilingual adversarial coverage across 12 Indic and South Asian languages representing 2.1 billion speakers, addressing a critical gap in LLM safety evaluation beyond English-only assessments[1][2]
- Contract-bound JSON formats inflate refusal rates but fail to prevent jailbreaks: LLaMA and Sarvam models exceed 0.92 JSR in the JSON track, and every evaluated model reaches 1.0 jailbreak success in the Free naturalistic track[2]
- English-to-Indic adversarial attacks transfer strongly across languages, with format wrappers (such as JSON/Free structural modifications) consistently outperforming instruction-based wrappers as attack vectors[2]
- Orthographic variation significantly impacts robustness: romanized or mixed-script inputs reduce JSR, with correlation magnitudes of 0.28–0.32 between JSR and romanization share and tokenization patterns, indicating systematic vulnerabilities in how models process non-native scripts[1][2]
- Evaluation is fully automatic and judge-free across all 45,216 prompts, with human audits confirming detector reliability; the result is a reproducible multilingual stress test that surfaces safety risks hidden by English-centric evaluations, especially for South Asian users who frequently code-switch[2]
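The two tracks package the same underlying prompt differently, and the takeaways above show the packaging itself acts as an attack vector. A hedged sketch of what a contract-bound JSON wrapper versus a naturalistic Free wrapper might look like; these templates are hypothetical stand-ins, not IndicJR's released wrappers:

```python
# Illustrative track wrappers. The actual IndicJR templates are not
# reproduced in this digest; these only show the structural contrast.
import json

def json_track_wrapper(prompt: str) -> str:
    """Embed the prompt in a JSON 'contract' the model must fill in."""
    contract = {
        "task": prompt,
        "response_schema": {"answer": "string"},
        "policy": "respond strictly in JSON",
    }
    # ensure_ascii=False keeps Indic scripts readable in the payload
    return json.dumps(contract, ensure_ascii=False)

def free_track_wrapper(prompt: str) -> str:
    """Naturalistic phrasing, as a user might actually type it."""
    return f"Hey, quick question: {prompt}"

p = "explain how the evaluation works"
print(json_track_wrapper(p))
print(free_track_wrapper(p))
```

The contrast matters because, per the results above, the structural JSON framing raises refusal rates on the surface while the Free framing lets every evaluated model be jailbroken.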
🛠️ Technical Deep Dive
- Dataset Construction: 45,216 prompts across 12 Indic languages (2.1 billion speakers) with a dual annotation protocol; 600 samples audited (50 per language) and exported to CSV for quality assurance[1]
- Evaluation Tracks: Two parallel evaluation modes—JSON (contract-bound) and Free (naturalistic interaction styles)—revealing how structural constraints affect safety mechanisms[2]
- Pressure Balance Metrics: Same-mode wrapper coverage ranges 0.875–1.000 with cross-mode coverage ≥0.705, demonstrating adversarial pressure without template cloning[1]
- Orthography Coverage: Romanization averages 0.40–0.55 across languages (Urdu highest at 0.552); Gujarati exhibits lowest mean token length (123 tokens), reflecting compact orthography effects on tokenization[1]
- Length Stabilization: Mean token counts controlled at 123–146 with p95≤317, ensuring consistent evaluation across linguistic variations[1]
- Models Evaluated: Testing across 12 models including LLaMA, Sarvam 1 Base (0.980 JSR), and Qwen 1.5 7B (0.968 JSR)[1]
- Sociolinguistic Analysis: Romanized/mixed inputs reduce absolute JSR by -0.338/-0.267; byte/character tokenization correlations (ρ≈-0.29 to -0.32) highlight systematic tokenization pressures affecting safety[1]
- Deployment Finding: Hosted APIs demonstrate higher safety than local deployments; Indic language specialization alone does not ensure robustness[1]
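The tokenization correlations above (ρ ≈ -0.29 to -0.32) are Spearman-style rank correlations between quantities like romanization share and JSR. A self-contained sketch of that computation in pure Python, using illustrative data points rather than the paper's:

```python
# Spearman's rho between romanization share and JSR per language.
# The data points below are made up to illustrate the method only.

def rank(values):
    """Average ranks (1-based), handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Illustrative trend: higher romanization share, lower JSR.
romanization_share = [0.10, 0.25, 0.40, 0.55, 0.70]
jsr = [0.98, 0.95, 0.90, 0.80, 0.72]
print(round(spearman_rho(romanization_share, jsr), 2))  # -1.0
```

A negative rho of the magnitude reported in the paper would indicate a weak-to-moderate monotone relationship, not the perfect one the toy data produces.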
🔮 Future Implications
AI analysis grounded in cited sources
IndicJR establishes a critical benchmark for evaluating LLM safety in underrepresented linguistic regions, with implications for:
- Regulatory compliance in South Asian markets where multilingual LLM deployment is expanding
- Model development priorities: vendors must address orthographic and code-switching vulnerabilities rather than relying on English-centric safety mechanisms
- Enterprise risk assessment for organizations serving 2.1 billion Indic speakers, since contract-based safety measures can provide false confidence
- A research shift toward multilingual adversarial robustness as a core safety requirement rather than a localization afterthought
- Tokenization architecture redesign to handle script variations without degrading safety properties
📎 Sources (2)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
Original source: ArXiv AI
