AST-PAC Enhances Code MIA with AST Guidance
#membership-inference #ast-perturbations #code-provenance

Read original on ArXiv AI

Syntax-aware MIA tool audits code LLMs' data usage, key for compliance

30-Second TL;DR

What changed

Evaluates Loss and PAC MIAs on 3B-7B code models

Why it matters

AST-PAC advances auditing of unauthorized code usage in LLMs, supporting data governance and copyright compliance. It highlights gaps in current MIAs, pushing for domain-specific tools in code provenance.

What to do next

Implement AST-PAC perturbations in your MIA pipeline to test code LLM training data provenance.

Who should care: Researchers & Academics

Researchers introduce AST-PAC, a syntax-aware adaptation of PAC for membership inference attacks on code LLMs. It uses AST-based perturbations to create valid calibration samples, outperforming baselines on larger files but facing limits on small or alphanumeric-rich code. The work calls for syntax-adaptive methods to audit code model training data.

Key Points

  • 1. Evaluates Loss and PAC MIAs on 3B-7B code models
  • 2. PAC fails on complex code due to invalid syntax augmentations
  • 3. AST-PAC generates syntactically valid samples via AST perturbations
  • 4. AST-PAC improves on large syntactic files but underperforms on small/alphanumeric code
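
The calibration idea behind these points can be sketched in a few lines of Python. This is a simplified illustration, not the exact PAC or AST-PAC statistic, which this summary does not spell out: it assumes a hypothetical `loss_fn` that returns the model's loss on a code string, and scores a target file by how much lower its loss is than the mean loss of its perturbed neighbors.

```python
from statistics import mean
from typing import Callable, Sequence


def calibrated_score(
    target: str,
    neighbors: Sequence[str],
    loss_fn: Callable[[str], float],
) -> float:
    """Mean neighbor loss minus target loss.

    Assumption for illustration: a larger gap suggests the model has
    seen `target` during training, because its loss is unusually low
    compared with nearby, unseen variants.
    """
    if not neighbors:
        raise ValueError("need at least one calibration sample")
    return mean(loss_fn(n) for n in neighbors) - loss_fn(target)


def predict_member(score: float, threshold: float) -> bool:
    """Flag a sample as a likely training member above a chosen threshold."""
    return score > threshold
```

In practice `loss_fn` would wrap a forward pass of the audited model, and the threshold would be tuned on files with known membership. If the neighbors are not valid code, their losses may be inflated for reasons unrelated to membership, which is the failure mode point 2 describes.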

Technical Details

AST-PAC adapts PAC by perturbing Abstract Syntax Trees (ASTs) instead of raw text, so every calibration sample remains syntactically valid. Tested on 3B-7B code models, it scales better with syntactic complexity than syntax-agnostic PAC. Its limitations are under-mutation of small files, which offer few nodes to perturb, and weaker performance on alphanumeric-heavy code.
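
As a rough illustration of an AST-level perturbation, the sketch below uses Python's standard `ast` module (`ast.unparse` needs Python 3.9+). The perturbation operator is an assumption made for this example: the summary does not say which tree edits AST-PAC applies, so swapping adjacent statements stands in for them. The function name `ast_perturb` and its parameters are hypothetical.

```python
import ast
import random


def ast_perturb(source: str, n_swaps: int = 2, seed: int = 0) -> str:
    """Return a syntactically valid variant of `source`.

    Hypothetical operator: swap randomly chosen pairs of adjacent
    statements inside statement lists (module, function, loop bodies).
    The result always parses, though its behavior may differ.
    """
    rng = random.Random(seed)
    tree = ast.parse(source)
    # Collect every statement list that holds at least two statements.
    bodies = [
        node.body
        for node in ast.walk(tree)
        if isinstance(getattr(node, "body", None), list) and len(node.body) >= 2
    ]
    if not bodies:
        # Nothing to swap: tiny files come back unchanged (under-mutation).
        return ast.unparse(tree)
    for _ in range(n_swaps):
        body = rng.choice(bodies)
        i = rng.randrange(len(body) - 1)
        body[i], body[i + 1] = body[i + 1], body[i]
    return ast.unparse(tree)
```

Because the edit happens on the tree and the text is regenerated from it, the output cannot contain the broken syntax that token-level augmentation can produce. The early return also shows why very small files are hard to calibrate: with a single statement there is no alternative arrangement to generate.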

AI-curated news aggregator. All content rights belong to original publishers.
Original source: ArXiv AI