Text2GQL-Bench: New Graph Query Benchmark

📄Read original on ArXiv AI

⚡ 30-Second TL;DR

What changed

A new benchmark of 178k question-query pairs spanning 13 domains and multiple graph query languages (GQLs)

Why it matters

It addresses gaps in domain coverage and evaluation for Text-to-GQL systems, enabling systematic comparison of models. It also shows how difficult query-language dialects remain for current models, which is relevant to LLM-based agents for graph database management systems (GDBMS), and it could make graph data analysis more accessible through natural language.

What to do next

Assess whether this benchmark is relevant to your current work, for example if you build or evaluate natural-language interfaces to graph databases.

Who should care: Researchers & Academics

Text2GQL-Bench introduces a unified benchmark for Text-to-Graph-Query-Language systems with 178,184 question-query pairs across 13 domains and multiple GQLs. It features a scalable dataset generation framework and a multi-metric evaluation covering grammatical validity, similarity, semantic alignment, and execution accuracy. Evaluations show that LLMs struggle with ISO-GQL: they reach only 4% execution accuracy zero-shot, rising to 50% with 3-shot prompting and to 45.1% after fine-tuning an 8B model.
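Execution accuracy (EX) is the headline metric in these results: a generated query counts as correct when running it returns the same result as the reference query. The summary does not state the benchmark's exact matching rule, so the sketch below rests on assumptions: each result is a list of hashable row tuples, results are compared as unordered multisets, and a query that fails to run is recorded as `None`. The names `results_match` and `execution_accuracy` are hypothetical, not the benchmark's API.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional

# Assumed representation: a result set is a sequence of row tuples.
Row = tuple[Hashable, ...]
Result = Sequence[Row]


def results_match(predicted: Result, gold: Result) -> bool:
    """Compare two result sets as unordered multisets of rows (an assumption)."""
    return Counter(predicted) == Counter(gold)


def execution_accuracy(
    predicted_results: Sequence[Optional[Result]],
    gold_results: Sequence[Result],
) -> float:
    """Fraction of examples whose predicted query returns the gold result.

    A prediction of None stands for a query that failed to parse or execute
    and is counted as incorrect.
    """
    if len(predicted_results) != len(gold_results):
        raise ValueError("need one prediction per gold result")
    if not gold_results:
        return 0.0
    correct = sum(
        1
        for pred, gold in zip(predicted_results, gold_results)
        if pred is not None and results_match(pred, gold)
    )
    return correct / len(gold_results)


# Usage: row order is ignored, and a failed query counts as wrong.
gold = [[("Alice",), ("Bob",)], [(3,)], [("x", 1)]]
pred = [[("Bob",), ("Alice",)], None, [("x", 2)]]
print(execution_accuracy(pred, gold))  # 0.3333333333333333
```

Treating results as multisets ignores row order but still distinguishes duplicate rows; a benchmark that requires ordered output (for example, queries with ORDER BY) would need a stricter comparison.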

Key Points

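1. 178k question-query pairs spanning 13 domains and multiple GQLs
2. Scalable generation framework for building diverse datasets
3. Multi-metric evaluation that goes beyond end-to-end execution: grammatical validity, similarity, semantic alignment, and execution accuracy
4. LLM gaps on ISO-GQL: 4% zero-shot execution accuracy (EX), 45.1% after fine-tuning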

Impact Analysis

Technical Details

The dataset covers multiple GQLs with heterogeneous resources and abstraction levels. Reported metrics include grammatical validity (up to 90.8% after fine-tuning) and execution accuracy (EX; 45.1% for a fine-tuned 8B model). Prompting raises EX to 50%, but grammatical validity stays below 70%.
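The jump from 4% zero-shot to 50% EX comes from 3-shot prompting, that is, placing three solved question-query examples in the prompt before the target question. The summary does not give the benchmark's prompt template, so this is a hypothetical sketch: `Example`, `build_k_shot_prompt`, the instruction wording, the schema string, and the demonstration queries are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """A hypothetical demonstration pair: natural-language question and GQL query."""
    question: str
    query: str


def build_k_shot_prompt(question: str, schema: str, demos: list[Example], k: int = 3) -> str:
    """Assemble an instruction, the graph schema, k demonstrations, and the target question."""
    if k < 0:
        raise ValueError("k must be non-negative")
    parts = [
        "Translate each question into an ISO-GQL query over the schema below.",
        f"Schema:\n{schema}",
    ]
    # Naive selection: take the first k demonstrations in list order.
    for demo in demos[:k]:
        parts.append(f"Question: {demo.question}\nQuery: {demo.query}")
    # The model is expected to continue the text after the final "Query:".
    parts.append(f"Question: {question}\nQuery:")
    return "\n\n".join(parts)


# Illustrative schema and demonstrations (not from the benchmark).
SCHEMA = "(:Person {name, age})-[:KNOWS]->(:Person)"
DEMOS = [
    Example("Whom does Alice know?",
            "MATCH (p:Person)-[:KNOWS]->(f:Person) WHERE p.name = 'Alice' RETURN f.name"),
    Example("How old is Bob?",
            "MATCH (p:Person) WHERE p.name = 'Bob' RETURN p.age"),
    Example("Who is older than 40?",
            "MATCH (p:Person) WHERE p.age > 40 RETURN p.name"),
    Example("List everyone.",
            "MATCH (p:Person) RETURN p.name"),
]

prompt = build_k_shot_prompt("How many people does Carol know?", SCHEMA, DEMOS)
print(prompt)
```

Taking the first k demonstrations keeps the sketch simple; a real evaluation setup might instead retrieve the demonstrations most similar to the test question.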
