
Write a PREreview

LEGRA: A Pipeline for Building Graph-Based Representations of Polish Court Rulings for Legal Retrieval-Augmented Generation

Published
Server: Preprints.org
DOI: 10.20944/preprints202511.1742.v1

Efficient access to similar legal cases is a crucial requirement for lawyers, judges, and researchers. Traditional text-based search systems often fail to capture both the semantic similarity and the relational context of legal documents [article]. To address this challenge, we present LEGRA, a novel graph-based dataset of Polish court rulings designed for Retrieval-Augmented Generation (RAG) and legal research support [https://doi.org/10.48550/arxiv.2005.11401]. LEGRA is automatically constructed through an end-to-end pipeline: rulings are collected from public sources, converted and cleaned, chunked into passages, and enriched with TF-IDF vectors and embedding representations. The data is stored in a Neo4j graph database where documents, chunks, embeddings, judges, courts, and cited laws are modeled as nodes connected through explicit relations. This structure enables hybrid retrieval that combines semantic similarity with structural queries, allowing legal professionals to quickly identify not only textually related cases but also those linked through judges, locations, or legal references. We discuss the construction pipeline, the graph schema, and potential applications for legal practitioners. LEGRA demonstrates how graph-based datasets can open new directions for AI-powered legal research.
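The pipeline and hybrid retrieval described in the abstract can be illustrated with a minimal, in-memory sketch. It is not LEGRA's implementation: the abstract names Neo4j and embedding models, whereas this example uses plain Python dictionaries and TF-IDF only. The toy rulings, the relation names (HAS_CHUNK, DECIDED_BY, ISSUED_BY, CITES), the sentence-level chunking, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy rulings; field names are assumptions, not LEGRA's schema.
RULINGS = [
    {"id": "R1", "judge": "Judge A", "court": "Court X", "laws": ["Art. 415 KC"],
     "text": "The defendant caused damage and must pay compensation. "
             "The claim for compensation is justified."},
    {"id": "R2", "judge": "Judge A", "court": "Court Y", "laws": ["Art. 148 KK"],
     "text": "The court considered the criminal charge. "
             "The sentence was reduced on appeal."},
    {"id": "R3", "judge": "Judge B", "court": "Court X", "laws": ["Art. 415 KC"],
     "text": "Compensation for damage was denied. "
             "The plaintiff did not prove the damage."},
]

def chunk(text):
    """Naive sentence-level chunking (a stand-in for a real passage splitter)."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def build_graph(rulings):
    """Model documents, chunks, judges, courts, and laws as nodes with typed edges."""
    nodes, edges = {}, []
    for r in rulings:
        nodes[r["id"]] = {"label": "Document"}
        related = (("Judge", "DECIDED_BY", [r["judge"]]),
                   ("Court", "ISSUED_BY", [r["court"]]),
                   ("Law", "CITES", r["laws"]))
        for label, rel, values in related:
            for value in values:
                nodes[value] = {"label": label}
                edges.append((r["id"], rel, value))
        for i, passage in enumerate(chunk(r["text"])):
            cid = f'{r["id"]}#{i}'
            nodes[cid] = {"label": "Chunk", "text": passage}
            edges.append((r["id"], "HAS_CHUNK", cid))
    return nodes, edges

def tfidf_vectors(nodes):
    """Sparse TF-IDF vector per chunk, using a smoothed IDF."""
    chunks = {k: tokenize(n["text"]) for k, n in nodes.items() if n["label"] == "Chunk"}
    df = Counter(t for toks in chunks.values() for t in set(toks))
    idf = {t: math.log((1 + len(chunks)) / (1 + c)) + 1 for t, c in df.items()}
    vecs = {k: {t: tf * idf[t] for t, tf in Counter(toks).items()}
            for k, toks in chunks.items()}
    return vecs, idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, nodes, edges, vecs, idf, law=None):
    """Rank chunks by TF-IDF cosine similarity, optionally restricted to
    documents that cite a given law (the structural part of the query)."""
    allowed = None
    if law is not None:
        allowed = {src for src, rel, dst in edges if rel == "CITES" and dst == law}
    parent = {dst: src for src, rel, dst in edges if rel == "HAS_CHUNK"}
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q, v), cid) for cid, v in vecs.items()
              if allowed is None or parent[cid] in allowed]
    return sorted(scored, reverse=True)

nodes, edges = build_graph(RULINGS)
vecs, idf = tfidf_vectors(nodes)
for score, cid in hybrid_search("compensation for damage", nodes, edges, vecs, idf,
                                law="Art. 415 KC"):
    print(f"{score:.3f}  {cid}  {nodes[cid]['text']}")
```

In a graph database the structural filter would typically be a pattern match over the same kind of relations, and the similarity score would come from stored embeddings rather than TF-IDF alone; the sketch only shows how the two signals combine in one query.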

You can write a PREreview of LEGRA: A Pipeline for Building Graph-Based Representations of Polish Court Rulings for Legal Retrieval-Augmented Generation. A PREreview is a review of a preprint and can range from a few sentences to an extensive report, similar to a peer-review report organized by a journal.
