Write a PREreview

CCLUPE: Benchmark for Credit Context Log Understanding and Prediction Evaluation

Published
Server
Preprints.org
DOI
10.20944/preprints202604.1432.v1

While Large Language Models (LLMs) have shown great promise in transforming credit risk assessment, existing evaluation frameworks are almost exclusively concerned with general financial NLP tasks and neglect the specific reasoning that practitioners need. To address this, we develop the Credit Context Log Understanding and Prediction Evaluation (CCLUPE) benchmark. Unlike previous benchmarks, CCLUPE aims to capture and evaluate the intricate reasoning unique to each constituent of the Chinese credit market, where evaluations rely heavily on integrating and synthesizing complex transaction logs and on predicting hidden financial behaviors. CCLUPE contains more than 4,000 high-quality samples, divided between individual and micro-enterprise customers and distributed among 7 principal log types and 16 log subtypes. A comprehensive assessment process involving more than 20 professional annotators is used to guarantee the quality of the dataset. Moreover, we introduce Log-Score, a novel evaluation metric designed to incorporate log misunderstanding penalties and assess multifaceted competencies. Even state-of-the-art models underperform on these high-stakes tasks. CCLUPE serves as a rigorous testbed for the next generation of financial LLMs, ensuring their robustness for deployment in complex real-world credit scenarios.

You can write a PREreview of CCLUPE: Benchmark for Credit Context Log Understanding and Prediction Evaluation. A PREreview is a review of a preprint and can range from a few sentences to a lengthy report, similar to a peer-review report produced for a journal.
