PREreviews of Towards a Science of AI Agent Reliability

1 PREreview

  1. PREreview by Zirui Wei

    Summary

    This paper argues that current agent evaluation practice, which reports mean task success rates, fundamentally fails to capture whether agents are reliable enough for real-world deployment. The authors propose a four-dimensional reliability framework decomposed into twelve concrete metrics:…

    Read the PREreview by Zirui Wei