🦨 Alpha's Tech Garden

LLM Benchmarks

Tags: to_complete

Jun 07, 2026 · 1 min read

More information: https://blog.nilenso.com/blog/2025/09/25/swe-benchmarks/ to_complete

Agentic / voice benchmarks

  • EVA-Bench — end-to-end evaluation of voice agents across enterprise domains (airline, ITSM, healthcare HRSD)
