🦨 Alpha's Tech Garden

LLM Benchmarks

Tags: to_complete

Jul 21, 2026 · 1 min read

More information: https://blog.nilenso.com/blog/2025/09/25/swe-benchmarks/ to_complete

Agentic / voice benchmarks

EVA-Bench: end-to-end evaluation of voice agents across enterprise domains (airline, ITSM, healthcare HRSD)
