Following the thread of LLM evals
A consistent thread since I left Google has been people asking for help creating value with LLMs in their products. When I pulled on this thread, it seemed that most of my inbound came from people blocked on creating good evals. Or so I believed, until I dug further. To be clear, we do need more education and better tooling around the nuts and bolts of eval construction. But the deeper root cause seems to be that folks often haven't integrated their product's overall strategy and organizational dynamics with the process of creating meaningful evals.
https://github.com/varungodbole/llm-evals is my attempt to articulate such a strategy for creating meaningful evals. It's a v0, and I'd really appreciate any feedback if you get a chance to read it!
Please drop a comment, email me, or otherwise let me know!