Hacker News

zwaps | yesterday at 9:38 PM | 2 replies

For instance, I have intent classifiers running on my traces, and most tools offer some sort of analysis agents or API, so it's Claude SDK and go.

Maybe let's take Langsmith. Now I know my gripes with that product. How do you see it? What do you add, specifically?


Replies

zwaps | yesterday at 9:43 PM

Maybe as a comment, you really put weight on intent classification. I am not sure why. For it to work, you are gonna need my expert domain input. And given that, I feel like the classification bit is basically solved. I wonder a bit why this is the feature you seem to put front and center (e.g. in the screenshots).

ttpost | yesterday at 9:54 PM

tl;dr: Langsmith + homegrown intents doesn't scale with contributors and agent usage as an Analytics solution. Voker adds trend and usage insights on collaborative dashboards that work for the whole AI product team.

Nice, sounds like you've set up your own solution in house. We definitely see some teams do that. For some it works perfectly; for others, it's too expensive to maintain: they get new requests for new dashboards or different subcuts of data from product or design teams, or they run into an issue like way too many intents being generated to be useful, and it's not worth the tradeoff of investing time in building internal tooling. But for some it makes sense to roll your own! It also really depends on how many people on the team are involved in building the agent products, and how much volume your agents have. If you have millions of conversations a month with thousands of unique intents, you have to set up data eng pipelines just to process, categorize, and store all that data in a way that's usable for the whole team.

When it comes to Langsmith, we hear about them a lot from our customers. Pretty much all of them love it as an obs tool, but most say that only the engineers have access or spend time in it, and they've told us the strength of Langsmith is its technical tracing, not its visualizations, UI, or usability. They've told us any "insights" are very canned (because that's not Langsmith's key focus).

We add self-serve analytics, like how Google Analytics lets marketers see how their website is performing without needing to ask engineers to write SQL queries on CloudWatch logs.

Ex: A PM can self-serve and look at trends in what users are asking of agents, notice a problem, do a quick RCA, and look for reproducibility across other sessions before deciding to assign it as an issue to an engineer. The old way would be: the PM hears a complaint from a customer, asks the engineer to "look into it", and the eng spends 4 hours combing through Langsmith logs to hunt down one session without even knowing if it's actually a widespread issue.
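
To make the "look for reproducibility across other sessions" step concrete, here is a rough Python sketch. It is only an illustration, not how Voker or Langsmith work internally. The record shape (`session_id`, `intent`, `failed`) and the helper `failure_rate_by_intent` are made up for this example, and it assumes each session has already been labelled with an intent.

```python
from collections import Counter

# Hypothetical records: one per agent session, already labelled with an intent.
# The field names are assumptions for this sketch, not a real product schema.
sessions = [
    {"session_id": "s1", "intent": "refund_request", "failed": True},
    {"session_id": "s2", "intent": "refund_request", "failed": True},
    {"session_id": "s3", "intent": "refund_request", "failed": False},
    {"session_id": "s4", "intent": "order_status", "failed": False},
    {"session_id": "s5", "intent": "order_status", "failed": False},
]


def failure_rate_by_intent(records):
    """Return {intent: (failed_sessions, total_sessions, failure_rate)}."""
    totals = Counter(r["intent"] for r in records)
    failures = Counter(r["intent"] for r in records if r["failed"])
    return {
        intent: (failures[intent], total, failures[intent] / total)
        for intent, total in totals.items()
    }


rates = failure_rate_by_intent(sessions)
for intent, (failed, total, rate) in sorted(rates.items()):
    print(f"{intent}: {failed}/{total} sessions failed ({rate:.0%})")
```

With the toy data above, the refund intent fails in 2 of 3 sessions while order status never fails, which is the kind of signal that separates a one-off complaint from a widespread issue before anyone starts combing through individual traces.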