Hacker News

rozap yesterday at 10:21 PM

There are a number of these out there, and this one has a super easy setup and appears to Just Work, so nice job on that. I had it going and producing plausible results within a minute or so.

One thing I'm wondering is if there's anyone doing this at scale? The issue I see is that with complex workflows which take several dozen steps and have complex control flow, the probability of reaching the end falls off pretty hard, because if each step has a .95 chance of completing successfully, after not very many steps you have a pretty small overall probability of success. These use cases are high value because writing a traditional scraper is a huge pain, but we just don't seem to be there yet.
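To put rough numbers on that: a quick sketch, assuming every step succeeds independently with the same probability (real workflows won't be that clean, and `chain_success` is just an illustrative name). At 0.95 per step you are at roughly 60% after 10 steps and roughly 36% after 20.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps."""
    return p_step ** n_steps


if __name__ == "__main__":
    # Print the end-to-end success rate for a few workflow lengths.
    for n in (10, 20, 36, 50):
        print(f"{n:>3} steps: {chain_success(0.95, n):.1%}")
```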

The other side of the coin is simple workflows, but those tend to be the workflows where writing a scraper is pretty trivial. It did work when I told it to search for a product at a local store, but that one run cost $1.05. So doing it at any scale quickly becomes a little bit silly.

So I guess my question is: who is having luck using these tools, and what are you using them for?

One route I had some success with is writing a DSL for scraping, having the LLM generate code in that DSL, then interpreting it and having the LLM edit it when it gets stuck. But then there's the "getting stuck detection" part, which is hard, etc.
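Roughly what I mean, as a toy sketch rather than my actual code: the DSL here is a made-up one-command-per-line format, and `parse`, `interpret` and the handler names are all hypothetical. The interpreter reports the first line that fails so that line can be handed back to the LLM for editing. It only catches hard failures (unknown op, handler raising); detecting a run that is "stuck" without erroring is the part this does not solve.

```python
import shlex


def parse(program: str):
    """Turn DSL text into (line_number, op, args) tuples, skipping blank lines and # comments."""
    steps = []
    for lineno, line in enumerate(program.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        op, *args = shlex.split(line)
        steps.append((lineno, op, args))
    return steps


def interpret(program: str, handlers: dict):
    """Run each step in order.

    Returns None if every step ran, or (line_number, message) for the first failure.
    """
    for lineno, op, args in parse(program):
        handler = handlers.get(op)
        if handler is None:
            return lineno, f"unknown op {op!r}"
        try:
            handler(*args)
        except Exception as exc:  # any handler error counts as a failed step
            return lineno, str(exc)
    return None
```

In use, `handlers` would map op names like `goto` or `click` to functions that drive a browser; a non-None result tells you which line to send back for repair.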


Replies

anerli yesterday at 10:35 PM

Glad you were able to get it set up quickly!

We are currently optimizing for reliability and quality, which is why we suggest Claude - but it can get expensive in some cases. Using Qwen 2.5-VL-72B will be significantly cheaper, though it may not always be reliable.

Most of our usage right now is for running test cases, and people often seem to prefer Qwen for that use case, since with test cases it is typically clearer how to execute each step.

Something that is top of mind for us is figuring out a good way to "cache" workflows that get taken. This way you can repeat automations either with no LLM or with a smaller/cheaper LLM. This would enable deterministic, repeatable flows that are also very affordable and fast. So even if each step on the first run is only 95% reliable - if it gets through it, it could repeat it with 100% reliability.
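A minimal sketch of that caching idea, not the tool's actual implementation: `run_with_cache`, `plan_with_llm`, `execute` and the JSON cache layout are all hypothetical names made up for illustration. The first successful run records its actions; later runs replay them without calling the LLM, and fall back to planning again if a replayed action fails (for example because the page changed).

```python
import json
from pathlib import Path
from typing import Callable


def run_with_cache(
    task: str,
    cache_path: str,
    plan_with_llm: Callable[[str], list],
    execute: Callable[[dict], None],
) -> str:
    """Replay recorded actions for `task` if present; otherwise plan with the LLM and record them."""
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    if task in cache:
        try:
            for action in cache[task]:
                execute(action)  # no LLM call on this path
            return "replayed"
        except Exception:
            pass  # replay failed; fall through and re-plan from scratch

    actions = plan_with_llm(task)  # expensive path, used on the first run
    for action in actions:
        execute(action)

    # Only record the workflow once every action has succeeded.
    cache[task] = actions
    cache_file.write_text(json.dumps(cache, indent=2))
    return "planned"
```

The replay is only as deterministic as the site it runs against, which is why the sketch keeps the LLM path as a fallback instead of removing it.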
