Hacker News

raw_anon_1111 yesterday at 2:21 PM

Also with Redshift: split the file up before ingestion into a number of pieces equal to the number of nodes, or combine lots of small files into larger ones before putting them into S3, and/or use an Athena CTAS statement to combine many small files into one big file.
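The splitting step above is easy to get wrong by cutting mid-record. A minimal sketch (my own helper, not anything AWS ships) that splits newline-delimited data into roughly equal byte-sized parts while always cutting on line boundaries, so each part can be uploaded to S3 and loaded independently by a Redshift COPY:

```python
def split_lines(data: bytes, n_parts: int) -> list[bytes]:
    """Split newline-delimited data into n_parts chunks of roughly
    equal byte size, always cutting on a line boundary so no record
    is torn across two files."""
    lines = data.splitlines(keepends=True)
    target = max(1, len(data) // n_parts)  # ideal bytes per part
    parts, buf, size = [], [], 0
    for line in lines:
        buf.append(line)
        size += len(line)
        # Close out a part once it reaches the target size,
        # but leave room so we never exceed n_parts total.
        if size >= target and len(parts) < n_parts - 1:
            parts.append(b"".join(buf))
            buf, size = [], 0
    if buf:
        parts.append(b"".join(buf))
    return parts
```

Concatenating the parts always reproduces the original file, so the split is lossless by construction.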

So in my other case, the whole pipeline was:

Web crawler (internal customer website) using Playwright -> S3 -> SNS -> SQS -> Lambda (embed with Bedrock) -> S3 Vector Store.

Similar to what you said, I ran into Bedrock embedding service limits. Once I told it that, it knew how to adjust the Lambda concurrency limits. Of course I also had to tell it to adjust the SQS polling settings so messages wouldn't back up in flight and land in the DLQ without ever being processed.
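The concurrency and polling tuning above boils down to some arithmetic AWS documents for SQS-triggered Lambdas: the queue's visibility timeout should be at least six times the function timeout, and maxReceiveCount bounds how many receives happen before a message goes to the DLQ. A small sketch of that rule of thumb (my own helper names, not an AWS API):

```python
def sqs_settings(lambda_timeout_s: int, max_receive_count: int = 5) -> dict:
    """Rule-of-thumb SQS settings for a Lambda event-source mapping.

    AWS recommends a queue visibility timeout of at least 6x the
    function timeout, so a slow or retried invocation doesn't make
    the same message visible again while it's still being processed.
    With max_receive_count attempts before the DLQ, a message can sit
    in the retry loop for up to max_receive_count * visibility
    timeout seconds before you see it fail.
    """
    visibility_timeout = 6 * lambda_timeout_s
    return {
        "VisibilityTimeout": visibility_timeout,
        "maxReceiveCount": max_receive_count,
        "worst_case_retry_window_s": max_receive_count * visibility_timeout,
    }
```

For a 30-second Lambda timeout this gives a 180-second visibility timeout and, with 5 receives, up to a 15-minute window before a failing message reaches the DLQ, which is worth knowing when you're watching a backlog drain.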


Replies

Mooshux yesterday at 4:12 PM

The file splitting tip for Redshift is solid. One thing that caught us in a similar SNS/SQS/Lambda/Bedrock setup was not having a DLQ on the Lambda event source. When Bedrock started throttling hard, messages dropped silently and our vector store ended up with gaps we didn't notice for almost a week. Worth adding if you haven't ... it's the kind of thing you only miss once.
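For the SQS-fed Lambda setup described here, the fix is a redrive policy on the source queue: after maxReceiveCount failed receives, SQS moves the message to the DLQ instead of it vanishing after throttling. A sketch of building that attribute value (the queue URL and DLQ ARN in the comment are hypothetical placeholders):

```python
import json

def redrive_policy(dlq_arn: str, max_receive_count: int = 5) -> str:
    """Value for the SQS RedrivePolicy queue attribute: after
    max_receive_count failed receives, SQS moves the message to the
    dead-letter queue rather than dropping it silently."""
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    })

# Attaching it with boto3 would look roughly like this
# (queue_url and dlq_arn are hypothetical):
# sqs.set_queue_attributes(
#     QueueUrl=queue_url,
#     Attributes={"RedrivePolicy": redrive_policy(dlq_arn)},
# )
```

Pairing this with a CloudWatch alarm on the DLQ's ApproximateNumberOfMessagesVisible is what turns "gaps we didn't notice for a week" into a page within minutes.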