Hacker News

Rosalind: A genomics toolkit in Rust running whole-genome pipelines on a laptop

182 points by samuell | last Thursday at 1:55 PM | 52 comments

Comments

logannyeMD yesterday at 9:48 PM

Hey guys, this is my GitHub repo. Glad it's received some interest. I figured HN might be the culprit when it suddenly jumped ~100 stars even though I haven't worked on the code base since last year. I prototyped this out of personal curiosity last year and moved on abruptly, so there are a lot of gaps I still need to close and knobs that need to be tuned. But if people genuinely find the "deterministic genomics workloads on edge devices" proposal useful, I'll begin refining the code tonight and try to make it as useful as possible. If you have any particular bioinformatics tasks or use cases that you want to be feasible on edge devices, lmk and I'll work on integrating new capabilities. Always happy to be helpful.

devlovstad today at 8:28 AM

I work with genomics pipelines in my day job. This repo does not seem ready for serious use until it has been compared with existing tools such as Bowtie 2/samtools/Strelka or similar. For cancer genomes, it's also a bit limiting that it only calls SNVs/indels and not structural variants.

a_bonobo today at 1:23 AM

There has been a bit of a 'trend' to rewrite common bioinformatics/comp-bio tools in faster languages (Rust) via LLMs; OP's repo seems to be an early example.

Seqera Labs has a bit of a manifesto: https://rewrites.bio/

Heng Li has an overview here too: https://lh3.github.io/2026/04/17/the-ai-rewrite-dilemma

IMHO it's... OK? Bioinformatics code quality is generally poor: untrained biologists writing code that is poorly scoped but works. (Unguided) LLMs write on that level, too, so not much harm done.

mriet today at 1:56 AM

Realistically, without data from a large test set that compares this thoroughly to Samtools (and others?), I wouldn't touch this.

Note to the OP: specify a focus please? short, long, mega-long read and bacterial, human, small plant or large plant genome? Alignment heuristics and performance differ significantly across those axes.

p4ul yesterday at 5:56 PM

This is interesting; thanks for sharing! I have been curious about the adoption of Rust in computational biology. I know that the folks at St. Jude [1] are also using Rust for their 'omics research.

[1] https://github.com/stjude-rust-labs

samuell today at 9:18 AM

I shared this since it seems to address a niche similar to one I have hoped to develop someday, based on FlowBase [1]: a library of streaming processing components built on basic operations, which can be easily stitched together into larger pipelines in a compiled language that can run on smaller hardware too.

Neither FlowBase nor I had many ideas about how to keep data structures compact, as the linked library does; I was mostly aiming to make it really easy to build streaming pipelines.

I haven't yet got my head around what the composability story is in Rosalind though, so I would be interested in any pointers or examples of how this would be done using it.

[1] https://github.com/flowbase/flowbase
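
For illustration only, here is a minimal sketch of what "streaming components stitched into a pipeline" can look like in plain Rust, using iterator adapters as the components. This is not Rosalind's or FlowBase's API; the `Read` type, `filter_min_len`, `with_gc`, and `pipeline` are all hypothetical names invented for the example.

```rust
/// Hypothetical record type for the sketch: a read ID plus its bases.
#[derive(Debug, Clone, PartialEq)]
pub struct Read {
    pub id: String,
    pub seq: String,
}

/// Component 1 (hypothetical): drop reads shorter than `min_len`.
/// Lazy: nothing is buffered, reads pass through one at a time.
pub fn filter_min_len(
    reads: impl Iterator<Item = Read>,
    min_len: usize,
) -> impl Iterator<Item = Read> {
    reads.filter(move |r| r.seq.len() >= min_len)
}

/// Component 2 (hypothetical): annotate each read with its GC fraction.
pub fn with_gc(reads: impl Iterator<Item = Read>) -> impl Iterator<Item = (Read, f64)> {
    reads.map(|r| {
        let gc = r
            .seq
            .bytes()
            .filter(|&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
            .count();
        // `.max(1)` avoids dividing by zero for an empty sequence.
        let frac = gc as f64 / r.seq.len().max(1) as f64;
        (r, frac)
    })
}

/// Stitching components together is just function composition.
pub fn pipeline(reads: impl Iterator<Item = Read>) -> impl Iterator<Item = (Read, f64)> {
    with_gc(filter_min_len(reads, 4))
}
```

Because every stage takes and returns an iterator, memory use stays proportional to a single record rather than the whole input, which is the property that matters on smaller hardware.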

croemer yesterday at 9:26 PM

Those are all the tests for alignment. They don't even check the alignment is correct. Just that there are no errors. This is a joke: https://github.com/logannye/rosalind/blob/main/tests/alignme...

Looks like total slop to me. All code in one commit, then a bunch of commits polishing the Readme.

No release, no updates in half a year.
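
To make the distinction concrete, here is a minimal sketch of a test that checks the alignment result itself, not just the absence of errors. The aligner below is a toy ungapped stand-in written for this example; `best_ungapped_hit` is a hypothetical name and is not Rosalind's API.

```rust
/// Toy stand-in aligner (hypothetical, not Rosalind's API): slides `read`
/// along `reference` without gaps and returns `(offset, mismatches)` for the
/// placement with the fewest mismatches, preferring the leftmost on ties.
pub fn best_ungapped_hit(reference: &[u8], read: &[u8]) -> Option<(usize, usize)> {
    if read.is_empty() || read.len() > reference.len() {
        return None;
    }
    reference
        .windows(read.len())
        .enumerate()
        .map(|(offset, window)| {
            let mismatches = window.iter().zip(read).filter(|(a, b)| a != b).count();
            (offset, mismatches)
        })
        // `min_by_key` returns the first minimum, i.e. the leftmost offset.
        .min_by_key(|&(_, mismatches)| mismatches)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn exact_match_reports_expected_position() {
        // Asserts on the answer, not merely that the call succeeded.
        assert_eq!(best_ungapped_hit(b"TTACGTAA", b"ACGT"), Some((2, 0)));
    }

    #[test]
    fn single_mismatch_is_counted() {
        // "ACCT" differs from the reference window "ACGT" at one base.
        assert_eq!(best_ungapped_hit(b"TTACGTAA", b"ACCT"), Some((2, 1)));
    }

    #[test]
    fn read_longer_than_reference_is_rejected() {
        assert_eq!(best_ungapped_hit(b"AC", b"ACGT"), None);
    }
}
```

A real aligner would additionally need checks against known truth sets or an established tool's output, but even small hand-verified cases like these catch regressions that a "no errors" test cannot.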

vatsachak yesterday at 8:32 PM

Looking at the commenting pattern, it seems like AI unfortunately

vfalbor today at 12:20 PM

Have you compared it against similar software such as BLAST, which is the most common?

danborn26 today at 9:36 AM

Rust is a great fit for genomics. Processing whole genomes locally on a laptop is a huge step up from typical Python pipelines.

boron1006 yesterday at 7:50 PM

Lots of bad smells in this repo.

semiinfinitely yesterday at 8:01 PM

bioinformaticians have been making these useless bioinformatic-toolkit-in-my-favorite-programming-language repos for years

peterfirefly yesterday at 6:56 PM

Should have called it Raymond.

Jerry2 today at 12:55 PM

Awesome piece of software! Quick side question... does anyone have a recommendation for a DNA genotyping service that prioritizes privacy? I'm looking for a company that provides private results and doesn't add them to any sort of database (dystopian or otherwise). I'd love to get my DNA profile, but I'm concerned about privacy issues. :\

Rijanhastwoears yesterday at 8:06 PM

> A deterministic genomics engine with a compact memory footprint.

Uhh... are there stochastic genomics pipelines?

shauniel yesterday at 7:16 PM

I would love to hear about what the sacrifices are, but this project really looks amazing.

penciltwirler today at 2:16 PM

blatant copyright infringement of https://rosalind.info/problems/locations/

bonsai_spool yesterday at 7:33 PM

Didn't see a publication or preprint for this - is there one?
