I built a pipeline that converts all Spanish state legislation into version-controlled Markdown. Eac...

enriquelop • today at 12:01 PM • 13 replies

I built a pipeline that converts all Spanish state legislation into version-controlled Markdown. Each law is a file, each reform is a real git commit with the historical date. 8,642 laws, 27,866 commits.

The idea: legislation is just patches on patches on patches. Git already solves this. Instead of reading "strike paragraph 3 and replace with...", you get an actual diff.
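To make the "actual diff" point concrete, here is a minimal sketch using Python's standard `difflib`. The function name `reform_diff`, the file path, and the article text are all invented for illustration; this is not taken from the pipeline itself.

```python
import difflib


def reform_diff(before: str, after: str, law_path: str) -> str:
    """Return a unified diff between two versions of a law's Markdown text."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{law_path}",
        tofile=f"b/{law_path}",
    )
    return "".join(lines)


# Hypothetical article text, invented for illustration only.
before = "## Article 3\n\nThe deadline is ten days.\n"
after = "## Article 3\n\nThe deadline is fifteen days.\n"

print(reform_diff(before, after, "laws/example.md"))
```

The removed sentence shows up as a `-` line and its replacement as a `+` line, which is the same information `git diff` would show for the reform commit.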

The repo is the product. Browse any law, git log to see its full reform history, git diff to see exactly what changed.
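One plausible way to give each reform "a real git commit with the historical date" is to set git's `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables before committing. The sketch below assumes that approach; the helper names `backdated_env` and `commit_reform` are hypothetical, and the post does not say how the pipeline actually does it.

```python
import os
import subprocess
from datetime import datetime, timezone


def backdated_env(reform_date: datetime) -> dict:
    """Copy the current environment and pin both git dates to the reform date."""
    stamp = reform_date.isoformat()
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env


def commit_reform(repo_dir: str, law_path: str, message: str,
                  reform_date: datetime) -> None:
    """Stage one law file and commit it with the reform's date.

    Assumes git is installed and repo_dir is an existing repository.
    """
    env = backdated_env(reform_date)
    subprocess.run(["git", "add", law_path], cwd=repo_dir, env=env, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, env=env, check=True)


# Hypothetical reform date, for illustration only.
env = backdated_env(datetime(2015, 10, 1, tzinfo=timezone.utc))
print(env["GIT_AUTHOR_DATE"])
```

With both dates set, `git log` orders and labels the commits by when each reform happened rather than by when the pipeline ran.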

Built the pipeline in ~4 hours with Claude Code. Source is BOE (Spain's official gazette) consolidated legislation API.

Exploring whether there's a business here — structured legislation API for legaltech/compliance, or just a useful open dataset. Curious what HN would build with this data.

Replies

artirdx • today at 1:01 PM

A law's intent is often clarified in court through judgments. If you can overlay the judgments on the corresponding law, at the correct points in time, I think that will have value. It might, for example, show which laws were referenced the most and which needed to be clarified the most. It might give insight into which legal language constructs stood the test of time and which had to be clarified repeatedly.

mattogodoy • today at 8:03 PM

I’ve thought of this many times! But you’re missing the most important part: every reform should be a PR. With discussions and all. That would be the purest form of voting. And all the decisions made to reach the current state of a law would be registered and available for everyone to read. Democracy 2.0.

Dreaming costs nothing.

manunamz • today at 7:01 PM

Very cool project. How are you thinking about indexing and discoverability? Git gives you the change history, but navigating the corpus itself seems like the harder problem: Finding related laws, understanding hierarchical relationships between statutes...

Have you considered embedding semantic hierarchical structure directly in the markdown? Something like https://github.com/wikibonsai/semtree ? It lets you build a navigable tree across markdown files using indented [[wikilinks]] as the organizational spine. Could be a natural fit for legislation that already has an inherent taxonomy (constitutional → organic → ordinary, or by subject area).
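As a rough illustration of the indented-wikilink idea, the sketch below parses an index file into a child-to-parent map. It is a simplified stand-in and does not use semtree's actual API or file format; the function `parse_tree`, the two-space indent, and the node names are all assumptions.

```python
import re

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")


def parse_tree(text: str, indent: int = 2) -> dict:
    """Map each wikilink target to its parent (None for roots) by indentation depth."""
    parents = {}
    stack = []  # stack[d] is the most recent node seen at depth d
    for line in text.splitlines():
        match = WIKILINK.search(line)
        if not match:
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        name = match.group(1)
        del stack[depth:]  # drop nodes at this depth or deeper
        parents[name] = stack[-1] if stack else None
        stack.append(name)
    return parents


# Hypothetical index file, invented for illustration only.
index = """\
- [[constitution]]
  - [[organic-laws]]
    - [[data-protection]]
  - [[ordinary-laws]]
"""

print(parse_tree(index))
```

A parent map like this is enough to render breadcrumbs for a law or to list everything under one branch of the taxonomy.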

Bewelge • today at 1:11 PM

Oooh, can you elaborate a bit on how the gazette publishes them? What format did you have to parse, and how many documents were there in total? I tried doing the same for German laws 1-2 years ago, but LLMs weren't smart enough yet, and the costs would've been at least a couple of thousand €.

Edit: Never mind, I missed the "BOE (Spain's official gazette) consolidated legislation API" part. Sending jealous greetings from Germany. We just have a bunch of PDFs here. And the private entity that has been publishing them for decades even claims copyright on them!

rmonvfer • today at 3:29 PM

I looked into this a while back and IIRC, the consolidated legislation doesn’t cover all legislation but only a handful.

Also, in my experience (having built in this space before), regulations aren’t really the issue. Court rulings are, because there’s no open data for them in Spain. And the potential users for a paid product (legal professionals) already know the law; the key players (big law firms) have their own databases of annotated and verified court rulings and other documents.

youknownothing • today at 3:26 PM

This is brilliant. I had thought about this for a long while. You see laws that are just "go to law 132 and amend paragraph 4, then go to law 24 and amend paragraph 9". Basically, "laws" are recorded as diffs, and then it's up to the reader to assemble the final product in their head. They should be doing it this way!

airstrike • today at 3:30 PM

This is really cool. I've thought about it for a long time as well but never had the idea of just using git, which is equal parts genius and "obvious" in hindsight, as most great ideas are.

I think the corollary that comes to mind is that reforms, with their git commits, are incrementally valuable if they refer to other parts of the legislation, previous commits, etc. to give more context as to the intent at the time of the law. So maybe there's a way to distill the legislative process into more PR and commit-oriented work—likely ex post as you did here, but perhaps in the future as part of an actual workflow.

And then maybe I'd pitch the idea to some technologically-inclined local government.

daedric7 • today at 2:14 PM

Please! Can you make the same for Portugal? Laws here are a mess of reforms...

aerhardt • today at 5:10 PM

I’ve had the idea of playing with our laws and trying to ask questions about their growing volume and complexity. This is timely and dope Enrique - mil gracias!

__MatrixMan__ • today at 3:05 PM

It would be a good place to start if you wanted to hard fork the government.

Mordisquitos • today at 1:57 PM

Congratulations, this is a brilliant resource. You have done one of those countless things which I often think about doing, but my utter lack of follow-through and other distractions make it a fantasy. I cannot wait to clone the repo and explore it.

As to what can be done with the data, maybe one interesting step could be a graph-database regarding laws which reference other laws or the definitions that they depend on?
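A first pass at such a graph could be plain pattern matching over the Markdown before reaching for a graph database. The sketch below is a minimal version under stated assumptions: the `CITATION` regex covers only three citation forms, the law identifiers and sample text are invented, and `build_reference_graph` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed citation forms such as "Ley 39/2015" or "Ley Orgánica 3/2018".
# Real texts use more variants, so this pattern is illustrative only.
CITATION = re.compile(r"\b(Ley Orgánica|Real Decreto|Ley)\s+(\d+/\d{4})")


def build_reference_graph(laws: dict) -> dict:
    """Map each law identifier to the set of other laws its text cites."""
    graph = defaultdict(set)
    for law_id, text in laws.items():
        for kind, number in CITATION.findall(text):
            cited = f"{kind} {number}"
            if cited != law_id:  # ignore self-references
                graph[law_id].add(cited)
    return dict(graph)


# Hypothetical corpus, invented for illustration only.
laws = {
    "Ley 40/2015": "Se aplica lo dispuesto en la Ley 39/2015 y en la Ley Orgánica 3/2018.",
    "Ley 39/2015": "Texto sin referencias.",
}

print(build_reference_graph(laws))
```

The resulting adjacency map can be loaded into a graph database or analyzed directly, for example to find the most-cited laws.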

zer00eyz • today at 6:29 PM

Too bad that author and committer are single individuals and not lists. It would be good to see who wrote each law and how the voting went as well.

7777777phil • today at 12:25 PM

Cool idea. How far back in time do those 27k commits go?

Just thinking about how this could be used for (automated) research or visualization on the evolution of law (Spanish law, in this case).

