Hacker News

Show HN: Pgit – A Git-like CLI backed by PostgreSQL

103 points | by ImGajeed76 | yesterday at 6:11 AM | 53 comments

Comments

smartmic · today at 8:12 AM

Of course, we can’t leave out a mention of Fossil here — the SCM system built by and for SQLite.

https://fossil-scm.org/

taneliv · today at 3:24 PM

Hey, I tried to import the Linux kernel master branch from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... into pgit. My laptop is not the beefiest (some Ryzen 7 with 16 GB of RAM and about 300 GB of disk free), so that did not quite work. It died while rebuilding indexes (after the bulk import), due to Postgres running out of disk space.

I guess this could have been expected, but it didn't occur to me since plain git has had no issues with that repository. Either way, the import process was quite slow: the failure happened after 3h30m. I'm not sure if it would be possible to speed it up, or to estimate resource consumption ahead of time and warn the user? The laptop had also gone almost 2 GB into swap at some point, so there was quite a bit of memory pressure as well, though I don't know at exactly which point that happened.

aljgz · today at 8:14 AM

Still halfway through reading, but what you've made can unlock a lot of use cases.

> I tried SQLite first, but its extension API is limited and write performance with custom storage was painfully slow

For many use cases, write performance doesn't matter much: aside from the initial import, in many cases we don't change text that fast. But the simpler logistics of a SQLite database, with dual (git+SQL) access to the text, would be huge.

That said, for the specific use case I have in mind, Postgres is perfectly fine.

drob518 · today at 3:34 PM

I’m confused by the benchmark detail. It says that the “on disk” size for pgit is always larger than git's aggressive size, but then it breaks out just the pgit data size and says that's typically smaller. If you're using PG to implement this, don't you have to account for the PG storage, too, in your comparison? My takeaway is that pgit always has a larger storage requirement than git with aggressive compression. Or am I reading that wrong? Obviously, pgit also brings features like SQL querying that git doesn't have, which you might prioritize more highly. But the author seems to be pushing the storage benefit hard.

aljgz · today at 11:18 AM

How well does this support random-access queries to the file names and content at a certain revision? Like:

- "Checking out" a specific branch (which can be reasonably slow)

- Query all files and folders in path `/src`

- Query all files and folders in path `/src/*` (and maybe with extra pattern matches)

- Be able to read contents of a file from a certain offset for a certain length

These are similar to file-system queries against a working directory.
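
A sketch of how those lookups might translate to SQL, using an in-memory SQLite table as a stand-in (the `files` table and its columns are my assumption, not pgit's actual schema):

```python
import sqlite3

# Hypothetical files(revision, path, content) table -- a stand-in for
# whatever schema pgit actually uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (revision TEXT, path TEXT, content BLOB)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("abc123", "/src/main.c", b"int main(void) { return 0; }"),
        ("abc123", "/src/util/io.c", b"/* io helpers */"),
        ("abc123", "/README.md", b"# demo"),
    ],
)

# Everything under /src at a given revision (prefix match).
under_src = conn.execute(
    "SELECT path FROM files WHERE revision = ? AND path LIKE '/src/%' ORDER BY path",
    ("abc123",),
).fetchall()

# Pattern match, e.g. all C files under /src (GLOB's * also crosses slashes).
c_files = conn.execute(
    "SELECT path FROM files WHERE revision = ? AND path GLOB '/src/*.c'",
    ("abc123",),
).fetchall()

# Read 4 bytes starting at byte offset 4 (substr is 1-indexed, so position 5).
chunk = conn.execute(
    "SELECT substr(content, 5, 4) FROM files WHERE revision = ? AND path = ?",
    ("abc123", "/src/main.c"),
).fetchone()[0]
```

Checking out a branch would amount to a join from refs to commits to such a table, and the offset/length read maps naturally onto `substr`, which Postgres also provides for `bytea`.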

kardianos · today at 3:37 PM

This could be great for larger repos.

If you couple this with an optional FUSE provider, server-side user branches, and Gerrit-like change sets, that would be awesome.

dmonterocrespo · today at 1:59 PM

What would be the general purpose of storing the history in a remote database? Is it for use by agents? It's not the same as agents cloning the project and running "git log".

lmuscat · today at 1:13 PM

It would be cool to populate the DB and keep it in sync by pointing at Postgres as an upstream remote inside git itself. That would probably require a custom Postgres extension and a way to accept traffic from git.
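
For reference, git's remote-helper mechanism is the hook for this kind of integration: git invokes an executable named `git-remote-<scheme>` for matching URLs and speaks a line protocol over stdin/stdout. A minimal sketch of the handshake side (the `pg://` scheme and helper name are hypothetical; `capabilities` and `list` are real protocol commands):

```python
import sys

def handle(command: str) -> str:
    """Answer one git remote-helper command (responses end with a blank line)."""
    if command == "capabilities":
        # Advertise import/export, the simplest capability pair, which lets
        # git exchange history via fast-import/fast-export streams.
        return "import\nexport\n\n"
    if command == "list":
        # A real helper would answer by querying Postgres for branch heads;
        # "?" means the ref's value is fetched on demand.
        return "? refs/heads/main\n\n"
    return "\n"

def main() -> None:
    # git would spawn this as `git-remote-pg` for pg:// URLs and pipe commands in.
    for line in sys.stdin:
        sys.stdout.write(handle(line.strip()))
        sys.stdout.flush()
```

The helper itself would be the place to translate those fast-import streams into SQL against the pgit tables.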

quickrefio · today at 3:30 PM

Feels like swapping filesystem complexity for database complexity.

Fire-Dragon-DoL · today at 8:59 AM

Wouldn't DuckDB be better suited for this? Forgive the stupid question; I just connected "CSV as SQL" to "git as SQL", and DuckDB came to mind.

Terretta · today at 11:44 AM

Why a custom LLM prompt for what appears to be the default 'report' you'd want? Wouldn't the CLI just do this for a report command?

Is there an example of the tool enabling an LLM to 'discover' something non-deterministic and surprising?

Pay08 · today at 10:22 AM

This is incredibly neat and might actually become a part of my toolbox.

killingtime74 · today at 8:08 AM

I love it. I love having agents write SQL. It's a very efficient use of context, and it doesn't try to reinvent the information retrieval part of gathering that context.

Did you find you needed to give agents the schema produced by this, or do they just query it themselves from Postgres?
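
On the second option: in Postgres, an agent can discover the schema at runtime with a query against `information_schema.columns`, e.g. `SELECT table_name, column_name FROM information_schema.columns WHERE table_schema = 'public'`. The same idea, sketched against an in-memory SQLite stand-in (the `commits` table here is hypothetical, not pgit's actual schema):

```python
import sqlite3

# Stand-in database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (id TEXT PRIMARY KEY, author TEXT, message TEXT)")

# Schema self-discovery: list tables, then columns -- the SQLite equivalent
# of querying information_schema in Postgres.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
columns = [row[1] for row in conn.execute("PRAGMA table_info(commits)")]
```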

Toby11 · today at 10:47 AM

Why do agents need this metadata about git history to perform their coding functions, though?

Even humans don't do this unless there's a crazy bug causing them to search from every possible angle.

That said, this sounds like a great and fun project to work on.

waffletower · today at 3:55 PM

For the scale of repos I tend to interact with (small to medium), I feel it would be more ergonomic to use SQLite as a backend. Yet it might be interesting for all the repos to share a single PostgreSQL database for cross-comparisons -- though that isn't a use case I have seen a need for.

Zardoz84 · today at 7:51 AM

Interesting... could this be used to store multiple git repos and do a full-text search across them?
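
Nothing in principle stops it, if every repo's content lands in one shared table keyed by repo. In Postgres you'd want `to_tsvector`/`to_tsquery` with a GIN index; this SQLite sketch (hypothetical `files` table, plain `LIKE` in place of real full-text search) just shows the shape of a cross-repo query:

```python
import sqlite3

# One shared table holding file contents from several repos (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (repo TEXT, path TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("repo-a", "src/auth.py", "def login(user): ..."),
        ("repo-b", "lib/session.rb", "def login(user) ... end"),
        ("repo-b", "README.md", "Session handling library"),
    ],
)

# Search for a term across every repo at once.
hits = conn.execute(
    "SELECT repo, path FROM files WHERE content LIKE ? ORDER BY repo",
    ("%login%",),
).fetchall()
```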
