ggsql: A Grammar of Graphics for SQL

433 points • by thomasp85 • yesterday at 12:51 PM • 80 comments • view on HN

Comments

Maybe I skim read it too fast, but I did not find any clear description in the blog post or website docs of how this relates to SQL databases

I was kind of guessing that it doesn't run in a database, that it's a SQL-like syntax for a visualisation DSL handled by front end chart library.

That appears to be what is described in https://ggsql.org/get_started/anatomy.html

But then https://ggsql.org/faq.html has a section, "Can I use SQL queries inside the VISUALISE clause," which says, "Some parts of the syntax are passed on directly to the database".

The homepage says "ggsql interfaces directly with your database"

But it's not shown how that happens AFAICT

confused

➕ show 4 replies

getnormality • yesterday at 2:04 PM

I skimmed the article for an explanation of why this is needed, what problem it solves, and didn't find one I could follow. Is the point that we want to be able to ask for visualizations directly against tables in remote SQL databases, instead of having to first pull the data into R data frames so we can run ggplot on it? But why create a new SQL-like language? We already have a package, dbplyr, that translates between R and SQL. Wouldn't it be more direct to extend ggplot to support dbplyr tbl objects, and have ggplot generate the SQL?

Or is the idea that SQL is such a great language to write in that a lot of people will be thrilled to do their ggplots in this SQL-like language?

EDIT: OK, after looking at almost all of the documentation, I think I've finally figured it out. It's a standalone visualization app with a SQL-like API that currently has backends for DuckDB and SQLite and renders plots with Vegalite. They plan to support more backends and renderers in the future. As a commenter below said, it's supposed to help SQL specialists who don't know Python or R make visualizations.

➕ show 5 replies

lmeyerov • yesterday at 3:49 PM

This is great

We reached a similar conclusion for GFQL (oss graph dataframe query language), where we needed an LLM-friendly interface to our visualization & analytics stack, especially without requiring a code sandbox. We realized we can do quite rich GPU visual analytics pipelines with some basic extensions to opencypher . Doing SQL for the tabular world makes a lot of sense for the same reasons!

For the GFQL version (OpenCypher), an example of data loading, shaping, algorithmic enrichment, visual encodings, and first-class pipelines:

- overall pipelines: https://pygraphistry.readthedocs.io/en/latest/gfql/benchmark...

- declarative visual encodings as simple calls: https://pygraphistry.readthedocs.io/en/latest/gfql/builtin_c...

JHonaker • yesterday at 6:16 PM

I applaud the project, and I completely agree that the concepts maps nicely to SQL. The R equivalent of a WITH data prep block followed by the VISUALIZE is pretty much how all my plotting code is structured.

However, I don't see what the benefits of this are (other than having a simple DSL, but that creates the yet another DSL problme) over ggplot2. What do I gain by using this over ggplot2 in R?

The only problem, and the only reason I ever leave ggplot2 for visualizations, is how difficult it is to do anything "non-standard" that hasn't already had a geom created in the ggplot ecosystem. When you want to do something "different" it's way easier to drop into the primitive drawing operations of whatever you're using than it is to try to write the ggplot-friendly adapter.

Even wrapping common "partial specificiations" as a function (which should "just work" imo) is difficult depending on whether you're trying to wrap something that composes with the rest of the spec via `+` or via pipe (`|>`, the operator formerly known as `%>%`)

➕ show 3 replies

nicoritschel • yesterday at 3:39 PM

This is neat. I do wish there was a way for this to gracefully degrade in contexts without support for the grammar, though.

I devised a similar in spirit (inside SQL, very simplified vs GoG) approach that does degrade (but doesn't read as nice): https://sqlnb.com/spec

➕ show 1 reply

kasperset • yesterday at 1:59 PM

Will this ever integrate rest of the ggplot2 dependent packages described here: https://exts.ggplot2.tidyverse.org/gallery/ in the near or distant future? Sorry if it already mentioned somewhere.

➕ show 1 reply

efromvt • yesterday at 1:51 PM

Love the layering approach - that solves a problem I’ve had with other sql/visual hybrids as you move past the basics charts.

tmoertel • yesterday at 5:56 PM

First, ggsql looks awesome. Can't wait to try it out.

Feedback: A notable omission in the ggsql docs: I cannot find any mention of the possible outputs. Can I output a graphic in PDF? In SVG? PNG? How do I control things like output dimensions (e.g., width=8.5in, height=11in)?

The closest I got was finding these few lines of example code in the docs for the Python library:

    # Display or save
    chart.display()  # In Jupyter
    chart.save("chart.html")  # Save to file

➕ show 1 reply

tomjakubowski • today at 1:45 AM

This is amazing work. I've always wanted to be able to make visualizations straight from a SQL REPL. I've been thinking about building something sort of like this with Perspective[1] + DuckDB. Perspective would give an interactive chart (and/or wicked datagrid), and using a Perspective virtual server, you could feed filter + pivot conditions in the UI back to the database query. Maybe this is the kick I need to get this out of ideation.

[1]: https://perspective-dev.github.io/

thomasp85 • yesterday at 12:51 PM

The new visualisation tool from Posit. Combines SQL with the grammar of graphics, known from ggplot2, D3, and plotnine

➕ show 3 replies

semmulder • yesterday at 5:48 PM

This seems really cool, will try it out soon!

What made it click for me was the following snippet from: https://ggsql.org/get_started/grammar.html

> We’ve tried to make the learning curve as easy as possible by keeping the grammar close to the SQL syntax that you’re already familiar with. You’ll start with a classic SELECT statement to get the data that you want. Then you’ll use VISUALIZE (or VISUALISE ) to switch from creating a table of data to creating a plot of that data. Then you’ll DRAW a layer that maps columns in your data to aesthetics (visual properties), like position, colour, and shape. Then you tweak the SCALEs, the mappings between the data and the visual properties, to make the plot easier to read. Then you FACET the plot to show how the relationships differ across subsets of the data. Finally you finish up by adding LABELs to explain your plot to others. This allows you to produce graphics using the same structured thinking that you already use to design a SQL query.

refset • yesterday at 8:57 PM

Also in this vein is Shaper, a SQL-first approach for handling entire dashboards (and powered by DuckDB): https://taleshape.com/shaper/docs/getting-started/

jorin • yesterday at 3:10 PM

Really cool project! Would love to see a standard established for representing visualizations in SQL! I built a whole dashboarding tool on top of the idea: https://taleshape.com/shaper/docs/getting-started/ But Shaper takes a more pragmatic approach and just uses built in functionality to describe how to visualize the results. The most value I see with viz as SQL is that it's a great format for LLMs to specify what they want while making it easy to audit and reproduce. Just built a slack bot on top of that concept last week: https://taleshape.com/blog/build-your-own-data-analytics-sla...

tauroid • yesterday at 8:49 PM

Would be quite compelling if the CLI worked with nushell.

I see the (a?) backend is polars, which is good as well.

This in CLI might be the quickest / tersest way to go from "have parquet / csv / other table format" to "see graph", though a keen polars / matplotlib user would also get there pretty quick.

gh5000 • yesterday at 2:01 PM

It is conceivable that this could become a duckdb extension, such that it can be used from within the duckdb CLI? That would be pretty slick.

➕ show 1 reply

jiehong • yesterday at 3:16 PM

The cli only produces vega-lite[0] json graphics, right?

It would be nice if it included a rendering engine.

[0]: https://github.com/vega/vega-lite

➕ show 1 reply

kasperset • yesterday at 1:18 PM

Looks intriguing. Brings plotting to Sql instead of “transforming” sql for plotting.

jiehong • yesterday at 3:06 PM

Outstanding!

This can replace a lot of Excel in the end.

It makes so much sense now that it exists!

urams • yesterday at 5:44 PM

What feels like a lifetime ago, I made almost all of the R tooling at Uber and actually implemented what was effectively exactly this on top of the R DB tooling. Everyone, I think correctly in retrospect, I showed it to thought it was more or less useless. I had hoped it would be a nice way to pull and instantly visualize data but that was rarely all that valuable like this.

➕ show 1 reply

ericdfournier • yesterday at 7:32 PM

Very cool. Though it would be great to see this implemented as a PostgreSQL extension (if possible).

radarsat1 • yesterday at 1:28 PM

Wow, love this idea.

data_ders • yesterday at 2:21 PM

ok, this is definitely up my alley. color me nerd-sniped and forgive the onslaught of questions.

my questions are less about the syntax, which i'm largely familiar with knowing both SQL and ggplot.

i'm more interested in the backend architecture. Looking at the Cargo.toml [1], I was surprised to not see a visualization dependency like D3 or Vega. Is this intentional?

I'm certainly going to take this for a spin and I think this could be incredible for agentic analytics. I'm mostly curious right now what "deployment" looks like both currently in a utopian future.

utopia is easier -- what if databases supported it directly?!? but even then I think I'd rather have databases spit out an intermediate representation (IR) that could be handed to a viz engine, similar to how vega works. or perhaps the SQL is the IR?!

another question that arises from the question of composability: how distinct would a ggplot IR be from a metrics layer spec? could i use ggsql to create an IR that I then use R's ggplot to render (or vise versa maybe?)

as for the deployment story today, I'll likely learn most by doing (with agents). My experiment will be to kick off an agent to do something like: extract this dataset to S3 using dlt [2], model it using dbt [3], then use ggsql to visualize.

p.s. @thomasp85, I was a big fan of tidygraph back in the day [4]. love how small our data world is.

[1]: https://github.com/posit-dev/ggsql/blob/main/Cargo.toml

[2]: https://github.com/dlt-hub/dlt

[3]: https://github.com/dbt-labs/dbt-fusion

[4]: https://stackoverflow.com/questions/46466351/how-to-hide-unc...

➕ show 1 reply

persedes • yesterday at 5:56 PM

Soo can I put this on top of e.g. grafana?

breakfastduck • yesterday at 3:15 PM

This is fantastic. Feels like something that should've been in there from the start!

estetlinus • yesterday at 8:17 PM

I like it. I can see a world where these visuals become my serving layer in dbt. Small, clean and versioned .sql-files.