Hacker News

Running Out of Disk Space in Production

116 points, by romes, last Friday at 2:42 PM, 59 comments

Comments

flanfly, today at 11:50 AM

A neat trick I was told is to always have ballast files on your systems. Just a few GiB of zeros that you can delete in cases like this. This won't fix the problem, but will buy you time and free space for stuff like lock files so you can get a working system.
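The ballast trick fits in a couple of lines; a minimal sketch (the path and size here are illustrative — on a real box you'd park a few GiB on the partition that tends to fill up):

```shell
# Preallocate a ballast file that can be deleted in an emergency.
# fallocate reserves real blocks instantly; dd is the portable fallback.
ballast="${TMPDIR:-/tmp}/ballast.bin"
fallocate -l 64M "$ballast" 2>/dev/null ||
  dd if=/dev/zero of="$ballast" bs=1M count=64 status=none
ls -lh "$ballast"
# Disk full? rm "$ballast" frees the space immediately.
```

Note that `truncate -s` would only create a sparse file that reserves nothing, which is why fallocate or dd is used here.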

dirkt, today at 2:13 PM

If you run nginx anyway, why not serve static files from nginx? No need for temporary files, no extra disk space.

The authorization can probably be done somehow in nginx as well.
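One way this could look, assuming the app can answer a yes/no authorization subrequest (the paths and upstream here are made up, and `auth_request` needs the ngx_http_auth_request_module):

```nginx
# Sketch: nginx serves the bytes, the app only authorizes.
location /downloads/ {
    auth_request /check-auth;    # subrequest to the app before serving
    root /srv/files;             # nginx streams straight from disk
}
location = /check-auth {
    internal;
    proxy_pass http://127.0.0.1:8000/check-auth;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
}
```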

entropie, today at 12:37 PM

> I rushed to run du -sh on everything I could, as that’s as good as I could manage.

I recently came across gdu [1] and have installed/used it on every machine since then.

[1]: https://github.com/dundee/gdu
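Where installing tools isn't an option, a rough coreutils approximation of the "largest first" view gdu gives you:

```shell
# Poor man's gdu: biggest directories float to the top.
# -x stays on one filesystem, so a full / doesn't drag in other mounts.
du -xh /var 2>/dev/null | sort -rh | head -n 20
```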

gmuslera, today at 2:41 PM

Putting limits on folders where information may be added (with partitions or project quotas) is a proactive way to avoid that something misbehaves and fills the whole disk. Filling that partition or quota may still cause some problems, depending on the applications writing there, but the impact may be lower and easier to fix than running out of space for everything.
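On XFS, for instance, project quotas can cap a single directory tree. A sketch, assuming a filesystem mounted with `prjquota` and root access (the project id, path, and 5 GiB limit are invented):

```shell
# Map project id 42 to the runaway directory, then cap it.
echo "42:/var/lib/app/uploads" >> /etc/projects
echo "uploads:42" >> /etc/projid
xfs_quota -x -c 'project -s uploads' /var
xfs_quota -x -c 'limit -p bhard=5g uploads' /var
```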

SoftTalker, today at 4:51 PM

I've run into that "process still has deleted files open" situation a few times. df shows the disk full but du can't account for all of it; that's your clue to run lsof and look for open files marked "deleted".

Even more confusing are cases where a file is opened, then deleted or renamed without being closed, and a different file is created under the original path. To quote the man page, "lsof reports only the path by which the file was opened, not its possibly different final path."
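The first case is easy to reproduce; on Linux, /proc shows the same thing lsof does, with a throwaway `tail` here standing in for the leaky daemon:

```shell
# Hold a deleted file open, then spot it via /proc (lsof +L1 also works).
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1k count=100 status=none
tail -f "$f" >/dev/null 2>&1 &
pid=$!
sleep 1
rm "$f"                            # gone from du, still counted by df
ls -l "/proc/$pid/fd" | grep deleted
kill "$pid"
```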

ilaksh, today at 4:18 PM

I'm not sure his problems are really over if a lot of people are downloading a 2GB file. It would depend on the hosting plan, especially if his server is in the US.

Maybe the European Hetzner servers still have really big traffic limits even for the small plans. But if people keep downloading, it could still add up.

bdcravens, today at 1:39 PM

I appreciate the last line

> Note: this was written fully by me, human.

nottorp, today at 4:05 PM

Didn't root used to have some reserved space (and a bunch of inodes) on file systems just for occasions like this?
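It still does on ext2/3/4, where 5% of blocks are reserved for root by default. A sketch of checking and restoring that reserve (needs root; the device name is illustrative):

```shell
tune2fs -l /dev/sda1 | grep -i 'reserved block count'
tune2fs -m 5 /dev/sda1   # restore the default 5% reserve
```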

huijzer, today at 1:36 PM

> Plausible Analytics, with a 8.5GB (clickhouse) database

And this is why I tried Plausible once and never looked back.

To get basic but effective analytics, use GoAccess and point it at the Caddy or Nginx logs. It's written in C and thus barely uses memory. With a few hundred visits per day, the logs are currently about 10 MB per day, and Caddy automatically truncates logs once they go above 100 MB.
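A typical GoAccess invocation against a combined-format nginx log might look like this (paths are illustrative):

```shell
# One-shot HTML report; add --real-time-html for a live dashboard.
goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/report.html
```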

grugdev42, today at 3:39 PM

You left out point five.

5. Implement infrastructure monitoring.

Assuming you're on something like Ubuntu, the monit program is brilliant.

It's open source and self hosted, configured using plain text files, and can run scripts when thresholds are met.

I personally have it configured to hit a Slack webhook for a monitoring channel. Instant notifications for free!
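A monit stanza for exactly this failure mode might look like the following (the thresholds are arbitrary and the Slack-webhook script is your own):

```
# /etc/monit/conf.d/disk
check filesystem rootfs with path /
    if space usage > 85% then exec "/usr/local/bin/slack-alert.sh"
    if inode usage > 85% then exec "/usr/local/bin/slack-alert.sh"
```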

brunoborges, today at 1:59 PM

I remember a story of an Oracle Database customer whose production was broken for days, until an Oracle support escalation identified the problem as a mere "No disk space left".

jollymonATX, today at 4:03 PM

Never partition to 100%. A simple solution, really, and it should be standard practice for every sysadmin. I've never worked with one who needed to be told this...

renatovico, today at 4:10 PM

Why not implement X-Sendfile?
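In nginx the same idea is spelled `X-Accel-Redirect`: the app checks authorization and responds with just a header, and nginx streams the file from an internal location (a sketch; the paths and filename are invented):

```nginx
location /protected/ {
    internal;              # reachable only via X-Accel-Redirect
    alias /srv/files/;
}
```

The app would then reply with, e.g., `X-Accel-Redirect: /protected/backup.tar.gz` and an empty body.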

RALaBarge, today at 3:03 PM

Wait until you run out of inodes!
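Worth checking before it happens, since inodes run out independently of bytes: `df -h` can look fine while every file creation fails with "No space left on device".

```shell
df -i /   # IFree near zero means inode exhaustion
```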
