Hacker News

Running Out of Disk Space in Production

116 points, by romes, last Friday at 2:42 PM, 59 comments

Comments

flanfly, today at 11:50 AM

A neat trick I was told is to always have ballast files on your systems. Just a few GiB of zeros that you can delete in cases like this. This won't fix the problem, but will buy you time and free space for stuff like lock files so you can get a working system.
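The ballast trick fits in a couple of lines; a minimal sketch (the path and size here are illustrative — on a real box you'd park a few GiB on the partition that tends to fill up):

```shell
# Preallocate a ballast file that can be deleted in an emergency.
# fallocate reserves real blocks instantly; dd is the portable fallback.
ballast="${TMPDIR:-/tmp}/ballast.bin"
fallocate -l 64M "$ballast" 2>/dev/null ||
  dd if=/dev/zero of="$ballast" bs=1M count=64 status=none
ls -lh "$ballast"
# Disk full? rm "$ballast" frees the space immediately.
```

Note that `truncate -s` would only create a sparse file that reserves nothing, which is why fallocate or dd is used here.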

dirkt, today at 2:13 PM

If you run nginx anyway, why not serve static files from nginx? No need for temporary files, no extra disk space.

The authorization can probably be done somehow in nginx as well.
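One way this could look, assuming the app can answer a yes/no authorization subrequest (the paths and upstream here are made up, and `auth_request` needs the ngx_http_auth_request_module):

```nginx
# Sketch: nginx serves the bytes, the app only authorizes.
location /downloads/ {
    auth_request /check-auth;    # subrequest to the app before serving
    root /srv/files;             # nginx streams straight from disk
}
location = /check-auth {
    internal;
    proxy_pass http://127.0.0.1:8000/check-auth;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
}
```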

entropie, today at 12:37 PM

> I rushed to run du -sh on everything I could, as that’s as good as I could manage.

I recently came across gdu [1] and have installed/used it on every machine since then.

[1]: https://github.com/dundee/gdu
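Where installing tools isn't an option, a rough coreutils approximation of the "largest first" view gdu gives you:

```shell
# Poor man's gdu: biggest directories float to the top.
# -x stays on one filesystem, so a full / doesn't drag in other mounts.
du -xh /var 2>/dev/null | sort -rh | head -n 20
```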

gmuslera, today at 2:41 PM

Putting limits on folders where information may be added (with partitions or project quotas) is a proactive way to avoid that something misbehaves and fills the whole disk. Filling that partition or quota may still cause some problems, depending on the applications writing there, but the impact may be lower and easier to fix than running out of space for everything.
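On XFS, for instance, project quotas can cap a single directory tree. A sketch, assuming a filesystem mounted with `prjquota` and root access (the project id, path, and 5 GiB limit are invented):

```shell
# Map project id 42 to the runaway directory, then cap it.
echo "42:/var/lib/app/uploads" >> /etc/projects
echo "uploads:42" >> /etc/projid
xfs_quota -x -c 'project -s uploads' /var
xfs_quota -x -c 'limit -p bhard=5g uploads' /var
```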

SoftTalker, today at 4:51 PM

I've run into that "process still has deleted files open" situation a few times. df shows the disk full but du can't account for all of it; that's your clue to run lsof and look for open files marked "deleted".

Even more confusing are cases where a file is opened, then deleted or renamed without being closed, and a different file is created under the original path. To quote the man page, "lsof reports only the path by which the file was opened, not its possibly different final path."
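The first case is easy to reproduce; on Linux, /proc shows the same thing lsof does, with a throwaway `tail` here standing in for the leaky daemon:

```shell
# Hold a deleted file open, then spot it via /proc (lsof +L1 also works).
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1k count=100 status=none
tail -f "$f" >/dev/null 2>&1 &
pid=$!
sleep 1
rm "$f"                            # gone from du, still counted by df
ls -l "/proc/$pid/fd" | grep deleted
kill "$pid"
```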

ilaksh, today at 4:18 PM

I'm not sure his problems are really over if a lot of people are downloading a 2GB file. It would depend on the hosting plan, especially if his server is in the US.

Maybe the European Hetzner servers still have really big traffic limits even for the small plans. But if people keep downloading, it could still add up.

bdcravens, today at 1:39 PM

I appreciate the last line

> Note: this was written fully by me, human.

nottorp, today at 4:05 PM

Didn't root used to have some reserved space (and a bunch of inodes) on file systems just for occasions like this?
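It still does on ext2/3/4, where 5% of blocks are reserved for root by default. A sketch of checking and restoring that reserve (needs root; the device name is illustrative):

```shell
tune2fs -l /dev/sda1 | grep -i 'reserved block count'
tune2fs -m 5 /dev/sda1   # restore the default 5% reserve
```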

huijzer, today at 1:36 PM

> Plausible Analytics, with a 8.5GB (clickhouse) database

And this is why I tried Plausible once and never looked back.

To get basic but effective analytics, use GoAccess and point it at the Caddy or Nginx logs. It's written in C and thus barely uses memory. With a few hundred visits per day, the logs are currently about 10 MB per day, and Caddy automatically truncates logs once they go above 100 MB.
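A typical GoAccess invocation against a combined-format nginx log might look like this (paths are illustrative):

```shell
# One-shot HTML report; add --real-time-html for a live dashboard.
goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/report.html
```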

grugdev42, today at 3:39 PM

You left out point five.

5. Implement infrastructure monitoring.

Assuming you're on something like Ubuntu, the monit program is brilliant.

It's open source and self hosted, configured using plain text files, and can run scripts when thresholds are met.

I personally have it configured to hit a Slack webhook for a monitoring channel. Instant notifications for free!
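A monit stanza for exactly this failure mode might look like the following (the thresholds are arbitrary and the Slack-webhook script is your own):

```
# /etc/monit/conf.d/disk
check filesystem rootfs with path /
    if space usage > 85% then exec "/usr/local/bin/slack-alert.sh"
    if inode usage > 85% then exec "/usr/local/bin/slack-alert.sh"
```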

brunoborges, today at 1:59 PM

I remember a story of an Oracle Database customer whose production was broken for days, until an Oracle support escalation identified the problem as a mere "No disk space left".

jollymonATX, today at 4:03 PM

Never partition to 100%. A simple solution, really, and it should be standard practice for every sysadmin. I've never worked with one who needed to be told this...

renatovico, today at 4:10 PM

Why not implement X-Sendfile?
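In nginx the same idea is spelled `X-Accel-Redirect`: the app checks authorization and responds with just a header, and nginx streams the file from an internal location (a sketch; the paths and filename are invented):

```nginx
location /protected/ {
    internal;              # reachable only via X-Accel-Redirect
    alias /srv/files/;
}
```

The app would then reply with, e.g., `X-Accel-Redirect: /protected/backup.tar.gz` and an empty body.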

RALaBarge, today at 3:03 PM

Wait until you run out of inodes!
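Worth checking before it happens, since inodes run out independently of bytes: `df -h` can look fine while every file creation fails with "No space left on device".

```shell
df -i /   # IFree near zero means inode exhaustion
```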
