How to Avoid a Server Apocalypse

Jay Wood

9 years ago

As we all know, running an unmanaged server can be a hassle. There’s always something you’re not prepared for, and that’s what my story is about!

Some time ago I was dealing with brute force attacks, and during that time, I thought it was fun to outwit my attackers. Admittedly, for a while, it was. Down the road I went, with CloudFlare to handle most bots, as well as some general security measures of my own such as moving my login, disabling XML-RPC, installing WangGuard, and a few other scripts I wrote myself.

Lately, I’ve been getting more into actual system administration and learning the ins and outs of a Linux server environment. I started out with Apache (XAMPP) and evolved into a full-blown dedicated system in Canada. This server holds two Minecraft servers, my remote development environment, my personal website, and a few random databases I use for various side projects.

On Wednesday, March 30th, 2016, the disk space available to my MySQL database filled up, thanks to a sizable database file (1-2 GB) from one of our clients. During development, I try to mimic the live site of a client as closely as possible. This helps avoid data integrity issues and makes it less likely that I’m missing anything.

I realize 1-2 GB isn’t that large when it comes to a database. But I had been working on multiple other projects at the time, my personal data lived on the same server, and a properly configured Minecraft server can write a significant amount of data to the database with the right logging software. Add all of that together and the space runs out.

Well. Whoops. Server apocalypse.

How did this happen?

It was a culmination of multiple things. The first was that the MySQL database was mounted in the WRONG location, one with very little free space. The second, less controllable, set of factors was everything writing to the database: the Minecraft servers, a few cron jobs I had going, and then the newly imported client database.
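A disk that is quietly filling up is easy to catch before it takes the database down. Here is a minimal sketch of a free-space check that could run from a cron job. The function names and the 10% threshold are illustrative assumptions, not something from my actual setup.

```python
import shutil


def free_fraction(path):
    """Return the fraction of free space (0.0 to 1.0) on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def low_space_paths(paths, min_free=0.10):
    """Return the paths whose filesystem has less than `min_free` free space.

    `min_free` is a fraction of the total size; 0.10 is an arbitrary example value.
    """
    return [p for p in paths if free_fraction(p) < min_free]
```

Calling something like `low_space_paths(["/var/lib/mysql", "/home"])` and emailing yourself whenever the result is non-empty would have flagged the problem days before the import pushed it over the edge.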

The Server-pocalypse

On to my next mistake! After a lengthy call with Parbs (in my opinion, he is THE guy to go to for server problems), we came up with a solution: move the /var folder over to /home/var, which is where the bulk of my free space existed. After re-initializing MariaDB, I saw my main site come up. Whoa, it worked!

Of course, I checked a few other sites and still saw a database error, but I wrote it off because it was the end of the day and I wanted to get some R&R in.

Due to my never-ending quest for knowledge, later that night I returned to try to ‘figure’ out the reason the other sites were offline (database issues) while my main site was fine. It NEVER occurred to me the tables may have crashed.

My process went like so:

Google the shiz outta my problem
Proceed to run random commands from StackOverflow…that were from 2006
Kill the server

So how did I kill the server?

Well, first off, if you ever want to break something, take the advice of the internet at face value and do no investigation on your own! That’s pretty much the easiest way to destroy something. I don’t do it when it comes to code, so why I did it for the server issue, I’ll never know.

I ended up mounting a folder ONTO itself with symlinks (which, up until this point, I didn’t even know was possible). In the end, I was like “Oh, I don’t need this symlink,” and simply did this: rm -fR /home/var

That deleted my MAIN /var folder, which held /var/lib/mysql, the place where MySQL stores its data by default.
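The safer habit is to confirm what a path actually is before deleting it, and to never use a recursive delete to get rid of a link. Below is a minimal sketch of that check. The helper name `remove_symlink_only` is hypothetical, and note that `os.path.ismount` is known to miss some bind mounts on the same filesystem, so treat it as a hint rather than a guarantee.

```python
import os


def remove_symlink_only(path):
    """Remove `path` only if it is a symlink; never recurse into real data."""
    if os.path.islink(path):
        os.unlink(path)  # removes the link itself, not whatever it points to
        return True
    if os.path.ismount(path):
        raise RuntimeError(f"{path} is a mount point; unmount it instead of deleting it")
    raise RuntimeError(f"{path} is not a symlink; refusing to delete it")
```

If the path turns out to be a real directory or a mount point, the function refuses and raises, which is exactly the pause I needed that night.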

The Aftermath

Well, you can imagine what happens when you delete 20GB worth of data with ZERO backups.

My main site to this day is still offline. I ended up having to flex some time so I could, at minimum, get my development environment online, and of course, my gaming servers (with 100+ players) were offline for two days. As I’m sure you could predict, I had some very unhappy people.

It was during this time I realized I knew nothing about Nginx vs Apache servers. I’m an Apache guy, but making the move to Nginx due to the server wipe was, at least I thought, the logical thing to do. I ended up staying up until 3:30am the day the apocalypse hit, and then spent a chunk of the following day bringing my development environment back online. That consisted of trying and failing to set up Nginx server directives, restarting the server hundreds of times, and finally uploading a 20GB database and about 130GB worth of files, which ate my bandwidth all day.

What I Learned

Don’t touch anything without talking to Brad Parbs first!

In all seriousness: We say it again and again, and we hear it again and again, but even those of us who are professional devs can forget the importance of one very crucial thing: BACKUPS!!! Get a backup system that works. If you’re on a managed server, more than likely you already have this. I, however, was not, and felt the agony of a server-pocalypse.
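Even a very small script beats zero backups. Here is a minimal sketch of a nightly database dump with rotation, assuming `mysqldump` is installed and that credentials come from an option file such as ~/.my.cnf. The function names, the file naming scheme, and the 7-dump retention are all illustrative assumptions.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def build_dump_command(database, out_file):
    """Build the mysqldump command line for a single database."""
    return [
        "mysqldump",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--routines",            # include stored procedures and functions
        f"--result-file={out_file}",
        database,
    ]


def prune_old_backups(backup_dir, database, keep):
    """Delete all but the newest `keep` dumps of `database`; return the removed paths.

    Relies on the timestamp in the file name sorting chronologically.
    Assumes no other database name starts with `database` plus a hyphen.
    """
    dumps = sorted(Path(backup_dir).glob(f"{database}-*.sql"))
    removed = dumps[:-keep] if keep > 0 else dumps
    for old in removed:
        old.unlink()
    return removed


def backup_database(database, backup_dir, keep=7):
    """Dump `database` into `backup_dir`, then rotate old dumps."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out_file = backup_dir / f"{database}-{stamp}.sql"
    subprocess.run(build_dump_command(database, out_file), check=True)
    prune_old_backups(backup_dir, database, keep)
    return out_file
```

Two caveats: dumps that sit on the same disk as the database would not have survived my rm, so they need to be copied off the server as well, and this only covers the database, not the files.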

One thing I’ve had trouble finding is a good backup/restore system for remote servers. Anyone out there have a recommendation?
