VPSLink - Unresponsive Support for 17+ Hours
Saturday, January 26th, 2008
Update (1/26/08 @ 11:50pm): Wow - it’s been 17 hours since I last heard from VPSLink regarding my issue with their VPS. Joel Spolsky recently experienced something similar. If I had been going for six nines, my six sigma would be blown for probably the next 100 years =)
Luckily my server isn’t that important. But really? 17 hours? Wow. Even DreamHost will get back to you within 1-2 hours.
The thing I love about VPSLink is that they can partition a new VPS for you within hours.
Normally I have not had any problems with VPSLink, but I kept bumping into memory allocation issues with only 1GB RAM on my VPS, so yesterday I bit the bullet and upgraded to their most powerful option @ $129.95 per month.
That’s when things started to go downhill.
After submitting a ticket, I got this response:
Your VE should have been migrated to a Link-6 node, but for whatever reason, that never happened. I am doing so now. This should fix the problem.
But 11 hours later when I had a chance to take a look at the issue again, the server was still down.
When I restarted via the control panel, the CPU load was spiking for no apparent reason.
I run about 4-5 Rails apps on there with 2-3 mongrels each. Collectively they drive less than 1k visitors / day.
So it’s a bit surprising when my CPU load spikes to 15, 17+!
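I’m no expert on this subject, but as I understand it, a load of 15 means that on average about 15 processes are running or waiting for their turn on the CPU. On a single core, that works out to roughly 1,500% of what the CPU can actually handle!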
Pretty much the server is unusable when the CPU load is this high.
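To put a rough number on that, here is a minimal Python sketch of the arithmetic. The function name load_per_core is made up for illustration, and the single-core figure is an assumption, since I never checked how many cores the VPS exposes. Keep in mind that on Linux the load average also counts processes stuck waiting on disk I/O, so a high load does not always mean the CPU itself is the bottleneck.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores.

    A result of 1.0 means the cores are, on average, exactly fully booked;
    anything above that means work is queueing up.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


# Assumption: a single-core VPS. The post does not say how many cores it has.
for load in (15.0, 17.0):
    ratio = load_per_core(load, cores=1)
    print(f"load {load:.0f} on 1 core is about {ratio * 100:.0f}% of capacity")
```

With more cores the same load looks less dramatic: a load of 17 spread over 4 cores is 4.25 per core, which is still heavily oversubscribed.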
I’ll keep this post updated to let you guys know how VPSLink responds…
Update…
In an attempt to investigate further, I turned off everything I could that was running on the box: nginx, the mongrels, mysqld and sendmail.
This is the list when I do a ps aux:
[root@videolockr init.d]# ps aux
USER       PID %CPU %MEM   VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1936   672 ?        Ss   17:24   0:00 init [3]
root     24101  0.0  0.0  1584   564 ?        Ss   17:24   0:00 syslogd -m 0
dbus     24130  0.0  0.0  2916   704 ?        Ss   17:24   0:00 dbus-daemon --system
root     28168  0.0  0.0  4924  1112 ?        Ss   17:24   0:00 /usr/sbin/sshd
root     28220  0.0  0.0  2156   804 ?        Ss   17:24   0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
root     29869  0.0  0.1  7780  2344 ?        Ss   17:25   0:00 sshd: deploy [priv]
deploy   29925  0.0  0.0  7928  1712 ?        R    17:25   0:00 sshd: deploy@pts/0
deploy   29926  0.0  0.0  2436  1360 pts/0    Ss   17:25   0:00 -bash
root     30428  0.0  0.0  2656  1080 pts/0    S    17:25   0:00 su
root     30552  0.0  0.0  2308  1364 pts/0    S    17:26   0:00 bash
root     30701  0.0  0.1  7780  2344 ?        Ss   17:26   0:00 sshd: deploy [priv]
deploy   31785  0.0  0.0  7780  1700 ?        S    17:26   0:00 sshd: deploy@pts/1
deploy   31786  0.0  0.0  2440  1384 pts/1    Ss   17:26   0:00 -bash
root     32059  0.0  0.0  3152  1112 ?        Ss   17:27   0:00 crond
xfs      32288  0.0  0.0  3088  1152 ?        Ss   17:27   0:00 xfs -droppriv -daemon
deploy    3810  0.0  0.0  2060   992 pts/1    S+   17:47   0:00 top
root      5935  0.0  0.0  2032   820 pts/0    R+   17:49   0:00 ps aux
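If you would rather not eyeball a listing like this, here is a small Python sketch that parses ps aux output and adds up the %CPU column. The helper name parse_ps_aux is my own invention, and the sample is just three rows copied from the listing above; in practice you would feed it the full output.

```python
def parse_ps_aux(text: str) -> list:
    """Parse `ps aux` output into a list of dicts keyed by column name."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces,
        # so split at most len(header) - 1 times.
        fields = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


# Three rows copied from the listing in this post, for illustration only.
SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1936 672 ? Ss 17:24 0:00 init [3]
root 28168 0.0 0.0 4924 1112 ? Ss 17:24 0:00 /usr/sbin/sshd
deploy 3810 0.0 0.0 2060 992 pts/1 S+ 17:47 0:00 top
"""

rows = parse_ps_aux(SAMPLE)
total_cpu = sum(float(row["%CPU"]) for row in rows)
busiest = max(rows, key=lambda row: float(row["%CPU"]))

print(f"total %CPU across {len(rows)} processes: {total_cpu:.1f}")
print(f"busiest process: {busiest['COMMAND']} ({busiest['%CPU']}%)")
```

If every process reports 0.0 like this while the load stays high, then whatever is driving the load is probably not one of these processes burning CPU.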
More on the top command and load averages can be found here, which says:
The current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
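Those same three numbers can be read without top. Here is a minimal Python sketch using os.getloadavg() from the standard library; the wrapper name read_load_averages and the dictionary keys are just labels I picked. The call is only available on Unix-like systems, so the sketch returns None anywhere else.

```python
import os


def read_load_averages():
    """Return the 1-, 5- and 15-minute load averages, or None if unavailable."""
    try:
        one, five, fifteen = os.getloadavg()
    except (AttributeError, OSError):
        # AttributeError: platform has no getloadavg (e.g. Windows).
        # OSError: the load average could not be obtained.
        return None
    return {"1min": one, "5min": five, "15min": fifteen}


averages = read_load_averages()
if averages is None:
    print("load averages are not available on this platform")
else:
    for window, value in averages.items():
        print(f"{window}: {value:.2f}")
```

Comparing the three windows gives a rough trend: a 1-minute figure well below the 15-minute one suggests the load is on its way down.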
So, after stopping all of these processes, the server load eventually went down a bit as reported by “top”. You can see the results below (this is after all of those processes have been stopped):




