Thursday, February 3, 2011

site stall, apparently due to mySQL stall

One site I manage is on a shared hosting server from a major vendor. The site runs a MySQL-backed CMS.

In the past few months, my admin colleague and I have noticed several apparently random occasions when page loading stalls almost immediately after displaying a little bit of generic CMS stuff. No site-specific content at all. We've seen stalls of 15 to 30 minutes. (Inexplicably, no site visitor has ever complained, though there are a lot more of them than us.)

In one case I happened to check the site during a stall and I had time to sign in to the hosting control panel. Suspecting a db issue, I ran phpMyAdmin on the site's db. I got past the preliminaries but the first data query stalled. Eventually, 10 or so minutes later, the problem cleared itself --I certainly didn't fix anything-- and the query returned appropriate data. I checked the site a few seconds later and it was operating normally.

The hosting service tech support has been unresponsive or evasive. In one case, I reported a pure stall and they demanded an error message.

In the most recent stall, my colleague reported this message appeared:

Unable to establish connection to MySQL 2002 : Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

Based on these observations, what is the likely cause of the stalls? A MySQL fault, maybe an overload? Any other possibilities?

What diagnostic logs might be available on the server? What can/should I do to instrument the site?

Does it seem likely that the same underlying problem is being observed, without or with the above-quoted error message? (Maybe the tech staff has recently gotten on the case and added/enabled the message?)

Does this look like something I can take care of or is it a matter of persuading the hosting service to fix?

Are such outages a more-or-less standard "feature" of inexpensive hosting? (The project unfortunately can't afford better.)

Am I missing anything incredibly obvious?

  • If you can get into phpmyadmin, try:

    show processlist

    It may give you some insight if it is a query you're executing that is causing the pause.

    If you can get access to the slow query log, that also MIGHT give you some direction to look. If it is a cheap shared hosting company, you may not get access to that.

    Based on your question, you're connecting to localhost, so, that cuts out most of the network related issues. It is also possible that it isn't a query that you're executing that is causing issues in which case processlist probably won't show you what you need.

    You really need to get your host to sit down, log things, and figure it out since most of the possible issues won't be visible to the username you have and its limited permissions. But, without them being helpful, you're sort of stuck.

    hen3ry : I should have mentioned that it's a small website, under 50 distinct pages, and we observed the stall on the index page. It's simply difficult to believe the problem was a bad query on such a routine operation. -->
    hen3ry : Thanks, I'll try "show processlist" if I ever stumble upon another stall. If it is important to do this, I have been thinking of writing a script on my local client that checks for a stall, say, every 3 minutes, and lets me know about it, but there's no guarantee I'll be available the next time it occurs. Maybe a cron job on the host could do it... -->
    hen3ry : The hosting service doesn't seem to make a slow query log available -- there's nothing mentioned in their support and forum site. It seems my only hope is to document a stall and contact them within, say, 24 hours, and call until I get someone actually responsive. My guess is they will tell me about some additional tool I should have used...
    karmawhore : While you don't think it can be a simple query, depending on their mysql configuration, they may have pinned it to one cpu, or, based on some of the circumstances, you might have hit the maximum number of tables open, in which case mysql closes the least recently used, and opens the new table. If you have other queries being executed for other virtual customers, they are handled in a first come, first served basis (somewhat), and that could explain your issues. 15-30 minutes would be extremely odd, but, could be a severely overloaded server that is just 'catching up' from extreme overload.
    hen3ry : Obviously "shared" is "shared". The key information you provide is that DB accesses are more directly shared, with the implication that the particular configuration (i.e. mis-configuration) might allow extreme hogging by other customers. That said, I was really surprised that phpMyAdmin stalled, too. It seems to me that a such a tool should deal gracefully with such an issue. If the DB doesn't respond within a reasonable amount of time, it should time out and return an error and diagnostic message giving some clues about what's going on. -->
    hen3ry : Maybe this is issue for a MySQL tech forum: Under what conditions in a shared hosting configuration could both ordinary DB queries and phpMyAdmin queries stall? Is there anything an individual hosting service customer with limited privileges do to fix or at least diagnose the issue?
    karmawhore : A single mysql instance runs on a server for a shared server. Depending on the configuration, once you have x threads with a mutex where x = number of cores on the machine, or, a large query that has forced everything into swap, causing the machine to churn away for 15 minutes to clear it, you're at the mercy of the resources at hand. If they have properly configured permissions, it is doubtful you could get enough information. show status;show variables; might give some insight into the config, but, you won't be able to see all processes during an event, or, analyze the slow query log.
    hen3ry : Sounds hopeless. If I were a suspicious type, I'd say that this vendor isn't too concerned about frequent/severe resource blocks as these encourage shared-hosting customers to move to more expensive plans. On top of poor tech support and inadequate documentation, this strengthens my feeling that it is time to move to another hosting service. Just have to figure out how to move 55GB of site data across, and we're out of there. That's a different discussion…
    From karmawhore
  • Unable to establish connection to MySQL 2002 : Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

    The above message means in your case that the MySQL server truly was unavailable. Since it seems to be running on the same host than your CMS, networking problems are out of the question.

    Does the stall happen always around the same time? Maybe the vendor is taking backups and for some reason stopping MySQL for the duration of the backup? Or perhaps they have some other issues with MySQL which causes it to crash and burn and requiring a restart?

    If I were you, I would make a simple monitoring script which would run SHOW GLOBAL STATUS every now and then and keeping an eye if the server constantly runs near its limits (you can get the limits for maximum connections and stuff like that with SHOW VARIABLES). If it does so, I would complain to vendor and kindly ask them to fix the thing.

    On the other hand, if the server usually is not nearly its limits and suddenly has a spike in connections, then you are just observing the effects of shared hosting. It doesn't have to be your site which has problems, it can be someone else.

    Joris : On top of that I'd monitor access failures explicitly (all errors turned on etc). There are numerous reasons why a mysql server would refuse connections or be unavailable that may not show up in the server itself. Best shot imho is to estabelish a pattern and confront the hoster loud and clear.
  • Thanks to everyone who responded. I think the question is pretty much answered.

    I think it is clear that the mySQL server becomes unavailable for what most people would consider unacceptably long times. I may want to instrument the server, seeking to characterize the pattern of these occurrences, as evidence to be submitted to the hosting service's tech support.

    If I am able to access the site's control panel during a stall, people have suggested some useful mySQL diagnostics. I'll try them, but it doesn't much matter what I find; essentially, "unavailable" is "broken" and it is entirely the responsibility of the hosting service to characterize and fix the problem. It's clear that is not a high priority of this vendor.

    It seems clear that some lower standard of site performance must be accepted by users of low-cost shared hosting, and I'm resigned to that. Based on my broad experience with this particular vendor, I feel I will have better results elsewhere, and I'm planning on moving to a different hosting service as soon as I can -- probably a matter of months. Until then, it might be worthwhile to add instrumentation.

    Thanks!

    From hen3ry

0 comments:

Post a Comment