Jayson Minard: Yes, I Crashed the Site!

Jayson Minard wrote a very good article on upgrading a production site: what can go wrong, and what we can learn from it.

Yesterday, I performed an upgrade to a third-party package used with Zend Developer Zone. It has an automated schema update system which silently performs actions on the database that had a large impact on ZDZ and related sites, causing an outage. So, there are good lessons from my post-mortem that I would like to share with the community.

The Start of the Problem

First, let us look at the actual list of actions that started the issue:

1. The upgrade does a schema check on first load

2. The upgrade then corrects the schema to be valid for the new release (performing table changes via DDL)

3. The upgrade then may modify large amounts of existing data, or delete large amounts of old data
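
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in database, and the `comments` table, the `status` column, the `year` column, and the `run_upgrade` function are all hypothetical names, not taken from the package discussed in the article. SQLite's locking also differs from MySQL's, so the sketch shows the sequence of actions, not the locking behaviour.

```python
import sqlite3


def run_upgrade(conn: sqlite3.Connection, cutoff_year: int) -> list[str]:
    """Hypothetical auto-upgrade: check schema, alter it, then purge old rows."""
    actions = []

    # Step 1: schema check on first load (column names are in field 1).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(comments)")}

    # Step 2: correct the schema via DDL if the new column is missing.
    if "status" not in columns:
        conn.execute("ALTER TABLE comments ADD COLUMN status TEXT DEFAULT 'ok'")
        actions.append("added column status")

    # Step 3: modify or delete existing data in bulk.
    cursor = conn.execute("DELETE FROM comments WHERE year < ?", (cutoff_year,))
    actions.append(f"deleted {cursor.rowcount} rows")

    conn.commit()
    return actions


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, year INTEGER)")
    conn.executemany("INSERT INTO comments (year) VALUES (?)",
                     [(2004,), (2005,), (2008,)])
    for action in run_upgrade(conn, cutoff_year=2006):
        print(action)
```

Nothing in `run_upgrade` asks for confirmation or reports what it is about to do before doing it, which is the "silent" aspect described in the quoted text.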

These schema and data updates can cause huge potential issues when the database and tables are used concurrently by the online site. First, the DDL changes will lock the affected tables. For some storage engines (e.g. MyISAM in MySQL) the modifications or deletes will also cause table locks; in other engines they could cause contention on locked rows, or cause things like rollback segments to overflow. The online site then waits behind the locks or contention, holding threads in the web server until no more threads are left to serve actual user requests. No more threads, no more site.
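
The "no more threads, no more site" effect can be reproduced in a small simulation. This is a minimal sketch, assuming a fixed-size worker pool in place of the web server's threads and a plain `threading.Lock` in place of the table lock; `handle_request` and `simulate` are hypothetical names, and no real database or web server is involved.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(table_lock, needs_table: bool) -> str:
    """Serve one request; requests that read the table wait for its lock."""
    if needs_table:
        with table_lock:
            return "served"
    return "served"


def simulate(pool_size: int, blocked_requests: int, lock_hold_seconds: float) -> bool:
    """Return True if a request that never touches the locked table is still
    served while the simulated upgrade holds the table lock."""
    table_lock = threading.Lock()  # stand-in for a table-level lock
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        table_lock.acquire()  # the upgrade starts its DDL
        try:
            # Requests that need the table pile up behind the lock,
            # each one holding a worker thread while it waits.
            waiting = [pool.submit(handle_request, table_lock, True)
                       for _ in range(blocked_requests)]
            # A request that does not touch the table at all.
            unrelated = pool.submit(handle_request, table_lock, False)
            time.sleep(lock_hold_seconds)
            served_during_upgrade = unrelated.done()
        finally:
            table_lock.release()  # the upgrade finishes
        for future in waiting + [unrelated]:
            future.result()
    return served_during_upgrade


if __name__ == "__main__":
    print(simulate(pool_size=4, blocked_requests=3, lock_hold_seconds=0.3))
    print(simulate(pool_size=4, blocked_requests=4, lock_hold_seconds=0.3))
```

With one worker to spare, the unrelated request is served during the upgrade. Once the number of waiting requests reaches the pool size, even a request that never touches the locked table sits in the queue until the lock is released.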

Learn from the problem; via Yes, I Crashed the Site!
