My server has recently hit the perfect storm of having tons of data and a sudden influx of users over a few days all hammering the template store. This lead to my server CPU being maxed out and everything timing out and generally failing. The bulk of the server code was written a long time ago when I had very little data and query efficiency wasn't really of any concern. Over time the data has grown and grown and suddenly my terrible queries have come back to bite me. The SQL involved is quite complex as it has to join data from various tables with various criteria.
Over the weekend I got a book about SQL efficiency and most importantly how to measure and identify the bottlenecks/problems. I discovered that my query for getting the list of users was utterly shambolic in terms of efficiency (I put a lot of effort into making good queries for the macro lists and they are working quite efficiently.). I've spent today reworking and measuring and have deployed a fix that made the user list query about 20x faster. This has been a big help and my CPU usage is now way down. The fun thing is it looks like I can still make this considerably more efficient, but for now this fix is a massive help.
I hope this will stop any more CPU saturation issues. I've learnt a lot in the last few weeks and I may have to have some very short outages to tweak my server settings in the coming weeks, but these should hopefully be short and controlled for only a few minutes (with any luck).