Slack went down on June 10, 2016. I noticed, which is funny, since I don’t necessarily use the tool every day. I tend to still communicate with email, but more and more people like using Slack at Redgate, so I do pop over there. It’s also a good way to drop a quick note and perhaps get a quick reply. In this case I’d asked a group to do something and hadn’t heard back. Since my request didn’t generate a ticket, I didn’t want to send another email, which could result in more delays if someone isn’t processing email. However, that’s not important.
What was interesting is that my Slack web page wouldn’t connect, and when I pinged @slackhq on Twitter, they directed me to status.slack.com. That’s where they post updates. That site was also down, which somewhat defeats the purpose of having a separate location for updates.
I’ve experienced this a few times, where someone (sometimes me) has built an update or notification mechanism that depends in some way on the actual service we’re reporting on. Often this has been because the same HTTP server is being used, but sometimes the same database instance is used to allow non-technical people to post updates. In all those cases, at some point the update mechanism has broken.
I’ve learned to separate my update broadcast mechanism from the production server. We’ve done this in a few ways. I’ve had includes of a simple text file in web applications in addition to a static page that can be served from a web server. I’ve learned to use a separate physical host that can be moved to the proper IP address in the event that our firewall or load balancers don’t work. The key, I’ve learned, is separation. Have a separate resource that can manage a simple message back to users. Perhaps even a small database that can respond to queries with a “we’re down” reply.
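As a rough illustration of the text-file approach, here is a minimal sketch of a standalone status page, assuming Python and only its standard library. The file name `status.txt`, port 8080, and the names `read_status`, `StatusHandler`, and `serve` are all hypothetical choices for this example, not anything Slack or Redgate uses. The point is only that the page shares nothing with production: no application server, no database, just a file anyone can edit.

```python
import http.server
from pathlib import Path

# Hypothetical file name; anyone who can edit a text file can post an update.
STATUS_FILE = Path("status.txt")
DEFAULT_MESSAGE = "All systems operational."


def read_status(path=STATUS_FILE, default=DEFAULT_MESSAGE):
    """Return the current status message, or the default if the file is missing or empty."""
    try:
        text = path.read_text(encoding="utf-8").strip()
    except OSError:
        # A missing or unreadable file should never take the status page down.
        return default
    return text or default


class StatusHandler(http.server.BaseHTTPRequestHandler):
    """Answers every GET with the status message as plain text."""

    def do_GET(self):
        body = read_status().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Cache-Control", "no-store")  # always show the latest note
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the status page on its own port, ideally on its own host."""
    http.server.HTTPServer(("", port), StatusHandler).serve_forever()
```

Calling `serve()` on a machine outside the production path would start the page; during an outage, someone writes a line such as “We’re down and working on it” into `status.txt`. This only helps if the host, and ideally the DNS and network route to it, are also independent of the service being reported on.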
Downtime is never good for users, and rarely are people pleased with being unable to access their system, but good communication goes a long way to soothing the hurt feelings. Most of us accept that systems go down and problems occur. What we’d like is a short note (and updates) that let us know something is being done.