Do you test your ability to respond to emergencies?


IT Operations - Mitigate Disasters

Today in Canada we tested our nationwide emergency response system. Apparently the test failed in the province of Quebec. It did in fact succeed in Ontario, where I live. Knowing about the test, I purposely kept my phone on this afternoon because I was interested in what would actually happen. Sure enough, my phone made a very annoying noise and a message came up to inform me that it was just a test. So that was good.

An important aspect of IT Operations, and Business Operations for that matter, is to be prepared to respond to emergencies. While the Canadian government is worried about responding to inclement weather, terrorist attacks, military attacks, and coffee being sold out at the local Timmies, your IT department should be concerned about ensuring that your systems are running properly, that they are repelling cyber attacks, and that your data centres are operational (to name a few potential issues). This is why the IT Operations process goal includes a process decision point called Mitigate Disasters (see the pic above).

By running this scheduled disaster simulation, after careful planning and communication (which is why I had heard about it), the Canadian government has discovered in a controlled test that their strategy needs work. This is exactly the type of thing you want to find out when you have the luxury of safely addressing any problems that you do find. The government certainly wouldn’t have wanted to discover that their emergency alert system didn’t work as expected in the middle of an actual emergency.

What your organization should ask itself is what would happen if:

  • One of your data centres lost power?  Or connectivity?
  • Some of your servers went down?
  • Outsourced services you rely on (think SaaS, PaaS, and other cloud solutions) went down?
  • An application/system went down?
  • A denial of service (DoS) attack succeeded?
  • Any of the many other things that can go wrong actually happened?

Will your IT ecosystem respond properly?  Will it recover automatically?  Are you guessing at these answers or do you know for sure because you’ve actually simulated them?
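To make "actually simulated them" a bit more concrete, here is a minimal sketch of what a scripted drill could look like. Everything in it is an assumption for illustration: the `Service` class, its `has_failover` flag, and the `run_drill` function are hypothetical stand-ins, not part of any real tool or of the Mitigate Disasters decision point itself. A real drill would fail real components in a controlled way and watch real monitoring.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Service:
    """Hypothetical stand-in for a real component (data centre, server, SaaS dependency)."""
    name: str
    healthy: bool = True
    has_failover: bool = False  # assumption: a simple flag marks an automatic recovery path

    def fail(self) -> None:
        """Simulate the outage."""
        self.healthy = False

    def try_recover(self) -> bool:
        """Recover only if a failover path exists; report the resulting health."""
        if self.has_failover:
            self.healthy = True
        return self.healthy


def run_drill(services: List[Service]) -> Dict[str, bool]:
    """Fail each service in turn and record whether it recovered automatically."""
    results: Dict[str, bool] = {}
    for svc in services:
        svc.fail()
        results[svc.name] = svc.try_recover()
    return results


if __name__ == "__main__":
    # Hypothetical ecosystem: one component with a failover, one without.
    ecosystem = [
        Service("data-centre-power", has_failover=True),
        Service("outsourced-saas"),
    ]
    for name, recovered in run_drill(ecosystem).items():
        print(f"{name}: {'recovered' if recovered else 'needs work'}")
```

The point of the sketch is the shape of the exercise: list what can fail, fail it on purpose, and write down what recovered on its own and what did not. The "needs work" entries are your equivalent of what the government learned today.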

I hope this blog has been food for thought.  Time for a Timmies.


2 comments on “Do you test your ability to respond to emergencies?”

Looks like I wrote too soon. They also had problems in Ontario and not everyone received a call.

This is exactly why you run tests like this.

Valentin Tudor Mocanu

Solar storms? Something like the one of Sept. 2, 1859?
The possible outcomes: continue, pause with easy recovery, pause with irreversible losses, very long-term recovery, etc.

It is also important because of scale: that could affect any or all other related systems, software or not, and also people (rather indirectly).
