by Danial Schwartz
In Disciplined Agile Delivery (DAD), testing is so important we do it all the way through the lifecycle. One approach that your team will need to consider performing is recovery testing, which is used to see the ability of a system to handle faults. If a fault occurs, does the system keep working and does not stop? In case of a fault can the system recover within a specified period of time? In the event of a critical failure will damage such as physical, economical, health related, etc., result or not?
Recovery testing constitutes of making the system fail; then the results of system recovery are observed. The efficiency of the system to return to normal and the time it takes to do so are examined. The disturbances which can result in failure and need to be checked vary from product to product and from industry to industry.
Consider the healthcare industry and medical devices. When products are developed for the health care industry they have to be in strict accordance with FDA guidelines. They also have to adhere to the guidelines provided by the company for which the product is being made. When recovery tests are made they naturally have to comply with these strict rules. The tests require validation and so does the environment in which they are to be carried out.
The Defense Industry consists of complex systems embedded within one another. The interlink of the systems requires recovery testing which takes into account how different systems affect one another. Since the industry has to deal with harsh environmental variables, these have to be replicated for recovery testing. Doing so is no easy task.
Cloud applications are increasing in popularity. They are part of cloud systems. The cloud systems, in turn, are made up of commodity machines. This allows taking advantage of economies of scale. But this results in needing to use complex software which makes recovery testing quite a challenge.
Before a recovery test can be carried out, the software recovery tester has to make sure that recovery analysis has been undertaken. A fail over test is designed. The fail over test serves to determine that if a given threshold is reached, can the system allocate extra resources. It also serves to show if, in case of critical failure, a system can distribute resources and continue to operate or recover within a specified time.
Consider the example of a server which is reachable but it is not responding as one would expect it to. This is the fail-over cause. The result of this, known as the possible impact, could be a crash. The severity of the impact is medium to high. To simulate this one could initiate wrong responses on the server side.
Another example of a fail-over cause is a power supply failure. If the failure was in the auxiliary power source its possible impact could be a complete shutdown. This is critical. To simulate this the system could be subjected to a change in power strength or the power cord could simply be unplugged.
A low impact severity example includes a DB overload. This could result in slow response time. It could also result in information not being fetched from the DB leading to an error. Using appropriate tools a load test could be created to simulate this scenario.
At times a service might stop posing a low to high impact severity depending on the service which stopped. There might not be any possible impact or an application might stop working. To simulate this one could stop the service manually to see the possible impact.
The tester also has to ensure that the test plan and test environment are prepared, information is backed up, the recovery panel has been provided education and a record is kept of the techniques used for recovery.
Use of resources and having to deal with unpredictable possibilities makes recovery testing a daunting task, but its benefits are worth the trouble. First, recovery testing improves the system quality. It removes risk since one knows that in case of a failure the system will continue to work. Second, recovery testing results in a staff which is educated to perform recovery failure when need arise. Third, recovery testing also fixes problems and mistakes in a system before it has to go live. Finally, recovery testing shows how important recovery is and raises awareness of the fact that long term business continuity relies heavily on recovery management.
In conclusion, recovery testing is used to see how a system behaves when failure occurs. Recovery testing can be a tedious process but shows the efficiency of a recovery plan, educates the staff on how to deal with faults and failures which occur in systems, highlights the importance of recovery at times of crisis to members of the IT and business organizations, and shows how important it is to the long term success of a business to have a recovery strategy in case of a disaster.
About the Author
Danial Schwartz is a content strategist who sheds light on various engaging and informative topics related to the health IT and Q&A industry. His belief in technology, compliance and cost reduction have opened new horizons for people in the health care industry. He is passionate about topics such as Affordable Care Act, EHR,testing, test automation, and privacy and security of data.