We always strive to ensure that our concurrent systems operate correctly and maintain a margin of safety even in the face of "expected" faults. In safety-critical systems, failing to maintain that margin may lead to injury or death. A famous case of software failure in a safety-critical system is that of the Therac-25. The Therac-25 was designed to deliver controlled doses of radiation to kill cancer cells in a malignant tumor; it was an updated model of an older machine, but with much more control embedded in the software. Because of software errors, six patients received massive radiation overdoses, and several of them died. There were many causes for this tragedy, but a significant technical problem was a lack of understanding of basic concurrent system principles.
In the course's myCourses Content area, there are two reports written by Nancy Leveson describing the Therac accidents. The reports detail the accidents and the ensuing investigation, and describe the underlying design and process causes of the accidents. The material is mostly the same in both reports, but it is presented in a different order and with a somewhat different focus. The diagrams are clearer in the Safeware version of the report.
While reading the reports, keep an eye out for issues that are related to concurrency. There will be a short, individual quiz at the start of the class in which you do the exercise about the Therac accidents. While taking the quiz, you may have access to the reports and to any notes that you make during your reading. The individual quiz can be worth up to 5 points of extra credit for Exam 2.
In the Safeware version, Section 2 provides important background material on the Therac machine and its operation. The sections describing the software bugs are very important for answering the first question below. You will need to quickly read the sections describing each accident and how it happened. Section 4 is important for question 2, on engineering process, in particular, and for question 1 in general. In the IEEE Computer version, "Genesis of the Therac-25" provides background material. All of the sidebar items have important information for question 1 below, and some have information for question 2. "Accident History" describes the accidents, and within that section "Related Therac-20 problems" starting on page 29 and "Yakima Valley Memorial Hospital 1987" on page 33 contain technical discussion related to question 1. "Lessons Learned" describes a number of areas in which modifications should be made and is mostly relevant to question 2 below.
The class exercise will be to answer the following questions about the accidents.
Each group is to prepare a PowerPoint presentation discussing each of the individual software defects and the software process issues that contributed to the accidents. Aim for no more than three slides for each of the three areas, for a maximum of nine slides in one PowerPoint file.
Three teams will be selected, each to make a presentation to the class and lead a class discussion on one of the three questions above (the two defects and the engineering process).
Before the start of the presentations, deposit your single PowerPoint file into the Therac Case Study class exercise dropbox.