
Showing posts from 2012

Medical tests and software tests, part 2

Medical test interpretation may have lessons for the interpretation of complex software integration test suites. In Chrome, these integration tests are often implemented as "browser tests" or "pyauto tests" that involve spawning a running browser in a separate process and controlling it from the test process. These tests are sufficiently complex that they have a significant false-positive failure rate, which makes a report of a test failure difficult to interpret. The situation resembles common medical test panels. For example, a "urinalysis" is a set of several tests of urine: concentration, protein level, signs of white blood cell activity, etc. In general, medical tests are not ordered unless a specific potential illness is being investigated, but sometimes they are ordered routinely and come back with unexpected abnormalities. How do doctors deal with this? A common abnormal finding on a urinalysis is a trace amount of blood. Blood can be a sign of
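The base-rate reasoning a doctor applies to an unexpected abnormal result can be sketched with Bayes' rule. Here is a minimal Python sketch with illustrative numbers I have assumed (not measured Chrome data): a 5% chance that a given change introduced a real regression, a test that catches 90% of real regressions, and a 5% flaky-failure rate.

```python
def posterior_bug_probability(prior_bug, sensitivity, false_positive_rate):
    """P(real bug | test failed), by Bayes' rule."""
    p_fail = prior_bug * sensitivity + (1 - prior_bug) * false_positive_rate
    return prior_bug * sensitivity / p_fail

# Illustrative numbers, assumed for the sketch: 5% of changes introduce a
# real regression, the browser test catches 90% of them, and it fails
# spuriously 5% of the time.
p = posterior_bug_probability(prior_bug=0.05, sensitivity=0.90,
                              false_positive_rate=0.05)
print(round(p, 2))  # → 0.49
```

Under these assumptions a red run means only a coin-flip chance of a real regression, which is why a single failure of a flaky integration test, like a single trace of blood on a urinalysis, warrants a retest rather than a diagnosis.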

"Bugs" vs. "defects" in software

Errors in software have been called "bugs" since at least the 1940s, when Grace Hopper's team famously found a moth stuck in a relay of the Mark II computer at Harvard University. The term has stuck, with a vengeance. Now even end users of software refer to misbehaving software as "buggy" or worse as "bugged." I dislike the term bug. To me, the word bug makes it sound like the error crept into the software from the outside. The code was fine, until the bugs got to it. Or worse, it sounds like the software spoiled somehow—we left it on the shelf too long and it got "buggy". But software doesn't work that way. Computers are highly deterministic. If the software misbehaves, to a first, second, and third approximation it's because the programmer made a mistake. It might be an excusable mistake—software is incredibly complex and can be hard to understand. An innocuous change in module A can cause a catastrophic failure in module B. But

Software automated tests versus medical tests

Medical tests can teach us about how we interpret software automated test results. Ever have blood work ordered by your doctor? Say you’re having some stomach pain and it’s worse than the usual upset stomach. If you see your doctor he might order some “liver tests” with names like ALT (alanine aminotransferase), bilirubin, and alkaline phosphatase. Each test result will be printed next to a “normal range”, like bilirubin = 1.0 (0.3 to 1.9) mg/dL. Even if one or two of the tests are out of range your doctor may say they are “normal” and tell you your liver is working OK. Doctors have been doing this sort of laboratory test analysis for a hundred years. Can it teach us anything about unit tests, integration tests, and the types of results we encounter in software? Software is probably the most complex thing humans have ever made. If each individual variable was the equivalent of a mechanical device “moving part” then even a medium-sized program would be more complicated than the Space Shu
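The “one or two out of range can still be normal” judgment has a simple arithmetic basis: reference ranges are commonly defined to contain about 95% of healthy people, so a panel of many tests will flag a healthy patient fairly often. A quick sketch of that arithmetic, assuming the tests are independent (a simplification; real lab values are correlated):

```python
def prob_any_flag(num_tests, in_range_prob=0.95):
    """Chance a healthy patient shows at least one out-of-range result
    on a panel of independent tests, each with a 95% reference range."""
    return 1 - in_range_prob ** num_tests

# A three-test liver panel vs. a hypothetical 20-test panel:
print(round(prob_any_flag(3), 2))   # → 0.14
print(round(prob_any_flag(20), 2))  # → 0.64
```

The same arithmetic applies to a suite of flaky integration tests: run enough of them per build and some failure becomes the expected outcome for perfectly healthy code, so the interesting signal is the pattern of failures, not any single red result.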