Summary of the book — Perfect Software: And Other Illusions about Testing

Mar 27, 2021

Jerry Weinberg and his writings had a profound influence on me over the years. His book “Perfect Software: And Other Illusions about Testing” is a well-known classic in the field and a must-read for anyone involved in the software development life cycle. It is one of my favorite books, so I wrote this long summary, and I do hope you enjoy it. If you do, I’d love for you to read the book itself and the other masterpieces by Gerald (Jerry) Weinberg. His books changed the way I think, work, and live! I miss him!

Perfect Software: Summary

Chapter 1. Why Do We Bother Testing?

Testing has its roots in psychology, in the study of the behavior of the human brain. If humans were perfect thinkers, we wouldn’t need to test our work. But we are imperfect, irrational, value-driven, diverse human beings. Therefore, we test, and we test our testing.

The following reasons show why we should bother testing:

1. We think imperfectly

If your thinking were perfect, you would not need testing. But if you say, “Well, before I invested my life savings, I might want to do a little testing,” then you won’t always be upset by the very idea of testing.

2. We need to make decisions about software

At the simplest level, people are doing testing all the time. We test to be sure our software isn’t rejected for any of the following reasons:

• it fails to satisfy an essential customer constraint;

• it doesn’t do the things people want it to do;

• it imposes unacceptable costs and/or requirements on our customers; and

• its failures destroy customer confidence.

3. Decisions can be risky

Risky decisions call for risk-reducing information. If the decisions concern software that is being built, borrowed, or bought, testing may be one of the ways to obtain that information. Good testing involves balancing the need to mitigate risk against the risk of trying to gather too much information. Before you even begin to test, ask yourself: What questions do I have about this product’s risks? Will testing help answer these questions?

If you don’t have questions about a product’s risks, then there’s no reason to test. If you have at least one such question, then ask: Will these tests cost more to execute than their answers will be worth?

4. To reduce risks

Different projects may mean different perceptions of risks and thus different sorts of questions that testing might be designed to help answer. For instance, a few basic questions we might ask are:

· Does the software do what we want it to do?

· If the software doesn’t do what we want it to do, how much work will be involved in fixing it?

· Does the software not do what we don’t want it to do?

· Does the software do what we intended?

· Will the software do what our customers want?

· Will the software satisfy other business needs?

· What are the likelihood and consequences of failure?

Since testing provides information that can reduce risk of failure, the amount of testing we do is influenced by both the likelihood and consequences of failure, as suggested by the following table:

• Low Likelihood of Failure, Low Consequences of Failure → Least Testing

• Low Likelihood of Failure, High Consequences of Failure → Moderate Testing

• High Likelihood of Failure, Low Consequences of Failure → Moderate Testing

• High Likelihood of Failure, High Consequences of Failure → Most Testing

Chapter 2. What Testing Cannot Do

You don’t have to test if, for any reason, you’re not going to use the resulting information. And, if the information isn’t going to be relevant or reliable, you’d better not use it, so don’t bother to buy it in the first place. Here is why:

1. Information doesn’t necessarily help reduce risk

Managers in the software business frequently have to make risky decisions, which can be made less dangerous if based on answers to questions such as the following:

• When should we ship?

• Should we cancel the project or continue the project and add more resources?

• Should we reduce the scope of the project?

• Should we attempt to bring the product to a wider market?

• Should we reduce the price?

More often, such decisions are made after the decision-maker has obtained some bit of information, while resolutely ignoring other information that might be obtained. Since testing is an information-gathering process, there’s always a question of whether it’s worthwhile to pay for more testing.

2. Testing isn’t free

Sometimes, the information produced also adds to the risk. If the developers have information that something isn’t working well, they may want to spend time fixing it while the decision-maker may want to ignore it. If people sue you because of an error in software you developed and sold, they can subpoena the records of your development process. If the records indicate that your testers found this bug and you didn’t fix it, you’re in more trouble than if they’d never found it in the first place. Sometimes, ignorance really is (legal) bliss.

3. We may not be using the information we’re paying for

There’s never an easy answer to the question “Should we do more testing?” because information can guide risk reduction, but doesn’t necessarily do so. Sometimes, when testers don’t generate reams of paper, it’s easy to decide you’re not getting your money’s worth. You fall victim to equating information quality with data quantity (bug counts, pages of test reports, number of test cases run, and so on).

To obtain your money’s worth from testing, you will have to continue the process of questioning what’s going on beyond what’s stated explicitly in test reports. If you don’t pursue the questions beneath the surface, you generally won’t obtain your money’s worth out of your investment in testing. So, if you’re not going to use the information generated, don’t pay for tests.

4. Our decisions are emotional, not rational

People have an emotional investment in not finding out that they’ve made mistakes. Some managers don’t want to know that their project is headed down the slippery slope to failure. Some developers don’t want people to know that their code is buggy.

5. Your product may not be ready for testing

The following questions will help you tell when you’re not ready to test:

• Is there at least one question about your product that testing can help you answer? If yes, do you really want to know the answer?

• Are you willing to evaluate or act on the answers to your questions?

• Can you agree with testers in advance as to what constitutes a passed test?

• Do you think the outcome of testing will make your decisions for you?

• Conversely, is there any possible test outcome that would make you change your decisions?

Chapter 3. Why Not Just Test Everything?

So why not just test everything? Because it is simply impossible. Why? First, the human brain not only makes mistakes but also has a finite capacity. Second, nobody lives forever. Besides, in most situations it would cost too much, because the number of possible tests for any given program is infinite. Let’s see why.

1. There are an infinite number of possible tests

Let’s think of the simplest program we could conceive of to test: a program whose function will be to respond to tapping on the space bar by putting a “Hello Dolly!” message on the screen. What would we test for? To keep it simple, we’d want to test that every time we pressed the space bar, we got the “Hello Dolly!” message, and every time we pressed any other key or combination of keys, we never got anything on the screen.

If you can’t see inside the program, how many test cases would you need to execute in order to bet your life that you’ve tested all possibilities? Identify the number of test cases, write that number down, and then read on to see if you got it right. Since you can’t see inside the program, you have no idea what bizarre conditions the programmer might have set up. For example, the program might be designed so that everything looks normal on the outside unless the user types an extremely unlikely sequence of keystrokes.

Simply put, testing can reveal the presence of bugs, not their absence.
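To make the black-box problem concrete, here is a minimal Python sketch of a hypothetical “Hello Dolly!” program with exactly the kind of hidden condition described above. The trigger sequence `xyzzy` and the names `make_program` and `run_tests` are made up for illustration. A plausible-looking test set passes; only the one sequence a tester has no reason to try exposes the bug.

```python
import string


def make_program(secret: str = "xyzzy"):
    """Return a keystroke handler for a hypothetical 'Hello Dolly!' program.

    From the outside it behaves as specified, except that a space typed right
    after the (made-up) `secret` sequence prints the wrong message.
    """
    typed = ""

    def on_key(key: str) -> str:
        nonlocal typed
        triggered = typed.endswith(secret)  # hidden condition, invisible to a black-box tester
        typed += key
        if key == " ":
            return "Goodbye Dolly!" if triggered else "Hello Dolly!"
        return ""  # any other key: nothing on screen

    return on_key


def run_tests(program_factory, sequences):
    """Feed each keystroke sequence to a fresh program; return the sequences that fail."""
    failures = []
    for seq in sequences:
        on_key = program_factory()
        for key in seq:
            expected = "Hello Dolly!" if key == " " else ""
            if on_key(key) != expected:
                failures.append(seq)
                break
    return failures


# A "reasonable" test set passes...
sample = [" ", "a", "a ", " " * 100, string.ascii_lowercase + " "]
print(run_tests(make_program, sample))      # []
# ...but the one sequence nobody thought to try does not.
print(run_tests(make_program, ["xyzzy "]))  # ['xyzzy ']
```

Adding more sequences to `sample` would not settle the matter: without seeing inside the program, a tester has no way of knowing which of the infinitely many sequences matter.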

2. Testing is, at best, sampling

Since we can’t test everything, any set of real tests is some kind of sample — a portion that is in some way representative of a whole set of possible tests. We hope it’s a good representative, but that brings up the question, “Good for whom?” Fundamentally, sampling is yet another psychological process — and an emotional one. A sample that satisfies one person may not satisfy another in the slightest.

3. The cost of information can exceed the cost of ignorance

The impossibility of exhaustive testing squeezes us between two difficult and simultaneously desirable objectives:

1. We want to cover all interesting conditions.

2. We want to reduce the set of tests to a manageable, affordable level.

Consider the number of times testers stumble across a critical bug when they aren’t looking for it. They find it because they are lucky (or unlucky). The bug just appears. But is it pure luck? Is there some psychology to figuring out how to find more of these surprise bugs?

4. We can obtain more information with less testing — perhaps

These days, people are more concerned about reducing the set of tests to a manageable, affordable level. They’re asked to subsist with smaller teams and greater responsibilities.

Consider this dilemma: because of a bad economy, a company has to downsize its test team from thirty testers to three, yet the remaining testers are still supposed to “ensure” the product. How do you decide what to test? One argument says that testers can’t “ensure” anything at all, so they shouldn’t even try. But that argument won’t persuade an executive staff struggling to keep a firm afloat. Admittedly, a downsized team can’t do everything the larger staff used to do, but it can identify the tests that make the best use of limited resources.

Chapter 4. What’s the Difference Between Testing and Debugging?

The practice of testing has often received a bad reputation because of confusion over exactly what it is and what it’s supposed to do. If a manager wants to make any progress understanding and improving testing, he or she is going to have to clarify the process. Paradoxically, that clarification may be more difficult if the manager has a great deal of experience developing software. Let’s see how this works.

Early in her career, Jane, a project manager, was a one-woman development and support team in a tiny software consulting company that wrote custom applications for small businesses. Her job required that she wear many hats. On a typical day, Jane did all of these tasks, and that’s where she learned the most about testing. Let’s look back and travel with her through part of such a day.

· Testing for discovery: Jane is testing: performing actions on the software under test that have the potential to discover new information.

· Pinpointing: Jane is still testing: performing actions on the software under test that have the potential to discover new information. In this case, the new information is that the bug is repeatable. In determining that the bug is repeatable, Jane’s also beginning another process. Jane is now pinpointing: isolating the conditions under which the bug occurs.

· Locating: Jane is now locating: finding the location of the bug in the code so that she can fix it.

· Determining significance: Jane is now determining significance: balancing the risks of repairing versus not repairing this bug in the code.

· Repairing: Jane is now repairing: changing code to remove the problem.

· Troubleshooting: Jane is troubleshooting: trying to remove and/or work around obstacles to make the software do what it’s supposed to do.

· Testing to learn: Jane is testing again, but she’s not trying to find bugs in the graphics package. We could say she’s testing to learn: an essential skill that might otherwise be called hacking, reverse engineering, or playing.
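Pinpointing is the activity most amenable to a mechanical sketch. One common technique (not from the book) is to shrink a failing sequence of steps for as long as the failure still reproduces, a greatly simplified cousin of delta debugging. The sketch below assumes the bug is repeatable, i.e. that a hypothetical `fails` check gives the same answer every time for the same steps; the step names and the `buggy` condition are invented for illustration.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def pinpoint(steps: List[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily remove steps that are not needed to reproduce a failure.

    Assumes the failure is repeatable: `fails` returns the same result
    every time it is given the same steps.
    """
    if not fails(steps):
        raise ValueError("start from a reproducible failure")
    current = list(steps)
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current = candidate  # step i was irrelevant; drop it and start over
                shrunk = True
                break
    return current


def buggy(steps: List[str]) -> bool:
    """Hypothetical bug: appears whenever 'paste' is followed later by 'undo'."""
    return ("paste" in steps and "undo" in steps
            and steps.index("paste") < steps.index("undo"))


session = ["open", "type", "paste", "scroll", "save", "undo", "close"]
print(pinpoint(session, buggy))  # ['paste', 'undo']
```

The seven-step session shrinks to the two conditions under which the bug occurs. Locating the bug in the code remains a separate job.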

1. Task-switching

Jane easily shifts among all the aforementioned activities. As long as she continues to serve her customers, it doesn’t really matter which task she performs at any given time. However, in larger organizations with dedicated testers and/or customer support personnel, confusion about the differences among all these testing processes can lead to conflict and failed projects.

2. What happens to testing as an organization grows?

Jane has left the tiny software consulting company and has moved to a much larger firm as project manager for one of its commercial software products. In a larger company, the problem is that no one in the organization agrees on the extent to which testers are expected to pinpoint, so conflict is inevitable. The distinctions among testing for discovery, pinpointing, locating, determining significance, repairing, troubleshooting, and testing to learn become even more important when we are under pressure to speed up testing. Sometimes it’s difficult to know exactly which task we’re doing. When we try to learn whether a bug is repeatable, for example, are we testing or pinpointing?

As a project manager, Jane can end the confusion as to who is responsible for what tasks by avoiding lumping these types of activities into one big blob labeled Testing.

3. Make the time-limit heuristic a management mantra — but adjust it as needed

A heuristic that helps untangle who does what for how long states the concept simply: Nobody on a project should carelessly waste the time of anyone on the project.

Jane applies this principle by teaching her testers to spend no more than ten minutes investigating a bug before alerting the programmer. She understands that what may be an obscure problem to a software tester may be an obvious one to the programmer, in which case the tester would be wasting his own time by spending more than ten minutes before seeking more knowledgeable assistance.

Chapter 5. Meta-Testing

The job of software testing, we know, is to provide information about the quality of a product. Many people believe that the only way to get such information is to execute the software on computers, or at least to review code. But such a belief is extremely limiting. Why? There’s always other information about product quality just lying around for the taking — but only by managers who are watching, and who recognize it as relevant. Because of their psychological distance from a client’s problems, external consultants are often able to see information that escapes the eyes and ears of their clients.

1. We have specs, but we can’t find them

I was asked to help an organization assess its development processes, including testing. I asked the test manager, “Do you use specs to test against?” He replied, “Yes, we certainly do.” “May I see them?” “You could,” he said, “but we don’t know where they are.” The inability to find key documents is pretty much Strike Three against an organization’s development process. I didn’t need details beyond this one piece of meta-information (information about the quality of information) to know that this organization was a mess.

2. We didn’t find many bugs, though we didn’t really look

Irene was asked to help improve the testing of a product with 22 components. The client identified “the worst three components” by their high number of bugs found per line of code. Irene asked which were the three best components and was given their very low bugs-per-line-of-code figures. When she examined the configuration management logs, she discovered that for each of these three “best” components, more than 70% of the code had not yet been successfully checked in for a build.

Here, we easily learn how clueless the client is about its measurement system: the “best” components showed few bugs per line largely because most of their code had never been built, let alone tested. The client has no idea how to measure quality, so it isn’t likely to arrive at quality.

3. We modify our records to make bugs look less severe

When Linda was invited into a company to help the chief development manager evaluate his testing department’s work, she noticed that the severity counts had been covered with white-out and written over. Under each white-out in the highest-severity column was a higher printed number; under each white-out in the lowest-severity column was a lower one. In other words, bugs had been moved from the most severe category to the least severe. Puzzled, she asked the development manager for an explanation. “Those are corrections by the product’s development manager,” he explained. “Sometimes the testers don’t assign the proper severity, so the product development manager corrects them.”

That no one in this organization seems to have given this practice a moment’s thought is strong evidence that its trouble goes much deeper than simply falsifying records.

4. It’s not in my component, so I don’t record it

I was watching a tester run tests on one component of a software product. I noticed an error in one of the menus and pointed it out to the tester. He navigated around the error, but when I asked how he would document it, he said, “Oh, I don’t have to. It’s not in my component.”

This kind of falsification is likely to be even more serious than the previous case. That case involved one manager who was falsifying records. This case may prove to be a widespread attitude in the culture of the organization — a defect that will be much more difficult to uproot.

5. We don’t test the worst components because it takes too long

Called in to evaluate an organization’s process, I asked the development manager whether the developers on her project unit-tested their code. “Absolutely, they test almost all of it,” she said. “Almost all?” I asked. “Which code didn’t they unit-test?” “Oh, some of the code is late, so of course we don’t have time to unit-test that, or we’d have to delay the start of systems test.”

Here we see a sign of an oblivious manager. Why is nobody asking why certain code is late? If they asked, they would find out that it’s late because the developers had trouble making it work. If any code should be tested most thoroughly, it’s the late code.

6. If our software works okay for three users, obviously it will work okay for 100

When I asked several testers about scheduling performance testing, they replied, “We’ve already done that. We ran the system with one user, and the response time was about ten milliseconds. With three users, it was thirty milliseconds.”

“But the system is supposed to support at least a hundred simultaneous users. So what response time did you get when it was fully loaded?” I asked.

“Oh, that test would have been too hard to set up.”

This group of testers was committing The Linearity Fallacy (a form of The Composition Fallacy, which assumes that two small systems joined together make just twice as big a system, but no more). But the people in this organization don’t understand The Composition Fallacy, so I would predict that their testing situation will grow worse and worse as their system grows bigger.
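Two measurements cannot distinguish linear growth from any other curve that passes through the same two points. The following Python sketch makes the point with a purely hypothetical contention term, `k * (n - 1) * (n - 3)`, chosen only because it vanishes at one and three users; it is a curve-fitting illustration, not a performance model of any real system.

```python
def linear(n: int) -> float:
    """The testers' implicit model: 10 ms of response time per user."""
    return 10.0 * n


def with_contention(n: int, k: float = 1.0) -> float:
    """A hypothetical alternative with an extra contention term.

    The term k*(n-1)*(n-3) is zero at n=1 and n=3, so this model matches
    the same two measurements exactly. k (in ms) is a made-up constant.
    """
    return 10.0 * n + k * (n - 1) * (n - 3)


for users in (1, 3, 100):
    print(users, linear(users), with_contention(users))
# 1 10.0 10.0
# 3 30.0 30.0
# 100 1000.0 10603.0
```

Both models agree exactly on everything the testers measured, yet they differ by a factor of about ten at the hundred users the system must support. Only the fully loaded test can tell them apart.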

In short, you can greatly improve the efficacy of your testing, and lower your costs, if you learn to use meta-information — information about the quality of information.

Chapter 6. Information Immunity

Although the purpose of testing is to provide information, people often view the information as threatening. Consider the underlying fears reflected in the following comments:

Project Manager: “With all these bugs, I won’t make my schedule.”

Developer: “That stupid error shows people I’m not as good a programmer.”

Marketing Manager: “This product is so full of bugs, it isn’t going to sell very well.”

Tester: “My boss is going to bite my head off if I report this error this late in the game.”

Because information can be so threatening, we’ve all developed an “immune system” that tends to protect us from information we don’t want to hear. So what does a manager have to watch for?

1. We repress the unacceptable

Repression is denying or overlooking what we deem to be unacceptable thoughts, feelings, and memories — for example, while condemning another’s late arrival, we overlook the fact that we have been late on similar occasions in the past. Every form of defensive behavior probably involves repression in some form or another. Repression can be conscious or it can be unconscious, as when people shade the truth to lead themselves or others away from perceived danger.

2. We project our own negative qualities onto other people

Projection, when negative, is criticizing others for having the same qualities we dislike in ourselves. A closely related defense is displacement: redirecting blame and fear onto someone else. The following are some common displacement complaints we hear when a developer is faced with an unwelcome problem report from a tester.

First, there is displacement onto the tester:

• “If you can’t reproduce it, I can’t do anything about it.”

• “You’re being too picky.”

If there’s no obvious tester involved, a developer may displace blame and fear onto other developers:

• “It’s their code.”

• “It’s not my code.”

Developers may also displace blame and fear onto their managers:

• “They think new features are more important than working features, so I’m putting in new features now.”

• “I have to go to too many management meetings, so I don’t have time.”

3. We overcompensate for our self-perceived deficiencies

Overcompensation is exaggerating our attempts to compensate for some real or imagined personal deficiency.

Avery tested an administrative tool that configured backups. When creating his test cases, he forgot to include cases that tested backups to CDs. When this omission was pointed out, Avery overcompensated by creating test cases for more than 140 different pieces of hardware to which backups might be made — including paper tape, which hadn’t been used in decades.

4. We become compulsive when we feel we’re losing control

Compulsiveness is being unable to depart from a counterproductive behavior pattern — for example, a person is incapable of allowing small deviations from a defined process.

Therefore, to assess testing information, you must take into account the emotional defenses described above. By remaining vigilant, thoughtful, and pragmatic, you can help defuse emotional chaos and prevent illogical processes from undermining your testing efforts.

Chapter 7. How to Deal with Defensive Reactions

You don’t have to be a psychologist to deal more effectively with defensive reactions that make people immune to important information. How would you know that a reaction is defensive?

There are some heuristics that help. For instance, does the reaction seem out of proportion to its ostensible cause?

1. Identify the fear

From the outset, you need to understand that fear drives defensive reactions — although the underlying fear will generally not be visible to you. Nevertheless, it’s there. See whether you can identify what a person fears, then see what happens when you find a way to reduce that fear.

2. Practice, practice, practice

With sufficient practice, you’ll grow much better at recognizing defensive reactions and dealing with them. Some of them are so common that you’ll hear them repeatedly, such as:

• “It’s for the user’s own good.”

• “It’s too [risky] [costly] [hard] to fix.”

• “No one will [notice] [care].”

What makes these defenses effective is that there’s some truth in each of them, some circumstances in which they might be rational, not rationalizations.

3. Test yourself

Practice by deciding how you would respond to the following situation, described to me by a fellow consultant:

Test:

1. Lower the virtual memory to the minimum needed to run the OS.

2. Launch a document in the application.

3. Edit the document.

4. Save the document.

Result: Crash, requiring reboot of the entire system.

When given the bug report, the developer responded, shaking his head vigorously, “Oh. That’s not a bug.” “What? It crashes.” “Look,” shrugged the developer. “It can crash or it can corrupt your data. Take your pick.”

This is the common defensive reaction we call “It could be worse.” And, yes, it could be worse. This might be the best developer in your organization.

Chapter 8. What Makes a Good Test?

You’ll never know for sure whether your testing was done well, but there are many ways to know or estimate if it was done badly.

1. You may want to insert bugs intentionally

Sometimes, you can gain quantitative estimates of how many problems might remain in software by seeding (or “bebugging,” a term I believe I coined — and wrote about — in The Psychology of Computer Programming). Insert known bugs without telling the testers, then estimate the number of remaining unknown bugs by the percentage of known bugs they find.

For bebugging to work at all reasonably, the distribution of inserted bugs must closely match the (unknown) distribution of the unknown bugs. Be careful, though — it doesn’t give a great deal of reliable information even if all the known bugs are found.
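The arithmetic behind bebugging fits in a few lines. This Python sketch assumes testers find native (unknown) bugs at the same rate as seeded ones, which is exactly the “same distribution” condition above; the function name and the example numbers are hypothetical.

```python
def estimate_remaining(seeded_total: int, seeded_found: int, native_found: int) -> float:
    """Estimate how many native (unknown) bugs remain unfound.

    Assumes testers detect native bugs at the same rate as seeded ones.
    """
    if seeded_found == 0:
        raise ValueError("no seeded bugs found, so the detection rate is unknown")
    detection_rate = seeded_found / seeded_total  # e.g. 15 of 20 seeded -> 0.75
    estimated_native_total = native_found / detection_rate
    return estimated_native_total - native_found


# Hypothetical numbers: 20 bugs seeded; testers found 15 of them plus 30 native bugs.
print(estimate_remaining(20, 15, 30))  # 10.0
# All seeded bugs found: the estimate drops to zero.
print(estimate_remaining(20, 20, 30))  # 0.0
```

The second call shows the caveat in action: finding every seeded bug drives the estimate to zero remaining bugs, which is precisely when the number deserves the least trust.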

2. You can estimate not-badness

At its deepest technical level, testing involves some rather esoteric logic. To assess whether these esoteric activities have been done well, most managers must rely on second and third opinions from independent experts. There are, however, many assessments of not-badness most managers can make themselves by answering the following kinds of questions:

• Does testing purport to give me the information I’m after?

• Is it documented? If not, is it observed, reported, or performed by someone you trust?

• Can I understand it? If you can’t, how can you possibly know whether it’s good or bad?

• Is it actually finished? Do you have ways of knowing what was actually done?

• Can I tell the difference between a test and a demonstration? Demonstrations are designed to make a system look good. Tests should be designed to make it look the way it truly is.

• Are trends and status reports overly simplistic and regular? If test status reports show extremely predictable trends, testing may be shallow or the reports may be leaving out something important.

• Are there inconsistencies between different kinds of test activities?

Chapter 9. Major Fallacies About Testing

Let’s start our exploration of why bad testing occurs by looking at some fallacies in action.

1. The Blaming Fallacy

The more time and effort someone spends looking for someone else to blame for a problem, the less the chance of solving the problem. Fallout from The Blaming Fallacy is quite common: a manager passes the blame to the first vulnerable person he sees, who in turn passes it to another vulnerable person, and so on down the line.

2. The Exhaustive Testing Fallacy

It’s never possible to test everything. The only real kind of exhaustive testing is when the tester is too exhausted to continue.

3. The Decomposition Fallacy

The Decomposition Fallacy is expecting the whole system to work because you have tested its parts. Unfortunately, it doesn’t work like that: parts that pass their own tests can still fail when they interact.
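A small Python sketch of the Decomposition Fallacy, using two hypothetical functions: each passes its own unit test, yet the combination is wrong because one works in metres and the other expects feet. The functions, values, and threshold are invented for illustration.

```python
FEET_PER_METRE = 3.28084


def read_altitude_m() -> float:
    """Part A (hypothetical sensor stub): returns altitude in metres."""
    return 1000.0


def clearance_ok(altitude_ft: float, min_ft: float = 3000.0) -> bool:
    """Part B (hypothetical check): expects altitude in feet."""
    return altitude_ft >= min_ft


# Each part passes its own unit test...
assert read_altitude_m() == 1000.0
assert clearance_ok(3280.8) is True

# ...but wiring them together silently mixes metres and feet.
print(clearance_ok(read_altitude_m()))                   # False (wrong)
print(clearance_ok(read_altitude_m() * FEET_PER_METRE))  # True (1000 m is about 3281 ft)
```

Only a test of the parts working together exposes the mismatch.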

4. The Composition Fallacy

The Composition Fallacy is expecting the parts that make up the system to have been tested because you tested the system as a whole. It doesn’t work that way either. Yes, if a module is so broken that the system can’t function, we may see a problem. But a system test won’t exercise the modules the way testers should in order to say they really tested them.

So, learning to recognize a handful of major fallacies about testing could eliminate half the gross mistakes project managers make.

Chapter 10. Testing Is More Than Banging Keys

To qualify as a test, an action has to seek information that will influence action, whether or not it involves banging on keys.

1. Computers can’t read minds

Computers do what you tell them to do, whether or not that’s what you really had in mind.

2. Coverage tests do not prove that something is tested

Showing that every part of the code has been touched by some test or other doesn’t mean those parts have been thoroughly tested. Nor can code coverage tell you that all functions have been thoroughly tested. For that to be true, the tests have to be analyzed for relevance and comprehensiveness.
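A minimal Python illustration of the gap between coverage and testing: a single test executes every line of the hypothetical `average` function, so line coverage is 100%, yet an obvious input still crashes it.

```python
from typing import Sequence


def average(values: Sequence[float]) -> float:
    """Return the arithmetic mean of `values`."""
    return sum(values) / len(values)


def test_average() -> None:
    # This single test executes every line of `average`: 100% line coverage.
    assert average([2.0, 4.0]) == 3.0


test_average()  # passes

# Yet an untested input still fails:
# average([])  raises ZeroDivisionError
```

The coverage report would look perfect; only asking what inputs matter (here, an empty sequence) reveals the missing test.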

3. Process documents are not processes

A process is what you actually do. A process document describes what someone would like you to do, ideally. They rarely coincide exactly, and sometimes don’t overlap at all. Most processes aren’t documented at all, which generally is a good thing; otherwise, we would be crushed by the weight of endless documents. It is better to spend your valuable time observing processes — what people actually do. You can use the time you save to decide which few processes would be best backed up by precise documents.

4. Documents are not always facts

The previous point is a general case for such special cases as confusing test scripts with tests, confusing test reports with tests, and confusing requirements documents with requirements.

5. Don’t be a micromanager

Failing to test your testers and testing them too much are two extremes to avoid. Everyone’s job needs to be evaluated, but not incessantly. At some point, you have to trust people to do their jobs without a boss standing over them.

6. Demonstrations are not tests

You can be on the sending or receiving end of this one. I’m not sure which is worse — fooling others or fooling yourself.

Chapter 11. Information Intake

Intake is an active process. Try to be aware of factors that limit your intake, sources of information, and how data may be flavored by biased meaning.

1. Use the Satir Interaction Model to unravel communications

The full Satir Interaction Model was designed to account for what happens when people interact.

You can use the model to improve how you receive and give information about testing, such as when you are doing activities like the following:

• observing the behavior of systems under test

• interacting directly with others

• writing and understanding testing reports

• watching users at work, as in usability testing or beta testing

• presenting observations and conclusions

• observing yourself at work, in order to improve

The Satir Interaction Model breaks down any communication process into four major parts: intake, meaning, significance, and response.

Intake: During the intake part of the process, a person takes information from the world. Intake does not “just happen”; it also involves a selection process. Whether or not we recognize that we are being selective as we observe, we are actually exercising a great many choices about what we see and hear.

Meaning: During the meaning part of the process, someone considers the sensory intake and gives it a meaning. Some people believe that the meaning lies in the data, but the same intake can have many different meanings. The meaning process also interacts with the intake process. For example, certain inferred meanings may lead us to take in more information or different information.

Significance: Data may suggest certain meanings, but never their significance. Only the receiver can determine significance. The world we perceive would be an overwhelming flood of data if we did not categorize and select information in the context of its significance.

Response: During the response part of the process, a person formulates an action to take. Software testers and their managers are observers, but they are never passive observers. They may not respond immediately to everything and anything they observe, but they do sift through and assign priorities to observations according to how important they are to the observer, and store them away to guide future actions. Testers are not interested in observation without reference to possible action.

2. The source of the data influences intake

People tend to make different responses to information received from different sources — often predetermining the importance of a message based on the sender. If you’re having trouble getting people to pay attention to your test reports, try sending them via someone whose reputation will make the recipients change or drop their intake filters.

If you discover that others are more willing to accept information you have to convey when another person delivers it, think of the knowledge as a test result. Continue testing by asking yourself, “What have I done in the past that would make people think I’m not a reliable source?”

3. You may convey more information by reducing the number of tests

It may seem paradoxical, but by reducing the amount of data generated by tests, you may actually gain more information. So, narrow your set of potential tests by asking, “Which tests will have the greatest impact on further testing and development?” By learning the answer to this question, you can often determine what information you should find first.

4. Don’t confuse interpretation with intake

If you feed people a random bit of data, they’ll struggle to divine meaning from it — and they’ll move from the intake phase to the meaning phase so fast they won’t be aware of doing so. Some statements that appear to convey straightforward facts are actually subject to interpretation by the receiver.

Following are a few examples I’ve gleaned from clients’ test reports:

• “There were too many bugs.” Sometimes one is too many, and other times one hundred is okay.

• “There were only four bugs.” Is “only four” better or worse than expected?

• “There were four bugs.” “There were” is an interpretation. The speaker doesn’t know how many bugs “there were,” only how many were found. Say, “We found four bugs,” and say under what conditions they were found.

• “Tests show that the project has to slip.” Tests themselves don’t show any such thing, but someone has decided that there’s only one meaning to the tests, and only one significance to that meaning.

Chapter 12. Making Meaning

Data do not speak for themselves, nor are they unambiguous. It is up to human beings to attach meaning to data they take in, and each person does so differently.

1. Use the information you have

Generally, you can’t make meaning of a test report without considering information that’s not in the report. At least start with the information you already have before you blindly ask for more.

Similarly, take care to make the best use of the information you already have.

2. Use indirect information

When properly documented, bug reports contain much more information than just the location of a bug and how to replicate the problem. Pay attention to the extra information that testers sometimes consider to be bureaucratic overhead: the date and time each test was run; the date and time each report was submitted; who ran each test; who submitted each report; unambiguous identification of the software and version(s) tested, the operating system(s), the browser(s), the computer(s), and the source language(s); references to earlier bug reports; and all sorts of free-form comments. Organize the collection and entry of this data to be as easy and as automatic as possible, or human beings won’t tolerate it.
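As a minimal sketch of what “as easy and as automatic as possible” might look like, the Python record below makes the tester supply only what a person must know, and fills in the submission timestamp and host details automatically using the standard `datetime` and `platform` modules. Every field name here is hypothetical, not taken from the book or from any particular bug tracker.

```python
import platform
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class BugReport:
    """Hypothetical bug-report record that carries the 'indirect' information."""
    # Fields a human has to supply:
    summary: str
    steps_to_reproduce: str
    tester: str                      # who ran the test
    software_version: str            # unambiguous build/version identifier
    test_run_at: datetime            # when the test was run
    submitted_by: Optional[str] = None            # defaults to the tester
    browser: Optional[str] = None
    related_reports: List[str] = field(default_factory=list)  # earlier report IDs
    comments: str = ""               # free-form notes
    # Fields captured automatically at submission, so nobody has to type them:
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    operating_system: str = field(default_factory=platform.platform)
    computer: str = field(default_factory=platform.node)

    def __post_init__(self) -> None:
        # Most often the person who ran the test also submits the report.
        if self.submitted_by is None:
            self.submitted_by = self.tester


if __name__ == "__main__":
    report = BugReport(
        summary="Export drops the last row",
        steps_to_reproduce="Export a 3-row table to CSV; open the file",
        tester="ann",
        software_version="2.4.1-build.317",
        test_run_at=datetime(2021, 3, 1, 14, 5, tzinfo=timezone.utc),
        related_reports=["BUG-1021"],
    )
    print(report)
```

The design choice is simply that the automatic fields have defaults, so a tester who fills in only the required fields still produces a report with the context Weinberg recommends keeping.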

3. Sometimes it’s better to be imprecise

Paradoxically, when you’re trying to communicate meaning, it’s sometimes more effective to use ambiguous language than to be very precise. That’s because people often skip right past the opportunity to determine meaning and go straight to assigning significance. Their instant emotional reaction to an unconsciously assumed meaning then prevents them from hearing the meaning you intended.

For example, you may want to use inexact words, like “bug” or “issue,” if you want to be understood by people who cannot disconnect from the blame connotation of “error” or “fault.”

If you want to communicate effectively in a situation in which meanings can differ, think about the receiver.

Chapter 13. Determining Significance

Our emotions carry information about how important things are. If we pay attention to emotions, listen, and address important matters before unimportant matters, we’ll be doing the best we can with the data we have.

1. Don’t put too fine a point on it

The value of a human life is always subjective; in fact, so are all significance measures. Many companies perform numerological gyrations to try to make their significance judgments seem objective. Don’t be fooled — either by these folks, or into tricking yourself that significance is objective.

To counter these pseudo-objective schemes, examine the meta-significance of a situation. Start by asking yourself, “How good are my estimates?” Instead of spending a lot of time devising objective scales to gauge significance, simply reduce your criteria to four levels:

• Level 0: This issue is blocking other testing.

• Level 1: Our product cannot be used if this issue isn’t resolved.

• Level 2: The value of our product will be significantly reduced if this issue isn’t resolved.

• Level 3: This issue will be important only if there are large numbers of similar issues when the product is shipped.

These four categories provide an adequate level of detail for testers to use when determining significance from the testing point of view — one of several perspectives that the deciding executive will want to consider.

2. Address significant problems first

Although the testers’ ideas of significance shouldn’t determine what is done about a found failure, they should influence the order in which various tests are executed. If test-first is a good idea, then significance-first is even better. Why? There is an infinite number of tests you could perform, but if you actually performed even an enormous number of them, you would likely lose the valuable information among all the worthless crud.
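To make the four levels and the significance-first ordering concrete, here is a small Python sketch. The `Significance` enum, the `Issue` record, and the sample issues are hypothetical illustrations, not anything the book prescribes. The only logic is a stable sort, so issues at the same level keep the order in which they were reported.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Significance(IntEnum):
    """The four levels from Chapter 13; a lower value means more significant."""
    BLOCKS_TESTING = 0     # Level 0: blocks other testing
    PRODUCT_UNUSABLE = 1   # Level 1: product cannot be used until resolved
    VALUE_REDUCED = 2      # Level 2: product value significantly reduced
    ONLY_IN_BULK = 3       # Level 3: matters only if many similar issues ship


@dataclass
class Issue:
    title: str
    level: Significance


def significance_first(issues: List[Issue]) -> List[Issue]:
    """Order issues most-significant first; sorted() is stable, so ties keep report order."""
    return sorted(issues, key=lambda issue: issue.level)


if __name__ == "__main__":
    backlog = [
        Issue("Typo in help text", Significance.ONLY_IN_BULK),
        Issue("Login page crashes", Significance.BLOCKS_TESTING),
        Issue("Export drops last row", Significance.VALUE_REDUCED),
        Issue("Installer fails on clean machine", Significance.PRODUCT_UNUSABLE),
    ]
    for issue in significance_first(backlog):
        print(f"Level {int(issue.level)}: {issue.title}")
```

Keeping the scale this coarse is the point: four buckets are easy to agree on, and the ordering they produce is enough to decide what to test and report first.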

Chapter 14. Making a Response

If a project hasn’t been managed well before testing, most good responses will no longer be available. In many cases, there is no response that will save the project — except starting over and doing it right from the beginning.

1. Is it bad luck or bad management?

Some years ago at IBM, several of us did a study of a dozen failed projects, looking for commonalities. Our main finding was that each of these million-dollar projects failed because of “bad luck” — a fire, an earthquake, an influenza epidemic that put half the people on the project out of commission for weeks. We wrote a report on these findings and submitted it to a journal for review. One of the reviewers asked, “Where are your controls?”

We went back and found a dozen comparable projects that had succeeded. Lo and behold, each of these projects had also experienced “bad luck,” but these projects had not failed because of it. When fire, flood, or earthquake destroyed their software configuration, they simply recovered their safe backups and returned to work. When half the team members were out sick, pair programming and technical reviews provided the redundant knowledge that allowed them to keep critical-path items moving along. In other words, it wasn’t bad luck but bad management that killed the first dozen projects.

2. Why do projects rush at the end?

“Software projects are always in a rush at the end,” they say. “That’s just the nature of software.” Though there may not be a “nature of software,” there may be a nature of some badly managed software projects, with characteristics such as those in the following sequence, that leads them to be in a rush at the end:

(1) The managers don’t understand the difference between testing, pinpointing, and debugging.

(2) They believe that testing caused most of the trouble in projects they’ve experienced.

(3) They tend to postpone all forms of testing as long as they can.

(4a) Because they’ve chosen their processes to postpone testing, testing is the first time they can no longer pretend things are going well.

(4b) Or, if they suffer from information immunity, they can pretend things are going well even after they’ve done some testing.

(5) For them, everything seems to be going “smoothly” through all the early stages of the project.

(6) Because managers have stalled, bugs reveal themselves in late testing.

(7) Because the entire system is now lumped together, many of these bugs are difficult to pinpoint.

(8) Because developers working under deadline pressure make new errors when trying to fix newly found errors, tempers flare, minds numb, absenteeism mounts, meetings proliferate, and strategy backfires.

3. How should you respond when close to the end?

Instead of a simple testing block in your plan, you need something that looks like this:

(1) Stop all testing and start planning the endgame.

(2) Rank the remaining known failures by significance.

(3) Estimate how many of these failures your organization can reliably fix in the time remaining.

(4a) Drop unfixable features from the shipment plan.

(4b) Or, if step 4a would require you to drop something without which the product is unacceptable, cancel and reschedule shipment.

(5) Proceed to remove the bugs in order of significance as identified in step 2.
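The endgame steps above can be sketched in Python as a simple greedy plan. This is one possible interpretation, not the book’s procedure: it assumes each known failure has a rough fix estimate in days and a flag saying whether dropping the affected feature would make the product unacceptable. `Failure`, `plan_endgame`, and the sample data are hypothetical, and step 1 (stopping testing to plan) is a human decision the code does not model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Failure:
    """Hypothetical record for a known failure at the endgame."""
    title: str
    level: int         # 0 (most significant) to 3, as in Chapter 13
    fix_days: float    # rough estimate of effort to fix it reliably
    essential: bool    # True if dropping the affected feature makes the product unacceptable


def plan_endgame(failures: List[Failure],
                 days_left: float) -> Tuple[str, List[Failure], List[Failure]]:
    """Rank, estimate, then decide to ship a reduced product or reschedule."""
    ranked = sorted(failures, key=lambda f: f.level)   # step 2: rank by significance
    to_fix: List[Failure] = []
    to_drop: List[Failure] = []
    budget = days_left
    for failure in ranked:                             # step 3: what fits in the time left?
        if failure.fix_days <= budget:
            to_fix.append(failure)
            budget -= failure.fix_days
        else:
            to_drop.append(failure)                    # step 4a: drop the affected feature
    if any(f.essential for f in to_drop):              # step 4b: product would be unacceptable
        return "cancel and reschedule", to_fix, to_drop
    return "ship without dropped features", to_fix, to_drop  # step 5: fix to_fix in order


if __name__ == "__main__":
    known = [
        Failure("Report totals wrong", level=1, fix_days=3, essential=True),
        Failure("Crash on startup with no network", level=0, fix_days=2, essential=True),
        Failure("Slow PDF export", level=3, fix_days=10, essential=False),
    ]
    decision, fix, drop = plan_endgame(known, days_left=6)
    print(decision)                   # ship without dropped features
    print([f.title for f in fix])     # ['Crash on startup with no network', 'Report totals wrong']
    print([f.title for f in drop])    # ['Slow PDF export']
```

A greedy pass in significance order is deliberately simple; the estimates in step 3 are the weak link, so in practice the `fix_days` numbers deserve as much scrutiny as the plan itself.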

4. Determine whether you’ve passed the point where you can make a difference

Perhaps the most important response to test information is making a decision about whether any response can improve the software itself. To do this, you may have to wait a while for more information. Waiting is an acceptable response, as long as you make clear why you’re waiting.

Never announce that you’re “just waiting,” but always refer to a specified event or time, or both.

But what if you can wait no longer? You can ship the product as is, you can feature the failures, you can warn customers about failures, you can withdraw parts, you can withdraw the whole product, or you can start over from the point at which things began to go wrong. The most drastic response would be to declare the project bankrupt and start over from scratch. This response may cost you your job, but it may save your life. Whether or not it costs you your job, you can begin the next project using all the learnings you’ve gleaned from this one. You’ve paid the tuition; the learnings are optional.

Chapter 15. Preventing Testing from Growing More Difficult

If you’re going to do your next project better than you did the last, a good place to start is by understanding why it’s actually going to be a more difficult project than the previous one.

1. Keep systems as small as possible

The first counteraction to escalating testing costs is to keep systems as small as possible (but no smaller). Keeping requirements under control is largely a management job, outside of the development project itself. It is a management failure when this is not done well.

Uncontrolled growth of a system’s requirements is so common that software analysts, designers, and developers have several names to describe the phenomenon: requirements leakage, requirements creep, requirements drift. It’s just all too easy to agree to “add just one more thing” to a contemplated or ongoing project, especially if you don’t have a process for estimating what nonlinear effects the addition will have on error cost.

2. Keep your model of “system” expansive

You might be successful in building a small application but run into trouble because the run-time or browser or operating system or network is big — and hairy. Further trouble may come from human beings who will interact with the application — they’re the hairiest systems of all. Be vigilant in checking how the simple system you’re developing is intertwined with larger, desperately complex systems.

3. Build incrementally in isolated components with clear interfaces

The size of a program is not defined in terms of just the number of lines of code. Two programs of the same physical size can differ greatly in their internal complexity, which, in the end, can be a dominant factor in how difficult the testing effort will be. To help keep testing under control, you can take steps to control complexity.

For instance, as suggested by the not-all-at-once strategy, you can build incrementally, with each piece built, tested, and fixed before the next piece is attempted. The key is to build pieces small enough so that you have a high probability of leaving no bugs in the finished product.

Although poorly organized and executed testing can certainly prolong the effort, there are intrinsic system dynamics that make testing and fixing take longer as products grow larger and more complex. If you understand these dynamics, there are ways they can be countered, up to a point.

Chapter 16. Testing Without Machinery

The number one testing tool is not the computer, but the human brain — the brain in conjunction with eyes, ears, and other sense organs. No amount of computing power can compensate for brainless testing, but lots of brainpower can compensate for a missing or unavailable computer.

1. Testing by machine is never enough

The simplest way to put lots of brainpower to work testing a system is the technical review. A technical review takes place when peers sit down together as a group to analyze and document the good and bad qualities of some technical product, such as code, design, a specification, or a user manual.

The choice between machine testing and technical reviewing is not either-or. Technical reviewing is an especially powerful testing technique when combined with machine execution testing because the two approaches tend to detect different types of bugs. Whereas either method alone, applied to code and done well, may find 60 to 70 percent of bugs, combined they often find 90-plus percent.

2. Instant reviews

Over the years, the most common complaint I have heard about technical reviews is that they take too long. One reason organizations believe this is that their people don’t know when a review has been completed. In many cases, a review actually may be finished before these inexperienced reviewers even recognize that it has started. I call these types of reviews, which occur when someone gives a reason why a work product cannot or should not be reviewed, “instant reviews.”

3. Worst-first reviews establish bug severity

Some of the same instant reviews can be used once you get inside the review room. In general, the quickest reviews are done by working on a “worst first” basis. So, you simply ask each reviewer to start with the worst problem he or she has found and work from there down to relatively minor problems. If, for example, a program is using a defective algorithm, there’s no sense worrying about spelling errors in the interface.

From a tester’s perspective, the worst bugs are those that block you from testing all or part of the object under test. If a reviewer says, “I cannot understand this code well enough to be sure it works,” this is a blocking issue, and thus has highest significance. If you’re puzzled by what to do in such a situation, translate the objection into testing terms. For example, “We can’t review the product because the customer doesn’t want to pay for it” becomes “We can’t test the product because the customer doesn’t want to pay for it.”

4. Testers make valuable reviewers

Managers frequently tell me that there’s no need for testers to participate in technical reviews, explaining their viewpoint thusly: “Because testers don’t write code, they can’t find bugs.” My response is, “You don’t have to know any programming language to find logic bugs, major design flaws, poor human-machine interfaces, and many other difficulties.”

The greatest single benefit that reviews have to offer is learning, in any or all of the following ways:

1. By observing the patterns of flawed thinking that developers are likely to produce, testers learn to compose better tests.

2. By reviewing specs early, testers get a head start on the scope of their test plans.

3. By gaining familiarity with designs, testers accelerate the process of detecting bugs and then helping to pinpoint them.

4. By participating in reviews, testers learn how to be better reviewers of their own test cases.

Chapter 17. Testing Scams

Here are some warning signs that you’re in danger of falling prey to a testing scam.

1. We’ll sell you a magic tool

Here’s the secret about tools: Good tools amplify effectiveness. If your testing effectiveness is negative, adding tools will only amplify the negativity. Any other claim a tool vendor or manufacturer makes is, most likely, some kind of a scam.

2. With all these testimonials, it must be good

Often, a vendor doesn’t even need to use a demonstration to scam you. Sales literature frequently contains pseudo-test results that appear to contain information, but which really only identify charlatans who took a fee or some payment in kind for the use of their names.

If you want reliable tool or software testimonials, get them in person. If possible, visit referenced customers, watch how the product is used in their normal process, and talk to people who actually do use it.

3. We can scam you with our high pricing

By selling only to the big wallets, for a big price, vendors also enlist the effect of cognitive dissonance that prevents customers from complaining about poor performance. Victims of The High Price Scam may keep complaints about the product to themselves because they fear they would look stupid for having spent so much money on a useless tool. Moreover, testers who are forced to use the tool despite its not doing the job keep quiet to protect the wisdom of their boss’s decision.

4. Our tool can read minds

This is The Omniscient Tool Scam. The tool operates with such authority that the user assumes that it is actually testing at the same level of detail as a human operator. Some tool vendors encourage this mentality. After all, their business is selling tools. They sell their tools on the premise that the tool will test the software with minimal human involvement. But beware: Tools are just tools. They still need humans, thinking humans, to operate them meaningfully.

5. We promise that you don’t have to do a thing

Have you ever had an offer like the following from a testing service? “We’ll handle all your testing.”

How tempting is this? Hand over your software, and all your testing headaches go with it. You’ll never have to worry about that pesky testing stuff again. The people in the testing service may, in fact, be capable of doing a good job. However, just like internal testers, they need to know what information they’re supposed to be gathering. Someone needs to think through the value the testing service offers, the information it can reasonably be expected to uncover, and then manage the contract to make sure the testing service is indeed providing the kind of information the project management team most needs.

6. Here’s how to avoid scams

One way to recognize scams is that they always promise something for nothing. The Performance Benchmark Scam promises that a vendor will provide lots of testing work for free. The “You Don’t Have to Do a Thing” Promise suggests that it’s only a matter of money, which is one version of nothing.

The same rule of thumb applies to testing scams as to con artists and telephone solicitors: If it sounds too good to be true, it’s probably not true.

Chapter 18. Oblivious Scams

Oblivious scams usually arise from optimism, when we unconsciously ignore thorny bits of information whose pain would force us to realize how bad a situation really is.

1. What happens when we delay documentation?

Perhaps the most common case of such scamming occurs when testers wait until the end of the week to record the details of the testing they’ve done. A tester may need to wait a few days for interpretations of testing to gel, but nobody can remember exactly the detailed data of a full week of key banging.

2. Take your revenge somewhere else

Sad to say, not all falsified test reports are trying to make developers look good. If relationships between testers and developers are strained, it’s easy for testers to make particular developers look as if they’re doing a poor job. Generally, this is an unconscious skewing of tests run and of the testers’ assessment of bug significance, but it can be done consciously and maliciously. If so, it’s a sign of a sick organization.

3. Early returns will mislead you

Another common self-perpetrated scam is predicting how long testing will take based on early returns. Generally speaking, the first twenty bugs you fix (out of, say, 200 found) are among the easiest bugs to fix; that is largely why they were fixed first. Extrapolating from them therefore seriously underestimates how much time will be needed to fix the other 180.

4. Hack-‘n’-crack/Whack-a-bug

Bugs do not occur at random. If you fix each one as if it were an isolated case, you’ll misestimate the amount of real accomplishment. Quite often, if you pinpoint the root cause of a failure, you’ll explain — and eradicate — several failures at once. But if you treat each bug as an isolated, random event, you’ll take a lot longer and miss quite a few bugs you could have easily removed.

5. Regression Tests ≠ New Tests

If management rewards testers for performing a large number of tests, you may find that automated regression tests are counted each time they are run. Even if large numbers of tests were significant, counting the same test multiple times is a scam.
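A small Python sketch shows how far apart the two numbers can drift. It assumes a hypothetical run log containing one test ID per execution and reports executions, distinct tests, and repeat runs side by side, so a repeated regression run cannot masquerade as a new test. The log format and test IDs are invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_runs(run_log: Iterable[str]) -> Dict[str, int]:
    """Summarize a log with one test ID per execution (hypothetical format)."""
    per_test = Counter(run_log)
    executions = sum(per_test.values())
    return {
        "executions": executions,            # what a naive count reports
        "distinct_tests": len(per_test),     # how many different tests exist
        "repeat_runs": executions - len(per_test),
    }


if __name__ == "__main__":
    # A 3-test regression suite run on 5 nights, plus 2 new tests run once.
    log = ["reg-1", "reg-2", "reg-3"] * 5 + ["new-1", "new-2"]
    print(summarize_runs(log))
    # {'executions': 17, 'distinct_tests': 5, 'repeat_runs': 12}
```

Even the distinct count says nothing about whether the tests are significant; it only removes the most obvious inflation.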

6. Counting ≠ Thinking

Counting tests can produce many sinister effects. For instance, when tests are counted, testers may avoid creating any long or complicated tests. Often, they find ways to copy tests with slight variations and call them different tests. It’s so tempting to substitute test counts for observation, conversation, and thinking. It’s so easy to lie by allowing your listener to continue making false assumptions about what you’re doing.

7. The Partial-Test-As-a-Complete-Test Scam

Yet another common self-perpetrated scam arises when the tester interprets a partial test as a complete test, saying, “Yes, I tested all the values in the table.” (After all, the tester tested the first value, the second value, a value in the middle, and the last value. That amounts to the same thing, doesn’t it?)
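A toy Python illustration of why that sample is not the same thing: the table and the bug planted at index 5 are invented for the example. Checking the first, second, middle, and last values finds nothing, while checking every entry finds the failure.

```python
from typing import Iterable, List

# Hypothetical table the tester believes has been "fully" tested.
EXPECTED = [0, 10, 20, 30, 40, 50, 60, 70]


def lookup(index: int) -> int:
    """Hypothetical code under test, with a bug planted at index 5."""
    return 55 if index == 5 else EXPECTED[index]


def failing_indices(indices: Iterable[int]) -> List[int]:
    """Return the indices whose looked-up value differs from the table."""
    return [i for i in indices if lookup(i) != EXPECTED[i]]


if __name__ == "__main__":
    n = len(EXPECTED)
    sampled = [0, 1, n // 2, n - 1]                  # first, second, middle, last
    print("sampled:", failing_indices(sampled))      # sampled: []
    print("all:", failing_indices(range(n)))         # all: [5]
```

Sampling can be a perfectly sensible choice; the scam lies only in reporting “four of the eight values” as “all the values.”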

8. Garbage arranged in a spreadsheet is still garbage

If it comes from a spreadsheet, it must be right, right? Wrong! A spreadsheet, like any other computer program, exhibits the garbage-in-garbage-out property. How could it be otherwise?