Notions of “Completeness”

For the past ten years or so, part of my job as a test manager has been interviewing testers. In that time, I’ve interviewed hundreds of testers using an “audition” format.

An audition is a one-hour simulated project, where the tester applying for a job comes in and demonstrates their exploratory testing ability on a piece of software. As I observe, I take notes on a whiteboard about what I see.

The application they test is an old VB app that takes three numbers, entered into a text field and separated by commas, and tells the user what kind of triangle they would create. When the “Check” button is pressed, a read-only output field reports one of five things, depending on the geometric legality of the triangle: Equilateral, Isosceles, Scalene, Invalid!, or “Not a Triangle.”
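For reference, the expected behavior can be sketched in a few lines. This is a sketch only: the real VB program’s parsing and edge-case handling are unknown, and the function name is my own.

```python
def classify_triangle(text):
    """Classify a comma-separated triple of side lengths, mirroring the
    audition app's five outputs. (Sketch only; the real VB program's
    behavior may differ.)"""
    try:
        sides = [float(part.strip()) for part in text.split(",")]
    except ValueError:
        return "Invalid!"          # anything that isn't a plain number
    if len(sides) != 3 or any(s <= 0 for s in sides):
        return "Invalid!"
    a, b, c = sorted(sides)
    if a + b <= c:                 # triangle inequality fails
        return "Not a Triangle"
    if a == b == c:
        return "Equilateral"
    if a == b or b == c:
        return "Isosceles"
    return "Scalene"
```

Note that in this sketch, a percent sign anywhere in the input simply yields “Invalid!” rather than a crash, which is one reason the error described below is interesting.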

It’s a deceptively simple program, and over the years, I have seen and recorded thousands of tests by candidates. Today, I saw a test idea I could have sworn I had seen many times before, but it resulted in an error I know I have never seen.

The error was a run-time error ’13’ (type mismatch). It was revealed with a simple test: 5%, 17000%, 88.999999999%.

It could be emergent behavior, but I doubt it. Nothing has changed in my program, and I suspect nothing new or meaningful is occurring with my machine state. My best guess is that no one has ever run this kind of test.

I’ve seen people enter all kinds of symbols (like percent signs) and numbers, and combinations thereof, but surprisingly, nothing that combined three types (integers, decimals, and percent signs) in this way.
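I don’t have the VB source, but one plausible way such a bug hides is a converter that handles each input style on its own yet falls over when the styles combine. The sketch below is hypothetical: the function and its rules are my invention, standing in for whatever unconvertible-string path raises VB’s run-time error ’13’.

```python
def to_number(token):
    """Hypothetical converter: handles plain numbers, decimals, and
    whole-number percents, but not a decimal combined with a percent
    sign -- an analogue of a type mismatch on an unconvertible string."""
    token = token.strip()
    if token.endswith("%") and token[:-1].isdigit():
        return int(token[:-1]) / 100   # "5%" and "17000%" convert fine
    return float(token)                # "88.999999999%" raises ValueError

# Two of the three audition inputs convert cleanly; the third, mixing a
# decimal with a percent sign, escapes both branches and blows up.
```

The point isn’t the specific code; it’s that each pairwise combination (integer + percent, integer + decimal) can work while the three-way combination fails.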

That simultaneously amazes me and makes me shrug. It amazes me because you’d think that someone would have stumbled onto this bug before now — either via a different test or this very same test. But I shrug because I know that not every test (or combination of tests) conceived by humans or computers can be performed, so “completeness” will always be a subjective notion.

So how do we report completeness when there may be a latent (and important) bug hiding despite all of our best efforts to find it?

One of the exercises I use toward the end of the audition is to ask the candidate to tell me how complete their testing is on a scale of 0 to 100 (100 being “completely tested”).

I have done about 400 of these auditions over the years and I have seen a lot of interesting tests, comments, and behaviors from testers. I’ve heard all kinds of answers to the 0 – 100 question.

It’s meant to be a trap. If they give me a number, I can push back with a series of tough questions as to why they gave me that number and not another. Some answers are better than others.

Here are a few I find acceptable enough that I would likely not push back on the candidate too hard:

1) The “Tests Passed” response — “I started with X test ideas in my head and I ran Y of them. Now, Z of them have passed, and assuming ‘passing’ is an indication that I have a little bit of coverage in that area, I answered in terms of that notion.”

2) “Risk List” response — “I have a story for all of the meaningful risks I identified at the start of this audition.”

3) Confirmatory — “The product met X expectations under these conditions. Anything more is superfluous because I treated this like a trade show demo.”

4) Trap Avoidance — “I can’t answer that without more context.”

5) Zero — “There are an infinite number of tests. I have run 57 that I can identify. Fifty-seven divided by infinity is a number so close to zero, I chose to say zero.”

6) 100 (based on time) — “Assuming 100 means ‘100 percent’, I have completed all the testing I had time for.”

7) Terminology Check-in — “What do you mean by 0 or 100? 0 or 100 what? Percent? Test cases? Bugs found?”

8) Good Enough — “In the time you gave me, I found bugs I believe to be of value to the stakeholders. The story of my testing has no major problems and is not likely to provoke critical questions I can’t answer. The benefits of my work outweigh any problems with it, and further testing may be more expensive than it’s worth at this point, so we may have reached an acceptable level of testing. Do you agree?”

9) Pushback — The candidate asks me “Why are you asking the question?” or “Why is it important to distill it in this way?” or “What will be done with this information?”

10) Any number — The candidate gives me a number, but can explain to my satisfaction why they came up with it and defends it to the extent that they convince me they would not lose credibility or respect with other members of the team.

Getting Dressed for Testing

Colleague Michael Bolton responded to my “Case Against Test Cases” entry and reminded me of a recent (recorded) conversation he had with my brother James about this topic. The link is to an MP3 – an hour-long conversation between Michael and James where Michael role-plays a tester as James plays the customer.

As you listen, you might deduce that even the simplest notions of assumption and common sense when running a test case are challenged, making it hard to know what should or shouldn’t be included in a test design.

For example, some environments (clean rooms) require wearing a lab coat before testing is performed. Should that be listed in the first test case?

A Case Against Test Cases

I learned something today. I may not know how to run even the simplest scripted test case.

I’ll show you.

This one is for the File/Open dialog of Internet Explorer:

  1. On a clean XP machine, launch Internet Explorer
  2. Go to the File Menu, select Open
  3. Enter the word “TEST” into the text field
  4. Click OK

Verify that “http://test/” appears in the URL text field.

Ok. Let’s look at the word “clean” in step 1. What does “clean” mean? If it means I should start with a fresh install, what about upgrades that the OS will want to run when I connect to the internet?

Let’s look at the word “launch.” How does the test case engineer – the person who wrote this – want me to launch it? By double-clicking the .exe, by double-clicking a shortcut, or from the command line? Does it matter? No? How do you know? Furthermore, as a tester, I could be thinking: “Is that a flaw in the design, or is he leaving me free to choose any old way?”

What about which version of IE?

What does the word “go” mean? If it’s a means to invoke the function, does that mean via the keyboard or mouse?

What about the word “enter” – can I cut and paste into the field? After running this test case several times, I may choose this method – will it alter the test?

What about the word “TEST”? Is it meant to be capital letters? What about the quotes?

You can see there’s a lot of room for thinking here.
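One way to see how much was left unsaid: rewrite the prose steps as structured data, where every “how” must either be spelled out or left conspicuously blank. The field names and the little audit function below are my own invention, not from any test framework.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str
    target: str
    how: str = ""   # the detail the prose version never specified

script = [
    Step("launch", "Internet Explorer", how="Start-menu shortcut"),      # not .exe, not command line
    Step("invoke", "File > Open",       how="mouse"),                    # not Ctrl+O
    Step("enter",  "TEST",              how="typed, uppercase, no quotes"),  # not pasted
    Step("click",  "OK"),                                                # 'how' still unspecified
]

def unspecified(steps):
    """List the actions whose 'how' is still left to the tester's judgment."""
    return [s.action for s in steps if not s.how]
```

Even after pinning down three of the four steps, “click” keeps a blank “how” — mouse, spacebar, Enter key? Some interpretation always remains.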

Ok, so you say to the tester:

“No thinking allowed… besides, it should be obvious.”

I would argue it’s not, simply because of the problem of being human. The tester could even hate the notion of exploring and love having their tests written for them, but the simple fact is, three such testers running this simple script still have the same problem to solve: how do I run this? In answering that question, each may arrive at a different answer, running the script three different ways.

“It doesn’t matter,” some might say. “Any way is fine, it’s all about the spirit of the test – the Verify statement.”

Alrighty then, the tester might confirm the “TEST” text is in the right place, but if the application crashes, is that a pass or a fail?

It’s obvious – it’s a bug to file.

Well, not really.

It could be that the programmer already knows about it and it’s unrelated to the population of the URL bar, which is really what they wanted to know from the test.

In short, if I find that a test step is vague, I might need to ask a question.

In asking the question, I am seeking context.

In seeking context, I am thinking and adjusting.

In thinking and adjusting, it’s exploratory.

“If it’s not exploratory, it’s avoidatory,” my brother James often says in a Johnnie Cochran voice, maybe for the times when you need to avoid any peripheral actions or problems when you run a test. This can be good if your mission is to only find bugs in one feature that will be demonstrated during a trade show. Any bugs outside of that feature are not important, so do not explore! As in the case above, if the word “TEST” appears, that’s all we care about.

Michael Bolton calls this kind of testing “ceremonial.” He did a talk recently at the Pacific Northwest Software Quality Conference where he made the following analogy: When the Queen of England reviews the troops in formation, it is ceremonial. She does not inspect them in detail like a sergeant would, saying “Is that dirt on your cheek, soldier?!? Drop and give me 20!”

Anyway, some people in our biz want to compare the efficacy of scripted testing against exploratory testing. I used to be one of them, but study and experience with testing over the years has made me bow out of this silly idea.

It’s not because I believe that ET has been more effective for me in finding better bugs quickly. It’s not because I believe that test cases are an easily prescribed way of accounting for your testing.

It’s because I have not yet seen evidence that a scripted test case can be run by a human the same way every single time. On the flip side, it’s also because I believe exploratory testing can include scripts. Because this line is blurred for me, I don’t know what it is I’d be comparing in a test-off between scripted and exploratory.

If “scripted” means “written down in advance,” that could mean that when I’m exploring (which many think of as “unscripted”), I am doing scripted testing whenever I use a script or a procedure to frame my exploration. Rightly so, I can have a model of the test or the feature, a charter, a mission statement, a user role I’m supposed to play – yet still be what Cem Kaner calls “brain engaged”: alert and attuned to emerging context.

I think we all do this. We think as we test.

What I’m asserting here is that the EXTENT to which we are ALLOWED to think is the extent to which the test is exploratory.

If our tests are written for us in advance and we are told not to think (which could well be the case if we are trying to repro the exact steps from a customer, a developer, or our own bug regression case), then I suggest that even still, we cannot define the testing as scripted or exploratory, only as a *degree* of being “more scripted” or “more exploratory.”

Here’s something that helps me in those cases, called the Tester Freedom Scale:

[Figure: the Tester Freedom Scale]

Since my brother and I came up with this, it has been easier for me to talk about “scripted” vs. “exploratory,” because I can talk in degrees of freedom. More freedom is more exploratory; less freedom is more pre-scripted.

Maybe that’s why doctors write these things called “prescriptions” – instructions written down in advance, handing the pharmacist the exact procedure – the dose and the duration – without requiring them to think or interpret.

But aren’t we glad that the pharmacist is human, and can choose to call when they are confused because of the doctor’s bad handwriting?
