More about “automated” baseball…

Posted by Jacob Stevens on June 21, 2010

My esteemed colleague Jon Bach just made the very good point in his “Automated Baseball?” entry that automation is, ultimately, a test tool. It doesn’t replace testing, but assists and empowers testing.

So here’s another example of a test tool in the guise of something else: code compilers.

Compilers check syntax and nomenclature. Syntax that is correct but inapplicable can still be accepted and compiled, and caught later by testing, but checking syntax is largely a task testers just don’t have to do anymore. The implications for testing and quality still exist, and are, to me, fascinating.

Compilers also perform type checking: verifying that objects are handled and used according to their data types (string, int, and so on). If an object is declared as a string, the compiler checks that all the code treats it like a string: that calls on the object only use attributes a string has, that no disallowed implicit type casting is attempted, and so on.

I should clarify that this applies only to static-typed programming languages. Static-typed languages like the C family of languages (C, C++, C#, Objective-C) perform their type checking at compile time. Dynamically-typed languages (Python, Ruby, Smalltalk, Lisp) perform type checking at run time. Some, like Perl, have both (built-in types are static, but Perl supports dynamic type checking for user-defined types).
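To make the difference concrete, here’s a minimal Python sketch (the function and its names are just illustrative). The type bug below would be rejected at compile time in a static-typed language; in Python, nothing complains when the function is defined — the error surfaces only when the offending line actually runs:

```python
def describe(item):
    # Bug: concatenating a str with an int. Python accepts this
    # definition without complaint; the check happens at run time.
    return "length: " + len(item)

# Only now, when the function executes, does the type error appear:
try:
    describe("hello")
except TypeError as err:
    print(err)  # prints the run-time type error message
```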

How is this relevant to testing? The two approaches have different implications. Type checking at compile time is kind of “conservative:” it won’t let you compile some code that could result in a run-time error, but might not, depending on how the objects are used, how errors are handled, or what the user inputs. That information is only available at run time, so in lieu of contextual certainty, compilers play it safe and throw an error.

Dynamically-typed languages like Python are arguably more powerful because they let you do things a conservative compiler wouldn’t, leaving you to mitigate run-time errors yourself with more advanced exception handling.

So compile-time checking is generally safer and conducive to more stable code. As testers, we think that’s good! But run-time checking calls for much more testing (it makes unit tests actually useful!), and it allows for far more complex risks that require more complex test coverage, because of the nature of run-time errors compared to compile-time errors. As testers, job security ain’t so bad either, is it?

So it’s a “risk-for-power” tradeoff, and for many developers, it’s worth it. A widely cited practice in Python is EAFP, which stands for “easier to ask forgiveness than permission.” It’s mostly about performance and error-handling style, but it has implications for risk, quality, and testing.
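A minimal sketch of the EAFP style next to its opposite, LBYL (“look before you leap”), using a made-up config-lookup example (the dict, key, and default are purely illustrative):

```python
DEFAULT_PORT = 8080

# LBYL ("look before you leap"): ask permission first.
def get_port_lbyl(config):
    if "port" in config:
        return config["port"]
    return DEFAULT_PORT

# EAFP ("easier to ask forgiveness than permission"): just try it,
# and handle the failure only if it actually happens.
def get_port_eafp(config):
    try:
        return config["port"]
    except KeyError:
        return DEFAULT_PORT

print(get_port_eafp({"port": 9090}))  # 9090
print(get_port_eafp({}))              # 8080
```

Both return the same answers; the EAFP version simply assumes the common case and pays for the rare one in the exception handler.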

Which is good or bad, depending on your perspective. Risk-abating test engineers may consider it a bit reckless and not think the tradeoff is worth it. From the perspective of quality engineers, who want to help build great software and contribute quality to development, it may simply pose more of a personal challenge.

But regardless of your perspective, the usage and role of the compiler, as really a test tool, reminds me of Jon’s point, that whatever the tool, we’re still just testing. Contributing quality to a development effort.

And with those objectives in mind, as a quality engineer I see the differences and implications of compile-time and run-time type checking, and I wonder, why not both?

Any test engineer can testify that compilers don’t find all the bugs. Testing for correct and stable program operation can find a much wider range of bugs than compile-time type checking can. On the other hand, nothing can match the comprehensiveness and speed of a type-checking compiler at finding the bugs a type-checking compiler can find.

So why not both? Static-typed languages still benefit from testing after compilation. Wouldn’t dynamically-typed languages benefit from a compiler with full type checking, at the very least to alert developers to all the potential issues, which they could then choose to ignore and handle themselves when and where it’s beneficial? And otherwise allow type checking to be performed at run time, where testing can identify what needs to be fixed?
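One way to picture that advisory role, sketched in plain Python (the helper and all names here are hypothetical, not a real tool): a check that flags a likely type problem but lets execution continue, so the developer can act on the alert or ignore it, exactly the kind of warn-don’t-block behavior described above.

```python
import warnings

def warn_if_not(expected, value, name):
    # Advisory check: flag a likely type mismatch but don't stop
    # execution, leaving the developer free to heed or ignore it.
    if not isinstance(value, expected):
        warnings.warn(f"{name}: expected {expected.__name__}, "
                      f"got {type(value).__name__}")
    return value

def repeat(text, count):
    warn_if_not(str, text, "text")
    warn_if_not(int, count, "count")
    return text * count

print(repeat("ab", 3))  # ababab
# repeat(3, "ab") also runs (Python allows int * str),
# but it emits warnings the developer can choose to investigate.
```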

We use automation to supplement testing: we let it show us where to investigate, where an issue may be, and then we make the judgment call. I would think dynamically-typed languages would benefit from the same thing. There are bound to be some veteran Python programmers out there who’ll read this and enlighten me on why compile-time type checking doesn’t work for them. But as I understand it right now, a valuable tool is at your disposal, so why not use it?

Automated Baseball?

Something big happened in a baseball game last night that is causing a buzz in the sports world today. I think it’s related to a buzz in the world of software testing.

Armando Galarraga, a pitcher for the Detroit Tigers, was on the verge of pitching a “perfect game” — a game in which not only does no batter on the opposing team get a hit (a “no-hitter”), but no batter even makes it to first base. That means pitcher Galarraga would have had to retire all 27 batters trying to smack the ball into play. That’s some great pitching on his part, along with some exceptional defensive support from his teammates.

Perfect games are rare. In the 134-year history of Major League Baseball, there have been only 20 perfect games. Two of them, amazingly, happened last month. (Three perfect games have never happened in one season, but two have: the first two, in 1880, came just five days apart.)

And last night at 6 pm Pacific Standard Time, Armando Galarraga was set to be the 21st.

In the 9th and final inning, Galarraga faced one last batter: Jason Donald. Galarraga delivered a pitch and Donald connected. Tigers first baseman Miguel Cabrera moved well off the base to field the ball, so pitcher Galarraga ran to cover the bag that Donald was running for. Cabrera threw the ball to Galarraga in time to beat Donald, who was still a full step from hitting first base in mid-stride.

But to everyone’s astonishment, first base umpire Jim Joyce called Donald safe! Being safe means Donald had made it to first base before the ball reached Galarraga’s glove, spoiling his perfect game.

As the crowd booed, Tigers manager Jim Leyland came out and argued with Joyce, but the call stood. The crowd then watched the instant replay, which showed the Indians batter Donald out by a full step. Donald had not beaten the throw. He should have been out. Jim Joyce got the call wrong, and everybody saw it.

But in baseball, even though umpire judgment calls can be argued, those calls rarely get reversed unless another umpire saw the play. It was hopeless. Furthermore, it was time to move on to the next batter, which Galarraga did — and he subsequently got him out to end the game.

It didn’t matter that the Tigers won the game. The “perfect game” — a game in which Galarraga technically allowed no batter to reach first base — was spoiled, even though the objective truth (according to the camera footage) showed that Galarraga did not allow Donald to safely reach first base.

Unlike other sports, the camera has no say in how baseball games are decided. In baseball, it’s the umpires that decide. It’s purely human judgment in the moment. Other sports allow appeals to officials if the camera shows a different story than what their ruling indicated. Not baseball. At least, not *yet*. After last night, that might change because this particular game had a bearing on some historical statistics that make baseball much more interesting for a lot of people to follow.

That judgment call by umpire Jim Joyce is now the topic of sports radio call-in shows, newspaper sports sections, and online blogs and articles all across the country today – how he got the call wrong, what the camera showed, if baseball should allow instant replay to influence the game, even how the call was handled by the pitcher, the umpire, the manager, and soon, the Commissioner of Baseball, who oversees everything in the sport.

How is this important to software testing?

There is a balance in baseball between what the camera sees and what the umpire sees. In testing, there is a balance between what the tester can test and what the computer can test.

In software, testers use their judgment. Machines have no judgment other than what they are programmed to do. They are programmed to execute and record, to render and calculate.

As it happened, about an hour before that game, I was talking with Michael Bolton and Ben Simo online about the term “exploratory test automation.” I had retweeted Elisabeth Hendrickson’s post about a class she was hosting at Agilistry (called “Exploring Automated Exploratory Testing”).

Bolton, Simo, and I were discussing that title, trying to see if we could come up with something more accurate, because Elisabeth’s title seemed to be a contradiction in terms. How do you automate exploration, when exploration is inherently human judgment and skill, reacting to what we learn in the moment, and automation is not? We were pretty sure we knew what she meant by the class, but how best to describe the interaction between machine and human?

It’s important to know that Michael, Ben, my brother, and I are people who believe in the power of language to convey ideas and meaning. We argue over precision and semantics because they communicate more than just words. We believe it is important to debate these kinds of things openly and publicly, because it propels and provokes conversation about meaningful ideas that are meant to help all testers everywhere win more credibility and respect, much in the same way arguing baseball calls can evolve the sport.

So we traded ideas of how to describe the computer’s role in exploration. Since it was a public discussion on Twitter, people following that thread could chime in:

Michael Bolton’s idea was to call it “Tool-Supported Exploratory Testing” (proving to be a humorous, dyslexic TSET)

James wanted to flip the words and call it “Exploratory Automated Testing”

Oliver Erlewein liked “ETX” (and so did I) but doesn’t yet know what the X could be — it’s just cool.

Zeger van Hese suggested “Hybrid Exploratory Testing”

I offered the playful “Bionic Testing” after the Six-Million-Dollar Man.

Alan Page said it could simply be called “exploratory testing” and leave it at that because no matter whether your exploration was computer-assisted, it’s still exploration. James liked that and so did I.

But isn’t there a term or a phrase or a word that can more accurately and precisely describe the computer’s role in assisting testing?

Is it automation when you use a tool to help reveal a bug?

Is it automation when a machine executes a programmed test procedure?

Is it automation when you use Task Manager to see the list of processes in memory?

Is it automation when you execute Cucumber or Fitnesse (keyword-driven) tests?

What do you call it when you click a button on a test harness and it clicks on the objects on the screen for you and delivers a report at the end of the script?

If it’s all “automation”, doesn’t that imply that it needs no human intervention?

I think we can find a better term.

Everyone can agree that computers help exploration. Call them “probe droids” or “bots” or “tools” — they inform a human about things that are notoriously hard for humans to know on our own. They do things that are hard or slow or tedious or expensive or impossible for a human to do.

But we also know that it’s impossible for software to test itself in all the ways we can test it — just like it’s impossible for a camera to replace umpires at baseball games. Computers and humans enhance each other.

Today in baseball, there’s a lot of energy and debate because of that game last night. Galarraga’s near-perfect game may lead to a major change in using replay in baseball games. The Commissioner of Baseball may even overturn Joyce’s ruling, meaning that the official record books would reflect a perfect game last night in Detroit.

Today in software testing, there’s energy and debate around the word “automation”, especially with more classes like Elisabeth’s and the more we talk about Test-Driven Design and tools on projects.

While baseball debates whether to use instant replay to help decide close plays, I’ll bet you that if they decide to use it, they will not call it “automated baseball.” We testers *know* we use technology to help us with testing; I just think we can do better than “automated testing.”
