I. Problem statement
Think back to every time you've been knee-deep in C with no way to test your code but to (forbid the thought) write more C! We've all been there: stub testing requires, at the very least, a cocktail of copy-and-paste and massive printfs. For even tighter control, you may opt to step through the code in GDB. But why should we have to go through the same routine every time we create a new C module? While not particularly hard to write, a new test program for each module creates a lot of redundancy. We propose a streamlined way of reproducing the following code:
#include <stdio.h>

int main(int argc, char **argv) {
    /* fn is the function under test, declared elsewhere */
    printf("Testing fn\n");
    printf("Input: 1 Output should be: 3 Output is: %d\n", fn(1));
    printf("Input: 4 Output should be: 5 Output is: %d\n", fn(4));
    printf("Input: -34234 Output should be: 0 Output is: %d\n", fn(-34234));
    ...
    return 0;
}
in a syntax that is both simple and intuitive. Whether you are testing your own C functions or checking that someone else's functions work as expected inside your own code, this is a very handy thing to be able to do on the fly.
II. Existing Solution
A similar solution exists in Python's doctest module. With it, we can specify inputs to a given module's functions, along with the expected outputs. This is done by placing a multiline string (the docstring) at the top of the function's body, in which we give inputs and the corresponding outputs, line by line. A small example would be:
def factorial(n):
    """Return the factorial of n, an exact integer >= 0.
    If the result is small enough to fit in an int, return an int.
    Else return a long.
    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> [factorial(long(n)) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(30)
    265252859812191058636308480000000L
    >>> factorial(30L)
    265252859812191058636308480000000L
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0
    """
(function def...)
Here, the tests are included in the function's docstring. The triple chevron (>>>) marks a test input written in Python syntax; the first test is a list comprehension using the factorial function, and the fourth is a simple call to factorial with the argument 30 as a long integer. Each test input is followed by the expected output; even raised exceptions can be tested, as the last test shows. When we execute the module, no output is given if the tests were successful (i.e. the actual output matches the expected output on the following lines), unless the caller passes the -v option.
The doctest module is responsible for parsing the docstring, and the Python interpreter executes what is found. Each docstring that is tested runs against a shallow copy of the tested module's global variables, so that the tests in one docstring do not incorrectly influence those in another.
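To make the mechanism concrete, here is a minimal sketch (Python 3) that parses one docstring and runs its examples through the standard doctest classes. The helper name run_doctests and the shortened factorial are ours, for illustration only; doctest.testmod() is the usual entry point when testing a whole module.

```python
import doctest


def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0
    """
    if n < 0:
        raise ValueError("n must be >= 0")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def run_doctests(func, verbose=False):
    """Hypothetical helper: run the examples found in func's docstring.

    Returns a (failed, attempted) pair.
    """
    # The examples run against their own dict of globals, so they cannot
    # modify the namespace of the module being tested.
    globs = {func.__name__: func}
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, globs, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=verbose)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_doctests(factorial))
```

Passing verbose=True corresponds to the -v option: every example is reported, not just the failures.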
An issue with doctest is that it does not currently find doctests in Python extension modules. Internally, the inspect.isfunction() check that doctest relies on returns False for a function defined in an extension module, while inspect.isbuiltin() returns True, even though the function is not a builtin. There are also odd corner cases, such as not being able to represent ellipses in the expected output. For example, the following test would incorrectly pass:
>>> print 'Hello'
... print 'World!'
Hello
Aside from these minor shortcomings, this module is a very nice built-in feature whose approach we would like to extend to the C language.
III. Features
Abstraction – Tests/Testing commands stored as variables
Range Values – The ability to have a single command expand into any number of commands
Test Logs – A log created containing the results of the testing commands
Testing Options
Repetition – The ability to run tests a given number of times
Log Options – Verbose versus silent output
Testing Constraints – Flags thrown if testing conditions are met
The testing language is modeled after common scripting languages, where operations are performed with as few keystrokes as possible. The first example of C code in this document could be rewritten as:
fn(1) -> 3;
fn(4) -> 5;
fn(-34234) -> 0;
where the specified output would be checked against the actual output. Or perhaps they could all be combined into one line:
fn({1,4,-34234}) -> {3,5,0};
Maybe we would like to test a range of values with a single testing command:
fn(-10:10);
where, since no output is specified, the results would simply be printed. The objects being operated on in the language are the C functions that we are testing, while the operators are the inputs we specify and the options for how the functions are tested.
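One way to read these commands is as shorthand that expands into a list of (input, expected output) pairs. The sketch below does that expansion in Python with a regular expression. The function names, the restriction to integer arguments, and the choice to treat both ends of a range as inclusive are our assumptions; the proposal does not fix any of them.

```python
import re

# Hypothetical command shape: NAME ( ARGS ) [ -> EXPECTED ] ;
_COMMAND = re.compile(r"^\s*(\w+)\((.*?)\)\s*(?:->\s*(.+?))?\s*;\s*$")


def _expand(spec):
    """Expand '{a,b,c}', 'low:high', or a single literal into a list of ints."""
    spec = spec.strip()
    if spec.startswith("{") and spec.endswith("}"):
        return [int(item) for item in spec[1:-1].split(",")]
    if ":" in spec:
        low, high = (int(part) for part in spec.split(":"))
        return list(range(low, high + 1))  # assumption: both ends inclusive
    return [int(spec)]


def expand_command(line):
    """Return (function_name, [(input, expected_or_None), ...])."""
    match = _COMMAND.match(line)
    if match is None:
        raise SyntaxError("not a test command: %r" % line)
    name, arg_spec, expected_spec = match.groups()
    inputs = _expand(arg_spec)
    if expected_spec is None:
        # No expected output given: the results are only printed.
        expected = [None] * len(inputs)
    else:
        expected = _expand(expected_spec)
        if len(expected) != len(inputs):
            raise ValueError("input and output groups differ in length")
    return name, list(zip(inputs, expected))


if __name__ == "__main__":
    print(expand_command("fn({1,4,-34234}) -> {3,5,0};"))
    print(expand_command("fn(-10:10);"))
```

In the real tool this step would fall out of the generated parser; the regular expression is only a stand-in for it.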
Abstractions will be somewhat like defining a function, except that in this case what is defined is a test. A test can match some function call to its output and also execute certain actions in the target language, such as creating an object or assigning a value to a variable for use as a test argument. Repetition counts and testing logs exist for the benefit of the programmer. Constraints let the programmer easily see whether certain conditions on the output are met, such as whether the output falls within a certain range, or whether an impossible value is returned:
%constraint fn(3) < 56;
%constraint fn > 0 %exec $test0;
Now, each output of the fn function will be tested against the preceding constraints. If the first constraint is not met (fn given an input of 3 must stay below 56), the programmer will simply be notified. If the second constraint fails (any of fn's return values is not greater than 0), test0 will be executed.
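The behavior described here can be sketched in Python as a list of predicates that are checked after every call. Everything named below (Constraint, check_constraints, the stand-in fn) is hypothetical and only illustrates the notify-or-execute distinction; it is not the planned implementation.

```python
from collections import namedtuple

# Hypothetical representation of one %constraint line.
#   text:       the constraint as written, used in notifications
#   applies_to: a specific input, or None for "every call"
#   predicate:  returns True when the output is acceptable
#   on_fail:    optional callable, standing in for "%exec $test0"
Constraint = namedtuple("Constraint", "text applies_to predicate on_fail")


def check_constraints(fn, inputs, constraints):
    """Call fn on each input and return the list of violated constraints."""
    violations = []
    for value in inputs:
        output = fn(value)
        for constraint in constraints:
            if constraint.applies_to not in (None, value):
                continue  # constraint is tied to a different input
            if constraint.predicate(output):
                continue  # constraint satisfied
            violations.append((constraint.text, value, output))
            if constraint.on_fail is not None:
                constraint.on_fail()
    return violations


def fn(x):
    """Stand-in for the C function under test (an assumption)."""
    return 20 * x - 10


executed = []
constraints = [
    Constraint("fn(3) < 56", 3, lambda out: out < 56, None),
    Constraint("fn > 0", None, lambda out: out > 0,
               lambda: executed.append("test0")),
]
violations = check_constraints(fn, [0, 3], constraints)

if __name__ == "__main__":
    print(violations)
    print(executed)
```

With this stand-in, fn(3) is 50, so the first constraint holds, while fn(0) is -10, which violates the second constraint and triggers the stand-in for test0.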
IV. Implementation
We plan to use our Python parser generator, given the grammar of our language, to parse the test script. Alternatively, we could start over and write a new parser in the language we will be examining (C). The latter would be faster, and not too complex given our simple grammar, but we would lose portability. Besides, we have already developed the tools for the job, and keeping each module of our code (parser, interpreter, debugger) in the same language is preferable.
Python makes several features of the testing language easy to implement. Ranges or groups of inputs, along with groups of outputs, can be stored and checked with Python's mapping functions and dictionaries. Constraints can be quickly looked up from dictionaries as well.
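As a rough illustration of the dictionary-based check, the sketch below stores expected outputs keyed by input, computes the actual outputs with map, and reports only the mismatches. The helper name check_outputs and the stand-in fn are assumptions made for the example.

```python
def check_outputs(fn, expected_by_input):
    """Return {input: (expected, actual)} for every mismatching case."""
    inputs = list(expected_by_input)
    actual_by_input = dict(zip(inputs, map(fn, inputs)))
    return {
        arg: (expected, actual_by_input[arg])
        for arg, expected in expected_by_input.items()
        if actual_by_input[arg] != expected
    }


def fn(x):
    """Stand-in for the wrapped C function (an assumption)."""
    return max(x + 1, 0)


# The grouped command fn({1,4,-34234}) -> {3,5,0}; stored as a dictionary.
expected = dict(zip([1, 4, -34234], [3, 5, 0]))
mismatches = check_outputs(fn, expected)

if __name__ == "__main__":
    print(mismatches)
```

Here the stand-in returns 2 for an input of 1, so that one case is reported as a mismatch and the other two pass silently.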
After we parse the input, we will use an interpreter written in Python to execute the generated AST. We could alternatively emit bytecode as the representation of the parsed input, but since our grammar is simple and unambiguous, it lends itself to interpreting the AST directly.
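A tree-walking interpreter for such a small language can be very short. The sketch below assumes an AST made of plain tuples with three invented node types ("script", "repeat", "test"); the actual node shapes will depend on what our parser generator emits.

```python
def interpret(node, functions, log):
    """Walk one AST node, appending (name, arg, expected, actual, ok) to log."""
    kind = node[0]
    if kind == "script":
        # ("script", [child, child, ...])
        for child in node[1]:
            interpret(child, functions, log)
    elif kind == "repeat":
        # ("repeat", count, body): run body count times
        _, count, body = node
        for _ in range(count):
            interpret(body, functions, log)
    elif kind == "test":
        # ("test", function_name, argument, expected_or_None)
        _, name, arg, expected = node
        actual = functions[name](arg)
        ok = expected is None or actual == expected
        log.append((name, arg, expected, actual, ok))
    else:
        raise ValueError("unknown node type: %r" % (kind,))


# Hypothetical AST for: fn(4) -> 5; followed by fn(1) -> 3; repeated twice.
ast = ("script", [
    ("test", "fn", 4, 5),
    ("repeat", 2, ("test", "fn", 1, 3)),
])
functions = {"fn": lambda x: max(x + 1, 0)}  # stand-in for the wrapped C code
log = []
interpret(ast, functions, log)

if __name__ == "__main__":
    for entry in log:
        print(entry)
```

The log list plays the role of the test log from the feature list; a verbose or silent mode would only change how it is printed.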
We can take advantage of an interface generator, like SWIG, to wrap the given C code so that it can run in our Python environment. This way, our entire existing tool set and all new additions that we create stay in the same language, and all of our debugging can be done in PDB.
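SWIG needs an interface file and a build step, so the sketch below swaps in Python's standard ctypes module (not SWIG) to show the same idea: a compiled C function called and checked from Python. It assumes a POSIX system where the C library can be located, and it uses the C library's abs as a stand-in for the function under test; run_case is a hypothetical helper.

```python
import ctypes
import ctypes.util

# POSIX assumption: find_library("c") locates the C runtime. If it returns
# None, CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature int abs(int) so arguments are converted correctly.
c_abs = libc.abs
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int


def run_case(func, arg, expected):
    """Call a wrapped C function and compare its result to the expected value."""
    actual = func(arg)
    return actual == expected, actual


if __name__ == "__main__":
    print(run_case(c_abs, -34234, 34234))
```

Because the call happens inside the Python process, a breakpoint set with PDB in run_case stops right before control passes into the C code.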
V. Resources
http://docs.python.org/library/doctest.html
http://bugs.python.org/issue3158
http://bemusement.org/diary/2008/October/24/more-doctest-problems