In order to receive credit for this assignment, your solution must be submitted, using the handin command, by 8AM, Monday, April 2nd, 2007. I will zip your files and move them to my directory at that time. You may submit your solution before the deadline as many times as you like; only your final submission will be considered.
The purpose of this assignment is to provide an opportunity for you to analyze an industrial-strength C++ compiler from the GNU Compiler Collection (gcc). This compiler comes packaged with a test suite. Thus, part of this assignment will entail running many test cases to test an extended version of the gcc C++ compiler. In academia, even at the graduate level, many programming assignments entail writing dixie-cup programs that are not thoroughly tested, or not tested at all. Part of this assignment is therefore an exercise in testing: running more than a few test cases, and seeing the need to automate that process.
For this assignment, you must perform the following three tasks:
For the first task, you will be given a test script that uses PyUnit to automatically run test cases. PyUnit is a derivative of JUnit, which in turn derives from SUnit, a unit testing framework originally written for Smalltalk by Kent Beck. You should study the PyUnit script carefully: one line in the script contains the path that the instructor uses when he runs the script, so it contains the word malloy. You should change this path so that the script works on your account.
The PyUnit test script will collect the test cases in a given directory and automatically run these test cases through astinxml. Your task for the first part of this assignment is to extend the PyUnit test script so that it can also test a single file in a given directory. The deliverable for this first task is an extended PyUnit script that will work for a directory or a file. This task will be explained and demonstrated during lecture.
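One way to accept either a directory or a single file is sketched below. This is only an illustration, not the script you will be given: the names collect_cases and make_suite, the .C file extension, and the check callback (which stands in for whatever code actually runs astinxml on one file) are all assumptions.

```python
import os
import sys
import unittest


def collect_cases(path, extension=".C"):
    """Return the test-case files for a directory, or [path] for a single file."""
    if os.path.isfile(path):
        return [path]
    cases = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.endswith(extension):  # assumed extension of the test cases
                cases.append(os.path.join(root, name))
    return sorted(cases)


def make_suite(path, check):
    """Build one PyUnit test per collected file; check(filename) returns True on success."""
    suite = unittest.TestSuite()
    for case in collect_cases(path):
        def run(case=case):  # bind the current file name to this test
            assert check(case), case
        suite.addTest(unittest.FunctionTestCase(run, description=case))
    return suite


if __name__ == "__main__" and len(sys.argv) == 2:
    # Placeholder check only: replace os.path.isfile with the code that runs astinxml.
    unittest.TextTestRunner(verbosity=2).run(make_suite(sys.argv[1], os.path.isfile))
```

Because the file-versus-directory decision is made in one place, the rest of the script does not need to know which kind of argument it was given.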
For the second task, you will use the PyUnit test script to evaluate the ability of astinxml to generate an AST in valid XML format. For this task, you will use the same PyUnit test script that you extended in the first task. The test cases that you will use are those in the g++.dg directory of the gcc testsuite for gcc-4.0.0. Some of these test cases are positive test cases and some are negative test cases. A positive test case is a test case whose input is expected to be accepted. A negative test case is a test case whose input is expected to be rejected; thus, a negative test case fails if its input is accepted.
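As a rough illustration of telling the two kinds of test case apart, the sketch below treats a file as negative when it contains a dg-error directive, which is how the gcc testsuite usually marks a line that is expected to produce a compiler error. This is a heuristic under that assumption, and the function name is_negative is hypothetical; other directives may also mark expected failures, so check the results by hand.

```python
import re

# Assumption: a negative test case carries at least one "{ dg-error ... }" directive.
DG_ERROR = re.compile(r"\{\s*dg-error\b")


def is_negative(source_text):
    """Return True if the test-case text contains a dg-error directive."""
    return DG_ERROR.search(source_text) is not None
```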
Part of your second task is to remove the negative test cases from your copy of the g++.dg directory. Another part of this second task is to identify how many and which of the test cases in the g++.dg directory, when used as input to astinxml, generate valid XML. Your submission for this second task should be a script that lists the test cases that generate valid XML, and a list of the test cases that do not generate valid XML. Also, submit the g++.dg directory with negative test cases elided. This second task will also be discussed and demonstrated during lecture.
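The classification into the two lists might be sketched as follows. Note the assumptions: this checks only that the XML is well-formed, using Python's standard xml.etree.ElementTree parser; if valid is meant in the stricter sense of conforming to a DTD or schema, more work is needed. The names is_well_formed and partition, and the idea of passing the astinxml output in as a dictionary of strings, are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def partition(outputs):
    """outputs maps a test-case name to the XML text produced for it.

    Returns two sorted lists: names with well-formed XML, and the rest.
    """
    good = sorted(name for name, text in outputs.items() if is_well_formed(text))
    bad = sorted(name for name in outputs if name not in good)
    return good, bad
```

The lengths of the two lists answer the how many part of the task, and their contents answer the which part.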
Finally, your third task is to write an XML parser that takes as input the AST in XML format generated by astinxml. Your XML parser should traverse the AST, print the tokens in the AST, and thereby regenerate the source code for the original program. Thus, the deliverable for your third task is an XML parser that will take an AST in XML format as input, and generate the corresponding C++ source program as output.
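The traversal can be sketched as below. The layout of the XML that astinxml produces is not described here, so this sketch assumes, purely for illustration, that each token is the text of a leaf element and that document order matches source order; the function names and the element names in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def tokens_of(xml_text):
    """Collect the text of every leaf element, in document order."""
    root = ET.fromstring(xml_text)
    tokens = []
    for elem in root.iter():  # pre-order traversal of the tree
        if len(elem) == 0 and elem.text and elem.text.strip():
            tokens.append(elem.text.strip())
    return tokens


def regenerate(xml_text):
    """Join the tokens with single spaces to approximate the source text."""
    return " ".join(tokens_of(xml_text))
```

For an input such as `<unit><decl><tok>int</tok><tok>x</tok><tok>;</tok></decl></unit>`, regenerate returns `int x ;`. The original whitespace and comments are not recovered by this approach.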
One difficulty with this third task is that the gcc C++ parser is a recursive descent, backtracking parser. For some productions, the parser does not know definitively how to proceed so it performs a tentative parse and then, when it determines that the tentative parse is correct, it commits to the parse. You will be given a C++ program that will remove the tentatively parsed tokens. However, class member functions, as part of the tentative parse, are actually dumped after the class is parsed. To handle this, astinxml generates four extra kinds of tokens:
delayed_function_body_start
delayed_function_body_end
function_body_hole_start
function_body_hole_end
The token delayed_function_body_start indicates the beginning of a class member function body and the token delayed_function_body_end indicates the end of a class member function body. The token function_body_hole_start indicates the beginning of a function hole, and the token function_body_hole_end indicates the end of a function hole. You should throw away all of the tokens between function_body_hole_start and function_body_hole_end.
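Discarding the hole tokens can be sketched as a single pass over the token stream. The sketch assumes that the tokens have already been extracted into a list of strings and that the marker tokens appear in that list under the names given above; the function name drop_holes is hypothetical, and the depth counter is a precaution in case holes nest. The delayed_function_body_start and delayed_function_body_end tokens are left in place here.

```python
HOLE_START = "function_body_hole_start"
HOLE_END = "function_body_hole_end"


def drop_holes(tokens):
    """Return tokens with every hole, and its two marker tokens, removed."""
    kept = []
    depth = 0  # number of holes currently open
    for tok in tokens:
        if tok == HOLE_START:
            depth += 1
        elif tok == HOLE_END:
            depth = max(depth - 1, 0)  # tolerate a stray end marker
        elif depth == 0:
            kept.append(tok)
    return kept
```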
You will be given a directory that has the original g++.dg directory, a PyUnit test script (that you will extend) and a start on the XML parser. You have also been given an account on dartagnan that you can use to access the astinxml version of the gcc-4.0.0 parser.
Your submitted solution should contain a README file that explains your successes and failures, and anything extra that you decided to incorporate into your solution to the assignment. As usual, you should use the handin command to submit your assignment; the handin command is available on dartagnan.
The handin command:
handin.829.1 3 *