Tiny-C Interpreter Internals
Table of Contents
A full walkthrough of one expression
Introduction
I have no intention to write a walkthrough of the entire 11 files of C source code for this project, but I thought it would be fun and perhaps useful to some students to walk through the heart of the system, the parser and actions of the expression evaluator. Bear in mind this code is a C clone of 8080 assembly code written in 1977. One would probably change the architecture several ways on modern computers. We visit that issue latter.
Several issues before we dive in:
I make extensive use of what I call the "tilda notation." Example: tc.c ~94 means file tc.c, line 94.
In the html version of this writing the tildas into the file tc.c are links, and the file tc.c is an appendix. All one document. Click the link to see the code, and click the browser's "back" arrow to return to the walkthrough text.
The link is typically to a few lines above the referenced line to show some context.
There are tildas to other files, stack.c and tc.h for example. These are not links, and the text explains their role usually sufficiently to read on. But the tildas are there for those who wish to dig a bit deeper.
The appendix has only the one file, tc.c, with line numbers. Future work on any program makes line numbers drift, so capturing this walkthrough's line numbers is done in the appendix for referential integrity.
The full source is available: https://github.com/tgibson37/tiny-c. It is GPL'd, so download it and play with the source.
Frequently encountered code
cursor, stcurs
There are two cursors.
char* cursor; char* stcurs;
The important one is cursor. All parsing functions examine the next significant text from the cursor on. And when they get a match, they advance cursor beyond the matched text. Stcurs points to the beginning of the current statement.
lit, the x words
int _lit(char *s)
Lit matches an exact string, returning true on match, else false. It advances the cursor over spaces and tabs, but does not go to the next line. If the cursor then points to an exact copy of the string s, it advances the cursor beyond that copy and returns true. Otherwise it returns false. This is used to match operators, punctuation, and keywords. Almost always it is embedded in an if statement:
if(_lit(xwhile)) {...
The exception is optional punctuation where it matters not if the match was true or false. The word xwhile is defined in tc.c ~16 as “while”. All keywords, punctuations, and operators are defined in similar x words. The list starts at ~12. So
if(_lit(xsemi)) {…
is testing for a semicolon.
It is not an error for a _lit call to return false. As we will see parsing is mostly code asking, “are you this?, no, well are you that?,...”. Look for chains of else-if's doing just such a search.
eset, error
But errors do occur. When a parsing error occurs, perhaps an undeclared or misspelled variable, or weird syntax the function eset sets the variable error to a code.
eset(STATERR);
The codes are defined in tc.h ~33. The variable error is frequently tested, for example at the beginning of each statement. Eventually the interpreter grinds to a halt, and issues a message.
rem
void _rem()
_rem skips whitespace and remarks. It keeps skipping until the cursor is on significant program text. _rem is the only way the cursor advances to the next line.
verbose
The array verbose has switches that turn on debugging reports. Ignore an if(verbose[XX])… you encounter. Or use them to see the parsing in action. The list of XX's to plug in are documented in “the new tiny-c.doc,” debug section, the v (verbose) command. You can turn individual reports on and off.
Token matching
A program is a string of "tokens." For tiny-c the tokens are:
symbols: keywords, function, variable names
constants: numbers, character, strings
punctuation: operators, structure definers ([], ())
end-of-line (including a remark if present)
Those tokens are the object of the "Are you this?" logic. The function _lit is used to ask this question for keywords and punctuation, _rem does end-of-line. We defined those in the Frequent..code section above. Constants are handled by _konst, and functions and variable names, collectively called symbols by _symName.
symName
Consider this code with the cursor at or just before the x:
x = 1 cursor--^
As the parser grinds away it eventually asks, "Are you a symbol?" An example is ~253 in tc.c:
if( !_symName() ) { /*defines fname,lname. True is match.*/
_symname() examines the next significant text after the cursor, but on the same line. If it is a symbol it defines two global char* pointers to the first and last characters of the symbol, fname and lname, and returns the length of the symbol, which also serves as a TRUE. If it is not a symbol it returns 0, FALSE. In either case the cursor is left at the first significant non-whitespace character. Here is the code, tc.c ~181:
0181 int _symName() { 0182 char* temp; 0183 while( *cursor == ' ' || *cursor == '\t' ) ++cursor; 0184 temp=cursor; 0185 if( isalpha(*temp) || *temp=='_') { 0186 fname = temp; 0187 } 0188 else return 0; /* not a symbol */ 0189 while( isalnum(*temp) || *temp=='_'){ 0190 ++temp; 0191 } 0192 lname = temp-1; 0193 if(verbose[VP]){ 0194 fprintf(stderr,"\nparsed "); 0195 dumpft(fname,lname); 0196 } 0197 return lname-fname+1; /* good, fname and lname defined */ 0198 }
At ~183 the cursor is advanced over whitespace. At ~185 the first character is tested for alphabetic or underscore. If true this is a symbol and fname is defined. Otherwise it is not a symbol. The code at ~189..192 defines lname. And at ~197 the length is returned.
Another part of the pattern is changing state. _symname does a trivial state change, the cursor over whitespace. Often when the answer is YES the state change is more significant.
konst
Highlights of the code in tc.c:
0411 Type _konst() { 0412 char* x; 0413 while(*cursor==' ')++cursor; 0414 char c = *cursor; 0415 if( c=='+' || c=='-' || (c>='0'&&c≤'9') ) { ... 0425 return Int; 0427 } else if(_lit("\"")) { 0428 fname=cursor; ... 0440 return CharStar; 0442 } else if(_lit("\'")) { ... 0454 return Char; 0456 } else return Err; /* no match, Err==0 */ 0457 }
And in tc.h is:
typedef enum type {Err,Char,Int,CharStar} Type;
The ellipses have code that finds and sets lname and advances the cursor, different in detail, but similar to code in _symname. From the code shown we see that_konst() not only sets fname and lname, it returns the Type, defined in the typedef. And Err is 0, aka boolean FALSE.
Type Err by the way is a misnomer. It is not an error. Rather it is an answer "NO" to the question: "Are you a constant?"
The expression stack
tiny-c's stack holds not just the datum but also information about the datum. So what is pushed is a struct:
/* a stack entry */ struct stackentry { int class; int lvalue; Type type; union stuff value; };
class is an integer, 0 for actual datum, or 1 for pointer reference to the datum. If we ever implement pointers to pointers (2 and higher dimension arrays) larger values would happen. It is conceptually the distance from the pointer to the datum.
lvalue is true if the data is a valid lvalue, a reference to the value's storage, false otherwise. Thus function _eq (tc.c ~57) which affects an assignment has the address it needs to do so. Normally data is stored in the array pr (tc.h ~98) which also has all the code being interpreted. After all the code, libraries and application are loaded into pr (the program array), the rest of pr is data values.
How an array name is parsed is interesting. It is both a class 1, and an lvalue. Its value can be changed by a simple assignment. For example:
char buff(100), bp(0); bp = buff; …; bp = bp+1;
declares a buff and a “pointer” bp. It is a pointer ONLY because of the way it is used in subsequent code, first to set its value to the same as buff's, second to increment that pointer. The value of bp is in vartab. When bp is parsed it goes into the stack with stuff.value.up pointing into vartab, not pr.
The type is an enum, tc.h ~65:
typedef enum type {Err,Char,Int,CharStar} Type;
The union stuff has the value, tc.h ~63:
union stuff { char uc; int ui; void* up; };
If class is 1 or if lvalue is true the value is referenced via stuff.up which points to the value. Some values are in the variable table, vartab, and some in program space, pr, as described above. Hence stuff.up must be a void*, i.e. typeless. More on typeless coming up. tiny-c's stack is declared in tc.h ~72:
/* the stack */ struct stackentry *stack; int nxtstack, stacklen;
nxtstack starts out 0 and is incremented by each push, and decremented by pops. stacklen is the size.
The stack code is in stack.c. It is short and reasonably readable, so an actual walkthrough is not needed.
The variable table
The file var.c defines actions on the variable table. All symbols, both variables and functions, are placed in this table when they are declared. The table is a stack of frames. Frame 0 has all the globals in all the libraries. Frame 1 has all the globals for the application. Frames 2 and up have locals. Opening a new frame is done by newfun, var.c ~6, closing by fundone, ~24. tc.h ~76..91 define the arrays used for defining frames (fun) and the frames themselves (vartab), plus various pointers into them. The arrays are malloc'd, hence they are defined in tc.h as pointers, ~82 and ~90.
Read newfun and fundone, they are both straightforward to the push/pop task as functions are entered, or returned. But vartab needs a bit of explanation.
At tc.h ~85 is the declaration of an entry in this table:
struct var { char name[VLEN+1]; int class; Type type; int len; int brkpt; union stuff value; };
When a function or variable is declared newvar (var.c ~76) adds a struct var describing that variable into the current frame: name, class, type, length. And brkpt (used only by the debugger) is zero. Var.c ~83 returns the object size, 4 for ints and 1 for chars in this implementation, used by _allocSpace ~83 to allocate space for the actual data. Length is 1 except for arrays when it is the declared dimension plus 1.
Class is 0 for datum, 1 for array, 'E' for function.
The value is the address where the value is stored for a datum or array (value.up, a void *, tc.h ~63). For a function value.up is the entry point of the function's definition, which is just after the function's name. So it points to either the first argument declaration for the function, or the [ of its compound statement body if there are no arguments.
Typeless data access
The original tiny-c code was 8080 assembler. The 8080 platform is typeless. The closest it got to types beyond bytes was a few instructions that were 2 bytes. This implementation is reverse coded from my 1977 8080 code listing. To do it that way needed tools to handle data in a typeless fashion. The struct stuff is an example. It can be passed through many levels of calls, and only the ultimate receiving code need determine its actual type. But stuff is never passed without its descriptive information. Problem solved. Intermediate passing is typeless, but the end receiver has the data to do the proper thing.
But storage needed to be typeless, too. I toyed with using two arrays, char* and int*, but settled on using C's typeless memcpy function. So read the code at tc.c ~945, get_int, put_int, etc. This turned memory into pure ram, just like the 8080.
Function calls
When a symbol is recognized with the special class 'E' that is a function. It will be in _factor that recognition happens, tc.c ~634. Upon recognition only the function name is parsed, but it may have arguments not yet parsed. _factor calls _enter which:
completes the parsing
moves the cursor to the entry point of the function
parses the argument declarations assigning them values from the stack
executes the function body
moves the cursor back to just after the totally parsed call
returns with the global flag 'leave' set true if a return value is stacked, else false.
Upon _enter returning, _factor is done also, and returns. So the net effect of the six steps above is the pushed value, plus advancing the cursor past the call.
Notice how this fits into the pattern of anything _factor recognizes. Whether it is a variable, a constant, a parenthesis expression or a function call the net effect is the same: a pushed value and an advanced cursor.
_enter, ~307, takes one argument, char* where. A special case is where==0, which is a machine call. If not 0 it is a pointer to the entry point of the function definition, somewhere in array pr. This walkthrough will cover that latter case.
The code is divided into two sections, called in the comments ABOVE and BELOW. That boundary separates the parsing of call data from function defined arguments. Above, ~308..340, completes the call parsing stacking the argument values, and handles the special case of machine call. That is the first of the above 5 steps. Below, ~343..389, handles the usual case of calling a tiny-c function, the remaining five steps.
At ~312 advise the debugger a function call is in progress. At ~313 skip the optional left paren, if present. Then real work begins:
~314: Look for all punctuation that means there are no arguments.
~323..327: parse all the args. Note that commas are required to keep this loop going.
~330: skip optional right paren.
~331: advance cursor over remark and/or newline if present.
~332..340: test for and handle machine call special case. (There must be at least one argument, the MC number. ~333 nextstack seems a poor choice to test, 0 means empty stack. A better test would be nargs. I document here the code the way it is, not the way it could be.)
~339: return for this case. Machine calls do not do BELOW code.
~341..391: The fun stuff begins!
~343: save the cursors.
~345: open a new stack frame for locals.
~346..~371: parse the declarations creating locals in the new stack frame. Initialize them with the values stacked ABOVE. _setArg does that. Down count for each arg, and the loop ends when there are no more declarations.
Assume for now a fixed number of arguments. The varargs case is treated later.
~383: if no error execute the body. It is a compound statement, therefore a call to st suffices.
~384: st() may or may not push a resulting value. leave is true if it did. Otherwise push a zero. We need that because _enter must always return a pushed value.
~386..389 cleanup: restore the cursors, pop the locals variable frame (fundone), advise the debugger we are leaving a function.
Variable number of arguments
One of the x words is xvarargs defined as a 3 dot ellipsis, tc.c ~44. It is used in BELOW in parsing the argument declarations, ~363, and, like C, specifies a variable number of arguments, from zero up. Its only use so far is for the printf function in pps/library.tc ~10. This is a new feature in this implementation of tiny-c. It did not exist in the 1977 versions. And it was amazingly easy to implement because a machine call, MC, has always had a variable number of arguments. So most of the work was already done ABOVE.
All the parser does is set a global flag, varargs. It is initially zero, and after every machine call, at ~335 it is set back to zero just in case it was used. So going into BELOW we assured it is zero. At ~364 to the number of arguments plus one, the latter to make it flag true if there are zero arguments. And at ~372..381 a portion of the cleanup to assure the number of arguments parsed equals the number declared is made conditional on the fixed argument case. That is still a valid test for that case.
In _enter ~310 with varargs true the correct number of arguments is recovered. And that is all! Why this works is magic. But it does. I get confused every time I try to figure it out again. But here goes.
The whole varargs thing begins BELOW, then completes on another call to _enter ABOVE. It's topsy-turvey. When a function is called its supplied arguments are parsed, evaluated, and pushed, counting the pushes in nargs. All that happens ABOVE with no knowledge of being fixed or variable number of arguments. Then, BELOW, the cursors are moved and the function definition entered. As the last argument is the xvarargs match, the 3 dot ellipsis. Now the varargs logic kicks in. The body is entered, and that body (for printf) is an MC call:
printf char fmt(0); ... [ MC 101 ]
_enter is called again, and ABOVE restores the nargs, parses and pushes the 101 thus bumping nargs by 1, and calls machineCall(nargs), ~334. MC 101 does its thing, pFmt, a recursive use of the fmt string to print the formatted set of arguments, machineCall.c ~65.
Expression grammar
What is a grammar? It is a way of assembling tokens (words and punctuation) into meaningful natural language sentences. By analogy for a computer language a grammar defines a valid statement.
Computer languages have a well-defined grammar. Usually the grammar can be described very compactly using a formal notation called Backus-Naur form, BNF. Here in simple English is the essence of BNF for just some of tiny-c, captured from comments and slightly edited. It is both definitions and a recipe for discovery:
An ASGN is a RELN or an LVALUE = ASGN an LVALUE is a symbol, specifically a variable reference. a RELN is an EXPR or a comparison of EXPRs an EXPR is a TERM or sum or difference of TERMs a TERM is a FACTOR or a product, quotient, or remainder of FACTORs a FACTOR is an ( ASGN ), or a constant, or a variable reference, or a function call.
Real BNF would continue by defining symbol, comparison, sum, product, etc. In fact it sometimes goes down to defining the alphabet and the digits. Very detailed. But the above gives the idea. A high level element is defined in terms of lower level elements all the way to the most primitive.
If you are familiar with BNF and with recursion, skip the rest of this section. If it's new to you read on.
Consider this assignment:
a = b = c = 77
Following the recipe above ASGN parses the 'a' and concludes it has at least a RELN and maybe an assignment. Then it parses the '='. So it is not a RELN. It must be an assignment, LVALUE = ASGN. So 'a' must be the LVALUE, and the new ASGN must be the not yet parsed 'b = c = 77'. So the new ASGN parses the 'b' and the '=', and the third ASGN must be '77'. The third parses the 'c' and the '=' and the fourth ASGN must be just the 77. And (looking ahead) it is, but let's get there in small steps, following the recipe.
The fourth ASGN tries RELN first, and RELN can be an EXPR. Then EXPR can be a TERM, which can be a FACTOR, which can be a constant. 77 is a constant, and therefore the entire chain is satisfied.
Parsing goes top down through the recipe. When it gets a NO reply, it moves up a level, and at that level the next alternative. When all alternatives at a given level are satisfied it ascends one level saying NO, and that higher level tries its next alternative. This process is called "top down recursion."
Next is the expression parser. This goes into more detail than the quick recipe overview.
Expression parsing
Parsing
Consider this statement with the cursor pointing before the x:
x = y + 3 ---^
st(), ~752, is called to parse and execute a statement. st() tests for several possibilities: is this a declaration (no), is it compound (no), is it a reserved word, if, while, etc, (no)? This rather long if-else statement goes all the way to ~848, where it tests for an assignment statement. These actual tests are done by functions that are called, _decl(), _lit(), and eventually _asgn(). st() is asking: "Is this yours? If so do your thing, and then tell me yes (return true). Otherwise tell me no." st() gets a lot of no's, but _asgn returns yes. And that is where we begin.
Why is _asgn put at the very end of the long if-else chain? Because to be sure it really is an assign it must first eliminate other possibilities including language key words. It's as simple as that.
So the expression recognition/parsing work gets done in _asgn, ~462.
What's an expression?
Let's eliminate a point of possible confusion. A tiny-c expression is parsed by _asgn. But a part of that parsing is done by _expr to recognize an algebraic expression. And if _asgn recognizes something it may not be an assignment! For example (x<10) is recognized by _asgn.
In the C community the word “expression” is used to mean a C expression, a generalization of an algebraic expression.
Classical algebraic functions work down a hierarchy of "parts", and these parts have classical names. An expression is a sum/difference of one or more terms, a term is a product/quotient of one or more factors, a factor is either a constant or a variable or a parenthesized expression. Note the recursion. The grammar chapter showed this way of defining could be expanded to an entire grammar, and BNF has become the standard way of expressing the grammar. The highest level of C expression is the assignment. So an expression is recognized by _asgn.
I used the algebraic names of expressions, adding relation and assignment, at the top, and function class at the bottom. The highest level is an actual assignment. So _asgn() evaluates a tiny-c expression, and _expr (~519) evaluates a classical algebraic expression.
Expression parsing implements the hierarchical recipe in the grammar chapter. One function handles one level of the hierarchy.
Start at tc.c ~462, _asgn(), and read down to ~668. But don't read every line. Read just the lines with no or a single indent. Read just this:
/* An asgn is a reln or an lvalue = asgn. Note that reln can match an lvalue. */ 462 int _asgn(){ if(_reln()){ .. } return error? 0: 1; } /* a RELN is an expr or a comparison of exprs */ 474 int _reln(){ if(_expr()){ .. } return 0; /* not an expr is not a reln */ } /* an EXPR is a term or sum (diff) of terms. */ 519 int _expr(){ if(_lit(xminus)){ .. } /* unary minus */ else if(_lit(xplus)){ .. } else _term(); while(!error){ .. } /* rest of the terms */ return 0; /* error, set down deep */ } /* ;a term is a factor or a product of factors. */ 555 int _term() { _factor(); while(!error) { .. } return 0; } /* a FACTOR is a ( asgn ), or a constant, or a variable reference, or a function reference. */ 590 void _factor() { if(_lit(xlpar)) { .. } else if( type=_konst() ) { .. } else if( _symName() ) { .. } else { eset(SYNXERR); } }
This manner of reading gives a high level view of the parser without getting bogged down in details. A code's pattern becomes a bit easier to see. For example the code in every layer begins with a call to the next layer. So a call to _asgn rapidly descends to _factor which is the first to look through its alternative possibilities. At every layer the function returns an int except _factor which returns nothing (void). So factor is somehow a bit different because it must always succeed or else there is a syntax error communicated more strongly by eset.
Another pattern is calls vs returns. A call descends to the next layer, returns ascend to the layer above signaling YES/NO to the parse so far. When ascending the cursor has been moved so the parent layer can continue the parse.
All of this is classical LR parsing, often described as "bottom up" because actual treatment is done from the bottom, _factor, to the top, _asgn. For a deeper understanding of LR parsing, see:
https://en.wikipedia.org/wiki/LR_parser
Treatment
Treatment means what happens when something is matched. For a compiler treatments emit code. For this interpreter the treatment performs the action determined by the match. Here is one treatment, the one for _asgn. The full code:
/* An asgn is a reln or an lvalue = asgn. Note that reln can match an lvalue. */ 462 int _asgn(){ 463 if(_reln()){ 464 if(_lit(xeq)){ 465 _asgn(); 466 if(!error)_eq(); 467 } 468 } 469 return error? 0: 1; 470 }
Back to this line of code with the cursor pointing before the x:
x = y + 3 ---^
x is a symbol name. _asgn is called and tunnels down to _symname which returns "I got it" (true), and the pointer is advanced just past the x:
x = y + 3 ---^
The return true is propagated all the way up to _asgn, where _reln returns true, ~463. So we have a successful _reln and treatment begins. _lit(xeq) looks for an equal sign, ~464, and voila, it is there. _lit advances the cursor, so we have:
x = y + 3 ---^
Then _asgn calls itself, ~465, classical recursion.
https://en.wikipedia.org/wiki/Recursion_(computer_science)
The right side of an = is an expression, and this second call to _asgn totally parses and evaluates the expression (detail in next section), and pushes the value on the stack. The cursor is advanced beyond the expression, pointing to a possible comment or an end-of-line.
x = y + 3 ---^
Finally, if no error has occurred _eq() performs the assignment, ~466, giving x the value of y+3. All the data _eq() needs is in those two stack pushes, the first is the address of the cell holding x, the second being the fully calculated value of y+3. _eq() does no parsing, just assignment. Somewhere out in data-land the x gets its new value. That is the ultimate treatment for an assignment.
Reviewing parsing vs treatment:
Lines 463..467 are treatment for a parsed _reln.
Within that treatment lines 464..465 are treatment for a parsed =.
Parsing is matching, usually trying several possibilities.
Treatment is doing something once a match happens possibly including more matching.
Notice how bullet 2 above is nested within bullet 1. The treatment of bullet 1 includes additional parsing that matches the =, hence treatment for that match.
A full walkthrough of one expression
To wrap up here is a full walkthough of a case that includes parenthetical expression and a function call: x=y*(3+foo(z+7)).
To keep it brief a quite condensed but (hopefully) self explanatory notation is used. For example when I write “_asgn parses the y” you should know by now that _asgn initiates a process that descends through the layers to _symname which does the real work, then “I got it” ascends back to _asgn.
And a notation: To show what is on the stack lv(x) means x as lvalue. ad(3) means actual datum 3. And state is shown like this:
Cursor: Stack: x=y*(3+foo(z+7)) ---^
READY:
_asgn called, it parses the x (~463)
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) ---^
_asgn parses '=' (~464), calls self recursively (~465), parses 'y'
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) <<<-top ---^
Returns ascend to _term (~557) which parses the '*', and calls _factor (~559)
_factor parses '(' (~594), calls _asgn to do the full parenthesized expression (~595). _asgn is now 2 recursions deep.
_asgn parses '3', _konst does the work, called by _factor (~600), recognizes digit (~415).
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(3) <<<-top ---^
Returns ascend to _expr (~529) which parses '+' (~540), and calls _term (~541), which parses 'foo'. _factor does the work using _symname (~619). foo is class 'E' so _factor calls _enter (~634).
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(3) <<<-top ---^
_enter ABOVE calls _asgn (~325) to parse z+7, _asgn parses and pushes z,
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(3) lv(z) <<<-top ---^
Returns ascend to _expr which parses + , calls _term (~541) which parses 7.
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(3) lv(z) ad(7) <<<-top ---^
Returns ascend to _expr which uses toptoi() to pop the top into b, ~543, and again for the value of z into a, ~544. Assume z==100. toptoi resolves the lv(z) into ad(100).
Data in _expr: (~545) a is ad(100), b is ad(7), both are class 0. Both have been popped by toptoi.
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(3) <<<-top ---^
Both are data (class 0) so _expr (~546) pushes their sum 107. Had one of them been a pointer (class 1) _expr would have pushed their sum as a pointer.
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(3) ad(107) <<<-top ---^
_expr sees no more + or – so returns ascend to _enter which looks for a comma for more args (~327), tosses the ')' (~330), where is true so control goes to BELOW.
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(3) ad(107) <<<-top ---^
The calling cursors are saved (~343), the cursor set to where, a new variable stack frame pushed, newfun, ~345. foo's declaration is
Cursor: Stack: foo int x […] lv(x) lv(y) ad(3) ad(107) <<<-top ---^
so _lit(xint) is true (~348), _setarg allocates foo's local x giving it the initial value 107.
It's not varargs and arg was bumped to match nxtarg so the one arg is popped (~379).
The function is executed (~383), foo does its thing (not a part of this walk through). Assume a return statement in foo's body pushes 200 onto the stack as the value of foo. And return always sets leave true. So the pushzero default (~384) doesn't happen, leave is set back to 0, the cursors are restored:
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(3) ad(200) <<<-top ---^
The stack frame is popped (fundone), and the debugger is advised a function return has just happened, (~385..389). foo has done its duty! _enter returns and returns ascend to _expr (~542) which as before toptoi's the two entries on top, adds them, and pushes the sum (~543..545).
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(203) <<<-top ---^
_expr looks for another + or -, ascends to _factor... (~596)
[wait a minute! Ascends? How can _expr ascend to _factor which is lower in the grammar? Answer: because of recursion. _factor called _asgn which descended to _expr. So in the call stack _expr is below _factor.]
Ok, so _expr ascends to _factor where _mustFind insists on a ), which is there so the cursor is set just past it (~597).
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) lv(y) ad(203) <<<-top ---^
How deep are the recursive levels of _asgn, I have lost count. But, from the most recent, returns ascend to _term (~560) where the top two on the stack popped and their product pushed. Assume the value of y is 5, so the product is 1015.
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) ad(1015) <<<-top ---^
And finally the returns ascend to the very first _asgn (~466) waiting patiently all this time to do the assignment.
No error, so the actual assign is done by _eq which dips directly into the stack, probably the only code outside of stack.c to do so (~62..63). val points to 1015, lval to x. And x is an lval, so ~75 is happy and testing both classes we end up at ~118. Its an int to int assign, so ~124 gets the 1015 from val, and ~127 puts it into x's storage cell. And ~128 pushes iDatum onto the stack. An assignment, when done, always leaves the value on the stack.
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) ad(1015) ad(1015) <<<-top ---^
Another "Wait a minute." At (~141) two popst() calls pop that value twice, leaving just the lv(x). Isn't that a bug? Hmm!
Cursor: Stack: x=y*(3+foo(z+7)) lv(x) <<<-top ---^
Well, yes. But there is the lv(x) which satisfies the assignment's "when done" requirement. So the (ahem) flaw is just wasted work. It will never show up in a test.
Notice there are 5 calls to _asgn, and each ends by pushing a value. _asgn is happy if it is just a variable or a constant. The parse functions are named after the highest level that may actually be there, but they are all satisfied by a lower level.
And: x is 1015 in its storage allocation.
And so ends our walkthrough.
I would change this code several ways on a modern computer. But I wanted a near clone of the 8080 code, architecture and all. Here are some of my thoughts.
Separate the parsing from the action. Use an action interface, and a separate action module for treatment. Let others ADD new actions, posssibly a code emitter???
Use lex and yacc, or their modern equivalent. Also there are tools today that write a parser from BNF. Googling "tool to write a parser from bnf" generated several interesting sites.
Create a plug-in interface. Let others ADD to the product without recompiling the whole thing.
Fix the few pattern violations. Read the walkthrough to see what I mean by a pattern. For example the return statement follows the pattern but flowing out of a function (no return statement) does not.
Use _name more consistently. It was intended to identify functions private to the particular module, whereas no underscore was public. But you will find several underscore functions in tc.h.
Basically all the above criticims can be summarized as: push the code gently towards object orientation.
The file tc.c is the core file of this version of tiny-c. It is just one of eleven source files for tiny-c. The full source and documentation is available at:
www.github.com/tgibson37/tiny-c
It is GPL licensed, and the license is posted there, too. The Walkthrough document references lines in this file using what I call the "tilda" notation: ~xxx refers to line xxx. If the original file is edited its line numbers will drift, perhaps a lot. The purpose of this appendix is to publish just this file with the line numbers as used in the main document. For the html version each tc.c ~ notation links a few lines above the line number below. Use your browser's back arrow to return to the walkthrough text.
0001 #include "tc.h" 0002 0003 #if defined(_WIN32) 0004 char* defaultLibrary = "pps\\library.tc"; 0005 #else 0006 char* defaultLibrary = "pps/library.tc"; 0007 #endif0008 0009 int varargs = 0; 0010 0011 /************** literals **************/ 0012 char* xif = "if"; 0013 char* xelse = "else"; 0014 char* xint = "int"; 0015 char* xchar = "char"; 0016 char* xwhile = "while"; 0017 char* xreturn = "return"; 0018 char* xbreak = "break"; 0019 char* xendlib = "endlibrary"; 0020 char* xr = "r"; 0021 char* xg = "g"; 0022 char* xlb = "["; 0023 char* xrb = "]"; 0024 char* xlpar = "("; 0025 char* xrpar = ")"; 0026 char* xcomma = ","; 0027 char* newline = "\n"; 0028 char* xcmnt = "/*"; 0029 char* xcmnt2 = "//"; 0030 char* xstar = "*"; 0031 char* xsemi = ";"; 0032 char* xpcnt = "%"; 0033 char* xslash = "/"; 0034 char* xplus = "+"; 0035 char* xminus = "-"; 0036 char* xlt = "<"; 0037 char* xgt = ">"; 0038 char* xnoteq = "!="; 0039 char* xeqeq = "=="; 0040 char* xeq = "="; 0041 char* xge = ">="; 0042 char* xle = "≤"; 0043 char* xnl = "\n"; 0044 char* xvarargs = "..."; 0045 0046 /* stored size of one datum */ 0047 int typeToSize( int class, Type type ) { 0048 if(type==Char)return 1; 0049 else if(type==Int)return sizeof(int); 0050 else eset(TYPEERR); 0051 return 0; /* has to be one of the above */ 0052 } 0053 0054 /* SITUATION: Parsed an assignment expression. Two stack entries, lvalue, datam. 0055 * Effects the assignment. 0056 */ 0057 int _eq() { 0058 int iDatum; /* memcpy into these from pr using val.stuff */ 0059 char cDatum; /* and val.size, giving needed cast */ 0060 void* pDatum; 0061 0062 struct stackentry *val = &stack[nxtstack-1]; /* value (on top) */ 0063 struct stackentry *lval = &stack[nxtstack-2]; /* where to put it */ 0064 if(verbose[VE]){ 0065 fprintf(stderr,"\neq: lval"); 0066 dumpStackEntry(nxtstack-2); 0067 fprintf(stderr,"\neq: val"); 0068 dumpStackEntry(nxtstack-1); 0069 } 0070 void* where = &((*lval).value.up); 0071 int class = (*lval).class; 0072 int type = (*lval).type; 0073 int whereSize = typeToSize(class,type); /* of the lvalue */ 0074 0075 if((*lval).lvalue != 'L') { 0076 eset(LVALERR); 0077 return; 0078 } 0079 0080 if(class==1 && (*val).class==1) { 0081 pDatum = (*val).value.up; 0082 if( (*val).lvalue=='L' ){ 0083 pDatum = (char*)(*(int*)pDatum); /* now its 'A' */ 0084 } 0085 char **where = (*lval).value.up; 0086 *where = (char*)pDatum; 0087 pushst(class, 'A', type, &(*val).value); 0088 } 0089 else if(class==1 && (*val).class==0) { /* ptr = int */ 0090 if( (*val).type != Int ){ 0091 eset(EQERR); 0092 return; 0093 } 0094 if( (*val).lvalue=='L' ) { 0095 iDatum = get_int((*val).value.up); 0096 } 0097 else { 0098 iDatum = (*val).value.ui; 0099 } 0100 pDatum = (void*)iDatum; 0101 char **where = (*lval).value.up; 0102 *where = (char*)pDatum; 0103 pushst(class, 'A', type, &(*val).value); 0104 } 0105 else if(class==0 && (*val).class==1) { /* int = ptr */ 0106 if(type!=Int){ 0107 eset(EQERR); 0108 return; 0109 } 0110 pDatum = (*val).value.up; 0111 if( (*val).lvalue=='L' ){ 0112 pDatum = (char*)(*(int*)pDatum); /* now its 'A' */ 0113 } 0114 iDatum = (int)pDatum; 0115 put_int( (*lval).value.up, iDatum); 0116 pushk(iDatum); 0117 } 0118 else if(class==0 && (*val).class==0) { 0119 if(type==Int){ 0120 if( (*val).lvalue=='L' ) { 0121 iDatum = get_int((*val).value.up); 0122 } 0123 else { 0124 iDatum = (*val).value.ui; 0125 } 0126 if((*val).type==Char) iDatum = iDatum&0xff; 0127 put_int( (*lval).value.up, iDatum); 0128 pushk(iDatum); 0129 } 0130 else if(type==Char){ 0131 if( (*val).lvalue=='L' ) { 0132 cDatum = get_char((*val).value.up); 0133 } 0134 else { 0135 cDatum = (*val).value.uc; 0136 } 0137 put_char( (*lval).value.up, cDatum ); 0138 pushk(cDatum); 0139 } 0140 } 0141 popst();popst(); 0142 } 0143 0144 /******* set error unless already set, capture cursor in errat *******/ 0145 int eset( int err ){ 0146 if(!error){error=err;errat=cursor;} 0147 return error; 0148 } 0149 0150 /* Bump cursor over whitespace. Then return true on match and advance 0151 cursor beyond the literal else false and do not advance cursor 0152 */ 0153 int _lit(char *s){ 0154 while( *cursor == ' ' 0155 || *cursor == '\t' ) ++cursor; 0156 int match = !strncmp(s,cursor, strlen(s)); 0157 if( match ) { 0158 cursor += strlen(s); 0159 return 1; 0160 } 0161 return 0; /* no match */ 0162 } 0163 0164 /* skips balance l-r assuming left is matched. 0165 * Returns 0 on OK, else count of missing ]'s. 0166 */ 0167 int _skip(char l, char r) { 0168 int counter = 1; 0169 while( counter>0 && cursor<endapp ) { 0170 if(*cursor==l)++counter; 0171 if(*cursor==r)--counter; 0172 ++cursor; 0173 }; 0174 if( counter )return counter; 0175 return 0; 0176 } 0177 0178 /* Parse a symbol defining fname, lname. ret: true if symbol. 0179 * Advances the cursor to but not over the symbol, 0180 */ 0181 int _symName() { 0182 char* temp; 0183 while( *cursor == ' ' || *cursor == '\t' ) ++cursor; 0184 temp=cursor; 0185 if( isalpha(*temp) || *temp=='_') { 0186 fname = temp; 0187 } 0188 else return 0; /* not a symbol */ 0189 while( isalnum(*temp) || *temp=='_'){ 0190 ++temp; 0191 } 0192 lname = temp-1; 0193 if(verbose[VP]){ 0194 fprintf(stderr,"\nparsed "); 0195 dumpft(fname,lname); 0196 } 0197 return lname-fname+1; /* good, fname and lname defined */ 0198 } 0199 0200 /****************** some C helper functions **************/ 0201 0202 /* return true if symname matches arg, no state change */ 0203 int _symNameIs(char* name){ 0204 int x = strncmp(fname, name, lname-fname+1); 0205 return( !x ); 0206 } 0207 /* State is not changed by find or mustFind. Returned value is 0208 sole purpose of find. That plus setting err for mustFind. */ 0209 char* find( char* from, char* upto, char c) { 0210 char* x = from; 0211 while( *x != c && x<upto) { 0212 ++x; 0213 } 0214 return x<upto ? x : 0; 0215 } 0216 /* same as find but sets err on no match */ 0217 char* _mustFind( char *from, char *upto, char c, int err ) { 0218 char* x = find(from, upto, c); 0219 if( x ) return x; 0220 else { eset(err); return 0; } 0221 } 0222 0223 /* special find for end of string */ 0224 char* _findEOS( char* x ) { 0225 while( x<endapp) { 0226 if( *x==0 || *x==0x22 ) return x; 0227 ++x; 0228 } 0229 eset(CURSERR); 0230 return 0; 0231 } 0232 0233 /* skip over comments and/or empty lines in any order, new version 0234 tolerates 0x0d, and implements // as well as old /* comments. 0235 */ 0236 void _rem() { 0237 for(;;) { 0238 while( *cursor==0x0a 0239 ||*cursor==0x0d 0240 ||*cursor==' ' 0241 ||*cursor=='\t' 0242 )++cursor; 0243 if( !(_lit(xcmnt)||_lit(xcmnt2)) ) return; 0244 while( *cursor != 0x0a && *cursor != 0x0d && cursor<endapp ) 0245 ++cursor; 0246 } 0247 } 0248 0249 /* SITUATION: int or char is parsed. 0250 * Parses one variable. Makes allocation and symbol entry. 0251 */ 0252 void _varAlloc(Type type, union stuff *vpassed) { 0253 if( !_symName() ) { /*defines fname,lname. True is match.*/ 0254 eset(SYMERR); 0255 return; 0256 } 0257 cursor=lname+1; 0258 if( _lit("(") ){ 0259 vclass = 1; /* array or pointer */ 0260 char* fn=fname; /* localize globals that _asgn() may change */ 0261 char* ln=lname; 0262 if( _asgn() ) alen=toptoi()+1; /* dimension */ 0263 fname=fn; /* restore the globals */ 0264 lname=ln; 0265 char* x = _mustFind(cursor,cursor+5,')',RPARERR); 0266 if(x)cursor = x+1; 0267 } else { 0268 vclass = 0; 0269 alen = 1; 0270 } 0271 newvar(vclass, type, alen, vpassed); 0272 } 0273 0274 /* Situation: parsing argument declarations, passed values are on the stack. 0275 * arg points into stack to an argument of type. 0276 * Gets actual value of arg, calls valloc which parses and sets 0277 * up local with the passed value. 0278 */ 0279 void _setArg( Type type, struct stackentry *arg ) { 0280 union stuff vpassed = (*arg).value; 0281 char* where; 0282 int class = (*arg).class; 0283 int lvalue = (*arg).lvalue; 0284 int stacktype = (*arg).type; 0285 if( lvalue=='L') { 0286 where = (char*)vpassed.up; 0287 if( class==1 ) { 0288 vpassed.up = *((char**)(*arg).value.up); 0289 } 0290 else if( stacktype==Int ) vpassed.ui = get_int(where); 0291 else if( stacktype==Char) vpassed.ui = get_char(where); 0292 /* ui to clear high order byte */ 0293 } 0294 //fprintf(stderr,"\n~295SA type %d passed %d", type, vpassed.ui); 0295 _varAlloc( type, &vpassed); 0296 } 0297 0298 /* SITUATION: Just parsed symbol with class 'E', or special symbol MC. 0299 * Parses the args putting values are on the stack, arg pointing to the first 0300 * of them. 0301 * Sets the cursor to the called function's 'where'. Parses arg decl's 0302 * giving them values from the stack. Executes the function body. 0303 * Pops the locals (vars and values). Restores the caller's stcurs and 0304 * cursor. 0305 */ 0306 0307 void _enter( char* where) { 0308 int arg=nxtstack; 0309 int nargs=0; 0310 if(varargs>0) nargs=varargs-1; 0311 //fprintf(stderr,"\n~311E ABOVE: va %d na %d",varargs,nargs); 0312 if(where)fcn_enter(); 0313 _lit(xlpar); /* optional ( */ 0314 int haveArgs = ! ( _lit(xrpar) 0315 || *cursor==*xlb 0316 || *cursor==*xrb 0317 || *cursor==*xsemi 0318 || *cursor==*xnl 0319 || *cursor==0x0d 0320 || *cursor==*xslash 0321 ); 0322 if ( haveArgs ) { 0323 do { 0324 if(error)return; 0325 if( _asgn()) ++nargs; 0326 else break; /* break on error */ 0327 } while( _lit(xcomma) ); 0328 } 0329 if(error)return; 0330 _lit(xrpar); /* optional ) */ 0331 _rem(); 0332 if(!where) { 0333 if(nxtstack) { 0334 machinecall( nargs ); 0335 varargs=0; 0336 //fprintf(stderr,"\n~336E va %d na %d",varargs,nargs); 0337 } 0338 else eset(MCERR); 0339 return; 0340 } 0341 else { /* ABOVE parses the call args, BELOW parses the called's arg decls */ 0342 //fprintf(stderr,"\n~342E BELOW: va %d na %d",varargs,nargs); 0343 char *localstcurs=stcurs, *localcurs=cursor; 0344 cursor = where; 0345 newfun(); 0346 for(;;) { 0347 _rem(); 0348 if(_lit(xint)) { 0349 do { 0350 _setArg(Int, &stack[arg]); 0351 arg++; 0352 } while(_lit(xcomma)); 0353 _lit(xsemi); /* optional */ 0354 } 0355 else if ( _lit(xchar)) { 0356 do { 0357 //fprintf(stderr," ~358CHAR "); 0358 _setArg(Char, &stack[arg]); 0359 arg++; 0360 } while(_lit(xcomma)); 0361 _lit(xsemi); 0362 } 0363 else if ( _lit(xvarargs) ){ 0364 varargs=nargs+1; 0365 //fprintf(stderr,"\n~362E va %d na %d ",varargs,nargs); 0366 break; 0367 } 0368 else { 0369 break; 0370 } 0371 } 0372 if(!varargs) { 0373 if(arg != nxtstack) { 0374 cursor=localcurs; 0375 stcurs=localstcurs; 0376 eset(ARGSERR); 0377 } 0378 while(nargs>0){ 0379 popst(); 0380 --nargs; 0381 } 0382 } 0383 if(!error)st(); /* <<-- execute fcn's body */ 0384 if(!leave)pushzero(); 0385 leave=0; 0386 cursor=localcurs; 0387 stcurs=localstcurs; 0388 fundone(); 0389 fcn_leave(); 0390 } 0391 } 0392 0393 /********** expression parser *****************/ 0394 0395 /* converts fname..lname inclusive to integer 0396 */ 0397 int _atoi() { 0398 char* x = fname; 0399 int val = 0; 0400 int sign = *x=='-' ? -1 : 1; 0401 if( *x=='-' || *x=='+' ) ++x; 0402 do{ 0403 val = 10*val+*x-'0'; 0404 } while( ++x ≤ lname ); 0405 return sign*val; 0406 } 0407 0408 /* parses constant defining fname..lname which brackets trimmed constant. 0409 * Cursor moved just beyond constant. Returns Type: 0410 */ 0411 Type _konst() { 0412 char* x; 0413 while(*cursor==' ')++cursor; 0414 char c = *cursor; 0415 if( c=='+' || c=='-' || (c>='0'&&c≤'9') ) { 0416 fname = cursor; 0417 do{ 0418 ++cursor; c=*cursor; 0419 } while(c>='0'&&c'9'); 0420 lname=cursor-1; 0421 if(verbose[VP]){ 0422 fprintf(stderr,"\nparsed "); 0423 dumpft(fname,lname); 0424 } 0425 return Int; 0426 0427 } else if(_lit("\"")) { 0428 fname=cursor; 0429 /* set lname = last char, cursor = lname+2 (past the quote) */ 0430 if( x = _findEOS(fname) ) { 0431 lname = x-1; /*before the quote */ 0432 cursor = x+1; /*after the quote */ 0433 *x = 0; 0434 } 0435 else { eset(CURSERR); return Err; } 0436 if(verbose[VP]){ 0437 fprintf(stderr,"\nparsed "); 0438 dumpft(fname,lname); 0439 } 0440 return CharStar; 0441 0442 } else if(_lit("\'")) { 0443 fname=cursor; 0444 /* lname = last char, cursor = lname+2 (past the quote) */ 0445 if( x=_mustFind(fname+1,fname+2,'\'',CURSERR) ) { 0446 lname = x-1; 0447 cursor = x+1; 0448 } 0449 else { eset(CURSERR); return -1; } 0450 if(verbose[VP]){ 0451 fprintf(stderr,"\nparsed "); 0452 dumpft(fname,lname); 0453 } 0454 return Char; 0455 0456 } else return Err; /* no match, Err==0 */ 0457 } 0458 0459 /* An asgn is a reln or an lvalue = asgn. Note that 0460 reln can match an lvalue. 0461 */ 0462 int _asgn(){ 0463 if(_reln()){ 0464 if(_lit(xeq)){ 0465 _asgn(); 0466 if(!error)_eq(); 0467 } 0468 } 0469 return error? 0: 1; 0470 } 0471 0472 /* a RELN is an expr or a comparison of exprs 0473 */ 0474 int _reln(){ 0475 if(_expr()){ 0476 if(_lit(xle)){ 0477 if(_expr()){ 0478 if(topdiff()0)pushone(); 0479 else pushzero(); 0480 } 0481 } 0482 else if(_lit(xge)){ 0483 if(_expr()){ 0484 if(topdiff()>=0)pushone(); 0485 else pushzero(); 0486 } 0487 } 0488 else if(_lit(xeqeq)){ 0489 if(_expr()){ 0490 if(topdiff()==0)pushone(); 0491 else pushzero(); 0492 } 0493 } 0494 else if(_lit(xnoteq)){ 0495 if(_expr()){ 0496 if(topdiff()!=0)pushone(); 0497 else pushzero(); 0498 } 0499 } 0500 else if(_lit(xgt)){ 0501 if(_expr()){ 0502 if(topdiff()>0)pushone(); 0503 else pushzero(); 0504 } 0505 } 0506 else if(_lit(xlt)){ 0507 if(_expr()){ 0508 if(topdiff()<0)pushone(); 0509 else pushzero(); 0510 } 0511 } 0512 else return 1; /* just expr is a reln */ 0513 } 0514 return 0; /* not an expr is not a reln */ 0515 } 0516 0517 /* ;an EXPR is a term or sum (diff) of terms. 0518 */ 0519 int _expr(){ 0520 if(_lit(xminus)){ /* unary minus */ 0521 _term(); 0522 pushk(-toptoi()); 0523 } 0524 else if(_lit(xplus)){ 0525 _term(); 0526 pushk(toptoi()); 0527 } 0528 else _term(); 0529 while(!error){ /* rest of the terms */ 0530 int leftclass = stack[nxtstack-1].class; 0531 int rightclass; 0532 if(_lit(xminus)){ 0533 _term(); 0534 rightclass = stack[nxtstack-1].class; 0535 int b=toptoi(); 0536 int a=toptoi(); 0537 if( rightclass || leftclass) pushPtr(a-b); 0538 else pushk(a-b); 0539 } 0540 else if(_lit(xplus)){ 0541 _term(); 0542 rightclass = stack[nxtstack-1].class; 0543 int b=toptoi(); 0544 int a=toptoi(); 0545 if( rightclass || leftclass) pushPtr(a+b); 0546 else pushk(a+b); 0547 } 0548 else return 1; /* is expression, all terms done */ 0549 } 0550 return 0; /* error, set down deep */ 0551 } 0552 0553 /* ;a term is a factor or a product of factors. 0554 */ 0555 int _term() { 0556 _factor(); 0557 while(!error) { 0558 if(_lit(xstar)){ 0559 _factor(); 0560 if(!error)pushk(toptoi()*toptoi()); 0561 } 0562 else if(_lit(xslash)){ 0563 if(*cursor=='*' || *cursor=='/') { 0564 --cursor; /* opps, its a comment */ 0565 return 1; 0566 } 0567 _factor(); 0568 int denom = toptoi(); 0569 int numer = toptoi(); 0570 int div = numer/denom; 0571 if(!error)pushk(div); 0572 } 0573 else if(_lit(xpcnt)){ 0574 _factor(); 0575 int b=toptoi(); 0576 int a=toptoi(); 0577 int pct = a%b; 0578 if(!error)pushk(pct); 0579 } 0580 else return 1; /* no more factors */ 0581 } 0582 return 0; 0583 } 0584 0585 /* a FACTOR is a ( asgn ), or a constant, or a variable reference, or a function 0586 reference. NOTE: factor must succeed or it esets SYNXERR. Callers test error 0587 instead of a returned true/false. This varies from the rest of the expression 0588 stack. 0589 */ 0590 void _factor() { 0591 union stuff foo; 0592 Type type; 0593 char* x; 0594 if(_lit(xlpar)) { 0595 _asgn(); 0596 if( x=_mustFind( cursor, cursor+5, ')' , RPARERR ) ) { 0597 cursor = x+1; /*after the paren */ 0598 } 0599 } 0600 else if( type=_konst() ) { 0601 /* Defines fname,lname. Returns Type. 0602 void pushst( int class, int lvalue, Type type, void* stuff ); 0603 */ 0604 switch(type){ 0605 case Int: 0606 pushk( _atoi() ); /* integer, use private _atoi */ 0607 break; 0608 case Char: 0609 foo.uc = *fname; 0610 pushst( 0, 'A', type, &foo ); 0611 break; 0612 case CharStar: /* special type used ONLY here */ 0613 foo.up = fname; 0614 pushst( 1, 'A', Char, &foo ); 0615 case Err: 0616 return; 0617 } 0618 } 0619 else if( _symName() ) { 0620 cursor = lname+1; 0621 int where, len, class, obsize, stuff; 0622 if( _symNameIs("MC") ) { 0623 _enter(0); return; 0624 } else { 0625 struct var *v = addrval(); /* looks up symbol */ 0626 if( !v ){ eset(SYMERR); return; } /* no decl */ 0627 char* where = (*v).value.up; 0628 int integer = (*v).value.ui; 0629 int character = (*v).value.uc; 0630 int class=(*v).class; 0631 int type=(*v).type; 0632 int obsize = typeToSize(class,type); 0633 int len=(*v).len; 0634 if( class=='E' ) _enter(where); /* fcn call */ 0635 else { /* is var name */ 0636 if( _lit(xlpar) ) { /* is dimensioned */ 0637 if( !class ) { /* must be class>0 */ 0638 eset(CLASERR); 0639 } else { /* dereference the lvalue */ 0640 /* reduce the class and recompute obsize */ 0641 obsize = typeToSize(--class,type); 0642 /* increment where by subscript*obsize */ 0643 _asgn(); if( error )return; 0644 _lit(xrpar); 0645 int subscript = toptoi(); 0646 if(len-1)if( subscript<0 || subscript>=len )eset(RANGERR); 0647 where += subscript * obsize; 0648 foo.up = where; 0649 pushst( class, 'L', type, &foo); 0650 return; 0651 } 0652 } else { 0653 /* is simple. Must push as 'L', &storagePlace. */ 0654 if(class==1){ 0655 foo.up = &((*v).value.up); 0656 } 0657 else{ 0658 foo.up = where; 0659 } 0660 pushst( class, 'L', type, &foo); 0661 } 0662 } 0663 } 0664 } 0665 else { 0666 eset(SYNXERR); 0667 } 0668 } 0669 0670 /************ scan tools ******************/ 0671 0672 /* skip a possibly compound statement. Shortcoming is brackets 0673 * in comments, they must be balanced. 0674 */ 0675 void _skipSt() { 0676 _rem(); 0677 if( _lit(xlb) ) { /* compound */ 0678 _skip('[',']'); 0679 _rem(); 0680 return; 0681 } 0682 else if( _lit(xif)||_lit(xwhile) ) { 0683 _lit(xlpar); /* optional left paren */ 0684 _skip('(',')'); 0685 _skipSt(); 0686 _rem(); 0687 if(_lit(xelse))_skipSt(); 0688 _rem(); 0689 return; 0690 } 0691 else { /* simple statement, eol or semi ends */ 0692 while(++cursor<endapp) { 0693 if( (*cursor==0x0d)||(*cursor=='\n')||(*cursor==';') )break; 0694 } 0695 ++cursor; 0696 _rem(); 0697 } 0698 } 0699 0700 /* Returns true if user signals quit, or any other error. 0701 * NOTE: MC 2 esets KILL on ESC, but test here for hard loop 0702 */ 0703 0704 #if defined(_WIN32) 0705 int quit() { // mods lrb 0706 int x; 0707 if (kbhit()) { 0708 x = getch_(ECHO); 0709 if (x == ESCAPE) { 0710 eset(KILL); 0711 return 1; 0712 } 0713 } 0714 return 0; 0715 } 0716 #else 0717 int quit() { 0718 int foo[]={0}; /* to avoid tcc error */ 0719 if(kbhit()==0x1b){ 0720 getch_(0); 0721 return escKey(); 0722 } 0723 return 0; 0724 } 0725 #endif 0726 0727 /* Match char or int, else do nothing. If match parse 0728 * all comma separated declarations of that particular type 0729 * making var table entries and allocating value storage. Returns false 0730 * if not a declaration statement, true if it is. Leaves cursor just past 0731 * optional semi. 0732 */ 0733 int _decl() { 0734 Type t; 0735 if( _lit(xchar) ) { 0736 do { 0737 _varAlloc( Char, 0 ); /* 2nd arg is vpassed */ 0738 } while( _lit(xcomma) ); 0739 } else if( _lit(xint) ) { 0740 do { 0741 _varAlloc( Int, 0 ); /* 2nd arg is vpassed */ 0742 } while( _lit(xcomma) ); 0743 } else { 0744 return 0; /* not decl */ 0745 } 0746 _lit(xsemi); /* is decl */ 0747 return 1; 0748 } 0749 0750 /* st(): interprets a possibly compound statement 0751 */ 0752 void st() { 0753 char *whstcurs, *whcurs, *objt, *agin ; 0754 brake=0; 0755 0756 if(quit())return; 0757 _rem(); 0758 stbegin(); 0759 stcurs = cursor; 0760 if(_decl()){ 0761 _rem(); 0762 return; 0763 } 0764 else if( _lit(xlb) ){ /* compound statement */ 0765 for(;;){ 0766 _rem(); 0767 if(leave||brake||error)return; 0768 if(_lit(xrb)){ 0769 _rem(); 0770 return; 0771 } 0772 st(); 0773 } 0774 } 0775 else if(_lit(xif)) { 0776 if(_asgn()) { 0777 if(toptoi()) { 0778 st(); 0779 _rem(); 0780 if(_lit(xelse)) { 0781 _skipSt(); 0782 } 0783 } 0784 else { 0785 _skipSt(); 0786 _rem(); 0787 if(_lit(xelse)) { 0788 st(); 0789 } 0790 } 0791 _rem(); 0792 return; 0793 } 0794 } 0795 else if(_lit(xwhile)) { 0796 _lit(xlpar); /* optional left paren */ 0797 if( !_asgn() )return; /* error */ 0798 _lit(xrpar); 0799 int condition = toptoi(); 0800 if( condition ) { 0801 /* prepare for repeating/skipping while (stcurs) 0802 or object */ 0803 agin = stcurs; 0804 objt = cursor; 0805 st(); 0806 0807 if(brake) { 0808 cursor = objt; /* break: done with the while */ 0809 _skipSt(); /* skip over the object */ 0810 brake = 0; 0811 return; 0812 } 0813 else { 0814 cursor = agin; /* no break: do the entire while again */ 0815 return; 0816 } 0817 } 0818 else { 0819 _skipSt(); 0820 } 0821 } 0822 else if(_lit(xsemi)) { 0823 _rem(); 0824 } 0825 else if(_lit(xreturn)) { 0826 int eos = ( _lit(xrpar) 0827 || *cursor==*xlb 0828 || *cursor==*xrb 0829 || *cursor==*xsemi 0830 || *cursor==*xnl 0831 || *cursor==0x0d 0832 || *cursor==*xslash 0833 ); 0834 if ( eos ) { 0835 pushzero(); /* default return value */ 0836 } 0837 else { 0838 _asgn(); /* specified return value */ 0839 } 0840 leave=1; /* signal st() to leave the compound 0841 statement containing this return */ 0842 return; 0843 } 0844 else if(_lit(xbreak)) { 0845 brake=1; 0846 return; 0847 } 0848 else if( _asgn() ) { /* if expression discard its value */ 0849 toptoi(); 0850 _lit(xsemi); 0851 } 0852 else { 0853 eset(STATERR); 0854 } 0855 } 0856 0857 /* Checks for balanced brackets, cursor to endapp inclusive. 0858 void checkBrackets() { 0859 int count; 0860 while(*(cursor++) != '[' && cursor≤endapp) ; 0861 if(_skip('[',']'))eset(RBRCERR); 0862 } 0863 */ 0864 0865 /* 0866 * Checks for balanced brackets, cursor to stop. 0867 */ 0868 int checkBrackets(char* stop) { 0869 int err; 0870 char* save=endapp; /* _skip uses endapp as limit */ 0871 endapp=stop; 0872 while(cursor<stop) { 0873 while(*(cursor++) != '[' && cursor<stop) ; 0874 if(cursor<stop) { 0875 if(err = _skip('[',']'))return err; 0876 } 0877 } 0878 endapp=save; 0879 return 0; 0880 } 0881 0882 /*********** a variety of dumps for debugging **********/ 0883 char* typeToWord(Type t){ 0884 switch(t) { 0885 case Char: return "Char"; 0886 case Int: return "Int"; 0887 default: return " NOT A TYPE "; 0888 } 0889 } 0890 0891 void dumpHex( void* where, int len ) { 0892 char* w=where; 0893 fflush(stdout); 0894 fprintf(stderr,"\n%x: ",w); 0895 int i; 0896 for( i=0; i<len; ++i )fprintf(stderr," %x",w[i]); 0897 } 0898 0899 /* dump from..to inclusive */ 0900 void dumpft(char *from, char *to ) { 0901 fflush(stdout); 0902 while(from ≤ to) fprintf(stderr,"%c",*(from++)); 0903 } 0904 0905 /* dump the line cursor is in from cursor to nl */ 0906 void dumpLine() { 0907 fflush(stdout); 0908 char* begin = cursor; 0909 char* end = cursor; 0910 while (*end!=0x0a && *end!=0x0d && end<endapp){ /* find end of line */ 0911 ++end; 0912 } 0913 while(begin<end){ 0914 fprintf(stderr,"%c",*begin); 0915 ++begin; 0916 } 0917 } 0918 0919 int stateCallNumber=0; 0920 void dumpState() { 0921 fflush(stdout); 0922 if(stateCallNumber==0){ 0923 fprintf(stderr,"\nADDRESSES (hex)"); 0924 fprintf(stderr,"\npr: %x",pr); 0925 fprintf(stderr,"\nstack: %x",stack); 0926 fprintf(stderr,"\nvartab: %x",vartab); 0927 } 0928 0929 fprintf(stderr,"\n----\nSTATE %d\nparsing: ",stateCallNumber++); 0930 dumpLine(); 0931 fprintf(stderr,"<--end of line--"); 0932 dumpVarTab(); 0933 dumpStack(); 0934 } 0935 0936 /* dump a just parsed piece of pr, typically a name */ 0937 void dumpName() { 0938 fflush(stdout); 0939 char *c = fname; 0940 while( c ≤ lname ) { 0941 pc(*(c++)); 0942 } 0943 } 0944 0945 /**** C tools to deal with typeless storage ****/ 0946 void put_int(char *where, int datum) { 0947 memcpy( where, &datum, sizeof(datum) ); 0948 } 0949 int get_int(char *where) { 0950 int datum; 0951 memcpy( &datum, where, sizeof(datum)); 0952 return datum; 0953 } 0954 void put_char(char *where, char datum) { 0955 memcpy(where, &datum, sizeof(datum)); 0956 } 0957 char get_char(char *where) { 0958 char datum; 0959 memcpy( &datum, where, sizeof(datum)); 0960 return datum; 0961 } 0962 0963 /* 0964 * Saves and (and later restores) cursor, then 0965 * scans program from pr to progend and allocates all externals 0966 * in next fctn layer. An "endlibrary" line causes a new fctn layer 0967 * to be opened and globals done there. 0968 */ 0969 void tclink() { 0970 char* x; 0971 char* savedCursor=cursor; 0972 /* check Brackets from cursor to limit*/ 0973 cursor=pr; 0974 if(checkBrackets(lpr))eset(RBRCERR+1000); 0975 if(checkBrackets(apr))eset(RBRCERR+2000); 0976 if(checkBrackets(endapp))eset(RBRCERR+3000); 0977 if(error)whatHappened(); 0978 cursor=lpr; 0979 newfun(); 0980 while(cursor<endapp && !error){ 0981 char* lastcur = cursor; 0982 _rem(); 0983 if(_lit(xlb)) _skip('[',']'); 0984 else if(_decl()) ; 0985 else if(_lit(xendlib)){ 0986 if(curfun==fun) { /* 1st endlib, assume globals follow */ 0987 newfun(); 0988 curglbl=curfun; 0989 } 0990 else { /* subsequent endlib, 0991 move assummed globals to frame 0 */ 0992 fun[0].lvar = nxtvar-1; /* moved */ 0993 fun[1].fvar = nxtvar; /* globals now empty */ 0994 } 0995 } 0996 else if(_symName()) { /* fctn decl */ 0997 union stuff kursor; 0998 kursor.up = cursor = lname+1; 0999 newvar('E',2,1,&kursor); 1000 if(x=_mustFind(cursor, endapp, '[',LBRCERR)) { 1001 cursor=x+1; 1002 _skip('[',']'); 1003 } 1004 } 1005 else if(*cursor=='#'){ 1006 while(++cursor<endapp) { 1007 if( (*cursor==0x0d)||(*cursor=='\n') )break; 1008 } 1009 } 1010 //printf("~985 %d %d %d %c\n",lpr-pr,apr-pr,cursor-pr,*cursor); 1011 if(cursor==lastcur)eset(LINKERR); 1012 } 1013 cursor = savedCursor; 1014 if(verbose[VL])dumpVarTab(); 1015 } 1016 1017 /* allocate four major areas 1018 */ 1019 int allocStuff() { 1020 int prlen=PRLEN, err; 1021 err = iProperty("pps/tc.prop", "PRLEN", &prlen, PRLEN); 1022 if(err){ 1023 fprintf(stderr,"pps/tc.prop err, running pr[%d]",PRLEN); 1024 } 1025 pr = malloc(prlen); 1026 EPR=pr+prlen; 1027 1028 int funlen=FUNLEN,size; 1029 err = iProperty("pps/tc.prop", "FUNLEN", &funlen, FUNLEN); 1030 if(err){ 1031 fprintf(stderr,"pps/tc.prop err, running fun[%d]",FUNLEN); 1032 } 1033 size = sizeof(struct funentry); 1034 fun = malloc(funlen*size); 1035 efun=fun+funlen*size; 1036 1037 stacklen=STACKLEN; 1038 err = iProperty("pps/tc.prop", "STACKLEN", &stacklen, STACKLEN); 1039 if(err){ 1040 fprintf(stderr,"pps/tc.prop err, running stack[%d]",STACKLEN); 1041 } 1042 stack = malloc(stacklen*sizeof(struct stackentry)); 1043 1044 vtablen=VTABLEN; 1045 err = iProperty("pps/tc.prop", "VTABLEN", &vtablen, VTABLEN); 1046 if(err){ 1047 fprintf(stderr,"pps/tc.prop err, running var[%d]",VTABLEN); 1048 } 1049 vartab = malloc(vtablen*sizeof(struct var)); 1050 //fprintf(stderr,"~1035TC: sizes of pr %d fun %d stack %d var %d\n", 1051 // prlen, funlen, stacklen, vtablen); 1052 }