Emile Timothy
Main Page

Projects

Tim and TCompiler

I created a simplistic programming language during the winter of 2020, and named it Tim. Along with it, I built a compiler named TCompiler, which I built using self-written parsing code, node extractors, and stack registers in the assembly language X86-64. In this page, I'm going to describe some of the main components of what went into building the language and the compiler.


This project was motivated by an assignment I did in CS 24 (Computing Systems) during my Sophomore year at Caltech, where we had to create an assembly generator (asmgen) for the TEENY-BASIC programming language. After this, it occurred to me that I could simply expand the code-base for the language to include some additional simplistic features.

Firstly, in the original TEENY-BASIC programming language, variables are defined with the LET command, and all variables have to be single-character uppercase letters, and these variables strictly denote either strings, integers, or floating point numbers. So, for instance, the syntax for setting a variable \(X\) to 3 or setting a variable \(Y\) to \(3.14\) or setting a variable \(Z\) to "Hello, World!" would be:

LET X = 3;
LET Y = 3.14;
LET Z = "Hello, World!";

Secondly, TEENY-BASIC also allows users to print strings, numbers, and even the values of variables through the PRINT command. For instance, the snippets of code below would allow you to print the value of \(X\), print "Hello, World.", and print the number \(1729\), respectively.

PRINT X;
PRINT "Hello, World.";
PRINT 1729;

Thirdly, TEENY-BASIC allows conditionals as a functionality through the IF...THEN...ENDIF command. For example, define the variables \(A=3\) and \(B=6\). It is only valid to compare strings to other strings, and to compare integers or floats to each other. The code below uses the IF command to allow you to check whether A is greater than, less than, or equal to B. It then uses the PRINT command to output an appropriate message to the user.

LET A = 3;
LET B = 6;

IF A < B, THEN PRINT "A is less than B." ENDIF;
IF A = B, THEN PRINT "A is equal to B." ENDIF;
IF A > B, THEN PRINT "A is greater than B." ENDIF;

TEENY-BASIC additionally supports the use of nested conditionals (IF commands nested within IF commands). For instance, given variables \(A = -2\) and \(B = 3\), the code below uses nested conditional statements to tell you which variables are positive.

LET A = -2;
LET B = 3;

IF A > 0, THEN IF B > 0, THEN PRINT "A and B are positive." ENDIF ENDIF;
IF A > 0, THEN IF B <= 0, THEN PRINT "A is positive, B is not positive." ENDIF ENDIF;
IF A <= 0, THEN IF B > 0, THEN PRINT "A is not positive, B is positive." ENDIF ENDIF;
IF A <= 0, THEN IF B <= 0, THEN PRINT "A andd B are not positive." ENDIF ENDIF;

The final, and most computationally powerful, functionality of TEENY-BASIC is the WHILE command. The WHILE command repetitively runs a segment of code until a condition is met. The segment of code (called a sequence) is defined as all the code after the line containing the while statement that is also before a closing ENDWHILE command for the loop. For instance, the following snippet of code can be used to print the message "Hello, World!" 20 times.

LET I = 0;
WHILE I < 20 DO
PRINT "Hello, World!";
ENDWHILE

The applications of WHILE loops are endless. The above code snippet shows how it can automate a fundamentally simple task of printing a message instead of forcing the user to type the same line 20 times, which is terribly space-inefficient. A more practical application of the while loop can be to compute and print the factorial of a non-negative integer stored in variable \(X\).

IF X <= 1, THEN PRINT 1 ENDIF;
LET P = 1;
WHILE X > 1 DO
LET P = P * X;
LET X = X - 1;
ENDWHILE
PRINT P;

However, here is where the TEENY-BASIC functionality ends. The Tim programming language adds onto TEENY-BASIC through the fundamental addition of some obvious commands such as the FOR loop and REPEAT loop. Additionally, Tim supports the exponentiation, modulo, and quotient operators. Furthermore, Tim has an in-built list feature that can allow users to perform basic array operations such as sorting and frequency-counting. Moreover, Tim allows more complex choices of variable names, as long as they are not a pre-defined Tim keyword. Finally, Tim allows to directly view the TYPE of a variable. The remainder of this section explains these functionalities, and finally provides some examples of robust and versatile algorithms that Tim is capable of executing.

In Tim, the FOR loop can be executed using the FOR...TO...NEXT command. The FOR loop instantiates a local counting variable that is not accessible within the code of the program, and runs the code sequence as many times as it takes for the counting variable to traverse from the start value to the end value while it increments by 1 each time. For instance, the following code snippet prints all the numbers between \(5\) and \(20\).

FOR I = 5 TO 20
PRINT I;
ENDFOR

Next, Tim supports the execution of REPEAT loops using the REPEAT...UNTIL command. The original WHILE command is a pre-condition loop: it first checks that a condition is met. If the condition is met, it runs the code sequence, and then continues to keep re-running the code sequence as long as the original condition is met. However, the REPEAT...UNTIL command is a post-condition loop: it runs the code sequence first, and then checks that the condition is not met, and then repeats this process until the condition is met. For instance, the code below prints the first \(N\) Fibonaccci numbers.

LET A = 0;
PRINT A;
LET B = 1;
PRINT B;
LET Count = 2;
REPEAT
LET C = A;
LET A = B;
LET B = A + B;
PRINT B;
LET Count = Count + 1;
UNTIL Count = 20;

Next, in addition to addition, subtraction, multiplication, and division, Tim performs some fundamental operations such as exponentiation with integer powers, as well as finding modulos and quotients. The code snippets below can be used to calculate and print the outputs of \(0.5^2, 2^{10}, 3^{-2}\), and \(6^0\), respectively.

PRINT 0.5^2;
PRINT 2^10;
PRINT 3^(-2);
PRINT 6^0;

Similarly, the code snippet below can be used to calculate and print the remainders (after division) of 1) \(3\) with \(5\), 2) \(8\) with \(5\), 3) \(0.2\) with \(0.1\), and 4) \(-0.4\) with \(0.1\), respectively.

PRINT 3%5;
PRINT 8%5;
PRINT 0.2%0.1;
PRINT (-0.4)%0.1;

Likewise, the code snippet below can be used to calculate and print the quotients of 1) \(3\) with \(5\), 2) \(8\) with \(5\), 3) \(0.2\) with \(0.1\), and 4) \(-0.4\) with \(0.1\), respectively.

PRINT 3//5;
PRINT 8//5;
PRINT 0.2//0.1;
PRINT (-0.4)//0.1;

In addition to the string, integer, and float type, Tim also supports the implementation of lists of either strings, integers, or floats (though not more at once). The Tim lists can be stored as variables by the regular Tim syntax, and must be instantiated with a fixed type and size. Elements in Lists of TYPE string as instantiated with "", whereas elements in Lists of TYPE integer and float are instantiated with 0 and 0.0 respectively. The following code snippets illustrates how to create Tim lists of strings, integers, and floats, respectively, of size \(N=10\).

LET N = 10;
LET X = LIST(STRING, N);
LET Y = LIST(INTEGER, N);
LET Z = LIST(FLOAT, N);

Additionally, the Tim TYPE command can allow users to find the type of a specific variable. For instance, an exhaustive list of Tim types are STRING, INTEGER, FLOAT, LIST_STRING, LIST_INT, LIST_FLOAT, and NONE. The code snippets below use the TYPE command to find the types of various variables, and then prints these types to the user.

LET A = 10;
LET B = "Hello, World!";
LET C = 0.502;
LET D = LIST(STRING, 10);
LET E = LIST(INTEGER, 10);
LET F = LIST(FLOAT, 10);

PRINT TYPE(A);
PRINT TYPE(B);
PRINT TYPE(C);
PRINT TYPE(D;
PRINT TYPE(E);
PRINT TYPE(F);
PRINT TYPE();

Furthermore, Tim lists have three key features. Firstly, it allows 0-valued indexing. So, for instance, the code below can be used to initialize an integer list \(A\) with 3 elements, and sets them to 60, 65, and 70, respectively.

LET X = LIST(INTEGER, 3);
LET X[0] = 60;
LET X[1] = 65;
LET X[2] = 70;

Secondly, Tim lists allow full-stack operations. So, Tim allows operations between an integer or float array and a scalar float or integer, where it uniformly applies the operation with the scalar parameter to every value in the array. For instance, the code snippet below demonstrates the result of full-stack addition of 7.0 to a LIST_FLOAT, the full-stack product of a LIST_INT by 5, the full-stack exponentiation of a LIST_INT from 2, and finally, the full-stack exponentiation of a LIST_INT by 2.0.

LET X = LIST(INTEGER, 3);
LET Y = LIST(FLOAT, 3);
LET z = LIST(INTEGER, 3);

LET X[0] = 60;
LET X[1] = 65;
LET X[2] = 70;
LET Y[0] = 3.2;
LET Y[1] = 3.1;
LET Y[2] = 2.8;
LET Z[0] = 3;
LET Z[1] = 4;
LET Z[2] = 5;

PRINT Y = Y + 0.70;
PRINT X = X * 5;
PRINT X = X ^ 2;
PRINT 2 ^ Z;

The output of these PRINT commands are [3.90, 3.80, 3.50], [300, 325, 350], [3600, 4225, 4900], and [8, 16, 32], respectively.

Thirdly, Tim lists allow pointwise operations. So, for instance, given either two LIST_INT's or two LIST_FLOAT's, Tim allows pointwise operations between the operators by adding a . after the operation. For instance, the code snippet below demonstrates the result of a pointwise addition of two LIST_INT's, the pointwise product of two LIST_FLOAT's, and the modulo operator between two LIST_FLOAT's.

LET A = LIST(INTEGER, 3);
LET B = LIST(INTEGER, 3);
LET X = LIST(FLOAT, 3);
LET Y = LIST(FLOAT, 3);

LET A[0] = 60;
LET A[1] = 65;
LET A[2] = 70;
LET B[0] = 6;
LET B[1] = 9;
LET B[2] = 12;
LET X[0] = 6.2;
LET X[1] = 9.2;
LET X[2] = 12.2;
LET Y[0] = 2.4;
LET Y[1] = 2.25;
LET Y[2] = 1.987;

LET C = A +. B;
LET D = X *. Y;
LET E = X %. Y;

The output of these PRINT commands are [66, 74, 82], [14.88, 20.7, 24.2414], and [1.4, 0.2, 0.278], respectively.

Implementing Robust Algorithms on Tim
This section provides the implementation of several notorious algorithms that can be implemented in Tim. Firstly, we present a code snippet for the insertion-sort algorithm.

LET N = 5;
LET A = LIST(INTEGER, N);
LET A[0] = 12;
LET A[1] = 11;
LET A[2] = 13;
LET A[3] = 5;
LET A[4] = 6;
FOR I = 1 TO N
LET K = A[I];
LET J = I - 1;
WHILE (J >= 0) * (K < A[J]) DO
LET A[J + 1] = A[J];
LET J = J - 1;
ENDWHILE
LET A[J + 1] = K;
ENDFOR
PRINT A;

Secondly, we present a code snippet for an approximation of Euler's constant, \(e\), which is defined as \(e^x = \sum\limits_{k=0}^\infty \frac{x^k}{k!}\). Thus, \(e = \sum\limits_{k=0}^\infty \frac{1}{k!}\). We define the n-pproximation of \(e\) as the sum of first \(n\) terms in the summand, \(e_n = \sum\limits_{k=0}^n \frac{1}{k!}\)

LET N = 3;
LET S = 0.0;
FOR K = 0 TO N
IF K = 0, THEN S = S + 1 ENDIF;
IF K = 1, THEN S = S + 1 ENDIF;
LET P = 1;
IF K > 1, THEN FOR F = 1 TO K
LET P = P * F;
ENDFOR
LET S = S + 1/P;
ENDIF
PRINT S;

Thirdly, we present a code snippet that detects prime numbers.

PRINT X;
LET N = 17;
IF N = 0, THEN PRINT "NOT PRIME" ENDIF
IF N = 1, THEN PRINT "NOT PRIME" ENDIF
LET Prime = 1;
IF N > 1, THEN FOR F = 2 TO N - 1
IF TYPE(N/F) = TYPE(0), THEN
PRINT "NOT PRIME";
LET Prime = 0;
ENDIF
ENDFOR
IF Prime = 1, THEN PRINT "PRIME" ENDIF
ENDIF

Finally, we present a code snippet that checks if a string is a palindrome.

LET X = "TACOCAT";
LET Length = 0;
LET Value = -1;
WHILE TYPE(X[Length]) != NONE DO
LET Length = Length + 1;
ENDWHILE
LET Not = 1;
LET Start = 0;
WHILE (Start < Length) * (Not = 1) DO
IF X[Start] != X[Length] THEN
PRINT
"Not a Palindrome."

LET Not = 0;
ENDIF
LET Length = Length - 1;
LET Start = Start + 1;
ENDWHILE
IF Not = 1, THEN PRINT "Palindrome" ENDIF
ENDWHILE




The conceptual ideas behind the parser and forming nodes to represent parts of the Tim code are really straightforward. By enforcing spaces around keywords or non-functional operators, T-Compiler can identify keywords by doing a simple traversal algorithm that identifies each word and determines the underlying keyword.

Then, depending on the keyword, the word is either stand-alone-keyword or a loop that has a corresponding end-symbol (such as ENDIF, ENDFOR, ENDWHILE, and UNTIL).

If the word is a stand-alone-keyword, the program scans ahead until it either reaches the end of the code or the next keyword, and groups these tasks as a node that corresponds to the meaning of the keyword. If the word is a loop symbol, the program scans ahead until it either reaches the ending demarcation, and groups these tasks as a node that corresponds to the meaning of the keyword.

For each sequence that the algorithm has identified, it recursively breaks these tasks to smaller tasks, which it passes to the compiler. I've attached the corresponding code for the parser and node-formation algorithms (which are self-equipped with more technical descriptions) below.



X86-64 Commands
Next, to execute the tasks, the compiler converts each task into a statement in assembly-language through a simple X86-64 command conversion, which is compiled by a simple assembly-language interpreter. The key details to note here are that variables, local values, and lists are stored on recursive stack-frames, and that operations are executed using a stack. So, for instance, to add 2 and 2 together. One would first push 2 onto the stack, then push another 2 onto the stack, and then finally push a + onto the stack. This would cause the '+,2,2' to become a 4. Typically, more complex tasks might involve computing intermediary values or results. These results are stored in the relatively high-speed rdi or rsi registers, whereas final results are typically stored in the relatively low-speed rax stack, from where the result is often inputted to the user or used somewhere else.
Compiler Code