i'm back! ^^ long time no see!! x3

back in undergrad, i worked with a piece of math software called macaulay2. it's popular primarily among algebraic geometers and commutative algebraists, as those are the two fields it was designed for.

i thought it'd be fun (since i'm on a pl kick) to try digging through and explaining the internals of its underlying language :3

there's also not too many docs specifically on internals for newcomers to learn from, so i thought it could be nice to share some notes as i dive in.


/// macaulay2 ///

macaulay2 development started in 1992 off an NSF grant, and today is freely available on GitHub (woo!)

the core language was developed from 1993-1994, and since then the language has been continuously built upon to make it more useful for research mathematicians.

it's actively maintained by volunteer developers, many of whom are mathematicians using the software in their work. consequently, Macaulay2 has a ton of interesting packages allowing computation on all kinds of algebraic, geometric, and combinatorial objects :)))

i'm lazy so i won't give any code samples here ^^; but the Beginning Macaulay2 Guide in the docs is a great way to get a sense of the language :3


/// macaulay2 architecture ///

internally, Macaulay2 has four layers:

  • the D language and compiler (NOT the one at dlang.org, an internal one)
  • the Macaulay2 language and interpreter
  • the Macaulay2 engine
  • the Macaulay2 standard library

the D language is a project-specific systems language that compiles to C or C++, depending on the extension of the source (.d or .dd). sometimes it's also referred to as "SafeC." it has its own bison-generated parser and a custom code generation backend, and the Macaulay2 build system compiles and links the generated output alongside the main.cpp file.

(i've got a lot of notes on M2's D language too, will write them up sometime)

the Macaulay2 language (also sometimes "M2"), on the other hand, is fully interpreted, with both the lexer/parser and interpreter written in D. this interpreter is called from main.cpp in order to start up the M2 repl.

the Macaulay2 engine is a C++ computation backend called into by Macaulay2 to perform actual computation on mathematical objects.

finally, the Macaulay2 standard library simply consists of a host of Macaulay2 files implementing all kinds of mathematical functionality. most added functionality/bugfixes from research mathematicians ends up here, or in the separate packages mentioned above.

for those following along at home, check

  • M2/M2/Macaulay2/c/ for the D compiler and grammar (scc1.c and grammar.y, respectively)
  • M2/M2/Macaulay2/d/ for the M2 interpreter (starting at process() in interp.dd)
  • M2/M2/Macaulay2/e/ for the C++ implementation of the Macaulay2 engine (see engine.cpp)
  • M2/M2/Macaulay2/m2/ for the Macaulay2 standard library


/// getting a prompt ///

we start in M2/M2/Macaulay2/bin/main.cpp, which is the main file built into the M2 executable (a cute and handy little repl :3).

main() in that file does a bunch of tasks, including

  • handle arguments
  • initialize M2's garbage collector
  • initialize the Macaulay2 engine (IM2_Initialize())
  • initialize the Python bindings (if present)
  • set up jump points for long jumps in case of an abort
    • (this is because we want the M2 repl to restart the prompt if things crash)

and most importantly, kick off interpFunc:

    ...

    initializeThreadSupervisor();
    #if PROFILING
    struct ThreadTask* profileTask = createThreadTask("Profile", (ThreadTaskFunctionPtr)profFunc, M2_vargs, 0, 0, 0);
    pushTask(profileTask);
    #endif
    struct ThreadTask* interpTask = createThreadTask("Interp", (ThreadTaskFunctionPtr)interpFunc, M2_vargs, 0, 0, 0);
    pushTask(interpTask);
    waitOnTask(interpTask);

    ...
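quick aside on those jump points, since they're easy to gloss over. here's a minimal sketch in plain C of the general setjmp/longjmp pattern for getting back to a prompt after an abort. none of these names (abort_target, risky_command, run_command) come from main.cpp; i made them up for illustration, and the real code surely does a lot more bookkeeping:

```c
#include <setjmp.h>
#include <stdio.h>

/* hypothetical names throughout; this is not M2's actual code */
static jmp_buf abort_target;

/* stands in for a command that may bail out partway through */
static void risky_command(int should_abort) {
    if (should_abort)
        longjmp(abort_target, 1); /* unwind straight back to the setjmp call */
    puts("command finished normally");
}

/* returns 0 if the command completed, 1 if it aborted */
int run_command(int should_abort) {
    if (setjmp(abort_target) != 0) {
        /* we land here after a longjmp: recover and show a fresh prompt */
        puts("aborted, back at the prompt");
        return 1;
    }
    risky_command(should_abort);
    return 0;
}
```

the idea is that setjmp marks a safe spot once, and anything deeper in the call stack can jump back to it instead of taking the whole process down.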

next, in interpFunc (same file), we

  • set the current thread as the interpreter thread
  • reinitialize thread-local variables from past repl prompts
  • hand off all of the env vars and args to the M2 interpreter
  • set up signal handlers for dumping stack traces

and most importantly, call interp_process():

    ...

    /*
    process() in interp.dd is where all the action happens, however, interp__prepare()
    from interp-tmp.cc is called first. This happens even before main() because all
    "_prepare()" functions have "__attribute__ ((constructor))" in their declaration.
    */
    interp_process();

    clean_up();

    ...

because of how the M2 build system processes D files, functions exported from them are namespaced with the name of the file they came from--so interp_process actually refers to the D function process in d/interp.dd.
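the comment in that snippet mentions __attribute__((constructor)), which is a GCC/Clang extension rather than standard C. here's a tiny sketch of what it buys you, using made-up names (demo__prepare and demo_process are mine, loosely imitating the file-prefixed naming, and are not generated by the M2 build):

```c
/* hypothetical stand-in for a generated "_prepare()" function */
static int prepared = 0;

__attribute__((constructor))
static void demo__prepare(void) {
    prepared = 1; /* runs during program startup, before main() */
}

/* anything called from main() onward can rely on the setup being done */
int demo_process(void) {
    return prepared; /* 1 once the constructor has run */
}
```

so by the time main() (and later interp_process()) runs, every "_prepare()" style function has already had its turn, with no explicit call anywhere.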

process in turn sets up a bunch of error handling, and manages some TTY specifics. it (of course) then calls out to another function, readeval:

    ...

    ret := readeval(stringTokenFile(startupFile.filename,startupFile.contents),false,false);

    ...

now, startupFile is defined earlier in interp.dd as a cachedFile:

    export startupFile := cachedFile(
         tostring(Ccode(constcharstar,"startupFile.filename")),
         tostring(Ccode(constcharstar,"startupFile.contents")));

this applies the D function tostring (D strings differ from C strings) to the C values startupFile.filename and startupFile.contents.

these values can be traced back to startup.c, a file dynamically generated by d/Makefile.in based off d/startup-header.h:

    startup.c: startup-header.h ../m2/startup.m2 @srcdir@/../m2/basictests/*.m2 Makefile
        : making $@
        @(\
         cat @srcdir@/startup-header.h; \
         echo 'cached_file startupFile = {' ; \
         echo @abs_top_srcdir@/Macaulay2/m2/startup.m2.in | sed $(BSTRING) ; \
         echo ',' ; \
         cat ../m2/startup.m2 | sed $(SSTRING) ; \
        ...
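to make that rule a bit more concrete, here's a rough sketch of what the generated C might boil down to. the cached_file type name and the filename/contents fields show up in the snippets above, but the struct layout, the placeholder path, the placeholder contents, and the startup_size helper are all my assumptions:

```c
#include <string.h>

/* assumed shape; the real definition lives in startup-header.h */
typedef struct {
    const char *filename;
    const char *contents;
} cached_file;

/* roughly what the echo/cat/sed pipeline would emit: a path and the file's
   text, both turned into C string literals (placeholder values here) */
cached_file startupFile = {
    "/path/to/Macaulay2/m2/startup.m2.in",
    "-- startup code would be pasted here as one big string\n"
};

/* hypothetical helper: size of the embedded script in bytes */
size_t startup_size(void) {
    return strlen(startupFile.contents);
}
```

the upshot is that the startup script travels inside the executable as data, so the interpreter can run it without having to find the file on disk first.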

in particular, we see that startupFile's filename is recorded as m2/startup.m2.in (with its contents read from ../m2/startup.m2). this is an M2 file that primarily

  • defines a suite of M2 helper functions
  • provides the usage information for the M2 binary
  • handles various arguments passed from main.cpp
  • displays the actual prompt

but most importantly calls interpreter(), where interpreter is a variable the startup file assigns to commandInterpreter.

this M2 function is defined in interp.dd via the setupfun D function on line 480:

    480 setupfun("commandInterpreter",commandInterpreter);

(more on setupfun in a later part.) the D function commandInterpreter passed as the second argument is defined just prior. note that startup.m2.in calls it as interpreter(), so the argument e is the expression (), the empty sequence:

    commandInterpreter(e:Expr):Expr := (
         incrementInterpreterDepth();
         ret :=
         when e is s:Sequence do (
              if length(s) == 0 then loadprint("-",newStaticLocalDictionaryClosure(),false)
              else WrongNumArgs(0,1)
              )
         ...

don't worry about the "static local dictionary closure" for now--this is just a fresh closure generated for the command we're about to read and run. as for the "-", that's the name of the file we want to read our input from: M2 treats "-" as a special case meaning "stdin".
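the "-" convention is common among command-line tools in general. a minimal sketch of the idea in C, with a hypothetical open_input helper (this is not how M2's D code is written, just the shape of the special case):

```c
#include <stdio.h>
#include <string.h>

/* hypothetical helper, not M2's actual code: "-" maps to stdin */
FILE *open_input(const char *name) {
    if (strcmp(name, "-") == 0)
        return stdin;            /* read from the terminal or a pipe */
    return fopen(name, "r");     /* NULL if the file can't be opened */
}
```

this way the same reading code can serve both an interactive session and a script file.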

finally, loadprint (still in interp.dd) simply passes the "-" on to a function called readeval3 to actually do the "r" and "e" in "repl". we'll get there in a sec, though, after we tie up one loose end.


/// the loose end (oop) ///

we entirely brushed over how readeval (just readeval, not readeval3) runs our startupFile!!

rest assured, this loose end is quickly handled: readeval is actually just a slim wrapper over readeval3--it just manages file exit hooks and errors:

    readeval(file:TokenFile,returnLastvalue:bool,returnIfError:bool):Expr := (
         savefe := getGlobalVariable(fileExitHooks);
         setGlobalVariable(fileExitHooks,emptyList);
         printout := false; mode := nullE;
         ret := readeval3(file,printout,newStaticLocalDictionaryClosure(file.posFile.file.filename),returnLastvalue,false,returnIfError);
         ...

phew! loose end managed.
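if the save-then-clear dance at the top of readeval looks unfamiliar, here's the general wrapper pattern sketched in C. everything here is made up (a plain int stands in for the hooks list), and the restore step at the end is my assumption, since the excerpt above cuts off before showing what happens to savefe:

```c
/* all names are hypothetical; this only mirrors the save/clear/call shape */
static int exit_hook_count = 3;      /* stands in for fileExitHooks */

/* stands in for readeval3: reports how many hooks it can see */
static int inner_eval(void) {
    return exit_hook_count;
}

/* stands in for readeval */
int wrapped_eval(void) {
    int saved = exit_hook_count;     /* like savefe := getGlobalVariable(...) */
    exit_hook_count = 0;             /* like setGlobalVariable(..., emptyList) */
    int ret = inner_eval();
    exit_hook_count = saved;         /* assumption: put the old hooks back */
    return ret;
}

int current_hook_count(void) {
    return exit_hook_count;
}
```

the inner call sees a clean slate, and the caller's hooks are untouched once the wrapper returns.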


/// evaluating (aka the hard part) ///

now, our final task is understanding what readeval3 is doing! of course, it's a wrapper around another function, readeval4, just with some additional global variable management.

finally, readeval4 is where the true magic happens. pasting the whole function here is not worth either of our time, so here's a summary of how it parses:

  • peek a character from the input file, and print it through if it's a newline
  • peek a token (made by the lexer, described next post!), and check if it's an errorToken
    • if so, set the promptWanted flag to request a new prompt or the interruptedFlag for a ^C, and return
    • otherwise
      • get the text of the token
      • if a newline or EOF, handle newlines by discarding them, and EOF tokens by returning null
      • otherwise
        • set promptWanted, since we're actually going to parse and return something
        • run parse to parse up to the end of the current statement
          • (it does this by scanning all tokens until it reaches one with the semicolon's precedence or lower)
        • set the bumpLineNumber flag to bump the line number up
        • check that the last token from parse isn't one that would indicate an unmatched bracket, returning with errors if any
          • (closing brackets have lower precedence than semicolons/newlines/EOFs/etc., so parse can also stop on one of them; if it does, that bracket probably has no matching opener)
        • use localBind to fetch token scopes and lookup parsed symbols
        • use convert to take the parsed and bound expression and prepare it to be executed
        • run the BeforeEval method
        • use evalexcept to evaluate the parsed, bound, and converted expression
        • run the AfterEval method on the returned result
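
the overall shape of that loop can be sketched with a toy in C. this is very much not readeval4: the "language" is just sums like 1+2 separated by semicolons or newlines, there's no localBind or convert step, no prompt flags, and every name (toy_readeval, parse_statement, before_eval, ...) is hypothetical. it only shows the skip-separators / parse-one-statement / check-the-terminator / eval-with-hooks rhythm:

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_TERMS 16

static int hook_calls = 0;
static long last_result = 0;

/* toy stand-ins for the BeforeEval / AfterEval methods */
static void before_eval(void) { hook_calls++; }
static void after_eval(long result) {
    hook_calls++;
    last_result = result;
    printf("=> %ld\n", result);
}

/* "parse": collect the terms of one statement like 1+2+3 into terms[].
   returns the number of terms, or -1 on a syntax error */
static int parse_statement(const char **p, long terms[MAX_TERMS]) {
    int count = 0;
    for (;;) {
        while (**p == ' ') (*p)++;
        if (!isdigit((unsigned char)**p) || count == MAX_TERMS) return -1;
        long n = 0;
        while (isdigit((unsigned char)**p)) {
            n = n * 10 + (**p - '0');
            (*p)++;
        }
        terms[count++] = n;
        while (**p == ' ') (*p)++;
        if (**p != '+') return count;   /* statement ends at the first non-'+' */
        (*p)++;
    }
}

/* "eval": add up the parsed terms */
static long eval_statement(const long terms[], int count) {
    long total = 0;
    for (int i = 0; i < count; i++) total += terms[i];
    return total;
}

/* returns how many statements were evaluated, or -1 on a parse error */
int toy_readeval(const char *input) {
    const char *p = input;
    int evaluated = 0;
    for (;;) {
        while (*p == '\n' || *p == ';' || *p == ' ') p++;  /* discard separators */
        if (*p == '\0') return evaluated;                  /* end of input */
        long terms[MAX_TERMS];
        int count = parse_statement(&p, terms);
        if (count < 0) return -1;
        /* the statement must stop on a separator, else something is unmatched */
        if (*p != ';' && *p != '\n' && *p != '\0') return -1;
        before_eval();
        after_eval(eval_statement(terms, count));
        evaluated++;
    }
}

long toy_last_result(void) { return last_result; }
int toy_hook_calls(void) { return hook_calls; }
```

calling toy_readeval("1+2; 3+4\n") walks two statements through the loop, and something like toy_readeval("1+") bails out at the parse step before any hook runs.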

this leaves four mysteries:

  1. how does parse work??
  2. how does localBind work???
  3. what even is convert doing to make the expression executable????
  4. what evil magic lurks in evalexcept????????

stay tuned and i'll answer them all!! exciting details coming up soon :)


/// conclusion ///

this post is just one in a series where i hope to elucidate Macaulay2 internals so that working on them and fixing specifics about the Macaulay2 interpreter is significantly easier :)

it's also just fun to peek under the hood of such a beautifully strange piece of software, so i hope that the joy i've had learning all this comes through, especially in the upcoming posts on M2's parser and lexer! :3

til then! ^^