macaulay2 internals part 1: repl startup
i'm back! ^^ long time no see!! x3
back in undergrad, i worked with a piece of math software called macaulay2. it's popular primarily among algebraic geometers and commutative algebraists, as those are the two fields it was designed for.
i thought it'd be fun (since i'm on a pl kick) to try digging through and explaining the internals of its underlying language :3
there also aren't many docs on the internals for newcomers to learn from, so i thought it'd be nice to share some notes as i dive in.
/// macaulay2 ///
macaulay2 development started in 1992 off an NSF grant, and today is freely available on GitHub (woo!)
the core language was developed from 1993-1994, and since then the language has been continuously built upon to make it more useful for research mathematicians.
it's actively maintained by volunteer developers, many of whom are mathematicians using the software in their work. consequently, Macaulay2 has a ton of interesting packages allowing computation on all kinds of algebraic, geometric, and combinatorial objects :)))
i'm lazy so i won't give any code samples here ^^; but the Beginning Macaulay2 Guide in the docs is a great way to get a sense of the language :3
/// macaulay2 architecture ///
internally, Macaulay2 has four layers:
- the D language and compiler (NOT the one at dlang.org, an internal one)
- the Macaulay2 language and interpreter
- the Macaulay2 engine
- the Macaulay2 standard library
the D language is a project-specific systems language that compiles to C or C++. it's sometimes also referred to as "SafeC." it has its own bison-generated parser and a custom code generation backend that outputs C or C++ (depending on whether the source file ends in .d or .dd), which the Macaulay2 build system then compiles and links alongside the main.cpp file.
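to make the extension rule concrete, here's a tiny C++ sketch of which language the D compiler emits for a given source file. this is my own model, not scc1's code: the .d-to-C and .dd-to-C++ mapping follows the order above, the interp-tmp.cc name comes from a comment in main.cpp, and the matching -tmp.c name for .d files is an assumption by analogy. `DOutput` and `outputFor` are made-up names.

```cpp
#include <cassert>
#include <string>

// Made-up names; a model of the extension rule, not scc1's actual logic.
struct DOutput {
    std::string language;  // "C" or "C++"
    std::string filename;  // name of the generated file
};

static bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// .dd sources become C++ (e.g. interp.dd -> interp-tmp.cc); .d sources are
// assumed to become C with a matching "-tmp.c" name.
DOutput outputFor(const std::string& source) {
    if (endsWith(source, ".dd"))
        return {"C++", source.substr(0, source.size() - 3) + "-tmp.cc"};
    if (endsWith(source, ".d"))
        return {"C", source.substr(0, source.size() - 2) + "-tmp.c"};
    return {"", ""};  // not a D source file
}
```

so outputFor("interp.dd") gives C++ and interp-tmp.cc, while a hypothetical example.d would give C and example-tmp.c.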
(i've got a lot of notes on M2's D language too, will write them up sometime)
the Macaulay2 language (also sometimes "M2"), on the other hand, is fully interpreted, with both the lexer/parser and the interpreter written in D. this interpreter is called from main.cpp to start up the M2 repl.
the Macaulay2 engine is a C++ computation backend that the interpreter calls into to do the actual computation on mathematical objects.
finally, the Macaulay2 standard library simply consists of a host of Macaulay2 files implementing all kinds of mathematical functionality. most new functionality and bugfixes from research mathematicians end up here, or in the separate packages mentioned above.
for those following along at home, check
- M2/M2/Macaulay2/c/ for the D compiler and grammar (scc1.c and grammar.y, respectively)
- M2/M2/Macaulay2/d/ for the M2 interpreter (starting at process() in interp.dd)
- M2/M2/Macaulay2/e/ for the C++ implementation of the Macaulay2 engine (see engine.cpp)
- M2/M2/Macaulay2/m2 for the Macaulay2 standard library
/// getting a prompt ///
we start in M2/M2/Macaulay2/bin/main.cpp, which is the main file built into the M2 executable (a cute and handy little repl :3).
main() in that file does a bunch of work. among other things, it
- handles arguments
- initializes M2's garbage collector
- initializes the Macaulay2 engine (IM2_Initialize())
- initializes the Python bindings (if present)
- sets up jump points for long jumps in case of an abort
  - (this is because we want the M2 repl to restart the prompt if things crash)
and most importantly, kicks off interpFunc:
```cpp
...

initializeThreadSupervisor();
#if PROFILING
struct ThreadTask* profileTask = createThreadTask("Profile", (ThreadTaskFunctionPtr)profFunc, M2_vargs, 0, 0, 0);
pushTask(profileTask);
#endif
struct ThreadTask* interpTask = createThreadTask("Interp", (ThreadTaskFunctionPtr)interpFunc, M2_vargs, 0, 0, 0);
pushTask(interpTask);
waitOnTask(interpTask);

...
```
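about the jump points from main()'s list above: here's a minimal C++ sketch of the setjmp/longjmp pattern, assuming it works roughly like this in M2. all names are invented, and the real code also runs the interpreter on its own thread via the task functions above, which i've left out. a command that "crashes" longjmps back to the jump point, and the loop carries on showing prompts instead of the whole process dying.

```cpp
#include <cassert>
#include <csetjmp>

// Invented names throughout; a sketch of the pattern, not M2's code.
static std::jmp_buf promptJump;  // the jump point an abort returns to
static int promptsShown = 0;     // globals keep their values across longjmp
static int commandsRun = 0;
static int aborts = 0;

// Stand-in for evaluating one command; command #1 "crashes" with an abort.
static void evalCommand(int n) {
    if (n == 1) std::longjmp(promptJump, 1);  // unwind straight back to the prompt
    ++commandsRun;
}

// Stand-in for the repl loop: show a prompt, run a command, repeat.
// Returns how many commands finished without aborting.
int runRepl(int totalCommands) {
    static int next;  // static, so its value is well-defined after longjmp
    next = 0;
    if (setjmp(promptJump) != 0) ++aborts;  // we land here after an abort
    while (next < totalCommands) {
        ++promptsShown;  // "print the prompt"
        evalCommand(next++);
    }
    return commandsRun;
}
```

with three commands, runRepl(3) shows three prompts and finishes two commands, because command #1 aborts and the loop just picks up at the next prompt. the catch is that nothing between the jump point and the abort gets cleaned up, which is why this trick is touchy in C++: longjmp skips destructors.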
next, in interpFunc (same file), we
- set the current thread as the interpreter thread
- reinitialize thread-local variables from past repl prompts
- hand off all of the env vars and args to the M2 interpreter
- set up signal handlers for dumping stack traces
and most importantly, call interp_process():
```cpp
...

/*
process() in interp.dd is where all the action happens, however, interp__prepare()
from interp-tmp.cc is called first. This happens even before main() because all
"_prepare()" functions have "__attribute__ ((constructor))" in their declaration.
*/
interp_process();

clean_up();

...
```
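the constructor trick that comment mentions is a GCC/Clang extension, not standard C++. here's a minimal sketch of it; `example_prepare` and `example_process` are hypothetical names mirroring interp__prepare and interp_process (i've dropped the double underscore, since C++ reserves identifiers containing `__`). nobody calls the prepare function explicitly, but it has already run by the time main() calls the process function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which our functions run.
static std::vector<std::string>& callLog() {
    static std::vector<std::string> log;  // constructed on first use
    return log;
}

// GCC/Clang extension: runs automatically before main().
__attribute__((constructor)) static void example_prepare() {
    callLog().push_back("example_prepare");
}

// Hypothetical stand-in for interp_process(); main() would call this.
void example_process() {
    callLog().push_back("example_process");
}
```

if main() calls example_process(), the log already starts with "example_prepare". the function-local static in callLog() matters here: it guarantees the vector exists even though we touch it before main().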
because of how the M2 build system processes D files, functions exported from them are namespaced with the name of the file they came from, so interp_process actually refers to the D function process in d/interp.dd.
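here's that naming rule as a tiny C++ function. it's my own model of the convention, not the actual build code, and `exportedName` is a made-up name. it also explains the double underscore in interp__prepare from the comment above: if the D-side function is called `_prepare` (as that comment's "_prepare()" suggests), prefixing it with "interp_" gives exactly interp__prepare.

```cpp
#include <cassert>
#include <string>

// Hypothetical model: exported name = file's base name + "_" + function name.
std::string exportedName(const std::string& path, const std::string& fn) {
    // strip any directory ("d/interp.dd" -> "interp.dd");
    // if there's no '/', npos + 1 wraps around to 0 and we keep the whole path
    std::string base = path.substr(path.find_last_of('/') + 1);
    // strip the extension ("interp.dd" -> "interp")
    base = base.substr(0, base.find('.'));
    return base + "_" + fn;
}
```

so exportedName("d/interp.dd", "process") is "interp_process", and exportedName("d/interp.dd", "_prepare") is "interp__prepare".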
process in turn sets up a bunch of error handling and manages some TTY specifics. it then (of course) calls out to another function, readeval:
```
...

ret := readeval(stringTokenFile(startupFile.filename,startupFile.contents),false,false);

...
```
now, startupFile is defined earlier in interp.dd as a cachedFile:
```
export startupFile := cachedFile(
     tostring(Ccode(constcharstar,"startupFile.filename")),
     tostring(Ccode(constcharstar,"startupFile.contents")));
```
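here, Ccode(constcharstar, ...) splices its quoted text in as a raw C expression of type const char*, and the D function tostring converts each result (D strings differ from C strings). so this applies tostring to the C values startupFile.filename and startupFile.contents.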
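why does tostring need to exist at all? my understanding is that D strings carry an explicit length instead of relying on a terminating NUL the way C strings do; treat that as an assumption here. under that assumption, here's a minimal C++ sketch of a tostring-style conversion. `LString`, `toLString`, and `view` are hypothetical names, not the real D runtime types.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical length-prefixed string: it stores its length explicitly,
// so it needs no terminating NUL.
struct LString {
    int len;
    std::unique_ptr<char[]> chars;
};

// tostring-style conversion from a NUL-terminated C string.
LString toLString(const char* s) {
    int n = static_cast<int>(std::strlen(s));  // count bytes up to the NUL
    LString out{n, std::make_unique<char[]>(n)};
    std::memcpy(out.chars.get(), s, n);        // copy the bytes, not the NUL
    return out;
}

// Helper for printing/testing: view the characters as a std::string.
std::string view(const LString& s) { return std::string(s.chars.get(), s.len); }
```

toLString("startup.m2.in") gives a 13-character string. the payoff of storing the length is that taking it is O(1), instead of a strlen walk every time.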
these values can be traced back to startup.c, a file generated at build time by a rule in d/Makefile.in, starting from d/startup-header.h:
```make
startup.c: startup-header.h ../m2/startup.m2 @srcdir@/../m2/basictests/*.m2 Makefile
	: making $@
	@(\
	cat @srcdir@/startup-header.h; \
	echo 'cached_file startupFile = {' ; \
	echo @abs_top_srcdir@/Macaulay2/m2/startup.m2.in | sed $(BSTRING) ; \
	echo ',' ; \
	cat ../m2/startup.m2 | sed $(SSTRING) ; \
	...
```
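the idea of that recipe is to bake a file's name and contents straight into the binary as C string literals. the definitions of $(BSTRING) and $(SSTRING) aren't shown, so i'm assuming they escape their input into C string syntax; the closing `};` is also my assumption, since the end of the recipe is elided. with that caveat, here's a hypothetical C++ generator that produces the same kind of output (`cEscape` and `embedFile` are made-up names):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Escape text so it can sit inside a C string literal.
std::string cEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;
        }
    }
    return out;
}

// Emit a C initializer embedding a file's name and contents. Each input line
// becomes its own literal ending in \n (a final line without '\n' gets one).
std::string embedFile(const std::string& var, const std::string& name,
                      const std::string& contents) {
    std::ostringstream out;
    out << "cached_file " << var << " = {\n";
    out << "\"" << cEscape(name) << "\"\n,\n";
    std::istringstream lines(contents);
    for (std::string line; std::getline(lines, line);)
        out << "\"" << cEscape(line) << "\\n\"\n";
    out << "};\n";
    return out.str();
}
```

splitting the contents into one literal per line keeps the generated C readable, and the C compiler joins adjacent string literals into one anyway.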
in particular, we see that startupFile's filename is m2/startup.m2.in, while its contents come from ../m2/startup.m2 (presumably generated from that .in template when the build is configured). either way, it's an M2 file that primarily
- defines a suite of M2 helper functions
- provides the usage information for the M2 binary
- handles various arguments passed from main.cpp
- displays the actual prompt
but most importantly, it calls interpreter(), where interpreter is a variable the startup file assigns to commandInterpreter.
this M2 function is defined in interp.dd via the setupfun D function on line 480:
```
setupfun("commandInterpreter",commandInterpreter);
```
(more on this in a later part). the D function commandInterpreter, passed as the second argument, is defined just before it. startup.m2.in calls it as interpreter(), which passes the implicit expression (), the empty sequence:
```
commandInterpreter(e:Expr):Expr := (
     incrementInterpreterDepth();
     ret :=
     when e is s:Sequence do (
          if length(s) == 0 then loadprint("-",newStaticLocalDictionaryClosure(),false)
          else WrongNumArgs(0,1)
          )
     ...
```
don't worry about the "static local dictionary closure" for now; it's just a fresh closure generated for the command we're about to read and run. as for the "-", that's the name of the file we want to read our input from: M2 treats "-" as a special case meaning "stdin".
finally, loadprint (still in interp.dd) simply passes the "-" on to a function called readeval3 to actually do the "r" and "e" in "repl". we'll get there in a sec, though, after we tie up one loose end.
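to tie this section together, here's a toy C++ model of the path from interpreter() down to reading "-". setupfun is covered in a later part, so here i'm just assuming it's fair to picture it as binding a name to a builtin in a global table. every function below is a simplified stand-in named after its D counterpart, not its real implementation: Expr is just a list of argument strings, and loadprint only counts the lines it reads instead of evaluating them.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins: an Expr is a list of arguments, a builtin returns an int.
using Expr = std::vector<std::string>;
using Builtin = std::function<int(const Expr&)>;

// Toy global symbol table.
std::map<std::string, Builtin>& globals() {
    static std::map<std::string, Builtin> table;
    return table;
}

// Toy setupfun: make a builtin visible to M2 code under `name`.
void setupfun(const std::string& name, Builtin fn) { globals()[name] = std::move(fn); }

// What "-" reads from; tests can point this at a string stream instead.
std::istream* stdinStream = &std::cin;

// Toy loadprint: "-" means stdin, anything else is a path. It only counts
// lines, standing in for handing the input to readeval3.
int loadprint(const std::string& filename) {
    std::ifstream file;
    if (filename != "-") file.open(filename);
    std::istream& in = filename == "-" ? *stdinStream : file;
    int lines = 0;
    for (std::string line; std::getline(in, line);) ++lines;
    return lines;
}

// Toy commandInterpreter: the empty sequence means "read from stdin". A
// non-empty sequence gets a stand-in for WrongNumArgs(0,1); the cases elided
// in the quoted D code aren't modeled.
int commandInterpreter(const Expr& args) {
    if (args.empty()) return loadprint("-");
    return -1;
}
```

in this model, startup's job is then just `setupfun("commandInterpreter", commandInterpreter)` followed by copying that table entry to "interpreter", and calling it with an empty Expr reads everything from stdin.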
/// the loose end (oop) ///
we entirely brushed over how readeval (just readeval, not readeval3) runs our startupFile!!
rest assured, this loose end is quickly handled: readeval is actually just a slim wrapper over readeval3 that manages file exit hooks and errors:
```
readeval(file:TokenFile,returnLastvalue:bool,returnIfError:bool):Expr := (
     savefe := getGlobalVariable(fileExitHooks);
     setGlobalVariable(fileExitHooks,emptyList);
     printout := false; mode := nullE;
     ret := readeval3(file,printout,newStaticLocalDictionaryClosure(file.posFile.file.filename),returnLastvalue,false,returnIfError);
     ...
```
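the save-and-clear at the top of readeval is a common pattern, so here's a C++ sketch of it. the quoted code stops before the interesting part, so what happens afterwards is my assumption: i've guessed that the hooks registered while reading the file get run at the end, and that the caller's saved hooks (savefe) are put back. `fileExitHooks` and `readevalWithHooks` here are toy stand-ins.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using Hook = std::function<void()>;
std::vector<Hook> fileExitHooks;  // toy stand-in for the D global of the same name

// Toy readeval: give the file a fresh, empty hook list, run `body` (standing
// in for readeval3), then restore the caller's hooks and run the file's.
// Everything after body() is an assumption; that part is elided above.
int readevalWithHooks(const std::function<int()>& body) {
    std::vector<Hook> saved = std::move(fileExitHooks);  // savefe := ...
    fileExitHooks.clear();                               // ... emptyList
    int ret = body();
    std::vector<Hook> registered = std::move(fileExitHooks);
    fileExitHooks = std::move(saved);  // put the caller's hooks back
    for (Hook& hook : registered) hook();
    return ret;
}
```

moving the file's hooks out before running them means a hook that registers another hook adds it to the caller's list, rather than mutating the vector we're iterating over.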
phew! loose end managed.
/// evaluating (aka the hard part) ///
now, our final task is understanding what readeval3 is doing! of course, it's a wrapper around yet another function, readeval4, just with some additional global variable management.
finally, readeval4 is where the real magic happens. pasting the whole function here isn't worth either of our time, so here's a summary of what it does:
- peek a character from the input file, and print it through if it's a newline
- peek a token (made by the lexer, described next post!), and check if it's an errorToken
  - if so, set the promptWanted flag to request a new prompt, or the interruptedFlag for a ^C, and return
- otherwise
  - get the text of the token
  - if it's a newline or EOF, handle it: newlines are discarded, and EOF tokens return null
  - otherwise
    - set promptWanted, since we're actually going to parse and return something
    - run parse to parse up to the end of the current statement
      - (it does this by scanning all tokens until it reaches one with the semicolon's precedence or lower)
    - set the bumpLineNumber flag to bump the line number up
    - check that the token where parsing stopped isn't one that would indicate an unmatched bracket, returning with errors if it is
      - (closing brackets have lower precedence than semicolons/newlines/EOFs/etc., so if parsing stopped on something other than one of the latter, it's most likely a closing bracket with no matching opener)
    - use localBind to fetch token scopes and look up parsed symbols
    - use convert to take the parsed and bound expression and prepare it to be executed
    - run the BeforeEval method
    - use evalexcept to evaluate the parsed, bound, and converted expression
    - run the AfterEval method on the returned result
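here's a toy C++ model of one trip around that loop, mostly to show the precedence trick. the token representation and precedence numbers are invented (only their ordering matters: closing brackets < statement terminators < everything else), the "parse" just collects tokens instead of building a tree, and the later stages are stubbed out as names. parsing stops at the first token whose precedence is no higher than a terminator's; if that token isn't a terminator, it has to be a lower-precedence closing bracket with no opener, which is exactly the unmatched-bracket error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Invented precedences; only the ordering matters.
enum Prec { CLOSE_BRACKET = 1, TERMINATOR = 2, OTHER = 10 };

struct Token {
    std::string text;  // ";" or "\n" for terminators, "" for EOF
    int prec;
};

struct Step {
    std::string status;               // "null", "ok", or a syntax error
    std::string statement;            // text of the tokens the "parse" consumed
    std::vector<std::string> stages;  // stages that would run next, in order
};

// One trip around a readeval4-style loop, starting at toks[pos].
Step readevalStep(const std::vector<Token>& toks, std::size_t& pos) {
    // discard leading newlines; EOF (or running out of tokens) returns null
    while (pos < toks.size() && toks[pos].text == "\n") ++pos;
    if (pos >= toks.size() || toks[pos].text.empty()) return {"null", "", {}};

    // "parse": consume tokens until one with terminator precedence or lower
    std::string stmt;
    while (pos < toks.size() && toks[pos].prec > TERMINATOR) stmt += toks[pos++].text;

    // the stopping token must be ';', newline, or EOF; else it's a stray bracket
    Token stop = pos < toks.size() ? toks[pos] : Token{"", TERMINATOR};
    if (pos < toks.size()) ++pos;  // consume the stopping token either way
    if (stop.prec != TERMINATOR) return {"syntax error: unmatched " + stop.text, stmt, {}};

    // bind, convert, and evaluate, in the order from the summary (stubbed)
    return {"ok", stmt, {"localBind", "convert", "BeforeEval", "evalexcept", "AfterEval"}};
}
```

feeding it the tokens of `1+2;)` followed by a newline and EOF gives "ok" for `1+2`, then "syntax error: unmatched )", then null.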
this leaves four mysteries:
- how does parse work??
- how does localBind work???
- what even is convert doing to make the expression executable????
- what evil magic lurks in evalexcept????????
stay tuned and i'll answer them all!! exciting details coming up soon :)
/// conclusion ///
this post is just one in a series where i hope to elucidate Macaulay2 internals, so that working on the Macaulay2 interpreter and fixing specific parts of it is significantly easier :)
it's also just fun to peek under the hood of such a beautifully strange piece of software, so i hope that the joy i've had learning all this comes through, especially in the upcoming posts on M2's parser and lexer! :3
til then! ^^