Monday, February 11, 2008

Another status update

New tuple slot accessors


For a while now Eduardo Cavazos has been lobbying for changes to tuple slot accessors. He wants slots to be read with name>> instead of tuple-name, and for slots to be written with >>name instead of set-tuple-name. He also wants the stack effect of the setter to be changed from ( value object -- ) to ( object value -- object ). While I liked the new names straight away, it took me a while to appreciate the new stack effects. Seeing how they could interact with the query by example feature of Doug Coleman's database library pushed me over the edge:
<person>
18 25 [a,b] >>age
"Minneasota" >>state
find-by-example

That looks pretty nice; having the setter preserve the tuple on the stack makes it easy to write code in this style, where an empty object is created then the slots filled in.

To experiment with the new slots, get the latest git, and USE: new-slots. This shadows the TUPLE: word with one defining the new accessors. They will become the default soon, however the old accessors will still be generated for as long as it takes developers to update code.

Process launcher timeout support


Sometimes you want to launch a process, but kill it if it takes too long to run. For example, Eduardo's continuous integration does this when running unit tests, because if a unit test hangs, then the build host needs to notify the developers of this.

To implement timeouts, I first refactored the I/O timeout code to be more general. The set-timeout word was moved from io to io.timeouts, and now it can be applied to process exits. If a wait-for-process call doesn't return before the timeout expires, an error is raised. It is also possible to set the timeout in the launch descriptor:
{
{ { +command+ "git pull" } }
{ { +timeout+ 30000 } }
} run-process

Automatic image download tool


I wrote two tools, bootstrap.image.upload and bootstrap.image.download. The first one is only for me to run, it generates and uploads new images together with a text file of their MD5 hashes. The MD5 hashes are not for security, but rather as an optimization: the download tool compares the MD5 of your image against the one on the server, and only if it differs does it download a new image. This is a handy stand-alone tool, but the main reason it exists is again for continuous integration: we want the build machines to grab new images but we don't want them to waste bandwidth over and over again if the images haven't changed.

First-class quotation compositions


While quotations are sequences and arbitrary sequence operations can be applied to them, the compiler cannot infer much information about your code if you modify quotations willy-nilly. So arbitrary quotation construction is something that we do at parse-time and compile-time, in parsing words and macros. At run-time, you still want to partially apply and compose quotations, so for a while now, we've had two light-weight run-time quotation operators, curry for partial application and compose for composition. They're lightweight in the sense that they run in constant time, and also the optimizing compiler optimizes them out altogether. The curry word was primitive and compose was implemented as follows:
: compose [ >r call r> call ] curry curry ;

This is nice, however if you prettyprint a composition it looks ugly:
[ 1 ] [ 2 + ] compose .
[ [ 1 ] [ 2 + ] >r call r> call ]

Instead I made compositions a first-class type of their own; they reference two quotations, and the sequence operations are implemented in the obvious way to make it look like one big quotation:
[ 1 ] [ 2 + ] compose .
[ 1 2 + ]

Another change is behind the scenes and doesn't affect the programmer. Previously curry was a built-in type hardwired into the VM. Now, it's just a tuple, and call is generic:
GENERIC: call

M: quotation call (call) ;

M: curry call dup curry-obj swap curry-quot call ;

M: compose call dup compose-first swap compose-second >r call r> call ;

The (call) word is the VM primitive for calling quotations; it basically compiles the quotation if it hasn't been compiled yet then jumps to the compiled definition.

This is interesting from a theoretical perspective, because it means Factor implements partial application and composition entirely in 'user space'. Since curry is the concatenative equivalent of closures, and Factor already implements lexical scope in the locals vocabulary, this means that in Factor, the whole concept of "closures" are entirely implemented in library code!

Binary reduce for associative operators


Daniel Ehrenberg noticed that there is a way to speed up our product and sum words for larger sequences; instead of scanning the sequence from left to right, accumulating a product or sum, we can split the sequence in two, compute the product of each half (splitting recursively and so on), then multiply the two results. The total algorithmic complexity of this is lower. So I've added Dan's split-reduce word to the core (now named binary-reduce) and changed sum and product to use it.

No comments: