S5: A Semantics for Modern JavaScript

The JavaScript language isn't static―the ECMAScript committee is working hard to improve the language, and browsers are implementing features both in and outside the spec, making it difficult to understand just what "JavaScript" means at any point in time. Existing implementations aren't much help―their goal is to serve pages well and fast. We need a JavaScript architecture that can help us make sense of the upcoming (and existing!) features of the language.

To this end, we've developed S5, an ECMAScript 5 runtime, built on λJS, with the explicit goal of helping people understand and tinker with the language. We built it to understand the features in the new standard, building on our previous efforts for the older standard. We've now begun building analyses for this semantics, and are learning more about it as we do so. We're making it available with the hope that you can join us in playing with ES5, extending it with new features, and building tools for it.

S5 implements the core features of ES5 strict mode. How do we know this? We've tested S5 against Test262 to measure our progress. We are, of course, not feature complete, but we're happy with our progress, which you can check out here.

A Malleable Implementation

The semantics of S5 is designed to be two things: a language for writing down the algorithms of the specification, and a translation target for JavaScript programs. We've implemented an interpreter for S5, and a desugaring function that translates JavaScript source into S5 programs.

We have a number of choices to make in defining desugaring. The ECMAScript standard defines a whole host of auxiliary functions and library routines that we must model. Putting these implementations directly in the desugaring function would work, but would make desugaring unnecessary brittle, and require recompilation on every minor change. Instead, we implement the bulk of this functionality as an S5 program. The majority of our work happens in an environment file that defines the spec in S5 itself. The desugaring defines a translation from the syntactic forms of JavaScript to the (smaller) language of S5, filled with calls into the functions defined in this environment.

This separation of concerns is what makes our implementation so amenable to exploration. If you want to try something out, you can edit the environment file and rerun whatever tests you care to learn about. Want to try a different implementation of the == operator? Just change the definition, as it was pulled from the spec, at line 300. Want a more expressive Object.toString() that doesn't just print "[object Object]"? That's right here. No changing an interpreter, no recompiling a desugaring function necessary.

The environment we've written reflects the standard's algorithms as we understand them in terms of S5. The desugaring from JavaScript to S5 code with calls into this library is informed by the specification's definitions of expression and statement evaluation. We have confidence in the combination of desugaring and library implementation, given our increasing test coverage. Further, we know how to continue―implement more of the spec and pass more test cases!

Recently, we've taken the idea of a malleable implementation further. The desugaring function produces large S5 programs in order to match the standard with as much fidelity as possible. However, not all analyses or tool-writers want or need all that detail – if an analysis assumes lexical scope, for example, S5 provides more detail than the analysis needs. Assuming lexical scope in the context of (nonstrict) eval is an unsound assumption, so removing this detail needs to be done with that in mind, but it also may may some analyses more tractable. With this in mind, we investigated semantics-altering transformations to the generated code that produce much smaller resulting S5 programs, at the cost of some semantic fidelity (measured by test cases).

More than λJS

S5 is built on λJS, but extends it in three significant ways:

For those that haven't fiddled with getters and setters, they are a new feature introduced in ECMAScript 5 that allow programmer-defined behavior on property access and assignment. Getters and setters fundamentally change how property access and assignment work. They make property assignment interact with the prototype chain, which used to not be the case, and cause syntactically similar expressions to behave quite differently at runtime.

Attributes on objects weren't treated directly in the original λJS. In 5th Edition, they are crucial to several security-relevant operations on objects. For example, the standard specifies Object.freeze(), which makes an object's properties forever unwritable. S5 directly models the writable and configurable attributes that make this operation possible, and make its implementation in S5 easy to understand.

λJS explicitly elided eval() from its semantics. S5 implements eval() by performing desugaring within the interpreter and then interpreting the desugared code. We implement only the strict mode version of eval, which restricts the environment that the eval'd code can affect. With these restrictions, we can implement eval in a straightforward way within our interpreter.

Code: Building on S5

S5 is open source and lives in a Github repository.