JavaScript Language Tooling

Benjamin Woodruff

http://benjam.info

My Background

Internship at Facebook
Worked on "JS Infrastructure"
Moved Facebook.com to ESLint

Why JavaScript?

Lingua Franca of the Web
Modern JS is a great language!
Reasonably fast
Multiple independent implementations
Tons of existing tooling to build on top of
Hacker News likes it

In This Talk...

Language theory
Let's fix JS
Tools you can use today

Programming Languages 101

Code → Tokens → AST → Constraints → Transformer → Interpreter

(Some of these are optional)

Code → Tokens → AST → Constraints → Transformer → Interpreter

Lexing and Parsing

Goal: Convert text to an easily traversed representation.

Lexing produces tokens

/\s+/                => whitespace
/"[^"]*"/            => string
/#.+$/               => comment
/[_a-z][_a-z0-9]*/i  => identifier
/[0-9]+/             => integer

⇩

_"foo" 42bar#buzz

⇩

identifier string whitespace integer identifier comment

⇩

identifier string            integer identifier
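
The rule table above can be turned into a small lexer. The following is a minimal sketch, assuming the rules are tried in order at the current position and that whitespace and comments are dropped in a second step. The `lex` and `significant` names and the token shape are illustrative, not taken from any library.

```javascript
// Token rules from the slide, tried in order at the current position.
// The "y" (sticky) flag makes each regex match only at lastIndex.
const RULES = [
  {type: 'whitespace', regex: /\s+/y},
  {type: 'string',     regex: /"[^"]*"/y},
  {type: 'comment',    regex: /#.+$/my},
  {type: 'identifier', regex: /[_a-z][_a-z0-9]*/iy},
  {type: 'integer',    regex: /[0-9]+/y},
];

function lex(code) {
  const tokens = [];
  let pos = 0;
  while (pos < code.length) {
    let matched = false;
    for (const {type, regex} of RULES) {
      regex.lastIndex = pos;
      const match = regex.exec(code);
      if (match !== null) {
        tokens.push({type, text: match[0]});
        pos += match[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at offset ${pos}`);
    }
  }
  return tokens;
}

// Whitespace and comments rarely matter to the parser, so strip them.
function significant(tokens) {
  return tokens.filter(t => t.type !== 'whitespace' && t.type !== 'comment');
}
```

Calling `significant(lex('_"foo" 42bar#buzz'))` leaves the four tokens shown on the slide: identifier, string, integer, identifier.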

Abstract Syntax Tree

Goal: Convert the tokens to a tree

function parse(tokens) {
  const body = [];
  while (tokens.hasNext()) {
    body.push(parseStatement(tokens));
  }
  return {type: 'Program', body};
}

function parseStatement(tokens) {
  // next() peeks at the upcoming token; consume() advances past it
  const type = tokens.next().type;
  if (type === 'FunctionKeyword') {
    return parseFunction(tokens);
  } else if (type === 'Identifier') {
    return parseCallExpression(tokens);
  } else {
    throw new SyntaxError(`Unexpected token: ${type}`);
  }
}

function parseFunction(tokens) {
  const args = [];
  const body = [];

  tokens.consume('FunctionKeyword');
  tokens.consume('(');
  // Compare the token's type, not the token object itself
  while (tokens.next().type !== ')') {
    args.push(parseExpression(tokens));
  }
  tokens.consume(')');

  tokens.consume('{');
  while (tokens.next().type !== '}') {
    body.push(parseExpression(tokens));
  }
  tokens.consume('}');
  return {type: 'FunctionDeclaration', args, body};
}

// ...

Recursive tree traversal!
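
The parser above leans on a token stream object that the slides never define. Here is a minimal sketch of one, assuming `next()` peeks without advancing and `consume(type)` checks the upcoming token's type before advancing. Both behaviors are guesses at the intended interface, and `parseIdentifierList` is a made-up usage example.

```javascript
// Hypothetical token stream matching the interface the parser slides assume.
class TokenStream {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }

  hasNext() {
    return this.pos < this.tokens.length;
  }

  // Peek at the upcoming token without advancing.
  next() {
    if (!this.hasNext()) {
      throw new SyntaxError('Unexpected end of input');
    }
    return this.tokens[this.pos];
  }

  // Check the upcoming token's type, then advance past it.
  consume(type) {
    const token = this.next();
    if (token.type !== type) {
      throw new SyntaxError(`Expected ${type} but found ${token.type}`);
    }
    this.pos += 1;
    return token;
  }
}

// Usage: parse a parenthesized list of identifiers, e.g. "(a b)".
function parseIdentifierList(tokens) {
  const names = [];
  tokens.consume('(');
  while (tokens.next().type !== ')') {
    names.push(tokens.consume('Identifier').text);
  }
  tokens.consume(')');
  return names;
}
```

Each parse function only ever looks one token ahead, which is what keeps this style of parser easy to write by hand.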

Problem

That's a lot of code to write
Programmers are lazy

Solution

We're compiler writers
Let's make a shorthand language (DSL)
And a parser-generator

PEG.js

program
  = _ body:(s:statement _ {return s;})+ {
    return {type: 'Program', body: body};
  }

_ 'whitespace'
  = [ \t\r\n]*

statement
  = function
  / callExpression

function
  = 'function' _ '('
  _ args:(expr:expression _ {return expr;})*
  _ ')' _ '{'
  _ body:(s:statement _ {return s;})*
  _ '}' {
    return {
      type: 'FunctionDeclaration',
      args: args,
      body: body
    };
  }

expression
  = 'expr'

callExpression
  = 'fname' _ '('
  _ args:(expr:expression _ {return expr;})*
  _ ')' {
    return {
      type: 'CallExpression',
      args: args
    };
  }

Problem

Some constructs aren't "context free", and can't be parsed this way.

def foo():
    bar()

Example: Indentation-based languages don't have clear start/end tokens. How do you tell when to stop parsing a construct?

Cheating

Extend the lexer, and insert indent/dedent tokens!

def foo():
    bar()

1,0-1,3:    NAME    u'def'
1,4-1,7:    NAME    u'foo'
1,7-1,8:    OP      u'('
1,8-1,9:    OP      u')'
1,9-1,10:   OP      u':'
1,10-1,11:  NEWLINE u'\n'
2,0-2,4:    INDENT  u'    '
2,4-2,7:    NAME    u'bar'
2,7-2,8:    OP      u'('
2,8-2,9:    OP      u')'
3,0-3,0:    DEDENT  ''
3,0-3,0:    ENDMARKER   ''

Try this at home using Python's tokenize module (import tokenize)!
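
The same trick can be sketched in JavaScript. This is a simplified illustration, assuming spaces-only indentation and skipping blank lines. The `indentTokens` name and the `LINE` token are made up for this example, and the result is far cruder than Python's real tokenizer.

```javascript
// Emit INDENT/DEDENT tokens by tracking a stack of indentation widths.
// Assumes spaces-only indentation; blank lines are skipped.
function indentTokens(source) {
  const tokens = [];
  const stack = [0];
  for (const line of source.split('\n')) {
    if (line.trim() === '') continue;
    const width = line.length - line.trimStart().length;
    if (width > stack[stack.length - 1]) {
      stack.push(width);
      tokens.push({type: 'INDENT'});
    }
    while (width < stack[stack.length - 1]) {
      stack.pop();
      tokens.push({type: 'DEDENT'});
    }
    // After popping, the width must match an enclosing block exactly.
    if (width !== stack[stack.length - 1]) {
      throw new SyntaxError('Inconsistent dedent');
    }
    tokens.push({type: 'LINE', text: line.trim()});
  }
  // Close any blocks still open at end of input.
  while (stack.length > 1) {
    stack.pop();
    tokens.push({type: 'DEDENT'});
  }
  return tokens;
}
```

With explicit INDENT and DEDENT tokens in the stream, the parser can treat them exactly like `{` and `}`.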

Programming Languages 101

Code → Tokens → AST → Constraints → Transformer → Interpreter

Constrainer

Type checking
Errors on undefined variables
Verify argument counts
Stylistic constraints

JavaScript doesn't do any of this. But a linter does!

JS: What's broken?

Dynamic typing
Weak typing
== vs ===
Two null-ish types (undefined and null)
Callback hell
No block-level scoping
Implicit globals
Wasted bytes in whitespace
No modules
No classes
No private properties
Objects must have string keys
for..in
with blocks
ASI (Automatic Semicolon Insertion)
Function Expression and Statement forms
new keyword
Implicit and explicit eval
switch-case-break fallthrough
Octal literals

Fixing it

Add our own constrainer
Use ES2015 and ES2016
"Minify" our JavaScript
Don't write JavaScript in JavaScript

Round One

Minifiers and Linters

Low-hanging fruit: Minification

Remove comments and whitespace
Tokenize and strip
- Pro: Easy to implement
- Pro: Very fast
- Pro: Easy to perform on a stream
- Con: Missed opportunities

Modern Minification

Build an AST
Peephole optimization (syntactic find-replace)
"Mangle" local variable names
Remove obviously dead code
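
As a rough illustration of a peephole pass, the sketch below folds constant arithmetic in a small AST. It assumes only `Literal` and `BinaryExpression` nodes (shapes borrowed from ESTree) with numeric operands. The `fold` function is illustrative and is not how any particular minifier is implemented.

```javascript
// Fold constant numeric subexpressions, e.g. 2 * 3 + x becomes 6 + x.
const OPERATORS = new Map([
  ['+', (a, b) => a + b],
  ['-', (a, b) => a - b],
  ['*', (a, b) => a * b],
]);

function isNumber(node) {
  return node.type === 'Literal' && typeof node.value === 'number';
}

function fold(node) {
  if (node.type !== 'BinaryExpression') {
    return node;
  }
  // Fold children first so nested constants collapse bottom-up.
  const left = fold(node.left);
  const right = fold(node.right);
  const apply = OPERATORS.get(node.operator);
  if (apply !== undefined && isNumber(left) && isNumber(right)) {
    return {type: 'Literal', value: apply(left.value, right.value)};
  }
  // Leave anything we can't prove constant untouched.
  return {...node, left, right};
}
```

The pass only rewrites a subtree when both operands are known numbers, so it never changes what the program computes.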

Advanced Minification

Static analysis of dynamic languages is hard.

var obj = {foo: 1, bar: 2};
console.log(obj[prompt('What property should I access?')]);

Akin to the halting problem

Make basic well-defined assumptions about the code
Add annotations to provide hints about usage
Moderate improvement, lots of extra work
Usually avoided

Build your own AST

Google Closure Compiler (GCC)
Uglify (including v2)
YUI Compressor
JSMin

SpiderMonkey/ESTree

Let's steal Mozilla's format

Popular Parsers

Esprima (ESTree)
Acorn (ESTree)
Babylon (Acorn fork + ES2015/2016)
Espree (Esprima fork + ES2015)
Shift (New AST format)

http://astexplorer.net/

Linting

Pattern-match parts of the AST
Generate warnings based on matches

Linting

module.exports = function(context) {

    return {

        "MemberExpression": function(node) {

            if (node.object.name === "console") {
                context.report(node, "Unexpected console statement.");
            }

        }
    };

};
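
To show how a rule like this gets called, here is a minimal sketch of a traversal that walks an ESTree-shaped AST and dispatches each node to a visitor keyed by node type. The `runRule` helper and its simplified `context.report(node, message)` are assumptions for illustration; ESLint's real traversal and context object do considerably more.

```javascript
// Hypothetical driver: walk every node and call the rule's handler for its type.
function runRule(rule, ast) {
  const messages = [];
  const context = {
    report(node, message) {
      messages.push(message);
    },
  };
  const visitor = rule(context);

  function walk(node) {
    if (node === null || typeof node !== 'object') {
      return;
    }
    if (Array.isArray(node)) {
      node.forEach(walk);
      return;
    }
    if (typeof node.type === 'string' &&
        Object.prototype.hasOwnProperty.call(visitor, node.type)) {
      visitor[node.type](node);
    }
    // Child nodes live in arbitrary properties, so visit them all.
    Object.values(node).forEach(walk);
  }

  walk(ast);
  return messages;
}

// The rule from the slide, written as a plain function.
function noConsole(context) {
  return {
    MemberExpression(node) {
      if (node.object.name === 'console') {
        context.report(node, 'Unexpected console statement.');
      }
    },
  };
}
```

Running `runRule(noConsole, ast)` on the AST for `console.log(1)` collects a single warning message.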

Popular Linters

JSLint
JSHint
JSCS
ESLint
Babel (maybe)

Static typing

TypeScript (Language superset)
Flow (Language extension)
Closure Compiler (Magic comments)

Flow: Type Inference

/* @flow */
function foo(x) {
  return x * 10;
}
foo('Hello, world!');

hello.js:5:5,19: string
This type is incompatible with
  hello.js:3:10,15: number

Flow: Type Annotations

/* @flow */
function foo(x: string, y: number): string {
  return x.length * y;
}
foo('Hello', 42);

hello.js:3:10,21: number
This type is incompatible with
  hello.js:2:37,42: string

Programming Languages 101

Code → Tokens → AST → Constraints → Transformer → Interpreter

CoffeeScript

Uses jison parser-generator (Bison in JavaScript)
Changes syntax (Ruby-like), but (usually) not semantics
Fresh start: can fix many JS issues
Influential in the ES2015 standard

Compile-to-JS Problems

Debugging
Debugging
Debugging

Source Maps

ES2015

function update(callback) {
  $.ajax({
    url : 'example.com',
    type: 'GET',
    success: function(data) {
      console.log(data.something);
      callback(data);
    }
  })
}

function update() {
  return $.ajax('example.com').then((data) => {
    console.log(data.something);
    return data;
  });
}

ES2016

function update() {
  return $.ajax('example.com').then((data) => {
    console.log(data.something);
    return data;
  });
}

async function update() {
  const data = await $.ajax('example.com');
  console.log(data.something);
  return data;
}

Problem?

JS-to-JS Compilation

Flow and TypeScript
Traceur
Babel

Babel

Array comprehensions
Arrow functions
Async functions
Async generator functions
Classes
Class properties
Computed property names
Constants
Decorators
Default parameters
Destructuring
Exponentiation operator
For-of
Function bind
Generators
Generator comprehensions
Let scoping
Modules
Module export extensions
Object rest/spread
Property method assignment
Property name shorthand
Rest parameters
React
Spread
Template literals
Type annotations
Unicode regex

Miss pointers?

Emscripten

ASM.js

function GeometricMean(stdlib, foreign, buffer) {
  "use asm";

  var exp = stdlib.Math.exp;
  var log = stdlib.Math.log;
  var values = new stdlib.Float64Array(buffer);

  function logSum(start, end) {
    start = start|0;
    end = end|0;

    var sum = 0.0, p = 0, q = 0;

    // asm.js forces byte addressing of the heap by requiring shifting by 3
    for (p = start << 3, q = end << 3; (p|0) < (q|0); p = (p + 8)|0) {
      sum = sum + +log(values[p>>3]);
    }

    return +sum;
  }

  function geometricMean(start, end) {
    start = start|0;
    end = end|0;

    return +exp(+logSum(start, end) / +((end - start)|0));
  }

  return { geometricMean: geometricMean };
}

ASM.js

Strict subset of ES3
Works in every browser
Fully sandboxed
Firefox AOT compiles to native code

"usually within a factor of 2 slowdown over native compilation with clang"

JS Compilation Output

AST → Transform → ??? → JavaScript

String Concatenation
AST Reprinter
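
The string-concatenation approach can be sketched as a recursive printer. This version handles only a handful of ESTree node types, parenthesizes every binary expression to sidestep operator precedence, and discards comments and original formatting. The `generate` function is illustrative only.

```javascript
// Naive code generation by string concatenation over a tiny AST subset.
// Every binary expression is parenthesized to sidestep precedence.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'ExpressionStatement':
      return generate(node.expression) + ';';
    case 'CallExpression':
      return generate(node.callee) +
        '(' + node.arguments.map(generate).join(', ') + ')';
    case 'BinaryExpression':
      return '(' + generate(node.left) + ' ' + node.operator + ' ' +
        generate(node.right) + ')';
    case 'Identifier':
      return node.name;
    case 'Literal':
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}
```

Because the printer rebuilds everything from the tree, untouched code comes out reformatted too, which is the drawback that motivates reprinting only what changed.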

Recast

var output = recast.print(ast).code;

See Also: Effective JavaScript Codemods by @cpojer

Takeaway

Modern JS is pretty cool
JavaScript's faults are fixable
JS is an awesome compilation target

GatorLUG