CMU-CS-17-113
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-17-113

Reasonably Programmable Syntax

Cyrus Omar

May 2017

Ph.D. Thesis

CMU-CS-17-113.pdf


Keywords: Syntax, notation, parsing, type systems, module systems, macro systems, hygiene, pattern matching, bidirectional typechecking, implicit dispatch

Programming languages commonly provide "syntactic sugar" that decreases the syntactic cost of working with certain standard library constructs. For example, Standard ML builds in syntactic sugar for constructing and pattern matching on lists. Third-party library providers are, justifiably, envious of this special arrangement. After all, it is not difficult to find other situations where library-specific syntactic sugar might be useful [97]. For example, (1) clients of a "collections" library might like syntactic sugar for finite sets and dictionaries; (2) clients of a "web programming" library might like syntactic sugar for HTML and JSON values; (3) a compiler writer might like syntactic sugar for the terms of the object language or various intermediate languages of interest; and (4) clients of a “chemistry” library might like syntactic sugar for chemical structures based on the SMILES standard [16].

Defining a "library-specific" syntax dialect in each of these situations is problematic, because library clients cannot combine dialects like these in a manner that conserves syntactic determinism (i.e. syntactic conflicts can and do arise.) Moreover, it can become difficult for library clients to reason abstractly about types and binding when examining the text of a program that uses unfamiliar forms. Instead, they must reason transparently, about the underlying expansion. Typed, hygienic term-rewriting macro systems, like Scala's macro system [23], while somewhat more reasonable, offer limited syntactic control.

This thesis formally introduces typed literal macros (TLMs), which give library providers the ability to programmatically control the parsing and expansion, at "compile-time", of expressions and patterns of generalized literal form. Library clients can use any combination of TLMs in a program without needing to consider the possibility of syntactic conflicts between them, because the context-free syntax of the language is never extended (instead, literal forms are contextually repurposed.) Moreover, the language validates each expansion that a TLM generates in order to maintain useful abstract reasoning principles. Most notably, expansion validation maintains:

  • a type discipline, meaning that the client can reason about types while holding the literal expansion abstract; and
  • a strictly hygienic binding discipline, meaning that the client can always be sure that:
  1. spliced terms, i.e. terms that appear within literal bodies, cannot capture bindings hidden within the literal expansion; and
  2. the literal expansion does not refer to definition-site or applicationsite bindings directly. Instead, all interactions with bindings external to the expansion go explicitly through spliced terms or parameters.
In short, we formally define a programming language in the ML tradition with a reasonably programmable syntax.

298 pages

Thesis Committee:
Jonathan Aldrich (Chair)
Robert Harper
Karl Crary
Eric Van Wyk (University of Minnesota)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science




Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu