Why not use classic parser generators?

Overview

 

Classical tools like yacc, antlr, parsec, etc. all work the same way: you give them a grammar specification and they produce source code. That source code is the parser, ready to be compiled.

 

Despite the comfort they offer, we cannot use these tools, for reasons that derive directly from the requirements:

  • Role of indentation
  • Scannerless parsing
  • Complex sub-languages in bodies
  • Dynamic syntax
  • Parallel parsing

 

 

Role of indentation

 

Did you know that indentation-sensitive languages cannot be expressed as context-free grammars? The most common workaround is therefore to keep track of indentation in the tokenizer and insert synthetic INDENT/DEDENT tokens, so that classic parsing techniques can still be used afterwards.
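
As an illustration only (a minimal Python sketch, not part of Arplan; all names are made up), this is what the classic workaround looks like: an indentation stack drives the insertion of synthetic INDENT/DEDENT tokens, so that the downstream grammar can remain context-free.

    from typing import Iterator, Tuple

    Token = Tuple[str, str]  # (kind, text)

    def tokenize_with_indent(source: str) -> Iterator[Token]:
        """Insert synthetic INDENT/DEDENT tokens based on leading whitespace,
        in the spirit of Python's own tokenizer."""
        indent_stack = [0]
        for line in source.splitlines():
            if not line.strip():
                continue  # blank lines carry no indentation information
            indent = len(line) - len(line.lstrip(" "))
            if indent > indent_stack[-1]:
                indent_stack.append(indent)
                yield ("INDENT", "")
            while indent < indent_stack[-1]:
                indent_stack.pop()
                yield ("DEDENT", "")
            yield ("LINE", line.strip())
        # close any blocks still open at end of file
        while len(indent_stack) > 1:
            indent_stack.pop()
            yield ("DEDENT", "")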

 

However, we chose a completely different approach, giving indentation a central role in our parser.
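
As a rough picture of what "central role" can mean (a hedged Python sketch under our own assumptions, not Arplan's actual parser), the input can first be organised into a tree of lines using indentation alone, and each line's text parsed afterwards by whatever parser the surrounding context dictates.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Node:
        text: str                          # the line with indentation stripped
        children: List["Node"] = field(default_factory=list)

    def build_indent_tree(source: str) -> List[Node]:
        """Group lines into a tree using only their indentation depth."""
        roots: List[Node] = []
        stack: List[Tuple[int, Node]] = [] # (indent, node) pairs, innermost last
        for line in source.splitlines():
            if not line.strip():
                continue
            indent = len(line) - len(line.lstrip(" "))
            node = Node(line.strip())
            while stack and indent <= stack[-1][0]:
                stack.pop()
            (stack[-1][1].children if stack else roots).append(node)
            stack.append((indent, node))
        return roots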

 

 

Scannerless line parsers

 

Since Arplan can mix languages within a single statement/line, we cannot separate tokenizing from parsing. Indeed, different languages recognize tokens differently: for some, ">>" might be one token, for others it might be two, and for yet others it might be invalid. Hence tokenizing and parsing must be merged into a single process: scannerless parsing.
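
To make this concrete, here is a toy scannerless fragment in Python (the sub-language names are invented for the example): because the parser consumes raw characters directly, the same ">>" can be read differently depending on which sub-language rule is currently active, and no tokenizer has to commit to a choice up front.

    from typing import Tuple

    def read_angle(text: str, pos: int, language: str) -> Tuple[str, int]:
        """Illustrative scannerless decision: the two characters '>>' are one
        token, two tokens, or an error depending on the active sub-language."""
        if text.startswith(">>", pos):
            if language == "expr":        # e.g. a right-shift operator
                return (">>", pos + 2)
            if language == "generics":    # e.g. two closing angle brackets
                return (">", pos + 1)
            raise SyntaxError(f"'>>' is not valid in {language!r}")
        raise SyntaxError(f"expected '>' at position {pos}")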

 

Keeping separate tokenizers would be possible, but in fact more complicated than scannerless parsing, since a lot of effort would have to go into switching tokenizers while keeping track of which language we are in and when to switch back. Separating parsing into two steps (tokenizing/parsing) is purely arbitrary and is usually done for convenience. In our case, scannerless parsing is both more powerful and simpler.

 

 

Complex sub-languages in bodies

 

Classical tools are not suited to defining multiple grammars and combining them in various ways. The fact that an indented block can give rise to parsing in a completely different language is hard to support with classical parser generator tools. First, importing grammars is not allowed, so a single giant grammar including all sub-grammars would have to be written. This is hardly doable because of scannerless parsing, indentation and the inherent complexity; even if it were, it would be a nightmare to maintain. Moreover, it would lack independence, orthogonality and extensibility.
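
One way to picture the alternative (a sketch with invented names such as SUB_LANGUAGES and parse_body, not Arplan's actual design) is a registry of independent sub-language parsers to which the main parser dispatches indented bodies; each grammar stays small and can be developed, tested and extended on its own.

    from typing import Callable, Dict, List

    # Hypothetical registry: each sub-language keeps its own small parser
    # instead of being folded into one giant grammar.
    LineParser = Callable[[str], object]
    SUB_LANGUAGES: Dict[str, LineParser] = {}

    def sub_language(name: str):
        """Register a parser for the bodies of blocks introduced by `name`."""
        def register(parser: LineParser) -> LineParser:
            SUB_LANGUAGES[name] = parser
            return parser
        return register

    @sub_language("sql")
    def parse_sql_line(line: str) -> object:
        return ("sql", line)          # placeholder for a real SQL line parser

    @sub_language("math")
    def parse_math_line(line: str) -> object:
        return ("math", line)         # placeholder for a real math line parser

    def parse_body(header: str, body_lines: List[str]) -> List[object]:
        """Dispatch an indented body to the sub-language named in its header."""
        parser = SUB_LANGUAGES.get(header, lambda l: ("default", l))
        return [parser(line) for line in body_lines]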

 

 

Dynamic syntax

 

Since you can import syntaxes (grammars) the same way you import modules in Arplan, parsing has to take this into account: depending on which syntaxes are imported, the source file must be parsed accordingly. Classical parser generators cannot do this, because the grammar is fixed at the moment the parser is generated.
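
A minimal Python sketch of the idea, under our own assumptions (the `use syntax NAME` statement form and the AVAILABLE_SYNTAXES catalogue are invented for the example, not Arplan's real syntax): when the parser encounters a syntax import, it activates the corresponding line parser for the rest of the file.

    from typing import Callable, Dict, List

    LineParser = Callable[[str], object]

    # Hypothetical catalogue of importable syntaxes; in reality these would
    # be loaded from syntax modules, not hard-coded.
    AVAILABLE_SYNTAXES: Dict[str, LineParser] = {
        "regex": lambda line: ("regex", line),
        "units": lambda line: ("units", line),
    }

    def parse_file(lines: List[str]) -> List[object]:
        """Parse line by line; an (invented) `use syntax NAME` statement
        changes how the following lines are parsed."""
        active: List[LineParser] = []
        result: List[object] = []
        for line in lines:
            if line.startswith("use syntax "):
                active.append(AVAILABLE_SYNTAXES[line[len("use syntax "):].strip()])
                continue
            # try the imported syntaxes first, fall back to the core syntax
            for parser in active:
                parsed = parser(line)
                if parsed is not None:
                    break
            else:
                parsed = ("core", line)
            result.append(parsed)
        return result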

 

Moreover, we need a way to easily use, modify and extend syntax. Or, better said, a way for the user to do it, because adding custom syntax handling is a feature of the language.

 


Parallel parsing

 

Since an end-of-line is also an end-of-statement in Arplan, it is possible to parse many lines in parallel! This is a novel feature not supported by classical tools, which are purely sequential.
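
A hedged Python sketch of how that could look (parse_line is a stand-in, not a real Arplan parser): because every line is a complete statement, a per-line parser can simply be mapped over the lines with a worker pool, and the results come back in source order. Assembling the per-line results into the indentation tree can then be a cheap sequential pass afterwards.

    from concurrent.futures import ProcessPoolExecutor
    from typing import List

    def parse_line(line: str) -> object:
        """Stand-in for a real per-line parser; each line is a full statement,
        so it can be parsed without looking at any other line."""
        return ("stmt", line.strip())

    def parse_parallel(source: str, workers: int = 4) -> List[object]:
        lines = [l for l in source.splitlines() if l.strip()]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # map() preserves input order, so the statement list comes back
            # in source order even though parsing ran in parallel
            return list(pool.map(parse_line, lines))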