In computer science, scannerless parsing (also called lexerless parsing) performs tokenization (breaking a stream of characters into words) and parsing (arranging the words into phrases) in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. A language grammar is scannerless if it uses a single formalism to express both the lexical (word level) and phrase level structure of the language.
Dividing processing into a lexer followed by a parser is more modular; scannerless parsing is primarily used when a clear lexer–parser distinction is unneeded or unwanted. Examples of when this is appropriate include TeX, most wiki grammars, makefiles, simple application-specific scripting languages, and Raku.
"Token classification" is unneeded which removes the need for design accommodations such as "the lexer hack" and language reserved words (such as "while" in C)
Grammars can be compositional (can be merged without human intervention) [a]
Since the lexical scanning and syntactic parsing are combined, the resulting parser tends to be more complicated and thus harder to understand and debug. The same will hold for the associated grammar, if a grammar is used to generate the parser.
The resulting parser tends to be significantly less efficient than a lexer-parser pipeline with regard to both time and memory.[1]
TREE-META Like META II also is scannerless having builtin lexer functions.
CWIC compiler for writing and implementing compilers. Has token rules as part of its language. Rules in CWIC were compiled into boolean functions returning success or failure.
a This is because parsing at the character level makes the language recognized by the parser a single context-free language defined on characters, as opposed to a context-free language of sequences of strings in regular languages. Some lexerless parsers handle the entire class of context-free languages, which is closed under composition.