parsing - Handling identifiers that begin with a reserved word
I am presently writing my own lexer and am wondering how to correctly handle the situation where an identifier begins with a reserved word. Presently the lexer matches the whole first part as a reserved word and the rest separately, because the reserved word is the longest match ('self' vs 's' in the example below).
For example, the rules:
reserved_word := self
identifier_char := [a-z]|[A-Z]
applied to:
selfidentifier
'self' is matched as reserved_word, and 'i' onwards is matched as identifier_char, when the whole string should be matched as identifier_char.
The standard answer in lexer generators is that the regex that matches the longest sequence wins. To break a tie between two regexes that match the exact same amount, prefer the first regex in the order in which they appear in the definitions file.
You can simulate this effect in your own lexer. "selfidentifier" is then treated as an identifier.
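A minimal sketch of the longest-match rule with first-rule tie-breaking, written in Python. The rule table, the token names and the `next_token` helper are made up for illustration, and the identifier rule is assumed to be "one or more letters" rather than the single `identifier_char` from the question:

```python
import re

# Hypothetical rule table. Order only matters when two rules match
# the same number of characters: the earlier rule then wins.
RULES = [
    ("RESERVED_WORD", re.compile(r"self")),
    ("IDENTIFIER", re.compile(r"[a-zA-Z]+")),  # assumed: one or more letters
]

def next_token(text, pos=0):
    """Return (token_type, lexeme) for the longest match starting at pos."""
    best = None
    for token_type, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (token_type, m.group())
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best

print(next_token("selfidentifier"))  # ('IDENTIFIER', 'selfidentifier')
print(next_token("self"))            # ('RESERVED_WORD', 'self')
```

Every rule is tried at the current position instead of stopping at the first one that matches, so 'self' on its own is still a reserved word (tie, first rule wins) while 'selfidentifier' goes to the longer identifier match.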
If you are writing an efficient lexer, you'll have a single finite state machine that branches from one state to another based on the current character class. In that case, you'll have several states that can be terminal states, and they are terminal states if the FSA cannot shift to another state. You can assign a token type to each such terminal state; the token type will be unique.
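As a rough illustration of the single-state-machine approach, here is a hand-written sketch in Python for just the keyword 'self' plus letter-only identifiers. The state numbering, the `TOKEN_TYPE` table and the `step`/`scan` helpers are assumptions made for this example, not something a particular lexer generator produces:

```python
KEYWORD = "self"
IDENT = len(KEYWORD) + 1  # state meaning "this can only be an identifier"

# State 0 is the start state. State k (1..4) means "the first k characters
# of 'self' have been read". Each state the machine may stop in carries
# exactly one token type.
TOKEN_TYPE = {
    1: "IDENTIFIER",     # "s"
    2: "IDENTIFIER",     # "se"
    3: "IDENTIFIER",     # "sel"
    4: "RESERVED_WORD",  # "self"
    IDENT: "IDENTIFIER",
}

def step(state, ch):
    """Return the next state, or None if the machine cannot shift on ch."""
    if not (ch.isascii() and ch.isalpha()):
        return None
    if state < len(KEYWORD) and ch == KEYWORD[state]:
        return state + 1  # still following the spelling of the keyword
    return IDENT          # any other letter: plain identifier from now on

def scan(text, pos=0):
    """Run the machine from pos until it cannot shift, then classify."""
    state, start = 0, pos
    while pos < len(text):
        nxt = step(state, text[pos])
        if nxt is None:
            break
        state, pos = nxt, pos + 1
    if state not in TOKEN_TYPE:
        raise ValueError(f"no token starts at position {start}")
    return TOKEN_TYPE[state], text[start:pos]

print(scan("selfidentifier"))  # ('IDENTIFIER', 'selfidentifier')
print(scan("self.x"))          # ('RESERVED_WORD', 'self')
```

The machine never decides "this is a keyword" early. It keeps shifting while it can, and the token type is read off whichever state it is in when it has to stop, so the 'i' after 'self' simply moves it from the keyword state into the identifier state.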