parsing - Handling identifiers that begin with a reserved word


I am presently writing my own lexer, and I am wondering how to correctly handle the situation where an identifier begins with a reserved word. Presently the lexer matches the whole first part as a reserved word and then the rest separately, because the reserved word is the longest match ('self' vs 's' in the example below).

For example, with the rules:

reserved_word := self
identifier_char := [a-z]|[a-z]

applied to the input:

selfidentifier 

'self' is matched as reserved_word, and 'i' onwards is matched as identifier_char, when the whole string should be matched as identifier_chars.
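To make the problem concrete, here is a minimal sketch of a lexer that behaves the way the question describes. It is an assumption about how such a lexer might be written, not the asker's actual code: at each position it tries the reserved word first and only then falls back to a single identifier character. The names `naive_tokenize`, `RESERVED` and `IDENT_CHAR` are hypothetical.

```python
import re

# Hypothetical rules mirroring the question: one reserved word, plus
# single identifier characters (assumed here to be lowercase letters).
RESERVED = re.compile(r"self")
IDENT_CHAR = re.compile(r"[a-z]")


def naive_tokenize(text):
    """Try the reserved word first at each position, then one identifier char."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = RESERVED.match(text, pos)
        if m:
            tokens.append(("reserved_word", m.group()))
        else:
            m = IDENT_CHAR.match(text, pos)
            if not m:
                raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
            tokens.append(("identifier_char", m.group()))
        pos = m.end()
    return tokens


print(naive_tokenize("selfidentifier")[:3])
# [('reserved_word', 'self'), ('identifier_char', 'i'), ('identifier_char', 'd')]
```

Because the reserved word is tried before any identifier rule and the identifier rule only ever consumes one character, 'self' always wins the first four characters.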

The standard answer in lexer generators is that the regex which matches the longest sequence wins. To break a tie between two regexes that match the exact same amount, prefer the first regex in the order in which they appear in the definitions file.
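The longest-match rule with a first-rule tie break can be sketched as follows. This is an illustration under assumptions, not any particular lexer generator's implementation: the rule table `RULES` (including treating identifiers as `[a-z]+` and skipping whitespace) is hypothetical. Every rule is tried at the current position, a strictly longer match replaces the current best, and an equal-length match leaves the earlier rule in place.

```python
import re

# Hypothetical rule table in "definitions file" order; order only matters for ties.
RULES = [
    ("reserved_word", re.compile(r"self")),
    ("identifier", re.compile(r"[a-z]+")),
    ("whitespace", re.compile(r"\s+")),
]


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best, so on a
            # tie the rule listed first keeps the token.
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_kind != "whitespace":
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("self selfidentifier"))
# [('reserved_word', 'self'), ('identifier', 'selfidentifier')]
```

For the standalone `self`, both the reserved word and identifier rules match four characters, so the earlier reserved word rule wins; for `selfidentifier`, the identifier rule matches more characters and takes the whole word.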

You can simulate this effect in your own lexer, so that "selfidentifier" is treated as an identifier.
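One common way to get the same effect in a hand-written lexer, shown here as a sketch rather than the only approach, is to read the longest run of identifier characters first and only then check whether the whole word is in a reserved-word table. The keyword set `RESERVED_WORDS` and the assumption that identifiers consist of lowercase ASCII letters are hypothetical, chosen to match the question's rules.

```python
import string

RESERVED_WORDS = {"self"}  # hypothetical keyword table
IDENT_CHARS = set(string.ascii_lowercase)  # assumed identifier alphabet, [a-z]


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch in IDENT_CHARS:
            start = pos
            # Consume the longest possible run of identifier characters first.
            while pos < len(text) and text[pos] in IDENT_CHARS:
                pos += 1
            word = text[start:pos]
            # Only the complete word is checked against the reserved words.
            kind = "reserved_word" if word in RESERVED_WORDS else "identifier"
            tokens.append((kind, word))
        else:
            raise ValueError(f"unexpected character {ch!r} at {pos}")
    return tokens


print(tokenize("self selfidentifier"))
# [('reserved_word', 'self'), ('identifier', 'selfidentifier')]
```

A side benefit of this design is that adding a keyword only means adding an entry to the set; the scanning loop does not change.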

If you are writing an efficient lexer, you'll have a single finite state machine that branches from one state to the next based on the current character class. In that case, you'll have several states that can be terminal states, and they are terminal states if the FSA cannot shift to another state. You can assign a token type to each such terminal state; the token type will be unique.
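A tiny hand-built state machine illustrates the idea. It is a sketch for the question's two rules only, and the state names and the `ACCEPT` table are hypothetical. The states along the path s, se, sel, self track a possible keyword; any letter that leaves that path moves into a generic identifier state. The machine runs until it cannot shift on the next character, and the token type is read from the state it stopped in.

```python
# Hypothetical DFA for one keyword ("self") plus [a-z]+ identifiers.
KEYWORD_PATH = {
    ("start", "s"): "s",
    ("s", "e"): "se",
    ("se", "l"): "sel",
    ("sel", "f"): "self",
}

# Token type assigned to each terminal state; "start" is not terminal.
ACCEPT = {
    "s": "identifier",
    "se": "identifier",
    "sel": "identifier",
    "self": "reserved_word",
    "ident": "identifier",
}


def step(state, ch):
    """Return the next state, or None when the machine cannot shift."""
    if not ("a" <= ch <= "z"):
        return None
    # Stay on the keyword path while the letters match, else become a plain identifier.
    return KEYWORD_PATH.get((state, ch), "ident")


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        start, state = pos, "start"
        while pos < len(text):
            nxt = step(state, text[pos])
            if nxt is None:
                break
            state, pos = nxt, pos + 1
        kind = ACCEPT.get(state)
        if kind is None:  # still in "start": no letter was consumed
            raise ValueError(f"unexpected character {text[start]!r} at {start}")
        tokens.append((kind, text[start:pos]))
    return tokens


print(tokenize("self selfidentifier sel"))
# [('reserved_word', 'self'), ('identifier', 'selfidentifier'), ('identifier', 'sel')]
```

In this small machine every state after the first letter is terminal, so stopping where the machine cannot shift is enough. In a general DFA some intermediate states may not be terminal, and a lexer would then also remember the last terminal state it passed and back up to it.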

