Given some program, can I identify which lexemes (or tokens) are present in it? For example, consider this program in elden (yes, that is what I am calling it):

main() = {
    if x == 4 {
        print "x is equal to 4";
    } else {
        print "x is not equal to 4";
    }
}

We need to identify that main is a keyword, that the parentheses contain the arguments, that the left curly brace starts the function body, and so on.

We make an enum called Token to identify these “parts” of the program:

pub enum Token {
    Delimiter(Delimiter),
    Operator(Operator),
    Literal(Type),
    Keyword(Keyword),
}

I have abstracted away the implementation details of each variant; here is an overview:

  • A delimiter includes braces, brackets, and the semicolon. A slightly quirky delimiter is the double quote: the value inside it is a string literal that we need to process as soon as we encounter the quote. Otherwise, we won’t know whether it is a variable or a literal.
  • An operator includes add, subtract, multiply, equality, and less than. Something to keep in mind when lexing: some operators are two characters long, like == or !=.
  • A literal is any identifier (variable), string, or number that is used (hard-coded) in the program.
  • Keywords are strings reserved to have a specific meaning and usage.
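The point about two-character operators can be illustrated with a small sketch. The Operator variants and the lex_operator function below are assumptions made up for this example, not the actual elden code; the only idea being shown is that the longer operators are tried before the single-character ones.

```rust
/// Hypothetical operator set; the variant names are assumptions.
#[derive(Debug, PartialEq)]
pub enum Operator {
    Equal,      // =
    EqualEqual, // ==
    NotEqual,   // !=
    Less,       // <
    Add,        // +
    Subtract,   // -
    Multiply,   // *
}

/// Try to lex one operator from the front of `input`.
/// Returns the operator and the unconsumed remainder, or None
/// if `input` does not start with a known operator.
pub fn lex_operator(input: &str) -> Option<(Operator, &str)> {
    // Two-character operators are tried first; otherwise "==" would
    // be lexed as two separate `Equal` tokens.
    if let Some(rest) = input.strip_prefix("==") {
        return Some((Operator::EqualEqual, rest));
    }
    if let Some(rest) = input.strip_prefix("!=") {
        return Some((Operator::NotEqual, rest));
    }
    // Fall back to single-character operators.
    let mut chars = input.chars();
    let op = match chars.next()? {
        '=' => Operator::Equal,
        '<' => Operator::Less,
        '+' => Operator::Add,
        '-' => Operator::Subtract,
        '*' => Operator::Multiply,
        _ => return None,
    };
    Some((op, chars.as_str()))
}
```

With this ordering, lex_operator("== 4") yields EqualEqual with " 4" left over, while lex_operator("= {") yields Equal.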

How it works:

  • We scan characters of one type until we hit a character of a different type. For example, when lexing “let x = 5”: scan let (stop since ' ' is encountered), scan ' ' (stop since a letter is encountered), scan x (stop since ' ' is encountered), scan ' ' (stop since = is encountered), …
  • Once a token has been scanned, the function returns (Token, remaining). We then call it again on remaining. So:
Lexer("let x = 5") -> (Token(let), "x = 5")
Lexer("x = 5") -> (Token(x), "= 5")
Lexer("= 5") -> (Token(=), "5")
Lexer("5") -> (Token(5), "")

We can collect the tokens and we have: Token(let), Token(x), Token(=), Token(5)
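The run-scanning step described above can be sketched roughly like this. CharClass, classify, scan_run and runs are all hypothetical helpers I am introducing for illustration; a real lexer would treat symbols more carefully, since this version would lump something like "()" into a single run.

```rust
/// Coarse character classes (an assumption for this sketch).
#[derive(Debug, PartialEq, Clone, Copy)]
enum CharClass {
    Whitespace,
    Alphanumeric,
    Symbol,
}

/// Decide which class a single character belongs to.
fn classify(c: char) -> CharClass {
    if c.is_whitespace() {
        CharClass::Whitespace
    } else if c.is_alphanumeric() || c == '_' {
        CharClass::Alphanumeric
    } else {
        CharClass::Symbol
    }
}

/// Split `input` into its leading run of same-class characters
/// and everything that follows.
fn scan_run(input: &str) -> (&str, &str) {
    let class = match input.chars().next() {
        Some(first) => classify(first),
        None => return ("", ""),
    };
    // Byte index of the first character whose class differs;
    // if there is none, the run covers the whole input.
    let end = input
        .char_indices()
        .find(|&(_, c)| classify(c) != class)
        .map_or(input.len(), |(i, _)| i);
    input.split_at(end)
}

/// Keep scanning runs until the input is exhausted.
fn runs(mut input: &str) -> Vec<&str> {
    let mut out = Vec::new();
    while !input.is_empty() {
        let (run, rest) = scan_run(input);
        out.push(run);
        input = rest;
    }
    out
}
```

Calling runs("let x = 5") gives the pieces "let", " ", "x", " ", "=", " ", "5", which is the stop-on-a-different-type behaviour from the first bullet. Dropping the whitespace runs leaves the four lexemes that become tokens.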

Example token output from running the lexer on the elden program above:

[Keyword(Main), Delimiter(LeftParen), Delimiter(RightParen), Operator(Equal),
Delimiter(LeftBrace), Keyword(If), Literal(Identifier("x")), Operator(EqualEqual), 
Literal(Number(4)), Delimiter(LeftBrace), Keyword(Print), 
Literal(String("x is equal to 4")), Delimiter(SemiColon), Delimiter(RightBrace), 
Keyword(Else), Delimiter(LeftBrace), Keyword(Print), 
Literal(String("x is not equal to 4")), Delimiter(SemiColon), 
Delimiter(RightBrace), Delimiter(RightBrace)]
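
For reference, here is a minimal sketch of a lexer in the (Token, remaining) style that should produce a token list of this shape. It is not the actual elden implementation: the variants of Delimiter, Operator, Type and Keyword are inferred from the output above, and next_token and tokenize are names I made up for this example. It also skips leading whitespace instead of scanning it as its own run.

```rust
#[derive(Debug, PartialEq)]
pub enum Delimiter { LeftParen, RightParen, LeftBrace, RightBrace, SemiColon }

#[derive(Debug, PartialEq)]
pub enum Operator { Equal, EqualEqual }

#[derive(Debug, PartialEq)]
pub enum Type { Identifier(String), Number(i64), String(String) }

#[derive(Debug, PartialEq)]
pub enum Keyword { Main, If, Else, Print }

#[derive(Debug, PartialEq)]
pub enum Token {
    Delimiter(Delimiter),
    Operator(Operator),
    Literal(Type),
    Keyword(Keyword),
}

/// Lex one token from the front of `input` and return it with the rest.
/// Returns None at end of input or on a character this sketch does not know.
pub fn next_token(input: &str) -> Option<(Token, &str)> {
    let input = input.trim_start(); // whitespace separates tokens but is not one
    let first = input.chars().next()?;

    // String literal: everything up to the closing double quote.
    if first == '"' {
        let body = &input[1..];
        let end = body.find('"')?; // None if the string is unterminated
        let text = body[..end].to_string();
        return Some((Token::Literal(Type::String(text)), &body[end + 1..]));
    }
    // Keyword or identifier: a run of letters, digits and underscores.
    if first.is_alphabetic() || first == '_' {
        let end = input
            .find(|c: char| !(c.is_alphanumeric() || c == '_'))
            .unwrap_or(input.len());
        let (word, rest) = input.split_at(end);
        let token = match word {
            "main" => Token::Keyword(Keyword::Main),
            "if" => Token::Keyword(Keyword::If),
            "else" => Token::Keyword(Keyword::Else),
            "print" => Token::Keyword(Keyword::Print),
            _ => Token::Literal(Type::Identifier(word.to_string())),
        };
        return Some((token, rest));
    }
    // Number: a run of ASCII digits.
    if first.is_ascii_digit() {
        let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
        let (digits, rest) = input.split_at(end);
        let value = digits.parse::<i64>().ok()?;
        return Some((Token::Literal(Type::Number(value)), rest));
    }
    // The two-character operator is tried before the single-character ones.
    if let Some(rest) = input.strip_prefix("==") {
        return Some((Token::Operator(Operator::EqualEqual), rest));
    }
    let token = match first {
        '=' => Token::Operator(Operator::Equal),
        '(' => Token::Delimiter(Delimiter::LeftParen),
        ')' => Token::Delimiter(Delimiter::RightParen),
        '{' => Token::Delimiter(Delimiter::LeftBrace),
        '}' => Token::Delimiter(Delimiter::RightBrace),
        ';' => Token::Delimiter(Delimiter::SemiColon),
        _ => return None,
    };
    Some((token, &input[first.len_utf8()..]))
}

/// Call `next_token` on the remainder until nothing more can be lexed.
/// This sketch stops silently on unknown input; a real lexer would report an error.
pub fn tokenize(mut input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    while let Some((token, rest)) = next_token(input) {
        tokens.push(token);
        input = rest;
    }
    tokens
}
```

Printing tokenize(program) with {:?} for the elden program at the top should give a list like the one shown above. Note that in this sketch the double quote is consumed as part of the string literal rather than emitted as its own delimiter token.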