genericlexer 1.1.1
Generic Lexer
A generic pattern-based Lexer/tokenizer tool.
The minimum Python version is 3.6.
Original Author
Eli Bendersky, whose original gist was last modified on 2010/08
Maintainer
Leandro Benedet Garcia, last modified on 2020/11
Version
1.1.0
License
The Unlicense
Documentation
The documentation can be found here
Example
If we try to execute the following code:

from generic_lexer import Lexer

rules = {
    "VARIABLE": r"(?P<var_name>[a-z_]+): (?P<var_type>[A-Z]\w+)",
    "EQUALS": r"=",
    "SPACE": r" ",
    "STRING": r"\".*\"",
}

data = "first_word: String = \"Hello\""
data = data.strip()

for curr_token in Lexer(rules, False, data):
    print(curr_token)
it will give us the following output:
VARIABLE({'var_name': 'first_word', 'var_type': 'String'}) at 0
SPACE( ) at 18
EQUALS(=) at 19
SPACE( ) at 20
STRING("Hello") at 21
As you can see, unlike the original gist, we are capable of specifying multiple groups per token.
You cannot use the same group name twice, whether within a single token or across different tokens,
because all the regex patterns are merged together to generate the tokens later on.
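This merging can be sketched with plain re from the standard library. The master pattern below illustrates the idea only, it is not the package's actual internals, and it also shows why a duplicated group name fails at compile time:

```python
import re

# Merge every rule into one master regex, one named group per alternative
# (a sketch of the idea, not the library's actual code):
rules = {
    "EQUALS": r"=",
    "STRING": r"\".*\"",
}
master = "|".join(f"(?P<{name}>{pattern})" for name, pattern in rules.items())
print(master)  # (?P<EQUALS>=)|(?P<STRING>\".*\")

# Because all patterns end up in one regex, each group name may appear only
# once; reusing one raises re.error when the master pattern is compiled:
try:
    re.compile(r"(?P<val>=)|(?P<val>\d+)")
except re.error as exc:
    print(exc)
```

This is why the rule set above can safely use var_name and var_type inside VARIABLE, but no other rule may reuse those names.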
You may get the values of the tokens this way:
>>> from generic_lexer import Lexer
>>> rules = {
... "VARIABLE": r"(?P<var_name>[a-z_]+): (?P<var_type>[A-Z]\w+)",
... "EQUALS": r"=",
... "STRING": r"\".*\"",
... }
>>> data = "first_word: String = \"Hello\""
>>> variable, equals, string = tuple(Lexer(rules, True, data))
>>> variable
VARIABLE({'var_name': 'first_word', 'var_type': 'String'}) at 0
>>> variable.val
{'var_name': 'first_word', 'var_type': 'String'}
>>> variable["var_name"]
'first_word'
>>> variable["var_type"]
'String'
>>> equals
EQUALS(=) at 19
>>> equals.val
'='
>>> string
STRING("Hello") at 21
>>> string.val
'"Hello"'