Lexical Analysis Libraries
Simple lexical analysis libraries for JavaScript and Python
This is a set of lexical analyzers for tokenizing source code. Currently there are libraries for processing JavaScript, Python, CSS, and XML/HTML, with source code in JavaScript and Python 2/3.
It was primarily written to address some edge-case JavaScript parsing issues found in several major applications (Notepad++, Firefox, Sublime Text, GitHub/Ace). These cases usually involve regular expressions or sign-prefixed numbers.
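To illustrate the regular-expression problem: in JavaScript, a "/" can either begin a regex literal or be a division operator, and a lexer has to decide from context. The sketch below shows one common heuristic based on the previous significant token, written in Python. It is an illustration only, not this library's actual rule, and the function and set names are hypothetical. It still misreads cases such as if (x) /re/.test(s), which is exactly the kind of edge case described above.

```python
# Hypothetical sketch: decide whether a "/" in JavaScript source starts a
# regular expression literal or is a division operator, using only the
# previous significant token. A common heuristic, NOT this library's code.

# Keywords after which an expression (and therefore a regex) may begin.
REGEX_PRECEDING_KEYWORDS = {
    "return", "typeof", "instanceof", "in", "of", "new", "delete",
    "void", "throw", "case", "do", "else", "yield", "await",
}

# Closing brackets of an expression, after which "/" must be division.
DIVISION_PRECEDING_PUNCT = {")", "]"}


def slash_starts_regex(prev_token):
    """Return True if a "/" following prev_token should begin a regex.

    prev_token is the text of the previous non-whitespace, non-comment
    token, or None at the start of the input.
    """
    if not prev_token:
        return True  # start of input: an expression may begin here
    if prev_token in REGEX_PRECEDING_KEYWORDS:
        return True  # e.g. return /abc/.test(s)
    if prev_token in DIVISION_PRECEDING_PUNCT:
        return False  # e.g. (a + b) / 2
    first = prev_token[0]
    if first.isalnum() or first in "_$\"'`":
        return False  # identifier, number, or string: "/" divides
    return True  # any other operator or punctuation: "/" starts a regex
```

Looking at a single previous token keeps the check cheap, but it cannot tell a parenthesized expression from an if condition; resolving that needs more parser-like state.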
Files named lex.* are the base classes; files named lexlang.* are the language descriptor generation files.
Example: lex.js and lexpy.js are the files needed to process Python code on a JavaScript interpreter.
The general format for using these libraries is:
1. Load the lex.* file
2. Load the lexlang.* file
3. Generate the language descriptor: lexlang = lexlang.gen(lex);
4. Create a lex.Lexer object with the descriptor as the first argument and the input string as the second
5. Call its get_token method until it returns null (or the language equivalent)
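The last step is a plain "pull until null" loop. The Python sketch below shows only that loop; StubLexer is a hypothetical stand-in for lex.Lexer that splits text with a regular expression and returns token strings, whereas the real class is built from a lexlang descriptor and returns Token objects.

```python
import re


class StubLexer:
    """Hypothetical stand-in for lex.Lexer: words, whitespace runs, symbols."""

    _pattern = re.compile(r"\s+|\w+|[^\w\s]")

    def __init__(self, descriptor, text):
        self.descriptor = descriptor  # ignored by this stub
        self._matches = self._pattern.finditer(text)

    def get_token(self):
        # Same protocol as described above: a token, or None when exhausted.
        match = next(self._matches, None)
        return None if match is None else match.group(0)


def tokenize(lexer):
    """Call get_token until it returns None, collecting every token."""
    tokens = []
    while True:
        token = lexer.get_token()
        if token is None:
            break
        tokens.append(token)
    return tokens


print(tokenize(StubLexer(None, "x = 1;")))  # ['x', ' ', '=', ' ', '1', ';']
```

In Python the same loop can also be written as list(iter(lexer.get_token, None)), which stops at the first None.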
When it does not return null, get_token returns a Token object with 4 fields:
text – the token string
type – the type constant of the token
flags – flags for the token
state – the state the token was generated in
Token types by language:
JavaScript: INVALID, KEYWORD, LITERAL, IDENTIFIER, NUMBER, STRING, REGEX, OPERATOR, WHITESPACE, COMMENT
Python: INVALID, KEYWORD, LITERAL, IDENTIFIER, NUMBER, STRING, OPERATOR, WHITESPACE, COMMENT
CSS: INVALID, WHITESPACE, COMMENT, STRING, WORD, OPERATOR, AT_RULE, SEL_TAG, SEL_CLASS, SEL_ID, SEL_PSEUDO_CLASS, SEL_PSEUDO_ELEMENT, SEL_N_EXPRESSION, NUMBER, COLOR
XML/HTML: COMMENT, CDATA, TEXT, RAW_DATA, TAG_OPEN, TAG_CLOSE, TAG_NAME, ATTRIBUTE, ATTRIBUTE_WHITESPACE, ATTRIBUTE_OPERATOR, ATTRIBUTE_STRING
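A typical first use of the type field is skipping WHITESPACE and COMMENT tokens. The Python sketch below models a Token with the four documented fields and filters on its type. The TokenType enum, its values, and the significant helper are hypothetical; the real type constants come from the library.

```python
from enum import Enum, auto
from typing import NamedTuple


class TokenType(Enum):
    # Subset of the JavaScript token types listed above; values are invented.
    INVALID = auto()
    KEYWORD = auto()
    IDENTIFIER = auto()
    NUMBER = auto()
    OPERATOR = auto()
    WHITESPACE = auto()
    COMMENT = auto()


class Token(NamedTuple):
    text: str        # the token string
    type: TokenType  # the type constant of the token
    flags: int       # flags for the token
    state: int       # the state the token was generated in


SKIPPED = {TokenType.WHITESPACE, TokenType.COMMENT}


def significant(tokens):
    """Return only the tokens that are not whitespace or comments."""
    return [t for t in tokens if t.type not in SKIPPED]
```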
Token flags (from the Lexer):
flags.MEMBER, // indicates the word is a member (identifier_word.member_word)
flags.BRACKET, // this operator is a bracket of some sort
flags.BRACKET_CLOSE, // this operator is a closing bracket
... // additional token flag constants can be found in the library's source
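If the flags field is a bit set (an assumption; the constants' numeric values are not documented here), a single token can carry several flags at once, and code tests them with bitwise AND. In the Python sketch below, the names MEMBER, BRACKET and BRACKET_CLOSE come from the list above, but the values and the is_closing_bracket helper are hypothetical.

```python
from enum import IntFlag


class TokenFlags(IntFlag):
    # Names from the documentation; numeric values invented for illustration.
    MEMBER = 1         # the word is a member (identifier_word.member_word)
    BRACKET = 2        # the operator is a bracket of some sort
    BRACKET_CLOSE = 4  # the operator is a closing bracket


def is_closing_bracket(flags):
    """True only when both BRACKET and BRACKET_CLOSE are set."""
    required = TokenFlags.BRACKET | TokenFlags.BRACKET_CLOSE
    return (flags & required) == required
```

Requiring both bits rather than BRACKET_CLOSE alone guards against a flag combination that would not make sense, such as a closing marker on a non-bracket token.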
For additional help, see the test files in the repository; examples are often more useful than wordy documentation.