I transpile AutoHotkey v2 code to C# using ANTLR4 and Roslyn. Below is an example that uses only a few grammar elements, described by these rules:
An assignment has the form singleExpression := singleExpression, for example a := 1 or a := b := 1. Whitespace is optional, and newlines are allowed on both sides of the assignment operator.
a:=
1
is valid. (a := 1) := 2 causes 2 to be assigned to a.
a := 1 2 concatenates the digits to 12 and assigns the result to a. An end of line is allowed only if the concatenation is inside parentheses or brackets.
a := 1
hello
would be considered two statements: an assignment of 1 to a, and a call to the function hello.
a := (1
2)
is considered one statement, and a is assigned 12. Explicit concatenation is also possible with the . operator, in which case there must be whitespace or a newline on both sides of it. If there isn't whitespace on both sides, it's an object member access.
A hotkey is a key name followed by ::, followed by either another hotkey (on a separate line) or a statement. For example, a::b := 1 means "create a hotkey for the key a which then assigns 1 to variable b". a::MsgBox triggers a function call for MsgBox.
A remap is a key name followed by ::, followed by another key. For example, a::b creates functionality where pressing a sends b instead. A remap takes priority over a hotkey, so if the second key identifier matches a key name it's considered a remap, otherwise a hotkey. a::MsgBox is a hotkey only because a key named MsgBox doesn't exist.
I'm trying to make the grammar performant. The expression statement a := 1 repeated 300,000 times is parsed and executed by AutoHotkey in < 2 seconds, whereas the following simplified grammar takes about 5 seconds in C# just to parse. I'd consider a parse time of < 10 seconds acceptable.
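For reproducing the timing, a test input of this shape can be generated with a few lines of C#. This is only a sketch: the class name TestInputGenerator and the method BuildInput are hypothetical helpers, and the file name test.txt is assumed to match the one the parser program reads. Each statement is terminated with a newline because the grammar expects an EOL after every statement.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, meant to be run as its own small console program.
public static class TestInputGenerator
{
    // Returns `count` copies of `statement`, each terminated by "\n".
    public static string BuildInput(string statement, int count)
    {
        return string.Concat(Enumerable.Repeat(statement + "\n", count));
    }

    public static void Main()
    {
        // 300,000 repetitions of the expression statement from the text.
        string input = BuildInput("a := 1", 300_000);

        // Assumption: the parser program reads "test.txt" from its working directory.
        File.WriteAllText("test.txt", input);
        Console.WriteLine($"Wrote {input.Length} characters to test.txt");
    }
}
```

Changing the statement string (for example to a := b := 1) gives a quick way to check whether the slowdown depends on the expression shape or only on the number of lines.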
Simple.g4:
grammar Simple;

options {
    caseInsensitive = true;
}

program: sourceElements EOF;

sourceElements: sourceElement+;

sourceElement
    : statement EOL
    | hotkey EOL
    | remap EOL
    | EOL+
    ;

hotkey
    : HotkeyTrigger WS? statement
    ;

remap
    : RemapKey
    ;

statement
    : expressionStatement
    | functionStatement
    ;

expressionStatement
    : singleExpression (s? ',' s? singleExpression)*
    ;

singleExpression
    : singleExpression WS singleExpression
    | singleExpression s '.' s singleExpression
    | <assoc = right> singleExpression WS? ':=' WS? singleExpression
    | primaryExpression
    ;

primaryExpression
    : Identifier
    | primaryExpression ('.' primaryExpression)+ // Member access
    | DecimalLiteral
    | '(' singleExpression ')'
    ;

functionStatement
    : primaryExpression
    | primaryExpression WS (singleExpression (WS? ',' WS? singleExpression?)*)
    ;

s: (WS | EOL)+;

RemapKey : HotkeyCharacter '::' HotkeyCharacter;
HotkeyTrigger : HotkeyCharacter '::';
OpenParen : '(';
CloseParen : ')';
Comma : ',';
Dot : '.';
Assign : ':=';
DecimalLiteral : '0' | [1-9] [0-9_]*;
Identifier : IdentifierStart IdentifierPart*;
WS : [\t ]+;
EOL : [\r\n]+;
UnexpectedCharacter : . ;

fragment IdentifierPart : IdentifierStart | [\p{Mn}] | [\p{Nd}] | [\p{Pc}] | '\u200C' | '\u200D';
fragment IdentifierStart: [\p{L}] | [$_];
fragment HotkeyCharacter
    : 'F1'
    | 'Enter'
    | ~[`\r\n ]
    ;
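For illustration, the remap-versus-hotkey priority rule described earlier can be written down outside the grammar as a small C# sketch. Everything here is an assumption made for the example: the names HotkeyKind, HotkeyClassifier, IsKeyName and Classify are hypothetical, and the key table contains only the two named keys from the HotkeyCharacter fragment plus any single character it allows. It is not how AutoHotkey itself decides, only a restatement of the rule "a remap if the text after :: is a key name, otherwise a hotkey".

```csharp
using System;
using System.Collections.Generic;

public enum HotkeyKind { Remap, Hotkey }

public static class HotkeyClassifier
{
    // Hypothetical, deliberately tiny key table (assumption: a real one is much larger).
    private static readonly HashSet<string> KeyNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "F1", "Enter" };

    // Mirrors the HotkeyCharacter fragment: a named key, or any single
    // character other than backtick, CR, LF or space.
    public static bool IsKeyName(string text)
    {
        if (text.Length == 1)
        {
            return "`\r\n ".IndexOf(text[0]) < 0;
        }
        return KeyNames.Contains(text);
    }

    // Classifies one source line that contains a "::" hotkey separator.
    public static HotkeyKind Classify(string line)
    {
        // Search from index 1 so that a trigger key of ':' (as in ":::x") still works.
        int sep = line.Length > 1 ? line.IndexOf("::", 1, StringComparison.Ordinal) : -1;
        if (sep < 0)
        {
            throw new ArgumentException("Expected '::' after the trigger key.", nameof(line));
        }

        // No trimming: in the grammar a remap has no whitespace after "::".
        string rest = line.Substring(sep + 2);
        return IsKeyName(rest) ? HotkeyKind.Remap : HotkeyKind.Hotkey;
    }

    public static void Main()
    {
        Console.WriteLine(Classify("a::b"));      // Remap
        Console.WriteLine(Classify("a::MsgBox")); // Hotkey
        Console.WriteLine(Classify("a::b := 1")); // Hotkey
    }
}
```

The sketch works on raw text, which is essentially the kind of separate parsing step that a single RemapKey token forces onto the visitor.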
Example C#:
using System;
using System.IO;
using System.Text;
using System.Diagnostics;
using Antlr4.Runtime;
using Antlr4.Runtime.Atn;

namespace AntlrCSharp
{
    class Program
    {
        private static void Main(string[] args)
        {
            try
            {
                StringBuilder text = new StringBuilder();
                string filePath = @"test.txt";

                // Read the whole test file; on failure an empty input is parsed.
                try
                {
                    string fileContent = File.ReadAllText(filePath);
                    text.Append(fileContent);
                }
                catch (FileNotFoundException)
                {
                    Console.WriteLine($"The file at {filePath} was not found.");
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"An error occurred: {ex.Message}");
                }

                StartSimpleParser(text);
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error: " + ex);
            }
        }

        public static void StartSimpleParser(StringBuilder text)
        {
            Console.WriteLine("Start");
            AntlrInputStream inputStream = new AntlrInputStream(text.ToString());
            SimpleLexer simpleLexer = new SimpleLexer(inputStream);
            CommonTokenStream commonTokenStream = new CommonTokenStream(simpleLexer);
            SimpleParser simpleParser = new SimpleParser(commonTokenStream);

            /*
            // Debug aid: dump all tokens. GetAllTokens() consumes the lexer's input,
            // so the lexer must be reset (or recreated) before parsing.
            foreach (var token in simpleLexer.GetAllTokens())
            {
                Console.WriteLine($"Token: {simpleLexer.Vocabulary.GetSymbolicName(token.Type)}, Text: '{token.Text}'" + (token.Channel == SimpleLexer.Hidden ? " (hidden)" : ""));
            }
            */

            // Stop at the first syntax error and report ambiguities and full-context fallbacks.
            simpleParser.ErrorHandler = new BailErrorStrategy();
            simpleParser.AddErrorListener(new DiagnosticErrorListener());
            simpleParser.Interpreter.PredictionMode = PredictionMode.LL_EXACT_AMBIG_DETECTION;

            SimpleParser.ProgramContext programContext = simpleParser.program();
            Console.WriteLine("Parsed");

            // MainVisitor is defined elsewhere (not shown).
            MainVisitor visitor = new MainVisitor();
            visitor.Visit(programContext);
            Console.WriteLine("End");
        }
    }
}
This grammar has a few problems:
singleExpression s? ':=' s? singleExpression causes a reportAttemptingFullContext error with LL_EXACT_AMBIG_DETECTION.
The RemapKey definition HotkeyCharacter '::' HotkeyCharacter means I have to separately parse it later in the visitor.
How do I resolve these issues?
