Mar 9, 2015

Useful Maven Plugins for working with ANTLR 4 Grammars

By Asankhaya Sharma

ANTLR (ANother Tool for Language Recognition) is a Java-based framework for generating parsers from user-specified grammars. The latest major version of the tool (v4) is based on the Adaptive LL(*), or ALL(*), parsing algorithm developed by Professor Terence Parr of the University of San Francisco. ANTLR 4 is especially useful if you need to parse several different languages, such as Java, Ruby and JavaScript: a collection of ready-made grammars is available from the project's GitHub repository and can easily be integrated into your Java project. In this article, I will describe two Maven plugins that make it easy for developers to work with ANTLR 4 grammars.

A typical workflow for using an existing ANTLR 4 grammar (say, ECMAScript.g4) consists of downloading the ANTLR 4 tool (antlr-4.5-complete.jar) from the website and running it as follows:

    java -jar antlr-4.5-complete.jar ECMAScript.g4
    javac -cp antlr-4.5-complete.jar ECMAScript*.java

From the given grammar file, ANTLR first generates the corresponding lexer (ECMAScriptLexer.java) and parser (ECMAScriptParser.java). These files can then be added to your project, or compiled and used directly to parse the target language (ECMAScript in this case). ANTLR also includes a tool, TestRig, for testing grammars interactively from the command prompt. It can be invoked as follows:

    java -cp .:antlr-4.5-complete.jar org.antlr.v4.runtime.misc.TestRig ECMAScript r -tree

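The second argument (r here) names the start rule to parse from; since no input file is given, TestRig reads the input from standard input. The -tree option prints the parse tree on the command prompt once the input has been parsed. To display the parse tree in a dialog box instead, use the -gui option.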

As you can probably imagine, running the tools and testing each grammar like this becomes tedious when a project contains multiple grammars. Fortunately, there are two excellent Maven plugins that can be used for this purpose.

Plugin for Generating Parsers

To generate the parsers, we can use the ANTLR v4 Maven plugin. In the simplest configuration, just add the following to the build section of the project's pom.xml file.

    <plugins>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-maven-plugin</artifactId>
        <version>4.5</version>
        <configuration>
          <listener>true</listener>
          <visitor>true</visitor>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>antlr4</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      ...
    </plugins>
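The generated lexer and parser classes depend on the ANTLR runtime library, so the project also needs it as a compile-time dependency. A minimal sketch for the pom file's dependencies section, assuming the runtime version should match the plugin version (4.5 here):

    <dependencies>
      <dependency>
        <!-- runtime classes used by the generated lexer and parser -->
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-runtime</artifactId>
        <version>4.5</version>
      </dependency>
      ...
    </dependencies>

Keeping the runtime and plugin versions in step avoids compiling generated code against a different ANTLR release than the one that produced it.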

By default, the plugin looks for grammar files in the directory src/main/antlr4/ and writes the generated .java files to the output directory target/generated-sources/antlr4/. The plugin automatically adds the generated Java files to the project's sources, so they are compiled along with the rest of the code. As the listener and visitor options in the configuration above show, you can also have the plugin generate listener and visitor interfaces for the generated parsers. This plugin therefore simplifies the task of generating parsers for multiple grammars in the same project. Another common requirement is to test the generated parsers against a large number of test cases from a folder. The next plugin shows how to achieve that.
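The default locations can also be spelled out, or changed, in the plugin configuration. The sketch below assumes the plugin's sourceDirectory and outputDirectory parameters, which this article does not otherwise use, and simply sets them to their default values; replace the paths if your grammars live elsewhere:

    <plugin>
      <groupId>org.antlr</groupId>
      <artifactId>antlr4-maven-plugin</artifactId>
      <version>4.5</version>
      <configuration>
        <!-- directory scanned for .g4 grammar files (the default value) -->
        <sourceDirectory>${basedir}/src/main/antlr4</sourceDirectory>
        <!-- directory that receives the generated .java files (the default value) -->
        <outputDirectory>${project.build.directory}/generated-sources/antlr4</outputDirectory>
        <listener>true</listener>
        <visitor>true</visitor>
      </configuration>
      <executions>
        <execution>
          <goals>
            <goal>antlr4</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

Since target/ is Maven's default ${project.build.directory}, these values are equivalent to the directories described above.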

Plugin for Testing Parsers

The Maven mojo for testing ANTLR 4 grammars is available from the GitHub repo. Unfortunately, there is no release on Maven Central, so you will need to build it and install the artifact in your local repository (for example with mvn install) before you can use it. Once it is built, you can include the test plugin by adding the following to the project's pom file.

    <plugins>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4test-maven-plugin</artifactId>
        <version>1.0-SNAPSHOT</version>
        <configuration>
          <packageName>com.sourceclear.parsers</packageName>
          <grammarName>ECMAScript</grammarName>
          <verbose>true</verbose>
          <showTree>true</showTree>
          <entryPoint>prog</entryPoint>
          <exampleFiles>src/test/examples/</exampleFiles>
          <grammarFiles>src/main/antlr4/com/sourceclear/parsers/</grammarFiles>
        </configuration>
      </plugin>
      ...
    </plugins>

You need to specify the folder that contains the grammar file (grammarFiles), the folder that holds all the test cases (exampleFiles), the grammar name (grammarName), the start rule to parse from (entryPoint) and the Java package of the generated parser classes (packageName). The plugin will then parse every file in the test folder with the chosen grammar. Other options print the parse tree (showTree) and show detailed parsing information (verbose). One limitation of this plugin is that you can test only one grammar at a time, because the configuration requires a single grammar name. However, it is still better than having to test all the examples manually.

Hopefully, this article will help you be more productive while working with ANTLR 4 grammars, and you will find these plugins useful. One more thing: the ANTLR runtime is not limited to Java, and there are implementations for several other languages, including C#, Python and JavaScript.

Dr. Asankhaya Sharma is the Director of Software Engineering at Veracode. Asankhaya is a cyber security expert and technology leader with over a decade of experience in creating security products for industry, academia and the open-source community. He is passionate about building high-performing teams and taking innovative products to market. He is also an Adjunct Professor at the Singapore Institute of Technology.