ANTLR (ANother Tool for Language Recognition) is a Java-based framework for generating parsers from user-specified grammars. The latest major version (v4) of the tool is based on the Adaptive LL(*) parsing algorithm developed by Professor Terence Parr of the University of San Francisco. ANTLR 4 is especially useful if you need to parse several different languages such as Java, Ruby and JavaScript: a collection of ready-made grammars is available from the project's GitHub repository, which makes it easy to integrate a parser into your Java project. In this article, I will describe two Maven plugins that make it easy for developers to work with ANTLR 4 grammars.

A typical workflow for working with an existing grammar (say ECMAScript.g4) consists of downloading the ANTLR 4 library (antlr-4.5-complete.jar) from the website, running it on the grammar, and then compiling the generated sources (with the jar on your CLASSPATH):

java -jar antlr-4.5-complete.jar ECMAScript.g4
javac ECMAScript*.java

From the given grammar file, ANTLR first generates the corresponding lexer (ECMAScriptLexer.java) and parser (ECMAScriptParser.java) files. These files can then be added to your project, or compiled and used directly to parse the target language (ECMAScript in this case). ANTLR also ships with a tool, TestRig, for testing grammars interactively from the command prompt. It can be invoked as follows:

java org.antlr.v4.runtime.misc.TestRig ECMAScript r -tree

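Here ECMAScript is the grammar name and r stands for the name of the rule at which parsing should start. The -tree option prints the parse tree to the command prompt once the input has been parsed. To display the parse tree in a dialog box instead, use the -gui option.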

As you can probably imagine, when working with multiple grammars in the same project it becomes tedious to run the tools and test the grammars by hand like this. Fortunately, there are two excellent Maven plugins that can be used for this purpose.

Plugin for Generating Parsers

For generating the parsers we can use the ANTLR v4 Maven plugin. In the simplest configuration, just add the following to the project pom file.

    <plugins>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-maven-plugin</artifactId>
        <version>4.5</version>
        <configuration>
          <listener>true</listener>
          <visitor>true</visitor>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>antlr4</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      ...
    </plugins>

The default location for grammar files is the directory src/main/antlr4/. The plugin writes the .java files for the generated parser to the output directory target/generated-sources/antlr4/ and automatically submits them for compilation. The listener and visitor options in the configuration tell the plugin to also generate listener and visitor interfaces for the generated parsers. This plugin thus simplifies the task of generating parsers for multiple grammars in the same project. Another common requirement is to test the generated parsers against a large number of test cases in a folder. The next plugin shows how to achieve that.
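The defaults described above can also be written out explicitly, which is useful when the grammars live somewhere else. The fragment below is a sketch rather than a complete pom: it restates the default source and output directories through the plugin's sourceDirectory and outputDirectory parameters, and adds the antlr4-runtime dependency that the generated classes need at compile time. The version numbers simply mirror the 4.5 release used in this article; adjust them to whatever release you actually use.

```xml
<!-- Fragment of pom.xml: both elements are children of <project>. -->
<dependencies>
  <!-- The generated lexer and parser extend classes from the ANTLR runtime. -->
  <dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-runtime</artifactId>
    <version>4.5</version>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.antlr</groupId>
      <artifactId>antlr4-maven-plugin</artifactId>
      <version>4.5</version>
      <configuration>
        <!-- These two values restate the plugin defaults; change them
             only if your grammars or generated sources live elsewhere. -->
        <sourceDirectory>${basedir}/src/main/antlr4</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources/antlr4</outputDirectory>
        <listener>true</listener>
        <visitor>true</visitor>
      </configuration>
      <executions>
        <execution>
          <goals>
            <goal>antlr4</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, a regular mvn compile should generate the parser sources first and then compile them together with the rest of the project.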

Plugin for Testing Parsers

The Maven mojo for testing ANTLR 4 grammars is available from the GitHub repo. Unfortunately, there is no release on Maven Central, so you will need to build it and install the artifact in your local repository before you can use it. Once built, you can include the test plugin by adding the following to the pom file of the project.

    <plugins>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4test-maven-plugin</artifactId>
        <version>1.0-SNAPSHOT</version>
        <configuration>
          <packageName>com.sourceclear.parsers</packageName>
          <grammarName>ECMAScript</grammarName>
          <verbose>true</verbose>
          <showTree>true</showTree>
          <entryPoint>prog</entryPoint>
          <exampleFiles>src/test/examples/</exampleFiles>
          <grammarFiles>src/main/antlr4/com/sourceclear/parsers/</grammarFiles>
        </configuration>
      </plugin>
      ...
    </plugins>

You need to define the folder that contains the grammar file (grammarFiles) and the folder that holds all the test cases (exampleFiles). You also need to provide the grammar name (grammarName). The plugin will then run every file in the test folder through the chosen grammar. Other options allow printing the parse tree (showTree) and showing detailed parsing information (verbose). One limitation of this plugin is that you can test only one grammar at a time, as the configuration requires you to set the grammar name. However, it is still better than having to test all the examples manually.
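To have the examples checked automatically as part of the build, the plugin can be bound to an execution in the same way as the generator plugin. The fragment below is only a sketch under an assumption: it supposes that the test plugin exposes a goal named test, which is not stated in this article and should be verified against the plugin's own documentation for the version you built. The configuration values are the same as in the snippet above.

```xml
<!-- Fragment of pom.xml: goes inside <build><plugins>. -->
<plugin>
  <groupId>org.antlr</groupId>
  <artifactId>antlr4test-maven-plugin</artifactId>
  <version>1.0-SNAPSHOT</version>
  <configuration>
    <packageName>com.sourceclear.parsers</packageName>
    <grammarName>ECMAScript</grammarName>
    <verbose>true</verbose>
    <showTree>true</showTree>
    <entryPoint>prog</entryPoint>
    <exampleFiles>src/test/examples/</exampleFiles>
    <grammarFiles>src/main/antlr4/com/sourceclear/parsers/</grammarFiles>
  </configuration>
  <executions>
    <execution>
      <goals>
        <!-- ASSUMPTION: the goal name "test" is not given in this article;
             check the plugin's README before relying on it. -->
        <goal>test</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

If the assumption holds, running the Maven test phase would parse every file under src/test/examples/ and report the ones the grammar rejects.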

Hopefully, this article will help you be more productive while working with ANTLR 4 grammars, and you will find these plugins useful. One more thing: the ANTLR runtime is not limited to Java, and there are implementations for several other languages, including C#, Python and JavaScript.

Mark Curphey, Vice President, Strategy Mark Curphey is the Vice President of Strategy at CA Veracode. Mark is the founder and CEO of SourceClear, a software composition analysis solution designed for DevSecOps, which was acquired by CA Technologies in 2018. In 2001, he founded the Open Web Application Security Project (OWASP), a non-profit organization known for its Top 10 list of Most Critical Web Application Security Risks. Mark moved to the U.S. in 2000 to join Internet Security Systems (acquired by IBM), and later held roles including director of information security at Charles Schwab, vice president of professional services at Foundstone (acquired by McAfee), and principal group program manager, developer division, at Microsoft. Born in the UK, Mark received his B.Eng, Mechanical Engineering from the University of Brighton, and his Masters in Information Security from Royal Holloway, University of London. In his spare time, he enjoys traveling, and cycling.
