Annotation and conversion with brat: a technical note

Quick technical fix if you’re interested in trying out some of the tools developed for use with the brat annotation platform. I wanted to convert brat annotations into BioC format. There’s a tool developed by Antonio Jimeno Yepes et al. for that purpose, called Brat2BioC. It depends on brateval, developed by the same group. I installed brateval first via Maven as instructed, and it built just fine, but Brat2BioC then failed to build.

Image not explicitly related.


The solution? Turns out Brat2BioC is just looking for the wrong version of brateval. Edit pom.xml such that the line

<version>0.0.1-SNAPSHOT</version>

under

<artifactId>BRATEval</artifactId>

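matches the version in the file name of the brateval jar you actually built (in my case the jar was BRATEval-0.1.0-SNAPSHOT.jar, so the version is 0.1.0-SNAPSHOT). Then the build should work.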
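If you want to check for the mismatch before editing anything, here is a minimal shell sketch. The two helper names (jar_version, pom_dependency_version) are my own, not part of either tool, and the sketch assumes the jar is named <artifact>-<version>.jar with no hyphen in the artifact name, and that the version line follows the BRATEval artifactId line in pom.xml as described above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the built jar's version with pom.xml.

# Print the version embedded in a jar file name of the form <artifact>-<version>.jar.
# Assumes the artifact name itself contains no hyphen (true for "BRATEval").
jar_version() {
    name=$(basename "$1" .jar)
    # Drop everything up to and including the first hyphen.
    printf '%s\n' "${name#*-}"
}

# Print the <version> value that follows the BRATEval <artifactId> line in a pom file.
pom_dependency_version() {
    awk '/<artifactId>BRATEval<\/artifactId>/ { found = 1; next }
         found && /<version>/ {
             gsub(/.*<version>|<\/version>.*/, "")
             print
             exit
         }' "$1"
}
```

Running jar_version on the jar you built and pom_dependency_version on Brat2BioC's pom.xml should print the same string; if they differ, that is the line to edit.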

But what about running the thing? I have a set of annotated documents in brat standoff format (i.e., a set of .txt docs and their corresponding .ann files) in their own folder named “input”. After at least an hour of troubleshooting I still couldn’t get the converter to run. Part of the issue is Maven: it doesn’t seem to like loading local jar packages anymore (see this Stack Overflow post). Avoiding Maven didn’t help at first, either. Java just couldn’t find the main class, which can happen for a variety of reasons, but in this case it simply needed a very explicit classpath. Having already built BRATEval as requested by the Brat2BioC README, I copied its jar into the Brat2BioC lib folder, then ran the following:

java -cp ./target/classes:BRAT2BioCConverter-0.0.1-SNAPSHOT.jar:./lib/BRATEval-0.1.0-SNAPSHOT.jar:./lib/bioc.jar:xstream-1.4.4.jar:xmpull-1.1.3.1.jar:xpp3_min-1.1.4c.jar au.com.nicta.csp.bbc.BRAT2BioC input output

This works just fine.
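If typing out every jar by hand gets old, the classpath can be assembled from whatever is in lib. This is only a sketch: build_classpath is a name I made up, and it assumes you run it from the Brat2BioC project root with all the needed jars sitting in ./lib.

```shell
#!/bin/sh
# Hypothetical helper: join ./target/classes and every jar in a directory
# into a single colon-separated classpath string.
build_classpath() {
    classpath="./target/classes"
    for jar in "$1"/*.jar; do
        # If the directory has no jars, the pattern stays unexpanded; skip it.
        [ -e "$jar" ] || continue
        classpath="$classpath:$jar"
    done
    printf '%s\n' "$classpath"
}

# Usage, from the Brat2BioC project root:
#   java -cp "$(build_classpath ./lib)" au.com.nicta.csp.bbc.BRAT2BioC input output
```

Java also accepts a wildcard entry on its own, so -cp "./target/classes:./lib/*" (quoted, so the shell leaves the asterisk alone) should have the same effect without a helper.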

Lessons learned: even relatively simple format-conversion tools can be a headache to get working if you have to troubleshoot things like file locations.