Friday, March 22, 2013

EclipseCon 2013 - Last Minute Preview


Now that I know how to find the way from the Logan Intl Airport to the EclipseCon venue at the Seaport WTC in Boston, it's time for some shameless advertising. Since the conference schedule is again packed with deep technical content and all of you have only limited time, I think it's only fair to tell you in advance what you should expect from the sessions that I am giving (of course all of them are highly recommended ;-).


Mon 1:00PM - 4:00PM: Getting Started With Xtend
Monday is Tutorial Day. In the afternoon, Sven and I will give you a jump start with Xtend, a new programming language that makes the day-to-day tasks of us Java developers a real pleasure. We listened to you and prepared some new and entertaining tasks where you get your hands on Active Annotations and some new, interesting puzzlers. To cut a long story short: If you want to learn about the hot new programming language that is developed at Eclipse, come and join our tutorial.

Wed 4:15PM - 4:50PM: Null-Safety on Steroids
I'm happy to tell you that the annotation-based null analysis of the Eclipse JDT is getting better and better. As of the recent milestones, it includes fields in the analysis and allows inheriting the null specification from super types, so the analysis results become much more reliable and easier to adopt. Nevertheless, the JDT's approach is sometimes still based on assumptions which I consider ... how shall I put that ... not really pragmatic. In this session, I want to outline the pros and cons of the current state of null analysis in Eclipse. Furthermore, I will talk about other approaches to tackle the occasional NPE that each of us developers is familiar with. I want to discuss the implications of the different solutions and offer advice on how to deal with them.

Thu 11:00AM - 11:35AM: Xtext - More Best Practices
My third session at this year's EclipseCon is a follow-up talk to one that I gave in Reston last year. In this year's edition of Xtext - Best Practices, I will focus on other topics, especially on the adoption of the Xbase expressions. If you want to learn more about those, I can also highly recommend Jan's talk on Java DSLs with Xtext on Tuesday.

Anyway, there are still some things to prepare and there is never enough time for polishing. Obviously there are a lot more interesting sessions scheduled than I can list here. I'm really looking forward to a great conference and an intense week packed with interesting discussions. See you in Boston!


Thursday, March 21, 2013

Pimp My Visitors

One of the most noteworthy features of Xtend 2.4 is Active Annotations. They allow you to participate in the compilation of Xtend code to Java source code. In fact, you implement a kind of mini code generator and hook into the Xtend compiler - and all this via a lightweight library. And the IDE is fully aware of all the changes that you make to the generated Java artifacts. The astonishing thing about it is that this approach allows you to develop the annotation processor and the client code side by side.

Are you facing a repetitive coding problem and want to automate it? Nothing could be simpler. Just implement an Active Annotation, annotate your class, and you are good to go.

Which brings me to design patterns. Those often require a lot of boilerplate code, since a pattern describes a blueprint for how several objects interact with each other to solve a specific problem. So they are quite useful but also verbose by definition. One of the most tedious examples is the visitor pattern. Since I actually like to use a visitor to handle heterogeneous data structures (you know, decoupling and externalized traversal can be quite convenient), I decided to write a small annotation that creates all the fluff around this pattern on the fly.

In order to implement a visitor, I just have to annotate the root type of the hierarchy, and all the accept methods as well as the base class of the visitor implementation are unfolded automatically. You don't even have to define the base class for the visitor yourself. The following small example basically expands to the very same, verbose Java code as in the example on Wikipedia.
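If you have never written that code by hand: the annotation unfolds to the classic shape of the pattern, roughly like this (the tiny expression hierarchy is a stand-in for illustration, not the actual generated code):

abstract class Expression {
  public abstract void accept(Visitor visitor);
}

class Constant extends Expression {
  @Override
  public void accept(Visitor visitor) {
    visitor.visit(this); // dispatches to the matching overload
  }
}

class Addition extends Expression {
  @Override
  public void accept(Visitor visitor) {
    visitor.visit(this);
  }
}

abstract class Visitor {
  public abstract void visit(Constant constant);
  public abstract void visit(Addition addition);
}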


Especially amazing is the fact that this allows you to define different kinds of visitors easily. Your accept method has to take additional arguments? Just list them in the prototype method signature. You want to return a result? Nothing's easier than that - simply declare a different return type. The base implementation for all concrete visitors is already known? Just add the method body to the prototype and it will be moved to the visitor itself. Have a look at the complete example to see how beautiful this approach is. If you want to learn more about Active Annotations, you may want to dive into the documentation on xtend-lang.org and install Xtend to get your hands dirty.

Monday, January 21, 2013

Java Hacks - Changing Final Fields

Changing final fields? Really? What may sound crazy at first glance can be helpful in order to implement mocks or to fix up libraries that don't expose the state that you really want them to expose. And after all, there is not always a fork-me button available. But really: final fields? Yes, indeed. You shall never forget: There is no spoon.

Let's consider the following data class Person that we want to hack.
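It could look like this - the exact shape doesn't matter, as long as name is final and DEFAULT_NAME is not a compile-time constant (the names are mine):

class Person {

  // new String(..) prevents the compiler from inlining the constant (more on that below)
  static final String DEFAULT_NAME = new String("John Doe");

  private final String name;

  Person(String name) {
    this.name = name;
  }

  String getName() {
    return name;
  }
}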
Once a person has been instantiated, it is not possible to change the value of the field name, is it?

Reflection To The Rescue

Fortunately - or unfortunately - Java allows you to access fields reflectively, and if (ab)used wisely, it is possible to change their values, too - even for final fields. The key is the method Field#setAccessible. It allows you to circumvent visibility rules - which is step one - and interestingly, it also implicitly allows changing the value of final instance fields reflectively once they have been marked as accessible.
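A minimal sketch for the instance field, using the Person class from above:

import java.lang.reflect.Field;

class InstanceFieldHack {
  public static void main(String[] args) throws Exception {
    Person person = new Person("Sebastian");
    Field name = Person.class.getDeclaredField("name");
    name.setAccessible(true); // also lifts the final restriction for instance fields
    name.set(person, "Zonk");
    System.out.println(person.getName()); // prints 'Zonk'
  }
}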
Modifying static fields is a little trickier. Even if #setAccessible has been invoked, the virtual machine will throw an IllegalAccessException (which I would expect anyway) because one 'Can not set static final my.field.Type field' - even though the same thing was perfectly ok for instance fields. And since there is still no spoon, there is a way out. Again it's based on reflection, but this one goes a little deeper. If we don't just set the field accessible but also change its modifiers to non-final, it's ok to alter the value of the field.
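The crucial lines look roughly like this (this worked on the JDKs of the time - the modifiers field is an implementation detail of the reflection API):

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class StaticFieldHack {
  public static void main(String[] args) throws Exception {
    Field field = Person.class.getDeclaredField("DEFAULT_NAME");
    field.setAccessible(true);
    // strip the final flag from the Field instance itself
    Field modifiers = Field.class.getDeclaredField("modifiers");
    modifiers.setAccessible(true);
    modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);
    field.set(null, "nobody");
    System.out.println(Person.DEFAULT_NAME); // prints 'nobody'
  }
}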
This hack allows changing the value of static fields, too. That is, as long as they are not initialized with literals that are inlined by the compiler. Those include number literals and string literals, which are compiled directly into the call site to save some computation cycles (yes, referring to String constants from other classes does not introduce a runtime dependency on those classes). Apart from those cases, other common static field types like loggers or infamous singletons can easily be modified at runtime and even (re)set to null.

Putting the snippets together, the code works as promised: it prints the new name of the person to the console and the changed default name, too. But keep in mind: Don't do this at home!

Thursday, December 13, 2012

Fixed Checked Exceptions - The Xtend Way

Recently I stumbled across a post about checked exceptions in Sam Beran's Java 8 blog. What he basically described is a means to reduce the burden when dealing with legacy APIs that abuse Java's checked exceptions. His example is built around the construction of a java.net.URI, which may throw a URISyntaxException.
Actually the URI class is not too bad, since it already provides a static factory URI#create(String) that wraps the checked URISyntaxException in an IllegalArgumentException, but you get the idea.
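For reference, the factory is essentially this thin wrapper around the constructor:

import java.net.URI;
import java.net.URISyntaxException;

class UriFactory {
  // essentially what URI.create(String) does for you
  static URI create(String str) {
    try {
      return new URI(str);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e.getMessage(), e);
    }
  }
}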

An Attempt to Tackle Checked Exceptions

Now that Java will finally get lambda expressions with JSR 335, Sam suggests using a utility class in order to avoid littering your code with try { .. } catch () statements. For example, Throwables#propagate could take care of that boilerplate:
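Applied to the URI example, that reads roughly like this (Throwables and its propagate signature are modeled after the description in Sam's post - this is not an existing library API):

import java.net.URI;
import java.util.concurrent.Callable;

class Throwables {
  // hypothetical helper in the spirit of the proposal:
  // runs the callable and rethrows any checked exception unchecked
  static <T> T propagate(Callable<T> callable) {
    try {
      return callable.call();
    } catch (RuntimeException e) {
      throw e; // passes through unchanged
    } catch (Exception e) {
      throw new RuntimeException(e.getMessage()); // note: the original stack trace is lost
    }
  }
}

class Client {
  URI createUri() {
    return Throwables.propagate(() -> new URI("http://www.eclipse.org"));
  }
}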
Does that blend? I don't think so. That's still way too much code in order to deal with something that I cannot handle at all in the current context - and compared to the Java 7 version, it's not much of an improvement either. The latter does not even carry the stack trace, so the actual code would look more like this:
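That is, to keep the original stack trace, the call site grows to something like this (again a reconstruction):

class Client {
  URI createUri() {
    return Throwables.propagate(() -> {
      try {
        return new URI("http://www.eclipse.org");
      } catch (Exception e) {
        // wrap by hand so the original stack trace survives
        throw new RuntimeException(e);
      }
    });
  }
}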
Judging by the number of characters, and taking into account that this snippet does not even tell the reader which sort of exception was expected, I would always go for the classic try { .. } catch ().

Or I'd Use Xtend.

Xtend will transparently throw the checked exception if you don't care about it. However, if you want to catch and handle it, feel free to do so. For all other cases, the Xtend compiler uses the sneaky throw mechanism that is also used in Project Lombok. It just uses some generics magic to trick the Java compiler, thus allowing you to throw a checked exception without declaring it. You are free to catch that one whenever you want. There is no need to wrap it into some sort of RuntimeException just to convince the compiler that you know what you are doing.
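The trick itself fits into a few lines of Java. This is the commonly known shape of it (not the actual compiler output of Xtend or Lombok):

class Sneaky {

  static RuntimeException sneakyThrow(Throwable t) {
    return Sneaky.<RuntimeException> sneakyThrow0(t);
  }

  @SuppressWarnings("unchecked")
  private static <T extends Throwable> T sneakyThrow0(Throwable t) throws T {
    // after erasure this cast never happens at runtime, but the compiler
    // believes T (inferred as RuntimeException at the call site) is unchecked
    throw (T) t;
  }
}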

By the way: you could of course use something like Throwables with Xtend, too. That's what I consider fixing checked exceptions.

Tuesday, November 27, 2012

Performance Is Not Obvious

Recently there was a post in the Xtext forum about the runtime performance of a particular function in the Xtext code base:
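It was essentially this recursive helper (sketched here in slightly simplified form):

import org.eclipse.emf.ecore.EObject;

class Containers {
  // resolves the first container that is an instance of the requested type
  static <T extends EObject> T getContainerOfType(EObject element, Class<T> type) {
    if (type.isInstance(element))
      return type.cast(element);
    if (element.eContainer() != null) // first invocation of eContainer
      return getContainerOfType(element.eContainer(), type); // second invocation
    return null;
  }
}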


Ed Merks suggested rewriting the method as a loop instead of a recursive function and saving one invocation of the method eContainer, such that the new implementation should become at "least twice as fast."
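In a sketch, the new shape is this:

class Containers {
  // iterative variant: one eContainer() invocation per step, no recursion
  static <T extends EObject> T getContainerOfType(EObject element, Class<T> type) {
    for (EObject current = element; current != null; current = current.eContainer())
      if (type.isInstance(current))
        return type.cast(current);
    return null;
  }
}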


I really liked the new form since it is much easier to debug and to read, and from that point of view the change is definitely worth it. However, as I had recently done a deep dive into the runtime behavior of the JVM, I doubted that the change would have much impact on the actual performance of that method. So I took the time and sketched a small caliper benchmark in order to double-check my intuition.
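The benchmark boiled down to something like this (caliper 0.x style; a plain chain of nodes stands in for the EMF containment hierarchy):

import com.google.caliper.SimpleBenchmark;

public class ContainerLookupBenchmark extends SimpleBenchmark {

  static class Node {
    final Node container;
    Node(Node container) { this.container = container; }
  }

  // the rare case: lots of objects have to be skipped before the result is found
  private final Node deep = buildChain(100);

  static Node buildChain(int length) {
    Node current = null;
    for (int i = 0; i < length; i++)
      current = new Node(current);
    return current;
  }

  public int timeRecursive(int reps) {
    int dummy = 0;
    for (int i = 0; i < reps; i++)
      dummy += depthRecursive(deep);
    return dummy; // returning the result prevents dead code elimination
  }

  public int timeIterative(int reps) {
    int dummy = 0;
    for (int i = 0; i < reps; i++)
      dummy += depthIterative(deep);
    return dummy;
  }

  static int depthRecursive(Node node) {
    return node.container == null ? 1 : 1 + depthRecursive(node.container);
  }

  static int depthIterative(Node node) {
    int depth = 0;
    for (Node current = node; current != null; current = current.container)
      depth++;
    return depth;
  }
}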


As it turns out, the refactored variant is approximately 5 to 20% faster for the rare cases where a lot of objects have to be skipped before the result is found, and it takes the same time for the common case where the requested instance is found immediately. So it's not even close to the expected improvement. But what's the take-away?


Before applying optimizations, it's worth measuring the impact. It may be intuitive to assume that cutting the number of method invocations down to a fraction of the original implementation - after all, it was a recursive implementation before - saves a lot of time, but actually the JVM is quite powerful and mature at inlining and producing optimized native code. So I can only repeat the conclusion of my talk about Java Performance (which I proposed for EclipseCon Boston, too):
  • [..] Write Readable and Clear Code. [..] (David Keenan)
  • [..] slavishly follow a principle of simple, clear coding that avoids clever optimizations [..] (Caliper FAQ)
  • Performance advice has a short shelf-life. (B. Goetz)
From that point of view, the refactored implementation is definitely worth it, even though there is no big impact on the runtime behavior of the method.

Wednesday, November 14, 2012

Xtext Corner #9 - About Keywords, Again

In the last few weeks, I compiled some information about the proper usage of keywords and about terminals in Xtext in general:
  • Keywords may help to recover from parse errors in the sense that they guide the parser.
  • It's recommended to use libraries instead of a hard-wired keyword-ish representation for built-in language features.
  • Data type rules are the way to go if you want to represent complex syntactical concepts as atomic values in the AST.
In addition to these hints, there is one particular issue that arises quite often in the Xtext forum. People often wonder why their grammar does not work properly for some input files but perfectly well for others. What it boils down to in many of these examples is this:
Spaces are evil!
This seems to be a bold statement, but let me explain why I think that keywords should never contain space characters. I'm assuming you use the default terminals, but this actually holds for almost all terminal rules that I've seen so far. There is usually a concept of an ID, which is defined similarly to this:

terminal ID: 
  ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*;

IDs and Keywords

IDs start with a letter followed by an arbitrary number of additional letters or digits. And keywords usually look quite similar to an ID. No surprises so far. Now let's assume a keyword definition like 'some' 'input' compared to 'some input'. Here is what happens when the lexer encounters the input sequence 'som '. It starts to consume the leading 's' and has not yet decided which token to emit, since it could become a keyword or an identifier. Same for the 'o' and the 'm'. The trailing space is the character where it can finally decide that 'som ' contains two tokens: an identifier and a whitespace token. So far so good.
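To make the difference concrete, these are the two flavors I have in mind (a contrived sketch):

// error prone: a single keyword that contains a space
Element: 'some input' name=ID;

// robust: two adjacent keywords
Element: 'some' 'input' name=ID;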

Let the Parser Fail - For Free

Now comes the tricky part, since the user continues to type an 'e' after the 'm': 'some '. Again, the lexer starts with the 's' and continues to consume the 'o', 'm' and 'e'. No decision has been made yet: it could still be an ID or the start of the keyword 'some input'. The next character is a space, and that's the crucial part here: if the grammar contains a keyword 'some input', the space is expected since it is part of that keyword. Now the lexer has only one valid alternative. After the space it is keen on consuming an 'i', 'n', 'p', 'u' and 't'. Unfortunately, there is no 'i' in the parsed text since the lexer has already reached the end of the file.

As already mentioned in an earlier post, the lexer will never roll back to the token 'some' in order to create an ID token and a subsequent whitespace token. In fact, the space character was expected as part of a single token, so it was safe to consume it. Instead of rolling back and creating two tokens, the lexer will emit an error token which cannot be handled by the parser. Even though the text appeared to be a perfectly valid ID followed by a whitespace, the parser will fail. That's why spaces in keywords are considered harmful.

In contrast, the grammar variant with two separate keywords works fine. Here, the user is free to apply all sorts of formatting to the two adjacent keywords: any number of spaces, line breaks or even comments can appear between them and are handled well by the parser. If you are concerned about convenience in the editor - after all, a single keyword with a space seems to be more user friendly in the content assistant - I recommend tweaking content assist instead of using an error-prone grammar definition.

Thursday, November 8, 2012

Xtext Corner #8 - Libraries Are Key

In today's issue of the Xtext Corner, I want to discuss the library approach and compare it to hard-coded grammar bits and pieces. The question of which path to choose often arises when you want to implement an IDE for an existing language. Most languages use a run-time environment that exposes some implicit API.

Just to name a few examples: Java includes the JDK with all its classes, and the virtual machine has a notion of primitive types (as a bonus). JavaScript code usually has access to a DOM including its properties and functions; the DOM is provided by the run-time environment that executes the script. SQL in turn has built-in functions like max, avg or sum. All these things are more or less an integral part of the existing language.

As soon as you start to work on an IDE for such a language, you may feel tempted to wire parts of the environment into the grammar. After all, keywords like int, boolean or double feel quite natural in a Java grammar - at least at first glance. In the long run it often turns out to be a bad idea to wire these things into the grammar definition. The alternative is to use a so-called library approach: the information about the language run-time is encoded in an external model that is accessible to the language implementation.

An Example

To use the Java example again (and for the last time in this post): the ultimate goal is to treat types like java.lang.Object and java.util.List in the same way as int or boolean. Since we did this already for Java as part of the Xtext core framework, let's use a different, somewhat artificial example in the following. Our dummy language supports function calls, of which max, min and avg are implicitly available.
The hard-coded approach looks quite simple at first. A simplified view of things leads to the conclusion that the parser will automatically check that the invoked functions actually exist, content assist works out of the box, and even the coloring of keywords suggests that the three enumerated functions are somehow special.
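A hard-wired grammar for those calls could look like this (a sketch, not a complete grammar):

FunctionCall:
  function=('max' | 'min' | 'avg')
  '(' arguments+=Expression (',' arguments+=Expression)* ')';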

Not so obvious are the pain points (which come for free): the documentation for these functions has to be hooked up manually, and their complete signatures have to be hard-coded, too. The validation has to be aware of the parameters and return types in order to check the conformance with the actual arguments. Things become rather messy beyond the first quickly sketched grammar snippet. And last but not least, there is no guarantee that the set of implicit functions is stable forever, with each and every version of the run-time. If the language inventor introduces a new function sum in a subsequent release, everything has to be rebuilt and deployed. And you can be sure that the to-be-introduced keyword sum will cause trouble in at least one of the existing files.

Libraries Instead of Keywords

The library approach seems to be more difficult at first, but it pays off quickly. Instead of using hard-coded function names, the grammar uses only a cross-reference to the actual function. The function itself is modeled in another resource that is also deployed with the language.
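In the grammar, the enumerated keywords give way to a cross-reference (again just a sketch):

FunctionCall:
  function=[FunctionDeclaration]
  '(' arguments+=Expression (',' arguments+=Expression)* ')';

FunctionDeclaration:
  name=ID '(' (parameters+=Parameter (',' parameters+=Parameter)*)? ')';

The built-in functions then become ordinary declarations in a library file that ships with the language.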
This external definition of the built-in functions can usually follow the same guidelines as custom functions do. But of course it may use an even simpler representation: such a stub may only define the signature and some documentation comment, but not the actual implementation body. It's actually pretty similar to header files. As long as there is no existing format that can be used transparently, it's often easiest to define an Xtext language for a custom stub format. The API description should use the same EPackage as the full implementation of the language. This ensures that the built-ins and the custom functions follow the same rules, and all the utilities like the type checker and documentation provider can be used independently of the concretely invoked function.

If there is an existing specification of the implicit features available, that one should be used instead. Creating a model from an existing, processable format is straightforward, and it avoids mistakes because there is no redundant declaration of the very same information. In both cases there is a clear separation of concerns: the grammar remains what it should be - a description of the concrete syntax, not something that is tied to the run-time. The API specification is concise and easy to grasp, too. And in case an existing format can be used for that purpose, it's likely that the language users are already familiar with it.

Wrap Up

You should always consider using external descriptions or header stubs of the environment. A grammar that is tightly coupled to a particular version or API is quite error-prone and fragile. Any evolution of the run-time will lead to grammar changes, which will in turn break existing models (that's a promise). Last but not least, the effort for a seamless end-user integration of built-in and custom functions exceeds the effort for a clean separation of concerns by far.

A very sophisticated implementation of this approach can be explored in the Geppetto repository on GitHub. Geppetto uses puppet files and ruby libraries as the target platform, parses them and puts them onto the scope of the project files. This example underlines another advantage of the library approach: it is possible to use a configurable environment. The APIs may differ from version to version, and the concrete variant can be chosen by the user. This would never be possible with a hard-wired set of built-ins.