Monday, April 17, 2017

Why DSLs?

A lot has been written about domain-specific languages, their purpose and their application. According to the ever-changing wisdom of Wikipedia, a DSL “is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains.” In other words, a DSL is supposed to help implement software systems, or parts of them, more efficiently. But that raises the question: why should engineers learn new syntaxes, new APIs and new tools rather than use their primary language and just get things done?

Here is my take on this. To answer that question, let’s move the discussion away from programming languages toward language in general. And instead of talking in the abstract, I’ll use a very concrete example from one of the most discussed domains ever, one that literally everyone has an opinion about: the weather.

We all know the situation: when watching the news, the forecaster talks about sunshine duration, wind speed and direction, or temperature. Not being a trained meteorologist, I can still find my way through most of the facts, though the probability of precipitation always gives me a slight headache. The vocabulary used in an average weather forecast clearly qualifies as a domain-specific language, even though it only scratches the surface of meteorology. But what happens when two meteorologists talk to each other about the weather? My take: they use a very efficient vocabulary that lets them discuss it unambiguously.
Now let’s take this thought experiment a step further. There are approximately 40 non-compound words in the Finnish language that describe snow. What happens when a Finnish forecaster and a German news anchor talk about snowy weather conditions and the anchorman takes notes in English? It is safe to assume that a lot of precision will be lost when it comes to agreeing on the exact kind of snowy weather. Even more so when this German guy later tries to explain to another Finn what the weather was like. The bottom line: a common vocabulary and language are crucial to successful communication.

Back to programming. Let’s assume that English is a general-purpose programming language, the German guy is a software developer and the Finnish forecaster is a domain expert for snowy weather. This may sound a little far-fetched, but it is exactly how most software projects are run: a domain expert explains the requirements to a developer. The developer starts implementing them. Other developers are onboarded onto the project. They try to wrap their heads around the state of the codebase and inevitably read the subtleties of the implementation differently, no matter how fluent they are in English. Follow-up meetings are scheduled to clarify questions with the domain experts. The entire communication is prone to loss of precision. In the end, all involved parties talk about similar yet slightly different things. Misunderstandings may go unnoticed and cause a lot of frustration on all sides.

This is where domain-specific languages come into play! Instead of a tedious, multi-step translation from one specialized vocabulary into a general-purpose language and back, the logic is implemented directly using the domain-specific terms and notation. The knowledge is captured with fewer manual transformation steps, and the system is easier to write, understand and review. This can even go so far that the domain experts write the code themselves, or pair up with the software engineers and form a team.

As usual, there is no such thing as a free lunch. Unless you are omnilingual, you should probably not spend your time mastering Finnish, especially when you are working with Spanish people next week and the French team the week after. But without any doubt, fluent Finnish will pay off as long as you are working with the Finns.

A development process based on domain-specific languages, and thus on a level of abstraction close to the problem domain, can relieve everyone involved. There are fewer chances for misunderstandings and inaccurate translations. Speaking the same language and using the same vocabulary naturally feels like pulling together. And that’s what makes projects successful.

Monday, April 10, 2017

Moving on

After an exciting journey of 15 months as Director Engineering at SMACC, I decided to move on. It was not an easy decision, but it is one I wanted to make. In the past year I made many new friends, met great people and had the chance to work in a super nice team. It was a great time with plenty of challenges, important lessons and great fun. But I also realized that I missed my time as a technical consultant. Language engineering has always been, and still is, a strong passion of mine. So I figured it was about time to move on and refocus. Xtext, Eclipse, language-oriented programming: exciting times ahead. I'll keep you posted ...

Friday, November 6, 2015

Improved Grammar Inheritance

Since the very first day of Xtext, it has been possible to extend another grammar and mix in its rule declarations to reuse or specialize them. For most use cases that was straightforward and a perfect match. For others it has been rather cumbersome so far, because the original declaration was no longer reachable from the sub-language. Copy and paste was the only solution to that problem. The good news? Xtext 2.9 changes the situation significantly.
The newly introduced super call allows you to override a rule and still use its super implementation without duplicating it. Along with super, Xtext 2.9 also provides a way to call inherited or locally declared rules explicitly. Explicit rule calls overrule the polymorphism that is usually applied in the context of grammar inheritance. As a language library designer, you get fine-grained control over the syntax, even if your language is supposed to be extended by sub-languages.
But let's look at an example:
grammar SuperDsl
  with org.eclipse.xtext.common.Terminals

Element:
  'element' name=ID
;
Thing:
  'thing' name=SuperDsl::ID
;
terminal ID: ('a'..'z')+;

grammar SubDsl with SuperDsl

Element:
    super // 1; or super::Element
  | 'element' name=super::ID // 2
  | 'element' name=Terminals::ID // 3
;
terminal ID: 'id';
Here we see different use cases for the super call and for qualified rule calls. The first occurrence of super (1) illustrates the shortest possible notation to reach the super implementation. If you override a rule and want to use the original declaration in the rule's body, you can simply call super from there.
It is also possible to use a qualified rule call of the form Qualifier::RuleName. Qualified invocations point directly to the referenced rule. The qualifier can either be the generic super qualifier (2) or an explicit language name (3). The latter provides a way to skip the immediate super language and reach its parent. This offers great flexibility: you can ensure that you call the rule from your own grammar, even if a sub-language overrides the declaration. The benefit is illustrated by the rule Thing. It calls the ID declaration from SuperDsl explicitly, so it does so in SubDsl as well. As long as you do not explicitly override the declaration of Thing, its syntax will not change in any language that inherits from SuperDsl.
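To make the lookup rules tangible, here is a small Java model of how the calls in the example resolve. It is a sketch under simplifying assumptions, not Xtext's implementation: the `Grammar` class and the `call`, `superCall` and `qualifiedCall` helpers are hypothetical, and a grammar is reduced to the set of rule names it declares. The only thing that differs between the three kinds of calls is where the search starts: an unqualified call starts at the most specific grammar in use, `super` starts at the parent of the declaring grammar, and `Language::` starts at the named grammar.

```java
import java.util.HashSet;
import java.util.Set;

public class RuleResolution {

    /** Hypothetical, simplified grammar: a name, an optional parent and the rule names it declares. */
    static final class Grammar {
        final String name;
        final Grammar parent;
        final Set<String> rules = new HashSet<>();

        Grammar(String name, Grammar parent) {
            this.name = name;
            this.parent = parent;
        }

        Grammar declare(String... ruleNames) {
            rules.addAll(Set.of(ruleNames));
            return this;
        }

        /** Searches this grammar first, then its ancestors; returns "Grammar::Rule". */
        String lookup(String ruleName) {
            for (Grammar g = this; g != null; g = g.parent) {
                if (g.rules.contains(ruleName)) {
                    return g.name + "::" + ruleName;
                }
            }
            throw new IllegalArgumentException("Unknown rule " + ruleName);
        }

        /** Finds a grammar by name in this grammar's inheritance chain. */
        Grammar ancestor(String grammarName) {
            for (Grammar g = this; g != null; g = g.parent) {
                if (g.name.equals(grammarName)) {
                    return g;
                }
            }
            throw new IllegalArgumentException("Unknown grammar " + grammarName);
        }
    }

    /** Unqualified call (polymorphic): resolved against the grammar actually in use. */
    static String call(Grammar inUse, String ruleName) {
        return inUse.lookup(ruleName);
    }

    /** super or super::Rule written in 'declaring': the search starts at its parent grammar. */
    static String superCall(Grammar declaring, String ruleName) {
        return declaring.parent.lookup(ruleName);
    }

    /** Language::Rule: the search starts at the named grammar, no matter who extends it. */
    static String qualifiedCall(Grammar inUse, String grammarName, String ruleName) {
        return inUse.ancestor(grammarName).lookup(ruleName);
    }

    public static void main(String[] args) {
        Grammar terminals = new Grammar("Terminals", null).declare("ID");
        Grammar superDsl = new Grammar("SuperDsl", terminals).declare("Element", "Thing", "ID");
        Grammar subDsl = new Grammar("SubDsl", superDsl).declare("Element", "ID");

        System.out.println(call(subDsl, "ID"));                       // unqualified ID: SubDsl::ID
        System.out.println(superCall(subDsl, "Element"));             // (1) super: SuperDsl::Element
        System.out.println(superCall(subDsl, "ID"));                  // (2) super::ID: SuperDsl::ID
        System.out.println(qualifiedCall(subDsl, "Terminals", "ID")); // (3) Terminals::ID
        System.out.println(qualifiedCall(subDsl, "SuperDsl", "ID"));  // Thing: SuperDsl::ID
    }
}
```

The last line is the Thing case: because the call is qualified with SuperDsl, SubDsl's overriding ID never comes into play.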
Long story short: super calls add a lot of flexibility for language mixins and greatly reduce the need to copy and paste entire rule bodies into the sub-language. Go ahead and download the latest milestone to give it a try!

Thursday, October 22, 2015

The Xtext Grammar Learned New Tricks

Since the Xtext 2.9 release is around the corner (and you've surely read about the upcoming support for IntelliJ IDEA and Xtext editors in the browser), it's time to unveil some of the new features of the Xtext grammar language itself. In a nutshell, the enhancements address a couple of long-standing feature requests and non-critical issues. Complex grammars in particular sometimes required duplicated or repetitive parts to implement the language syntax. We felt that it was about time to get rid of these idioms.
Long story short: In the next version the grammar language will support a couple of new features:
  1. /* SuppressWarnings[all] */: The severity of errors and warnings in a grammar file can be customized per project since Xtext 2.8. But sometimes you don't want to disable a validation rule completely just to get rid of one particular false positive (false positives, you think? Stay tuned, I'll elaborate on that in a separate post). For that purpose it's now possible to mute a certain validation rule for a selected element, a rule or the entire grammar.
  2. super calls and grammar mixins: Xtext 2.9 renders our former advice 'You have to copy the parent rule into your sub-language' obsolete. Finally, you can simply use a super call instead.
  3. A long-standing feature request for the grammar asked for a means to extract common parts of parser rules without messing up the Ecore model. The newly introduced parser fragments allow you to normalize production rules that formerly required copy'n'paste, e.g. due to left factoring. Fragments even sport smarter inference of the Ecore model when it comes to multiple inheritance.
  4. Last but not least, the new JavaScript specification was the inspiration for conditional alternatives in a grammar definition. Advanced language use cases may require enabling or disabling a decision path deep down in some recursive chain of rule calls. Until now there was no concise way to support something like that. This limitation often led to dozens of copied rules if a syntax had to support conditionally enabled or disabled branches. Parameterized rule calls remove that limitation and enable much more expressive notations.
I'll explain all these new features in depth in a short blog series to make sure that every bit of it gets proper attention. Make sure to follow along if you are curious about them.
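The idea behind parameterized rule calls (item 4) can be illustrated with a hand-written recursive descent parser: a boolean flag is handed down the chain of rule calls and guards one alternative deep inside it. The sketch below mirrors the `[In]` parameter of the ECMAScript specification, where the `in` operator is disabled in a `for` statement's initializer but re-enabled inside parentheses. It is plain Java, not Xtext grammar syntax, and the tiny expression language (identifiers, `+`, `<`, `in`, parentheses, whitespace-separated tokens) is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ParameterizedRules {

    private final List<String> tokens;
    private int pos;

    private ParameterizedRules(String input) {
        this.tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    /** Expression<In>: Relational<In>. The flag is simply handed down. */
    private String expression(boolean in) {
        return relational(in);
    }

    /** Relational<In>: Additive (('<' | <In> 'in') Additive)*. 'in' is only an option if the flag is set. */
    private String relational(boolean in) {
        String left = additive();
        while ("<".equals(peek()) || (in && "in".equals(peek()))) {
            String op = tokens.get(pos++);
            left = "(" + left + " " + op + " " + additive() + ")";
        }
        return left;
    }

    /** Additive: Primary ('+' Primary)* */
    private String additive() {
        String left = primary();
        while ("+".equals(peek())) {
            pos++;
            left = "(" + left + " + " + primary() + ")";
        }
        return left;
    }

    /** Primary: ID | '(' Expression<In=true> ')'. Parentheses re-enable 'in'. */
    private String primary() {
        String t = peek();
        if ("(".equals(t)) {
            pos++;
            String inner = expression(true);
            if (!")".equals(peek())) {
                throw new IllegalStateException("Expected ')' at token " + pos);
            }
            pos++;
            return inner;
        }
        if (t == null || t.equals("in") || !t.matches("[A-Za-z_]\\w*")) {
            throw new IllegalStateException("Expected identifier at token " + pos);
        }
        pos++;
        return t;
    }

    /** Parses an expression and appends whatever the expression rule left unconsumed. */
    static String parse(String input, boolean in) {
        ParameterizedRules parser = new ParameterizedRules(input);
        String result = parser.expression(in);
        List<String> rest = parser.tokens.subList(parser.pos, parser.tokens.size());
        return rest.isEmpty() ? result : result + " | rest: " + String.join(" ", rest);
    }

    public static void main(String[] args) {
        System.out.println(parse("a in b", true));          // (a in b)
        System.out.println(parse("a in b", false));         // a | rest: in b
        System.out.println(parse("( a in b ) + c", false)); // ((a in b) + c)
    }
}
```

The flag never shows up in the result; it only prunes one alternative. Without such a parameter, every rule on the path from Expression down to Relational would need a second, copied variant without `in`, which is exactly the duplication that parameterized rule calls are meant to avoid.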

Monday, November 3, 2014

After EclipseCon is Before EclipseCon

Now that EclipseCon Europe 2014 is over, it's time to focus on the next big community event: EclipseCon North America 2015, especially since the deadline for the call for papers is already approaching. Better get your session proposal ready soon if you want to share something new, cool, interesting or enlightening with your peers. If you are really fast, you may even make the deadline for the early bird picks. Chances won't get any better.
In other words: San Francisco. March 2015. EclipseCon. Submit now!

In case you don't know it yet: EclipseCon North America will again feature theme days that focus on special topics, and one of them will be dedicated to Xtext. If you want to share insights about your application of domain-specific languages, how you solved challenges in your language implementation or how you use the framework in general, I can only encourage you to submit a talk for the Xtext track.
As every year, I expect EclipseCon to be a great community event with deep technical content. Like nowhere else, you can get in touch with the committers of the various Eclipse projects, discuss solutions and have a great time. So even if you don't plan to submit a proposal, make sure to save the date: March 9 - 12 in sunny California, EclipseCon NA!
Still not convinced? Check out the impressions from past EclipseCons and see what you are going to miss!

Monday, October 6, 2014

Musing about Eclipse UX Metaphors: The Blocking Build


For the upcoming version of Xtext, we are revising our approach to building. It looks promising to rethink the overall lifecycle of the Xtext builder, aiming at:
  1. Better user experience by introducing a non-blocking build infrastructure
  2. Improved performance through better parallelization
  3. Incremental builds also on the command line

The Problem

The Xtext framework implements an Eclipse builder and is thereby immediately affected by the builder's user experience metaphor (even bad experience is still experience). Whenever a file is saved or a project is explicitly built, the user is essentially locked out of working in the editor.

Go Home, Eclipse! You're drunk!
It's not that the editor is generally unusable during a build. But it turns out that saving early and saving often becomes quite a habit for Eclipse users. As soon as you have written some code and save the file you're working on, the builder kicks in and tries to validate the new file state. Since you keep editing, it's quite likely that you hit save again and are confronted with that modal dialog with greetings from the 90s. Of course you don't see this message every time you save a file, since the incremental build is usually quite fast, but when you do see it, it is definitely not what you expected.

Some Background

Generally speaking, the Eclipse builder is responsible for producing artifacts from the source files in a project. Different builders may be configured for the very same project, and the term artifact describes not only compilation results in the form of new files, but also validation markers. While a builder is running for a project, it holds a lock not only on that project, including its files and folders, but on the entire workspace. This ensures that no interleaved events remove or modify any state on disk (details have been discussed here). And this is where the trouble starts from the user's perspective.
On the one hand, the locking prevents unexpected modifications within Eclipse; on the other hand, it gets in the users' way since they can no longer work without interruption. The mechanism was apparently designed to ensure consistency within the workspace between sources and compilation results. But if you look into the dirty corners, the price paid is way too high. The blocking mechanism only creates the impression of safety but can never guarantee it. Literally any external process may still perform I/O operations on the very same files, and the build would go bananas since the state known to Eclipse is no longer in sync with the actual state on disk. But that's another can of worms that is not the subject of this post. Instead, let's focus on ways to improve the situation that may lead to a more responsive UI.

Action Items

For Xtext, we are currently analyzing how we can change the way we build files and projects. Rather than getting in the user's way, we are thinking about performing the build in the background without unnecessary blocking. The main goal in that regard is to move the complete build out of the coarse-grained project lock and break it into smaller, manageable pieces. E.g. as soon as the files are loaded, they don't need to be locked anymore. In the validation phase only the markers are written, not the entire files. For incremental builds, only a small subset of files needs to be considered in the first place.
This breakdown of locking is desirable on various levels. First and foremost, the user experience would improve a lot since Xtext would present fewer blocking dialogs to the user. Another positive effect is that the build and its lifecycle would be essentially decoupled from the Eclipse builder and its related UI components. By factoring out the build cycle, Xtext can support incremental compilation on the command line, too.
In times of many-core CPUs, it also becomes more and more interesting to parallelize the build and go full throttle with today's hardware. To leverage that potential, the build process itself has to be analyzed carefully. The Xtext build inherently runs in multiple passes that are currently strictly sequential, especially in the context of Eclipse projects. These steps are performed for each individual project during a build:
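One way to picture these smaller pieces is to stop holding a lock across the whole build of a file: take a snapshot once it is loaded, validate the snapshot without any lock, and make only the final marker update atomic. The Java sketch below is a hypothetical model, not Xtext or Eclipse code: the `files` and `markers` maps stand in for the workspace, and the version counter is an assumed mechanism that keeps a slow, outdated build from overwriting the markers of a newer file state, so that giving up the big lock does not cost correctness.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class FineGrainedBuild {

    /** A file's content plus a version number that grows with every save. */
    record FileState(String content, long version) {}

    /** Validation markers computed from one particular version of a file. */
    record Markers(List<String> messages, long version) {}

    final Map<String, FileState> files = new ConcurrentHashMap<>();
    final Map<String, Markers> markers = new ConcurrentHashMap<>();

    /** Saving only updates the file's own entry; it never waits for a running build. */
    void save(String path, String content) {
        files.merge(path, new FileState(content, 1),
                (old, ignored) -> new FileState(content, old.version() + 1));
    }

    /** Validation runs on an immutable snapshot, so no lock is held while it runs. */
    static Markers validate(FileState snapshot, Function<String, List<String>> validator) {
        return new Markers(List.copyOf(validator.apply(snapshot.content())), snapshot.version());
    }

    /** Publishes markers atomically per file; results for an older version never win. */
    void publish(String path, Markers fresh) {
        markers.merge(path, fresh,
                (current, candidate) -> candidate.version() >= current.version() ? candidate : current);
    }

    /** One file's build: load a snapshot, validate it, write markers only (never the file). */
    void build(String path, Function<String, List<String>> validator) {
        FileState snapshot = files.get(path);
        if (snapshot != null) {
            publish(path, validate(snapshot, validator));
        }
    }

    public static void main(String[] args) {
        FineGrainedBuild workspace = new FineGrainedBuild();
        Function<String, List<String>> validator =
                text -> text.contains("TODO") ? List.of("TODO left in file") : List.of();

        workspace.save("a.dsl", "element x TODO");
        FileState v1 = workspace.files.get("a.dsl");         // a slow build snapshots version 1 ...
        workspace.save("a.dsl", "element x");                // ... while the user saves version 2
        workspace.build("a.dsl", validator);                 // a second build publishes version 2
        workspace.publish("a.dsl", validate(v1, validator)); // the slow build finishes last: ignored

        System.out.println(workspace.markers.get("a.dsl")); // Markers[messages=[], version=2]
    }
}
```

Saving never blocks here, because the only synchronization is the per-key atomicity of `ConcurrentHashMap.merge`.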
  1. First of all, the resources in a project have to be indexed to collect all the reachable names and derive an index structure that can be seen as a global symbol table.
  2. After all symbols and names are known, the cross references are resolved and their resolution state is cached as reference descriptions. Currently, validation is also performed at this stage, but it can be seen as step 2.5.
  3. The last step is the code generation. All resources are processed to create derived artifacts from them.
There are already means in Xtext to perform some steps in parallel. E.g. since Xtext 2.0, the files can be loaded into memory for stage (1) in parallel rather than sequentially. In the future, we want to improve on that and allow a lot more parallelization. Given that the build would be decoupled from the Eclipse builder's lifecycle, we could index all the resources in the workspace at the same time. In phase (1), there is no need for one project to wait for another, so multiple projects would be processed in parallel rather than sequentially. The reference resolution can be done in parallel, too, at least for projects that do not depend on each other transitively. For code generation, there has been support for parallelization since Xtext 2.7, but there's still room for improvement, e.g. we could not only generate resources within a single project in parallel but also run the full build concurrently for multiple projects.
But there's even more that we are currently discussing about the way Xtext projects are built within Eclipse. For example, we are looking into means to preserve the index state when a project is closed by the user. Instead of rebuilding the entire project, the builder state would be available immediately after the project is reopened, similar to plain Java projects. The general handling of archives and the resources in these archives is also under review. For bigger projects, it may pay off to have precomputed information packaged together with the resources in the archive.
In the end, the overall goal is to improve the perceived performance and the responsiveness of the IDE. A user action should never be blocked by some task the IDE is performing in the background. The build should also be decoupled from the Eclipse infrastructure. In that regard, the contracts for each build step have to be sharpened, and of course correctness must not be traded for concurrency. Exciting times!
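The scheduling described here can be sketched with `CompletableFuture`: index all projects at once, wait until the global symbol table is complete, then resolve each project as soon as the projects it depends on are done, so that independent projects run concurrently. This is a hypothetical model, not Xtext's builder: `Project`, `index` and `resolve` are made-up names, resolution is reduced to looking names up in the index, validation and code generation are left out, and the dependency graph is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PhasedBuild {

    /** Hypothetical project: names it declares, names it references, projects it depends on. */
    record Project(String name, List<String> declares, List<String> references, List<String> dependsOn) {}

    /** Phase 1: index all projects concurrently into one global symbol table (name -> project). */
    static Map<String, String> index(List<Project> projects, ExecutorService pool) {
        Map<String, String> symbols = new ConcurrentHashMap<>();
        CompletableFuture<?>[] tasks = projects.stream()
                .map(p -> CompletableFuture.runAsync(
                        () -> p.declares().forEach(name -> symbols.put(name, p.name())), pool))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(tasks).join(); // barrier: phase 2 needs the complete index
        return symbols;
    }

    /** Phase 2: resolve references; a project starts once its dependencies are resolved. */
    static Map<String, List<String>> resolve(List<Project> projects, Map<String, String> symbols,
                                             ExecutorService pool) {
        Map<String, Project> byName = new HashMap<>();
        projects.forEach(p -> byName.put(p.name(), p));
        Map<String, CompletableFuture<List<String>>> scheduled = new HashMap<>();
        projects.forEach(p -> schedule(p, byName, symbols, scheduled, pool));
        Map<String, List<String>> result = new HashMap<>();
        scheduled.forEach((name, future) -> result.put(name, future.join()));
        return result;
    }

    /** Schedules a project after its dependencies; assumes the dependency graph has no cycles. */
    private static CompletableFuture<List<String>> schedule(Project p, Map<String, Project> byName,
            Map<String, String> symbols, Map<String, CompletableFuture<List<String>>> scheduled,
            ExecutorService pool) {
        CompletableFuture<List<String>> existing = scheduled.get(p.name());
        if (existing != null) {
            return existing;
        }
        CompletableFuture<?>[] dependencies = p.dependsOn().stream()
                .map(d -> schedule(byName.get(d), byName, symbols, scheduled, pool))
                .toArray(CompletableFuture[]::new);
        CompletableFuture<List<String>> future = CompletableFuture.allOf(dependencies)
                .thenApplyAsync(ignored -> {
                    List<String> resolved = new ArrayList<>();
                    for (String ref : p.references()) {
                        resolved.add(ref + " -> " + symbols.getOrDefault(ref, "<unresolved>"));
                    }
                    return resolved;
                }, pool);
        scheduled.put(p.name(), future);
        return future;
    }

    public static void main(String[] args) {
        List<Project> projects = List.of(
                new Project("base", List.of("Person"), List.of(), List.of()),
                new Project("app", List.of("Main"), List.of("Person", "Missing"), List.of("base")),
                new Project("tools", List.of("Tool"), List.of("Person"), List.of("base")));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<String, String> symbols = index(projects, pool);
            Map<String, List<String>> resolved = resolve(projects, symbols, pool);
            System.out.println(resolved.get("app")); // [Person -> base, Missing -> <unresolved>]
        } finally {
            pool.shutdown();
        }
    }
}
```

Since `app` and `tools` both depend only on `base`, they are resolved concurrently as soon as `base` is done, while indexing never waits on any dependency at all.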

Friday, October 3, 2014

Testing multiple Xtext DSLs

Recently there was a question in the Eclipse forum about testing multiple Xtext languages. The described scenario involves two languages, one of which should have cross references to the other. Since this use case has caused some headaches in the past, Michael Vorburger provided a patch (thanks for that!) that adds information about this particular topic to the official Xtext documentation. The updated docs are available since the recent 2.7 release. To provide some additional guidance, I hacked together a small example that can be used as a blueprint if you want to get a jump start on this issue. The example also briefly documents how Xtext's unit test facilities can be used for effective language testing.

The key to testing the infrastructure of a DSL is setting up a proper test environment. The Guice services have to be registered, EMF has to be initialized and, obviously, everything has to be cleaned up again after a test has been executed. For that purpose, Xtext provides a dedicated JUnit 4 test runner that uses an injector provider to do the initialization. The nice thing about this approach is that you can inject the necessary services directly into your unit test class, exactly as you are used to from your production code.

When it comes to testing multiple DSLs, basically the same infrastructure can be used, though you have to make sure that all involved languages are properly initialized. For that purpose, a custom injector provider has to be derived from the one that was already generated for your language. The to-be-written subclass needs to take care of all the prerequisites and register the dependent languages. This mainly involves delegating to their injector providers.
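The delegation pattern can be shown without any Xtext dependency. The plain Java sketch below is a hypothetical model: `REGISTRY` stands in for EMF's global registries, and `LanguageProvider` and `DependentLanguageProvider` are made-up classes whose `setupRegistry()`/`restoreRegistry()` names merely echo Xtext's `IRegistryConfigurator`. The provider of the referencing language first lets its dependencies register themselves, and on cleanup undoes everything in reverse order so that no global state leaks into the next test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiLanguageSetup {

    /** Hypothetical stand-in for EMF's global registries: file extension -> language. */
    static final Map<String, String> REGISTRY = new HashMap<>();

    /** Hypothetical per-language test setup, loosely modeled on a generated injector provider. */
    static class LanguageProvider {
        private final String language;
        private final String fileExtension;
        private Map<String, String> stateBefore;

        LanguageProvider(String language, String fileExtension) {
            this.language = language;
            this.fileExtension = fileExtension;
        }

        /** Remembers the global state, then registers this language. */
        void setupRegistry() {
            stateBefore = new HashMap<>(REGISTRY);
            REGISTRY.put(fileExtension, language);
        }

        /** Puts the global state back exactly as it was before setupRegistry(). */
        void restoreRegistry() {
            REGISTRY.clear();
            REGISTRY.putAll(stateBefore);
        }
    }

    /** Setup for a language with cross references into other languages: delegate to them first. */
    static class DependentLanguageProvider extends LanguageProvider {
        private final List<LanguageProvider> dependencies;

        DependentLanguageProvider(String language, String fileExtension,
                                  List<LanguageProvider> dependencies) {
            super(language, fileExtension);
            this.dependencies = dependencies;
        }

        @Override
        void setupRegistry() {
            dependencies.forEach(LanguageProvider::setupRegistry); // register prerequisites
            super.setupRegistry();
        }

        @Override
        void restoreRegistry() {
            super.restoreRegistry();
            List<LanguageProvider> reversed = new ArrayList<>(dependencies);
            Collections.reverse(reversed); // undo in the opposite order of setup
            reversed.forEach(LanguageProvider::restoreRegistry);
        }
    }

    public static void main(String[] args) {
        LanguageProvider entities = new LanguageProvider("EntitiesDsl", "entities");
        LanguageProvider services =
                new DependentLanguageProvider("ServicesDsl", "services", List.of(entities));

        services.setupRegistry();   // before a test: both languages are available
        System.out.println(REGISTRY.get("entities") + ", " + REGISTRY.get("services"));
        services.restoreRegistry(); // after a test: the global state is clean again
        System.out.println(REGISTRY.isEmpty());
    }
}
```

The part worth copying is the symmetry: whatever a provider registers, it restores afterwards, and a dependent provider restores its dependencies in the reverse order of setup.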

Now that the setup is ready, we can test cross references between multiple DSLs. It is important to know that these references are only properly resolved if all models are available in the same resource set. That's why we need to use an explicit resource set in the tests. Besides that, it's the programming model that you know from Xtext and EMF in general.

A complete exemplary test is available on GitHub.