As soon as you start to work on an IDE for such a language, you may feel tempted to wire parts of the environment into the grammar. After all, keywords like int, boolean or double feel quite natural in a Java grammar - at least a first glance. In the long run it often turns out to be a bad idea to wire these things into the grammar definition. The alternative is to use a so called library approach: The information about the language run-time is encoded in an external model that is accessible to the language implementation.
An ExampleTo use again the Java example (and for the last time in this post): The ultimate goal is to treat types like java.lang.Object and java.util.List in the same way as int or boolean. Since we did this already for Java as part of the Xtext core framework, let's use a different, somehow artificial example in the following. Our dummy language supports function calls of which max, min and avg are implicitly available.
Not so obvious are the pain-points (which come for free): The documentation for these functions has to be hooked up manually, the complete signatures of them have to be hard-coded, too. The validation has to be aware of the parameters and return types in order to check the conformance with the actual arguments. Things become rather messy beyond the first quickly sketched grammar snippet. And last but not least there is no guarantee that the set of implicit functions is stable forever with each and every version of the run-time. If the language inventor introduces a new function sum in a subsequent release, everything has to be rebuild and deployed. And you can be sure that the to-be-introduced keyword sum will cause trouble at least in one of the existing files.
Libraries Instead of KeywordsThe library approach seems to be more difficult at first but it pays off quickly. Instead of using hard-coded function names, the grammar uses only a cross reference to the actual function. The function itself is modeled in another resource that is also deployed with the language.
If there is an existing specification of the implicit features available, that one should be used instead. Creating a model from an existing, processable format is straight forward and it avoids mistakes because there is no redundant declaration of the very same information. In both cases there is a clear separation of concerns: The grammar remains what it should be: a description of the concrete syntax and not something that is tight to the run-time. The API specification is concise and easy to grasp, too. And in case an existing format can be used for that purpose, it's likely that the language users are already familiar with that format.
Wrap UpYou should always consider to use external descriptions or header stubs of the environment. A grammar that is tightly coupled to a particular version or API is quite error-prone and fragile. Any evolution of the run-time will lead to grammar changes which will in turn lead to broken existing models (that's a promise). Last but not least, the effort for a seamless integration of built-in and custom functions for the end-user exceeds the efforts for a clean separation of concerns by far.
A very sophisticated implementation of this approach, can be explored in the Geppetto repository at GitHub. Geppetto uses puppet files and ruby libraries as the target platform, parses them and puts them onto the scope of the project files. This example underlines another advantage of the library approach: It is possible to use a configurable environment. The APIs may be different from version to version and the concrete variant can be chosen by the user. This would never be possible with a hard-wired set of built-ins.