Xtext cross references and scoping – an overview (Part 4)
Scoping Relevant Concepts
As mentioned before, cross referencing in Xtext is based on names. We also take for granted that scoping is about the visibility of elements that may be referenced. We now consider some use cases in order to collect the requirements to be met by a “scope” and refine its basic structure. Please note that this post does not aim to be correct in every implementation detail (nor does it explain how to actually implement a scope provider); it is about understanding (some of) the basic cross referencing concepts. It is strongly recommended to work through the technical documentation on this topic thoroughly (more than once).
Context
It should be clear that “the scope” is nothing that can be defined for a model as a whole. Cross references have different target types, variables defined within a loop should not be visible outside the loop, etc. So, the scope depends on the context. It matters from where one is looking.
Local Scoping
Assume that a cross reference can only be established to objects defined within the same resource (read: file) and that names are unique throughout the resource. In this case it would suffice to come up with a flat map (there are no name conflicts) pointing from the name of an object to the object itself (as they are in memory anyway). Given a name, the linker looks up the corresponding value in the map and links to that object. The content assist engine proposes all syntactically valid keys.
We first drop the unique name assumption. Names can appear more than once; counter variables are a common example. It might be possible to stick with a flat map and add only “the correct” element, but it seems more sensible to give the scope a bit more structure. Nested blocks are a common concept. A statement is contained in a loop, the loop is contained in a method, the method is contained in a class, the class has a super class and so on. With respect to scopes a similar structure could be used, in particular because the visibility of elements is often tightly connected with these blocks. If your statement uses a variable, you first check if (looking from the statement) a matching declaration is found in the innermost relevant block (e.g. if the variable is defined within the loop). If so, you are done (and the element automatically shadows one with the same name in an outer block). If not, you check if (now looking from the loop) the declaration is found in the next outer block (the method). If not, you check if (looking from the method) the declaration is found in the next outer block (the class). And so on.
This shifting of perspective (looking from the next outer block) could be implemented by the following structure: rather than implementing the scope as a flat map, have a map along with a parent (outer) scope. If an element cannot be found in the map, recursively look at the parent scope. This is quite sensible as the outer scope will often be reusable (the variables visible starting at the loop will be the same for all statements within the loop, only the “inner” scopes for each statement in the loop are different).
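The map-plus-parent structure described above can be sketched in a few lines of plain Java. This is a minimal illustration only and deliberately does not use Xtext's own scope API; the names NestedScopeSketch, Scope, define and lookup are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a nested scope; this is not Xtext's scope API. */
final class NestedScopeSketch {

    /** One scope layer: a local name-to-element map plus an optional outer scope. */
    static final class Scope {
        private final Map<String, Object> elements = new HashMap<>();
        private final Scope parent; // null for the outermost layer

        Scope(Scope parent) {
            this.parent = parent;
        }

        /** Registers an element in this layer and returns the layer for chaining. */
        Scope define(String name, Object element) {
            elements.put(name, element);
            return this;
        }

        /** The innermost match wins; otherwise the lookup moves to the outer scope. */
        Optional<Object> lookup(String name) {
            Object found = elements.get(name);
            if (found != null) {
                return Optional.of(found);
            }
            return parent == null ? Optional.empty() : parent.lookup(name);
        }
    }

    public static void main(String[] args) {
        Scope classScope = new Scope(null).define("count", "field count");
        Scope methodScope = new Scope(classScope).define("i", "parameter i");
        Scope loopScope = new Scope(methodScope).define("i", "loop variable i");

        System.out.println(loopScope.lookup("i").orElse("unresolved"));       // loop variable i
        System.out.println(loopScope.lookup("count").orElse("unresolved"));   // field count
        System.out.println(methodScope.lookup("i").orElse("unresolved"));     // parameter i
        System.out.println(loopScope.lookup("missing").orElse("unresolved")); // unresolved
    }
}
```

Note how the method scope is shared by everything inside the method: only the innermost layer differs from one context to the next, and shadowing falls out of the lookup order without any extra logic.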
Of course this does not yet solve the problem of elements having the same name within one block. We will not consider this issue here. In the standard configuration, Xtext will let you know if there are duplicate names and mark them as errors. If you want to allow duplicate names, you had better know what you are doing.
Global Scoping
Generally, we want to have cross references beyond resource boundaries. If no matching element is found within the resource then the “global scope” (a scope for the world outside the resource) is queried. Note that from the technical point of view, the global scope could just be an outer scope to the outermost local scope layer.
In the old days, when there were only URI imports, things were quite simple. In your model, you had to actually point to the resource containing the object to be referenced. That way, when creating the (global) scope, the imported models were loaded and the scope could be filled with the mappings from name to actual object.
This cannot work for namespace imports (or referencing via qualified names without any imports at all). Why? Because you cannot hold all models in memory whose elements are potentially referenced via a qualified name. Note that unlike a platform/plugin/classpath/relative/whatever-URI, a qualified name does not carry any information at all about where to look for the corresponding element. Approaching this problem bottom up, we now put into the scope not a name to object mapping but rather a name to description-of-an-object mapping. That description (IEObjectDescription) is the object reduced to its essential information with respect to cross referencing: name, type, URI (i.e. a way of locating the object if that should become necessary).
Note that this works quite well for our two main use cases, linking and content assist. Given a name, the linker queries the scope, obtains an IEObjectDescription, resolves the URI to obtain the actual object and links to it (or installs a proxy with the URI to be used for actually resolving the object). The content assist engine proposes all syntactically valid names as before.
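As a rough illustration of the name-to-description idea, the following plain Java sketch keeps only name, type and URI in the scope and loads the target object only when the linker actually needs it. Everything here is a hypothetical stand-in: Description mimics the role of IEObjectDescription but is not that interface, the URIs are plain strings, and filtering proposals by type is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical sketch; Description only mimics the role of IEObjectDescription. */
final class DescriptionScopeSketch {

    /** The object reduced to what cross referencing needs: name, type, URI. */
    static final class Description {
        final String name;
        final String type;
        final String uri;

        Description(String name, String type, String uri) {
            this.name = name;
            this.type = type;
            this.uri = uri;
        }
    }

    /** A scope that maps names to descriptions instead of to loaded objects. */
    static final class Scope {
        private final Map<String, Description> byName = new TreeMap<>();

        Scope add(Description description) {
            byName.put(description.name, description);
            return this;
        }

        Optional<Description> find(String name) {
            return Optional.ofNullable(byName.get(name));
        }

        /** Content assist: names of the requested type, without loading anything. */
        List<String> proposals(String type) {
            List<String> names = new ArrayList<>();
            for (Description d : byName.values()) {
                if (d.type.equals(type)) {
                    names.add(d.name);
                }
            }
            return names;
        }
    }

    /** Linking: find the description first, resolve its URI only on a hit. */
    static final class Linker {
        private final Function<String, Object> uriResolver;

        Linker(Function<String, Object> uriResolver) {
            this.uriResolver = uriResolver;
        }

        Optional<Object> link(Scope scope, String name) {
            return scope.find(name).map(d -> uriResolver.apply(d.uri));
        }
    }

    public static void main(String[] args) {
        Scope scope = new Scope()
                .add(new Description("demo.Person", "Entity", "model.dsl#Person"))
                .add(new Description("demo.Name", "DataType", "types.dsl#Name"));
        // The resolver stands in for loading the resource behind the URI.
        Linker linker = new Linker(uri -> "object loaded from " + uri);

        System.out.println(scope.proposals("Entity"));
        System.out.println(linker.link(scope, "demo.Person").orElse("unresolved"));
        System.out.println(linker.link(scope, "demo.Missing").orElse("unresolved"));
    }
}
```

The point of the design is that building the scope and computing proposals never touches the resolver; only a successful link does.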
What do we have so far? Scopes are nested. The name lookup strategy is “check if the innermost scope has an entry for that name, if so: done, otherwise: check the outer scope (as long as there is one)”. A useful (local) scoping strategy is having a scope layer for each layer of the containment hierarchy. If we are at the root of the resource and have not found a candidate, the outer scope is a global one, meaning one representing the world outside the resource. Rather than providing actual model objects, the scope provides IEObjectDescriptions whose main feature is to hold information on where to find the actual object.
You spot the difficulty in this modification: you still have to make available all the IEObjectDescriptions to be used for the scoping! (The following sections will briefly summarise how the Xtext framework addresses the issues related to that requirement.)
Index
This is where the index enters the stage. We already stated that it is not feasible to hold all resources in memory. The idea is that an index keeps accessible enough information about each resource (even if it is not in memory at the moment) in order to make cross referencing work. This information is summarised in an IResourceDescription (one for each resource). In particular it provides a list of exported objects (in form of IEObjectDescriptions). That is, the global scope can now simply be fed with exported objects of each IResourceDescription held in the index. We will always say index instead of IResourceDescriptions (“unfortunate” naming for talking about the concepts) in order to avoid confusion with the plural form of IResourceDescription.
Of course at some point a resource has to be loaded in order to create the IResourceDescription to be used in the index. The Xtext framework does that during the build of a project. As a first approximation, the index is initialised on startup and updated after model changes.
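The relationship between resources, the index and the global scope can be pictured with a small plain Java sketch. It is a conceptual model only: Index, ResourceDescription and ObjectDescription are hypothetical stand-ins for the Xtext types discussed above, the update and remove methods merely imitate what a build would do, and letting the first resource win on a name clash is an arbitrary choice for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an index; the types only mimic their Xtext counterparts. */
final class IndexSketch {

    static final class ObjectDescription {
        final String name;
        final String uri;

        ObjectDescription(String name, String uri) {
            this.name = name;
            this.uri = uri;
        }
    }

    /** Summary of one resource: its URI and the objects it exports. */
    static final class ResourceDescription {
        final String resourceUri;
        final List<ObjectDescription> exportedObjects;

        ResourceDescription(String resourceUri, ObjectDescription... exported) {
            this.resourceUri = resourceUri;
            this.exportedObjects = new ArrayList<>(Arrays.asList(exported));
        }
    }

    /** Holds one description per resource, whether or not the resource is loaded. */
    static final class Index {
        private final Map<String, ResourceDescription> byResource = new LinkedHashMap<>();

        /** Imitates a build step: replaces the description of a changed resource. */
        void update(ResourceDescription description) {
            byResource.put(description.resourceUri, description);
        }

        void remove(String resourceUri) {
            byResource.remove(resourceUri);
        }

        /** Feeds a flat global scope with the exported objects of every resource. */
        Map<String, ObjectDescription> globalScope() {
            Map<String, ObjectDescription> scope = new LinkedHashMap<>();
            for (ResourceDescription resource : byResource.values()) {
                for (ObjectDescription exported : resource.exportedObjects) {
                    scope.putIfAbsent(exported.name, exported); // first resource wins
                }
            }
            return scope;
        }
    }

    public static void main(String[] args) {
        Index index = new Index();
        index.update(new ResourceDescription("a.dsl",
                new ObjectDescription("a.Person", "a.dsl#Person")));
        index.update(new ResourceDescription("b.dsl",
                new ObjectDescription("b.Order", "b.dsl#Order")));
        System.out.println(index.globalScope().keySet());

        // After a model change, the rebuilt description replaces the old one.
        index.update(new ResourceDescription("a.dsl",
                new ObjectDescription("a.Customer", "a.dsl#Customer")));
        System.out.println(index.globalScope().keySet());
    }
}
```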
There is (not only) one Index
It is important to understand that the index infrastructure is used by all Xtext projects. That is, you don’t have a private index for each language. This makes sense as a developer (you) might not know in advance which other projects may want to reference an object of the language. So you should think of the index as one big information center for all objects that can be referred to. But of course, you may want to have some influence on what goes to the index and what is used from the index on a language by language basis.
It is just as important to know that there is actually no such thing as “the” index. There are several implementations, sometimes working in parallel. For example, different index implementations are usually used in an Eclipse environment and when running an MWE(2) workflow. Within Eclipse, too, different index implementations may be active: one for saved or closed resources and another for open dirty resources (a modified file in an open editor). This allows linking to work nicely even if the linked object is not yet saved.
So if you want to access the index yourself, be it for modifying scoping, validation or something else, it is important NOT to simply inject IResourceDescriptions (this is an extremely common mistake causing linking problems). In Xtext 2 you should use (read: inject) the ResourceDescriptionsProvider; in Xtext 1 you might look at the AbstractGlobalScopeProvider in order to see how to obtain the “correct” index.
How are the exported objects determined?
The IResourceDescription.Manager is responsible for creating the resource description. By default everything that has a name is also exported. This is OK for simple out-of-the-box support for local and global cross referencing, but you should think about overriding the default implementation (or rather binding an IDefaultResourceDescriptionStrategy) for performance and memory optimisations. Often you will give names to many elements that should be referable locally (e.g. local variables, in order to get nice out-of-the-box cross referencing behaviour), but should in fact not be globally visible.
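The effect of restricting what gets exported can be shown without any Xtext code. The sketch below models an export strategy as a simple predicate over named elements; it does not use or imitate the signature of IDefaultResourceDescriptionStrategy, and the Element class, the kind strings and both strategy constants are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; an export strategy is modelled as a plain predicate. */
final class ExportStrategySketch {

    static final class Element {
        final String name; // null for unnamed elements
        final String kind;

        Element(String name, String kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /** Mirrors the default described above: everything with a name is exported. */
    static final Predicate<Element> EXPORT_ALL_NAMED = e -> e.name != null;

    /** A narrower strategy: named elements, except local variables. */
    static final Predicate<Element> EXPORT_WITHOUT_LOCALS =
            e -> e.name != null && !"LocalVariable".equals(e.kind);

    /** Collects the names that would end up in the resource description. */
    static List<String> exportedNames(List<Element> elements, Predicate<Element> strategy) {
        List<String> names = new ArrayList<>();
        for (Element element : elements) {
            if (strategy.test(element)) {
                names.add(element.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Element> elements = Arrays.asList(
                new Element("demo.Person", "Entity"),
                new Element("i", "LocalVariable"),
                new Element(null, "Statement"));

        System.out.println(exportedNames(elements, EXPORT_ALL_NAMED));      // [demo.Person, i]
        System.out.println(exportedNames(elements, EXPORT_WITHOUT_LOCALS)); // [demo.Person]
    }
}
```

The local variable stays referable inside its resource through local scoping; it simply no longer takes up space in the index.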
Containers
Think of the index as flat. In a sense this flatness is necessary here as it is impossible for the index to have a sophisticated structure that would fit the need of all languages. One could say that global visibility is not universal. In other words, just because an element is defined somewhere does not mean that it should be visible from everywhere. Looking at Java, a class must be on the classpath in order to be visible. Analogously, you may want to require a dependency on the project containing the element to be referenced or a similar criterion. Containers (IContainer, IContainer.Manager, etc.) are intended to model these language specific requirements. The idea is to make a resource belong to a container (from “your” language point of view, it may be a different container from the point of view of another language) and define which containers are visible from a resource. That way a fine grained visibility control is possible. There are several default implementations, e.g. one based on the Java class path (it is only possible to reference an object if the containing resource is on the class path of the project containing the referring model).
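The container idea boils down to two mappings: resource to container, and container to visible containers. The following plain Java sketch illustrates just that and is not based on the IContainer API; the Visibility class and its methods are hypothetical, and treating a container as always visible to itself is an assumption of the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of container-based visibility; not the IContainer API. */
final class ContainerSketch {

    static final class Visibility {
        private final Map<String, String> containerOfResource = new HashMap<>();
        private final Map<String, Set<String>> visibleContainers = new HashMap<>();

        /** Makes a resource belong to a container (e.g. a project). */
        void assign(String resource, String container) {
            containerOfResource.put(resource, container);
        }

        /** Declares which other containers are visible from the given one. */
        void dependsOn(String container, String... others) {
            visibleContainers
                    .computeIfAbsent(container, key -> new HashSet<>())
                    .addAll(Arrays.asList(others));
        }

        /** A target is visible if it is in the same or in a visible container. */
        boolean isVisible(String fromResource, String targetResource) {
            String from = containerOfResource.get(fromResource);
            String target = containerOfResource.get(targetResource);
            if (from == null || target == null) {
                return false;
            }
            return from.equals(target)
                    || visibleContainers.getOrDefault(from, Collections.emptySet()).contains(target);
        }
    }

    public static void main(String[] args) {
        Visibility visibility = new Visibility();
        visibility.assign("app/main.dsl", "app");
        visibility.assign("lib/types.dsl", "lib");
        visibility.dependsOn("app", "lib");

        System.out.println(visibility.isVisible("app/main.dsl", "lib/types.dsl")); // true
        System.out.println(visibility.isVisible("lib/types.dsl", "app/main.dsl")); // false
    }
}
```

A global scope built on top of this would simply skip the exported objects of every resource for which isVisible returns false.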