XML Configurations Part 2
These are some further notes (also see XmlConfigurations).
Definitions
A configuration is a declarative specification, represented in xml, of such things as how to run a scenario, the parameters for the models, a database connection, or a particular series of geoprocessing steps.
A project represents the application of UrbanSim to a given region, with a particular choice of geography, base year, and so forth. A project can encompass various alternative scenarios, different possible sets of geoprocessing steps, and so forth.
A perspective is a subset of a project, tailored to the interests and access permissions of a particular person. We won't support perspectives in the first version but probably will later.
These definitions are used on this page, but are different from the ones used in XmlConfigurations -- the older wiki page hasn't been updated yet. (Might as well let the design settle down first.)
Some Design Choices
To help focus the design, here is another slice at organizing the space, this one by certain key choices.
What's in the GUI
To avoid thrashing, let's assume the GUI has the current tabs (perhaps renaming "run manager" to "scenario manager", perhaps adding some more tabs later, e.g. "estimation manager").
We've decided that what gets shown in the GUI is a single project (which might include multiple scenarios, processing scripts, and so forth).
This is in contrast to (for example) showing all the projects a user might be working on, or just one scenario or data processing sequence.
Sharing Configurations
The GUI and its configuration storage structures should at a minimum support two kinds of configurations:
- We need to be able to distribute default and template configurations from CUSPA for people to build on. These would be part of the Opus/UrbanSim distribution. We need to allow for easy updating. This should be done with a subversion update (ideally with a command available in the GUI, but it could be separate at first).
- There should also be either personal or organization-wide configurations (or both).
Given this, the default CUSPA configurations should be in different files from the personal or organizational ones, to make updating easier.
If we're going to support at least this, we might as well allow more general and flexible kinds of sharing, so that there could be both organization-wide configurations and personal configurations. Initially we'll support saving these using subversion; later we could add support for saving them in a database.
How much is in each XML file?
Here are three possibilities:
- one XML file contains an entire project
- one XML file contains the contents of a tab for a given project (so a project would be represented using 4 xml files)
- one XML file contains a particular scenario, set of data processing scripts, etc - in this case a project would be represented by many xml files
The second option (representing a project with 4 xml files) doesn't seem to have any advantages over the other two, so let's drop that one. Advantages of many xml files are that you have less likelihood of collisions when checking things into the repository, and it may enable finer-grained sharing. Advantages of one xml file per project are that the GUI code is simpler, and loading and saving projects is simpler. The two design options below vary depending on which of the remaining possibilities is chosen.
Inheritance and Composition
There are two different mechanisms for combining configurations in the current code:
- inheritance (one configuration can have another configuration as a parent)
- composition (using 'include' commands)
The specific kind of inheritance used is like that in prototype-based rather than class-based object oriented languages (in case anyone cares ...). This seems appropriate for the domain.
The 'include' command currently operates between files: one xml configuration file can include the entire contents of another xml file. This could be made more fine-grained if need be, so that you could include just some elements rather than the whole thing, and so that you could share parts within one xml file. (Caution: the more flexible you make it, the more danger of getting tangles that are very hard to understand and maintain.)
I think we need inheritance, since it is a good way to support default configurations. Depending on how we structure things we might or might not also need some flavor of 'include' command.
Recommendation: support both inheritance and 'include'. Which kind of 'include' depends on how much is in an xml file. If xml files are small (just one scenario for example), use the current version of include (which grabs an entire file to include, rather than a part). If a project is stored in one xml file, use the finer-grained include.
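To make the prototype-style inheritance concrete, here is a minimal Python sketch (not the existing Opus code). It assumes child and parent nodes are matched by tag plus a hypothetical name attribute, and that each such pair is unique among its siblings. The function merge_prototype, the item tag, and the sample values are all made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def merge_prototype(parent, child):
    """Return a new element in which the child's nodes override the parent's.

    Nodes are matched by (tag, name attribute); this matching rule is an
    assumption of the sketch, not something the design has settled.
    """
    # Child attributes win over parent attributes.
    merged = ET.Element(child.tag, dict(parent.attrib, **child.attrib))
    # Use the child's text if it has any, otherwise fall back to the parent's.
    merged.text = child.text if (child.text or "").strip() else parent.text

    child_by_key = {(c.tag, c.get("name")): c for c in child}
    overridden = set()
    for p in parent:
        key = (p.tag, p.get("name"))
        if key in child_by_key:
            # Present in both: merge recursively so partial overrides work.
            merged.append(merge_prototype(p, child_by_key[key]))
            overridden.add(key)
        else:
            # Only in the parent: inherited unchanged.
            merged.append(copy.deepcopy(p))
    for c in child:
        if (c.tag, c.get("name")) not in overridden:
            # Only in the child: a new node.
            merged.append(copy.deepcopy(c))
    return merged


# Hypothetical default scenario and a child that overrides one item.
PARENT = ('<scenario name="default">'
          '<item name="years">2000-2030</item>'
          '<item name="cache">on</item>'
          '</scenario>')
CHILD = ('<scenario name="eugene">'
         '<item name="years">2000-2010</item>'
         '<item name="seed">1</item>'
         '</scenario>')

merged = merge_prototype(ET.fromstring(PARENT), ET.fromstring(CHILD))
result = {item.get("name"): item.text for item in merged}
```

The parent element is never modified, so several children can share one default configuration.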
Yet Another Design (Option A)
Here is yet another design based on these choices.
There are 4 different kinds of xml files, one for each tab. So a model configuration looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<model>
....
</model>
A scenario configuration looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<scenario>
....
</scenario>
and so forth.
Here is a typical layout of files and directories:
projects
  urbansim_gridcell
    data_manager
      data_config.xml
    model_manager
      model_config.xml
    scenario_manager
      scenario_config.xml
    results_manager
      results_config.xml
  eugene_gridcell
    data_manager
      data_config.xml
    model_manager
      model_config.xml
    scenario_manager
      scenario_config1.xml
      scenario_config2.xml
    results_manager
      results_config.xml
  psrc_gridcell
    ....
  psrc_parcel
    data_manager
      data_config1.xml
    model_manager
      model_config1.xml
      model_config2.xml
    scenario_manager
      scenario_config1.xml
      scenario_config2.xml
      scenario_config3.xml
    results_manager
      results.xml
I've included a default data_manager config for uniformity -- this might actually be empty initially. (Later it will have useful scripts that are inherited.)
Note that the default urbansim_gridcell project has only one configuration per tab, but in general you can have several.
When you browse to e.g. psrc_parcel, in the model_manager tab the GUI shows the available model configurations model_config1.xml and model_config2.xml; and similarly for the other tabs.
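As a rough illustration of how the GUI might find the configurations to list in each tab, here is a small Python sketch. It assumes the fixed layout shown above (one directory per tab, configurations as .xml files directly inside it). The function name configs_by_tab is hypothetical, and the demo builds a throwaway directory tree rather than touching a real project.

```python
import tempfile
from pathlib import Path

# Tab directory names taken from the layout above.
TABS = ("data_manager", "model_manager", "scenario_manager", "results_manager")


def configs_by_tab(project_dir):
    """Map each tab name to the sorted xml file names found in its directory."""
    project_dir = Path(project_dir)
    listing = {}
    for tab in TABS:
        tab_dir = project_dir / tab
        if tab_dir.is_dir():
            listing[tab] = sorted(p.name for p in tab_dir.glob("*.xml"))
        else:
            # A missing tab directory simply means nothing to show.
            listing[tab] = []
    return listing


# Demo on a temporary copy of part of the psrc_parcel layout.
with tempfile.TemporaryDirectory() as root:
    project = Path(root) / "projects" / "psrc_parcel"
    for rel in ("model_manager/model_config1.xml",
                "model_manager/model_config2.xml",
                "scenario_manager/scenario_config1.xml"):
        path = project / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")  # empty placeholder file
    listing = configs_by_tab(project)
```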
Parent configurations are shown in the GUI as nodes, collapsed by default. If you expand one, you see all the inherited nodes, but in grey (because they can't be edited). There is a right-click command that copies a node down to the child, at which point it can be edited (but no longer has a relation to the parent).
To handle storing into subversion and updating, a project directory will have storage information associated with it. This could just be the .svn directory -- the GUI could have commands to update from the repository, commit to the repository (if you've got permissions for that), etc.
References among XML Files
Right now references are handled using relative or absolute paths. So in the above directory structure a reference from scenario_config1.xml to model_config1.xml would be ../model_manager/model_config1.xml. Since we know what kind of configuration is expected in a slot, we could default some of this, and just write it as model_config1.xml.
For references to parents, these could be either the complete relative path, a complete absolute path starting just under projects, or could be just the name of the other project and the name of the particular configuration (this last is probably more clever than it's worth though). So here are the ways a scenario in eugene_gridcell would refer to the default in urbansim_gridcell:
- ../../urbansim_gridcell/scenario_manager/scenario_config.xml
- urbansim_gridcell/scenario_manager/scenario_config.xml
- urbansim_gridcell/scenario_config.xml
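The three forms above (plus the bare-file-name shorthand for references within a project) could all be normalised to one canonical path under projects. Here is a hedged Python sketch of that idea. It assumes relative references always start with "." and that everything else is rooted just under projects. The names resolve_reference and TAB_DIRS are hypothetical.

```python
from pathlib import PurePosixPath

# Which directory holds each kind of configuration (from the layout above).
TAB_DIRS = {
    "data": "data_manager",
    "model": "model_manager",
    "scenario": "scenario_manager",
    "results": "results_manager",
}


def resolve_reference(ref, referring_file, kind):
    """Return the referenced file as a path relative to the projects directory.

    referring_file is itself given relative to projects; kind says what sort
    of configuration is expected in the slot.
    """
    ref_path = PurePosixPath(ref)
    if ref.startswith("."):
        # Relative path: resolve against the referring file's directory.
        parts = []
        for part in (PurePosixPath(referring_file).parent / ref_path).parts:
            if part == "..":
                parts.pop()  # raises IndexError if the path escapes projects
            else:
                parts.append(part)
        return str(PurePosixPath(*parts))
    if len(ref_path.parts) == 1:
        # Bare file name: same project, tab directory implied by the kind.
        project = PurePosixPath(referring_file).parts[0]
        return str(PurePosixPath(project, TAB_DIRS[kind], ref_path.name))
    if len(ref_path.parts) == 2:
        # Project name plus file name: tab directory implied by the kind.
        project, name = ref_path.parts
        return str(PurePosixPath(project, TAB_DIRS[kind], name))
    # Otherwise treat it as already rooted just under projects.
    return str(ref_path)


referring = "eugene_gridcell/scenario_manager/scenario_config1.xml"
forms = [
    "../../urbansim_gridcell/scenario_manager/scenario_config.xml",
    "urbansim_gridcell/scenario_manager/scenario_config.xml",
    "urbansim_gridcell/scenario_config.xml",
]
resolved = [resolve_reference(form, referring, "scenario") for form in forms]
```

All three forms resolve to the same file here, which suggests the shorter forms cost little as long as the set of tab directories stays fixed.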
Do we want to allow more directory structure, either among the projects, the configurations in a tab, or both? Not that hard to support, but maybe we should start with the simple, fixed version.
Yet Another Design (Option B)
Here is another version of the design. This one uses just one xml file per project. This results in a simpler directory structure, simpler GUI code, more complex XML parsing, and larger and more complex xml files.
The directory structure is now simple:
projects
  urbansim_gridcell.xml
  eugene_gridcell.xml
  psrc_gridcell.xml
  psrc_parcel.xml
  ....
Each project file has 4 major nodes, corresponding to the 4 tabs in the GUI. You open a project file, and when you open a tab the GUI selects the appropriate part of the xml to show.
Each of the major nodes can have several parts -- for example, for the scenario manager, there could be several different scenarios. Each individual scenario would have an "executable" attribute so that the GUI knows to make that option available on the right-click menu. In addition, one scenario can inherit from another.
Here's a skeleton of the xml for a project:
<?xml version='1.0' encoding='UTF-8'?>
<opus_project>
  <parent>parentproj.xml</parent>
  <description>a project ...</description>
  <data_manager>
    ....
  </data_manager>
  <model_manager>
    ....
  </model_manager>
  <scenario_manager>
    <scenario>
      ....
    </scenario>
    <scenario>
      ....
    </scenario>
  </scenario_manager>
  <results_manager>
    ....
  </results_manager>
</opus_project>
Question: do we want to use these tags, or should everything be 'item'?
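For a feel of what the GUI side would do with a file like this, here is a short Python sketch using the distinct tags from the skeleton. The scenario names, the name and parent attributes, and the attribute value "True" are assumptions made for the example; only the "executable" attribute itself comes from the description above. The helpers tab_node and executable_scenarios are hypothetical.

```python
import xml.etree.ElementTree as ET

# A filled-in variant of the skeleton; the scenario details are invented.
PROJECT_XML = """<?xml version='1.0' encoding='UTF-8'?>
<opus_project>
  <parent>parentproj.xml</parent>
  <description>a project ...</description>
  <data_manager/>
  <model_manager/>
  <scenario_manager>
    <scenario name="baseline" executable="True"/>
    <scenario name="high_growth" executable="True" parent="baseline"/>
    <scenario name="scratch"/>
  </scenario_manager>
  <results_manager/>
</opus_project>
"""


def tab_node(project, tab):
    """Return the major node for a tab, e.g. 'scenario_manager'."""
    node = project.find(tab)
    if node is None:
        raise KeyError("project has no <%s> node" % tab)
    return node


def executable_scenarios(project):
    """Names of scenarios that should get a run option on the right-click menu."""
    scenarios = tab_node(project, "scenario_manager").findall("scenario")
    return [s.get("name") for s in scenarios if s.get("executable") == "True"]


project = ET.fromstring(PROJECT_XML)
runnable = executable_scenarios(project)
```

With distinct tags the lookup is a plain find on the tag name. If everything were 'item', the same lookup would have to go through an attribute instead.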
Inheritance in the One-XML-File-Per-Project Design
This is more complicated than in Option A. We want to allow one scenario to inherit from another scenario in that same project, and we also need to allow a scenario to inherit from a default configuration, which would be stored in a different xml file. Inheritance is probably most important for scenarios, but we also want to have inheritance for model configurations. For the data manager tab, we want some sort of composition, to make it easy to share scripts as well as have personal ones -- this could be done with inheritance, or it could be a different compositional mechanism (since I'm guessing this won't typically involve overriding things, just combining sets).
Here are some possibilities for handling inheritance in this design. Consider how one scenario refers to a parent scenario:
- The parent scenario might be in the same project, or in a different one. If it's in a different project, we will need a path to the other project and then to the particular scenario in that project.
- Same as the first possibility, but parent projects are special and always have just one scenario. In this case if the parent scenario is in a different project we just need a path to the project.
- Inheritance relationships across projects are handled at the project level, rather than at the level of individual scenarios. For example, the psrc_parcel project would inherit from the urbansim_parcel project. Then the default scenario in the urbansim_parcel project (which might be called say default_scenario) would show up in the list of scenarios in psrc_parcel, although marked in grey as inherited. To inherit from this in another scenario in psrc_parcel you'd just name it, as with any other scenario in that project.
Right now I like the 3rd option. In fact we could even think of the relationship among projects as being composition rather than inheritance, and not allow name collisions (overriding names). Instead you just get the union. This makes it straightforward to have one project that has several other projects as sources of defaults -- this might be good for the data manager tab, if there are different places that people are contributing useful scripts.
Late-breaking idea: actually thinking of putting projects together using composition works well. Think of a project like a Python package. Then in effect psrc_parcel does this:
from urbansim_parcel import *
It's clear how to extend this to import from multiple other projects.
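Here is a minimal Python sketch of composing projects as a union with no overriding. It models each project as a (name, scenarios) pair of plain dictionaries, which is purely an assumption for illustration, and the compose function is hypothetical. Note one difference from a real "import *": Python silently lets a later import replace an earlier name, whereas this sketch rejects the collision.

```python
def compose(*projects):
    """Union the scenarios of several projects, refusing name collisions.

    Each project is a (project_name, {scenario_name: config}) pair.
    Returns the combined scenarios plus a map from scenario name to the
    project it came from (so the GUI can grey out inherited ones).
    """
    combined = {}
    origin = {}
    for project_name, scenarios in projects:
        for name, config in scenarios.items():
            if name in combined:
                raise ValueError(
                    "scenario %r is defined in both %s and %s"
                    % (name, origin[name], project_name))
            combined[name] = config
            origin[name] = project_name
    return combined, origin


# Hypothetical contents for the two projects mentioned above.
urbansim_parcel = ("urbansim_parcel", {"default_scenario": {"years": "2000-2030"}})
psrc_parcel = ("psrc_parcel", {"baseline": {"parent": "default_scenario"}})

scenarios, origin = compose(urbansim_parcel, psrc_parcel)
# Scenarios that came from elsewhere would be shown in grey.
inherited = sorted(n for n, src in origin.items() if src != "psrc_parcel")
```

Passing more than two projects to compose covers the case of several sources of defaults.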
Additional Features
templates vs defaults -- explain difference. Could provide templates if we want.
