XML Configuration Files Structure
Here are some notes on how to organize the XML configuration files: several design options, followed by a recommendation of which one to select.
Desired Features
- All aspects of a project (estimation, model configuration, results, etc.) are in directories under the given project, or in a single xml file.
- Integrated with subversion for storage. (Ideally, support a database for storage as well.) We want both project information that is shared by all members of the team and individual configurations.
Option 1
In this option, a project consists of multiple xml files, assembled using 'include' commands. The GUI also shows the tree structure of the directories containing the project directories.
projects
    standard_projects
        urbansim_gridcell
            data_manager
                ....
            model_manager
                residential_development_project_location_choice_model.xml
                commercial_development_project_location_choice_model.xml
                ....
                defaults
                    development_project_location_choice_model_defaults
            run_manager
                main.xml
                advanced
                    datasets_to_cache_after_each_model.xml
                    datasets_to_preload.xml
                    other.xml
                data
                    baseyear_database.xml
                    cache_configuration.xml
                    services_configuration.xml
                model_system
                    ....
            results_manager
                ....
        urbansim_parcel
            ....
        eugene_gridcell
            ....
        psrc_gridcell
            ....
        psrc_parcel
            ....
    personal_projects
        psrc_parcel_with_enhanced_location_choice_model
            ....
        ....
The top-level directory is projects. Under that are subdirectories, sub-subdirectories, and so forth. Eventually you get down to the directories that correspond to a particular project. In the example above, urbansim_gridcell is a project, psrc_gridcell is a project, etc. The GUI knows that something is a project directory if it includes a hidden file .opusproject. So for example the psrc_gridcell directory includes a file .opusproject. For now the contents of this file are ignored (maybe later we'll have a use for it).
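A minimal sketch of how the GUI could discover project directories from the .opusproject marker. The function name find_project_dirs is hypothetical, and the sketch assumes that projects are never nested inside other projects.

```python
import os

MARKER = ".opusproject"  # marker file name from the notes; its contents are ignored


def find_project_dirs(root):
    """Return the sorted paths of all directories under root that hold the marker."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if MARKER in filenames:
            found.append(dirpath)
            # Assumption: a project never contains another project,
            # so there is no need to descend any further here.
            dirnames[:] = []
    return sorted(found)
```

Calling find_project_dirs on the top-level projects directory would return one path per project, whatever its depth under standard_projects or personal_projects.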
Within each project directory there are subdirectories corresponding to the tabs in the GUI: data_manager, model_manager, run_manager, and results_manager. This particular structure is required, and given a project the GUI code for each tab knows to look in that subdirectory for its xml files.
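As a sketch of this convention, the code for each tab could look up its xml files as follows. The helper names missing_tab_dirs and xml_files_for_tab are hypothetical; only the four subdirectory names come from these notes.

```python
from pathlib import Path

# The four required subdirectories, one per GUI tab.
TAB_DIRS = ("data_manager", "model_manager", "run_manager", "results_manager")


def missing_tab_dirs(project_dir):
    """Return the required tab subdirectories that the project lacks."""
    project = Path(project_dir)
    return [name for name in TAB_DIRS if not (project / name).is_dir()]


def xml_files_for_tab(project_dir, tab):
    """Return the xml files under a tab's subdirectory, relative to that subdirectory."""
    if tab not in TAB_DIRS:
        raise ValueError("unknown tab: %s" % tab)
    tab_dir = Path(project_dir) / tab
    return sorted(p.relative_to(tab_dir).as_posix() for p in tab_dir.rglob("*.xml"))
```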
In the example above, run_manager is fleshed out: the subdirectories under it are the ones that are currently in the inprocess directory, with two changes. First, the contents of the models directory are moved to the model_manager tab. Second, the file baseline.xml is renamed to main.xml. Having a convention that this is the name for the main file, analogous to index.html in web pages, seems useful.
Note that the directory structure is not necessarily the same as what gets turned into an old-style configuration and passed to other parts of the system -- for example, main.xml under run_manager has relative paths to the other components, and these get assembled into one old-style configuration (as in the current code). Also the 'parent' field will in general span across projects: for example, the parent of the eugene_gridcell project is urbansim_gridcell (which holds the general defaults for gridcell projects).
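To illustrate the assembly step, here is a rough sketch that expands includes with relative paths into a single tree. These notes do not spell out the include syntax, so the include element with an href attribute is purely an assumption, as is the function name assemble. The sketch ignores the 'parent' field and does not guard against circular includes.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def assemble(path):
    """Parse an xml file and splice in every file it includes, recursively."""
    path = Path(path)
    root = ET.parse(str(path)).getroot()
    _expand(root, path.parent)
    return root


def _expand(element, base_dir):
    for index, child in enumerate(list(element)):
        if child.tag == "include":  # hypothetical syntax: <include href="..."/>
            # Paths are resolved relative to the file that contains the include.
            element[index] = assemble(base_dir / child.get("href"))
        else:
            _expand(child, base_dir)
```

Under these assumptions, assemble applied to main.xml under run_manager would return one tree, which could then be converted into an old-style configuration.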
Projects need to know how to update and commit to subversion. This would be done with hidden .svn directories in the project directories. The standard_projects directory is for shared configurations that are part of the Opus release, and would be in a standard_projects project in svn. The personal_projects directory would be checked into a personal subdirectory of a personal_configurations project, or into the sandbox. Issue: how exactly should we organize our subversion repository to support this? The Atlantis MPO that wanted to have projects shared across the agency could have another directory atlantis_projects, at the same level as standard_projects and personal_projects.
Tree View
Right now the tree view in the GUI shows the tree for just a single xml file in a window; you can open other xml windows. In this option, all of the directories and files under e.g. run_manager are in a tree view as well. Then you could see all the parts that are available, and collapse or expand them as needed. In this case, replace the right-click action that opens a new tree pane with something that says "highlight and expand this part of the tree" (e.g. for a parent). One potential problem with this design is that it doesn't distinguish between the tree of directories and files, and the tree of xml within a file. (Mostly this wouldn't matter to the user, but it does make a difference, for example, when checking into subversion and resolving conflicts.) We could have some fairly subtle distinction in the tree view between nodes that represent directories and files, and nodes that are parts of the xml structure within a file. But overall these all live in one big tree view.
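One way to keep that distinction available to the GUI is to record a kind on every node of the combined tree. This is only a sketch: the TreeNode class and build_tree function are hypothetical, and skipping hidden entries such as .svn and .opusproject is an assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TreeNode:
    label: str
    kind: str  # "directory", "file", or "element"
    children: list = field(default_factory=list)


def build_tree(path):
    """Build one tree spanning directories, files, and the xml inside each file."""
    path = Path(path)
    if path.is_dir():
        # Assumption: hidden entries are not shown in the tree view.
        entries = [p for p in sorted(path.iterdir()) if not p.name.startswith(".")]
        return TreeNode(path.name, "directory", [build_tree(p) for p in entries])
    node = TreeNode(path.name, "file")
    if path.suffix == ".xml":
        node.children = [_from_element(ET.parse(str(path)).getroot())]
    return node


def _from_element(element):
    return TreeNode(element.tag, "element", [_from_element(c) for c in element])
```

The view could then draw "directory" and "file" nodes slightly differently from "element" nodes, and the subversion commands could be offered only on the first two kinds.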
Extensions
We could also have project templates, e.g. urbansim_parcel_template. Their purpose in life is to be copied as the starting point for a new project. But users may find it just as convenient to use an existing real project for this, e.g. psrc_parcel. Proposal: start without these. It will be easy to add them later if we want, since they don't require any new functionality.
Option 2
Here we just show one project in the GUI at a time, rather than showing the tree view of different projects. For parents, show them inline but greyed out (because they can't be edited); have a command that copies nodes from the parent down into the current project if you want to edit them. (Actually we had this as an earlier design but haven't kept it alive.) A given project would still have multiple xml files that are composed (as in the urbansim project currently).
Otherwise this is like Option 1 -- in particular, we compose a project from multiple xml files.
Option 3
In this option we always have a 1:1 mapping between projects and xml files --- there is just one xml file per project. The GUI just shows one project at a time - if you want to look at another project you open it in a separate GUI.
The xml file has elements corresponding to the tabs in the GUI -- when you open a project, the GUI picks out that element to show in the currently selected tab. Here is the skeleton of the xml file for the default UrbanSim configuration in this scheme:
<project>
  <data_manager>
  </data_manager>
  <model_manager>
  </model_manager>
  <run_manager>
    <data>
      ....
    </data>
    <model_system>
      ....
    </model_system>
  </run_manager>
  <results_manager>
  </results_manager>
</project>
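A small sketch of how the GUI could pick out the element for the currently selected tab, using Python's standard ElementTree module. The function name element_for_tab is hypothetical, and the skeleton is abbreviated with empty elements in place of the elided contents.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the skeleton; the elided contents are left empty.
SKELETON = """<project>
  <data_manager/>
  <model_manager/>
  <run_manager>
    <data/>
    <model_system/>
  </run_manager>
  <results_manager/>
</project>"""


def element_for_tab(project_root, tab_name):
    """Return the direct child of <project> that belongs to the given tab."""
    element = project_root.find(tab_name)
    if element is None:
        raise KeyError("project has no <%s> element" % tab_name)
    return element


root = ET.fromstring(SKELETON)
run_manager = element_for_tab(root, "run_manager")
print([child.tag for child in run_manager])  # ['data', 'model_system']
```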
This is simpler than Options 1 and 2 with respect to both the GUI implementation and storing projects.
The reasons for having multiple files are:
- to make it easier to share parts of the xml
- to have fewer collisions when different people edit a configuration and store into subversion
- more opportunities for sharing among multiple projects
- to make the xml configurations more understandable when reading or editing the xml directly (rather than with the GUI)
Sharing within a project can be accomplished another way, by changing the include command to reference other parts of the xml tree rather than other files. Would we want the capability to include external files? This would allow different ways of sharing, in addition to the inheritance link. Right now I'd say no: the configurations get too intertwined and hard to understand. (This is certainly an issue with the configurations represented as Python code.)
Collisions may still be an issue; with a single file they will certainly be more likely.
Regarding readability of the xml, this is also an issue, but not a huge one -- the GUI should become the standard way to manipulate the xml.
How to Edit the Parent Configuration
A project could still have a link to an external parent (e.g. psrc_parcel would inherit from urbansim_parcel), and we need a way to inspect and potentially edit the parent configuration. There are several reasonable choices here:
- show the parent inline but greyed out (because it can't be edited); have a command that copies nodes from the parent down into the current project if you want to edit them (as suggested in Option 2).
- have a command to open another pane on the parent configuration (as in the current GUI)
- have a command to open another GUI window on the parent configuration
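The first choice can be sketched as follows. The sketch assumes that a node in the current project overrides the parent's node with the same tag, which these notes do not specify; the function names merged_view and copy_down are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def merged_view(parent, child):
    """Return (node, inherited) pairs; inherited nodes would be shown greyed out."""
    own_tags = {node.tag for node in child}
    view = [(node, False) for node in child]
    # Assumption: a child node overrides the parent node with the same tag.
    view += [(node, True) for node in parent if node.tag not in own_tags]
    return view


def copy_down(parent, child, tag):
    """Copy the parent's node into the child so it can be edited there."""
    existing = child.find(tag)
    if existing is not None:
        return existing
    source = parent.find(tag)
    if source is None:
        raise KeyError(tag)
    node = copy.deepcopy(source)  # the parent configuration stays untouched
    child.append(node)
    return node
```

After copy_down, the node no longer appears as inherited in merged_view, and edits to it affect only the current project.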
Option 4
Like Option 3, but the GUI also shows the tree structure of projects. (One objection to Option 3 may be that there isn't a tool to browse through all of the projects -- doing so is left up to the Finder or File Explorer or whatever of the underlying operating system.)
Recommendation
Right now I recommend we go with Option 3, with the possibility of going to Option 4 later. (This is basically add-on functionality.) The main tradeoff is that Options 1 and 2 offer more opportunities for sharing and less chance of collision when multiple people are editing, while Option 3 offers a much simpler GUI appearance and implementation, and simpler rules for loading and storing projects.
