This is a working page for describing the AZ-SMART Data Model
Download the latest data model diagram
https://trondheim.cs.washington.edu/svn/cuspa/az_smart/datamodel/AZ-SMART_composite_data_model.vsd
https://trondheim.cs.washington.edu/svn/cuspa/az_smart/datamodel/AZ-SMART_composite_data_model.png
General Notes on Tables and Diagram
The data model diagram and the following description was designed to show the tables and fields needed for a full AZ-SMART model run. In most cases the fields and tables listed are required by the model system, but there are exceptions. Examples of exceptions are the many of the readable descriptions such as '_types' tables and '_name' fields (see Naming Conventions). While these tables and fields are technically ignored by the model system, they are desirable for readability. They also represent typologies that must be created. Effort has been made to list the exceptions in the brief narrative descriptions below.
The data model diagram and the following descriptions were designed to be database agnostic, that is, these tables could exist in any DBMS or flat file structure(e.g. MSSQL, MySQL, MS Access, or .csv + shapefiles). Therefore, DBMS specific fields like OBJECTID (from ArcSDE) are generally left out of the diagram, with the exception of when the table really does represent geometry. Other details not present here concern data type specifics such as precision and scale. Some DBMS systems do not require these details (e.g. Personal Geodatabase).
In addition to required and optional tables and fields, others fields and tables may be added and would either be ignored (default behavior) or could be utilized with code or configuration changes to the model system. One example of new fields to be utilized would be to add additional development constraints to the development_constraints and parcels tables, such as 'is_in_urban_growth_boundary.'
Naming Conventions
- Tables are named in all lower case letters with underscores where spaces are desired
- For maximum clarity all table and field names are fully spelled out with a few exceptions(e.g. sqft)
- Most tables are standard database tables, the exceptions are those with '_geo' appended to the end. The '_geo' tables are actual feature classes or shapefiles that contain the shape geometry
- Tables ending in '_types' are look-up tables to give readable names to the numeric codes used elsewhere in the database, although these tables may occasionally contain information used by the model system when it makes sense
- Fields ending with '_id' are primary or foreign keys and are sequentially numbered integers
- Fields ending with '_name' are typically readable descriptions
List of Tables
Each table is outlined here with a brief narrative description and a field list with a description. For field data types, see the data model diagram above. In the field descriptions, every attempt has been made to show key relationships with a >, and a hyperlink to the related table.
parcels
This table contains attributes about parcels.
- Fields
- parcel_id - unique identifier
- shape - refers to the actual parcel geometry in the geodatabase
- zone_id - id number for the zone that the parcel's centroid falls within
- ownership_type_id - identifies the category of the parcel owner
- land_use_type_id - identifies the land use of the parcel
- city_id - id number for the city that the parcel's centroid falls within
- county_id - id number for the county that the parcel's centroid falls within
- plan_type_id - id number that identifies the parcel's plan type
- mpa_id - id number for the mpa that the parcel's centroid falls within
- census_block_id - id number for the census block that the parcel's centroid falls within
- raz_id - id number for the raz that the parcel's centroid falls within
- parcel_sqft - square feet of the parcel as an integer
- assessor_parcel_id - original tax assessor's id number
- tax_exempt_flag - identifies parcel as tax exempt or not
- is_in_flood_plain - indicates whether or not a parcel is in a flood plain
- is_on_steep_slope - indicates whether or not a parcel is on a steep slope
- is_in_fault_zone - indicates whether or not a parcel is in a fault zone
- is_in_archaeological_area - indicates whether or not a parcel is in an archaeological area
- centroid_x - state plane x coordinate of parcel centroid
- centroid_y - state plane y coordinate of parcel centroid
- land_value - value of the land from the assessor
buildings
This table contains attributes associated with buildings.
- Fields
- building_id - unique identifier
- parcel_id - identifies which parcel the building sits on
- building_type_id - integer identifying the building type
- land_area - describes the land area occupied by the building itself + associated infrastructure (e.g. lawn, parking lot, etc.)
- non_residential_sqft - the total sqft of non residential space in the building
- residential_units - the number of residential units in the building
- sqft_per_unit - the average number of sqft per unit of residential space
- year_built - the year the building was constructed
- building_quality_type_id - describes building quality
- improvement_value - the improvement value from the assessor or other source
households
This is a table of households. It contains one record for each household in the modeled region. This information is typically estimated or "synthesized." There are household synthesizers available that can generate the information needed for this table from census data. Usually source data for this table consists of SF1, SF3, CTPP, and PUMS data from the U.S. Census.
- Fields
- household_id - unique identifier
- building_id - identifies which building the household lives in *One way to assign households a building is by using a location choice model whereby each household probabilistically selects a building based on its and the building's characteristics
- persons - number of persons in the household
- workers - number of workers in the household
- age_of_head - the age of the head of household
- income - the income of the household as an integer
- children - the number of children in the household
- cars - the number of cars owned by the household
jobs
This table contains a record for each job in the modeled region. This information can be obtained from commercial or government sources.
- Fields
- job_id - unique identifier
- building_id - identifies which building the business occupies
- One way to assign jobs to buildings is to geocode the businesses which gives them a parcel_id, then assigning them "up" to the building_id record according to the building that sits on the parcel
- A location choice model could also be utilized
- job_sector_type_id - identifies the sector of the business (e.g. SIC, NAICS, other)
- building_type - corresponds to job_building_types table, determines if the job is home based or not
zones
This table refers to Traffic Analysis Zones (TAZs). It could hold any attributes recorded or maintained at a TAZ level, it would also link up (via the zone_name) to the actual feature class geometry.
- Fields
- zone_id - unique identifier (likely the same as taz_id)
- shape - refers to the actual parcel geometry in the geodatabase
- taz_id - unique identifier for linking to TAZ geometry
ownership_types
This is a lookup table that gives readable values to ownership types. Ownership types could be any typology that one may want to impose, for instance, state lands, federal lands, tribal lands, etc. Once identified, ownership types could be treated in a special manner in the modeling system.
- Fields
- ownership_type_id - unique identifier
- ownership_type_name - readable name to identify ownership type
land_use_types
This is a lookup table to give land uses readable names. Land use types could be any typology that a modeler may want to represent.
- Fields
- land_use_type_id - unique identifier
- generic_land_use_type_id - identifies the more aggregate land use type that this land use type falls into
- land_use_type_name - readable name to identify land use type
- land_use_code - MAG/PAG land use code
- land_use_type_description - readable description of the land use type
- unit_name - readable name that is used by the model so that it can interpret what units are used for different land use types: building_sqft for non-residential and mixed-use buildings, parcel_sqft for vacant parcels, residential_units for residential buildings
generic_land_use_types
This is a lookup table for a more general or aggregate level of land use types.
- Fields
- generic_land_use_type_id - unique identifier
- generic_land_use_type_name - readable name that identifies generic land use types
- generic_land_use_type_code - MAG/PAG level 1 land use code
cities
This table refers to cities within the study area. It could hold any attributes recorded or maintained at a city level, it would also link up (via the city_name) to the actual feature class geometry.
- Fields
- city_id - unique identifier
- shape - refers to the actual parcel geometry in the geodatabase
- city_name - readable name to identify the city
counties
This table refers to counties within the study area. It could hold any attributes recorded or maintained at a county level, it would also link up (via the county_name) to the actual feature class geometry.
- Fields
- county_id - unique identifier
- shape - refers to the actual parcel geometry in the geodatabase
- county_name - readable county name
- county_fips - FIPS code for county
plan_types
The plan types table identifies the various plan types derived from the general plan. These are things like 'Low Density Single Family Residential,' 'Mid-Rise Commercial,' or 'High Density Mixed Commercial/Single Family.' The values (e.g. min/max FAR or DU/acre) that specify what these plan types mean are spelled out in the development_constraints table.
- Fields
- plan_type_id - unique identifier
- plan_type_name - readable name that describes the plan type
mpas
This table refers to MPAs within the study area. It could hold any attributes recorded or maintained at a MPA level, it would also link up (via the mpa_name) to the actual feature class geometry.
- Fields
- mpa_id - unique identifier
- mpa_name - readable name to ide
- shape - refers to the actual parcel geometry in the geodatabasentify the MPA
census_block_lookup
This table allows for looking up which census geography a particular parcel falls within. It could be used for variable aggregation purposes (e.g. summarize population density at the block group level).
- Fields
- census_block_id - unique identifier, integer sequentially assigned
- block_id - the nationally unique full 15 digit census block identifier which concatenates the state, county, tract, block group, and block numbers into one. For example a value of 530330201002999 would identify: '53' (WA State), '033' (King County), '020100' (tract), '2' (block group), and '999' (block).
- block_group_id - the nationally unique full 12 digit census block group identifier which concatenates the state, county, tract, and block group numbers into one.
- tract_id - the nationally unique full 11 digit census tract identifier which concatenates the state, county, and tract numbers into one.
- county_id - the unique county FIPS code
razes
This table refers to RAZs within the study area. It could hold any attributes recorded or maintained at a RAZ level, it would also link up (via the raz_name) to the actual feature class geometry.
- Fields
- raz_id - unique identifier
- shape - refers to the actual parcel geometry in the geodatabase
- raz_name - identifier to link up to feature class geometry
building_types
This table identifies a typology of building uses. Often this will be the same as the land use, but in some cases it may make sense for it to be different. For instance, in the case where 4 buildings reside on a parcel, and 3 of them are residential with 1 being commercial, the land use might be residential but the buildings identified with their own use.
- Fields
- building_type_id - unique identifier
- generic_building_type_id - identifies the more aggregate building type this building type falls into
- building_type_name - readable identifier for building type
- is_residential - identifies whether or not the building type is residential
- unit_name - identifies the units used by the modeling system for this type of building (e.g. building_sqft, parcel_sqft, or residential_units)
generic_building_types
This table identifies a more aggregate level of building types.
- Fields
- generic_building_type_id - unique identifier
- generic_building_type_name - readable name for generic building type
job_sector_types
This table identifies business sectors that the model utilizes.
- Fields
- job_sector_type_id - unique identifier
- business_sector_name - readable name identifying business sector
- sqft_per_employee - the number of sqft each employee occupies
- This can be based on the business_sector_id by assuming a sqft/employee for each sector
development_constraints
This table represents constraints on new development. There are two parts to this table that are not immediately apparent just by examining the attributes:
- Selection criteria fields - These fields could be any parcel attribute that the modeler wishes to use to constrain development. Many examples will be geographical in nature: is_in_floodplain or city_id. Other examples could represent other parcel attributes such as ownership_id. These fields are expandable to include any attribute of a parcel the modeler may want to restrict development by. The same fields must exist in the parcels table for the models to find matches in this table.
- Constraint fields - These fields indicate the nature of development allowed under a certain plan_type. The minimum and maximum fields are informed by the constraint_type field which indicates the unit of measurement of the constraint (e.g. units per acre or FAR).
The models are designed for the most restrictive constraints to be used in the case of multiple matches on selection criteria.
- Fields (selection criteria)
- plan_type_id - identifier for the plan type
- ownership_type_id - allows constraints to be implemented on a particular ownership type
- city_id - allows particular cities to have their own specific development constraints
- county_id - allows particular counties to have their own specific development constraints
- is_in_floodplain - allows floodplains to have their own specific development constraints
- is_on_steep_slope - allows steep slopes to have their own specific development constraints
- is_in_fault_zone - allows fault zones to have their own specific development constraints
- is_in_archaeological_area - allows archaeological areas to have their own specific development constraints
- Fields (constraints)
- generic_land_use_type_id - specifies which land use type is allowed
- constraint_type - indicates the unit of measure for the 'minimum' and 'maximum' fields
- minimum - number indicating the minimum # allowed
- maximum - number indicating the maximum # allowed
- Fields (other)
- development_constraint_id - unique identifier
travel_data
This table contains travel data generated by an external travel model. The fields represented below are the approximate a minimum number of fields on travel times, but additional travel information can be leveraged by the model system. The table would be updated throughout a simulation run by the external travel model, but would need to be populated for the base year.
- Fields
- from_zone_id - a TAZ identifier from the zones table
- to_zone_id - a TAZ identifier from the zones table
- am_single_vehicle_to_work_travel_time - the time taken for a single occupant vehicle traveling to work during the AM peak time
- am_transit_to_work_travel_time - the time taken for a transit trip to work during the AM peak time
development_project_proposals
A record in this table, when combined with one or more records in the development_project_components table, represents a "known" development project. This table would be populated with projects known to be coming in the future. This table would also be populated during a simulation run for projects that are not yet complete, in other words, projects that are in the middle of developing according to their velocity function. It is entirely possible for a simulation run to happen without pre-populating this table with records.
- Fields
- development_project_id - unique identifier
- development_template_id - indicates the development template that represents the project
- far - floor to area ratio of the project
- percent_open - the percent of the land area of the project accounted for by "overhead" uses such as rights of way or open space
- development_status - this represents active, proposed, or planned developments
- parcel_id - indicates the parcel_id on which the development occurs
- start_year - the year in which this project is expected to begin building
- built_sqft_to_date - the number of non-residential sqft built in the current simulation year
- built_units_to_date - the number of residential units built in the current simulation year
development_project_proposal_components
A record in this table represents a portion of a development project identified in the development_projects table. In some sense a single record here is meant to represent a single building, or part of a building. Therefore individual records here do not necessarily represent single free-standing buildings, although they are mostly treated that way. This table allows for the flexible representation of mixed uses to occur on a parcel. Examples include multiple free-standing buildings with different uses, a single building with multiple uses inside of it (a single record for each use), or further complex representations of mixed use.
- Fields
- development_project_component_id - unique identifier
- development_project_id - identifies which development the project belongs to
- velocity_function_id - identifies the rate or function by which the project develops over time
- percent_of_building_sqft - identifies the percentage of the building that this component takes up
- 100% would indicate a free-standing building with a single use
- Several records with percent_of_building_sqft adding up to 100% would indicate a multiple use single building.
- construction_cost_per_unit - the per unit construction cost for residential uses only
- sqft_per_unit - the square footage per residential unit
- building_type_id - indicates the building type of this particular component
- land_area - the land area "claimed" by the building component
- This includes not only the building footprint but also additional land used such as yards, parking lots, etc.
- residential_units - the number of residential units in the building component
development_templates
This table, along with corresponding records in the development_template_components table, represents development templates. This table is roughly equivalent in nature to the development_projects table. This table is meant to be a repository of development templates to be drawn from when simulating new development and needs to be populated with a number of templates for the modeling system to utilize.
- Fields
- development_template_id - unique identifier
- percent_open - the percent of the land area of the project accounted for by "overhead" uses such as rights of way or open space
- min_land - minimum amount of land in square feet to be utilized for this development
- max_land - maximum amount of land in square feet to be utilized for this development
- density_type - a readable name that describes the 'density' field: units per acre, FAR
- density - indicates the density of the development
- land_use_type_id - specifies the land use type for the development template
- development_type - a readable name that describes the type of development this record represents (e.g. SFR-parcel, MFR-apartment, MFR-condo, etc.), this field is not used by the model and is there to make the table more readable
development_template_components
This table is roughly equivalent to the development_project_components table and represents buildings or parts of buildings to be included in a particular development template.
- Fields
- development_template_component_id - unique identifier
- development_template_id - indicates which development template this component belongs to
- velocity_funtion_id - indicates the velocity function used by this template
- building_type_id - indicates the building type of this particular component
- percent_of_building_sqft - identifies the percentage of the building that this component takes up
- construction_cost_per_unit - the per unit construction cost
- building_sqft_per_unit - the square footage per residential unit
velocity_functions
This table is designed to hold the velocity functions that specify the rate at which development is built out.
- Fields
- velocity_function_id - unique identifier
- annual_construction_schedule - this field will contain a numbered list of this form: [25, 50, 75, 100]
- a particular development_template_component or development_project_component will have one velocity_function_id attached to it
- annual_construction_schedule could take the form [0, 0, 0, 33, 66, 100] to offset a particular component a number of years from the development_project start_year
building_quality_types
This table identifies building quality types.
- Fields
- building_quality_type_id - unique identifier
- building_quality_name - readable name identifying building quality
target_vacancies
This table contains vacancy rates for each building type for each simulation year. The developer model, in each year, will check to see what the current vacancy rate is per building type then compare that against the corresponding target vacancy rate. If the current rate is higher than what is in this table, the developer model will not develop anything. If it is lower, and development constraints allow it, the developer model attempts to develop the area using templates until the target vacancy rate is met.
- Fields
- year - year in which vacancy rate applies
- building_type_id - identifies which building type the vacancy rate and year applies to
- target_vacancy_rate - the vacancy rate applied
employment_adhoc_sector_groups
See explanation of employment_adhoc_sector_group_definitions table.
- Fields
- group_id - unique identifier
- name - a readable name defining an employment group
employment_adhoc_sector_group_definitions
This table, combined with employment_adhoc_sector_groups, defines an adhoc aggregation of employment sectors. These employment group definitions are adhoc in the sense that there is no defined way to aggregate certain sectors together. Groups such as 'retail' or 'services' are fairly ambiguous. This table structure allows disaggregate employment sectors to be aggregated, and have overlapping categories. These tables are mainly used for variables and indicator calculations and not in the simulation system. For instance, we may want to ask the question of a simulation: what is the growth in services employment within the CBD? These tables allow us to define what 'services' employment actually is, then calculate these indicators.
- Fields
- job_sector_type_id - the job sector type (not unique, a job sector may appear in more than one group)
- group_id - the group id that the sector belongs to
household_characteristics_for_ht
This table provides the household transition model with information it needs to add new households in each simulation year.
- Fields
- characteristic - includes categories about households such as income, workers, children
- minimum - the representative minimum number that this category represents
- maximum - the representative maximum number that this category represents
demolition_cost_per_sqft
This table provides information to the developer model about the costs of demolition by building type. These numbers are used to calculate the cost of demolition of existing development so that a more accurate cost of redevelopment can be calculated.
- Fields
- building_type_id - building type
- demolition_cost_per_sqft - cost in dollars per sqft of demolition
building_sqft_per_job
This building contains information on the amount of space each job will take in a particular building type, by zone.
- Fields
- zone_id - the zone the record applies to
- building_type_id - the building type the record applies to
- building_sqft_per_job - the sqft per job each job will take in a particular building type in a particular zone
job_building_types
This table specifies which jobs are home based or not. This table currently exists as a convenience to the way the code is structured and could be moved directly into the jobs table.
- Fields
- id - unique identifier
- name - readable name (home_based or non_home_based)
- home_based - boolean value corresponding to the 'name' field
annual_relocation_rates_for_households
This table breaks down households into groups based on head of household age and income level, then assigns a probability of a household characterized in this way relocating in a given simulation year. The age and income level groups are arbitrary, and the rate of relocation can be set to zero to eliminate all relocation.
- Fields
- hh_rate_id - unique identifier
- age_min - minimum age of head of household
- age_max - maximum age of head of household
- income_min - minimum income of household
- income_max - maximum income of household
- probability_of_relocating - the probability that a household with these characteristics will relocate
annual_relocation_rates_for_jobs
This table assigns a probability of a job in a particular sector relocating in a given simulation year. The rate of relocation can be set to zero to eliminate all relocation.
- Fields
- job_rate_id - unique identifier
- job_sector_type_id - corresponds to the job_sector_types_table
- job_relocation_probability - the probability that a job in this sector will relocate
Control totals tables
It is expected that these tables are updated throughout a simulation run by an external modeling system. These tables are required by the model system and would need to be populated for the baseyear, but after that would be updated by the external modeling system throughout the simulation run.
Coefficients and Specification tables
These tables would be populated prior to a simulation run by a model estimation process. One coefficients and one specification table are needed per model that requires an estimation.
Specific questions from MAG addressed
- Business table questions
- This has changed to a 'jobs' table since DRAM/EMPAL provides employment control totals
- sqft_per_employee has moved to the job_sector_types table, we can make an informed assumption about this number unless there are data available to help us with this
- Rather than store NAICS codes and text in the jobs table, those are spelled out in the job_sector_types table
- OPEN QUESTION: the employment control totals table is currently contains generalized job types (e.g. "office" employment) whereas the jobs table will have a job_sector_type which corresponds to an actual NAICS identifier. This issue needs to be reconciled.
- Buildings table questions
- CUSPA:Your suggestion for a total_residential_sqft field, this is available to be computed at any time as you have shown by labeling it as 'derived,' however, the model system itself will not make use of this field, hence it is not in the data model
- CUSPA:The building_quality field was changed to building_quality_type_id and a corresponding building_quality_types table was added, this was an oversight of mine
- Development projects and components tables
- MAG: can we combine these tables?
- CUSPA: the short answer is no. The idea behind having separate 'X_projects' and 'X_project_components' tables (in both projects and templates) is to accommodate complex mixed use representations, and roughly correspond to the data structure used by SAM-IM (e.g. the develop and developc structure). It is possible to combine these into one table, but it would result in duplication of data and would also not correspond with the templates data structure.
- MAG: can we combine these tables?
- Control totals table questions
- MAG: Why do we need control totals at both MPA and RAZ level? Control totals should be flexible interms of the geography.
- CUSPA: We do not, we just wanted to show that the system is flexible in terms of control total geography
- MAG:How does the model deal with Industry Sector - Do we want to feed the model with Industry Sector Control Totals at County/MPA level and Land Use sector control totals at MPA/RAZ level?
- MAG: How will the model deal with non-resident population (seasonal - more than 2 weeks, transient - less than 2 weeks)
- CUSPA: The short answer is that it doesnt right now. We will need to talk about what is desired in this area by MAG.
- CUSPA: A suggestion for this and the various control total geographies is to simply use the current SAM-IM submodel outputs in the AZ-SMART Phase 1 model system. We can write scripts that incorporate the outputs from these submodels into AZ-SMART, this way MAG would not have to change the way these models are maintained. We will need to collectively decide what the model system should do with the submodel outputs.
- MAGDoes the model need group quarter control totals by type of group quarter (prisons, nursing homes, dormitories, military,etc.)
- CUSPA: We suggest using the current submodel structure from SAM-IM, see answer to question above.
- MAGIt is possible that the travel model zones are a different geography or extent. How will this be dealt with?
- CUSPA: Our suggestion is to decide if AZ-SMART will deal with across the entire travel model region. MAG/PAG will need to get the data to populate the baseyear database from other counties in this case. For those areas of the land use region that are not currently covered by the travel modeling, new TAZ areas will need to be defined with assumptions about how long it takes to get to those areas.
- MAG: Why do we need control totals at both MPA and RAZ level? Control totals should be flexible interms of the geography.
