Institute for Software Research
School of Computer Science, Carnegie Mellon University


Simulation Validation for Societal Systems

Alex Yahja

September 2006

Ph.D. Thesis

Keywords: Simulation, multi-agent, social-network, validation, model improvement, knowledge-based systems, ontology, inference, knowledge management, causality, hypothesis building, hypothesis testing, experiment, empirical artificial intelligence

Simulation models, particularly those used for evaluation of real-world policies and practices, are growing in size and complexity. As the size and complexity of a model increase, so do the time and resources needed to validate it. Multi-agent network models pose an even greater challenge for validation as they can be validated at the individual actor, the network, and/or the population level. Validation is crucial for acceptance and use of simulations, particularly in areas where the outcomes of the model will be used to inform real-world decisions. There are, however, substantial obstacles to validation. The nature of modeling means that there are implicit model assumptions, a complex model space and interactions, emergent behaviors, and uncodified and inoperable simulation and validation knowledge. The nature of the data, particularly in the realm of complex socio-technical systems, poses still further obstacles to validation. These include sparse, inconsistent, old, erroneous, and mixed-scale data. Given all these obstacles, validating modern multi-agent network simulation models of complex socio-technical systems is such a herculean task that it often takes large groups of people years to accomplish. Automated and semi-automated tools are needed to support validation activities and so reduce the time and number of personnel needed.

This thesis proposes such a tool. It advances the state of the art of simulation validation by using knowledge and ontological representation and inference. Advances are made at both the conceptual level and the implementation (tool) level.

A conceptualization is developed of how to construct a reasoning system for simulation validation. This conceptualization sheds light on the relationships between simulation code, process logic, causal logic, conceptual model, ontology, and empirical data and knowledge. In particular, causal logic is employed to describe the cause-and-effect relationships in the simulation, and "if-then" rules closely tied to those relationships encode how causal parameters and links should change given empirical data. The actual change is based on minimal model perturbations. This conceptualization facilitates the encoding of simulation knowledge and the automation of validation. As a side effect, it also paves the way for the automation of simulation model improvement.

Based on this conceptualization, a tool is developed. This tool, called WIZER for What-If Analyzer, was implemented to automate simulation validation. WIZER makes the model assumptions explicit, handles a complex model space and interactions, captures emergent behaviors, and facilitates codification and computer-processing of simulation and validation data. WIZER consists of four parts: the Alert WIZER, the Inference Engine, the Simulation Knowledge Space module, and the Empirical/Domain Knowledge Space module.

The Alert WIZER is able to characterize simulation data with assistance from statistical tools it can semantically control, compare the data to the empirical data, and produce a symbolic or semantic categorization of both the data and the comparison. The Inference Engine is able to perform both causal and "if-then" rule inferences. The causal inferences capture the core workings of the simulations, while the "if-then" rule inferences hint at which model parameters or links need to change given the symbolic categories from the Alert WIZER. Both kinds of rule inferences have access to ontology.
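
The symbolic categorization step described above can be illustrated with a minimal sketch. This is not WIZER's code: the function name, the category labels, and the use of a simple mean compared against an empirical range are all assumptions made for illustration.

```python
from statistics import mean


def categorize(simulated, empirical_low, empirical_high):
    """Return a symbolic label for the simulated mean relative to an
    empirical range. Labels and comparison rule are hypothetical."""
    m = mean(simulated)
    if m < empirical_low:
        return "below_empirical_range"
    if m > empirical_high:
        return "above_empirical_range"
    return "within_empirical_range"


# Illustrative values only; these are not data from the thesis.
label = categorize([0.1, 0.2, 0.3], empirical_low=0.25, empirical_high=0.5)
print(label)
```

A label of this kind, rather than the raw numbers, is what an "if-then" rule would take as its input.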

The Inference Engine is in the form of a forward-chaining production system but with knowledge-based and ontological conflict resolution. It performs minimal model perturbations based on knowledge bases and ontology. The perturbations result in new parameter values and/or meta-model values best judged to move the simulator closer to validity for the next cycle of simulation. Both the simulation knowledge space and the domain knowledge space are in the form of a graph, with nodes representing entities, edges representing relationships, and node attributes representing properties of the entities. Knowledge-based and ontological reasoning is performed on both knowledge spaces. A simple hypothesis can be formed by search and inference in the knowledge bases and ontologies.
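
As a rough illustration of a forward-chaining production system with conflict resolution, consider the sketch below. It is a generic textbook-style loop, not the WIZER Inference Engine: the Rule class, the integer priority (a stand-in for knowledge-based and ontological conflict resolution), and all fact and rule names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    condition: frozenset  # facts that must all be present for the rule to fire
    conclusion: str       # fact added to working memory when the rule fires
    priority: int = 0     # hypothetical stand-in for conflict resolution


def forward_chain(facts, rules):
    """Fire one rule per cycle until no rule can add a new fact."""
    facts = set(facts)
    fired = []
    while True:
        # Conflict set: rules whose conditions hold and whose conclusion is new.
        candidates = [r for r in rules
                      if r.condition <= facts and r.conclusion not in facts]
        if not candidates:
            return facts, fired
        # Conflict resolution: pick the highest-priority candidate.
        chosen = max(candidates, key=lambda r: r.priority)
        facts.add(chosen.conclusion)
        fired.append(chosen.name)


# Hypothetical rules linking a symbolic category to a suggested perturbation.
rules = [
    Rule("R1", frozenset({"incidence:above_empirical_range"}),
         "suggest:decrease_transmission_rate", priority=2),
    Rule("R2", frozenset({"incidence:above_empirical_range"}),
         "suggest:decrease_contact_frequency", priority=1),
    Rule("R3", frozenset({"suggest:decrease_transmission_rate"}),
         "perturb:transmission_rate:one_step_down", priority=3),
]

final_facts, fired = forward_chain({"incidence:above_empirical_range"}, rules)
print(fired)
```

In this sketch the priority field decides which applicable rule fires first. In the system described above, that choice is made from the knowledge bases and the ontology instead.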

Several validation scenarios on two simulation models are used to demonstrate that WIZER is general enough to assist in validating diverse models. The first model is BioWar, a city-scale multi-agent social-network model of weaponized disease spread in a demographically realistic population with naturally occurring diseases. The empirical data used for the WIZER validation of BioWar comes from the National Institute of Allergy and Infectious Diseases and other sources. The second model is CONSTRUCT, a model for co-evolution of social and knowledge networks under diverse communication scenarios. The empirical data used for the WIZER validation of CONSTRUCT comes from Kapferer's empirical observation of the workers and management of a tailor shop in Zambia.

The results of the BioWar validation exercise show that the simulated annual average influenza incidence and the relative timing of the peaks of incidence, school absenteeism, and drug purchase curves can be validated by WIZER in a clear and concise manner. The CONSTRUCT validation exercises show that the simulated average probability of interaction among workers and the relative magnitude of the change of the simulated average probability of interaction between different groups can be matched against empirical data and knowledge by WIZER. Moreover, the results of these two validation exercises indicate the utility of the semantic categorization ability of the Alert WIZER and the feasibility of WIZER as an automated validation tool. One specific CONSTRUCT validation exercise indicates that WIZER facilitates "what-if" questions for the purpose of model improvement, and that the amount of necessary search is significantly less and the focus of that search is significantly better using WIZER than using Response Surface Methodology.

Tools such as WIZER can significantly reduce the time for validation of large-scale simulation systems. Such tools are particularly valuable in fields where multi-agent systems are needed to model heterogeneous populations and diverse knowledge, such as organizational theory, management, knowledge management, biomedical informatics, modeling and simulation, and policy analysis and design.

246 pages
