RiskGroupsArchitecture

CategoryTopic RogerMateer (Developer) MaitRaag (Customer)

Abstract

This page describes the architecture of the RiskGroups Modgen microsimulation model as it stands at Release 1 (changeset 42:e15eaa7dfbdc) in the riskgroups repository. An older description of the same model (changeset 15:82fc488c4f6d) is available on the RiskGroups wiki page. Although that description is out of date, it complements this page with a fair account of the nature of the system. Unlike that page, this one tries to balance two conflicting goals: being specific enough that the reader can use it to make sense of the model's design, and being general enough not to need much maintenance as implementation details change. It also aims to be a forum for discussion about the architecture, so please feel free to add commentary and/or questions, annotated dialogue-style with your name, so that we can distinguish the "official description" from the discussion surrounding it. For example:
  • This is sample "official" content describing the system
MaitRaag: is this a question?
RogerMateer: this is the answer.
----
Features that make this Modgen model different from those usually encountered are as follows. Let me know where you'd like me to elaborate on any feature, or if you have questions about a feature that I've failed to cover.
  • It makes quite extensive use of C++ classes which do not correspond to actors (although Modgen also implements actors as classes).

  • It stores the parameters and information about the running state of the system in a set of abstract data types (ADTs) defined in TimeSeries.mpp. Value represents a multidimensional array of floating point numbers, where the number of dimensions and the size of each can be specified at run-time. It can be thought of as a map (in the mathematical sense) from Index instances (another ADT, representing a list of natural number indices) to floating point numbers. It was originally implemented as an instantiation of the Standard Template Library map class template, but that implementation proved to be orders of magnitude too slow and was replaced by one that encodes the contents of an Index instance in the single index of a flat array of floating point numbers. Finally, the TimeSeries ADT is a map from Time (essentially a wrapper for a single floating point value representing a point in calendar time) to Value. This allows one to represent the evolution of a multidimensional array of numbers over time, including such niceties as linearly interpolating a Value at a Time that lies between the Times of two stored Values, or outside the Time range of the stored Values.

  • It uses mappings (Map and its subclasses), allowing the details of different deterministic models to be mapped into the same general internal model. A single Map instance is created before the simulation starts and its various Value and TimeSeries entries (representing the parameters of the internal model) are populated using the parameters of the chosen external model input from the scenario .dat file. This is done so that the internal model parameters can be checked or viewed at appropriate points in the executable run (currently at the beginning and end of a simulation). Unfortunately, that theory doesn't translate too well into practice because some of the internal parameter arrays are so large that displaying them in the current generic way in the log file doesn't really give the log file reader a good idea of their structure. Map also contains a few other components which are not Value or TimeSeries instances.

  • Map contains a DirectedContactsEventTime instance, which handles the correspondence between the chosen method of specifying how directed contact rates vary over time and the method needed by the internal model.

  • Map also contains a PrevalenceResponse instance, which maintains a TimeSeries history of the sizes of all subpopulations so that it can compute the behavioural response factor for any event time method that needs it. The naive way this is done is most probably the major cause of the dismal performance that the system exhibits (only about 4000 events per hour on my machine), but that hypothesis still needs to be confirmed experimentally.

  • Finally, Map contains three behaviour mapping arrays (X, Xsend and Xrecv). For each transmission mode and pair of gender-risk-groups, they specify which behaviour should be presented to the PrevalenceResponse instance to determine the behavioural response for symmetric, directed-sending and directed-receiving contacts, respectively.

  • There is a legacy StatusToPrevalenceConverter class, whose purpose is to take the contents of the DiseaseStatus table after a simulation and produce the corresponding atomic and aggregate HIV prevalence values to populate the FooDiseasePrevalence table (for external model Foo). Most probably PrevalenceResponse can now do something quite similar with the population TimeSeries history it stores. Once we see how the outputs correspond, maybe StatusToPrevalenceConverter can be removed.

  • It makes use of a Stream class, whose purpose is to allow messages about the state of the system to be dumped efficiently to a log file (initially named Stream(log).txt, but renamed by the build superstructure to SCENARIO(streamlog).txt at the end of a run of scenario SCENARIO). Its method of constructing messages from strings and variables closely mimics the standard cout iostream model familiar to any C++ programmer. Each message is automatically annotated with its source code location, the time at which it is produced relative to the start of the program, and the delta time since the last message was produced. Such annotations make it relatively easy to interact with the program (over the course of a sequence of preferably short test-run cycles) to isolate the causes of performance problems in the absence of a profiler.

  • It makes use of a simple custom-made assertion-based testing infrastructure which can be used in both test and production code. A condition which we expect to be true is tested, and a message is sent to the log file iff the condition proves to be false. Test code is organised into suites (currently just Unit and Acceptance) using a Test actor and function hooks. The Unit suite is run before the simulation starts, and is used to test various properties of the abovementioned classes which make up the system. The Acceptance suite is run after the simulation (iff the simulation itself is run), and uses the contents of output tables to see whether some observed aspect of the overall simulation's behaviour corresponds (typically in some statistical sense) to some expectation about its behaviour which can be couched in terms of the internal model's parameter values.

  • The build superstructure uses a simple recursive file inclusion mechanism to build conventional scenario .dat files from common pieces. This is essential to preserve maintainability of scenarios while constructing fairly elaborate scenario sequences, such as we will need for the validation process of Release 2.

  • The build superstructure supports a command allowing the developer to autogenerate event families prior to compilation. This is not done automatically prior to every compilation because there are tradeoffs between compilation and run performance on the one hand, and family case coverage on the other. At some point, it would probably be good to investigate whether a form of the Gillespie method can be used to eliminate this inconvenience by multiplexing the members of an event family into a single composite event per family. The essence of the Gillespie method seems to be in figuring out how the event time of the composite event is constructed from the event times of its component family members, and how to choose which family member should be invoked next. This is currently prioritised as a somewhat distant future feature of RiskGroups, but the longer we put it off, the longer we have to deal with the current inconveniences, and their impact on development and possibly run speed...
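To make the Value and TimeSeries bullet above more concrete, here is a minimal C++ sketch of the flat-array idea. It is not the code in TimeSeries.mpp: the names Shape, flat(), set() and at() are hypothetical, Index and Time are reduced to plain typedefs, and indices are taken to be 0-based. It also assumes row-major encoding of an Index, and that "interpolating outside the Time range" means extending the line through the two nearest stored Values. The real implementation may differ on all of these points.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the ADTs named in the text.
using Index = std::vector<std::size_t>;  // list of natural-number indices
using Shape = std::vector<std::size_t>;  // size of each dimension (illustrative name)
using Time  = double;                    // point in calendar time

// Multidimensional array with run-time shape, stored as one flat array.
class Value {
public:
    explicit Value(Shape dims)
        : dims_(std::move(dims)), data_(count(dims_), 0.0) {}

    double& at(const Index& ix)      { return data_[offset(ix)]; }
    double at(const Index& ix) const { return data_[offset(ix)]; }

    std::vector<double>& flat()             { return data_; }
    const std::vector<double>& flat() const { return data_; }

private:
    static std::size_t count(const Shape& dims) {
        std::size_t n = 1;
        for (std::size_t d : dims) n *= d;
        return n;
    }
    // Encode an Index as a single offset into the flat array (row-major).
    std::size_t offset(const Index& ix) const {
        assert(ix.size() == dims_.size());
        std::size_t off = 0;
        for (std::size_t k = 0; k < dims_.size(); ++k) {
            assert(ix[k] < dims_[k]);
            off = off * dims_[k] + ix[k];
        }
        return off;
    }
    Shape dims_;
    std::vector<double> data_;
};

// Map from Time to Value with linear interpolation between stored points.
class TimeSeries {
public:
    void set(Time t, const Value& v) {
        points_.erase(t);      // replace any Value already stored at t
        points_.emplace(t, v);
    }

    Value at(Time t) const {
        assert(!points_.empty());
        if (points_.size() == 1) return points_.begin()->second;
        auto hi = points_.lower_bound(t);  // first stored Time >= t
        if (hi == points_.begin()) ++hi;   // at or before the first Time: use first two points
        if (hi == points_.end()) --hi;     // after the range: use the last two points
        auto lo = std::prev(hi);
        const double w = (t - lo->first) / (hi->first - lo->first);
        assert(lo->second.flat().size() == hi->second.flat().size());
        Value out = lo->second;
        for (std::size_t i = 0; i < out.flat().size(); ++i)
            out.flat()[i] = (1.0 - w) * lo->second.flat()[i] + w * hi->second.flat()[i];
        return out;
    }

private:
    std::map<Time, Value> points_;
};
```

With a 2 by 3 Value, the Index (1, 2) lands at flat offset 1*3 + 2 = 5, so a lookup is one short loop of multiply-adds rather than a tree search keyed on a whole Index. The sketch still uses std::map for the time axis, on the assumption that a TimeSeries holds far fewer Times than a Value holds cells.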
Right now I can't think of anything else to mention. If you can, let me know.
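On the Gillespie idea raised in the last feature bullet, the two questions posed there (how to build the composite event time, and how to choose the family member) have a standard answer in the simplest case. The sketch below assumes that every member of a family fires at a constant, non-negative rate until the next event occurs, which may well not hold for the event time methods RiskGroups actually uses. Under that assumption the composite waiting time is exponential with rate equal to the sum of the member rates, and member i is chosen with probability proportional to its rate. The names CompositeDraw and nextFamilyEvent are hypothetical, and the two uniform random numbers are passed in by the caller so that the function stays independent of any particular random number source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Result of one draw for a composite (per-family) event.
struct CompositeDraw {
    double wait;         // time from now until the composite event fires
    std::size_t member;  // index of the family member to invoke
};

// Gillespie direct method for one event family (illustrative, hypothetical API).
//   rates: current rate of each family member (assumed non-negative)
//   u1:    uniform random number in (0, 1], used for the waiting time
//   u2:    uniform random number in [0, 1), used to pick the member
CompositeDraw nextFamilyEvent(const std::vector<double>& rates,
                              double u1, double u2) {
    const double total = std::accumulate(rates.begin(), rates.end(), 0.0);
    assert(total > 0.0);
    assert(u1 > 0.0 && u1 <= 1.0);
    assert(u2 >= 0.0 && u2 < 1.0);

    CompositeDraw d;
    // The minimum of independent exponential waiting times is itself
    // exponential, with rate equal to the sum of the member rates.
    d.wait = -std::log(u1) / total;

    // Walk the cumulative rates until they pass the target; members with
    // zero rate are skipped so they can never be selected.
    const double target = u2 * total;
    double cumulative = 0.0;
    d.member = 0;
    for (std::size_t i = 0; i < rates.size(); ++i) {
        if (rates[i] <= 0.0) continue;
        cumulative += rates[i];
        d.member = i;  // also serves as the fallback if rounding leaves target >= total
        if (target < cumulative) break;
    }
    return d;
}
```

For rates {1.0, 3.0} the composite rate is 4.0, so the mean wait is 0.25 and the second member is chosen three times out of four. Only one pending event per family has to be scheduled, but its rates must be recomputed whenever the state they depend on changes.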
Queries? Email: Roger Mateer  • Last Modified: 2011/08/15  • All rights reserved © SACEMA 2011