RiskGroupsReleasePlan

CategoryTopic RogerMateer (Developer) MaitRaag (Customer)

Abstract

This page contains the collaboratively evolving plan for the RiskGroups project. It is structured according to the adaptive planning recommendations explained in ReleasePlanning, progressively zooming in from a more general, broader view at longer timescales to a more detailed, narrower view at shorter timescales. Anything about the plan is open to discussion and revision by the stakeholders. (If you're reading this, you qualify.) This page also contains a description of the iteration and release history. This is not open to revision because it is supposed to be a record of the sequence of past iterations and releases as they actually happened. This page has the following structure to accommodate that information. Please respect this structure in any revisions you make.
FUTURE

  • Project Vision (for entire project)

  • Some features and all release dates (for next 6 months)

  • All features and most of the stories (for next 3 months)

  • Estimated prioritised stories (for next 4 weeks)
PRESENT

  • Detailed requirements and customer tests (for current 1 week iteration)
PAST

  • Iteration and release history (for past iterations and releases)
Minimum Marketable Features are expressed as:
  • FEATURE: description
In our situation, a Minimum Marketable Feature is the smallest set of functionality that provides value to some user of the model. It is ideally a minimal coherent collection of Stories, aiming towards some common useful end goal. Stories are expressed as:
  • STORY(points): description.
A Story
  • is a self-contained, individual element of the project.

  • typically represents one or two days of work.

  • is a concise, Customer-centric description of something you care about.

  • is an invitation to further discussion between Developer and Customer.

  • should be devised following the INVEST principle (Independent Negotiable Valuable Estimable Small Testable; see this AgileForAll article for more detail).
Please note: Both features and stories are placed in order of increasing priority for the current or some subsequent release, with the stories pertaining to a given feature listed under that feature. This is done so that the current iteration can just take the highest priority story from the bottom of the list. Sorry if it's a bit confusing, but i had to get a lot of timing and priority information into one consistent format, without having an ever growing past obscure too badly the tactically more important present and future.
----

Project vision (for entire project)

Note: this is an initial draft vision which needs to be discussed.

What?

RiskGroups is a general Modgen model of HIV+TB transmission in a population divided into risk-gender groups differentiated by the sets of behaviours they engage in. These behaviours affect the modes of disease transmission they are exposed to and the degree of transmission risk associated with each mode. It has certain up-front design features which seem to be worth keeping, but their merits are up for discussion. These include:
  • event families (code generation is now less of a blunt instrument, so the performance overhead of generated events can be accurately minimised, and the idea of implementing multiplexed events using an explicit Gillespie algorithm no longer seems necessary to try to mitigate that performance overhead)

  • symmetric contacts vs directed contacts, with the latter being implemented as a sort of contact market whose time-dependent behaviour can be specified in different complementary ways

  • mappings from context specific risk-group models into the general internal model structure, allowing cross-validation with diverse deterministic models in the risk-group family.

Why?


  • Why should we microsimulate risk-group models at all?

     ⋄ to get an idea of variability in results
This goal is beyond the reach of a regular deterministic ODE/PDE model, but is something that a stochastic ODE/PDE model (using random variables for population sizes) could also do. It might also be possible to obtain variability results from a more sophisticated ODE/PDE model which Carel Pretorius pursued in his PhD: rather than modelling population sizes directly, it somehow (and i'm unsure of my terminology here) uses random processes, Markov chains, and the Fokker-Planck and/or Master equation techniques to model the time evolutions of the mean and variance of population sizes.
  • So why should we use an individual-level microsimulation rather than one of these alternative techniques?

  • Why should RiskGroups be a general model with mappings instead of multiple specific models?

Success criteria?

Meeting the microsimulation needs of specific contexts in a timely fashion while not needlessly compromising the spirit of generality expressed above. The current primary context in which these needs exist is the Estonian risk groups project, for which MaitRaag is the Customer.
----

Some features and all release dates (for next 6 months)

These are blue-sky features beyond the 6 month range:

FEATURE: The riskgroups build script implements a means of comparing the behaviours of RiskGroups and Carel's PDE Estonia model.
  • STORY: RiskGroups needs a new mapping to set it up for comparison with Carel's PDE Estonia model.

  • STORY: The riskgroups build script needs access to an executable implementation of Carel's PDE Estonia model. (Implementing a PDE model seems to be a non-trivial exercise.)
FEATURE: RiskGroups implements the 3-state HIV system (HIV-, HIV+, AIDS) with progression to AIDS dependent on time since infection, as found in Carel's PDE Estonia model.

FEATURE: RiskGroups implements the standard 3-state TB system (Susceptible, Latent, Active) found in CellMM and Carel's PDE Estonia model.

Release 4 (2011/12/15)

FEATURE: Submit paper for publication (FIXME: by when exactly?).

Iteration 4.11 (Week starting 2011/12/11)

Iteration 4.10 (Week starting 2011/12/04)

Iteration 4.9 (Week starting 2011/11/27)

Iteration 4.8 (Week starting 2011/11/20)

Iteration 4.7 (Week starting 2011/11/13)

Iteration 4.6 (Week starting 2011/11/06)

Iteration 4.5 (Week starting 2011/10/30)

Iteration 4.4 (Week starting 2011/10/23)

Iteration 4.3 (Week starting 2011/10/16)

Iteration 4.2 (Week starting 2011/10/09)

Iteration 4.1 (Week starting 2011/10/02)

----

All features and most of the stories (for next 3 months)

Release 3 (2011/09/30)

Iteration 3.13 (Week starting 2011/09/25)

FEATURE: The RiskGroups microsimulation model can produce outputs which appropriately track the outputs produced by Mait Raag's deterministic Estonian risk groups model as it currently stands, both for convenient synthetic scenarios and for the realistic scenarios required for the paper. Time permitting, the following STORYs may contribute to this feature, but they are subordinate in priority to the more formal validation sequence that follows them:
  • STORY: There is a general issue with various types of events theoretically being able to be defined either at the individual or at the population level. The way this is done at the moment is somewhat ad-hoc with respect to event type. For instance, introduction of new actors is defined both at the population level (IntroductionEvent) and at the individual level (BirthEvent). (The BirthEvent family, by the way, should probably be parameterised by more attributes of parent and child-to-be than it currently is.) Also, DirectedContacts are implemented with a hybrid mechanism of specifying population level rates or specifying sender or receiver individual level rates and having them converted into a population level mechanism which is supposed to take changing population group sizes into account. How well this mechanism works and how elegant the implementation is is all up for debate.

  • STORY: Explore and resolve the possible conflicts between the ways the microsimulation and deterministic models have chosen to deal with ambiguity regarding disagreements between senders and receivers of directed contacts concerning prophylactic use or behavioural response.

  • STORY: RiskGroups' handling of time needs an overhaul. Internally, time should be represented abstractly, using a compile-time-fixed number of abstract time periods instead of years. Externally, calendar start and end year bounds should be user-specifiable parameters. The calendar year bounds should be used to convert an internal simulation time (expressed in periods) into the corresponding calendar year value whenever this is needed for interpolating a TimeSeries (which all express annual rates as a function of calendar year). The number of abstract internal periods and the calendar year bounds should also be used to convert supplied annual rates (whether TimeSeries or fixed rates) into internal per-period rates, so that the accuracy of a microsimulation can be smoothly traded for run-time by increasing the number of internal periods (aka decreasing its step-size).

  • STORY: Needle sharing rate is a gender-independent time-dependent parameter. In order to convert it to the gender-dependent parameter that the internal model needs, each gender-dependent parameter value needs to be the gender-independent parameter value scaled by each of the appropriate gender "prevalences" in the two risk groups concerned. Currently this is somewhat faked by using the gender "prevalences" at time 0 (assuming that they remain constant throughout the simulation), whereas what we ideally want is the parameters to adapt according to the gender "prevalences" at the current time t. To do this properly, it would be necessary to set up a mechanism allowing a time-dependent parameter to inform its value from information (in this case gender "prevalence" information) repeatedly polled during the simulation.
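The time-handling STORY above can be illustrated with a minimal sketch. It is written in Python rather than Modgen, and every name in it (TimeGrid, calendar_year, per_period_rate, interpolate) is hypothetical and not part of the RiskGroups code. It assumes that a TimeSeries is a sorted list of (calendar year, annual rate) pairs, that interpolation is linear, and that an annual rate converts to a per-period rate by simple proportional scaling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeGrid:
    """Hypothetical mapping between abstract periods and calendar years."""
    start_year: float   # user-specified calendar start bound
    end_year: float     # user-specified calendar end bound
    periods: int        # number of abstract internal periods (fixed at compile time in the STORY)

    @property
    def years_per_period(self) -> float:
        return (self.end_year - self.start_year) / self.periods

    def calendar_year(self, period_time: float) -> float:
        """Convert an internal simulation time (in periods) to a calendar year."""
        return self.start_year + period_time * self.years_per_period

    def per_period_rate(self, annual_rate: float) -> float:
        """Convert an annual rate to a per-period rate (assumes proportional scaling)."""
        return annual_rate * self.years_per_period


def interpolate(series, year):
    """Linearly interpolate a sorted list of (year, annual_rate) pairs; clamp at the ends."""
    if year <= series[0][0]:
        return series[0][1]
    for (y0, v0), (y1, v1) in zip(series, series[1:]):
        if y0 <= year <= y1:
            return v0 + (v1 - v0) * (year - y0) / (y1 - y0)
    return series[-1][1]


# Example: 40 periods spanning 2000 to 2010, i.e. 0.25 years per period.
grid = TimeGrid(start_year=2000, end_year=2010, periods=40)
series = [(2000, 1.0), (2010, 3.0)]           # made-up annual rates
year = grid.calendar_year(20)                  # internal time 20 periods
rate = grid.per_period_rate(interpolate(series, year))
```

Increasing `periods` shrinks the step size without touching the calendar bounds or the TimeSeries, which is the accuracy versus run-time trade the STORY asks for.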
----

Estimated prioritised stories (for next 4 weeks)

NOTE: We're officially only supposed to estimate and prioritise stories 4 weeks ahead, but i pretty much have a set top-priority course for a longer time period: completing the validation.
I said previously that if i had this project to start over from scratch, then rather than build up lots of model attributes in parallel phases where any customer value is only delivered after an unspecified long period of time, i would take one simple model attribute and complete it end-to-end, so that the system has been validated as being able to run a model with that attribute (and can automatically be revalidated whenever the implementation changes). Then we could move on to expanding the model a step with the next attribute. A series of such end-to-end steps would be performed, gradually converging the system on a valid working microsimulation implementation of Mait's full deterministic model. I then realised that this doesn't have to be wishful thinking about what we should have done, and that the system's development would probably benefit from having such a sequence of validated model attribute STORYs imposed on it. This strategy does seem to be improving the quality and completeness of the system.
Below is a sequence of STORYs involving validation of various combinations of model attributes. The sequence divides the full model up into subsystems characterised by different transmission modes, tests each subsystem separately and then integrates it into a growing cumulative system which converges on the full model. This sort of validation stage structure is employed to try to meet our need to test several salient points of the huge space of possible scenarios in the severely limited time frame.
In each of these STORYs, to "validate a scenario" means to do everything that it takes to ensure that the microsimulation model can be automatically verified to match a deterministic model encoding our intuitive understanding of the scenario in question.
If feasible, this may include obtaining a steady state for the scenario, so that this can be used as a starting point for any subsequent scenario that builds on it, aiding our understanding of the behaviour of what has been added in the subsequent step.
A scenario consists of some combination of the following model attributes, and is named according to the list of attributes it possesses. A scenario is considered to possess an attribute if some of the relevant model parameters are convenient, arbitrary non-zero synthetic values, and not to possess the attribute if all of the relevant model parameters are zero. The attributes are:
  • A: There is a starting population distribution over risk group (I,P,M,S,C,G), gender, HIV status
  • B: All populations are subject to introductions and background mortality
  • C: HIV positives are additionally subject to HIV mortality
  • D: HIV is transmitted via heterosexual consensual sex: (i) I <-> I (ii) P <-> P (iii) G <-> G
  • E: Heterosexual consensual sex behaviourally responds to HIV prevalence of entire population or of IDUs if they are involved
  • F: HIV is transmitted via needle sharing: I <-> I
  • G: Needle sharing behaviourally responds to HIV prevalence of IDUs
  • H: HIV is transmitted via heterosexual consensual sex: (i) G <-> P (ii) P <-> I
  • I: HIV is transmitted via homosexual consensual sex: M <-> M
  • J: Homosexual consensual sex behaviourally responds to HIV prevalence of MSMs
  • K: HIV is transmitted via heterosexual consensual sex: (i) G <-> M (ii) M <-> P
  • L: HIV is transmitted via heterosexual commercial sex: S <-> C
  • M: Heterosexual commercial sex behaviourally responds to HIV prevalence of entire population
  • N: HIV is transmitted via heterosexual consensual sex: (i) G <-> C (ii) C <-> P
  • O: Risk-group migrations are allowed
The validation STORYs are:

Iteration 3.12 (Week starting 2011/09/18)


  • STORY(2 points): Improve the validation to include the features necessary for realistic scenario runs. Possible issues include introduction of time dependence (for which most of the machinery exists, but things might not quite hang together perfectly), and the need to simulate a large general population in order to prevent artifacts in the significantly smaller risky populations of interest.

Iteration 3.11 (Week starting 2011/09/11)


  • STORY(1 point): Validate scenario ABCDEFGHIJKLMNO, an integrated het+nee+hom+com model with het+nee+hom+com behavioural response and het+nee+hom+com links and risk-group migration ("the full model").

  • STORY(1 point): Validate scenario ABCDEFGHIJKLMN, an integrated het+nee+hom+com model with het+nee+hom+com behavioural response and het+nee+hom+com links.

Iteration 3.10 (Week starting 2011/09/04)


  • STORY(1 point): Validate scenario ABCDEFGHIJKLM, an integrated het+nee+hom+com model with het+nee+hom+com behavioural response and het+nee+hom links.

  • STORY(1 point): Validate scenario ABCDEFGHIJKL, an integrated het+nee+hom+com model with het+nee+hom behavioural response and het+nee+hom links.

Iteration 3.9 (Week starting 2011/08/28)


  • STORY(1 point): Validate scenario ABCLM, a com model with com behavioural response.

  • STORY(1 point): Validate scenario ABCL, a model of heterosexual commercial sex HIV transmission ("com model").

Iteration 3.8 (Week starting 2011/08/21)


  • STORY(1 point): Validate scenario ABCDEFGHIJK, an integrated het+nee+hom model with het+nee+hom behavioural response and het+nee+hom links.

  • STORY(1 point): Validate scenario ABCDEFGHIJ, an integrated het+nee+hom model with het+nee+hom behavioural response and het+nee links.

Iteration 3.7 (Week starting 2011/08/14)


  • STORY(1 point): Validate scenario ABCDEFGHI, an integrated het+nee+hom model with het+nee behavioural response and het+nee links.

  • STORY(1 point): Validate scenario ABCIJ, a hom model with hom behavioural response.

Iteration 3.6 (Week starting 2011/08/07)


  • STORY(1 point): Validate scenario ABCI, a model of homosexual consensual sex HIV transmission ("hom model").

  • STORY(1 point): Validate scenario ABCDEFGH, an integrated het+nee model with het+nee behavioural response and het+nee links.

Iteration 3.5 (Week starting 2011/07/31)


  • STORY(1 point): Validate scenario ABCDEFG, an integrated het+nee model with het+nee behavioural response.

  • STORY(1 point): Validate scenario ABCDEF, an integrated het+nee model with het behavioural response.
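The scenario naming rule used by the validation STORYs above (a scenario possesses an attribute when at least one of that attribute's parameters is non-zero) can be sketched as follows. This is an illustration only: the function name and the way parameters are grouped by attribute letter are assumptions, and the real scenario .dat files are not structured like this dictionary.

```python
# Attribute letters in the order used by the plan.
ATTRIBUTES = "ABCDEFGHIJKLMNO"


def scenario_name(params_by_attribute):
    """Return the scenario name implied by a hypothetical {letter: [values]} mapping.

    An attribute is possessed if any of its parameter values is non-zero,
    and absent if all of them are zero (or none are supplied).
    """
    return "".join(
        letter
        for letter in ATTRIBUTES
        if any(value != 0 for value in params_by_attribute.get(letter, ()))
    )


# Made-up synthetic values: needle sharing switched on, het contacts zeroed out.
example = {
    "A": [100, 200],      # starting population sizes
    "B": [0.02, 0.01],    # introduction and background mortality rates
    "C": [0.1],           # HIV mortality rate
    "D": [0.0, 0.0, 0.0], # het contact rates, all zero, so D is absent
    "F": [0.5],           # needle sharing rate
}
name = scenario_name(example)
```

With these values the name comes out as the "nee model" scenario, ABCF.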
----

Detailed requirements and customer tests (for current 1 week iteration)

FIXME: feel free to provide input here if you disagree

Iteration 3.4 (Week starting 2011/07/24)

Unfortunately, i got sick during the previous iteration, so my performance was not what it should have been, and the ETA of the end of the validation has slipped another week (so it now stands at the end of iteration 3.12). However, i still think there is probably nothing significantly new to these two stories, so i hope i will be able to complete them...
  • STORY(1 point): Validate scenario ABCFG, a nee model with nee behavioural response.
to do.
  • STORY(1 point): Validate scenario ABCF, a model of needle sharing HIV transmission ("nee model").
to do.
----

Iteration and release history (for past iterations and releases)

FIXME: i should probably include iteration demo videos here once i get those working properly

Iteration 3.3 (Week starting 2011/07/17; velocity = 1 point/iteration)

I've decided to try not to let the ETA of the end of the validation slip any further this iteration (so it still stands at the end of iteration 3.11). This seems to be reasonable to accomplish by completing the 3 stories below because there is really only one potential difficulty involved in all of them: getting behavioural response to cooperate. The second two stories should hopefully not involve any new concepts beyond those which will have been encountered upon completion of the first story. So, it's a challenging plan, but maybe i can pull it off...
  • STORY(1 point): Validate scenario ABCFG, a nee model with nee behavioural response.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABCF, a model of needle sharing HIV transmission ("nee model").
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABCDE, a het model with het behavioural response.
Done. I earn 1 point. There has been a slight modification to the visual output: There are a number of cases where a given population group's behaviour doesn't change from one deterministic scenario to the next, and it was difficult to see when and how this was happening. Since a given population always starts from a given value, irrespective of scenario, and its course is completely determined by that fact and the evolution of its derivative, i decided to augment the derivative plots by replacing a plot of the derivative in a given colour with a plot of the derivative in the neutral colour (a dashed line for the PNG GNUplot terminal type) together with an identification band in the given colour with a width unique to each deterministic scenario. This was the only way i could persuade GNUplot to reliably allow us to see, when derivative curves overlap, which ones are hidden underneath which others. If two derivative curves are identical, they will merge into a single trajectory, but their identification bands will have different widths and will all still be visible.
I have also, for the sake of efficiency, changed the notion of what it means to run a validation for a given scenario. We need to check that all previous scenarios in the sequence still behave correctly, but it is too resource consuming to check the entire previous sequence at the full replicate resolution. So we only do a two replicate check for each previous scenario, just to ensure that nothing we did to get the new scenario to work broke any old ones. For the current situation, that corresponds to the following four visualisations:
  • visualisation output for scenario A with 2 replicates

  • visualisation output for scenario AB with 2 replicates

  • visualisation output for scenario ABC with 2 replicates

  • visualisation output for scenario ABCD with 2 replicates
Then, we do a full replicate-resolution check of the current scenario, visualising the effect of increasing the number of replicates. This corresponds to the following five visualisations:
  • visualisation output for scenario ABCDE with 2 replicates

  • visualisation output for scenario ABCDE with 4 replicates

  • visualisation output for scenario ABCDE with 8 replicates

  • visualisation output for scenario ABCDE with 16 replicates

  • visualisation output for scenario ABCDE with 32 replicates
It should be fairly clear from these outputs that everything sort of matches, but that there are limits to the ability of increasing the replicate resolution to improve the accuracy of the match. In some cases, the replicates tend to diverge over time, drowning out our ability to distinguish between deterministic scenarios which may end up converging to the same attractor. So things are sort of okay for now, but as we add more deterministic scenarios, the visualisation test will become less and less useful, and we will start having to rely more on the statistical tests.
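The validation run schedule described in this iteration (a quick 2-replicate check of every earlier scenario, then the current scenario at 2, 4, 8, 16 and 32 replicates) can be written down as a short sketch. The function name and its defaults are hypothetical and do not come from the riskgroups build script.

```python
def validation_plan(sequence, quick_replicates=2, max_replicates=32):
    """Return (scenario, replicates) pairs for one validation run.

    Earlier scenarios get a cheap regression check; the last scenario in
    `sequence` is run at doubling replicate counts up to `max_replicates`.
    """
    *previous, current = sequence
    plan = [(scenario, quick_replicates) for scenario in previous]
    replicates = quick_replicates
    while replicates <= max_replicates:
        plan.append((current, replicates))
        replicates *= 2
    return plan


plan = validation_plan(["A", "AB", "ABC", "ABCD", "ABCDE"])
for scenario, replicates in plan:
    print(f"visualise scenario {scenario} with {replicates} replicates")
```

For the ABCDE step this yields the nine visualisations listed above: four regression checks and five runs of the current scenario.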

Iteration 3.2 (Week starting 2011/07/10; velocity = 1 point/iteration)

I'm hoping that the various teething troubles i experienced with the contact events will not be a regular feature of all the coming validation steps. So i still have some confidence (although a bit less than last week) that the velocity of 2 points per iteration remains achievable. At that velocity, this week's ETA of the validation is at the end of iteration 3.11. I've slipped a week...
  • STORY(1 point): Validate scenario ABCDE, a het model with het behavioural response.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABCD, a model of heterosexual consensual sex HIV transmission ("het model").
Done. I earn 1 point. The visualisation output for scenario ABCD shows that the microsimulation fairly faithfully follows the deterministic predictions for the correct scenario. The output has changed from the previous visualisation in that what previously occupied 1 row now occupies 2 rows, and that another initial step has been added (now in the first 2 rows) showing the time evolution of the derivatives of the deterministic predictions. A number of further problems had to be overcome to make this validation work:
(1) A bug which manifested as wildly erratic overshooting behaviour of the microsimulation model in its attempt to track the deterministic version of ABCD was isolated. It turned out to be a glitch in Value::interpolateGet() which caused the contact rate sometimes to be NaN, and the microsimulation to follow the prediction of ABC instead of ABCD in such circumstances.
(2) Once that was fixed, the microsimulation behaved in a more civilised manner, but still did something different from deterministic ABCD. This turned out to be a bug in Ticker::timeSCE(), which did the conversion from an individual level contact rate (which the .dat file supplies) to a population level one (which a universal Ticker-based event needs) by incorrectly scaling by the size of the whole population instead of the population of the current initiating group.
(3) Once that was fixed, the microsimulation seemed to be tracking the behaviour of deterministic ABCD, except using twice the contact rate. This turned out to be a difference in interpretation of contact rates between the deterministic model (which views them as the number of contacts per unit time that an individual is involved in) and the microsimulation model (which views them as the number of contacts per unit time that an individual initiates). This is resolved by noting that every contact has one initiator and one recipient, and, since we're doing random mixing without consideration of dominance/submission of actors, every actor is assumed to initiate half of the contacts it is involved with. So a factor of 0.5 was incorporated into timeSCE() to account for this.
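The two rate fixes described for this iteration (scale by the initiating group rather than the whole population, then halve because each actor initiates half of its contacts) amount to the following arithmetic. This is a Python illustration with hypothetical names and made-up numbers; it is not the body of Ticker::timeSCE().

```python
def population_contact_rate(individual_rate, initiating_group_size,
                            initiated_fraction=0.5):
    """Population-level rate of contacts initiated by one group.

    individual_rate: contacts per unit time an individual is involved in
        (the deterministic model's interpretation, as supplied in the .dat file).
    initiating_group_size: current size of the initiating group, not the
        size of the whole population.
    initiated_fraction: share of an individual's contacts that it initiates;
        0.5 under random mixing with one initiator and one recipient per contact.
    """
    return individual_rate * initiating_group_size * initiated_fraction


# Made-up numbers: 50 individuals in the initiating group, 1000 overall.
correct = population_contact_rate(4.0, initiating_group_size=50)
# Scaling by the whole population instead overstates the rate by a factor of 20.
whole_population_bug = population_contact_rate(4.0, initiating_group_size=1000)
# Omitting the factor of 0.5 reproduces the "twice the contact rate" symptom.
double_counting_bug = population_contact_rate(4.0, 50, initiated_fraction=1.0)
```

Keeping the initiated fraction as an explicit parameter makes the random-mixing assumption visible, should directed contacts later need a different split.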

Iteration 3.1 (Week starting 2011/07/03; velocity = 0 points/iteration)

I still think a velocity of 2 points per iteration is achievable. That pace (together with the addition of a 2 point STORY to move from synthetic to realistic scenarios) puts this week's ETA of the validation at the end of iteration 3.10. We need to keep an eye on how this estimate evolves over time, while working on drafting the paper "on the side".
  • STORY(1 point): Validate scenario ABCDE, a het model with het behavioural response.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABCD, a model of heterosexual consensual sex HIV transmission ("het model").
Not completed. I earn 0 points. Trying to validate this scenario has thrown me a series of curve balls, chiefly because ABCD is the first exposure of the validation to symmetric contact events.
The first curve ball was that the deterministic predictions for scenarios ABC and ABCD were extremely close to each other, so it was practically impossible to resolve which of them the microsimulation was tracking. I've managed to resolve this problem by taking much more extensive advantage of the fact that these validation scenarios are synthetic, so i am at liberty to choose parameter values for my convenience in completing the validation. What i've done is to set up all the scenarios to ensure that the initial derivatives of each of them are widely separated. (I've also added plots of the evolution of the derivatives over time to the GNUplot output, so this separation can be seen.) This is obviously not a guarantee that the evolved curves will end up being separated from one another over the entire simulation, but it has so far had reasonable success (certainly more than tweaking parameter values at random would have had). The calculation of appropriate synthetic parameter values is presented in detail in scenarios/Organisation.txt.
The second curve ball was that symmetric contact events were initially taking about 500ms per call to each of the event time and event implementation functions. This led to these calls alone typically consuming about 90% of the entire simulation time, and making it very difficult to do runs with enough symmetric contact events to see interesting behaviour. It took me a while to recognise, but this overhead was actually caused by my inadvertently doing a lot of unnecessary busy work in Value::interpolate(), calling Index::decode() and Index::encode() 3072 times each per Value::interpolate() call, when it was not in fact necessary to call them at all. This simple optimisation led to about one order of magnitude improvement in simulation run time...
However, the system managed to throw me a third curve ball: the symmetric contact event time and implementation functions both seem to have a memory leak, leading to the simulation failing to complete with an out-of-memory error if i try to get too ambitious with the number of such events in a simulation. I have not yet had a chance to resolve this problem.
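The ETA arithmetic quoted at the top of this iteration can be checked with a small sketch. The function name is hypothetical. The figure of 20 remaining points is a count of the STORYs listed in this plan from ABCD onwards (18 one-point validation STORYs plus the 2-point realistic-scenario STORY), and the sketch assumes the velocity holds constant from the current iteration.

```python
import math


def eta_iteration(remaining_points, velocity, current_iteration):
    """Iteration at whose end the remaining points are exhausted.

    Assumes `velocity` points are earned in every iteration, starting
    with `current_iteration` itself.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive to project an ETA")
    return current_iteration + math.ceil(remaining_points / velocity) - 1


# Iteration 3.1: 20 points outstanding at 2 points/iteration.
eta_at_3_1 = eta_iteration(20, 2, current_iteration=1)
# Iteration 3.2: nothing was earned in 3.1, so the same 20 points start a week later.
eta_at_3_2 = eta_iteration(20, 2, current_iteration=2)
# Iteration 3.4: ABCD and ABCDE are done, leaving 18 points.
eta_at_3_4 = eta_iteration(18, 2, current_iteration=4)
```

These reproduce the ETAs recorded in the iteration notes: end of 3.10, then 3.11, then 3.12.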

Release 2 (2011/06/30)

IMPORTANT: I'm caught in the Iron Triangle: "Quality, Scope, Time, pick any two." I'm not sacrificing Quality, because there's no point in accumulating technical debt by producing unmaintainable and/or junk code, because that just sends my productivity to zero over time. I've specified a Scope up front with the validation sequence, and although the content of that sequence is open to negotiation, it hasn't yet been negotiated. Therefore, i have no absolute control over Time and am only able to pay lip-service to release dates. For Release 2 that isn't a problem. For Release 3 that probably also won't be a problem. But for Release 4, there is a genuine external deadline of the submission of a paper. We need to complete the validation sequence and produce the paper using results from the resulting system by the time the paper needs to be submitted. This requires that we keep an eagle eye on Time-creep, and regularly renegotiate Scope as required to ensure that we meet the deadline.

Iteration 2.13 (Week starting 2011/06/26; velocity = 0 points/iteration)

Completing what i did last iteration has put me in a much better position to make more consistent progress with the validation stories. The pace of 2 points/iteration at last looks genuinely achievable. Let's see how it goes.
  • STORY(1 point): Validate scenario ABCDE, a het model with het behavioural response.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABCD, a model of heterosexual consensual sex HIV transmission ("het model").
Not completed. I earn 0 points. I managed to get the system into a position where it is convenient to explore the impact of attribute D in scenario ABCD. This entailed reorganising the input scenario .dat file pieces, adding quite significantly to the deterministic model and reorganising the microsimulation event generation system to increase the flexibility with which event family generation can be specified. I did runs of scenario ABCD before and after the generation of the required symmetric contact events, but didn't have a chance to see anything more than that each symmetric contacts event imposes a non-trivial performance penalty. I don't know if the implementation is correct because i haven't looked yet. I'm still optimistic about the ability of the work to progress smoothly, but i was quite unenergised this iteration and so didn't make the progress i should have been able to make.

Iteration 2.12 (Week starting 2011/06/19; velocity = 1 point/iteration)

I'm going to continue with what i started last week, refining GNUplotting and getting the model to run faster. Hopefully these two actions will make it more feasible to make progress with the validation process at the desired pace. I'm revising the desired pace down to 2 points/iteration, since a lot of experience is telling me this will be more realistic even to try to achieve than 3 points/iteration. This has me overrunning the release date by 8 weeks. Since we are using releases just as an opportunity to revisit the plan and think about where to go next, i think we should include in that discussion where we see this validation process going and how it will fit in with the coming need to produce a paper by the end of the year...
  • STORY(1 point): Validate scenario ABCD, a model of heterosexual consensual sex HIV transmission ("het model").
Just started. I earn 0 points.
  • STORY(1 point): Validate scenario ABC, a birth-death model with HIV presence and mortality. This is the base of subsystem testing scenarios.
Done. I earn 1 point. The visualisation output for scenario ABC shows all the Gender (0=female,1=male), Riskgroup (0=I,1=P,2=M,3=S,4=C,5=G), Hivstatus (0=-ve, 1=+ve) possibilities across the columns, and the sequence of steps taken in comparing the models down the rows. The first row shows the 32 microsimulation replicates superimposed on the three deterministic scenarios (A,AB,ABC). The second row summarises the replicates into their mean +/- 1 standard deviation bands. The third row converts each of the deterministic model outputs into their (signed) t-statistics with respect to these deviation bands. The fourth row converts these t-statistics into the raw p-values using Student's T-distribution, superimposing the chosen significance level. The fifth row shows the Bonferroni correction of these p-values, again superimposing the chosen significance level. The intuitive interpretations are as follows: For the first two rows, the replicates/deviation bands should most closely track deterministic model ABC (labeled DGRH[2 g r h]PM for the values of g,r,h concerned), and this does seem to be the case. In the HIV negative cases, deterministic model AB (labelled DGRH[1 g r h]PM) is equivalent to deterministic model ABC, and is hidden underneath it. For the third row, deterministic model ABC should have the t-statistics with the smallest absolute values. This also seems to be the case. For the fourth and fifth rows, we expect the p-values to be within the chosen significance threshold for deterministic model ABC (indicating no significant difference) and outside the threshold for deterministic models A and AB. This seems to be quite close to true for the unadjusted p-values, but less convincingly so once the Bonferroni correction is taken into account. Part of the problem seems to be that all scenarios start off with exactly the same initial population sizes, so it is hard to distinguish them until they have diverged sufficiently from one another. 
I have not yet attempted to implement False Discovery Rate, but i think relying on the first two rows of the visualisation will be more productive for now in uncovering problems. In fact, i iterated over this process about half a dozen times, finding various anomalies in the code using the first two rows in order to get to the point i'm presenting.
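As a rough illustration of the Bonferroni step in the fifth row, here is a minimal sketch. It assumes the textbook form of the correction (multiply each raw p-value by the number of comparisons and cap at 1); the function names are hypothetical and this is not necessarily how the RiskGroups code does it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Textbook Bonferroni correction: scale each raw p-value by the number of
// comparisons m and cap the result at 1. (Hypothetical helper.)
std::vector<double> bonferroniCorrect(const std::vector<double>& rawP) {
    const double m = static_cast<double>(rawP.size());
    std::vector<double> corrected;
    corrected.reserve(rawP.size());
    for (double p : rawP) {
        corrected.push_back(std::min(1.0, p * m));
    }
    return corrected;
}

// True when no corrected p-value falls below the significance level alpha,
// i.e. when the two models are not significantly different anywhere.
bool noSignificantDifference(const std::vector<double>& rawP, double alpha) {
    assert(alpha > 0.0 && alpha < 1.0);
    for (double p : bonferroniCorrect(rawP)) {
        if (p < alpha) return false;
    }
    return true;
}
```

With raw p-values 0.01, 0.04 and 0.5, the corrected values are 0.03, 0.12 and 1, so a comparison at level 0.05 would still flag a difference. A False Discovery Rate procedure would replace only `bonferroniCorrect`.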

Iteration 2.11 (Week starting 2011/06/12; velocity = 0 points/iteration)

In light of what happened last iteration, i think i need to invest some time both in producing a GNUplot visualisation of the comparison situation and in improving the performance of the model runs themselves. I think i can realistically complete this and attempt another validation of A, AB, ABC in this iteration, hopefully this time around successfully distinguishing between AB and ABC. This puts me 4 weeks behind schedule, provided that i can subsequently ramp up to the 3 points/iteration desired pace. If delivery of the complete validation is required by the end of June, then we need to reconsider what its scope should be...
  • STORY(1 point): Validate scenario ABC, a birth-death model with HIV presence and mortality. This is the base of subsystem testing scenarios.
Not completed. I earn 0 points. I had some teething troubles with setting up the suitable GNUplot outputs, and did not have a chance to iron them all out before i would have had to rerun the validation sequence up to ABC at sufficient resolution to be able to visualise the outcome clearly. I also made no progress on streamlining the performance of model runs (by temporarily commenting out those parts of the system that are not yet needed for the current validation subsequence).

Iteration 2.10 (Week starting 2011/06/05; velocity = 0 points/iteration)

Things are not looking good for being ready by the original release date. The current projection (still at the desired but as yet unrealised pace of 3 story points per iteration) now has me coming in three weeks late. Nothing to do but keep plugging away at it. I can't even predict when the outcome of a particular validation attempt may next throw me an implementation curve ball like it did last week...
  • STORY(1 point): Validate scenario ABCDE, a het model with het behavioural response.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABCD, a model of heterosexual consensual sex HIV transmission ("het model").
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABC, a birth-death model with HIV presence and mortality. This is the base of subsystem testing scenarios.
I finished creating the ability to run multiple deterministic models (currently to meet the various requirements of scenarios A, AB, ABC) and compare them with the replicates of the chosen scenario of the microsimulation model. I then performed a validation run of scenarios A, AB, ABC, using 10 replicates and 2 threads, hoping that the extra replicates would yield a better chance of statistically resolving between AB and ABC, and hoping that 2 threads would make better use of the 2 CPU cores on my laptop.
Performance first: Unfortunately 2 threads seem, counterintuitively, to be slower than 1 thread. The validation run took 10.75 hours to complete the three scenario sequence described. I suspect that 2 threads are slower than 1 because each replicate uses enough RAM that having two of them on the go at once is more likely to intrude into swap space, inducing a significant performance penalty. Also, not yet having a second computer on which to run these long demanding runs makes it impossible for me to continue to work during such a run, because TDD requires regular use of the compiler and simultaneous use of the compiler and running of models would take the machine dangerously close to overheating, which would require the run to be restarted if it occurred. This means that i have to divert some of my attention to improving the abysmal performance of the microsimulation, so that these runs can be more conveniently interleaved with my work... Not an ideal situation. The situation would be better if i had a second machine on which to perform runs, but i'm waiting for the Stellenbosch University IT department on that point.
And correctness? I hoped that 10 replicates for each scenario would provide some more resolving power for the statistical comparison tests. Each deterministic model in A, AB, ABC was compared with each microsimulation model in A, AB, ABC for each replicate use case from 2, 3, up to all 10 replicates. 
The ideal is that the closest matches should occur when the deterministic and microsimulation models are working on the same scenario, and that these matches would probably come into better view as more replicates were included in the comparisons. Here are the summarised results, where:
MIC denotes the microsimulation scenario;
DET denotes the deterministic scenario;
BCMP(N) denotes the Bonferroni-corrected minimum p-value when N replicates were used in the comparison.
MIC DET  BCMP(2)         BCMP(3)         BCMP(4)         BCMP(5)         BCMP(6)         BCMP(7)         BCMP(8)         BCMP(9)         BCMP(10)
A   A    1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00
A   AB   1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00
A   ABC  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00  1.#QNAN000e+00
AB  A    3.18045025e-02  2.29029788e-03  1.56838719e-04  1.13043040e-05  1.49886719e-05  2.91932548e-06  1.44237602e-06  3.31814369e-07  6.56225168e-08
AB  AB   1.04680098e-09  1.11022302e-16  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
AB  ABC  1.04680098e-09  1.11022302e-16  0.00000000e+00 -2.22044605e-16  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
ABC A    2.08653050e-02  9.33679776e-04  7.74271172e-05  6.18296617e-06  1.49886719e-05  2.91932548e-06  1.44237602e-06  3.31814369e-07  6.56225168e-08
ABC AB   1.04680098e-09  1.11022302e-16  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
ABC ABC  1.04680098e-09  1.11022302e-16  0.00000000e+00  0.00000000e+00  0.00000000e+00 -2.22044605e-16  0.00000000e+00  0.00000000e+00  0.00000000e+00
All p-values are NaN for microsimulation scenario A because all replicates are identical, meaning that the standard deviation across replicates is zero and the t-statistic is NaN. So that is as expected. The following pattern occurs for both microsimulation scenarios AB and ABC: as the number of replicates increases, the difference from deterministic scenario A becomes more significant, as we might expect. But the differences from deterministic scenarios AB and ABC become more significant more quickly than that from deterministic scenario A. So this is telling us (i) that there seem to be serious mismatches between deterministic and microsimulation models for AB and for ABC, and (ii) that there is either no discernible difference between the microsimulation scenarios AB and ABC or between the deterministic scenarios AB and ABC, or both. This means i earn 0 points because i failed to distinguish scenarios AB and ABC.
But why? Unfortunately, although the log file contains all the information needed to hunt down the answer(s) to that, it is exceedingly difficult to visualise what's going on, to the point where i'm seriously considering investing time in implementing a GNUplot visualisation of the situation. GNUplot would be better than "just plotting graphs in Excel" because the situation is sufficiently complicated that a lot of tedious manual organisation would be required to do so every time, and this can quite straightforwardly be automated using GNUplot. GNUplot also has more flexibility than Excel charts (multiplots and colour choices particularly come to mind), making it possible to produce graphs of the situation that are easier to understand by inspection. The guru has to be invoked to check the output, and we want to make life as easy as possible for him...
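The NaN entries for microsimulation scenario A can be reproduced in a few lines. This is only a sketch: `sampleStats` and `tStatistic` are hypothetical names, and the one-sample form of the t-statistic (difference between replicate mean and deterministic value, divided by the standard error) is an assumption about how the comparison is computed.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct SampleStats {
    double mean;
    double sd;  // sample standard deviation (n - 1 in the denominator)
};

// Mean and sample standard deviation across replicates at one time point.
SampleStats sampleStats(const std::vector<double>& x) {
    assert(x.size() >= 2);
    double sum = 0.0;
    for (double v : x) sum += v;
    const double mean = sum / x.size();
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return {mean, std::sqrt(ss / (x.size() - 1))};
}

// Assumed one-sample t-statistic of the replicates against the
// deterministic model's value at the same time point.
double tStatistic(const std::vector<double>& replicates, double deterministic) {
    const SampleStats s = sampleStats(replicates);
    const double standardError =
        s.sd / std::sqrt(static_cast<double>(replicates.size()));
    // Identical replicates that match the deterministic value give 0.0 / 0.0,
    // which is NaN under IEEE 754 arithmetic.
    return (s.mean - deterministic) / standardError;
}
```

Calling `tStatistic({168.0, 168.0, 168.0, 168.0}, 168.0)` yields NaN, and any p-value derived from it is NaN too. The comparison code therefore has to treat the zero-variance case explicitly rather than rely on the statistics.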

Iteration 2.9 (Week starting 2011/05/29; velocity = 0 points/iteration)

Adhering to strict TDD with my approximately 2 minute cycle time seems a bit tedious at times, but i still think it is better than the alternative of trying to take steps which are too big and which force assumptions to be made that can later come back to bite. The going is slow but steady with TDD, so i'm going to keep adhering strictly to it. The projection has me coming in two weeks late, but hopefully the focus will now be on the details of the model without any further significant background tasks to delay me.
  • STORY(1 point): Validate scenario ABCDE, a het model with het behavioural response.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABCD, a model of heterosexual consensual sex HIV transmission ("het model").
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABC, a birth-death model with HIV presence and mortality. This is the base of subsystem testing scenarios.
Not finished. I earn 0 points. I quite quickly set up the scenario and deterministic model requirements for running scenario ABC. But when i did so, i discovered the system could not find significant differences between scenarios AB and ABC. Mait and i had a discussion about this: he suggested i implement the False Discovery Rate alternative to Bonferroni correction, and he agreed to my suggestion to implement the ability to compare the microsimulation outputs for a given model to the deterministic outputs for all models in the validation sequence in order to discover such failures to detect difference more automatically. I subsequently spent the rest of the week working on implementing the latter, but did not finish in time to do a meaningful rerun of the scenario A, AB, ABC subsequence.

Iteration 2.8 (Week starting 2011/05/22; velocity = 1 point/iteration)

This is a repeat of the last iteration's stories. The current projection has me coming in one week late again. From now on, i will make an effort to adhere to strict TDD and see where this takes us...
  • STORY(1 point): Validate scenario ABCD, a model of heterosexual consensual sex HIV transmission ("het model").
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABC, a birth-death model with HIV presence and mortality. This is the base of subsystem testing scenarios.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario AB, a simple birth-death model.
Done. I earn 1 point. I adhered to strict TDD, and spent most of the iteration finalising the automatic mechanism for comparing the two models' outputs. However, i did have time to construct the scenario setup and the deterministic model for scenario AB. I first tried running the scenario before changing the deterministic model from the constant one which worked for scenario A, in order to see whether the automatic model comparison mechanism would notice the problem. It did. I then thought i had applied the required correction to the deterministic model and reran it. But the mechanism noticed another problem. I then thought a bit more carefully about how i had been adding to the deterministic model definition and was able to correct the second error it was pointing out. I then ran the model a third time and it was satisfied that any differences between the two models were no longer significant (at hard-coded level 0.05). So the comparison mechanism seems to be working adequately. I was doing runs with 4 replicates over 3 years and it took about 450MB and 30 minutes per run. I'm not yet at the stage where it will be necessary to get a steady state for each scenario before moving onto the next, because it should be possible for a while to get scenarios to pass simply by adding pieces to the deterministic model, and checking that they both evolve the same way from the same initial conditions. Hopefully i will have the second machine available soon so i can start running longer and more thorough tests; until then i need to interleave development with test runs because my laptop can't handle both at once. This will gradually become a more significant burden until the second machine arrives...

Iteration 2.7 (Week starting 2011/05/15; velocity = 0 points/iteration)

Having now completed the bulk of the behind-the-scenes stuff (although some bugs, misconceptions, etc may still need to be ironed out), i anticipate that most of the work will now be focused on actually inspecting the behaviours of the two models under various conditions and deciding whether they match and what to do about it whenever we decide they don't. As a result, i'm hoping i will be able to work through 3 scenarios every iteration, and doing so will have me coming in just on time for the release. Exactly how thoroughly i will be able to test these scenarios now depends on how quickly a second machine (more capable than my laptop) becomes available for me to devote to this task; i am in the process of trying to organise this.
  • STORY(1 point): Validate scenario ABCD, a model of heterosexual consensual sex HIV transmission ("het model").
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario ABC, a birth-death model with HIV presence and mortality. This is the base of subsystem testing scenarios.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario AB, a simple birth-death model.
Not completed. I earn 0 points. I've been having trouble converting the system from one where a scenario is run and one must visually inspect the outputs afterwards to try to decide how the two models compared into one where the system itself has a mechanism for reliably making that decision automatically. I initially tried to write this mechanism without following strict TDD (because, after all, how hard could it possibly be?!?), but found myself about half-way through the iteration getting so frustrated with being unable to debug the resulting code that i decided to start it over from scratch using strict TDD. I then made some progress towards developing it correctly, but didn't manage to finish in time to run the required scenarios using it. Maybe if i had faced the supposed tedium of developing it using strict TDD (with a too-long cycle time of 2 minutes) from the outset i could have finished in time and have something to show for this iteration. I didn't think it worthwhile to run the iteration's scenarios without this mechanism in place; doing so would just have distracted me from making progress with implementing the mechanism because (i) i can't run scenarios and develop on my laptop simultaneously for fear of overheating it, and (ii) having to interpret the results manually afterwards would have distracted me further...

Iteration 2.6 (Week starting 2011/05/08; velocity = 1 point/iteration)

I haven't yet completed the behind-the-scenes stuff. I have an initial deterministic model ready to use, and i have figured out most of the details of how to do the multi-point t-test that will be required to compare the outputs of the deterministic model and microsimulation replicates. But i have yet to fix all the wiring that will get the results of the two types of model into a suitable form to carry out the comparison. I will stop trying to second-guess the completion-time prediction that the estimates make; they will just have to speak for themselves over time: they currently have me coming in two weeks late for this release, but i'm still fairly optimistic that at least the first three validation stories will be easy to deliver on once i've finished setting up the comparison mechanism.
  • STORY(1 point): Validate scenario ABC, a birth-death model with HIV presence and mortality. This is the base of subsystem testing scenarios.
Not touched. I earn 0 points.
  • STORY(1 point): Validate scenario AB, a simple birth-death model.
Not completed. I earn 0 points.
  • STORY(1 point): Validate scenario A, an arbitrary fixed starting population distribution over all risk-groups, genders, HIV statuses, subject to no events whatsoever.
Done. I earn 1 point. By "done", i mean that scenario A is validatable, and i am very confident that the test would pass if it was to be run at full scale (whatever we decide that should mean). I have created the ability to (i) run the microsimulation and deterministic models for the same scenario, (ii) get the results of all microsimulation replicates into a combined TimeSeries structure where i can extract the appropriate sample means and standard deviations across replicates, (iii) use those and the deterministic population means to construct t-statistics and (iv) use a general t-table function to turn those into p-values. Since scenario A is a special case where there is no variability, the microsimulation replicates all have the same values, so their standard deviations are all zero, so the t-statistics and hence p-values are all undefined. So i can't see yet how to do the Bonferroni correction. I did a run of scenario A with 4 replicates over 4 years and a population of size 168, and it took about 45 minutes and about 400MB to establish that the two models are exactly equivalent in this case. I repeatedly compute the sample statistics for 2, 3, ... up to all of the replicates, so that we can in principle efficiently get some feel of the impact of adding replicates to the degree of confidence we would have in the matching of model outputs, but we won't be able to draw any conclusions from that until we run the first scenario which actually has some variability (scenario AB). I had to run the aforementioned version of scenario A several times, ironing out various bugs i encountered along the way. 
One interesting one was discovering artifacts of using certain values for PopulationScalingFactor: if a value of PopulationScalingFactor is used which would yield a non-integral scaled value for any of the component starting population sizes, then the mechanism which creates the starting population before a replicate runs rounds the number of actors created for the subpopulation down to the greatest lower bound whole number of actors. This would sometimes cause a large relative mismatch between the initial conditions for the two models, so i have added a test which checks for this condition and warns the user when it is about to occur, but lets the run continue anyway.
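The PopulationScalingFactor check described above could look roughly like this. It is a sketch under stated assumptions: `scaleSubpopulation` and `warnIfTruncated` are hypothetical names, the warning text is invented for illustration, and the exact comparison against the floor ignores floating-point rounding that a real check might need a tolerance for.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

struct ScaledSize {
    long actors;     // number of actors actually created (rounded down)
    bool truncated;  // true when the scaled size was not a whole number
};

// Scale one component starting population and round down to whole actors.
ScaledSize scaleSubpopulation(double size, double populationScalingFactor) {
    assert(size >= 0.0 && populationScalingFactor > 0.0);
    const double scaled = size * populationScalingFactor;
    const double whole = std::floor(scaled);
    return {static_cast<long>(whole), scaled != whole};
}

// Warn about every subpopulation that would be truncated, but do not stop
// the run. Returns true if at least one warning was issued.
bool warnIfTruncated(const std::vector<double>& sizes, double factor) {
    bool any = false;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        const ScaledSize s = scaleSubpopulation(sizes[i], factor);
        if (s.truncated) {
            std::cerr << "warning: subpopulation " << i << " scales to "
                      << sizes[i] * factor << ", creating only " << s.actors
                      << " actors\n";
            any = true;
        }
    }
    return any;
}
```

For example, a subpopulation of 7 with a factor of 0.5 scales to 3.5 and yields 3 actors, a relative mismatch of about 14% against the deterministic model's starting value.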

Iteration 2.5 (Week starting 2011/05/01; velocity = 0 points/iteration)

I'm somewhat refreshed after my time off, so hopefully this will be the iteration where i complete the behind-the-scenes stuff and start delivering on the validation STORYs. I'm keeping the schedule and estimates as they are for now, since i haven't yet seen what's involved in delivering a validation STORY. The estimates currently have me coming in a week late for this release; hopefully i'll be able to rectify that situation in time.
  • STORY(1 point): Validate scenario ABC, a birth-death model with HIV presence and mortality. This is the base of subsystem testing scenarios.
Not done. I earn 0 points.
  • STORY(1 point): Validate scenario AB, a simple birth-death model.
Not done. I earn 0 points.
  • STORY(1 point): Validate scenario A, an arbitrary fixed starting population distribution over all risk-groups, genders, HIV statuses, subject to no events whatsoever.
Not done. I earn 0 points.

Iteration 2.4 (Week starting 2011/04/24; velocity = 0 points/iteration)

I'm taking time off from Friday 2011/04/22 to Saturday 2011/04/30 inclusive, so this iteration won't happen.

Iteration 2.3 (Week starting 2011/04/17; velocity = 0 points/iteration)

I didn't have a good iteration here, partly because the time was cut short and i was severely underperforming due to not having had time off in 16 months, and partly because i was still busy doing stuff behind the scenes in preparation for doing these stories. What i did accomplish was to adapt an existing Runge-Kutta engine to use TimeSeries and Values; but i have not yet set up the actual deterministic model or the statistical test for comparing microsimulation and deterministic model outputs.
  • STORY(1 point): Validate scenario AB, a simple birth-death model.
Not done. I earn 0 points.
  • STORY(1 point): Validate scenario A, an arbitrary fixed starting population distribution over all risk-groups, genders, HIV statuses, subject to no events whatsoever.
Not done. I earn 0 points.

Iteration 2.2 (Week starting 2011/04/10; velocity = 0 points/iteration)

This story turned out to be bigger than i was able to manage in one iteration. If i want to stabilise my velocity (which would be a good thing for estimating purposes), i would need to figure out how to split stories into smaller pieces, while preserving their characteristic of providing Customer value. However, i quite like the way the stories are set out for this release and am not sure how they could be broken up into smaller pieces. I anticipate that the subsequent stories will be rather shorter once a minimalist validation infrastructure is set up (i'm guessing for now 1 point each). So i'm just carrying an estimate-reduced copy of this story over to the next iteration, and hoping that my velocity will stabilise (i'm guessing at 2.0 points/iteration) over the coming iterations because the subsequent stories are smaller.
  • STORY(2 points): Validate scenario A, an arbitrary fixed starting population distribution over all risk-groups, genders, HIV statuses, subject to no events whatsoever. This STORY is really about setting up a minimalist validation infrastructure.
Not finished. I have made progress with modifying the build superstructure and the organisation of the include file components in the scenarios folder, but i have not yet added the scenarioA deterministic model implementation (or its underlying RungeKutta engine) or the statistical model comparison test to the code. I earn 0 points.

Iteration 2.1 (Week starting 2011/04/03; velocity = 2.5 points/iteration)

I completed what i set out to do for a change. (This should be the norm, by the way...) I will be using the rest of this iteration for slack: trying to sort out performance problems, dealing with technical debt, trying to get a handle on what next week's STORY will involve, etc.
  • STORY(1 point): Write RiskGroupsArchitecture, a wiki page which gives somebody studying the source code help in understanding it, without being so specifically detailed that it requires extensive maintenance as the code base changes.
Done. I've uploaded a draft to the wiki. It is supposed to be the start of an interactive discussion about the architecture, including reasons for its current nature and possible directions for its future evolution. I earn 1 point.
  • STORY(0.5 points): Get the code into a condition where it can be committed as a revision worthy of the release 1 feature statement.
Done. Committed a codebase with severe performance problems, but a suitable point from which to start validating. I earn 0.5 points.
  • STORY(1 point): Create an agile plan for release 2.
Done. I earn 1 point.

Release 1 (2011/03/31)

FEATURE: RiskGroups is a working microsimulation model which theoretically meets the description of the Estonian deterministic model, as the latter currently stands. It has not been tested and has performance issues which hinder testing.

Iteration 1.13 (Week starting 2011/03/27; velocity = 0.5 points/iteration)

Crunch time. The first STORY i scheduled last week proved to be considerably more complex than i had expected it to be, involving fairly significant code rewriting to work around an apparent Modgen limitation (which i am liaising with the Modgen developer to pin down). I would still like to try to complete the remaining STORYs for the FEATURE as it currently stands, but there's quite a lot still to do to achieve this. I'm trying my best...
  • STORY(2 points): Use Mait's R code to implement a parallel deterministic model in C++. The most straightforward way of doing this seems to be to implement the deterministic model inside the RiskGroups Modgen code, so that it has easy access to all the same scenario parameters that the stochastic model uses. Then a parameter can be added to scenarios which dictates whether or not the deterministic model should be run.
Not touched. I don't earn any points.
  • STORY(0.5 points): Instrument Stream with absolute and delta timestamps to start getting a handle on any performance bottlenecks.
Done. I earn 0.5 points.
  • STORY(0.5 points): Produce log dumps of Map (post-setup and post-simulation) and StatusToPrevalenceConverter, so that their outputs can be compared.
Incomplete, so i don't earn any points.
  • STORY(1 point): Liaise with Statistics Canada to isolate the true cause of the apparent limitation that Modgen has in only allowing 64 event types per actor type, and finish restructuring the event family setup around the relevant true limitations of Modgen.
Confusing, because i saw the apparent limitation, but was unable to isolate what i was convinced was the cause in a minimal test model. I have been changing the main model's code and have not seen the problem i thought i saw again, although it may still be lurking somewhere. Since this STORY is not complete, i don't earn any points for it.
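For the timestamp instrumentation story in this iteration, a minimal sketch of the idea follows. It does not reproduce the actual Stream class: `TimestampedLog` and `formatLine` are hypothetical names, and the line layout is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>

// Format one log line with an absolute and a delta timestamp, in seconds.
std::string formatLine(double absolute, double delta, const std::string& message) {
    std::ostringstream line;
    line << '[' << absolute << "s +" << delta << "s] " << message;
    return line.str();
}

// Prefixes every message with the time since the log was created (absolute)
// and the time since the previous message (delta).
class TimestampedLog {
    using Clock = std::chrono::steady_clock;

public:
    explicit TimestampedLog(std::ostream& out)
        : out_(out), start_(Clock::now()), last_(start_) {}

    void write(const std::string& message) {
        assert(out_.good());
        const Clock::time_point now = Clock::now();
        out_ << formatLine(seconds(now - start_), seconds(now - last_), message)
             << '\n';
        last_ = now;
    }

private:
    static double seconds(Clock::duration d) {
        return std::chrono::duration<double>(d).count();
    }

    std::ostream& out_;
    Clock::time_point start_;
    Clock::time_point last_;
};
```

Large delta values then point at the stretch of code that ran between two consecutive log lines, which is a crude but cheap way of locating bottlenecks without a profiler.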

Iteration 1.12 (Week starting 2011/03/20; velocity = 1.0 points/iteration)

i'm putting both stories in here because i can see i'm going to need to try at least to get started on the deterministic model implementation in this iteration, because the next iteration is only half a week long.
  • STORY(2 points): Use Mait's R code to implement a parallel deterministic model in C++. The most straightforward way of doing this seems to be to implement the deterministic model inside the RiskGroups Modgen code, so that it has easy access to all the same scenario parameters that the stochastic model uses. Then a parameter can be added to scenarios which dictates whether or not the deterministic model should be run.
Not done.
  • STORY(2 points): Resolve the potential issues standing in the way of the microsimulation model running to completion and producing usable results. These issues may include: (i) when running DirectedPopulationContactRateTest, why are all the input rates NaN? (ii) why does the simulation only seem to produce results for years 0 and 1 instead of the full specified range 0..20? (iii) how do the contents of Map at the end of the simulation relate to the outputs produced post-simulation by StatusToPrevalenceConverter? (iv) how can the outputs of the microsimulation model (for now, and later the deterministic model) best be produced to be usable for comparison purposes? (The choice is between writing everything to tables and outputting everything in a textual dump of Map into the log file.) (v) try to set up some sort of profiling to see if there are any obvious performance bottlenecks that can easily be resolved.
(i) The problem turned out to have been a subtle error in the way the interpolation of TimeSeries works that was masked by the way i had been testing it. Essentially, when the requested point in time is outside the range of defined values (as it was because i had not yet implemented the Time::toCalendarYear() method properly), it would assume that the requested time point had an existing Value attached to it, and would then interpolate between the actually non-existent Value at this time point (retrieved as a Value containing only NaNs) and the extreme existing Value it was closest to. The correction was simply to use the extreme Value that it was closest to instead of trying to interpolate when not between two genuinely existing Values. This error had some quite far-reaching ramifications, including causing some events to fail because NaN rates were retrieved. We now explicitly test in all event time methods that the rate returned is not NaN. This problem has been resolved.
(ii) The problem seems to be that Modgen internals somehow impose a limit of 64 different events per actor type: When the Ticker's IntroductionEvent family contained 64 members, there was evidence that all of its 64 members were being applied, but the TickEvent wasn't. But when the IntroductionEvent family was reduced in size, TickEvent was suddenly being applied. This limitation seems to require me to split the event families over collections of universal actors with names encoding the risk group indices involved in the families. This problem is in the process of being resolved.
(iii) Not done.
(iv) Not done.
(v) Not done yet, but i have established that VS 2008 Professional doesn't come with a profiler (that requires VS 2008 Team Suite), so i can't do statistical or instrumented profiling. However, there was a technique i used in DEBI (associating all log output with absolute and delta timestamps) that seems to be fairly straightforward to implement (in Stream) and should allow some performance problems to be uncovered by interactively manually instrumenting the code to trace where it spends the bulk of its time during shortish toy simulation runs. Not ideal, but i may have to pursue this tactic a bit if the performance of simulations becomes unbearably slow.
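The interpolation fix described in (i) can be sketched as follows. This uses a plain `std::map<double, double>` as a hypothetical stand-in for TimeSeries and scalar doubles in place of Value, so it illustrates only the clamping rule, not the real classes.

```cpp
#include <cassert>
#include <iterator>
#include <map>

// Linear interpolation over a time -> value series. Requests outside the
// defined range return the nearest extreme value instead of interpolating
// against a non-existent (NaN) point.
double interpolate(const std::map<double, double>& series, double t) {
    assert(!series.empty());
    auto upper = series.lower_bound(t);  // first point with time >= t
    if (upper == series.end()) {
        return std::prev(upper)->second;  // after the last point: clamp
    }
    if (upper == series.begin() || upper->first == t) {
        return upper->second;  // before the first point, or an exact hit
    }
    auto lower = std::prev(upper);
    const double w = (t - lower->first) / (upper->first - lower->first);
    return lower->second + w * (upper->second - lower->second);
}
```

With points (0, 10) and (10, 20), a request at t = 5 gives 15, while requests at t = -3 and t = 25 give 10 and 20 respectively, so no NaN can leak into event rates from an out-of-range time.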

Iteration 1.11 (Week starting 2011/03/13; velocity = 2.0 points/iteration)


  • STORY(2 points): Modify the implementation to match the new ModelTransformation document. This mostly requires ensuring (i) that the user-specifiable parameters are correct, (ii) that MapEst maps them correctly into the internal general model, and (iii) that the PrevalenceResponse class and behaviour groups are appropriately used to modify symmetric and directed contact rates in the internal model. (Part (ii) may be sufficient to deal in passing with whatever i meant by the old story "Incorporate supplied prophylactic factors into SymmetricTransmissionProbability, ReceiverInfectingTransmissionProbability, SenderInfectingTransmissionProbability at beginning of simulation.")
i have tried to follow the spirit of this story, but in the process i have encountered some issues with actually running the microsimulation model to completion and having it produce usable outputs. i am regarding this story as done, but am spawning another one to deal with these issues. concerning the letter of this story: (i) the user specifiable parameters are all present and in the correct forms, but their values are not yet correct, and we may need to resolve some misunderstandings about what these values mean before this can be fixed. i think such discussions should be deferred to the validation process in the next release. (ii) i'm fairly confident that MapEst is set up correctly, except for the commercial sex directed contact rates: i have the three possibilities for specification of contact rates, and i'm not sure i'm using them all correctly. (iii) has been done.
  • STORY(0 points): Finish setting up the PrevalenceResponse usage. This should now be very quick: I just have to deal with some compiler errors and actually call the PrevalenceResponse::getBehaviouralResponse() method in the appropriate ways in the appropriate places in the DirectedContacts and SymmetricContacts modules.
done.

Iteration 1.10 (Week starting 2011/03/06; velocity = 2.0 points/iteration)


  • STORY(1.5 points): Do performance testing to investigate why Map*::setupDirectedContacts() takes so long. It may be useful to reimplement Value to use a heap-allocated double array (perhaps flat with a suitable one-to-one Index to index correspondence) rather than the map it currently crawls along with when creating Value arrays with many entries. Use the array implementation if it proves significantly faster.
done.
  • STORY(0.5 point): Finish ensuring that PrevalenceResponse is correctly used in the rest of the program.
almost done. a zero point STORY has been put into the next iteration to tie up a few loose ends.
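To illustrate the flat-array idea from the performance story above, here is a minimal sketch. It assumes, purely for illustration, that a Value is indexed by gender (2), risk group (6) and HIV status (2); `FlatValue` and `flatIndex` are hypothetical names and do not reproduce the real Value or Index classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed dimensions, for illustration only.
constexpr std::size_t kGenders = 2;
constexpr std::size_t kRiskGroups = 6;
constexpr std::size_t kHivStatuses = 2;

// One-to-one mapping from a (gender, risk group, HIV status) triple to a
// position in a flat array, in row-major order.
constexpr std::size_t flatIndex(std::size_t g, std::size_t r, std::size_t h) {
    return (g * kRiskGroups + r) * kHivStatuses + h;
}

// Heap-allocated flat storage: element access is a single array lookup
// instead of a walk through a map.
class FlatValue {
public:
    FlatValue() : data_(kGenders * kRiskGroups * kHivStatuses, 0.0) {}

    double& at(std::size_t g, std::size_t r, std::size_t h) {
        assert(g < kGenders && r < kRiskGroups && h < kHivStatuses);
        return data_[flatIndex(g, r, h)];
    }

    double at(std::size_t g, std::size_t r, std::size_t h) const {
        assert(g < kGenders && r < kRiskGroups && h < kHivStatuses);
        return data_[flatIndex(g, r, h)];
    }

private:
    std::vector<double> data_;
};
```

Whether this is actually faster than the map-based version is something only a measurement on the real setup code can show, which is what the story asks for.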

Iteration 1.9 (Week starting 2011/02/27; velocity = 0.5 points/iteration)

although the last iteration demonstrated that my velocity is quite resilient to wishful thinking about the pace i can achieve, i do still need to keep aiming to improve it. i'm going for 3 points this time.
  • STORY(2 points): Modify the implementation to match the new ModelTransformation document. This mostly requires ensuring (i) that the user-specifiable parameters are correct, (ii) that MapEst maps them correctly into the internal general model, and (iii) that the PrevalenceResponse class and behaviour groups are appropriately used to modify symmetric and directed contact rates in the internal model. (Part (ii) may be sufficient to deal in passing with whatever i meant by the old story "Incorporate supplied prophylactic factors into SymmetricTransmissionProbability, ReceiverInfectingTransmissionProbability, SenderInfectingTransmissionProbability at beginning of simulation.")
still not started...
  • STORY(1 point): Finish implementing PrevalenceResponse and use it in the rest of the program. To finish the implementation (i) add the update mechanism with user-specifiable frequency, and (ii) find a computationally efficient way to return the behaviour-dependent response (i.e., one that doesn't put the expensive exponentiation call inside some very tight inner loop).
not finished yet. i have added the update event. i have attempted to set up a test of the computational efficiency of the behaviour-dependent response, and have as a preliminary test compared the efficiency of the simple method with one that treats the null behaviour specially. i'm not sure how to interpret the statistical significance of the results, so i figured i should put that task on the back burner rather than trying hard to implement some result caching mechanism without having any way of telling whether my efforts are bearing any meaningful fruit. instead i figured i should get on with using the PrevalenceResponse class in the rest of the program. at this point i saw a big refactoring opportunity in the Map* modules, and essentially spent the rest of the iteration turning these modules into a class hierarchy so that code common among them could be pulled up into their common base class (in Map.mpp). having this class hierarchy also allows the Map mechanism to be unit tested: at this point i discovered a major performance problem in setting up the MapId subclass (which seemed the natural choice to use for testing purposes). then my time for the iteration ran out. (note that i only commit to the repository once i have finished a STORY so that the repository never contains code that fails to compile or pass tests. so, unfortunately, you can't see what i'm talking about yet. is that a bad thing?)
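
One possible shape for the result caching mechanism mentioned in the notes above is to do the exponentiation once per behaviour group at each update event and let the inner loop read a stored number. The sketch below is hedged throughout: the class name CachedResponse, the per-behaviour sensitivities, and the response formula exp(-sensitivity x weighted prevalence) are all assumptions for illustration, since this page does not state the formula that PrevalenceResponse uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cache of behavioural response factors, one per behaviour group.
class CachedResponse {
public:
    explicit CachedResponse(const std::vector<double>& sensitivities)
        : sensitivities_(sensitivities), factors_(sensitivities.size(), 1.0) {}

    // Called from the (infrequent) update event: the only place exp() is evaluated.
    // ASSUMED formula: factor = exp(-sensitivity * weightedPrevalence).
    void update(double weightedPrevalence) {
        for (std::size_t b = 0; b < sensitivities_.size(); ++b) {
            factors_[b] = std::exp(-sensitivities_[b] * weightedPrevalence);
        }
    }

    // Called from the tight inner loop: a plain array read, no exponentiation.
    double factor(std::size_t behaviour) const {
        assert(behaviour < factors_.size());
        return factors_[behaviour];
    }

private:
    std::vector<double> sensitivities_;
    std::vector<double> factors_;
};
```

Under the assumed formula, a null behaviour (sensitivity 0) gets a factor of exactly 1 after every update and so needs no special-case branch. A timing comparison against the simple method would still be needed to tell whether the cache is worth having.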

Iteration 1.8 (Week starting 2011/02/20; velocity = 1.0 points/iteration)

i'm setting an ambitious target for this iteration (4 points when my velocity shows i manage about 1.5/week), but the release date looms and i need to try to jack up the pace...
  • STORY(2 points): Modify the implementation to match the new ModelTransformation document. This mostly requires ensuring (i) that the user-specifiable parameters are correct, (ii) that MapEst maps them correctly into the internal general model, and (iii) that the PrevalenceResponse class and behaviour groups are appropriately used to modify symmetric and directed contact rates in the internal model. (Part (ii) may be sufficient to deal in passing with whatever i meant by the old story "Incorporate supplied prophylactic factors into SymmetricTransmissionProbability, ReceiverInfectingTransmissionProbability, SenderInfectingTransmissionProbability at beginning of simulation.")
not started.
  • STORY(2 points): Use the new TimeSeries to implement the weighted-sum-of-prevalences PrevalenceResponse class. This should be relatively straightforward once TimeSeries is functioning properly. A GRH TimeSeries will represent the population states by gender, risk-group and HIV status over time. As new data comes in (at a user-specifiable frequency), this population state TimeSeries will be updated. As behavioural response requests are made, maybe separate TimeSeries with appropriate signatures will be computed to track the various prevalences over time and possibly a behaviour group-dependent behavioural response factor over time. Care needs to be taken to ensure that this very intensively used class doesn't cause unnecessary performance bottlenecks...
i made some progress on this. population sizes and prevalences (including various aggregates) are trackable. however, i encountered a bug in TimeSeries which i had trouble isolating. i went on a bit of a tangent improving the flexibility of log outputs, so that i could have better eyes to see the bug. (so there is now a Stream class which mimics the C++ standard ostream class, since for some reason i couldn't get ostream itself working.) having the flexibility to output data as i wanted while tests are run helped me to isolate the bug (which turned out to have to do with a misinterpretation of how map::insert works in the case of an existing key), and then it was easy to resolve. however, i'm still not finished with the implementation of PrevalenceResponse nor have i used it yet in the rest of the program.
  • STORY(0 points): Correct the few remaining incorrect uses of TimeSeries. This should take no more than an hour or two to finish.
done.
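
For reference, the standard behaviour of map::insert behind the TimeSeries bug mentioned in the notes above is that it leaves an existing entry untouched and reports this through the bool in its return value, whereas operator[] followed by assignment overwrites. The notes do not say exactly what was misread, so the helpers below (store and storeIfAbsent, both hypothetical names) only illustrate the two behaviours side by side.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical helper: store a value at a time, replacing any existing entry.
inline void store(std::map<double, double>& series, double time, double value) {
    series[time] = value;  // operator[] plus assignment always overwrites
}

// Hypothetical helper: store only if no entry exists at that time.
// Returns true if a new entry was added, false if the key was already present
// (in which case the stored value is left unchanged).
inline bool storeIfAbsent(std::map<double, double>& series, double time, double value) {
    return series.insert(std::make_pair(time, value)).second;
}
```

Calling storeIfAbsent twice with the same time keeps the first value; calling store afterwards replaces it.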

Iteration 1.7 (Week starting 2011/02/13; velocity = 1.5 points/iteration)

i've been sick for part of this iteration, but i think i've made reasonable progress anyway...
  • STORY(2 points): Finish implementing the various classes associated with TimeSeries functionality and replace all current uses of TimeSeriesGRH and TimeSeriesTGRGR with the new TimeSeries.
the TimeSeries implementation is now completely Test Driven Developed, and i'm quite satisfied that all is well with it for the foreseeable future. it has undergone several design revisions in the process and it now relies neither on generated code nor any new template classes. (it uses the STL vector and map container template classes, but it doesn't define any new ones, since this could apparently have had a significant impact on build cycle time). i almost completed correcting all uses of TimeSeries throughout RiskGroups, but didn't quite manage it so i don't get the full 2 points...

Iteration 1.6 (Week starting 2011/02/06; velocity = 0.5 points/iteration)

my velocity was lower for this iteration because i've been discovering and attempting to deal with technical debt. perhaps i could have been better focused on "just delivering the story", but since i had made the story up myself (so no Customer expectation was riding on it), i considered this to be more important to get right first. as always, if you feel i'm going off track somehow, let me know by editing the above plan.
  • STORY(1 point): Complete implementation of the weighted-sum-of-prevalences behavioural response mechanism.
not done yet. i tried to figure out how to use TimeSeriesGRH (which was better than the TimeDependentParameter originally developed, but was still not developed using Test Driven Development) and realised i didn't understand how to test it properly to verify that its behaviour was as i expected it to be. this frustrated me so much that i decided to reimplement TimeSeries from the ground up using strict TDD. there is now a single abstract TimeSeries class whose functionality (which is primarily figuring out which stored Values to use to perform an interpolation for a given Time and constructing the appropriate interpolated Value) is independent of the exact details of the signature of Value. Value in turn will develop either into a class hierarchy or into a template class, so that values with different signatures (at least GRH and TGRGR) can be accommodated.
  • STORY(2 points): Modify the implementation to match the new ModelTransformation document.
not done yet
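
The core job of the reimplemented TimeSeries described in the notes above (finding the stored Values that bracket a given Time and building an interpolated Value from them) can be sketched as below. This is not the project's class. It assumes linear interpolation, clamping outside the stored range, and a Value that is simply a flat list of doubles; none of these choices is stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

typedef double Time;
typedef std::vector<double> Value;  // ASSUMPTION: a flat list of entries

class TimeSeries {
public:
    // Store (or replace) the Value recorded at time t.
    void store(Time t, const Value& v) { points_[t] = v; }

    // Return the Value at time t, linearly interpolated between the two
    // neighbouring stored points and clamped outside the stored range.
    Value at(Time t) const {
        assert(!points_.empty());
        std::map<Time, Value>::const_iterator hi = points_.lower_bound(t);
        if (hi == points_.end()) return points_.rbegin()->second;        // after last point
        if (hi == points_.begin() || hi->first == t) return hi->second;  // before first, or exact hit
        std::map<Time, Value>::const_iterator lo = hi;
        --lo;
        assert(lo->second.size() == hi->second.size());
        const double w = (t - lo->first) / (hi->first - lo->first);
        Value out(lo->second.size());
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] = (1.0 - w) * lo->second[i] + w * hi->second[i];
        }
        return out;
    }

private:
    std::map<Time, Value> points_;
};
```

The bracketing logic never looks inside a Value beyond its element count, which is the sense in which it stays independent of the Value signature (GRH, TGRGR, and so on).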

Iteration 1.5 (Week starting 2011/01/30; velocity = 0.5 + 1 = 1.5 points/iteration)

this iteration, i've tried using the Energised Work practice, and it has helped me to put in more hours than the previous iteration, with less strain. that hasn't translated into increased velocity for this iteration, but i do think it has potential to do so, especially if i manage to sort out the other issue i'm having... i'm experiencing a problem with the TDD build cycle for RiskGroups being about an order of magnitude too long to practice TDD in the ideal way (with really tiny test or functionality increments). this means that i have to do significantly more with the result of each compile cycle, which introduces quite significant cognitive overhead and makes it much harder to avoid straying from flow either into boredom or anxiety. unfortunately it seems to be the nature of C++ that compile cycles are quite long, even if the program is made separate compilation friendly.
  • STORY(0.5 points): Finish modifying and updating ModelTransformation.
done. behaviours use a separate index set from risk groups. behavioural response factor is generally defined. needle sharing has been split by gender to conform with consensual sex (so they both consistently inform how symmetric contacts should be defined).
  • STORY(2 points): Implement the weighted-sum-of-prevalences behavioural response mechanism.
this has taken me longer than i expected, because i recognised the opportunity to reuse the functionality in what was TimeDependentParameter (now TimeSeries), and needed to do quite an extensive refactoring to split the class from storing multiple parameters in one instance to storing one parameter per subclass instance with one subclass per parameter signature (TimeSeriesGRH implements GENDER x RISK_GROUP x HIV_STATUS ; TimeSeriesTGRGR implements TRANSMISSION_MODE x GENDER x RISK_GROUP x GENDER x RISK_GROUP). having done this, i'm now in a position to reuse an instance of TimeSeriesGRH to store the population size data, but i have yet to do so.
  • STORY(2 points): Modify the implementation to match the new ModelTransformation document.
not done yet

Iteration 1.4 (Week starting 2011/01/23; velocity = 1.5 points/iteration)


  • STORY(2 points): Modify ModelTransformation LaTeX document to reflect use of separate behaviour index set and separate prophylactic use and effectiveness probabilities for all transmission modes.
this has taken me longer to do than i expected, i suspect mostly because i'm not following the Energised Work practice. i'm almost finished, and i am going to practice Energised Work from now on, in the sense of having a timebox each working day in which i can get work done, and not trying to work outside that time and sacrifice my sleep, health, wellbeing, etc., as i would otherwise be inclined to do.
  • STORY(2 points): Modify the implementation to match the new ModelTransformation document.
not done yet
  • STORY(2 points): Implement the weighted-sum-of-prevalences behavioural response mechanism.
not done yet

Iteration 1.3 (Week starting 2011/01/16)

No stories.
wrote StringMatrix module in order to remove dependence of the generation of ModelTransformation on SciLab, so that this LaTeX document can become a first class agile object with anyone being able to modify it (should that prove useful).

Iteration 1.2 (Week starting 2011/01/09)


  • STORY: As a customer I would like the prophylactic use probability (double EstC) to depend on risk-gender group and transmission mode.
after much discussion, finally decided that EstC did not need to depend on risk-gender groups and transmission mode, but that a separate behaviour index set should be used to label interaction situations. set up the build script infrastructure to accommodate LaTeX operations in preparation for modifying the ModelTransformation document to reflect this decision.

Iteration 1.1 (Week starting 2011/01/02)


  • STORY: Make scenario files in folder 'scenarios' openable. [Currently general error prompt appears when trying to open a scenario file.]
explained that files in scenario folder are pieces of Modgen scenario files, which the build script uses an include mechanism to put together in various ways to construct conventional Modgen scenario files (which it puts in the output folder).
  • STORY: Make the model pass the current unit tests. / Clean the corresponding portion of code (remove empty tests?). [When running >perl build.pl {::clean} {::cleanoutput} ::modgen ::devenv ::run(FALSE,Low,EST,PopulationContactRate,0.001,0.001), some errors about unit tests are given and simulation is not run.]
explained that what was seen was in fact a use of ::run where the unit tests passed, but the simulation was not run because it was a unit-test-only run needed for a tight test-driven development cycle.
  • STORY: As a customer I would like to see short descriptions about the purpose of the file in the beginning of each MPP source file.
done
Queries? Email: Roger Mateer  • Last Modified: 2011/08/15  • All rights reserved © SACEMA 2011