Friday, July 30, 2010

Communicating (Agent-Based) Models

So, you've worked hard and finally you have a working ABM, full of features and parameters, and you are super-excited and want to show everybody your latest phase space or time series. You go to a conference or you write a paper; in either case, chances are that people will misunderstand your model, or will simply take its underlying algorithm for granted, and there will be very few questions... especially if your audience is not trained...
I think that one of the biggest problems archaeological (and non-archaeological) ABMs must face comes when your model reaches its audience and you have a small window of time (10~30 minutes) and/or space (3000~5000 words) to communicate all the algorithms and submodels you've used. Now, even in an ideal (?) world where everybody understands Java, C++ or NetLogo (or even R in my case), few people will have the patience and the willingness to go through your raw code and try to understand how your model really works. Most people will simply look at your conclusions, or read through the old-fashioned text-based description of your model. The problem then is how you evaluate other people's models. Well, the short answer at this stage is that you could, but you won't. And the risk is that we lose the scientific feedback process, bringing us back to simple story-telling with perhaps some fancy dynamic illustrations...
The thing is, I'm quite convinced that the greatest achievement archaeology can gain from ABM is not the actual bunch of code and files, but the formalisation of submodels. We tell stories, and in the non-computational modelling process we often tend to avoid details. We delineate the larger trends without tackling the smallest issues. This epistemological laziness (as one of my supervisors would call it) is however strictly prohibited in an ABM. Or, to put it better, you can still build lazy models, but people will discover this and criticise it... but only if your model, submodels and algorithms are well communicated.
So the communication problem is really a big issue, and the risk is being trapped with a series of over-complicated hyper-realistic models, with very long code that nobody will ever read and check...

I'm having a series of nice chats with Yu Fujimoto, a visiting scholar from the Faculty of Culture and Information Science at Doshisha University in Japan. The discussions are around whether models should be communicated through UML (Unified Modeling Language) diagrams, through the ODD (Overview, Design concepts, and Details) protocol advocated by Volker Grimm, or simply through a series of pseudo-code listings. All of these modes of communication are in use, but none is common in archaeology. One reason is that a non-trained archaeologist will struggle to understand pseudo-code, and will definitely reject UML as something mystic and unquestionably complicated. This leaves ODD, which hopefully will take over. Ideally, journals should allow the upload of the source code together with an additional appendix containing the ODD description of the model, leaving aims and objectives, a brief description, experiments, results and discussion as the core elements of the paper. Of course, having said that, the problem of model communication at conferences remains tricky, as going through the ODD will most likely use up the entire time slot, and you'll hear the 5-minute bell ringing as soon as you reach the second D...

Monday, July 26, 2010

R Plotting tips

Yes... I haven't updated the blog for a very long while... But I'm really busy with many things right now... Our paper for "Cultural Evolution in Spatially Structured Populations" has been accepted, so we are currently working on that, and I also have stuff to do for the PhD and a couple of papers I'm working on... busy busy busy...
I've also started writing a couple of ABMs in R, which sounds crazy at first (also at second, and third), but it has some nice features which I'll write about extensively in a future post.
But for now, I just wanted to start a series of very small posts (mainly for archaeologists) with small tips: astonishingly simple concepts which nevertheless take a couple of hours of googling and forum foraging to find...
For instance, have you ever tried to plot a time series of BC or BP dates? Suppose you have a sequence of counts per century as follows:

data<-c(789,100,923,444,224,192,83,45,32,21,19,22,23,42,120)

Plotting this as a time series is very simple:

plot(data,type="l")

and then you realise that you want something meaningful on the x-axis, so you write the following:

dates<-c(3500,3400,3300,3200,3100,3000,2900,2800,2700,2600,2500,2400,2300,2200,2100)

or perhaps, if you know a bit of R, you'll choose the more elegant nested function:

dates<-sort(seq(2100,3500,100),decreasing=TRUE)

In any case, you'll try to plot this as follows:

plot(x=dates,y=data,type="l")

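and you'll find out that R ignored the ordering of the vector dates: the x-axis always runs from the smallest value to the largest, so your time series comes out reversed, with the oldest dates on the right.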

My practical solution was to use negative values on the plot, and then delete the "-" signs with GIMP or something (yes, I should really be ashamed of myself).
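For what it's worth, the negative-values trick can be finished inside R without an image editor. The following is only a sketch of that workaround: the axis labels ("Years BC", "Count per century") and the variable name ticks are my own assumptions, and dates is rebuilt here with seq() just to keep the snippet self-contained.

```r
# Counts per century and their dates, copied from the example in this post
data <- c(789,100,923,444,224,192,83,45,32,21,19,22,23,42,120)
dates <- seq(3500, 2100, by = -100)

# Plot against the negated dates so that older (larger) dates fall on the left,
# and suppress the default x-axis with xaxt = "n"
# (the axis labels below are assumptions; use BP if that is what your dates are)
plot(x = -dates, y = data, type = "l", xaxt = "n",
     xlab = "Years BC", ylab = "Count per century")

# Draw the x-axis by hand: ticks sit at negative positions,
# but are labelled with their absolute values
ticks <- pretty(-dates)
axis(side = 1, at = ticks, labels = abs(ticks))
```

The minus signs never appear on the plot, because the labels passed to axis() are the absolute values of the tick positions.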
Well, for the small portion of people who had the same problem, here's the solution:

plot(x=dates,y=data,type="l",xlim=c(max(dates),min(dates)))

Basically, you can tell the plot function that the range of values for the x-axis runs from the greatest value (the oldest date in our case, thus the largest number) to the smallest value. R will then simply draw the axis in that order and plot the time series the way it should look.
Easy.
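
A slightly more compact variant of the same idea, as a sketch: rev(range(dates)) returns the maximum and minimum in one go, and seq() with a negative step builds the descending vector without needing sort(). The name xlim_rev and the axis labels are assumptions of mine, not part of the original example.

```r
# Counts per century, oldest first (values from the example above)
data <- c(789,100,923,444,224,192,83,45,32,21,19,22,23,42,120)

# Descending sequence of dates: 3500, 3400, ..., 2100
dates <- seq(3500, 2100, by = -100)

# range() returns c(min, max); rev() flips it to c(max, min),
# which is the reversed x-axis limit we want
xlim_rev <- rev(range(dates))

# Same plot as before, with the oldest date on the left
# (axis labels are assumptions; adapt them to BC or BP as needed)
plot(x = dates, y = data, type = "l", xlim = xlim_rev,
     xlab = "Years BC", ylab = "Count per century")
```

Because xlim is computed from the data, the same line keeps working if you later add or drop centuries.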