Wednesday, April 25, 2012

Sharing Data in Archaeology

...I know it has been a long time since my last post, and many, many things have happened. Our CAA session on Uncertainty was great (many thanks to the contributors, and many thanks to the organisers and all the staff involved in the conference organisation!), with lots of nice papers on a very wide range of topics, from data collection to dissemination, going through all stages of analysis and data representation...

One thing that came out of the roundtable discussion and of some of the contributions was very much related to data formats, standards and their representation. This is a big issue in archaeology, where you struggle to find datasets and are often tied to social obligations and nasty replies. Even when you do get access to the data, you often find them useless, as the data structure is comprehensible only to its creator, and sometimes not even to him/her...

We tend to create data for the purpose of doing something with it, and we very rarely care about others, and whether they are able to re-use our dataset and examine if the results can be re-created (see an interesting example of this for ABM here). This is profoundly unscientific, as not allowing the user to recreate the experiment or the analysis is essentially cheating. The problem is that creating datasets that can be used by other people is hard, damn hard. Personally I don't think making datasets re-usable for other people is the most exciting part of our research, but I love it when I find datasets that can be easily re-used and understood by anyone.

Then I discovered that my own university had started this great project called REWARD and that CAA had introduced a recycle award. Both were aimed at encouraging researchers to make their datasets available, and to reuse them either to re-assess given knowledge, or to provide new analysis and possibly new interpretation. But how to do this? The currency of the academic world is citation (alas), and so the best way to change the system is to exploit that structure. If people can publish their data as papers, other people using these data have to reference them. That's the price you pay, a citation entry. Data made available in this format will need to be re-usable, and by being re-usable they will acquire visibility. People will start to choose these data instead of asking some mean professor for a dataset and getting a nasty email in reply. This will also create a feedback mechanism: the more a paper is referenced, the more it will be advertised. So scholars will start to submit their own data papers to acquire the same visibility. If you fast-forward this, everybody will submit data papers, making available a vast amount of data which were previously hidden inside dusty cupboards. This can change the discipline. If there is a journal doing this...

Ah. I forgot to mention. It turns out that there is a new journal doing this now. The Journal of Open Archaeology Data aims to do exactly what I wrote in the paragraph above:

"The Journal of Open Archaeology Data (JOAD) features peer reviewed data papers describing archaeology datasets with high reuse potential. We are working with a number of specialist and institutional data repositories to ensure that the associated data are professionally archived, preserved, and openly available. Equally importantly, the data and the papers are citable, and reuse will be tracked. While still in beta phase, the journal is now accepting papers. We will also be adding new functionality over the next few weeks, and refining the look and feel."

I published a paper myself, using the settlement data I used for this paper, which I published (along with Andy Bevan and Mark Lake) in the Journal of Archaeological Science a couple of years ago. The data is slightly different (it's updated to a new chronology), but I attached an R script to the paper which should allow anybody to update the core components of the paper. The entire experience has been very thought-provoking, especially as much of the stuff was created 5 years ago... Trying to find the right version of each file, checking that everything matched everything else... tough job. But definitely worth it, and I'm looking forward to seeing somebody use my dataset, perhaps proving that I was wrong in my conclusions!

So open your dusty cupboards, look back at your data, fix all the small (and big) errors you'll find in there, and share your data by submitting a paper to the Journal of Open Archaeology Data.

