I've been running for the last couple of months a lot of simulations written in R on legion cluster here at UCL. But I still do many things on our quad-core server here at the institute, and parallelising the simulations is extremely handy.

So here's my recipe:

1)Firstly, the main ingredient, you need an R function for your model where you have a set of parameters to sweep, something like:

Rsim(InitialPopulation=100,K=5000,ReproductiveRate=0.2,TransmissionRate=0.4,MutationRate=0.1)

{

#Put Something amazing and cool here

# ...

return(list(pop=pop,fitness=fitness,diversity=diversity))

}

The function can return everything, in my case this will be a list with three vectors, say $pop, $fitness and $diversity.

2)For the actual parallelisation, you need the following two packages: foreach and doMC and their dependencies. After loading both packages you specify the number of cores you want to use, as follows:

library(foreach)

library(doMC)

cores=4

registerDoMC(cores)

3)Now, at this point I create a sweepSpace, which is basically a data.frame with all the combinations of parameters that I want to explore, repeated

runs=50

mutationSweep=c(0.0001,0.001,0.01,0.1)

reproductionSweep=c(0.1,0.2,0.3)

transmissionSweep=c(0,0.5,1.0)

sweepSpace=expand.grid(runs=1:runs,m= mutationSweep,r= reproductionSweep,z= transmissionSweep)

4)Now we are ready to run the simulation with the following code

res=foreach(a=sweepSpace$runs,b=sweepSpace$m,c=sweepSpace$r,d=weepSpace$z)%dopar%{Rsim(InitialPopulation=100,K=5000,ReproductiveRate=c,TransmissionRate=d, TransmissionRate =b)}

The foreach() function will read each row of sweepSpace and assign the values of each column to the letters, and then use these for the Rsim() function. The resulting object is a list with a length equal to the number of rows of sweepSpace (thus in this case 500), containing the results of Rsim() in each of them. The parallelisation allows the simultaneous but independent run for all combinations and the storage of the results which follows the order in the sweepSpace. So if you have a specific parameter combination which takes longer to compute, the foreach() function allows to carry on, in the meanwhile, all the other combinations and put the results in the correct place.

5)Get and plot your results!

The sweepSpace can also be used as map to go through the resulting list.

For instance, if you want to create a function which allows the plotting of all the population dynamics on a single plot, you can write something as the following code:

plotSimPop=function(res,sweepSpace, ReproductiveRate, TransmissionRate, TransmissionRate)

{

#First we want to retrieve the index of list having the specific parameters that we've requested:

tmp=res[which(sweepSpace$r==ReproductiveRate&sweepSpace$m== MutationRate&sweepSpace$z==TransmissionRate)]

#we then retrieve the number of runs, and create a matrix with our results, with row corresponding to the number of runs, and columns to the number of timesteps (here set to 1000):

nruns=length(tmp)

resMat=matrix(0,nruns,1000)

#We then loop through the results collecting and storing the population dynamics in our matrix

for (x in 1:nruns)

{

resMat[x,]=res[[x]]$pop

}

#And we can now plot the results, with the average dynamics in red:

average=apply(resMat,2,mean,na.rm=TRUE)

plot(resMat[1,],type="n",xlim=c(1,1000),ylim=c(min(resMat),max(resMat),xlab="Time",ylab="Population")

for (x in 1:nruns)

{

lines(resMat[x,],col="grey")

}

lines(average,col="red")

}

So here's my recipe:

1)Firstly, the main ingredient, you need an R function for your model where you have a set of parameters to sweep, something like:

Rsim(InitialPopulation=100,K=5000,ReproductiveRate=0.2,TransmissionRate=0.4,MutationRate=0.1)

{

#Put Something amazing and cool here

# ...

return(list(pop=pop,fitness=fitness,diversity=diversity))

}

The function can return everything, in my case this will be a list with three vectors, say $pop, $fitness and $diversity.

2)For the actual parallelisation, you need the following two packages: foreach and doMC and their dependencies. After loading both packages you specify the number of cores you want to use, as follows:

library(foreach)

library(doMC)

cores=4

registerDoMC(cores)

3)Now, at this point I create a sweepSpace, which is basically a data.frame with all the combinations of parameters that I want to explore, repeated

*n*times, with*n*being the number of runs. So in my case I want to sweep the MutationRate, the TransmissionRate and ReproductiveRate for 100 runs.runs=50

mutationSweep=c(0.0001,0.001,0.01,0.1)

reproductionSweep=c(0.1,0.2,0.3)

transmissionSweep=c(0,0.5,1.0)

sweepSpace=expand.grid(runs=1:runs,m= mutationSweep,r= reproductionSweep,z= transmissionSweep)

4)Now we are ready to run the simulation with the following code

res=foreach(a=sweepSpace$runs,b=sweepSpace$m,c=sweepSpace$r,d=weepSpace$z)%dopar%{Rsim(InitialPopulation=100,K=5000,ReproductiveRate=c,TransmissionRate=d, TransmissionRate =b)}

The foreach() function will read each row of sweepSpace and assign the values of each column to the letters, and then use these for the Rsim() function. The resulting object is a list with a length equal to the number of rows of sweepSpace (thus in this case 500), containing the results of Rsim() in each of them. The parallelisation allows the simultaneous but independent run for all combinations and the storage of the results which follows the order in the sweepSpace. So if you have a specific parameter combination which takes longer to compute, the foreach() function allows to carry on, in the meanwhile, all the other combinations and put the results in the correct place.

5)Get and plot your results!

The sweepSpace can also be used as map to go through the resulting list.

For instance, if you want to create a function which allows the plotting of all the population dynamics on a single plot, you can write something as the following code:

plotSimPop=function(res,sweepSpace, ReproductiveRate, TransmissionRate, TransmissionRate)

{

#First we want to retrieve the index of list having the specific parameters that we've requested:

tmp=res[which(sweepSpace$r==ReproductiveRate&sweepSpace$m== MutationRate&sweepSpace$z==TransmissionRate)]

#we then retrieve the number of runs, and create a matrix with our results, with row corresponding to the number of runs, and columns to the number of timesteps (here set to 1000):

nruns=length(tmp)

resMat=matrix(0,nruns,1000)

#We then loop through the results collecting and storing the population dynamics in our matrix

for (x in 1:nruns)

{

resMat[x,]=res[[x]]$pop

}

#And we can now plot the results, with the average dynamics in red:

average=apply(resMat,2,mean,na.rm=TRUE)

plot(resMat[1,],type="n",xlim=c(1,1000),ylim=c(min(resMat),max(resMat),xlab="Time",ylab="Population")

for (x in 1:nruns)

{

lines(resMat[x,],col="grey")

}

lines(average,col="red")

}

## No comments:

## Post a Comment