ff data storage: running a big analysis
I've spent hours reading up on the ff package and still can't get a handle on
this topic. Basically, I'd like to run an analysis on a big data set and
save the results/statistics from that analysis.
I modified the biglm example code from the ff package documentation
(http://cran.r-project.org/web/packages/ff/ff.pdf) to work on my data set.
Here's my code:
library(ff)
library(ffbase)
library(doSNOW)

# register a 4-worker socket cluster and raise the memory cap (Windows)
registerDoSNOW(makeCluster(4, type = "SOCK"))
memory.limit(size = 32000)

setwd('Z:/data')
wd <- getwd()
data.path <- file.path(wd, 'ffdb')
data.path.train <- file.path(data.path, 'train')

# read the raw TSV into an ffdf and save it to disk
ff.train <- read.table.ffdf(file = 'train.tsv', sep = '\t')
save.ffdf(ff.train, dir = data.path.train)
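# I assume the saved ffdf can be reloaded in a later session with
# ffbase's load.ffdf, e.g. load.ffdf(data.path.train)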
library(biglm)

# Fit a biglm model on the ffdf data chunk by chunk
# (the Vi names are the default column names assigned by read.table.ffdf)
form <- V27 ~ V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10 + V11 + V12 + V13 +
  V14 + V15

# for() returns NULL, so the fitted model lives in biglmfit
for (i in chunk(ff.train, by = 500)) {
  if (i[1] == 1) {
    message("first chunk is: ", i[[1]], ":", i[[2]])
    # the first chunk initializes the model
    biglmfit <- biglm(form, data = ff.train[i, , drop = FALSE])
  } else {
    message("next chunk is: ", i[[1]], ":", i[[2]])
    # subsequent chunks update the existing fit
    biglmfit <- update(biglmfit, ff.train[i, , drop = FALSE])
  }
}
When the above code is run, it gives the following error message:
first chunk is: 1:494
Error: cannot allocate vector of size 19.4 Gb
In addition: There were 50 or more warnings (use warnings() to see the first 50)
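In case it helps, one way to see how big a single chunk is once it is pulled
into RAM would be something like this (just a sketch reusing the chunk() call
from above; str() is there to check whether columns were read in as factors):

first.i <- chunk(ff.train, by = 500)[[1]]
one.chunk <- ff.train[first.i, , drop = FALSE]   # materialize one chunk in RAM
print(object.size(one.chunk), units = "Mb")      # its in-memory footprint
str(one.chunk)                                   # factor vs. numeric columns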
Does this error mean that the biglmfit object itself cannot fit into memory?
Is there any workaround to save biglmfit into an ffdf data type? Or, for
that matter, is there any way to store analysis statistics in ffdf form
chunk by chunk, as in the sketch below?
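To make that last question concrete, this is roughly what I'm after (only a
sketch: the meanV27 statistic is made up for illustration, and I'm guessing
that ff's as.ffdf and ffbase's ffdfappend are the right calls for building
the ffdf incrementally):

# accumulate one row of summary statistics per chunk in an ffdf on disk
stats.ffdf <- NULL
for (i in chunk(ff.train, by = 500)) {
  chunk.df <- ff.train[i, , drop = FALSE]
  chunk.stats <- data.frame(start = i[[1]], end = i[[2]],
                            meanV27 = mean(chunk.df$V27))  # illustrative only
  if (is.null(stats.ffdf)) {
    stats.ffdf <- as.ffdf(chunk.stats)                 # first chunk creates it
  } else {
    stats.ffdf <- ffdfappend(stats.ffdf, chunk.stats)  # later chunks append
  }
}

# the fitted biglm object itself should be small, so plain serialization
# with saveRDS would also work if ffdf storage turns out not to be required
saveRDS(biglmfit, file = file.path(data.path, 'biglmfit.rds'))

Thank you.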