R - Fast reading of different types of data with the same command, better separator guessing
I have LD data, either as a raw output file from PLINK, as below (note the spaces used to make the output pretty, and the leading and trailing spaces too):
write.table(read.table(text="
 chr_a         bp_a        snp_a  chr_b         bp_b            snp_b           r2 
     1    154834183    rs1218582      1    154794318        rs9970364    0.0929391 
     1    154834183    rs1218582      1    154795033       rs56744813      0.10075 
     1    154834183    rs1218582      1    154797272       rs16836414     0.106455 
     1    154834183    rs1218582      1    154798550      rs200576863    0.0916789 
     1    154834183    rs1218582      1    154802379       rs11264270     0.176911 
", sep="x"), "type1.txt", col.names=FALSE, row.names=FALSE, quote=FALSE)
Or as a nicely tab-separated file:
write.table(read.table(text="
chr_a bp_a snp_a chr_b bp_b snp_b r2
1 154834183 rs1218582 1 154794318 rs9970364 0.0929391
1 154834183 rs1218582 1 154795033 rs56744813 0.10075
1 154834183 rs1218582 1 154797272 rs16836414 0.106455
1 154834183 rs1218582 1 154798550 rs200576863 0.0916789
1 154834183 rs1218582 1 154802379 rs11264270 0.176911", sep=" "), "type2.txt", col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")
read.csv works for both types of data:
read.csv("type1.txt", sep="")
read.csv("type2.txt", sep="")
fread works only for type2:
fread("type1.txt")
fread("type2.txt")
The files are big, in the millions of rows, hence I can't use the read.csv option. Is there a way to make fread guess better? Any other package/function suggestions?
I could use readLines to guess the type of file, or tidy up the file with a system call before fread, but that would add the overhead I am trying to avoid.
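For reference, the readLines idea mentioned above could look roughly like the sketch below. It reads only the first line of each file, so the peek itself should be cheap. Note that guess_sep and read_ld are made-up helper names, the two temporary files are tiny stand-ins for the real PLINK output, and base read.table is used only to keep the example self-contained; it does not address the speed problem for millions of rows.

```r
# Hypothetical helper (not part of any package): inspect the header line only.
guess_sep <- function(path) {
  header <- readLines(path, n = 1L)
  # A tab in the header means tab-separated; otherwise fall back to "",
  # which read.table treats as "any run of whitespace".
  if (grepl("\t", header, fixed = TRUE)) "\t" else ""
}

# Two tiny stand-in files mimicking the padded and tab-separated layouts.
padded <- tempfile(fileext = ".txt")
tabbed <- tempfile(fileext = ".txt")
writeLines(c(" chr_a      bp_a      snp_a ",
             "     1 154834183  rs1218582 "), padded)
writeLines(c("chr_a\tbp_a\tsnp_a",
             "1\t154834183\trs1218582"), tabbed)

# Hypothetical wrapper: the same call reads either layout.
read_ld <- function(path) {
  read.table(path, header = TRUE, sep = guess_sep(path),
             stringsAsFactors = FALSE)
}

d1 <- read_ld(padded)
d2 <- read_ld(tabbed)
```

The same guessed separator could presumably be handed to a faster reader instead of read.table, but that part is not shown here.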
Edit: sessionInfo
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
This is now fixed in the devel version, v1.9.5. Either use the devel version (install/upgrade) or wait a while for it to hit CRAN as v1.9.6:
require(data.table) # v1.9.5+
ans <- fread("type1.txt")
#    chr_a      bp_a     snp_a chr_b      bp_b       snp_b        r2
# 1:     1 154834183 rs1218582     1 154794318   rs9970364 0.0929391
# 2:     1 154834183 rs1218582     1 154795033  rs56744813 0.1007500
# 3:     1 154834183 rs1218582     1 154797272  rs16836414 0.1064550
# 4:     1 154834183 rs1218582     1 154798550 rs200576863 0.0916789
# 5:     1 154834183 rs1218582     1 154802379  rs11264270 0.1769110
fread() has gained a strip.white argument (default=TRUE), amidst other new arguments and bug fixes. Please see the README file on the project page for more info.
The column types are recognised correctly as well.
sapply(ans, class)
#       chr_a        bp_a       snp_a       chr_b        bp_b       snp_b          r2
#   "integer"   "integer" "character"   "integer"   "integer" "character"   "numeric"
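If you are not sure whether your installed data.table already includes this fix, a small check along these lines may help. It is only a sketch: has_fixed_fread is a made-up helper name, and the 1.9.5 threshold is simply the version mentioned above.

```r
# Hypothetical helper: TRUE if a version string is at or above v1.9.5,
# the version this answer names as containing the fix.
has_fixed_fread <- function(version) {
  package_version(version) >= package_version("1.9.5")
}

# Only query the installed version if data.table is actually available.
if (requireNamespace("data.table", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("data.table"))
  message("data.table ", installed,
          if (has_fixed_fread(installed)) " includes" else " predates",
          " the fix")
}
```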