Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Reading numbers as strings

I am new to R programming and I want to read a text file in R.

One of the columns, let's say column 7, is numeric, and each number represents an ID. I want R to read the numbers as if they were strings, and count the number of times each ID appears in the file (so that I can later assign the frequency of each ID to that ID). I have tried

mydata <- read.table("filename.txt")
ID=mydata[7]
freq=table(ID)

This works, but it treats the IDs as numbers. I then tried

freq=table(as.character(ID))

But then it treats the whole ID column as a single string, and from

summary(freq)

I get

Number of cases in table: 1 
Number of factors: 1 

1 Reply

When you read the data from the text file into your data frame, you can specify the type of each column using the colClasses argument. Here is an example with a file I have on my computer:

> head(read.csv("R/Data/ZipcodeCount.csv"))
    X zipcode stateabb countyno  countyname
1   1     401       NY      119 WESTCHESTER
2 391     501       NY      103     SUFFOLK
3 392     544       NY      103     SUFFOLK
4 393     601       PR        1    ADJUNTAS
5 630     602       PR        3      AGUADA
6 957     603       PR        5   AGUADILLA
> head(read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5))))
    X zipcode stateabb countyno  countyname
1   1   00401       NY      119 WESTCHESTER
2 391   00501       NY      103     SUFFOLK
3 392   00544       NY      103     SUFFOLK
4 393   00601       PR      001    ADJUNTAS
5 630   00602       PR      003      AGUADA
6 957   00603       PR      005   AGUADILLA

> zip<-read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5)))
> str(zip)
'data.frame':   53424 obs. of  5 variables:
 $ X         : Factor w/ 53424 levels "1","10000081",..: 1 36316 36333 36346 43638 52311 19581 23775 26481 26858 ...
 $ zipcode   : Factor w/ 41174 levels "00401","00501",..: 1 2 3 4 5 6 6 7 8 9 ...
 $ stateabb  : Factor w/ 60 levels "","  ","AK","AL",..: 41 41 41 46 46 46 46 46 46 46 ...
 $ countyno  : Factor w/ 380 levels "","000","001",..: 106 95 95 3 5 7 5 7 7 9 ...
 $ countyname: Factor w/ 1925 levels "","ABBEVILLE",..: 1844 1662 1662 9 10 11 10 11 11 12 ...
> head(table(zip[,"zipcode"]))

00401 00501 00544 00601 00602 00603 
    1     1     1     1     1     2 

As you can see, R no longer treats the zip codes as numbers but as factors, so the leading zeros are kept. In your case you need to specify the class of the first 6 columns and then choose factor for the seventh. So if the first 6 columns are numeric, it should be something like colClasses = c(rep("numeric",6),"factor"). If you want plain strings rather than a factor, use "character" for that column instead.
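
Here is a minimal sketch of the same idea applied to the question, under some assumptions: the file contents are made up (seven whitespace-separated columns with no header, the first six numeric), and it is written to a temporary file so the example is self-contained. It also extracts the column with `[[7]]` rather than `[7]`, because `mydata[7]` is a one-column data frame and `as.character()` turns that into a single string, which is probably why `summary(freq)` reported only one case.

```r
# Hypothetical sample data: 7 columns, column 7 holds IDs with leading
# zeros so the effect of the column class is easy to see.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "1 2 3 4 5 6 00401",
  "1 2 3 4 5 6 00501",
  "1 2 3 4 5 6 00401",
  "1 2 3 4 5 6 00603"
), tmp)

# Read columns 1-6 as numeric and column 7 as character
# (use "factor" in place of "character" to get a factor instead).
mydata <- read.table(tmp, colClasses = c(rep("numeric", 6), "character"))

# [[7]] returns the column as a vector; [7] would return a data frame.
ID <- mydata[[7]]

# Count how many times each ID occurs.
freq <- table(ID)
print(freq)

# Attach each ID's frequency to its row for later use,
# looking the counts up by ID name.
mydata$freq <- as.vector(freq[ID])
print(mydata[, c(7, 8)])

unlink(tmp)
```

If the file is already loaded with numeric IDs, `table(as.character(mydata[[7]]))` should also count per ID, but leading zeros lost at read time cannot be recovered that way.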

