Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pandas - Python: string to integer as a key

I'm trying to convert a string column in a DataFrame to integers. Each distinct string should be replaced with an integer key, so that equal strings get the same key.

Data:

user_id site_id 
100     url1.com 
100     url2.com 
100     url1.com 
101     url2.com 
101     url2.com 
101     url2.com

Wanted output:

user_id site_id 
100     1 
100     2 
100     1 
101     2 
101     2 
101     2

I tried to get all unique urls with:

names = pd.unique(df.site_id.ravel()) 
urls = pd.Series(np.arange(len(names)), names) 

and then

df["site_id"] = df.applymapp(urls.get)

1 Reply


Use pd.factorize to encode the values as integers:

In [52]:
df['site_id'] = pd.factorize(df['site_id'])[0] + 1
df

Out[52]:
   user_id  site_id
0      100        1
1      100        2
2      100        1
3      101        2
4      101        2
5      101        2
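
For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample data in the question, and the names `codes` and `uniques` are only illustrative labels for the two values factorize returns:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [100, 100, 100, 101, 101, 101],
    "site_id": ["url1.com", "url2.com", "url1.com",
                "url2.com", "url2.com", "url2.com"],
})

# factorize assigns 0, 1, 2, ... in order of first appearance.
codes, uniques = pd.factorize(df["site_id"])

# Shift by 1 so the keys start at 1, as in the wanted output.
df["site_id"] = codes + 1
print(df)
```

Because the codes follow order of first appearance, url1.com becomes 1 and url2.com becomes 2 for this data.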

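Here factorize returns a tuple of two elements, the integer codes and the unique values (it is called on the already-encoded column in this example, which is why the unique values are 1 and 2):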

In [53]:
pd.factorize(df['site_id'])

Out[53]:
(array([0, 1, 0, 1, 1, 1], dtype=int64), Int64Index([1, 2], dtype='int64'))
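
Since Out[53] was produced on the column after it had been encoded, the unique values there are already integers. The sketch below runs factorize on the original strings instead, which may make the two parts of the tuple easier to see. `site_ids` is a hypothetical stand-in for the original column, and the exact repr of the index can differ between pandas versions:

```python
import pandas as pd

# Hypothetical stand-in for the original (string) site_id column.
site_ids = pd.Series(["url1.com", "url2.com", "url1.com",
                      "url2.com", "url2.com", "url2.com"])

# First element: integer codes; second element: the unique values.
codes, uniques = pd.factorize(site_ids)
print(codes)
print(uniques)

# Each code is a position into uniques, so the original strings
# can be recovered from the codes if they are needed later.
restored = uniques.take(codes)
print(list(restored))
```

Keeping `uniques` around is one way to translate the integer keys back to URLs afterwards.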

We take the first element of the tuple (the encoded values) and add 1 to each, so that the keys start at 1 instead of 0:

pd.factorize(df['site_id'])[0] + 1
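
As an alternative that stays closer to the attempt in the question, the lookup-table idea can probably be made to work with Series.map instead of applymap. This is only a sketch of that approach and is not part of the answer above. It assumes the lookup Series should start at 1, and it reuses the name `urls` from the question:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [100, 100, 100, 101, 101, 101],
    "site_id": ["url1.com", "url2.com", "url1.com",
                "url2.com", "url2.com", "url2.com"],
})

# Unique URLs in order of first appearance.
names = df["site_id"].unique()

# Lookup table: URL -> integer key, starting at 1.
urls = pd.Series(np.arange(1, len(names) + 1), index=names)

# Map only the site_id column through the lookup table.
df["site_id"] = df["site_id"].map(urls)
print(df)
```

The key difference from the attempt in the question is that the mapping is applied to the single column rather than to every cell of the DataFrame.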
