Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
303 views
in Technique[技术] by (71.8m points)

python - Assign a number to each unique value in a list

I have a list of strings. I want to assign a unique number to each string (the exact number is not important), and create a list of the same length using these numbers, in order. Below is my best attempt at it, but I am not happy for two reasons:

  1. It assumes that the same values are next to each other

  2. I had to start the list with a 0, otherwise the output would be incorrect

My code:

names = ['ll', 'll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'LL', 'HL', 'HL', 'HL']
numbers = [0]
num = 0
for item in range(len(names)):
    if item == len(names) - 1:
      break
    elif names[item] == names[item+1]:
        numbers.append(num)
    else:
        num = num + 1
        numbers.append(num)
print(numbers)

I want to make the code more generic, so it will work with an unknown list. Any ideas?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Without using an external library (check the EDIT for a Pandas solution) you can do it as follows :

d = {ni: indi for indi, ni in enumerate(set(names))}
numbers = [d[ni] for ni in names]

Brief explanation:

In the first line, you assign a number to each unique element in your list (stored in the dictionary d; you can easily create it using a dictionary comprehension; set returns the unique elements of names).

Then, in the second line, you do a list comprehension and store the actual numbers in the list numbers.

One example to illustrate that it also works fine for unsorted lists:

# 'll' appears all over the place
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']

That is the output for numbers:

[1, 1, 3, 3, 3, 2, 2, 1, 2, 0, 0, 0, 1]

As you can see, the number 1 associated with ll appears at the correct places.

EDIT

If you have Pandas available, you can also use pandas.factorize (which seems to be quite efficient for huge lists and also works fine for lists of tuples as explained here):

import pandas as pd

pd.factorize(names)

will then return

(array([(array([0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]),
 array(['ll', 'hl', 'LL', 'HL'], dtype=object))

Therefore,

numbers = pd.factorize(names)[0]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...