Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


matplotlib - Python - Categorical bubble plot

I have a 12x17 dataframe and want to create a categorical bubble plot looking like this one:

https://i.stack.imgur.com/IvD58.png (from Categorical bubble plot for mapping studies)

My dataframe looks basically like this:

#      A     B     C
# X   0.3   0.2   0.4
# Y   0.1   0.4   0.1

I can't use matplotlib's scatter because, as far as I can tell, it does not take categorical input, and creating fake numeric values doesn't seem to work either because the data is not n*n. Or can I? I couldn't figure it out. I found seaborn.stripplot, which takes one categorical input, but all bubbles get the same size, so I am stuck.

Any ideas how I could create such a plot in Python? Thanks a lot.


1 Reply


I think a scatter plot is perfectly suitable to create this kind of categorical bubble plot.

Create the dataframe:

import pandas as pd
df = pd.DataFrame([[.3,.2,.4],[.1,.4,.1]], columns=list("ABC"), index=list("XY"))

Option 1: unstack the DataFrame

dfu = df.unstack().reset_index()
dfu.columns = list("XYS")

This creates a table like

   X  Y    S
0  A  X  0.3
1  A  Y  0.1
2  B  X  0.2
3  B  Y  0.4
4  C  X  0.4
5  C  Y  0.1

which you can plot column-wise. Since the s argument of scatter is the marker area in points squared, one needs to multiply the S column by some large number, like 5000, to get large bubbles.

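import matplotlib.pyplot as plt
# scale the values to marker areas (points squared)
dfu["S"] *= 5000
# the strings are looked up as column names in dfu
plt.scatter(x="X", y="Y", s="S", data=dfu)
plt.margins(.4)
plt.show()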
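
The factor 5000 above is picked by eye. As a rough sketch of an alternative, one could scale the values so that the largest bubble gets a chosen area. The name max_area, its value of 2000, the "area" column, and the output file name are assumptions made for this illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame([[.3, .2, .4], [.1, .4, .1]],
                  columns=list("ABC"), index=list("XY"))

# long format: one row per (column label, index label, value)
dfu = df.unstack().reset_index()
dfu.columns = list("XYS")

# assumption: area in points squared given to the largest value
max_area = 2000.0
dfu["area"] = dfu["S"] / dfu["S"].max() * max_area

fig, ax = plt.subplots()
ax.scatter(x="X", y="Y", s="area", data=dfu)
ax.margins(.4)
fig.savefig("bubbles_scaled.png")  # or plt.show() with an interactive backend
```

Scaling relative to the maximum keeps the bubble sizes comparable when the value range of the dataframe changes, at the cost of hiding the absolute magnitudes.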

Option 2: create grid

Using e.g. numpy, one may create a grid from the dataframe's columns and index and then plot a scatter of the flattened grid. Again one needs to multiply the dataframe values by some large number.

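import numpy as np
import matplotlib.pyplot as plt

# x and y are 2-D arrays of column and index labels, one entry per cell of df
x, y = np.meshgrid(df.columns, df.index)

# scale a copy of the values so that df itself is left unchanged
sizes = df.values.flatten() * 5000
plt.scatter(x=x.flatten(), y=y.flatten(), s=sizes)
plt.margins(.4)
plt.show()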

In both cases the result would look like

[image: the resulting bubble plot]
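
The question mentions a 12x17 dataframe and asks whether "fake values" can work when the data is not n*n. A minimal sketch of that idea follows: integer positions from a meshgrid stand in for the categories, and the tick labels are set from the dataframe afterwards. The helper name bubble_plot, the max_area parameter, the row/column labels, and the random data are all made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def bubble_plot(df, max_area=300.0, ax=None):
    """Hypothetical helper: one bubble per cell, area proportional to value."""
    if ax is None:
        _, ax = plt.subplots()
    n_rows, n_cols = df.shape
    # x[i, j] == j (column position), y[i, j] == i (row position)
    x, y = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
    values = df.to_numpy(dtype=float)
    areas = values / values.max() * max_area  # largest bubble gets max_area
    ax.scatter(x.ravel(), y.ravel(), s=areas.ravel())
    # replace the numeric ticks with the category labels
    ax.set_xticks(np.arange(n_cols))
    ax.set_xticklabels(df.columns)
    ax.set_yticks(np.arange(n_rows))
    ax.set_yticklabels(df.index)
    ax.margins(.1)
    return ax


# made-up 12x17 data, only to exercise a non-square shape
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.random((12, 17)),
                   index=[f"row{i}" for i in range(12)],
                   columns=[f"col{j}" for j in range(17)])
ax = bubble_plot(big)
ax.figure.savefig("bubbles_grid.png")  # or plt.show() with an interactive backend
```

Because the grid is built from the dataframe's shape, the number of rows and columns does not need to match.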

