Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
238 views
in Technique[技术] by (71.8m points)

python - Can I mimic a log scale of an axis in matplotlib without transforming the associated data?

I am trying to display a Zipf plot, which is typically displayed on a log-log scale.

I'm using a library which gives rank in linear scale and frequencies in log scale. I have the following code which plots my data fairly correctly:

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414, 
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks, 
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

_, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="linear")
ax.set_title("Zipf plot")
sns.regplot("ranks", "freqs", data=df, ax=ax, fit_reg=False)
ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
ax.grid(True, which="both")
plt.show()

The resulting plot is:

Zipf 1

The plot looks good, but the y-label is weird. I'd like it to be displayed in log-increments as well. My current workaround is to raise 10 to the power of each element in the freqs list; i.e.,

freqs = [10**freq for freq in freqs]
# ...

and change the yscale in ax.set to log; i.e.,

_, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="log")
ax.set_title("Zipf plot")
# ...

This gives me the expected plot (below), but it requires a transform of the data which is a) relatively expensive, b) redundant, c) lossy.

Zipf 2

Is there a way to mimic the log scale of the axes in a matplotlib plot without transforming the data?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

First a comment: Personally i would prefer the method of rescaling the data, since it makes everything much easier at the expense of some more memory/cpu time and accurary should not matter

Now to the question, which is acutally how to mimic a log scale on a linear axis

Solution 1: mimic the log scale

This is not easy. Setting the axes to log scale changes a lot in the background and one needs to mimic all of that.

  • The easy part is to set the major tickmark frequency to 1 by using matplotlib.ticker.MultipleLocator()
  • Creating the minor tickmarks at positions which look logarithmic is harder. The best solution I could come up with is to set them manually using the matplotlib.ticker.FixedLocator()
  • Last we need to change the tickmarks to represent the actual numbers, meaning that they should look like 10^(-x) instead of -x. I am aware of two options here:
    • Using a FuncFormatter that sets the values 10**x in scientific format.
    • Using a FuncFormatter that sets the values 10^x in Latex format. This looks much nicer but contrasts to the rest of the plot.

I do not know any better solution for that last point, but maybe someone else does.

Here is the code and how it looks.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from matplotlib.ticker import MultipleLocator, FixedLocator, FuncFormatter

###### Locators for Y-axis
# set tickmarks at multiples of 1.
majorLocator = MultipleLocator(1.)
# create custom minor ticklabels at logarithmic positions
ra = np.array([ [n+(1.-np.log10(i))]  for n in xrange(10,20) for i in [2,3,4,5,6,7,8,9][::-1]]).flatten()*-1.
minorLocator = FixedLocator(ra)
###### Formatter for Y-axis (chose any of the following two)
# show labels as powers of 10 (looks ugly)
majorFormatter= FuncFormatter(lambda x,p: "{:.1e}".format(10**x) ) 
# or using MathText (looks nice, but not conform to the rest of the layout)
majorFormatter= FuncFormatter(lambda x,p: r"$10^{"+"{x:d}".format(x=int(x))+r"}$" ) 

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414, 
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks, 
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

_, ax = plt.subplots(figsize=(6, 6))
ax.set(xscale="log", yscale="linear")
ax.set_title("Zipf plot")

sns.regplot("ranks", "freqs", data=df, ax=ax, fit_reg=False)

# Set the locators
ax.yaxis.set_major_locator(majorLocator)
ax.yaxis.set_minor_locator(minorLocator)
# Set formatter if you like to have the ticklabels consistently in power notation
ax.yaxis.set_major_formatter(majorFormatter)

ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
ax.grid(True, which="both")
plt.show()

enter image description here

Solution 2: Use a different axes

A different solution, of which I did not think in the first place, would be to use two different axes, one with a loglog scale which looks nice and produces the correct labels and ticks and anotherone to plot the data to.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

ranks = [3541, 60219, 172644, 108926, 733215, 1297533, 1297534, 1297535]
# These frequencies are already log-scale
freqs = [-10.932271003723145, -15.213129043579102, -17.091760635375977, -16.27560806274414, 
        -19.482173919677734, -19.502029418945312, -19.502029418945312, -19.502029418945312]

data = {
    'ranks': ranks, 
    'freqs': freqs,
}

df = pd.DataFrame(data=data)

fig, ax = plt.subplots(figsize=(6, 6))
# use 2 axes
# ax is the log, log scale which produces nice labels and ticks
ax.set(xscale="log", yscale="log")
ax.set_title("Zipf plot")
# ax2 is the axes where the values are plottet to
ax2 = ax.twinx()

#plot values to ax2
sns.regplot("ranks", "freqs", data=df, ax=ax2, fit_reg=False)
# set the limits of the log log axis to 10 to the power of the label of ax2
ax.set_ylim(10**np.array(ax2.get_ylim())  )


ax.set_xlabel("Frequency rank of token")
ax.set_ylabel("Absolute frequency of token")
# remove ticklabels and axislabel from ax2
ax2.set_yticklabels([])
ax2.set_ylabel("")
ax.grid(True, which="both")
plt.show()

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...