python - Basic example for PCA with matplotlib


posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)


I am trying to do a simple principal component analysis with matplotlib.mlab.PCA, but I can't get a clean solution to my problem from the attributes of the class. Here's an example:

Get some dummy data in 2D and start PCA:

from matplotlib.mlab import PCA
import numpy as np

N     = 1000
xTrue = np.linspace(0,1000,N)
yTrue = 3*xTrue

xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data  = np.hstack((xData, yData))
test2PCA = PCA(data)

Now, I just want to get the principal components as vectors in my original coordinates and plot them as arrows onto my data.

What is a quick and clean way to get there?

Thanks, Tyrax


1 Reply

深蓝 · Answer 1 · 2021-10-23T18:30:38+0000

I don't think the mlab.PCA class is appropriate for what you want to do. In particular, the PCA class rescales the data before finding the eigenvectors:

a = self.center(a)
U, s, Vh = np.linalg.svd(a, full_matrices=False)

The center method divides by sigma:

def center(self, x):
    'center the data using the mean and sigma from training set a'
    return (x - self.mu)/self.sigma

This results in eigenvectors, pca.Wt, like this:

[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]

They are perpendicular, but not directly relevant to the principal axes of your original data. They are the principal axes of the standardized data, in which each column has been shifted to zero mean and divided by its standard deviation.
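To see the effect of that rescaling, here is a minimal sketch that compares the principal axes of centered data with those of standardized data. It uses plain NumPy rather than mlab.PCA; the seed and the variable names (`centered`, `standardized`, `Vh_centered`, `Vh_standardized`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
N = 1000
x_true = np.linspace(0, 1000, N)
data = np.column_stack((x_true + rng.normal(0, 100, N),
                        3 * x_true + rng.normal(0, 100, N)))

# Centering only: subtract the column means.
centered = data - data.mean(axis=0)
# Standardizing: additionally divide by the column standard deviations,
# which is what the center method quoted above does.
standardized = centered / data.std(axis=0)

# For an (N, 2) input, the rows of Vh are the principal axes.
_, _, Vh_centered = np.linalg.svd(centered, full_matrices=False)
_, _, Vh_standardized = np.linalg.svd(standardized, full_matrices=False)

print(Vh_centered)      # first row roughly along the direction (1, 3)
print(Vh_standardized)  # entries of magnitude 1/sqrt(2)
```

For two correlated columns, the standardized data always gives axes along the diagonals (1, 1) and (1, -1), whatever the slope of the original data, which is why the values ±0.70710678 carry no information about the original orientation.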

It may be easier to code what you want directly, without using the mlab.PCA class:

import numpy as np
import matplotlib.pyplot as plt

N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3 * xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack((xData, yData))

mu = data.mean(axis=0)
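data = data - mu  # center only; do not rescale
# data = data / data.std(axis=0)  # Uncommenting this reproduces mlab.PCA results
# SVD of the centered 2 x N array: the columns of `eigenvectors` are the
# principal axes, and `eigenvalues` holds the singular values.
eigenvectors, eigenvalues, V = np.linalg.svd(data.T, full_matrices=False)
projected_data = np.dot(data, eigenvectors)
sigma = projected_data.std(axis=0).mean()
print(eigenvectors)

fig, ax = plt.subplots()
ax.scatter(xData, yData)
for axis in eigenvectors.T:  # iterate over columns, one principal axis each
    start, end = mu, mu + sigma * axis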
    ax.annotate(
        '', xy=end, xycoords='data',
        xytext=start, textcoords='data',
        arrowprops=dict(facecolor='red', width=2.0))
ax.set_aspect('equal')
plt.show()
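As an alternative sketch, the same axes can be obtained from an eigendecomposition of the covariance matrix, which also gives a natural length for each arrow (the standard deviation along that axis). The helper name `principal_axes` and the seed are hypothetical, introduced here only for illustration.

```python
import numpy as np

def principal_axes(data):
    """Return (mean, axes, stds) for an (N, d) array.

    axes[i] is the i-th principal axis as a unit vector in the original
    coordinates; stds[i] is the standard deviation of the data along it.
    """
    mu = data.mean(axis=0)
    cov = np.cov(data - mu, rowvar=False)            # (d, d) covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigenvalues)[::-1]            # largest variance first
    axes = eigenvectors[:, order].T                  # one axis per row
    stds = np.sqrt(np.maximum(eigenvalues[order], 0.0))
    return mu, axes, stds

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
N = 1000
x_true = np.linspace(0, 1000, N)
data = np.column_stack((x_true + rng.normal(0, 100, N),
                        3 * x_true + rng.normal(0, 100, N)))

mu, axes, stds = principal_axes(data)
for axis, std in zip(axes, stds):
    print(axis, std)
```

Each arrow can then be drawn from `mu` to `mu + std * axis` with the same `ax.annotate` call used above, so the long arrow follows the trend of the data and the short one shows the scatter across it.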

