BekoC: Algorithms: eigenvectors

Wednesday, December 11, 2013

Implementing Principal Component Analysis (PCA) in Python

I took a look at PCA (principal component analysis). I'm not sure whether this is implemented somewhere else, but a quick review of my college notes (reference needed) led me to the code below. The data is (reference needed):

x y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2 1.6
1 1.1
1.5 1.6
1.1 0.9
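
The script reads this table from a file called pcaData in the working directory. A minimal sketch for creating that file, assuming a single header line and space-separated values (which is what the loadtxt call in the script expects); the `rows` list is just the table above typed in by hand:

```python
import numpy as np

# The ten (x, y) samples from the table above.
rows = [
    (2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
    (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9),
]

# Write a header line followed by one space-separated sample per line.
with open("pcaData", "w") as f:
    f.write("x y\n")
    for x, y in rows:
        f.write("%s %s\n" % (x, y))

# Read it back the same way the script does.
xs = np.loadtxt("pcaData", delimiter=" ", skiprows=1, usecols=(0, 1))
print(xs.shape)  # (10, 2)
```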

'''
 *@author beck 
 *@date Sep 14, 2012
 *PCA with Python 
 *bekoc.blogspot.com 
'''
import numpy as np
import matplotlib.pyplot as plt

# Load the two columns of the table above; the first row ("x y") is a header.
xs = np.loadtxt("pcaData", delimiter=" ", skiprows=1, usecols=(0, 1))

# Step 1: subtract the mean of each column from that column.
centered = xs - xs.mean(axis=0)
data = centered.T  # one row per dimension, which is the layout np.cov expects
print(data.shape)

# Step 2: covariance matrix of the mean-subtracted data.
covData = np.cov(data)

# Step 3: eigenvalues and eigenvectors. The eigenvectors are the COLUMNS of
# the returned matrix, each of unit length; the order is not guaranteed.
eigenvalues, eigenvectors = np.linalg.eig(covData)

print(eigenvalues)
print(eigenvectors)
print(eigenvectors[0][0])  # x component of the first eigenvector
print(eigenvectors[1][0])  # y component of the first eigenvector

# Each eigenvector defines a line through the origin with slope y/x.
x = np.arange(-2, 3)
y = eigenvectors[1][0] / eigenvectors[0][0] * x
y1 = eigenvectors[1][1] / eigenvectors[0][1] * x

print(x)
print(y)
plt.plot(x, y, linestyle='--', label='eigenvector1')
plt.plot(x, y1, linestyle='--', label='eigenvector2')
plt.plot(data[0, :], data[1, :], marker='+', linestyle=' ', label='Normalized data')

plt.ylim([-2, 2])
plt.xlim([-2, 2])
plt.title('PCA example')
plt.legend()
plt.show()

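The code covers steps 1 to 3 of the summary below and plots both eigenvectors over the data; it does not carry out the projection in step 5.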

PCA summary:
1- Given a dataset with n dimensions (features), calculate the normalized data (the data with the mean of each dimension subtracted)
2- Calculate the covariance matrix of the normalized data
3- Calculate the eigenvalues and eigenvectors of the covariance matrix
4- The eigenvector with the largest eigenvalue is the principal component
5- Choose the p eigenvectors with the largest eigenvalues and multiply them with your normalized data
6- Now your data has p dimensions.
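
The six steps above can be sketched end to end as follows. This is only an illustration under a few assumptions: `pca_project` is a hypothetical helper name, the data is typed in directly instead of being read from a file, and np.linalg.eigh is used in place of np.linalg.eig because the covariance matrix is symmetric.

```python
import numpy as np

def pca_project(samples, p):
    """Project samples (one row per observation) onto the top-p principal axes."""
    centered = samples - samples.mean(axis=0)        # step 1: subtract the means
    cov = np.cov(centered, rowvar=False)             # step 2: covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # step 3: eigen-decomposition
    order = np.argsort(eigenvalues)[::-1]            # step 4: largest eigenvalue first
    components = eigenvectors[:, order[:p]]          # step 5: keep p eigenvectors (columns)
    projected = centered.dot(components)             # step 6: data now has p dimensions
    return projected, eigenvalues[order]

samples = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

projected, eigenvalues = pca_project(samples, p=1)
print(projected.shape)  # (10, 1)
print(eigenvalues)
```

Sorting by eigenvalue matters because neither eig nor eigh returns the pairs with the largest eigenvalue first, so picking "the first column" is not a safe way to find the principal component.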

The dashed line of the eigenvector with the largest eigenvalue (green in the original plot) shows the direction of greatest variance, i.e. the most significant relation between the dimensions.
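
One way to check which eigenvector is the significant one is to measure the variance of the data along each eigenvector: it should equal the matching eigenvalue. A small sketch, assuming the same ten samples typed in directly; the variable names `variance_along` and `explained` are my own:

```python
import numpy as np

samples = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
centered = samples - samples.mean(axis=0)

# eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))

# Sample variance of the data measured along each eigenvector (one per column).
variance_along = np.var(centered.dot(eigenvectors), axis=0, ddof=1)

# Fraction of the total variance carried by each direction.
explained = eigenvalues / eigenvalues.sum()

print(variance_along)
print(explained)
```

For this dataset the larger eigenvalue accounts for roughly 96% of the total variance, which is why a single direction describes the data so well.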

Please refer to the simple and concise tutorial on the georgemdallas blog.
