Showing posts with label python. Show all posts

Wednesday, December 11, 2013

Implementing Principal Component Analysis (PCA) in Python

I took a look at PCA (principal component analysis). I'm not sure whether it is implemented somewhere else, but a quick review of my college notes (reference needed) led me to the code below. The data (reference needed) is:
x y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2 1.6
1 1.1
1.5 1.6
1.1 0.9

'''
 *@author beck
 *@date Sep 14, 2012
 *PCA with Python
 *bekoc.blogspot.com
'''
import numpy as np
import matplotlib.pyplot as plt

# load the x and y columns, skipping the "x y" header row
xs = np.loadtxt("pcaData", delimiter=" ", skiprows=1, usecols=(0, 1))

# step 1: subtract the mean of each dimension
meanx = np.average(xs[:, 0])
meany = np.average(xs[:, 1])
correctedX = xs[:, 0] - meanx  # X data with the mean subtracted
correctedY = xs[:, 1] - meany  # Y data with the mean subtracted
data = np.array([correctedX, correctedY])  # shape (2, n): one row per dimension
print(data.shape)

# step 2: covariance matrix (np.cov treats each row as one variable)
covData = np.cov(data)

# step 3: eigenvalues and unit eigenvectors; column eigenvectors[:, i]
# belongs to eigenvalues[i]
eigenvalues, eigenvectors = np.linalg.eig(covData)
print(eigenvalues)
print(eigenvectors)

# draw each eigenvector as a line through the origin with slope dy/dx
x = np.arange(-2, 3)
y = eigenvectors[1, 0] * x / eigenvectors[0, 0]
y1 = eigenvectors[1, 1] * x / eigenvectors[0, 1]

plt.plot(x, y, linestyle='--', label='eigenvector1')
plt.plot(x, y1, linestyle='--', label='eigenvector2')
plt.plot(data[0, :], data[1, :], marker='+', linestyle='none', label="Mean-subtracted data")

# plt.plot(xs[:, 0], xs[:, 1], marker='+', linestyle='none')  # original, uncentered data
plt.ylim([-2, 2])
plt.xlim([-2, 2])
plt.title('PCA example')
plt.legend()
plt.show()
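The code covers steps 1 to 3 of the summary below and plots both eigenvectors; it does not yet choose the principal component or project the data (steps 4 to 6).
PCA summary:
1- Given a dataset with n dimensions (features), subtract the mean of each dimension to get mean-subtracted (centered) data.
2- Calculate the covariance matrix of the centered data.
3- Calculate the eigenvalues and eigenvectors of the covariance matrix.
4- The eigenvector with the largest eigenvalue is the principal component.
5- Choose the p eigenvectors with the largest eigenvalues and multiply your data by them.
6- Now your data is p-dimensional.
The green dashed line in my plot is the eigenvector with the larger eigenvalue; it shows the direction of greatest variance, i.e. the most significant relation between the two dimensions.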
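Steps 4 to 6 of the summary can be sketched as follows. This is a minimal sketch, not the code from my notes: `pca_project` is a hypothetical helper, the ten points from the table are embedded so it runs without the "pcaData" file, and it swaps `np.linalg.eig` for `np.linalg.eigh`, which is meant for symmetric matrices such as a covariance matrix and returns eigenvalues in ascending order (hence the explicit sort).

```python
import numpy as np

# The ten (x, y) points from the table at the top of the post, embedded here
# so the sketch runs without the "pcaData" file.
xs = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
               [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])


def pca_project(samples, p):
    """Project samples (one row per sample) onto their top-p principal components.

    Returns the projected data (n x p) and all eigenvalues, largest first.
    """
    centered = samples - samples.mean(axis=0)        # step 1: subtract the means
    cov = np.cov(centered, rowvar=False)             # step 2: columns are the dimensions
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # step 3: eigh suits symmetric matrices
    order = np.argsort(eigenvalues)[::-1]            # step 4: largest eigenvalue first
    components = eigenvectors[:, order[:p]]          # step 5: keep p eigenvectors (columns)
    projected = centered.dot(components)             # step 6: n x p result
    return projected, eigenvalues[order]


projected, sorted_eigenvalues = pca_project(xs, p=1)
print(projected.shape)     # (10, 1)
print(sorted_eigenvalues)  # the first value is the variance along the principal component
```

With p=1 the 2-D points become a single column; the sample variance of that column equals the largest eigenvalue, which is why keeping the top eigenvectors keeps most of the spread in the data.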

Please refer to the simple and concise tutorial on georgemdallas's blog.

Sunday, September 29, 2013

Understanding thread basics with Python

Recently, I needed to use threads in Python so that the CPU would not sit idle while waiting (for example, on network requests). I came across great examples on agiliq's blog. I have reimplemented his code and added some minor comments.
You can refer to the original post for more details.

Examining thread order:


'''
Created on Sep 29, 2013
@author: Bekoc::algorithms
'''

from threading import Thread
import time
import urllib2

class GetUrlThread(Thread):
    def __init__(self, url):
        super(GetUrlThread, self).__init__()  # initialize the Thread machinery first
        self.url = url

    def run(self):
        # runs in the new thread once start() is called
        resp = urllib2.urlopen(self.url)
        print self.url, resp.getcode()

def get_responses():
    urls = ['http://www.google.com', 'http://www.amazon.com', 'http://www.ebay.com', 'http://www.alibaba.com', 'http://www.reddit.com']
    start = time.time()
    threads = []
    for url in urls:
        t = GetUrlThread(url)
        threads.append(t)
        print ('Thread %s is calling %s' % (t.name, url))
        t.start()  # returns immediately; the request runs in the background
    for t in threads:
        t.join()  # block until this thread has finished
    # the responses above are printed in completion order, not in list order
    print "Elapsed time: %s" % (time.time() - start)

get_responses()
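To see why threads help with idle time without depending on the network, here is a minimal sketch that replaces `urllib2.urlopen` with `time.sleep`. The names `fake_fetch` and `run_concurrently` are hypothetical, and it passes a function via `target=` instead of subclassing `Thread`; both styles are part of the standard `threading` module.

```python
import threading
import time


def fake_fetch(name, delay, results):
    """Hypothetical stand-in for urllib2.urlopen: sleeps instead of doing network I/O."""
    time.sleep(delay)      # the thread waits idle here, so the other threads can run
    results[name] = delay  # each thread writes its own key


def run_concurrently(delays):
    results = {}
    threads = [threading.Thread(target=fake_fetch, args=(i, d, results))
               for i, d in enumerate(delays)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the clock
    return results, time.time() - start


results, elapsed = run_concurrently([0.2] * 5)
print("Elapsed time: %.2f" % elapsed)
```

Five 0.2 s waits would take about 1 s one after another; run in threads, the elapsed time is typically close to a single wait, because the waits overlap.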


Race condition example, without using lock.acquire() and lock.release():

'''
Created on Sep 30, 2013
@author: Bekoc::algorithms
'''
from threading import Thread
import time
#define a global variable
some_var = 0

class IncrementThreadRaceCondition(Thread):
    def run(self):
        #we want to read a global variable
        #and then increment it
        global some_var
        read_value = some_var
        time.sleep(.001)
        print "some_var in %s is %d" % (self.name, read_value)
        some_var = read_value + 1 
        print "some_var in %s after increment is %d" % (self.name, some_var)

def use_increment_thread():
    threads2 = []
    for i in range(50):
        t = IncrementThreadRaceCondition()
        threads2.append(t)
        t.start()
       
    for t in threads2:
        t.join()
    print "After 50 modifications, some_var should have become 50"
    print "After 50 modifications, some_var is %d" % (some_var,)

use_increment_thread()


To prevent the race condition, import Lock, create one lock at module level, and change the run method so that the read and the increment happen while holding the lock:

from threading import Lock
lock = Lock()  # one lock shared by all threads

    def run(self):
        #we want to read a global variable
        #and then increment it
        global some_var
        lock.acquire()
        try:
            read_value = some_var
            time.sleep(.001)
            print "some_var in %s is %d" % (self.name, read_value)
            some_var = read_value + 1
            print "some_var in %s after increment is %d" % (self.name, some_var)
        finally:
            lock.release()  # always release, even if an exception is raised
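For reference, a complete locked version can be sketched as a single script. It assumes the same 50 threads and 1 ms delay as the race-condition example; the class name `IncrementThreadWithLock` is hypothetical, and it uses `with lock:`, which `threading.Lock` supports as shorthand for acquire/release.

```python
import threading
import time

some_var = 0
lock = threading.Lock()  # a single lock shared by every thread


class IncrementThreadWithLock(threading.Thread):
    def run(self):
        global some_var
        with lock:  # acquires on entry, releases on exit (also if an exception is raised)
            read_value = some_var
            time.sleep(.001)  # same artificial delay as in the race-condition example
            some_var = read_value + 1


def use_locked_increment_thread(n=50):
    threads = [IncrementThreadWithLock() for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return some_var


print("After 50 modifications, some_var is %d" % use_locked_increment_thread())
```

Because only one thread at a time can be between reading and writing `some_var`, no increment is lost and the result is always 50; the price is that the sleeps now run one after another.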

Khan Academy offers Python Programming course

There are lots of online Python courses around, but take a look at the Khan Academy playlist. It covers basic programming concepts with Python.

  1. Introduction to Programs Data Types and Variables
  2. Binary Numbers
  3. Python Lists
  4. For Loops in Python
  5. While Loops in Python
  6. Fun with Strings
  7. Writing a Simple Factorial Program. (Python 2)
  8. Stepping Through the Factorial Program
  9. Flowchart for the Factorial Program
  10. Python 3 Not Backwards Compatible with Python 2
  11. Defining a Factorial Function
  12. Diagramming What Happens with a Function Call
  13. Recursive Factorial Function
  14. Comparing Iterative and Recursive Factorial Functions
  15. Exercise - Write a Fibonacci Function
  16. Iterative Fibonacci Function Example
  17. Stepping Through Iterative Fibonacci Function
  18. Recursive Fibonacci Example
  19. Stepping Through Recursive Fibonacci Function
  20. Exercise - Write a Sorting Function
  21. Insertion Sort Algorithm
  22. Insertion Sort in Python
  23. Stepping Through Insertion Sort Function
  24. Simpler Insertion Sort Function

Saturday, December 1, 2012

Matlab, Octave or Python for Machine Learning

but my adviser uses Matlab

I started implementing ML algorithms after learning the theory behind them; however, I got stuck deciding which tool to write my code in. There are three options for now: Octave, Matlab and Python (read the discussions). You can check my previous posts about Python; I switched to Python after learning Perl. For now, preferring Matlab for implementing machine learning algorithms seems to be a good decision.

There are other tools such as R and Sage. I really don't know which one to master, but for now my adviser uses Matlab exclusively, so I do too.

Here are some useful posts that may help you make your decision:





Friday, September 14, 2012

Switching From Perl to Python, Step 5 First Step into Machine Learning

In this post I share my experience searching for Python machine learning modules. There are lots of them, and I don't think any single one covers all algorithms, so for a specific algorithm you can choose the one that satisfies your needs.

First I started reading Scientific Scripting with Python for Computational Immunology, the best short tutorial I have found for understanding basic statistics. While going through Stack Overflow questions, I realized that many people recommend scikit-learn: machine learning in Python.

I'm familiar with matplotlib and pyplot; however, these examples import another module, pylab. Here is a clarification of matplotlib, pyplot and pylab (from http://truongnghiem.wordpress.com):

pyplot is just a wrapper module to provide a Matlab-style interface to matplotlib. Many plotting functions in Matlab are provided by pyplot with the same names and arguments. This will ease the process of moving from Matlab to Python for scientific computation.
pylab is basically a mode in which pyplot and numpy are imported in a single namespace, thus making the Python working environment very similar to Matlab. By importing pylab:
from pylab import *
we can use Matlab-style commands like:
x = arange(0, 10, 0.2)
y = sin(x)
plot(x, y)
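The same three pylab commands can be written with explicit namespaces, which is the style the matplotlib documentation recommends for scripts today (`from pylab import *` hides where names such as `sin` come from). In this sketch the `Agg` backend is an assumption so it also runs without a display; in an interactive session you would drop that line and call `plt.show()`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # file-only backend, so the sketch also runs without a display
import matplotlib.pyplot as plt

# the same three commands as the pylab version, with explicit namespaces
x = np.arange(0, 10, 0.2)
y = np.sin(x)
lines = plt.plot(x, y)  # returns a list of Line2D objects
plt.close("all")
```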

OK, let's start. First I try simple linear regression from Scientific Scripting with Python for Computational Immunology. The dilution.csv file contains the following data:
Dilution Factor,Rep 1,Rep 2,Rep 3,Mean,sd
1,15.16,14.95,14.55,14.89,0.31
2,15.36,15.61,15.51,15.49,0.13
4,16.65,16.88,16.71,16.75,0.12
8,18.07,17.60,18.13,17.93,0.29
16,18.86,19.63,19.39,19.29,0.39
32,20.39,19.40,20.39,20.06,0.57
64,21.44,20.76,21.22,21.14,0.35
128,21.90,22.04,21.94,21.96,0.07
256,22.87,22.77,23.36,23.00,0.32
512,23.98,23.92,24.24,24.05,0.17
1024,24.91,24.83,24.92,24.89,0.05
2048,26.37,25.43,26.21,26.00,0.50
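The Dilution Factor and Rep 1 columns are used to implement linear regression in two-dimensional space: x is the number of dilutions (log2 of the dilution factor), y is the Rep 1 value, and the line is defined as $y = mx + b$, where $m$ is the slope and $b$ is the intercept: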

'''
 *@author beck
 *@date Sep 14, 2012
 *Basic Statistics with Python
 *bekoc.blogspot.com
'''

import numpy
import matplotlib.pyplot as plt
import scipy.stats as stats

# columns 0 and 1: Dilution Factor and Rep 1 (the header row is skipped)
xs = numpy.loadtxt("dilution.csv", delimiter=",", skiprows=1, usecols=(0, 1))
x = numpy.log2(xs[:, 0])  # number of dilutions: log2 of the dilution factor
y = xs[:, 1]

slope, intercept, r_value, p_value, std_error = stats.linregress(x, y)
print("slope=%f intercept=%f r^2=%f" % (slope, intercept, r_value ** 2))

plt.plot(x, y, "x", label="Rep 1")
plt.plot(x, intercept + slope * x, "r-", label="fitted line")  # y=mx+b where m is slope and b is intercept
plt.xlabel('Number of Dilutions (log2)')
plt.ylabel('Rep 1')
plt.title('Linear Regression example')
plt.legend()  # needs the label= arguments above
plt.show()

A straight line seems reasonable.
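To see what `stats.linregress` computes, here is a NumPy-only sketch of the ordinary least-squares line for the Dilution Factor and Rep 1 columns of the table, with `np.polyfit` as a cross-check. The data is copied from the table; the closed-form formulas are the standard least-squares ones, not code from the tutorial.

```python
import numpy as np

# Dilution Factor and Rep 1 columns from the table above
dilution = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048], dtype=float)
rep1 = np.array([15.16, 15.36, 16.65, 18.07, 18.86, 20.39,
                 21.44, 21.90, 22.87, 23.98, 24.91, 26.37])

x = np.log2(dilution)  # 0, 1, ..., 11
y = rep1

# Ordinary least squares in closed form:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)**2),  b = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

# Pearson correlation; r**2 is the fraction of variance explained by the line
r = np.corrcoef(x, y)[0, 1]

print("slope=%.4f intercept=%.4f r^2=%.4f" % (m, b, r ** 2))
print(np.polyfit(x, y, 1))  # [slope, intercept] from NumPy, for comparison
```

An r² close to 1 is a more concrete way of saying that the straight line fits well.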

Resources for machine learning:

Tuesday, September 4, 2012

Switching From Perl to Python, Step 4 Basic Statistics


On August 28 2012 at 10am, the creator of matplotlib, John D. Hunter died from complications arising from cancer treatment, after a brief but intense battle with this terrible illness. John is survived by his wife Miriam, his three daughters Rahel, Ava and Clara, his sisters Layne and Mary, and his mother Sarah.
If you have benefited from John's many contributions, please say thanks in the way that would matter most to him. Please consider making a donation to the John Hunter Memorial Fund.

Rest in peace, John Hunter (1968-August 28th, 2012). My sincere condolences to his family. Thank you for all your contributions. A great loss to the community. Thank you again.

After learning the basics of Python, today I will try to implement some basic statistical methods, including plotting. We need two important packages: matplotlib and SciPy. There is also the NumPy library, which confused me a bit, but the relationship is explained very well in the official SciPy documentation:
"SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It is also the name of a very popular conference on scientific programming with Python. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization."
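The division of labour in the quote can be sketched in a few lines: NumPy supplies the arrays, and SciPy routines such as `scipy.integrate.quad` (numerical integration) and `scipy.optimize.minimize_scalar` (optimization) work with NumPy functions and arrays. The three sample values are the replicates from the first row of the dilution table in the Step 5 post; everything else is illustrative.

```python
import numpy as np
from scipy import integrate, optimize, stats

# NumPy: N-dimensional arrays with fast elementwise operations
grid = np.linspace(0.0, np.pi, 5)
print(np.sin(grid))

# SciPy: numerical routines built to work with NumPy functions and arrays
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)       # integral of sin(x) on [0, pi]
best = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)  # minimum of a parabola
print(area, best.x)

# basic statistics on a NumPy array
sample = np.array([15.16, 14.95, 14.55])
print(sample.mean(), sample.std(ddof=1), stats.sem(sample))
```

`std(ddof=1)` is the sample standard deviation; rounded to two decimals it gives the same mean (14.89) and sd (0.31) as that row of the table.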
I follow two useful blogs to learn how to plot and do basic statistical calculations in Python. After I learn these topics, I will look into machine learning libraries.




Monday, August 13, 2012

Switching From Perl to Python, Step 3 Python Data Structures

We read Knuth so you don't have to.
-Tim Peters

After finishing the first part of A Byte of Python, today I decided to read about Python data structures.
Today's topics are:
Night -2-
10.  Modules
11.  Data Structures
12.  Problem Solving

While I was reading about data structures in Python, I noticed that list, set and dictionary are very similar to the corresponding Java collections List, Set and Map, while a tuple is an immutable sequence. Please check here to review some examples.
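A quick side-by-side of the four built-in structures, with the closest Java collection noted in comments, as a minimal sketch (the values are just illustrative):

```python
# list: ordered and mutable, similar to java.util.List
tools = ['Eclipse', 'PyDev']
tools.append('NumPy')

# tuple: ordered but immutable; there is no direct java.util counterpart
point = (2.5, 2.4)
try:
    point[0] = 3.0
except TypeError:
    print("tuples cannot be modified")

# set: unordered, no duplicates, similar to java.util.Set
languages = {'perl', 'python', 'perl'}  # the duplicate 'perl' is dropped
print(sorted(languages))  # ['perl', 'python']

# dict: key -> value mapping, similar to java.util.Map
setup = {'ide': 'Eclipse'}
setup['plugin'] = 'PyDev'
print(setup['plugin'])  # PyDev
```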

While reading about dictionaries, I found the defaultdict implementation interesting. I also really like one-liners, for example:


with open('names') as myfile:  # the file is closed automatically
    words = [line.rstrip() for line in myfile]
quote = '''google
        excite
        yahoo
        bing
        altavista '''
#find the words that exist in quote but not in the file
difference = [word for word in quote.split() if word not in words]
print(difference)
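Since defaultdict is mentioned but not shown, here is a minimal sketch that groups the quote's words by first letter, plus the same "in quote but not in the file" difference written with sets. The `words` list is a hypothetical in-memory stand-in for the contents of the 'names' file.

```python
from collections import defaultdict

quote = '''google
        excite
        yahoo
        bing
        altavista '''

# defaultdict(list) creates an empty list the first time a missing key is used,
# so there is no need to check "if key not in d" before appending
by_first_letter = defaultdict(list)
for word in quote.split():
    by_first_letter[word[0]].append(word)
print(dict(by_first_letter))

# the same difference with sets; 'words' is a hypothetical stand-in for the file
words = ['google', 'bing']
difference = sorted(set(quote.split()) - set(words))  # sets drop order, so sort
print(difference)  # ['altavista', 'excite', 'yahoo']
```

The set version is faster for long lists because membership tests on a set take constant time on average, but unlike the list comprehension it drops duplicates and the original order.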

Some useful resources:
Python in High performance computing 
Official python documentation
Data Structures and Algorithms Using Python


Friday, August 10, 2012

Switching From Perl to Python, Step 2 Python Basics

 If you don't know any computer languages, I recommend starting with Python. It is cleanly designed, well documented, and relatively kind to beginners. Despite being a good first language, it is not just a toy; it is very powerful and flexible and well suited for large projects.
How To Become A Hacker, Eric Steven Raymond


From coffeeghost
The following topics are from A Byte of Python, a free Python book for complete beginners.
Today's topics:
Night -1-
1. → Translations
2. → Preface
3. → Introduction
4. → Installation
5. → First Steps
6. → Basics
7. → Operators and Expressions
8. → Control Flow
9. → Functions 

Resources that I have reviewed:

Switching From Perl to Python, Step 1 Which IDE to use for Python programming?


 Python 2.x is the status quo, Python 3.x is the present and future of the language (http://wiki.python.org/moin/Python2orPython3)


I followed the instructions from vogella; however, instead of installing Python 2.7 directly from the official Python web page, I installed the Enthought academic version, which includes NumPy, SciPy and matplotlib. These modules will be useful when we start programming;
I won't go into details for now.


Academic versions of Enthought:
Students or employees from degree-granting institutions may use these installations for an extended period free of cost.


After installing Eclipse and configuring PyDev, I guess I'm ready for Python. Actually, I'd like to use NetBeans instead, but NetBeans does not support Python after version 7.0.

Some experiments to make sure that everything works smoothly:

1- First (little) Python module (from vogella)

2- First (little) NumPy module
3- SciPy and Matplotlib module

Code is from oneau
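The original test modules are not reproduced here. As a stand-in, a minimal sketch that only checks whether the three packages can be imported and prints their versions; `check_modules` is a hypothetical helper built on the standard `importlib.import_module`.

```python
import importlib


def check_modules(names):
    """Return {module name: version string, or None if it cannot be imported}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found


for name, version in sorted(check_modules(["numpy", "scipy", "matplotlib"]).items()):
    print(name, version if version else "NOT INSTALLED")
```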

OK, then I'm ready for Night 1 tomorrow:

1. → Translations
2. → Preface
3. → Introduction
4. → Installation
5. → First Steps
6. → Basics
7. → Operators and Expressions
8. → Control Flow
9. → Functions