Spam Spam Eggs and Spam: 2011

Wednesday, October 12, 2011

Aliased Circles

Inspired by this I wrote a little script to create antialiased circles in python
The Gist:
Example output:

Monday, October 3, 2011

difflib

Found another standard library gem: difflib, which allows you to do intelligent string comparisons.

For example:

will find the longest palindrome in a string, e.g.

>>> from palindrome import *
>>> longest_palindrome('asdfracecarasdf')
'racecar'
>>> longest_palindrome('asdffasdfaipreferpiasdfasdfa')
'aipreferpia'

Saturday, September 3, 2011

I've been looking into C/C++ interface creator known as SWIG for use with a molecular dynamics program that is in the works. It is nice in that from an interface file, that you can create entirely by hand or just use a standard header file, you can create a python module that looks and feels like python, but is really some nifty C that you've written. There are some drawbacks of course in that if a function returns a pointer, it is actually a wrapped object that can be passed to other wrapped functions, but will have little use in python without some other work. Of course for the purposes of this post I would like these pointers to be an array of floating point numbers, or a numpy array. Wouldn't it be great if you could easily pass numpy arrays back and forth to C/C++ without having to deal with writing your own wrapper using Python.h?

Well you can with a little gem called numpy.i which is a SWIG interface wrapper for the numpy array. With it, you specify certain arguments you want to be treated as numpy arrays and it does so automatically.

So on with the example. I have two C functions which fill an array sequentially and can take the dot product of two arrays. They are found below in two files

Let's now provide a python interface to these using SWIG. Here, we specify the python module name and the functions that make up the module's functions. Then we include the numpy.i interface. Next, we have to say how we want our C arguments treated. The details of what's written in the file can be found here but I have used most of the common ones: modifying an array in place, taking an input array, and returning an array as output. NOTE: the far right section of the line matches verbatim to the arguments in the C code. The interface file is as follows:

Then, we need to build the module and test. To make this module we used a setup.py script (which will not be the case for future SWIG posts). I'm not sure of the details here, but it is listed for your information.

After that, let's build with `python setup.py build` and run some speed tests. For example,

Returns


Numpy says: 3.33332833333e+17

SWIG says: 3.33332833333e+17

Time of numpy: 0.0032000541687

Time of SWIG: 0.00278091430664

Wahoo, we are actually slightly faster than the built-in numpy dot product with just a little bit of work. Next time, moving on to SWIGing classes and incorporating numpy with them as well.

Wednesday, August 24, 2011

A Dictionary of Links

I wanted a dictionary of links for an MD type simulation, and I wanted the links to be unique, and I didn't want to have to check for the order the guys appeared in the dictionary keys every time, so I wrote my own dictionary by extending the native one.

Where I put that code on gist. To pull it off, you see that I subclass dict, and define a decorator that pulls out and sorts the first argument. After that it is just decorating the '__setitem__','__getitem__',and '__delitem__' methods. This lets us do nice things now:

>>> z = Links()
>>> z[1,2] = 5
>>> z[4,3] = 10
>>> z
{(1, 2): 5, (3, 4): 10}
>>> z[2,1]
5
>>> z[1,2]
5
>>> z[2,1] = 11
>>> z
{(1, 2): 11, (3, 4): 10}
>>> del z[2,1]
>>> (3,4) in z
True
>>> (4,3) in z
True
>>> z
{(3, 4): 10}
>>> z[5] = 1
Traceback (most recent call last):
  File "", line 1, in 
  File "links.py", line 10, in orderedversion
    assert hasattr(first,"__len__"), "Key must be iterable"
AssertionError: Key must be iterable
>>> z[4,4,3] = 1
Traceback (most recent call last):
  File "", line 1, in 
  File "links.py", line 11, in orderedversion
    assert len(first)==2, "Key must be 2-tuple"
AssertionError: Key must be 2-tuple
>>> z
{(3, 4): 10}

Tuesday, August 23, 2011

Setting up ROOT and pyROOT on a new Mac

I just got a Mac and need to re-do the setup of my software stack. I'm keeping a log of anything tricky here, both for my own future reference and so others can use it as a reference. These instructions work for Lion and may work for other versions of OS X, but I haven't tested them.

Getting ROOT and pyROOT up and running:

Note: you can't use the pre-compiled ROOT because it's linked against the wrong version of python.

Install Xcode from the Mac App Store. I got a few errors the first few times I tried to install it, but eventually it worked.
Download and install the newest version of the enthought python distribution from http://www.enthought.com. If you hunt around, you can find a link to download it for free for academic use.

Set up some environment variables. You can set ROOTSYS to the directory you want to install ROOT to.

export PYTHONDIR=/Library/Frameworks/EPD64.framework/Versions/Current
export ROOTSYS=/usr/local/root
export LD_LIBRARY_PATH=$ROOTSYS/lib:$PYTHONDIR/lib/:$LD_LIBRARY_PATH
export PATH=$ROOTSYS/bin:$PYTHONDIR/bin:$PATH
export PYTHONPATH=$ROOTSYS/lib:$PYTHONPATH

Download the source for the newest production version of ROOT from http://root.cern.ch/drupal/content/downloading-root
Unpack the tarball:
```
tar -xf root_blah_blah.tar
```
```
cd root
```

Configure ROOT installation and compile

./configure macosx64 --enable-python --with-python-incdir=$PYTHONDIR/include --with-python-libdir=$PYTHONDIR/lib
make
sudo su
export ROOTSYS=/usr/local/root
make install
exit

Now you should have working installs of both ROOT and pyROOT.

Tuesday, August 16, 2011

Planet Walk 2.0

We can make our own real-time planet walk.

Planetary positions are obtained according to this NASA brief

Code available as a gist here

Results: here

Monday, August 8, 2011

Making animated gifs

I always lose this command, so I'm putting here so I can find it again

convert pics/*.png [-delay 100 -layers Optimize] anim.gif

And for sticking around, here is an animated gif for you:

Sunday, August 7, 2011

Dow Jones Industrial Lottery

I remember reading Malcolm X in highschool. In it, at one point Malcolm X gets involved with a local lottery scheme, where tickets are sold on the closing price for the Dow Jones Industrial average.

I decided to investigate. Turns out you could do fairly well at such a lottery. The below script displays a lot that you can do with python, including grabbing data from the web, csv parsing, generator expressions, use of counters, scipy stuff, matplotlib plotting, etc.

One of the things I really like about Python is that you can do stuff like this, analyise the cents on the closing of the Dow Jones Industrial average for 20,806 days in only a few lines. Really only the first four lines (ignoring imports and comments) does all of the work, the rest is me doing some math and making my plots pretty.

The Code

import csv, urllib2
from collections import Counter
import scipy as sp
import pylab as py
from scipy.stats import chisquare

#Link format for yahoo finance historical data
yahoo_finance_link = "http://ichart.finance.yahoo.com/table.csv?s={symbol}&a=09&b=1&c=1928&d=07&e=7&f=2011&g=d&ignore=.csv"

#Use urllib2 to open the historical data for the dow jones
#and use the csv module to read each line
dow_data = csv.reader(
                urllib2.urlopen(
                    yahoo_finance_link.format(symbol='^DJI')))

#The first line is the names of the columns, find the column
#   for closing prices
close_index = next(dow_data).index('Close')

#extract the closing prices as floats
closing_prices = (float(row[close_index]) for row in dow_data)

#grab the last two digits of the closing price
last_digits = [int( (price%1) * 100 ) for price in closing_prices]

#Count the appearance of each closing digit
lotto = Counter(last_digits)

#Do some statistics
lotto_counts = sp.array([lotto[x] for x in range(100)])
total_data_points = sum(lotto.values())
lotto_probs = lotto_counts*1./total_data_points
lotto_err = sp.sqrt(lotto_counts)/total_data_points
lotto_odds = (1.- lotto_probs )/ lotto_probs
lotto_odds_err = sp.sqrt((1./lotto_probs**2)**2 * lotto_err**2)

#perform a chisquared test
print ( "chisquare results for the Dow Jones closing cents:\n",
        " chisquare:{0}, p-value: {1}".format(*chisquare(lotto_counts)) )

#draw random numbers to compare
random_choices = sp.random.randint(0,100,total_data_points)
random = Counter(random_choices)
random_counts = sp.array([random[x] for x in range(100)])
random_probs = random_counts*1./total_data_points
random_err = sp.sqrt( random_counts ) / total_data_points
random_odds = (1.-random_probs )/ random_probs
random_odds_err = sp.sqrt((1./random_probs**2)**2 * random_err**2)

print ( "chisquare results for the random closing cents:\n", 
        " chisquare:{0}, p-value: {1}".format(*chisquare(random_counts)))

#Plot the odds and the errors
py.figure(1)
py.clf()
py.errorbar(sp.arange(100),lotto_odds,lotto_odds_err,fmt=None,label='DJI')
py.errorbar(sp.arange(100),random_odds,random_odds_err,fmt=None,label='RANDOM')
py.hlines(100,0,100,linestyle='dashed')
py.xlim((-1,100))
py.xlabel('Last Two Digits')
py.ylabel('Odds')
py.title('Dow Lottery and random drawing odds results')
py.savefig('oddsplot.png')

#Make a heatmap of the lucky and unlucky numbers
py.figure(2)
py.clf()
py.imshow( lotto_odds.reshape((10,10)) , interpolation='nearest')
py.colorbar()
py.xlabel('Second Digit')
py.ylabel('First Digit')
py.xticks(range(10))
py.yticks(range(10))
py.ylim((-0.5,9.5))
py.title('Dow Lotto Odds')
py.savefig('heatmap.png')

#Print the most common digits
print "The ten most common digits are: ", [ num for num,count in lotto.most_common(10) ]

The output:

chisquare results for the Dow Jones closing cents:
 chisquare:1397.69124291, p-value: 3.47521670702e-228
chisquare results for the random closing cents:
 chisquare:108.207440161, p-value: 0.247576090193
The ten most common digits are:  [12, 75, 87, 25, 50, 0, 62, 37, 9, 59]

In this last plot, I've plotted the odds { (1-p)/p ) for both the Dow Jones numbers with error, as well as randomly chosen numbers between 00-99. Notice that the effects are real. Perhaps most tellingly demonstrated in the chisquare test results, where the random draws get a p value of 0.24, the Dow Jones numbers have a p-value of 3e-228, definitely not a fair lottery.

Friday, August 5, 2011

Sets

Sets are python hash tables. You can think of them as just a dictionary without any values. So, they are a lot like lists, except they

Do not preserve order
Have fast lookups
Must be composed of hashable (read: immutable) elements
Must have unique elements

So, it looks like a bunch of downsides, but the fast lookup can really be killer. Par example:

>>> foo = range(10000)
>>> timeit int(rand()*100000) in foo
10000 loops, best of 3: 146 us per loop

>>> boo = set(range(10000))
>>> timeit int(rand()*100000) in boo
1000000 loops, best of 3: 479 ns per loop

Notice that the lookup time for a set is ~300 times faster. Sets also have nicely overloaded operators, so that we can do set operations

>>> A = set(range(5))
>>> B = set(range(2,10))
>>> 5 in A  #testing for element
False
>>> A | B  # A union B
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> A - B  # set difference
set([0,1])
>>> A & B  # set intersection
set([2,3,4])
>>> A <= B # testing subsets
False

Other nice functions to now are add and update. Add adds a single element, update will take the union of a set with something else.

>>> A
set([0,1,2,3,4])
>>> A.add(10)
>>> A
set([0,1,2,3,4,10])
>>> A.update(range(12))
>>> A
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

Full documentation here

Including Python Source Code

Following the advice here, we should have syntax highlighting now.

If we put in

<pre class="prettyprint">
def hello_world():
print "Hello World"

hello_world()
</pre>

returns
<pre>
Hello World
</pre>

We get:

def hello_world():
    print "Hello World"

hello_world()

returns

Hello World

In short use <pre> tags for output and <pre class="prettyprint"> for source.