Spam Spam Eggs and Spam: August 2011

Wednesday, August 24, 2011

A Dictionary of Links

I wanted a dictionary of links for an MD type simulation, and I wanted the links to be unique, and I didn't want to have to check for the order the guys appeared in the dictionary keys every time, so I wrote my own dictionary by extending the native one.

Where I put that code on gist. To pull it off, you see that I subclass dict, and define a decorator that pulls out and sorts the first argument. After that it is just decorating the '__setitem__','__getitem__',and '__delitem__' methods. This lets us do nice things now:

>>> z = Links()
>>> z[1,2] = 5
>>> z[4,3] = 10
>>> z
{(1, 2): 5, (3, 4): 10}
>>> z[2,1]
5
>>> z[1,2]
5
>>> z[2,1] = 11
>>> z
{(1, 2): 11, (3, 4): 10}
>>> del z[2,1]
>>> (3,4) in z
True
>>> (4,3) in z
True
>>> z
{(3, 4): 10}
>>> z[5] = 1
Traceback (most recent call last):
  File "", line 1, in 
  File "links.py", line 10, in orderedversion
    assert hasattr(first,"__len__"), "Key must be iterable"
AssertionError: Key must be iterable
>>> z[4,4,3] = 1
Traceback (most recent call last):
  File "", line 1, in 
  File "links.py", line 11, in orderedversion
    assert len(first)==2, "Key must be 2-tuple"
AssertionError: Key must be 2-tuple
>>> z
{(3, 4): 10}

Tuesday, August 23, 2011

Setting up ROOT and pyROOT on a new Mac

I just got a Mac and need to re-do the setup of my software stack. I'm keeping a log of anything tricky here, both for my own future reference and so others can use it as a reference. These instructions work for Lion and may work for other versions of OS X, but I haven't tested them.

Getting ROOT and pyROOT up and running:

Note: you can't use the pre-compiled ROOT because it's linked against the wrong version of python.

Install Xcode from the Mac App Store. I got a few errors the first few times I tried to install it, but eventually it worked.
Download and install the newest version of the enthought python distribution from http://www.enthought.com. If you hunt around, you can find a link to download it for free for academic use.

Set up some environment variables. You can set ROOTSYS to the directory you want to install ROOT to.

export PYTHONDIR=/Library/Frameworks/EPD64.framework/Versions/Current
export ROOTSYS=/usr/local/root
export LD_LIBRARY_PATH=$ROOTSYS/lib:$PYTHONDIR/lib/:$LD_LIBRARY_PATH
export PATH=$ROOTSYS/bin:$PYTHONDIR/bin:$PATH
export PYTHONPATH=$ROOTSYS/lib:$PYTHONPATH

Download the source for the newest production version of ROOT from http://root.cern.ch/drupal/content/downloading-root
Unpack the tarball:
```
tar -xf root_blah_blah.tar
```
```
cd root
```

Configure ROOT installation and compile

./configure macosx64 --enable-python --with-python-incdir=$PYTHONDIR/include --with-python-libdir=$PYTHONDIR/lib
make
sudo su
export ROOTSYS=/usr/local/root
make install
exit

Now you should have working installs of both ROOT and pyROOT.

Tuesday, August 16, 2011

Planet Walk 2.0

We can make our own real-time planet walk.

Planetary positions are obtained according to this NASA brief

Code available as a gist here

Results: here

Monday, August 8, 2011

Making animated gifs

I always lose this command, so I'm putting here so I can find it again

convert pics/*.png [-delay 100 -layers Optimize] anim.gif

And for sticking around, here is an animated gif for you:

Sunday, August 7, 2011

Dow Jones Industrial Lottery

I remember reading Malcolm X in highschool. In it, at one point Malcolm X gets involved with a local lottery scheme, where tickets are sold on the closing price for the Dow Jones Industrial average.

I decided to investigate. Turns out you could do fairly well at such a lottery. The below script displays a lot that you can do with python, including grabbing data from the web, csv parsing, generator expressions, use of counters, scipy stuff, matplotlib plotting, etc.

One of the things I really like about Python is that you can do stuff like this, analyise the cents on the closing of the Dow Jones Industrial average for 20,806 days in only a few lines. Really only the first four lines (ignoring imports and comments) does all of the work, the rest is me doing some math and making my plots pretty.

The Code

import csv, urllib2
from collections import Counter
import scipy as sp
import pylab as py
from scipy.stats import chisquare

#Link format for yahoo finance historical data
yahoo_finance_link = "http://ichart.finance.yahoo.com/table.csv?s={symbol}&a=09&b=1&c=1928&d=07&e=7&f=2011&g=d&ignore=.csv"

#Use urllib2 to open the historical data for the dow jones
#and use the csv module to read each line
dow_data = csv.reader(
                urllib2.urlopen(
                    yahoo_finance_link.format(symbol='^DJI')))

#The first line is the names of the columns, find the column
#   for closing prices
close_index = next(dow_data).index('Close')

#extract the closing prices as floats
closing_prices = (float(row[close_index]) for row in dow_data)

#grab the last two digits of the closing price
last_digits = [int( (price%1) * 100 ) for price in closing_prices]

#Count the appearance of each closing digit
lotto = Counter(last_digits)

#Do some statistics
lotto_counts = sp.array([lotto[x] for x in range(100)])
total_data_points = sum(lotto.values())
lotto_probs = lotto_counts*1./total_data_points
lotto_err = sp.sqrt(lotto_counts)/total_data_points
lotto_odds = (1.- lotto_probs )/ lotto_probs
lotto_odds_err = sp.sqrt((1./lotto_probs**2)**2 * lotto_err**2)

#perform a chisquared test
print ( "chisquare results for the Dow Jones closing cents:\n",
        " chisquare:{0}, p-value: {1}".format(*chisquare(lotto_counts)) )

#draw random numbers to compare
random_choices = sp.random.randint(0,100,total_data_points)
random = Counter(random_choices)
random_counts = sp.array([random[x] for x in range(100)])
random_probs = random_counts*1./total_data_points
random_err = sp.sqrt( random_counts ) / total_data_points
random_odds = (1.-random_probs )/ random_probs
random_odds_err = sp.sqrt((1./random_probs**2)**2 * random_err**2)

print ( "chisquare results for the random closing cents:\n", 
        " chisquare:{0}, p-value: {1}".format(*chisquare(random_counts)))

#Plot the odds and the errors
py.figure(1)
py.clf()
py.errorbar(sp.arange(100),lotto_odds,lotto_odds_err,fmt=None,label='DJI')
py.errorbar(sp.arange(100),random_odds,random_odds_err,fmt=None,label='RANDOM')
py.hlines(100,0,100,linestyle='dashed')
py.xlim((-1,100))
py.xlabel('Last Two Digits')
py.ylabel('Odds')
py.title('Dow Lottery and random drawing odds results')
py.savefig('oddsplot.png')

#Make a heatmap of the lucky and unlucky numbers
py.figure(2)
py.clf()
py.imshow( lotto_odds.reshape((10,10)) , interpolation='nearest')
py.colorbar()
py.xlabel('Second Digit')
py.ylabel('First Digit')
py.xticks(range(10))
py.yticks(range(10))
py.ylim((-0.5,9.5))
py.title('Dow Lotto Odds')
py.savefig('heatmap.png')

#Print the most common digits
print "The ten most common digits are: ", [ num for num,count in lotto.most_common(10) ]

The output:

chisquare results for the Dow Jones closing cents:
 chisquare:1397.69124291, p-value: 3.47521670702e-228
chisquare results for the random closing cents:
 chisquare:108.207440161, p-value: 0.247576090193
The ten most common digits are:  [12, 75, 87, 25, 50, 0, 62, 37, 9, 59]

In this last plot, I've plotted the odds { (1-p)/p ) for both the Dow Jones numbers with error, as well as randomly chosen numbers between 00-99. Notice that the effects are real. Perhaps most tellingly demonstrated in the chisquare test results, where the random draws get a p value of 0.24, the Dow Jones numbers have a p-value of 3e-228, definitely not a fair lottery.

Friday, August 5, 2011

Sets

Sets are python hash tables. You can think of them as just a dictionary without any values. So, they are a lot like lists, except they

Do not preserve order
Have fast lookups
Must be composed of hashable (read: immutable) elements
Must have unique elements

So, it looks like a bunch of downsides, but the fast lookup can really be killer. Par example:

>>> foo = range(10000)
>>> timeit int(rand()*100000) in foo
10000 loops, best of 3: 146 us per loop

>>> boo = set(range(10000))
>>> timeit int(rand()*100000) in boo
1000000 loops, best of 3: 479 ns per loop

Notice that the lookup time for a set is ~300 times faster. Sets also have nicely overloaded operators, so that we can do set operations

>>> A = set(range(5))
>>> B = set(range(2,10))
>>> 5 in A  #testing for element
False
>>> A | B  # A union B
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> A - B  # set difference
set([0,1])
>>> A & B  # set intersection
set([2,3,4])
>>> A <= B # testing subsets
False

Other nice functions to now are add and update. Add adds a single element, update will take the union of a set with something else.

>>> A
set([0,1,2,3,4])
>>> A.add(10)
>>> A
set([0,1,2,3,4,10])
>>> A.update(range(12))
>>> A
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

Full documentation here

Including Python Source Code

Following the advice here, we should have syntax highlighting now.

If we put in

<pre class="prettyprint">
def hello_world():
print "Hello World"

hello_world()
</pre>

returns
<pre>
Hello World
</pre>

We get:

def hello_world():
    print "Hello World"

hello_world()

returns

Hello World

In short use <pre> tags for output and <pre class="prettyprint"> for source.