I remember reading Malcolm X in highschool. In it, at one point Malcolm X gets involved with a local lottery scheme, where tickets are sold on the closing price for the Dow Jones Industrial average.
I decided to investigate. Turns out you could do fairly well at such a lottery. The below script displays a lot that you can do with python, including grabbing data from the web, csv parsing, generator expressions, use of counters, scipy stuff, matplotlib plotting, etc.
One of the things I really like about Python is that you can do stuff like this, analyise the cents on the closing of the Dow Jones Industrial average for 20,806 days in only a few lines. Really only the first four lines (ignoring imports and comments) does all of the work, the rest is me doing some math and making my plots pretty.
The Code
import csv, urllib2
from collections import Counter
import scipy as sp
import pylab as py
from scipy.stats import chisquare
#Link format for yahoo finance historical data
yahoo_finance_link = "http://ichart.finance.yahoo.com/table.csv?s={symbol}&a=09&b=1&c=1928&d=07&e=7&f=2011&g=d&ignore=.csv"
#Use urllib2 to open the historical data for the dow jones
#and use the csv module to read each line
dow_data = csv.reader(
urllib2.urlopen(
yahoo_finance_link.format(symbol='^DJI')))
#The first line is the names of the columns, find the column
# for closing prices
close_index = next(dow_data).index('Close')
#extract the closing prices as floats
closing_prices = (float(row[close_index]) for row in dow_data)
#grab the last two digits of the closing price
last_digits = [int( (price%1) * 100 ) for price in closing_prices]
#Count the appearance of each closing digit
lotto = Counter(last_digits)
#Do some statistics
lotto_counts = sp.array([lotto[x] for x in range(100)])
total_data_points = sum(lotto.values())
lotto_probs = lotto_counts*1./total_data_points
lotto_err = sp.sqrt(lotto_counts)/total_data_points
lotto_odds = (1.- lotto_probs )/ lotto_probs
lotto_odds_err = sp.sqrt((1./lotto_probs**2)**2 * lotto_err**2)
#perform a chisquared test
print ( "chisquare results for the Dow Jones closing cents:\n",
" chisquare:{0}, p-value: {1}".format(*chisquare(lotto_counts)) )
#draw random numbers to compare
random_choices = sp.random.randint(0,100,total_data_points)
random = Counter(random_choices)
random_counts = sp.array([random[x] for x in range(100)])
random_probs = random_counts*1./total_data_points
random_err = sp.sqrt( random_counts ) / total_data_points
random_odds = (1.-random_probs )/ random_probs
random_odds_err = sp.sqrt((1./random_probs**2)**2 * random_err**2)
print ( "chisquare results for the random closing cents:\n",
" chisquare:{0}, p-value: {1}".format(*chisquare(random_counts)))
#Plot the odds and the errors
py.figure(1)
py.clf()
py.errorbar(sp.arange(100),lotto_odds,lotto_odds_err,fmt=None,label='DJI')
py.errorbar(sp.arange(100),random_odds,random_odds_err,fmt=None,label='RANDOM')
py.hlines(100,0,100,linestyle='dashed')
py.xlim((-1,100))
py.xlabel('Last Two Digits')
py.ylabel('Odds')
py.title('Dow Lottery and random drawing odds results')
py.savefig('oddsplot.png')
#Make a heatmap of the lucky and unlucky numbers
py.figure(2)
py.clf()
py.imshow( lotto_odds.reshape((10,10)) , interpolation='nearest')
py.colorbar()
py.xlabel('Second Digit')
py.ylabel('First Digit')
py.xticks(range(10))
py.yticks(range(10))
py.ylim((-0.5,9.5))
py.title('Dow Lotto Odds')
py.savefig('heatmap.png')
#Print the most common digits
print "The ten most common digits are: ", [ num for num,count in lotto.most_common(10) ]
The output:
chisquare results for the Dow Jones closing cents:
chisquare:1397.69124291, p-value: 3.47521670702e-228
chisquare results for the random closing cents:
chisquare:108.207440161, p-value: 0.247576090193
The ten most common digits are: [12, 75, 87, 25, 50, 0, 62, 37, 9, 59]
In this last plot, I've plotted the odds { (1-p)/p ) for both the Dow Jones numbers with error, as well as randomly chosen numbers between 00-99. Notice that the effects are real. Perhaps most tellingly demonstrated in the chisquare test results, where the random draws get a p value of 0.24, the Dow Jones numbers have a p-value of 3e-228, definitely not a fair lottery.