I decided to investigate. Turns out you could do fairly well at such a lottery. The below script displays a lot that you can do with python, including grabbing data from the web, csv parsing, generator expressions, use of counters, scipy stuff, matplotlib plotting, etc.

One of the things I really like about Python is that you can do stuff like this, analyise the cents on the closing of the Dow Jones Industrial average for 20,806 days in only a few lines. Really only the first four lines (ignoring imports and comments) does all of the work, the rest is me doing some math and making my plots pretty.

The Code

import csv, urllib2 from collections import Counter import scipy as sp import pylab as py from scipy.stats import chisquare #Link format for yahoo finance historical data yahoo_finance_link = "http://ichart.finance.yahoo.com/table.csv?s={symbol}&a=09&b=1&c=1928&d=07&e=7&f=2011&g=d&ignore=.csv" #Use urllib2 to open the historical data for the dow jones #and use the csv module to read each line dow_data = csv.reader( urllib2.urlopen( yahoo_finance_link.format(symbol='^DJI'))) #The first line is the names of the columns, find the column # for closing prices close_index = next(dow_data).index('Close') #extract the closing prices as floats closing_prices = (float(row[close_index]) for row in dow_data) #grab the last two digits of the closing price last_digits = [int( (price%1) * 100 ) for price in closing_prices] #Count the appearance of each closing digit lotto = Counter(last_digits) #Do some statistics lotto_counts = sp.array([lotto[x] for x in range(100)]) total_data_points = sum(lotto.values()) lotto_probs = lotto_counts*1./total_data_points lotto_err = sp.sqrt(lotto_counts)/total_data_points lotto_odds = (1.- lotto_probs )/ lotto_probs lotto_odds_err = sp.sqrt((1./lotto_probs**2)**2 * lotto_err**2) #perform a chisquared test print ( "chisquare results for the Dow Jones closing cents:\n", " chisquare:{0}, p-value: {1}".format(*chisquare(lotto_counts)) ) #draw random numbers to compare random_choices = sp.random.randint(0,100,total_data_points) random = Counter(random_choices) random_counts = sp.array([random[x] for x in range(100)]) random_probs = random_counts*1./total_data_points random_err = sp.sqrt( random_counts ) / total_data_points random_odds = (1.-random_probs )/ random_probs random_odds_err = sp.sqrt((1./random_probs**2)**2 * random_err**2) print ( "chisquare results for the random closing cents:\n", " chisquare:{0}, p-value: {1}".format(*chisquare(random_counts))) #Plot the odds and the errors py.figure(1) py.clf() py.errorbar(sp.arange(100),lotto_odds,lotto_odds_err,fmt=None,label='DJI') py.errorbar(sp.arange(100),random_odds,random_odds_err,fmt=None,label='RANDOM') py.hlines(100,0,100,linestyle='dashed') py.xlim((-1,100)) py.xlabel('Last Two Digits') py.ylabel('Odds') py.title('Dow Lottery and random drawing odds results') py.savefig('oddsplot.png') #Make a heatmap of the lucky and unlucky numbers py.figure(2) py.clf() py.imshow( lotto_odds.reshape((10,10)) , interpolation='nearest') py.colorbar() py.xlabel('Second Digit') py.ylabel('First Digit') py.xticks(range(10)) py.yticks(range(10)) py.ylim((-0.5,9.5)) py.title('Dow Lotto Odds') py.savefig('heatmap.png') #Print the most common digits print "The ten most common digits are: ", [ num for num,count in lotto.most_common(10) ]

The output:

chisquare results for the Dow Jones closing cents: chisquare:1397.69124291, p-value: 3.47521670702e-228 chisquare results for the random closing cents: chisquare:108.207440161, p-value: 0.247576090193 The ten most common digits are: [12, 75, 87, 25, 50, 0, 62, 37, 9, 59]

In this last plot, I've plotted the odds { (1-p)/p ) for both the Dow Jones numbers with error, as well as randomly chosen numbers between 00-99. Notice that the effects are real. Perhaps most tellingly demonstrated in the chisquare test results, where the random draws get a p value of 0.24, the Dow Jones numbers have a p-value of 3e-228, definitely not a fair lottery.

## No comments:

## Post a Comment