Python NumPy Poisson regression producing bad numbers

- September 15, 2015

I use Poisson regression to model football matches. I am trying to fit attack and defence ratings for each team based on past results. I have a set of results like this:

a v b 2 0
b v a 2 1
a v b 1 1

The number of goals each team scores goes into a vector like this:

y = numpy.array([2, 0.001, 2, 1, 1, 1])  # 0.001 stands in for 0; the reason will become clear

I'm trying to fit home and away attack and defence ratings in a vector b such that y = exp(x*b), where x is a matrix representing the results of the games.

The vector b has the form:

b = [a_home_attack,  a_home_defence,  b_home_attack,  b_home_defence,  a_away_attack,  a_away_defence,  b_away_attack,  b_away_defence]

From the table of results above, the matrix x must be this:

[1,0,0,0,0,0,0,-1]
[0,-1,0,0,0,0,1,0]
[0,0,1,0,0,-1,0,0]
[0,0,0,-1,1,0,0,0]
[1,0,0,0,0,0,0,-1]
[0,-1,0,0,0,0,1,0]
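For what it's worth, the rows can be generated from the list of fixtures rather than typed by hand. This is only a sketch: the names COLUMNS, design_rows and matches are made up for illustration, and it assumes each game contributes one row for the home team's goals (home attack minus away defence) followed by one row for the away team's goals (away attack minus home defence), in the column order of the vector b above.

```python
import numpy as np

# Column order follows the vector b defined above.
COLUMNS = [
    "a_home_attack", "a_home_defence", "b_home_attack", "b_home_defence",
    "a_away_attack", "a_away_defence", "b_away_attack", "b_away_defence",
]

def design_rows(home, away):
    """Hypothetical helper: the two rows (home goals, away goals) for one game."""
    home_row = np.zeros(len(COLUMNS))
    home_row[COLUMNS.index(home + "_home_attack")] = 1
    home_row[COLUMNS.index(away + "_away_defence")] = -1
    away_row = np.zeros(len(COLUMNS))
    away_row[COLUMNS.index(away + "_away_attack")] = 1
    away_row[COLUMNS.index(home + "_home_defence")] = -1
    return [home_row, away_row]

# (home, away) for each game in the results table
matches = [("a", "b"), ("b", "a"), ("a", "b")]
x = np.array([row for home, away in matches for row in design_rows(home, away)])
print(x.astype(int))
```

The printed array should match the six rows written out above.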

Now, since the right-hand side of the model equation above is in terms of e^x, I take logarithms of the vector y (hence entering 0 as 0.001, since log(0) is not defined).
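A quick check of why the zero cannot be used as it stands, assuming nothing beyond numpy itself (the variable names log_zero and log_small are just for illustration):

```python
import numpy as np

with np.errstate(divide="ignore"):  # silence the divide-by-zero warning
    log_zero = np.log(0.0)          # not a finite number
log_small = np.log(0.001)           # finite, but large and negative

print(log_zero)
print(log_small)
```

The first value is negative infinity, which a least-squares solver cannot work with, while the second is roughly -6.9.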

Here is a Python implementation of the algorithm stated above:

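import numpy

# goals scored, two entries per game (home team, then away team)
y = numpy.array([2, 0.001, 2, 1, 1, 1])
x = numpy.matrix([
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
    [0, 0, 0, -1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0]])
logy = numpy.log(y)
# least-squares fit of log(y) = x * b; beta[0] holds the fitted ratings
beta = numpy.linalg.lstsq(x, logy)
print(beta[0])
print("a %.2f v %.2f b" % (beta[0][0] - beta[0][7], beta[0][6] - beta[0][1]))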

The output of the above is:

[  1.73286795e-01   1.72693882e+00   3.46573590e-01  -1.11022302e-16
   1.45089809e-16  -3.46573590e-01  -1.72693882e+00  -1.73286795e-01]
a 0.35 v -3.45 b

The numbers beta[0][0] - beta[0][7] and beta[0][6] - beta[0][1] represent the expected number of goals for the home and away teams. These must be positive by definition, so something has gone wrong.

If anyone could point out the error of my ways, I'd be eternally grateful.

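If we assume the response y has a Poisson distribution, the expected value of y is equal to the model parameter mu, i.e. E[y] = mu.

Now, in the Poisson regression model the log of the expected counts is a linear combination of the unknown parameters, i.e. log(E[y]) = x * beta. As in any generalized linear model, the parameters beta are not necessarily positive, and neither is the linear predictor x * beta. The two numbers you printed are therefore log expected goals, not expected goals.

In order to obtain the fitted scores you need to take the exponential again, i.e. E[y] = exp(x * beta).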

import numpy

y = numpy.array([2, 0.001, 2, 1, 1, 1])
x = numpy.matrix([
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
    [0, 0, 0, -1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0]])
logy = numpy.log(y)
beta = numpy.linalg.lstsq(x, logy)[0]

print(x)
print(beta)
# back-transform the linear predictor to get the fitted goals
print(numpy.exp(x.dot(beta)))

The last print statement produces:

[[ 1.41421356  0.03162278  2.          1.          1.41421356  0.03162278]]
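Note that the code above is ordinary least squares on log(y), which is why the zero had to be replaced by 0.001, and the fitted value for that pair of rows depends on the placeholder chosen. A Poisson maximum-likelihood fit works with the zero directly. Here is a minimal sketch, assuming iteratively reweighted least squares (IRLS) with a log link. The function name fit_poisson_irls and its parameters n_iter and tol are my own inventions for illustration, not part of numpy, and in practice a GLM routine from a statistics package would be the usual choice.

```python
import numpy as np

def fit_poisson_irls(x, y, n_iter=50, tol=1e-10):
    """Hypothetical helper: Poisson maximum likelihood (log link) via IRLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        eta = x @ beta               # linear predictor, log(E[y])
        mu = np.exp(eta)             # current fitted means
        z = eta + (y - mu) / mu      # working response
        w = np.sqrt(mu)              # square root of the IRLS weights
        # lstsq returns the minimum-norm solution, so the rank-deficient
        # design matrix used here is not a problem.
        new_beta = np.linalg.lstsq(x * w[:, None], z * w, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

y = np.array([2, 0, 2, 1, 1, 1])     # the true zero, no 0.001 placeholder
x = np.array([
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
    [0, 0, 0, -1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0]])
beta = fit_poisson_irls(x, y)
fitted = np.exp(x @ beta)
print(fitted)
```

With this small data set each pair of identical rows is fitted to the mean of its observed goals, so the fitted values come out at approximately 1.5, 0.5, 2, 1, 1.5 and 0.5. The individual ratings in beta are still not uniquely determined, because the design matrix has more columns than independent rows.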
