Python NumPy Poisson regression producing bad numbers
I'm using a Poisson regression model for football matches. I'm trying to fit attack and defence ratings for each team based on past results. I have a set of results like this:

```
A v B  2 0
B v A  2 1
A v B  1 1
```
The number of goals each team scores goes into a vector like this:

```python
y = numpy.array([2, 0.001, 2, 1, 1, 1])  # the reason for 0.001 will become clear
```
I'm trying to fit the home and away attack and defence ratings in a vector b such that y = exp(x*b), where x is a matrix representing the results of the games.
The vector b is in the form:

```python
b = [a_home_attack, a_home_defence, b_home_attack, b_home_defence,
     a_away_attack, a_away_defence, b_away_attack, b_away_defence]
```
From the above table of results, the matrix x must look like this:

```
[1,  0, 0,  0, 0,  0, 0, -1]
[0, -1, 0,  0, 0,  0, 1,  0]
[0,  0, 1,  0, 0, -1, 0,  0]
[0,  0, 0, -1, 1,  0, 0,  0]
[1,  0, 0,  0, 0,  0, 0, -1]
[0, -1, 0,  0, 0,  0, 1,  0]
```
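The matrix follows a fixed rule: each match gives two rows, one for the home side's goals and one for the away side's, with +1 on the scoring side's attack rating and -1 on the conceding side's defence rating. A minimal sketch that builds x and the goal vector from a list of results, assuming the column order of the b vector above; `col` and `build_design` are hypothetical helper names, not from any library:

```python
import numpy as np

# Column order follows the b vector from the question:
# [a_home_attack, a_home_defence, b_home_attack, b_home_defence,
#  a_away_attack, a_away_defence, b_away_attack, b_away_defence]
TEAMS = ["A", "B"]


def col(team, venue, kind):
    """Column index of one rating, e.g. col("B", "away", "defence") == 7."""
    venue_offset = 0 if venue == "home" else 2 * len(TEAMS)
    return venue_offset + 2 * TEAMS.index(team) + (0 if kind == "attack" else 1)


def build_design(results):
    """Return (x, goals) with two rows per match: home goals, then away goals."""
    rows, goals = [], []
    for home, away, home_goals, away_goals in results:
        # Home goals: home attack (+1) against away defence (-1).
        home_row = np.zeros(4 * len(TEAMS))
        home_row[col(home, "home", "attack")] = 1
        home_row[col(away, "away", "defence")] = -1
        # Away goals: away attack (+1) against home defence (-1).
        away_row = np.zeros(4 * len(TEAMS))
        away_row[col(away, "away", "attack")] = 1
        away_row[col(home, "home", "defence")] = -1
        rows += [home_row, away_row]
        goals += [home_goals, away_goals]
    return np.array(rows), np.array(goals, dtype=float)


results = [("A", "B", 2, 0), ("B", "A", 2, 1), ("A", "B", 1, 1)]
x, goals = build_design(results)
print(x.astype(int))
print(goals)
```

Applied to the three results in the table, this reproduces the six rows of x listed above and the goals 2, 0, 2, 1, 1, 1 (with the real 0 rather than 0.001).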
Now, since the right-hand side of the model equation above is an exponential, I take the logarithm of the vector y (hence entering 0 as 0.001, because log(0) is undefined).
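The substitute for 0 is not a harmless detail. Rows 2 and 6 of x are identical, but their targets are log(0.001) and log(1), so least squares can only fit their average log value; back-transformed, the fit for B's away goals is the geometric mean sqrt(0.001 × 1) ≈ 0.032, and it moves with whatever constant replaces the 0. A short sketch (Python 3 with NumPy arrays; `fitted_goals` is a hypothetical helper) that shows this sensitivity:

```python
import numpy as np

# Design matrix x from the question; rows 2 and 6 (B's away goals) are identical.
x = np.array([
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
    [0, 0, 0, -1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0]])


def fitted_goals(eps):
    """Least squares on log(y) with the 0 goals replaced by eps, back-transformed."""
    y = np.array([2, eps, 2, 1, 1, 1])
    beta = np.linalg.lstsq(x, np.log(y), rcond=None)[0]
    return np.exp(x @ beta)


for eps in (0.1, 0.01, 0.001):
    # Entry 1 is B's fitted away goals; it equals sqrt(eps * 1).
    print(eps, np.round(fitted_goals(eps)[1], 4))
```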
Here is a Python implementation of the algorithm stated above:
```python
import numpy

y = numpy.array([2, 0.001, 2, 1, 1, 1])
x = numpy.matrix([
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
    [0, 0, 0, -1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0]])

# Least-squares fit of log(y) = x * beta
logy = numpy.log(y)
beta = numpy.linalg.lstsq(x, logy, rcond=None)
print(beta[0])
print("a %.2f v %.2f b" % (beta[0][0] - beta[0][7], beta[0][6] - beta[0][1]))
```
The output of the above is:
```
[ 1.73286795e-01  1.72693882e+00  3.46573590e-01 -1.11022302e-16  1.45089809e-16 -3.46573590e-01 -1.72693882e+00 -1.73286795e-01]
a 0.35 v -3.45 b
```
The numbers beta[0][0] - beta[0][7] and beta[0][6] - beta[0][1] represent the expected number of goals for the home and away teams respectively. These must be positive by definition, so something has gone wrong.
If anyone can point out the error of my ways, I'd be eternally grateful.
If we assume that the response y has a Poisson distribution, then the expected value of y equals the model parameter mu, i.e. E[y] = mu.

In a Poisson regression model, the log of the expected counts is a linear combination of the unknown parameters, i.e. log(E[y]) = x * beta. Since this is a generalized linear model with a log link, the parameters beta need not be positive: the differences you computed are log expected goals, not expected goals.

To obtain the fitted scores, you need to take the exponential again, i.e. E[y] = exp(x * beta).
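Worked through for the first match (A at home to B), using the coefficients printed by the question's code, the back-transformation gives positive expected goals even though some coefficients are negative. The notation here is introduced for this illustration only: $a^{h}_{\text{att}}$ stands for a_home_attack, $a^{h}_{\text{def}}$ for a_home_defence, $b^{a}_{\text{att}}$ for b_away_attack and $b^{a}_{\text{def}}$ for b_away_defence:

$$
\begin{aligned}
\log E[y_1] &= x_1\beta = a^{h}_{\text{att}} - b^{a}_{\text{def}} \approx 0.17329 - (-0.17329) = 0.34657,\\
E[y_1] &= e^{a^{h}_{\text{att}}} / e^{b^{a}_{\text{def}}} = e^{0.34657} \approx 1.4142,\\
\log E[y_2] &= x_2\beta = b^{a}_{\text{att}} - a^{h}_{\text{def}} \approx -1.72694 - 1.72694 = -3.45388,\\
E[y_2] &= e^{b^{a}_{\text{att}}} / e^{a^{h}_{\text{def}}} = e^{-3.45388} \approx 0.0316.
\end{aligned}
$$

The ratings therefore act multiplicatively: a stronger attack scales the expected goals up, and a stronger defence scales them down.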
```python
import numpy

y = numpy.array([2, 0.001, 2, 1, 1, 1])
x = numpy.matrix([
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
    [0, 0, 0, -1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0]])

logy = numpy.log(y)
beta = numpy.linalg.lstsq(x, logy, rcond=None)[0]
print(x)
print(beta)
# Back-transform from the log scale to get the fitted expected goals
print(numpy.exp(x.dot(beta)))
```
The last print call produces:
```
[[ 1.41421356  0.03162278  2.          1.          1.41421356  0.03162278]]
```
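Note that the fit is still ordinary least squares on log(y), which is why the 0.001 substitute is needed and why it pulls B's away goals down to about 0.03. A maximum-likelihood Poisson fit handles a true 0 directly. The following sketch swaps in iteratively reweighted least squares (IRLS), the standard way generalized linear models are fitted; `poisson_irls` is a hypothetical helper name, and the starting values (y + 0.5) and tolerance are assumptions. Each weighted step uses `lstsq`, which returns a minimum-norm solution, so the rank-deficient x (only attack-minus-defence differences are identifiable) does not break it:

```python
import numpy as np


def poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Maximum-likelihood fit of log(E[y]) = X @ beta via IRLS (sketch)."""
    mu = y + 0.5                     # start away from zero so log(mu) is finite
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu      # working response for the log link
        sw = np.sqrt(mu)             # Poisson working weights are w = mu
        beta = np.linalg.lstsq(sw[:, None] * X, sw * z, rcond=None)[0]
        eta_new = X @ beta
        converged = np.max(np.abs(eta_new - eta)) < tol
        eta, mu = eta_new, np.exp(eta_new)
        if converged:
            break
    return beta, mu


y = np.array([2, 0, 2, 1, 1, 1], dtype=float)   # the real 0, no 0.001 needed
X = np.array([
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
    [0, 0, 0, -1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, -1, 0, 0, 0, 0, 1, 0]], dtype=float)

beta, mu = poisson_irls(X, y)
print(np.round(mu, 4))               # fitted expected goals per row of X
```

With this design, each group of identical rows gets its own free attack-minus-defence difference, so the fitted expected goals come out as the average goals for that group: 1.5, 0.5, 2, 1, 1.5, 0.5, all positive. As with the least-squares fit, only differences such as beta[0] - beta[7] are identified, and exp(beta[0] - beta[7]) is A's expected home goals against B.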