hash - Is it possible to implement universal hashing for the complete range of integers? -
i reading universal hashing on integers. prerequisite , mandatory precondition seems choose prime number p
greater set of possible keys.
i not clear on point.
if our set of keys of type int
means prime number needs of next bigger data type e.g. long
.
but whatever hash need down-casted int index hash table. doesn't down-casting affect quality of universal hashing (i referring distribution of keys on buckets) somehow?
if our set of keys integers means prime number needs of next bigger data type e.g. long.
that not problem. necessary otherwise hash family cannot universal. see below more information.
but whatever hash need down-casted
int
index hash table.doesn't down-casting affect quality of universal hashing (i referring distribution of keys on buckets) somehow?
the answer no. try explain.
whether p
has data type or not not important hash family universal. important p
equal or larger u
(the maximum integer of universe of integers). important p
big enough (i.e. >= u
).
a hash family universal when collision probability equal or smaller
1/m
.
so idea hold constraint.
the value of p
, in theory, can as big long
or more. needs integer , prime.
u
size of domain/universe (or number of keys). given universeu = {0, ..., u-1}
,u
denotes size|u|
.m
number of bins or bucketsp
prime must equal or greatern
- the hash family defined
h = {h(a,b)(x)}
h(a,b)(x) = ((a * x + b) mod p) mod m
. notea
,b
randomly chosen integers (from possible integers, theoretically can largerp
) modulo primep
(which can make them either smaller or largerm
, number of bins/buckets); here data type (domain of values not matter). see hashing integers on wikipedia notation. - follow proof on wikipedia , conclude collision probability
_p/m_ * 1/(p-1)
(the underscores mean truncate decimals).p >> m
(p
considerably biggerm
) probability tends1/m
(but not mean probability better largerp
is).
in other terms answering question: p
being bigger data type not problem here , can required. p
has equal or greater u
, a
, b
have randomly chosen integers modulo p
, no matter number of buckets m
. these constraints can construct universal hash family.
maybe mathematical example help
- let u universe of integers correspond
unsigned char
(in c example).u = {0, ..., 255}
let
p
(next possible) prime equal or greater256
. notep
can of these types (short
,int
,long
signed or unsigned). the point data type not play role (in programming type denotes domain of values.). whether257
short
,int
orlong
doesn't matter here sake of correctness of mathematical proof. have chosen largerp
(i.e. bigger data type); not change proof's correctness.- the next possible prime number
257
. - we have
25
buckets, i.e.m = 25
. means hash family universal if collision probability equal or less1/25
, i.e. approximately0.04
. - put in values
_p/m_ * 1/(p-1)
:_257/25_ * 1/256 = 10/256 = 0.0390625
smaller0.04
. universal hash family chosen parameters.
- the next possible prime number
we have chosen m = u = 256
buckets. have collision probability of 0.003891050584
, smaller 1/256 = 0,00390625
. hash family still universal.
let's try m
being bigger p
, e.g. m = 300
. collision probability 0, smaller 1/300 ~= 0.003333333333
. trivial, had more buckets keys. still universal, no collisions.
implementation detail example
we have following:
x
of typeint
(an element of|u|
)a
,b
,p
of typelong
m
we'll see later in example- choose
p
bigger maxu
(element of|u|
),p
of typelong
. - choose
a
,b
(modulop
) randomly. of typelong
,< p
. - for
x
(of typeint
u
) calculate((a*x+b) mod p)
.a*x
of typelong
,(a*x+b)
of typelong
,((a*x+b) mod p
of typelong
. note((a*x+b) mod p)
's result< p
. let's denote resulth_a_b(x)
. h_a_b(x)
takenmodulo m
, means @ step depends on data type ofm
whether there downcasting or not. however, not matter.h_a_b(x)
< m
, because takemodulo m
. hence value ofh_a_b(x) modulo m
fitsm
's data type. in case has downcasted there won't loss of value. , have mapped key bin/bucket.
- choose
Comments
Post a Comment