hash - Is it possible to implement universal hashing for the complete range of integers? -
i reading universal hashing on integers. prerequisite , mandatory precondition seems choose prime number p greater set of possible keys.
i not clear on point.
if our set of keys of type int means prime number needs of next bigger data type e.g. long.
but whatever hash need down-casted int index hash table. doesn't down-casting affect quality of universal hashing (i referring distribution of keys on buckets) somehow?
if our set of keys integers means prime number needs of next bigger data type e.g. long.
that not problem. necessary otherwise hash family cannot universal. see below more information.
but whatever hash need down-casted
intindex hash table.doesn't down-casting affect quality of universal hashing (i referring distribution of keys on buckets) somehow?
the answer no. try explain.
whether p has data type or not not important hash family universal. important p equal or larger u (the maximum integer of universe of integers). important p big enough (i.e. >= u).
a hash family universal when collision probability equal or smaller
1/m.
so idea hold constraint.
the value of p, in theory, can as big long or more. needs integer , prime.
usize of domain/universe (or number of keys). given universeu = {0, ..., u-1},udenotes size|u|.mnumber of bins or bucketspprime must equal or greatern- the hash family defined
h = {h(a,b)(x)}h(a,b)(x) = ((a * x + b) mod p) mod m. notea,brandomly chosen integers (from possible integers, theoretically can largerp) modulo primep(which can make them either smaller or largerm, number of bins/buckets); here data type (domain of values not matter). see hashing integers on wikipedia notation. - follow proof on wikipedia , conclude collision probability
_p/m_ * 1/(p-1)(the underscores mean truncate decimals).p >> m(pconsiderably biggerm) probability tends1/m(but not mean probability better largerpis).
in other terms answering question: p being bigger data type not problem here , can required. p has equal or greater u , a , b have randomly chosen integers modulo p, no matter number of buckets m. these constraints can construct universal hash family.
maybe mathematical example help
- let u universe of integers correspond
unsigned char(in c example).u = {0, ..., 255} let
p(next possible) prime equal or greater256. notepcan of these types (short,int,longsigned or unsigned). the point data type not play role (in programming type denotes domain of values.). whether257short,intorlongdoesn't matter here sake of correctness of mathematical proof. have chosen largerp(i.e. bigger data type); not change proof's correctness.- the next possible prime number
257. - we have
25buckets, i.e.m = 25. means hash family universal if collision probability equal or less1/25, i.e. approximately0.04. - put in values
_p/m_ * 1/(p-1):_257/25_ * 1/256 = 10/256 = 0.0390625smaller0.04. universal hash family chosen parameters.
- the next possible prime number
we have chosen m = u = 256 buckets. have collision probability of 0.003891050584, smaller 1/256 = 0,00390625. hash family still universal.
let's try m being bigger p, e.g. m = 300. collision probability 0, smaller 1/300 ~= 0.003333333333. trivial, had more buckets keys. still universal, no collisions.
implementation detail example
we have following:
xof typeint(an element of|u|)a,b,pof typelongmwe'll see later in example- choose
pbigger maxu(element of|u|),pof typelong. - choose
a,b(modulop) randomly. of typelong,< p. - for
x(of typeintu) calculate((a*x+b) mod p).a*xof typelong,(a*x+b)of typelong,((a*x+b) mod pof typelong. note((a*x+b) mod p)'s result< p. let's denote resulth_a_b(x). h_a_b(x)takenmodulo m, means @ step depends on data type ofmwhether there downcasting or not. however, not matter.h_a_b(x)< m, because takemodulo m. hence value ofh_a_b(x) modulo mfitsm's data type. in case has downcasted there won't loss of value. , have mapped key bin/bucket.
- choose
Comments
Post a Comment