hash - Is it possible to implement universal hashing for the complete range of integers? -


i reading universal hashing on integers. prerequisite , mandatory precondition seems choose prime number p greater set of possible keys.

i not clear on point.

if our set of keys of type int means prime number needs of next bigger data type e.g. long.

but whatever hash need down-casted int index hash table. doesn't down-casting affect quality of universal hashing (i referring distribution of keys on buckets) somehow?

if our set of keys integers means prime number needs of next bigger data type e.g. long.

that not problem. necessary otherwise hash family cannot universal. see below more information.

but whatever hash need down-casted int index hash table.

doesn't down-casting affect quality of universal hashing (i referring distribution of keys on buckets) somehow?

the answer no. try explain.

whether p has data type or not not important hash family universal. important p equal or larger u (the maximum integer of universe of integers). important p big enough (i.e. >= u).

a hash family universal when collision probability equal or smaller 1/m.

so idea hold constraint.

the value of p, in theory, can as big long or more. needs integer , prime.

  • u size of domain/universe (or number of keys). given universe u = {0, ..., u-1}, u denotes size |u|.
  • m number of bins or buckets
  • p prime must equal or greater n
  • the hash family defined h = {h(a,b)(x)} h(a,b)(x) = ((a * x + b) mod p) mod m. note a , b randomly chosen integers (from possible integers, theoretically can larger p) modulo prime p (which can make them either smaller or larger m, number of bins/buckets); here data type (domain of values not matter). see hashing integers on wikipedia notation.
  • follow proof on wikipedia , conclude collision probability _p/m_ * 1/(p-1) (the underscores mean truncate decimals). p >> m (p considerably bigger m) probability tends 1/m (but not mean probability better larger p is).

in other terms answering question: p being bigger data type not problem here , can required. p has equal or greater u , a , b have randomly chosen integers modulo p, no matter number of buckets m. these constraints can construct universal hash family.


maybe mathematical example help

  • let u universe of integers correspond unsigned char (in c example). u = {0, ..., 255}
  • let p (next possible) prime equal or greater 256. note p can of these types (short, int, long signed or unsigned). the point data type not play role (in programming type denotes domain of values.). whether 257 short, int or long doesn't matter here sake of correctness of mathematical proof. have chosen larger p (i.e. bigger data type); not change proof's correctness.

    1. the next possible prime number 257.
    2. we have 25 buckets, i.e. m = 25. means hash family universal if collision probability equal or less 1/25, i.e. approximately 0.04.
    3. put in values _p/m_ * 1/(p-1): _257/25_ * 1/256 = 10/256 = 0.0390625 smaller 0.04. universal hash family chosen parameters.

we have chosen m = u = 256 buckets. have collision probability of 0.003891050584, smaller 1/256 = 0,00390625. hash family still universal.

let's try m being bigger p, e.g. m = 300. collision probability 0, smaller 1/300 ~= 0.003333333333. trivial, had more buckets keys. still universal, no collisions.


implementation detail example

we have following:

  • x of type int (an element of |u|)
  • a, b, p of type long
  • m we'll see later in example

    1. choose p bigger max u (element of |u|), p of type long.
    2. choose a , b (modulo p) randomly. of type long, < p.
    3. for x (of type int u) calculate ((a*x+b) mod p). a*x of type long, (a*x+b) of type long , ((a*x+b) mod p of type long. note ((a*x+b) mod p)'s result < p. let's denote result h_a_b(x).
    4. h_a_b(x) taken modulo m, means @ step depends on data type of m whether there downcasting or not. however, not matter. h_a_b(x) < m, because take modulo m. hence value of h_a_b(x) modulo m fits m's data type. in case has downcasted there won't loss of value. , have mapped key bin/bucket.

Comments

Popular posts from this blog

python - No exponential form of the z-axis in matplotlib-3D-plots -

php - Best Light server (Linux + Web server + Database) for Raspberry Pi -

c# - "Newtonsoft.Json.JsonSerializationException unable to find constructor to use for types" error when deserializing class -