ElasticSearch: minimum_should_match and length of terms list


Using Elasticsearch, I'm trying to use the minimum_should_match option on a terms query to find documents whose list of longs is X% similar to the list of longs I'm querying with.

E.g.:

    {
        "filter": {
            "fquery": {
                "query": {
                    "terms": {
                        "mynum": [1, 2, 3, 4, 5, 6, 7, 8, 9, 13],
                        "minimum_should_match": "90%",
                        "disable_coord": false
                    }
                }
            }
        }
    }

This will match two documents whose mynum lists are:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] 

and:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12] 

This works and is correct: the first document has 10 at the end where the query contained 13, and the second document contains 11 where, again, the query contained 13.

This means that 1 out of the 10 numbers in the query's list is missing from each returned document, which is within the 90% similarity (minimum_should_match) value allowed by the query.

Now, the issue I have is that I would like the behaviour to be different: since the second document is longer and has 11 numbers instead of 10, its difference level should ideally be higher, because it has 2 values (11 and 12) that are not in the query's list. E.g.:
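To make the counting explicit, here is a minimal Python sketch of the behaviour described above. It is a model for illustration only, not Elasticsearch's implementation; the function names are hypothetical, and it assumes that a positive percentage is rounded down to a whole number of required terms, as the minimum_should_match documentation describes.

```python
def required_matches(num_query_terms, percent):
    """Number of query terms that must match, rounded down (assumed behaviour)."""
    return num_query_terms * percent // 100


def terms_match(query_terms, doc_terms, percent):
    """True if enough of the query's terms appear in the document.

    Only the query's terms are counted; extra values in the document
    have no effect on the result.
    """
    matched = len(set(query_terms) & set(doc_terms))
    return matched >= required_matches(len(query_terms), percent)


query = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
doc1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
doc2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]

print(required_matches(len(query), 90))  # 9 of the 10 query terms are required
print(terms_match(query, doc1, 90))
print(terms_match(query, doc2, 90))
```

Both documents share 9 of the query's 10 terms, so both pass. The length of the document's own list never enters the calculation.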

Instead of computing the intersection of:

(list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13] 

with:

(list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12] 

which is a 10% difference,

it should, since list2 is longer than list1, compute the intersection of:

(list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12] 

with:

(list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13] 

which is roughly an 18% difference (2 of the 11 values in list2 are not in list1).
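The two ways of measuring the difference can be sketched in Python as follows. This only illustrates the arithmetic of the comparison being asked for; it is not something Elasticsearch provides, and both function names are hypothetical.

```python
def difference_vs_query(query_terms, doc_terms):
    """Fraction of the query's values that are missing from the document."""
    q, d = set(query_terms), set(doc_terms)
    return len(q - d) / len(q)


def difference_vs_longer(query_terms, doc_terms):
    """Fraction of the longer list's values that are missing from the shorter one."""
    q, d = set(query_terms), set(doc_terms)
    longer, shorter = (d, q) if len(d) > len(q) else (q, d)
    return len(longer - shorter) / len(longer)


list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]      # the query
list2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]  # the longer document

print(difference_vs_query(list1, list2))   # 1 of 10 query values missing
print(difference_vs_longer(list1, list2))  # 2 of 11 document values missing
```

Measured against the query, the difference is 1/10. Measured against the longer list, it is 2/11, which would fall outside a 90% similarity threshold.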

  • Is this possible?
  • If not, how would I weight in the length of the list, besides using a dense vector rather than a sparse one? E.g.:

using:

[1, 2, 3, 4, 5, 6, 7, 8, 9, , , , 13] 

rather than:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 13] 

