intersection - ElasticSearch: minimum_should_match and length of terms list
Using Elasticsearch, I'm trying to use the minimum_should_match option on a terms query to find documents that have a list of longs that is X% similar to the list of longs I'm querying with.
E.g.:
    {
        "filter": {
            "fquery": {
                "query": {
                    "terms": {
                        "mynum": [1, 2, 3, 4, 5, 6, 7, 8, 9, 13],
                        "minimum_should_match": "90%",
                        "disable_coord": false
                    }
                }
            }
        }
    }

will match 2 documents with a mynum list of:
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and:
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
This works and is correct, since the first document has 10 at the end while the query contained 13, and the second document contained 11 where, again, the query contained 13.
This means that 1 out of the 10 numbers in the query's list is missing from each returned document, which is within the 90% similarity (minimum_should_match) value allowed in the query.
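To make that arithmetic concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code). The helper names `required_matches` and `matched_terms` are hypothetical, and the sketch assumes a percentage minimum_should_match is rounded down to a whole number of terms, as the Elasticsearch documentation describes.

```python
def required_matches(num_terms: int, percent: int) -> int:
    """Number of query terms that must match, with the percentage rounded down."""
    return (num_terms * percent) // 100


def matched_terms(query, doc) -> int:
    """Count the query terms that also appear in the document's list."""
    return len(set(query) & set(doc))


query = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
doc_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
doc_b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]

needed = required_matches(len(query), 90)  # 9 of the 10 query terms

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    hits = matched_terms(query, doc)
    # Only query terms are counted; extra values in the document are ignored.
    print(name, hits, hits >= needed)
```

Both documents share 9 of the 10 query terms, so both pass, regardless of how many extra values each document holds.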
Now, the issue I have is that I would like the behaviour to be different: since the second document is longer and has 11 numbers in place of 10, its difference level should ideally have been higher, because it has 2 values, 11 and 12, that are not in the query's list. E.g.:
Instead of computing the intersection of:
    (list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
with:
    (list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
which is a 10% difference,
since list2 is longer than list1, the intersection should be computed as:
    (list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
with:
    (list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
which is a 12% difference.
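The two directions of comparison can be sketched in plain Python. This is only an illustration of one possible measure, assumed here to be the fraction of one list's values missing from the other; `missing_fraction` is a hypothetical helper, not an Elasticsearch feature. Counted this way, the list2-first direction comes out at roughly 18% (2 of 11) rather than the 12% quoted above, so the measure intended there may be a different one.

```python
def missing_fraction(source, other) -> float:
    """Fraction of the values in `source` that do not appear in `other`."""
    other_set = set(other)
    missing = [value for value in source if value not in other_set]
    return len(missing) / len(source)


list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]       # the query's list
list2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]   # the longer document list

# Relative to the query: only 13 is missing, so 1 of 10 values.
query_side = missing_fraction(list1, list2)

# Relative to the document: 11 and 12 are missing, so 2 of 11 values.
doc_side = missing_fraction(list2, list1)

print(f"{query_side:.1%} {doc_side:.1%}")
```

The first figure depends only on the query's length, which is why a longer document is not penalised for its extra values.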
- Is this possible?
- If not, how can I weigh in the length of the list, besides using a dense vector rather than a sparse one? E.g.:
 
using:
    [1, 2, 3, 4, 5, 6, 7, 8, 9, , , , 13]
rather than:
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
 
  