ElasticSearch: minimum_should_match and length of terms list
Using Elasticsearch, I'm trying to use the minimum_should_match option on a terms query to find documents whose list of longs is x% similar to the list of longs I'm querying with.
E.g.:
{ "filter": { "fquery": { "query": { "terms": { "mynum": [1, 2, 3, 4, 5, 6, 7, 8, 9, 13], "minimum_should_match": "90%", "disable_coord": false } } } } }
This will match 2 documents whose mynum lists are:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
This works and is correct: the first document has 10 at the end while the query contained 13, and the second document contained 11 where, again, the query contained 13.
This means that 1 out of the 10 numbers in the query's list is different in the returned document, which amounts to the allowed 90% similarity (the minimum_should_match value in the query).
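To make the arithmetic explicit, here is a minimal Python sketch of how I understand the threshold to be applied. It assumes a positive percentage is converted to a term count and rounded down, as the Elasticsearch documentation describes; required_matches and matched_terms are hypothetical helper names of mine, not part of any Elasticsearch API.

```python
def required_matches(num_terms: int, percent: int) -> int:
    """Number of query terms that must match for a positive percentage.

    Assumption: the percentage of the term count is rounded down.
    Integer arithmetic avoids floating-point surprises.
    """
    return (num_terms * percent) // 100


def matched_terms(query_terms, doc_terms) -> int:
    """Count how many distinct query terms appear in the document's list."""
    return len(set(query_terms) & set(doc_terms))


query = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
doc_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
doc_b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]

needed = required_matches(len(query), 90)

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    hits = matched_terms(query, doc)
    print(name, "matches", hits, "of", len(query), "needed", needed, hits >= needed)
```

Both documents contain 9 of the 10 query terms, so both reach the threshold. Only the query's length enters the calculation; the length of the document's list never does.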
Now, the issue I have is that I would like the behaviour to be different, in the sense that, since the second document is longer and has 11 numbers in place of 10, the difference level should ideally have been higher, since it has 2 values (11, 12) that are not in the query's list. E.g.:
Instead of computing the intersection of:
(list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
with:
(list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
which is a 10% difference, it should, since list2 is longer than list1, compute the intersection of:
(list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
with:
(list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
which is a larger difference: 2 of list2's 11 values are missing from list1, roughly 18%.
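The two directions of comparison can be sketched in plain Python. This is only an illustration of the measure I am after, not something Elasticsearch computes; missing_fraction is a hypothetical helper name.

```python
def missing_fraction(source, other) -> float:
    """Fraction of distinct values in `source` that do not appear in `other`."""
    source_set, other_set = set(source), set(other)
    return len(source_set - other_set) / len(source_set)


list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]      # the query's list
list2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]  # the longer document

# Query-side view: how much of the query is absent from the document.
query_side = missing_fraction(list1, list2)

# Document-side view: how much of the document is absent from the query.
doc_side = missing_fraction(list2, list1)

print(f"query side: {query_side:.1%}")  # 1 of 10 values
print(f"doc side:   {doc_side:.1%}")    # 2 of 11 values
```

Measured from the query's side the difference is 1 of 10 values (10%), while measured from the document's side it is 2 of 11 values (about 18%), which is the asymmetry described above.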
- Is this possible?
- If not, how can I weigh in the length of the list, besides using a dense vector rather than a sparse one? E.g. using:
[1, 2, 3, 4, 5, 6, 7, 8, 9, , , , 13]
rather than:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 13]