C# - Term frequency of documents with NEST Elasticsearch
I am new to Elasticsearch and want the top N term frequencies of the "content" field of a specific document using NEST. I have searched a lot for an answer that works for me. All I found is that I should use term vectors rather than a terms facet, since the facet counts terms across the whole set of documents. I know I should configure the term vector setting as below:

    [ElasticProperty(Type = Nest.FieldType.attachment, TermVector = Nest.TermVectorOption.with_positions_offsets, Store = true)]
    public Attachment File { get; set; }
I searched a lot for how to get the term frequency of a specific document with NEST, but found only Lucene and Solr examples. I need an example in NEST. I appreciate your help.

One more question: the solution suggested by Rob works when I want the term frequency of a string field such as the title of the documents. When I change the target field to the content of the documents, I get no results back. To be able to search the content of documents, I followed the answer in this link: "Elasticsearch & attachment type (NEST C#)". That works fine and I can search for a term in the content of a document, but getting the TF does not work. Below is the code for it:

    var searchResults = client.TermVector<Document>(t => t.Id(id).TermStatistics().Fields(f => f.File));

Does anyone have a solution for it?
You can use client.TermVector(..). Here is a simple example:
Document class:

    public class MyDocument
    {
        public int Id { get; set; }

        [ElasticProperty(TermVector = TermVectorOption.WithPositionsOffsets)]
        public string Description { get; set; }

        [ElasticProperty(Type = FieldType.Attachment, TermVector = TermVectorOption.WithPositionsOffsetsPayloads, Store = true, Index = FieldIndexOption.Analyzed)]
        public Attachment File { get; set; }
    }
Index some test data:

    var indicesOperationResponse = client.CreateIndex(indexName, c => c
        .AddMapping<MyDocument>(m => m.MapFromAttributes()));

    var myDocument = new MyDocument { Id = 1, Description = "test cat test" };

    client.Index(myDocument);
    client.Index(new MyDocument { Id = 2, Description = "river" });
    client.Index(new MyDocument { Id = 3, Description = "test" });
    client.Index(new MyDocument { Id = 4, Description = "river" });

    client.Refresh();
Retrieve term statistics through NEST:

    var termVectorResponse = client.TermVector<MyDocument>(t => t
        .Document(myDocument)
        //.Id(1) // you can specify the document by id instead
        .TermStatistics()
        .Fields(f => f.Description));

    foreach (var item in termVectorResponse.TermVectors)
    {
        Console.WriteLine("Field: {0}", item.Key);

        var topTerms = item.Value.Terms.OrderByDescending(x => x.Value.TotalTermFrequency).Take(10);

        foreach (var term in topTerms)
        {
            Console.WriteLine("{0}: {1}", term.Key, term.Value.TermFrequency);
        }
    }
Output:

    Field: description
    cat: 1
    test: 2
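Note that the loop above orders by TotalTermFrequency (occurrences across the index) while printing TermFrequency (occurrences in this one document). If the goal is the top N terms of a single document, ordering by the per-document count may be closer to what the question asks. Here is a minimal sketch of that selection step. It uses a plain dictionary in place of the NEST response, and the TermStats class, the TopTermsByDocumentFrequency helper and the sample numbers are all hypothetical stand-ins, not part of NEST:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the per-term statistics of a term vector response.
public class TermStats
{
    public int TermFrequency { get; set; }       // occurrences in this document
    public int TotalTermFrequency { get; set; }  // occurrences across the index
}

public static class TopTermsDemo
{
    // Returns the n terms with the highest per-document frequency.
    public static List<KeyValuePair<string, TermStats>> TopTermsByDocumentFrequency(
        IDictionary<string, TermStats> terms, int n)
    {
        return terms
            .OrderByDescending(t => t.Value.TermFrequency)
            .ThenBy(t => t.Key, StringComparer.Ordinal) // deterministic tie-break by term
            .Take(n)
            .ToList();
    }

    public static void Main()
    {
        // Illustrative numbers for the "test cat test" document.
        var terms = new Dictionary<string, TermStats>
        {
            { "cat",  new TermStats { TermFrequency = 1, TotalTermFrequency = 1 } },
            { "test", new TermStats { TermFrequency = 2, TotalTermFrequency = 3 } }
        };

        foreach (var term in TopTermsByDocumentFrequency(terms, 10))
        {
            Console.WriteLine("{0}: {1}", term.Key, term.Value.TermFrequency);
        }
    }
}
```

With a real response, the same ordering would be applied to item.Value.Terms inside the loop.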
Hope it helps.

UPDATE

When I checked the mapping for the index, one thing was interesting:
    {
      "my_index" : {
        "mappings" : {
          "mydocument" : {
            "properties" : {
              "file" : {
                "type" : "attachment",
                "path" : "full",
                "fields" : {
                  "file" : { "type" : "string" },
                  "author" : { "type" : "string" },
                  "title" : { "type" : "string" },
                  "name" : { "type" : "string" },
                  "date" : { "type" : "date", "format" : "dateOptionalTime" },
                  "keywords" : { "type" : "string" },
                  "content_type" : { "type" : "string" },
                  "content_length" : { "type" : "integer" },
                  "language" : { "type" : "string" }
                }
              },
              "id" : { "type" : "integer" }
            }
          }
        }
      }
    }
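There is no information about the term vector, so the attribute setting on the attachment property was not applied to the mapping.

When I created the mapping through Sense instead: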
    PUT http://localhost:9200/my_index/mydocument/_mapping
    {
      "mydocument": {
        "properties": {
          "file": {
            "type": "attachment",
            "path": "full",
            "fields": {
              "file": {
                "type": "string",
                "term_vector": "with_positions_offsets",
                "store": true
              }
            }
          }
        }
      }
    }
i able retrieve term statistics.
hope i'll later working mapping created through nest.
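To check outside of NEST whether term vectors are actually stored for the attachment content, the term vector endpoint can be called from Sense as well. This is only a sketch: the document id 1 and the field name file are assumptions that should be adjusted to your own data, and _termvector is the endpoint name in the 1.x line of Elasticsearch (later versions use _termvectors).

```
GET http://localhost:9200/my_index/mydocument/1/_termvector?fields=file&term_statistics=true
```

If the response contains no terms for the field, the mapping most likely lacks the term_vector setting, as in the mapping generated from attributes above.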
UPDATE2

Based on Greg's answer, try this fluent mapping:

    var indicesOperationResponse = client.CreateIndex(indexName, c => c
        .AddMapping<MyDocument>(m => m
            .MapFromAttributes()
            .Properties(ps => ps
                .Attachment(s => s.Name(p => p.File)
                    .FileField(ff => ff.Name(f => f.File).TermVector(TermVectorOption.WithPositionsOffsets)))))
    );