Stanford CoreNLP Training Examples


Does anyone know where the following files are located?

trainFileList = /u/nlp/data/ner/column_data/muc6.ptb.train, /u/nlp/data/ner/column_data/muc7.ptb.train

I am following the FAQ at http://nlp.stanford.edu/software/crf-faq.shtml#a

If I only need to provide a file with two columns, consisting of the tokens and their class, that will work. I am just curious about the training files listed in the classifier property files.

serializeTo = english.muc.7class.caseless.distsim.crf.ser.gz

java -mx1g -cp "$CLASSPATH" edu.stanford.nlp.ie.NERClassifierCombiner -textFile sample.txt -ner.model classifiers/english.all.3class.distsim.crf.ser.gz,classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz -outputFormat tabbedEntities > sample2.tsv

Those files are the training data for the MUC-6 and MUC-7 tasks:

http://cs.nyu.edu/faculty/grishman/muc6.html

They are not distributed by Stanford. I will see if I can figure out who distributes them and update my answer.

Update: the LDC distributes these files. Because of copyright restrictions, you have to purchase a copy from the LDC, which is why we don't distribute them ourselves. Here are some links with more info:

http://www-nlpir.nist.gov/related_projects/muc/muc_data/muc_data_index.html

https://catalog.ldc.upenn.edu/LDC2003T13

https://catalog.ldc.upenn.edu/LDC2001T02
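
If you only want to train on your own two-column data, as described in the question, a properties file along the lines of the one in the CRF FAQ should be enough. The sketch below is a minimal illustration, not the configuration used to build the distributed MUC models: the file names my-train.tsv and my-ner-model.ser.gz are hypothetical placeholders, the sample rows and labels in the comments are made up, and the feature flags are a small subset of those shown in the FAQ.

```properties
# Training file: one token per line, tab-separated columns, e.g.
#   Emma<TAB>PERSON
#   lived<TAB>O
# (my-train.tsv is a hypothetical file name; the rows above are illustrative.)
trainFile = my-train.tsv

# Where to write the trained classifier (hypothetical file name).
serializeTo = my-ner-model.ser.gz

# Column layout: column 0 holds the token, column 1 holds its class.
map = word=0,answer=1

# A few of the feature flags from the FAQ example; tune these for your data.
useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true
wordShape = chris2useLC

# Train with, for example:
#   java -cp "$CLASSPATH" edu.stanford.nlp.ie.crf.CRFClassifier -prop my-ner.prop
# (my-ner.prop is a hypothetical name for this file.)
```

To train on several files at once, trainFileList takes a comma-separated list of paths in place of trainFile, which is how the MUC files appear in the property lines quoted in the question.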

