RAKE stop list generation
Stop list generation for [RAKE]
- Words that are adjacent to a keyword but not part of it are good candidates for [stopword]s.
- Both precision and recall improved when words whose frequency inside keywords is higher than their frequency adjacent to keywords were excluded from the stop list.
- The F-measure was best with the largest stop list tried, which suggests it might improve further with an even larger list.
- Conversely, a stop list built from DF alone performs worse.
- Note that the stop lists were trained on 1000 of the 2000 abstracts, and DF thresholds of 10 or more, 25 or more, and 50 or more were each tried.
- Interesting to use DF instead of TF.
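
The selection rule described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the function names `tokenize` and `generate_stop_list`, the `(text, keywords)` input format, the simple regex tokenizer, and the choice to keep ties (keyword frequency equal to adjacency frequency) are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def generate_stop_list(docs, min_df):
    """Build a stop list from (text, keywords) pairs (hypothetical format).

    A word is kept if it appears next to a keyword at least once, occurs in
    at least min_df documents, and is not more frequent inside keywords than
    adjacent to them.
    """
    df = Counter()        # number of documents containing the word
    kw_freq = Counter()   # occurrences of the word inside keywords
    adj_freq = Counter()  # occurrences of the word right next to a keyword

    for text, keywords in docs:
        tokens = tokenize(text)
        df.update(set(tokens))
        for keyword in keywords:
            kw = tokenize(keyword)
            if not kw:
                continue
            kw_freq.update(kw)
            n = len(kw)
            # Scan the text for each occurrence of the keyword phrase.
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != kw:
                    continue
                if i > 0:
                    adj_freq[tokens[i - 1]] += 1
                if i + n < len(tokens):
                    adj_freq[tokens[i + n]] += 1

    return sorted(
        word for word, count in adj_freq.items()
        if df[word] >= min_df and kw_freq[word] <= count
    )


docs = [
    ("The analysis of linear systems is hard", ["linear systems"]),
    ("A study of linear constraints in systems of equations",
     ["linear constraints", "systems of equations"]),
]
print(generate_stop_list(docs, min_df=1))  # ['in', 'is', 'of']
print(generate_stop_list(docs, min_df=2))  # ['of']
```

Raising `min_df` shrinks the list, which corresponds to trying the thresholds 10, 25, and 50 on a real corpus; the toy corpus here only has two documents, so the thresholds are scaled down.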
impressions
This page is auto-translated from /nishio/RAKEのストップリスト生成 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.