When each page of a book is treated as a target document
When the whole book is treated as a single target document
That is, how the boundaries of the target document are drawn affects the result in opposite directions.
${\displaystyle {\hat {f}}_{h}(x)={\frac {1}{nh}}\sum _{i=1}^{n}K\left({\frac {x-x_{i}}{h}}\right)}$
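A minimal Python sketch of this kernel density estimator follows. The formula leaves the kernel K generic; choosing the Gaussian kernel here is an assumption of the sketch, and the names `gaussian_kernel` and `kde` and the example positions are illustrative only.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(x: float, samples: Sequence[float], h: float) -> float:
    """Kernel density estimate f_h(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)


# Hypothetical keyword occurrence positions in a text.
positions = [10.0, 12.0, 50.0, 90.0]
print(kde(11.0, positions, h=5.0))  # high: two occurrences lie nearby
print(kde(70.0, positions, h=5.0))  # low: no occurrence close to 70
```

The bandwidth `h` is the window width; it has to be chosen appropriately, since a small `h` makes every occurrence a spike and a large `h` smooths everything toward flat.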
If a keyword's occurrences are truly uniform, its density estimate with an appropriate window width should be close to a uniform distribution.
So it is enough to measure how far the estimated distribution is from the uniform one.
For the distance between distributions, use the [Kullback-Leibler divergence - Wikipedia https://ja.wikipedia.org/wiki/%E3%82%AB%E3%83%AB%E3%83%90%E3%83%83%E3%82%AF%E3%83%BB%E3%83%A9%E3%82%A4%E3%83%96%E3%83%A9%E3%83%BC%E6%83%85%E5%A0%B1%E9%87%8F] or something similar.
Moreover, one of the two distributions (the uniform distribution Q) is fixed.
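Writing the Kullback-Leibler divergence out shows why the fixed distribution drops out. This derivation assumes the position axis is discretized into N bins and that Q is the uniform distribution over them, so Q(i) = 1/N; both are assumptions of this sketch.

$$
\begin{aligned}
D_{\mathrm{KL}}(P\,\|\,Q) &= \sum_{i=1}^{N} P(i)\log\frac{P(i)}{Q(i)} \\
&= \sum_{i=1}^{N} P(i)\log P(i) - \sum_{i=1}^{N} P(i)\log Q(i) \\
&= \sum_{i=1}^{N} P(i)\log P(i) + \log N
\end{aligned}
$$

The last step uses $Q(i) = 1/N$ and $\sum_{i} P(i) = 1$, so the second term is the constant $\log N$.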
Since Q contributes only a constant, it can be ignored if we only care about the ordering (which distribution is closer or farther), leaving
$\sum_i P(i) \log P(i)$
Oh, this is just the negative entropy $-H(P)$, so we can use the negative entropy as the measure.
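A small Python check of this relationship, assuming P is given as a probability vector over N bins and Q is uniform over the same bins (the function names are illustrative):

```python
import math
from typing import Sequence


def negative_entropy(p: Sequence[float]) -> float:
    """sum_i P(i) log P(i); bins with P(i) = 0 contribute 0 (0 log 0 = 0)."""
    return sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_from_uniform(p: Sequence[float]) -> float:
    """D_KL(P || U), where U is uniform over len(p) bins: sum_i P(i) log(P(i) N)."""
    n = len(p)
    return sum(pi * math.log(pi * n) for pi in p if pi > 0)


p = [0.5, 0.25, 0.25, 0.0]
# KL from uniform and negative entropy differ only by the constant log N.
print(kl_from_uniform(p), negative_entropy(p) + math.log(len(p)))
```

The negative entropy ranges from $-\log N$ (perfectly uniform) to 0 (all mass in one bin), so a larger value means the keyword is more concentrated.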
Suppose a suffix array of the text has been built.
The positions where a keyword occurs can then be read off the suffixes that begin with that keyword: their start positions are exactly the occurrence positions.
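A minimal Python sketch of this lookup, assuming the text is a plain string and using a naive sort-based construction (fine for illustration, far too slow for a large corpus). Because the suffixes are sorted, the ones starting with the keyword form one contiguous range, found with two binary searches. `build_suffix_array` and `find_occurrences` are hypothetical names.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Start indices of all suffixes, sorted lexicographically (naive O(n^2 log n))."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], keyword: str) -> List[int]:
    """Return the sorted start positions of `keyword` in `text`."""
    m = len(keyword)
    # Lower bound: first suffix whose m-character prefix is >= keyword.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < keyword:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > keyword.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= keyword:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes in sa[start:lo] all begin with keyword; sort by text position.
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(find_occurrences("banana", sa, "ana"))
```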
Can we get a density estimate from that?
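One possible answer, sketched in Python: evaluate a kernel density estimate of the occurrence positions at N bin centers, normalize it into a probability vector, and score it with the negative entropy. The Gaussian kernel, the bandwidth `h`, the bin count, and the example positions are all assumptions of this sketch, not values from the original note.

```python
import math
from typing import List, Sequence


def density_on_grid(positions: Sequence[float], length: float,
                    h: float, bins: int) -> List[float]:
    """Gaussian KDE of `positions` evaluated at bin centers over [0, length),
    normalized to sum to 1. Constant factors such as 1/(n h) and the kernel's
    normalizing constant cancel in the normalization, so they are omitted."""
    if not positions:
        raise ValueError("keyword has no occurrences")
    centers = [(b + 0.5) * length / bins for b in range(bins)]
    dens = [sum(math.exp(-0.5 * ((c - x) / h) ** 2) for x in positions)
            for c in centers]
    total = sum(dens)
    return [d / total for d in dens]


def negative_entropy(p: Sequence[float]) -> float:
    """sum_i P(i) log P(i); 0 log 0 is treated as 0."""
    return sum(pi * math.log(pi) for pi in p if pi > 0)


# An evenly spread keyword vs. one that occurs in a single burst.
spread = [50.0 + 100.0 * k for k in range(10)]  # 50, 150, ..., 950
burst = [100.0 + k for k in range(10)]          # 100, 101, ..., 109
for name, pos in [("spread", spread), ("burst", burst)]:
    p = density_on_grid(pos, length=1000.0, h=50.0, bins=20)
    print(name, round(negative_entropy(p), 3))
```

The spread keyword scores close to the minimum $-\log 20 \approx -3.0$, while the burst keyword scores noticeably higher (closer to 0). The bandwidth `h` is the window whose size has to be chosen appropriately.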
Assumed data size
Miscellaneous Methods
This page is auto-translated from /nishio/文書が階層的 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.