Naive vector similarity search keeps English and Japanese text apart.
For me personally this is a big problem, so I'm not very satisfied with simple embedding-vector similarity.
But after sleeping on it, I woke up thinking: "Why not machine-translate every English sentence into Japanese and load that into the vector index?"
That seems better than turning everything into vectors first and then wondering "how do I stitch together regions of the space that are far apart?"
In the Plurality Japanese translation project, text currently goes into the vector index without distinguishing between languages.
[/plurality-japanese/Plurality Vector Search](https://scrapbox.io/plurality-japanese/Plurality Vector Search)
I feel like I can't provide user value with that.
If so, it would be better to load the "Japanese" side of "other language -> Japanese" into the search index.
The association between two chunks created by machine translation is a kind of "link".
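To make the idea concrete, here is a minimal sketch of an index that embeds only Japanese text and keeps a link back to the original chunk. Everything in it is an assumption for illustration: `translate_to_japanese` is a hypothetical stand-in backed by a tiny lookup table (a real setup would call a machine translation service), and `embed` is a toy character-bigram vector, not a real embedding model.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a machine translation call.
# A tiny lookup table keeps the sketch runnable offline.
TOY_TRANSLATIONS = {
    "Vector search separates languages.": "ベクトル検索は言語を分離する。",
    "Democracy needs collaboration.": "民主主義には協働が必要だ。",
}


def translate_to_japanese(text: str) -> str:
    return TOY_TRANSLATIONS[text]


def embed(text: str) -> Counter:
    """Toy embedding: character-bigram counts (stand-in for a real model)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Entry:
    japanese: str           # the text that is embedded and searched
    vector: Counter
    source: Optional[str]   # the "link" to the original chunk; None if natively Japanese


def build_index(japanese_chunks, english_chunks):
    index = [Entry(ja, embed(ja), None) for ja in japanese_chunks]
    for en in english_chunks:
        ja = translate_to_japanese(en)
        # Only the Japanese side is embedded; the English original is kept as a link.
        index.append(Entry(ja, embed(ja), en))
    return index


def search(index, query, k=1):
    qv = embed(query)
    return sorted(index, key=lambda e: cosine(qv, e.vector), reverse=True)[:k]


index = build_index(["日本語で書かれたメモ。"], list(TOY_TRANSLATIONS))
hit = search(index, "ベクトル検索と言語")[0]
print(hit.japanese, "<-", hit.source)
```

Because every entry lives in the same (Japanese) embedding space, a Japanese query can reach content that was originally English, and `source` lets the result point back to the original chunk.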
What to record
This page is auto-translated from /nishio/日記2024-01-14 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.