nishio #gleninjapan In the lightning talk, I didn't delve into the mathematical details due to limited time. For a more detailed explanation, I created diagrams by embedding the meanings of words into a vector space using an LLM.
nishio This is a two-dimensional visualization of word meanings embedded in a high-dimensional space using OpenAI's text embedding API. In simple terms, it demonstrates how an AI recognizes similarity in meaning between words.
nishio Plotting two languages on a single chart is not a straightforward task. In this chart, the first principal component from PCA, which captures the difference between the two languages, has been removed before plotting.
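The step above can be sketched as follows. This is a minimal illustration, not the exact script behind the chart: it assumes the embeddings have already been fetched (in practice they would come from OpenAI's embedding API), so synthetic vectors stand in for real ones. Two "languages" are simulated by offsetting the same set of meaning vectors along one direction; PCA via SVD then identifies that offset as the first principal component, which is dropped.

```python
import numpy as np

def remove_first_pc(X):
    """Center the embeddings, find principal components via SVD,
    and return 2-D coordinates from PC2 and PC3, skipping PC1
    (which here captures the between-language difference)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt.T      # scores on each principal component
    return coords[:, 1:3]   # drop PC1, keep PC2 and PC3 for plotting

# Synthetic stand-in for API embeddings: six "meanings" in 8 dimensions,
# duplicated with a large constant offset to mimic a second language.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 8))
lang_offset = np.zeros(8)
lang_offset[0] = 5.0
X = np.vstack([base, base + lang_offset])  # rows 0-5: lang A, 6-11: lang B

xy = remove_first_pc(X)  # translation pairs now land near each other
```

After removing PC1, each word and its counterpart in the other language sit much closer together than in the raw embedding space, which is what makes a single bilingual chart readable.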
nishio Here is the annotated version. The plotted words are a combination of ones I chose myself and ones GPT-4 suggested as similar. It shows that GPT-4 cannot find an English word close to the Japanese 納得 (nattoku); "understanding" and "agreement" are the main explanations given in dictionaries.
nishio One word can bridge multiple concepts. In this example, the Japanese word "納得" (nattoku) serves as a bridge connecting concepts like "understanding", "agreement", and "satisfaction". Similarly, in Mandarin, "數位" (shùwèi) connects concepts like "digital" and "plural".
nishio In the mapping from a high-dimensional space (H) to a low-dimensional space (L), objects that are close in H generally remain close in L. However, there is no guarantee that objects far apart in H will also be far apart in L.
nishio You can think of it like the shadow of a three-dimensional object. Therefore, it is separation, not proximity, in the low-dimensional space that is informative: points far apart in L must also be far apart in H.
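The shadow analogy can be checked numerically. The toy points below are my own illustration: projecting onto the xy-plane (dropping z) can collapse two distant 3-D points onto the same spot, while an orthogonal projection never increases distance, so separation in the shadow guarantees separation in 3-D.

```python
import numpy as np

def shadow(p):
    """Orthogonal projection onto the xy-plane: drop the z coordinate."""
    return p[:2]

# Two points far apart in 3-D (H) along the z-axis...
a = np.array([0.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 10.0])

dist_H = np.linalg.norm(a - b)                  # far apart in 3-D
dist_L = np.linalg.norm(shadow(a) - shadow(b))  # their shadows coincide

# The reverse direction is reliable: a projection never increases
# distance, so points far apart in the shadow are far apart in 3-D.
c = np.array([7.0, 0.0, 1.0])
assert np.linalg.norm(shadow(a) - shadow(c)) <= np.linalg.norm(a - c)
```

This is why overlapping clusters in the 2-D chart should be read cautiously, while clearly separated clusters reflect genuine distance in the embedding space.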