Previous discussion
https://news.ycombinator.com/item?id=45784455
https://news.ycombinator.com/item?id=45756599
You can probably make a jointly trained decoder that turns a vector back into the document that most closely matches it.
Would be cool to add together the vectors for Harry Potter and The Lord of the Rings and then decode that into a new book about Frodo going to wizard school to collect the ring and help push Voldemort into Mount Doom.
Look at https://github.com/vec2text/vec2text
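Rough sketch of what that "blend two books and decode" idea could look like, loosely following the vec2text README. The encoder model, pooling, and the simple averaging of two embeddings are my own assumptions for illustration, not anything the library prescribes, and in practice you'd likely want everything on a GPU:

  # Blend two texts in embedding space and decode the result with vec2text.
  # Toy sketch based on the vec2text README; details may differ by version.
  import torch
  import transformers
  import vec2text

  enc_name = "sentence-transformers/gtr-t5-base"
  tokenizer = transformers.AutoTokenizer.from_pretrained(enc_name)
  encoder = transformers.AutoModel.from_pretrained(enc_name).encoder

  def embed(texts):
      inputs = tokenizer(texts, return_tensors="pt", padding=True,
                         truncation=True, max_length=128)
      with torch.no_grad():
          hidden = encoder(**inputs).last_hidden_state
      mask = inputs["attention_mask"].unsqueeze(-1)
      return (hidden * mask).sum(1) / mask.sum(1)   # mean-pool to one vector per text

  a = embed(["Harry Potter studies magic at a school for wizards."])
  b = embed(["Frodo carries the ring to Mount Doom."])
  blend = (a + b) / 2                               # naive "sum" of the two documents

  corrector = vec2text.load_pretrained_corrector("gtr-base")
  print(vec2text.invert_embeddings(embeddings=blend, corrector=corrector, num_steps=20))

Averaging embeddings of whole novels won't literally give you a coherent crossover book, but it shows the mechanics of arithmetic in embedding space followed by inversion back to text.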
Isn't that an autoencoder?
This is really interesting! I've experimented with a similar idea, but doing time series forecasting on the sentence embeddings - https://github.com/Srakai/embcaster.
It turns out you can tokenise arbitrary information into a constant-size vector, which is really useful for later processing. vec2text (https://github.com/vec2text/vec2text) is an excellent tool if you want to reverse the embeddings back to text. This lets you encode arbitrary data into standardized vectors and go all the way back.
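For anyone curious what "forecasting on sentence embeddings" means in practice, here's a minimal sketch of the general idea (this is not embcaster's actual API; the encoder model, window size, and the least-squares predictor are placeholder choices):

  # Embed a sequence of sentences, then fit a trivial linear forecaster that
  # maps the previous k embeddings to the next one. Generic illustration only.
  import numpy as np
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works
  sentences = [
      "The market opened flat on Monday.",
      "By midday, tech stocks were rallying.",
      "The rally faded into the close.",
      "Tuesday opened with heavy selling.",
  ]
  embs = model.encode(sentences)           # shape: (n_sentences, dim)

  k = 2                                    # context window of past embeddings
  X = np.stack([embs[i:i + k].ravel() for i in range(len(embs) - k)])
  y = embs[k:]                             # next-step embedding targets

  # Least-squares forecaster: next_emb ≈ concat(previous k embeddings) @ W
  W, *_ = np.linalg.lstsq(X, y, rcond=None)

  context = embs[-k:].ravel()
  next_emb = context @ W                   # forecast the embedding after the last sentence
  # next_emb can then be matched to candidate sentences by cosine similarity,
  # or handed to an embedding inverter like vec2text to draft text directly.

A real setup would use a proper sequence model and far more data, but the shape of the problem is the same: predict the next point in embedding space, then map it back to text.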
It works with image embeddings too: https://youtu.be/r6TJfGUhv6s?si=_LC0d4Mwyw18c53B