2 upvotes, 0 direct replies
View submission: Propositional Interpretability in Artificial Intelligence
When you get a response from an LLM, there are a lot of tokens involved, and each one marks a point where the internal activations are discarded and only a discrete token gets passed forward. Over the course of its thinking and then answering, the model has to use the tokens themselves to carry its cognition from step to step. So that's somewhat in favor of a close relation between the model's internal "thinking", whatever that amounts to, and the actual text it outputs.
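To make that bottleneck concrete, here's a minimal greedy-decoding sketch using the Hugging Face transformers API (gpt2 and the prompt are just stand-ins; real decoders add sampling and a KV cache, but the point is the same): the only thing each step hands to the next is a token id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any causal LM behaves the same way here.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The reason the sky is blue is", return_tensors="pt").input_ids
for _ in range(20):
    with torch.no_grad():
        out = model(ids)  # forward pass builds rich internal activations...
    # ...which are collapsed to a single discrete token id,
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    # and only that token id is appended and fed back; the activations are dropped.
    ids = torch.cat([ids, next_id], dim=-1)

print(tok.decode(ids[0], skip_special_tokens=True))
```

Whatever the model "figured out" at one step has to survive as text (or not at all) to influence the later steps.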