Comment by Idrialite on 01/02/2025 at 22:57 UTC

2 upvotes, 0 direct replies (showing 0)

View submission: Propositional Interpretability in Artificial Intelligence

When you get a response from an LLM, there are a lot of tokens involved, and each one marks a point where the model's internal activations are discarded: only the sampled token is carried forward into the next step. Over the course of its thinking and then answering, the model has to use the tokens themselves to continue its cognition. So that's somewhat in favor of a close relation between the model's internal thinking, whatever that is, and the actual text it outputs.
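For what it's worth, here's a minimal sketch of the loop being described, using the Hugging Face transformers API (the model name "gpt2" and the prompt are just placeholders): at each step the model computes rich internal activations, but the only thing that survives into the next step is the sampled token id.

```python
# Minimal sketch of the autoregressive loop described above.
# At each step the model produces internal activations, but only the
# sampled token id is appended and passed forward; the activations
# computed at that step are not reused directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")       # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        out = model(ids, output_hidden_states=True)        # full forward pass
        hidden = out.hidden_states[-1][:, -1]               # rich internal state...
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)  # ...collapsed to one token
        ids = torch.cat([ids, next_id], dim=-1)              # only the token survives
        # `hidden` is dropped here; the next step sees nothing but `ids`

print(tok.decode(ids[0]))
```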
