This is really cool, and the interactive article they created does a great job of diving deeper:
https://attentionneuron.github.io/
It seems like, if done at scale, this could open up a whole new world of "dump data in, get observations out" networks.
Although the paper deals with visual applications, in principle it should be possible to extend it to text and other data sources (input data is projected into numeric form).
"A
similar setup to the CartPole experiment with extra noisy channels could enable a system that receives
thousands of noisy input channels to identify the small subset of channels with relevant information. "
This could be interesting: real-time monitoring of a large number of securities ticker symbols to hunt for a signal. Hedge funds and investment banks are buyers of data sets that often do not contain any useful trading signals. A system like this might indicate points of focus for data set prep.
I think if you wanted to do this at scale, you would prefer Perceiver / Perceiver IO:
https://arxiv.org/abs/2103.03206#deepmind
https://arxiv.org/abs/2107.14795#deepmind
Does this mean the input to my model can be a set of varying length instead of a list of fixed length?
Could this be used for something like a card game? Consider: how would you encode the cards in your hand in such a game? The cards in your hand form a set, their order does not matter, and in many games the number of cards in your hand varies over time.
The examples on that page are neat, but they're also stupid and artificial. Why would anyone mangle their input like that? Are there any examples of this being useful that are less artificial?
The card game example you propose is a pretty apt one.
One might have any number of cards in a given hand, each with any number of varying properties (for standard 52-card decks, suit and rank). To keep a fixed-size feature vector as the input to a neural net, one approach is to treat every distinct possible card as a single input feature, with its value indicating whether that card is currently in the player's hand (0 or 1).
This of course falls apart a bit for card games where multiple copies of the same card can exist. If you just scale up the input feature values in proportion to the duplicate count, you implicitly assume a linear effect of those duplicate cards where a nonlinear effect may exist. TD-Gammon's representation for Backgammon solves a similar issue in a similar way when representing how many checkers sit on a given point (if I recall correctly, some version of it treats 0, 1, 2, and 3+ as distinct possibilities, effectively represented as different features per point). Still, as a first approximation, this does work for a decent swath of games (and abstracts the ordering of cards away).
It seems like this article's approach involves a good deal of having the neural net recognize symmetries like this automatically.
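A minimal sketch of both encodings (my own illustration, not from the paper; the bucket count and helper names are arbitrary):

```python
# Hypothetical helpers illustrating the two encodings described above.
from collections import Counter

DECK_SIZE = 52  # standard deck: card index = 13 * suit + rank

def one_hot_hand(hand):
    """Fixed-size binary vector: 1.0 if the card is in the hand, else 0.0.
    Sufficient when each card can appear at most once."""
    v = [0.0] * DECK_SIZE
    for card in hand:
        v[card] = 1.0
    return v

def bucketed_hand(hand, buckets=4):
    """TD-Gammon-style truncated counts: one feature per (card, count
    threshold), so duplicates get a nonlinear representation instead of a
    linearly scaled input value."""
    counts = Counter(hand)
    v = [0.0] * (DECK_SIZE * buckets)
    for card, n in counts.items():
        for b in range(min(n, buckets)):
            v[card * buckets + b] = 1.0
    return v

# A hand with a duplicate card (only meaningful in games allowing copies):
print(sum(one_hot_hand([0, 5, 17])))      # 3.0
print(sum(bucketed_hand([0, 0, 5, 17])))  # 4.0: card 0 sets two features
```

Both encodings are order-independent by construction, which is exactly the abstraction the parent comment describes.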
Is this "hard" permutation-invariant? For example, does shuffling the order of the inputs really have zero effect on the output? Or is it just "soft" permutation-invariant? Looking at their architecture diagram, the "attention" network receives input from the "sensory neurons" in a certain order, right? If so, I suspect order does matter, although they attempt to minimize the importance of the order.
I haven’t read the article in depth, but transformer-style self-attention is usually permutation equivariant, meaning that if you change the order of the inputs then you get the same change in the order of the outputs, while the outputs themselves stay the same.
As I understand it, all outputs are then added together / averaged into one vector. That should make the model "hard" permutation invariant.
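A tiny numpy sketch (my own illustration, independent of the paper's code) checking both claims: self-attention without positional encodings is permutation equivariant, and mean-pooling its outputs is permutation invariant:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                          # embedding dimension (arbitrary for the demo)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    # X: (n, d) set of n input tokens; no positional encoding is added.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(d)) @ V   # (n, d)

X = rng.normal(size=(5, d))
perm = rng.permutation(5)

Y, Yp = self_attention(X), self_attention(X[perm])
print(np.allclose(Yp, Y[perm]))                # True: equivariant
print(np.allclose(Yp.mean(0), Y.mean(0)))      # True: invariant after pooling
```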
Most actual implementations use a position encoding, though, which breaks permutation invariance. This is a good inductive bias... word order matters, for example.
Sure. But it would be very weird to use a position embedding here, when the whole point is permutation invariance.
I must not understand what transformers are. I thought they were a type of RNN where order of input matters?
Any suggestions on how to learn about transformers? Or a few sentences to get me started on the right path?
Yes, there are a bunch of permutation-invariant operations, such as sum, max, and (aggregated) pairwise interactions.
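A short Deep-Sets-style sketch of those operations (my own illustration; shapes and the transform are arbitrary):

```python
# Permutation-invariant aggregations over per-element features: the same
# transform is applied to every element, then pooled with a symmetric op.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))   # a "set" of 6 elements, 4 features each
W = rng.normal(size=(4, 4))   # shared per-element transform

def pool(X):
    phi = np.tanh(X @ W)                       # elementwise transform
    s, m = phi.sum(axis=0), phi.max(axis=0)    # sum and max pooling
    # summed pairwise products, via sum_{i<j} a_i a_j = ((sum a)^2 - sum a^2)/2
    p = (s**2 - (phi**2).sum(axis=0)) / 2
    return s, m, p

for a, b in zip(pool(X), pool(X[rng.permutation(6)])):
    assert np.allclose(a, b)   # identical under any reordering of the set
```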
Reinforcement learning agents typically perform poorly if provided with inputs that were not clearly defined in training. A new approach enables RL agents to perform well, even when subject to corrupt, incomplete, or shuffled inputs.
I am not a specialist in NN and the article doesn’t even mention it, but shouldn’t input dropout and input corruption help with this?
From a description in the referenced NeurIPS spotlight paper:
https://attentionneuron.github.io
> While it is possible to acquire a quasi-PI agent by training with randomly shuffled observations and hope the agent’s policy network has enough capacity to memorize all the patterns, we aim for a design that achieves true PI even if the agent is trained with fix-ordered observations.
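To illustrate the distinction in that quote, a hedged sketch (all names hypothetical, not the paper's code): shuffling observations during training only encourages quasi-PI, whereas a symmetric aggregation over per-channel features is invariant by construction:

```python
import numpy as np

rng = np.random.default_rng(2)

def augment(obs):
    # quasi-PI route: shuffle observation channels during training and hope
    # the policy network has enough capacity to memorize all the orderings
    return obs[rng.permutation(obs.shape[0])]

def pi_encode(obs, w):
    # true-PI route: a symmetric aggregation over per-channel features gives
    # the same output for every ordering, even with fix-ordered training data
    return np.tanh(obs[:, None] * w).mean(axis=0)

obs = rng.normal(size=(16,))   # 16 observation channels (arbitrary)
w = rng.normal(size=(4,))      # shared per-channel feature weights
assert np.allclose(pi_encode(obs, w), pi_encode(augment(obs), w))
```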
Two Minute Papers did a quick overview of this paper last month.
I'm not sure I agree with the "not human like" angle on this one. Is this not what our brain and visual system are doing all the time? I look at a glass of water on a table. I perceive the same glass whether I'm looking at it directly, toward the side of it, from my periphery, or even closing my eyes. Which input neurons are directly perceiving the glass changes drastically between these states; indeed, even while you look at it straight on, your eyes imperceptibly twitch to intentionally introduce jitter and pick up more detail (these fixational movements are called "microsaccades"). Include other senses and the number of ways you can perceive something explodes combinatorially. Yet I perceive the glass as a single permanent object, nearly the same in all of those scenarios, despite the fact that my perception of it is varying rapidly. This is so automatic and ingrained that you don't even conceptualize it as something you do unless you pay attention.
That's what makes this cool and VERY human-like. Just because our visual system isn't set up to handle this particular type of reshuffling problem doesn't mean that we don't constantly have to handle the reshuffling of inputs to maintain a coherent view of reality.
Yup, you can use it to input letters and numbers.