This is great. The one thing that is a little unsatisfying with a lot of Q-learning examples is that the environment is fixed. The utility functions are owned by the square, not the agent (i.e. they are not a function of anything the agent can locally observe).
For instance, you couldn't drop the agent into a different environment with the same rules (avoid red squares, find the green one) and expect it to do anything.
Obviously you can change this with a different representation of your state space, but that's a completely different problem, and much harder.
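To make the "owned by the square" point concrete, here is a minimal tabular Q-learning sketch (my own toy code, not the post's; the grid layout, rewards, and hyperparameters are all made up for illustration). The Q-table is indexed by absolute cell coordinates, which is exactly why nothing it learns transfers to a rearranged grid:

```
import numpy as np

W, H = 5, 5
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
REWARD = np.zeros((W, H))
REWARD[4, 4] = 1.0                             # green square (goal)
REWARD[2, 2] = -1.0                            # red square (avoid)
Q = np.zeros((W, H, len(ACTIONS)))             # values owned by the squares

alpha, gamma, eps = 0.1, 0.9, 0.1              # arbitrary hyperparameters
rng = np.random.default_rng(0)

def step(s, a):
    dx, dy = ACTIONS[a]
    nx = min(max(s[0] + dx, 0), W - 1)
    ny = min(max(s[1] + dy, 0), H - 1)
    return (nx, ny), REWARD[nx, ny]

for _ in range(5000):                          # episodes
    s = (0, 0)
    for _ in range(50):
        a = rng.integers(4) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Standard Q-learning update; it writes to Q at the absolute
        # cell s, i.e. "the cell at (2, 3)", never "a cell next to red".
        Q[s][a] += alpha * (r + gamma * Q[s2].max() - Q[s][a])
        s = s2
        if r != 0:
            break

# Moving the red/green squares invalidates the table wholesale: no entry
# is a function of anything the agent can locally observe.
```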
Is there any interesting work you could point to in that space? My expertise is in statistics in general, not really Q-learning or reinforcement learning.
Super cool. How long did this take to make?
I kind of wonder if there is some nice analogy to be made here w.r.t. Kelly betting vs Bayesian RL. As in, some version of maximising log reward will have higher median performance than Bayesian RL even though on average Bayesian RL is better. By analogy, the discrepancy should come from Bayesian RL doing vastly better in some unlikely string of world trajectories (a toy illustration follows the link below).
https://www.reddit.com/r/MachineLearning/comments/jg475u/r_a...
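To illustrate the mean/median split I mean, here is a toy coin-betting simulation (my own example, not from the linked paper; the win probability and bet fractions are arbitrary). The expected-wealth maximiser has a vastly larger mean, but that mean is carried by vanishingly rare win streaks; the log-wealth (Kelly) bettor wins on the median:

```
import numpy as np

# Repeated even-odds bets on a coin with win probability p, staking a
# fixed fraction f of wealth each round. Kelly's f = 2p - 1 maximises
# E[log wealth]; f near 1 maximises E[wealth].
rng = np.random.default_rng(0)
p, rounds, trials = 0.6, 50, 100_000

def simulate(f):
    wins = rng.random((trials, rounds)) < p
    return np.where(wins, 1 + f, 1 - f).prod(axis=1)

for name, f in [("Kelly  f=0.20", 2 * p - 1), ("greedy f=0.99", 0.99)]:
    wealth = simulate(f)
    # Exact mean, computed analytically (the empirical mean of the
    # greedy bettor is useless: it is dominated by ~never-sampled runs).
    exact_mean = (p * (1 + f) + (1 - p) * (1 - f)) ** rounds
    print(f"{name}: exact mean {exact_mean:10.3g}, "
          f"empirical median {np.median(wealth):10.3g}")

# Typical output: Kelly has mean ~7 and median ~2.7; greedy has mean
# ~8e3 but a median around 1e-31, i.e. near-certain ruin.
```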
Wow, this blog template is gorgeous.
It looks like it's based on Distill's template (distill.pub), which is available here: