Markov Chain Monte Carlo (MCMC) Sampling, Part 1: The Basics (2019)

Author: vonadz

Score: 145

Comments: 21

Date: 2020-10-28 09:55:27

Web Link

________________________________________________________________________________

michaericalribo wrote at 2020-10-29 01:34:40:

The Gibbs sampler is right at the top of my personal list of the greatest technical achievements in statistics. Not only is the asymptotic guarantee of producing a sample from the target distribution a neat and elegant result, but it also opened up a really rich world of probabilistic modeling. From a practical / ML perspective, variational methods are more efficient for getting decent estimates of complex models. And there are a lot of devilish details to MCMC sampling, and the real world _never_ goes to infinity. Still, the sheer theoretical fact that an imperfect, simulation-based method can produce valid statistical estimates is just awesome.
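For readers new to it, the Gibbs sampler draws each variable in turn from its full conditional given the current values of the others. A minimal sketch for the textbook bivariate-normal case (the correlation and conditional formulas are the standard example, not from the linked article):

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Each coordinate's full conditional is N(rho * other, 1 - rho**2),
    so we simply alternate draws from the two conditionals.
    """
    sd = math.sqrt(1 - rho ** 2)
    x, y = 0.0, 0.0
    samples = []
    for i in range(n_samples + burn_in):
        x = random.gauss(rho * y, sd)  # draw x | y
        y = random.gauss(rho * x, sd)  # draw y | x
        if i >= burn_in:
            samples.append((x, y))
    return samples

random.seed(0)
samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
mean_x = sum(x for x, _ in samples) / len(samples)
print(round(mean_x, 2))  # close to 0, the true marginal mean
```

The asymptotic guarantee mentioned above is exactly that, as the chain length grows, these draws behave like samples from the joint distribution.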

frodetb wrote at 2020-10-29 08:42:43:

I wrote my master's thesis on using simulated annealing to solve a particular optimisation problem. In a sense, the Gibbs sampler lies at its core, and I really have to agree with what you say. At the time I did not have time to think too deeply about it, but since then I have become increasingly awestruck by it, by thermodynamics and statistical mechanics in general, and by the analogies that can be drawn to all kinds of disciplines.

tetris11 wrote at 2020-10-29 11:22:12:

In bioinformatics, MCMC was often used in linkage analysis to sample the "genotype space" of the true inheritance pattern across multiple related individuals, in order to find the location and transmission pathway of disease-causing variants.

Classical genetics and statistics were very tightly linked in this regard due to scarcity of data at the time (pre-2000).

Now that the field has exploded with big data, such statistics are no longer so necessary to answer the same problems, and a lot of the early mathematical elegance associated with the field has given way to brute-force filtering methods.

wheresmycraisin wrote at 2020-10-29 07:01:21:

I appreciate the theoretical aspects of it as well -- however, I could never get over the notion that bayesians need to use frequentist statistics to judge whether the chain has sufficiently converged... isn't that defeating the entire point?
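For concreteness, the standard diagnostic being alluded to is the Gelman-Rubin R-hat, which compares between-chain and within-chain variance, a frequentist calculation through and through. A minimal sketch of the classic (non-split) version:

```python
import math
import random

def gelman_rubin(chains):
    """Classic Gelman-Rubin R-hat for a list of equal-length chains.

    Compares between-chain variance (B) to within-chain variance (W);
    values near 1 suggest the chains have mixed.
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)

# Two chains drawn from the same distribution should give R-hat near 1:
random.seed(1)
chains = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
print(round(gelman_rubin(chains), 2))  # close to 1.0
```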

soVeryTired wrote at 2020-10-29 10:09:34:

Interesting point, but I think the academic culture has taken a pragmatic shift: there aren't many 'pure' bayesians or frequentists left anymore, and the boundary between the two viewpoints has softened. Stats has more of a 'shut up and calculate' mentality about it these days.

wheresmycraisin wrote at 2020-10-30 23:45:06:

Then why bother pretending to calculate a posterior distribution when you can do an MLE with regularization or something?

sgt101 wrote at 2020-10-29 10:29:50:

>Stats has more of a 'shut up and calculate' mentality about it these days.

I'm happy that the "pure this, pure that" fundamentalism has faded into retirement... but shut up and calculate has problems, especially around causality. The "blessings of multiple causes" paper, for example.

melling wrote at 2020-10-29 00:26:53:

Ben Lambert has a great series on Bayesian Statistics:

https://www.youtube.com/playlist?list=PLwJRxp3blEvZ8AKMXOy0f...

He works his way up to MCMC

ranc1d wrote at 2020-10-29 11:04:38:

Agreed, that YouTube playlist along with his book of the same name really helped me get a good understanding of Bayesian Statistics/MCMC

https://uk.sagepub.com/en-gb/eur/book/student%E2%80%99s-guid...

cake93 wrote at 2020-10-29 12:13:15:

This week, PyMCon 2020 is happening: an online conference all around PyMC3, one of the leading libraries for MCMC/Bayesian statistics/probabilistic programming. It is split into an asynchronous and a synchronous part.

The talks of the asynchronous part are publicly accessible on the forum:

https://discourse.pymc.io/c/pymcon/2020talks/15

Tickets for the synchronous part with Keynote sessions and live Q&A are still available:

https://pymc-devs.github.io/pymcon/

Fishysoup wrote at 2020-10-29 05:10:51:

Something I've had trouble finding resources for is how to apply probabilistic/Bayesian techniques and thinking to chaotic dynamical systems. People keep telling me "Look up MCMC" but I don't see the relevance to dynamical systems (beyond the notion that you can maybe sample from them with MCMC somehow?)

c1ccccc1 wrote at 2020-10-29 07:04:05:

If all you want to do is sample from possible outcomes of the system, then you should just be able to run a simulation of it on your computer.

If you want to condition on an event? Say you want to predict the weather on Tuesday, conditioned on the event that it rained on Sunday? Run a lot of simulations, and only keep the ones where it rained on Sunday.

Similarly: If you want to compute an expectation value? Run a lot of simulations and take the average.
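The simulate, filter, and average recipe above can be sketched with a toy two-state weather chain (the persistence probability here is invented purely for illustration):

```python
import random

def simulate_weather(days, p_persist=0.7):
    """Toy two-state weather chain: each day repeats the previous day's
    weather with probability p_persist, otherwise it flips."""
    state = random.choice(["rain", "dry"])
    history = [state]
    for _ in range(days - 1):
        if random.random() >= p_persist:
            state = "dry" if state == "rain" else "rain"
        history.append(state)
    return history

# Estimate P(rain on day 2 | rain on day 0) by simulating and filtering:
random.seed(42)
runs = [simulate_weather(3) for _ in range(50000)]
kept = [r for r in runs if r[0] == "rain"]  # keep only runs where the event holds
p_rain = sum(r[2] == "rain" for r in kept) / len(kept)
print(round(p_rain, 2))  # close to 0.7**2 + 0.3**2 = 0.58
```

This works because the conditioning event ("rain on day 0") happens often enough that rejection is cheap; it is exactly the rare-event case below where this breaks down.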

If the event is unlikely? Say you want to condition on the fact that it rained on Sunday, and the high temperature was precisely 15 degrees C? Then you have a difficult problem on your hands.

(Sometimes the expectation value of the quantity you care about will depend a lot on a few rare events with outcomes many standard deviations away from the mean. Then you also have a difficult problem on your hands.)

Sometimes MCMC will work on this kind of problem, and sometimes it won't. Even if it doesn't, maybe other techniques will work.

To apply MCMC to a dynamical system, one method is to write down all of the history of the system as a single object, say a single vector. You write down what your system is doing at t=1, at t=2, etc, and all that information goes into the vector. The rules governing the system determine a probability distribution over the vector space that the vector lives in. (Or more generally, the object in the object space. The vector axioms aren't important here, it's just a nice familiar example.)

Generally speaking, if you know how to describe your dynamical system, you know how to compute a non-normalized probability for any given vector. Usually you won't be able to compute a normalized probability for that vector. That's fine, since MCMC works with non-normalized probabilities.

The next step is simply to run MCMC. If you want to condition on some fact, cut out all the parts of the space where that fact doesn't hold, and then run MCMC. If you want to compute an expectation value, there are other tweaks to MCMC that are possible (e.g. importance sampling).
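A minimal sketch of that trajectory-space view, taking a Gaussian random walk as the dynamical system and conditioning on the endpoint event x_T > 2 (all parameters invented for illustration):

```python
import math
import random

def log_path_prob(path):
    """Unnormalized log-probability of a Gaussian random-walk path:
    increments are N(0, 1), and the normalizer is irrelevant for MCMC."""
    return sum(-0.5 * (b - a) ** 2 for a, b in zip(path, path[1:]))

def metropolis_paths(T=10, n_iter=20000, step=0.5, condition=lambda p: True):
    """Metropolis sampling over whole trajectories. x_0 is pinned at 0;
    one point is perturbed per iteration. Proposals that violate the
    conditioning event are rejected outright (probability zero there)."""
    # crude deterministic initialization that satisfies x_T > 2:
    path = [0.3 * t for t in range(T + 1)]
    endpoints = []
    for _ in range(n_iter):
        t = random.randrange(1, T + 1)
        proposal = list(path)
        proposal[t] += random.gauss(0, step)
        if condition(proposal):
            # standard Metropolis accept/reject on the unnormalized density
            if math.log(random.random()) < log_path_prob(proposal) - log_path_prob(path):
                path = proposal
        endpoints.append(path[-1])
    return endpoints

random.seed(0)
endpoints = metropolis_paths(condition=lambda p: p[-1] > 2.0)
print(min(endpoints) > 2.0)  # every retained path satisfies the event
```

Single-site updates like this mix slowly for long, strongly correlated trajectories, which is where fancier samplers (e.g. HMC) earn their keep.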

Fishysoup wrote at 2020-10-29 08:04:16:

Wonderful!!! Can you point me to any textbooks/resources that go into more depth on this (or if that's too specific, dynamical systems in general at the early graduate level)?

bollu wrote at 2020-10-29 11:00:42:

Shameless plug: I wrote some high quality (I hope) notes on HMC, because I couldn't find an explanation that was (a) rigorous, (b) explained the different points of view of MCMC, and (c) was visual! Here's the notes:

https://github.com/bollu/notes/blob/master/mcmc/report.pdf

Feedback is much appreciated.

dtjohnnyb wrote at 2020-10-29 09:27:24:

Not sure if it's what you're looking for, but

https://en.m.wikipedia.org/wiki/Monte_Carlo_molecular_modeli...

has a long history and an ecosystem of techniques around it; it might hold some inspiration at least.

Chandler's statistical mechanics book might be a good place to start to dig deeper

swsieber wrote at 2020-10-29 01:05:27:

So this is only tangentially related in that it's ML related. I'm working on a system where a user uploads a single document that corresponds to several documents we already have in the system (and we know which documents those are), and we are supposed to map pages from the single document to pages in the other known documents.

I'm having a hard time finding literature on problems like this. I think the closest thing it's like is the stable matching problem, but I'm not sure. My current algorithm doesn't allow for missing or reordered pages.

Does anyone have magic words I could google to find pertinent research/algorithms? I think if worst comes to worst I'll have to try to model the incoming page stream as something like a Bayesian ... graph? It's been so long since I've done something like that (5 years), and that was in school.

sgt101 wrote at 2020-10-29 10:33:09:

If I were you I would measure reconstruction errors from transformer predictions, using input from the source document, and then compute the difference vs the target documents.

The challenge would be to get a good measure of the difference; perhaps use a cosine distance, again from the transformer?

If you have specialist vocab you can train the transformers - so maybe you are doing legal matching?

setr wrote at 2020-10-29 01:25:58:

It sounds like you're just trying to do similarity search? In which case, I imagine the key terms you want to be looking for are Information Retrieval and Recommendation Algorithms

The most basic thing I'd imagine (assuming text docs) is something like: index each document by TF-IDF, do the same for the new document, and then rank results by similarity.

You're basically doing a google search on documents, but with a really big query (another document)
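That TF-IDF ranking can be sketched in plain Python (toy corpus and whitespace tokenization, purely illustrative; a real system would use a proper IR library):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency of each term
    idf = {w: math.log(n / df[w]) for w in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({w: tf[w] * idf[w] for w in tf})
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "the cat sat on the mat".split(),
    "stochastic gradient descent converges".split(),
    "the dog sat on the log".split(),
]
vecs, idf = tfidf_vectors(corpus)

# Treat the new document as a big query, weighted by the corpus IDF:
query = "a cat on a mat".split()
tf = Counter(query)
qvec = {w: tf[w] * idf.get(w, 0.0) for w in tf}  # OOV terms get weight 0
ranked = sorted(range(len(corpus)),
                key=lambda i: cosine(qvec, vecs[i]), reverse=True)
print(ranked[0])  # the cat/mat document ranks first
```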

KidComputer wrote at 2020-10-29 01:21:55:

I'd generate document embeddings for each page then put the embedding vectors in a similarity search database like Facebook Research's Faiss.
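The nearest-neighbor lookup that an index like Faiss accelerates can be sketched brute-force; the embeddings below are random placeholders, where a real pipeline would use a learned embedding model and Faiss for scale:

```python
import random

def nearest_page(index, query):
    """Brute-force L2 nearest-neighbor search over page embeddings,
    the job an ANN index like Faiss does efficiently at scale."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(index)), key=lambda i: dist2(index[i], query))

random.seed(0)
# 100 known pages with 8-dimensional placeholder embeddings:
pages = [[random.gauss(0, 1) for _ in range(8)] for _ in range(100)]
# a query that is a slightly noisy copy of page 42:
query = [x + random.gauss(0, 0.01) for x in pages[42]]
print(nearest_page(pages, query))  # 42
```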

sixhobbits wrote at 2020-10-29 02:26:26:

If the pages are ordered it sounds like a text alignment problem. Lots of literature on that especially in machine translation

xvilka wrote at 2020-10-29 02:11:19:

There are many articles about MCMC in Python. I am surprised Tweag didn't write the same but for Haskell.