Two lesser known really cool things about Web Workers:
1. They kind of allow for stopping synchronous operations. Example: some regexes have "catastrophic backtracking", and executing them can take a really long time. So what do you do if you have to execute user-provided regexes, especially on the server? Detecting potentially catastrophic regexes is tough, and reimplementing the regex engine so that it yields to the main thread frequently (so that you can stop it) is super tough. The solution: execute the regex in a Web Worker, and if you haven't received a response within some set amount of time you just kill the Web Worker, effectively stopping the regex execution (sketched below), cool!
2. They kind of allow for blocking on promises. Normally you can't block the event loop while you are waiting for a promise to be resolved; in other words, you can't make an asynchronous function synchronous. Except if you use Web Workers: you can execute the asynchronous function you need on a worker, and then use Atomics.wait on the main thread to block (without melting the computer) until that function resolves, super cool!
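A rough sketch of the timeout pattern in #1 (the worker file name and timeout are made up, and as discussed downthread it's not guaranteed that terminate() interrupts a regex that is already executing):

```
// regex-worker.js: runs the user-provided regex (file name is made up)
onmessage = ({ data: { pattern, input } }) => {
  postMessage(new RegExp(pattern).test(input)); // may backtrack "forever"
};

// main thread (or Node parent): race the worker against a timeout
function safeTest(pattern, input, timeoutMs = 1000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('regex-worker.js');
    const timer = setTimeout(() => {
      worker.terminate();                        // kill the worker, regex and all
      reject(new Error('regex timed out'));
    }, timeoutMs);
    worker.onmessage = (e) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(e.data);
    };
    worker.postMessage({ pattern, input });
  });
}
```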
In #2 you use Atomics.wait in the Worker instead and then it can signal the main thread when done.
We use this to convert async browser APIs into the synchronous callbacks that our C library, compiled to WebAssembly, expects.
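A rough sketch of that worker-side-wait variant (names and the '/answer' endpoint are illustrative; SharedArrayBuffer requires cross-origin isolation in browsers):

```
// worker.js: wants a synchronous answer from an async browser API
const sab = new SharedArrayBuffer(8);
const shared = new Int32Array(sab);     // shared[0] = wake flag, shared[1] = result
postMessage(sab);                       // hand the buffer to the main thread
Atomics.wait(shared, 0, 0);             // block here (allowed in workers) until notified
const result = Atomics.load(shared, 1); // the value the main thread wrote

// main.js: performs the async work and wakes the worker
const worker = new Worker('worker.js');
worker.onmessage = async ({ data: sab }) => {
  const shared = new Int32Array(sab);
  const response = await fetch('/answer');                // any async API, '/answer' is made up
  Atomics.store(shared, 1, (await response.json()).value | 0);
  Atomics.store(shared, 0, 1);                            // flip the flag
  Atomics.notify(shared, 0);                              // wake the blocked worker
};
```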
> _and then use Atomics.wait on the main thread to block_
The main thread is not allowed to use Atomics.wait. I’m not certain what the implementation status of this is because I’ve never used it and I have a vague feeling I heard that _one_ browser shipped it without that restriction, but at the very least you may get a TypeError in some user agents and you can expect to in all user agents at some point in the future when they become sterner about not blocking the main thread.
As for stopping synchronous operations, I’m dubious that would actually work; without actually testing it (and I don’t have time to test it now, though I’d be interested in the result, including across various platforms), I think it’s more likely that the regexp match would be uninterruptable, and that it would just go on munching your CPU until it finished, and _then_ terminate once it returned from the native code to the JavaScript.
I should check what the status of Atomics.wait on Chrome is; currently it seems to work fine under both Node and Deno.
I'll test out the regex thing in the following days as I need it for that exact use case, I just assumed killing the web worker would... work, hopefully that's the case otherwise I'm back to square 0 :D
---
Edit: MDN says: "The terminate() method of the Worker interface immediately terminates the Worker. This does not offer the worker an opportunity to finish its operations; it is stopped at once." so if that's not the case either the engine is wrong or the docs are wrong.
Worker.terminate() aborts the currently running script evaluation, see
https://html.spec.whatwg.org/multipage/workers.html#dom-work...
→
https://html.spec.whatwg.org/multipage/workers.html#terminat...
→
https://html.spec.whatwg.org/multipage/webappapis.html#abort...
, but that’s a fairly fuzzy definition, and it’s not generally reasonable to expect that to interrupt a currently executing piece of _native_ code, because interrupting that (e.g. by sending a signal to the thread and forcibly starting unwinding) could leave data structures in a memory-unsafe state (that is, in the improbable worst case this could be a vector for escaping the sandbox). It’s possible they’ve come up with some way of working around this, but it’s going to be _considerably_ easier and safer to just treat native code as uninterruptible by default, and possibly get known-slow blocking operations to manually periodically check if they’re being asked to stop.
So my gut feeling is that the match operation won't actually be interrupted, and I wouldn't be inclined to _depend_ on it actually being interrupted without explicit documentation of what aborting does, even if all environments I cared about did abort it immediately.
Atomics.wait() cannot be called on the main thread (i.e. when your global is a Window object).
It works fine both in Node and Deno. I haven't tested this out on browsers though, is that supposed to throw even if SharedArrayBuffer is enabled?
Yes, browsers will not allow you to block on the main thread. Atomics.waitAsync is supposed to be used instead.
This is a fairly difficult aspect of multithreading on the web, and it makes things more complicated than on other platforms (like Node and Deno, as you mentioned). For example, in emscripten's pthreads support layer there is code dedicated to doing a sort of careful busy-wait when we have no other option, and all of that is only for the case of the main thread.
But your point is still very relevant, just not on the main thread: if you can run your application in a worker, then you _can_ block on Promises using another worker that does the async operation while the first worker is synchronous. And that's really useful!
There is Atomics.waitAsync, which can be used on the main thread and just returns a promise. It has shipped at least in Chrome.
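For reference, in engines that ship it, it looks roughly like this (a minimal sketch; the worker side is only hinted at in comments):

```
const shared = new Int32Array(new SharedArrayBuffer(4));

// Main thread: never blocks; resolves once some worker notifies index 0
const waited = Atomics.waitAsync(shared, 0, 0);
if (waited.async) {
  waited.value.then((outcome) => console.log(outcome)); // "ok"
} else {
  console.log(waited.value); // "not-equal" or "timed-out", no suspension needed
}

// Meanwhile, in a worker sharing the same buffer:
//   Atomics.store(shared, 0, 1);
//   Atomics.notify(shared, 0);
```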
In the specific case of regular expressions on the server, re2 was written to handle exactly this.
The fact that creating a worker requires you to pass a module (file) name makes them extremely unergonomic to use.
This is clearly visible in the lack of any library ecosystem around them. We don't have any highly used threadpool or executor libraries using workers. Everyone seems to be manually setting up a worker and setting up the job scheduling logic from scratch.
All the boilerplate and restrictions really limit their usage IMO.
I know you can pass a Blob URL, but you can't write libraries this way since you risk running afoul of the downstream consumer's browser CSP.
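For completeness, the Blob URL trick looks something like this (a minimal sketch, subject to the CSP caveat above):

```
const src = `onmessage = (e) => postMessage(e.data * 2);`;
const url = URL.createObjectURL(new Blob([src], { type: 'text/javascript' }));
const worker = new Worker(url);                 // a strict worker-src/script-src CSP can block this
worker.onmessage = (e) => console.log(e.data);  // 42
worker.postMessage(21);
```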
Workers are awesome but you are right, working with them can be painful without the right tooling.
Personally I've written my own libraries for abstracting all this away and I'm having a blast working with workers now, maybe check them out:
- WorkTank [1]: This abstracts away the difference between browser workers and Node worker threads, it makes it easy to make worker pools, and it can transfer simple functions to a worker at runtime too.
- WorkTank loader [2]: This basically abstracts away loading asynchronous functions from a worker: you just add ".worker" to your file name and that file and all its dependencies are transparently moved to a worker (or worker pool); the rest of the app (TS types, for example) won't even notice anything happened, it just works, transparently.
You might want to check out the more popular "comlink" library too, although it didn't work for me for whatever reason when I tried it, and it doesn't support worker pools I believe.
[1]:
https://github.com/fabiospampinato/worktank
[2]:
https://github.com/fabiospampinato/worktank-loader
A big complaint I have is that the built-in scheduling for workers assumes a very specific scenario. You can send messages to a single worker, but the message loop has to be operated by the browser. You also don't know the length of the message queue, even though the browser has it available, so you can't easily send a message to the worker with the least work queued. If a worker starts working on low-priority items, and you want to interject with a high-priority message, you also can't interrupt the worker, nor can the worker loop the messages on its own accord. You also can't re-sort the message queue, it's FIFO.
Basically, any sort of work scheduling that you would _like_ to do to queue high-priority messages, or have workers share a pool of work, is impossible to build with the onmessage-style of WebWorkers. It feels like an API made by someone who had read about threads once, rather than someone who's built an actual many-workers processing system like this. The event loop being unpumpable from user code feels like a giant kick in the face.
My workaround for this was to kill workers working on low-priority items, relaunch them, and re-sort the queue, but that all feels like a mess. Also, at some point a Chrome update made this strategy crash. I ended up just removing the WebWorker code; more trouble than it was worth for marginal improvements.
I know that SharedArrayBuffer (SAB) and atomics add new low-level primitives to support this, but SAB is still poorly supported on widely deployed platforms. And you still need to do the serialization yourself.
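One partial workaround is to keep the queue entirely in your own code and give each worker exactly one job at a time, so priorities are decided on your side rather than by the browser's FIFO queue (a rough sketch; 'job.js' is a placeholder, and this still can't interrupt a job that is already running, which is where the kill-and-relaunch hack above comes in):

```
// Keep the real queue (sortable, priority-aware) on our side and hand each
// worker exactly one job at a time.
const queue = [];                                  // { priority, payload } items
const idle = [new Worker('job.js'), new Worker('job.js')];

function submit(priority, payload) {
  queue.push({ priority, payload });
  queue.sort((a, b) => b.priority - a.priority);   // highest priority first
  pump();
}

function pump() {
  while (idle.length && queue.length) {
    const worker = idle.pop();
    worker.onmessage = () => { idle.push(worker); pump(); }; // collect the result here
    worker.postMessage(queue.shift().payload);
  }
}
```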
> "The fact that creating a worker requires you to pass a module (file) name makes them extremely unergonomic to use."
Can anybody explain how did this ever make it into the official spec and the default implementation?
The browser will make a network call to fetch your Web Worker .js file on every instantiation of your Web Worker. Instantiations of the same Web Worker "module" aren't cached, so in a thread-pool scenario your browser would be fetching the same file over and over again.
What were the people designing this even thinking? This is so wasteful and frustrating. The API simply sucks.
> Can anybody explain how did this ever make it into the official spec and the default implementation?
This and service workers. Both are... weird, to say the least
When I'm seeing half-arsed APIs like this being introduced into so-called "modern" incarnations of the web, I start doubting the technical chops of the people involved and the process as a whole. Is it truly possible that there wasn't anyone with sufficient experience in building concurrency/multi-threading/parallelism APIs involved in building this?
Failing to account for the thread-pool scenario, as an example, is just mind boggling.
"best guess": API decisions are opinionated, reflecting the designers' views of how users should write code.
For example, FileReader API is async in the main thread. There's an equivalent FileReaderSync for sync operations but that is only available in Web Workers. Why isn't FileReaderSync available on the main thread? Because the designers didn't want people to do things that would block the main thread.
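For example, inside a worker (and only there) you can write something like:

```
// Inside a worker script (FileReaderSync is not exposed on the main thread)
onmessage = ({ data: file }) => {
  const reader = new FileReaderSync();
  const text = reader.readAsText(file);  // blocks this worker, not the UI
  postMessage(text.length);
};
```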
The Web Worker argument probably went along the lines of "If we allow users to pass arbitrary function objects, they may try to do something that accesses local variables from the site where the worker is created. That is obviously an error, so we should design the API so that it can't happen. Creating a separate script creates a clear mental separation and avoids that class of error"
My bigger complaint is... how do you cleanly do synchronous-RPC-style calling? Even with clever async/await tricks, the serialization to and from complicated input/output structs seems so expensive.
There are solutions to those problems:
https://github.com/GoogleChromeLabs/comlink
https://github.com/Bnaya/objectbuffer
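Hand-rolled, the promise-over-postMessage pattern that comlink wraps looks roughly like this (a sketch; the file name and the 'api' object are invented, and the arguments still go through structured clone):

```
// main.js: turn each postMessage round trip into a Promise
let nextId = 0;
const pending = new Map();
const worker = new Worker('rpc-worker.js');          // placeholder file name
worker.onmessage = ({ data: { id, result } }) => {
  pending.get(id)(result);
  pending.delete(id);
};
function call(method, ...args) {
  return new Promise((resolve) => {
    const id = nextId++;
    pending.set(id, resolve);
    worker.postMessage({ id, method, args });         // args still pay the structured-clone cost
  });
}
// await call('add', 2, 3) -> 5

// rpc-worker.js
const api = { add: (a, b) => a + b };                 // stand-in for the real methods
onmessage = ({ data: { id, method, args } }) => {
  postMessage({ id, result: api[method](...args) });
};
```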
This past year I've live streamed the development of a small web app that benefits enormously from web workers:
https://github.com/Rezmason/wireworld-player
https://rezmason.github.io/wireworld-player
As a simulation, the main thread asks a web worker to update the world state and then render the new state. By default, this happens once per requestAnimationFrame.
But there's a "Turbo" mode (its UI toggle looks like a radioactive hazard symbol) that, when activated, tells the web worker to update as often as it can per requestAnimationFrame, speeding up the simulation around 72x while keeping the main thread 100% responsive.
The decoupling of the synchronous number crunching work from the main thread has also given me a place to experiment with much more resource intensive algorithms, like
https://jennyhasahat.github.io/hashlife.html
, which fills an enormous cache and can advance the sim by exponential time steps.
Modifying Hashlife to run in the main thread without freezing the app is possible, but it would have made the code much more complicated and slower, and the other cores available to web workers would have gone unused.
If you're interested in leveraging web workers easily for repetitive compute-heavy tasks in a webapp, I've built a little library that takes care of launching and managing worker threads for you:
https://github.com/GitSquared/rinzler
Nice. Do you also support SharedArrayBuffers or does everything need to be serializable that is sent to/from WebWorkers?
By the way, I built something similar (?): A Rust library that mimics the API of the `futures-executor` crate, but each worker thread is a single WebWorker.
https://github.com/wngr/wasm-futures-executor
I have used them for keeping things running in tabs that aren't focused, so they don't lose time sync (not critical though).
Hyperscript supports inline web worker definitions:
https://hyperscript.org/features/worker/
We tried to improve the API to this very cool feature.
I built a little golang playground using web workers and WASM. It's nice to not hang the UI thread.
https://app.qvault.io/playground/go
A little known annoying detail of web workers is that passing large data to them via postMessage is incredibly slow. The browser has to convert your javascript object into some sort of internal binary format and it slows down the thread that's doing the sending.
Depends on the type of large data, but since 2011 there's "Transferable" [1], where objects like ArrayBuffer, MessagePort, and ImageBitmap can be transferred with a low overhead [2]. Now, if you're passing large arbitrary object graphs, you're out of luck.
[1]
https://developers.google.com/web/updates/2011/12/Transferab...
[2]
https://developer.mozilla.org/en-US/docs/Web/API/Worker/post...
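Transferring looks like this: after the call the buffer is detached on the sending side, so it's a move rather than a copy (a minimal sketch; 'process.js' is a placeholder):

```
const worker = new Worker('process.js');            // placeholder file name
const buffer = new Float64Array(1_000_000).buffer;  // ~8 MB of samples
worker.postMessage({ samples: buffer }, [buffer]);  // moved, not copied
console.log(buffer.byteLength);                     // 0: ownership went to the worker
```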
Any libraries that make this nicer/easier to do instead of rewriting the same thing over and over?
https://github.com/Bnaya/objectbuffer
and comlink
Web Workers makes it possible to run a script operation in a background thread separate from the main execution thread of a web application. The advantage of this is that laborious processing can be performed in a separate thread, allowing the main (usually the UI) thread to run without being blocked/slowed down.
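In its most basic form that looks like this (file names are illustrative):

```
// main.js
const worker = new Worker('heavy.js');
worker.onmessage = (e) => console.log('sum:', e.data); // the UI stayed responsive meanwhile
worker.postMessage(1e9);

// heavy.js
onmessage = ({ data: n }) => {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i;  // laborious processing, off the main thread
  postMessage(sum);
};
```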
That can be a great feature in some circumstances, but it also seems like it can undo much of the efficiency improvement browsers have accomplished by throttling and sleeping less used tabs. Do browsers implement any kind of leashing in terms of how much resources any particular site’s workers can use, frequency of running, length of runs, etc?
Any context? Is there a specific reason to post this now? Web workers aren't new to Firefox, are they?
There was a discussion in another thread about Apple stifling PWAs on iOS.
One point was, web workers aren't supported correctly.
IIRC normal workers are supported (dedicated workers) but shared workers are not. Service workers are "supported" but some features that are pretty critical (like push notifications) are not.
I believe this is the other thread:
https://news.ycombinator.com/item?id=29440457
Also I think push notifications are not supported.
Are web workers really good for some costly processes? Would they really be useful for a dashboard with a simple instant data flow?
Depends what's on the dashboard and how the data has to be transformed for display. I've worked on several such dashboards and never had to use Web Workers to solve the performance issues, these were usually caused by UI rendering not optimized for loads of data. Now, the workers can help with rendering too, for example to render a chart off the main thread, but the libraries that support this usually hide the worker magic behind an API.
We used web workers to shift a lot of processing off of the main thread. They helped in keeping the UI feeling responsive and less laggy.
What kind of processing do you mean? Unless you're developing a game or a media player, I can't think of a good example where processing might delay the UI.