Two lesser known really cool things about Web Workers:
1. They kind of allow for stopping synchronous operations. Example: some regexes have "catastrophic backtracking", and executing them can take a really long time. So what do you do if you have to execute user-provided regexes, especially on the server? Detecting potentially catastrophic regexes is tough, and reimplementing the regex engine so that it yields to the main thread frequently (so that you can stop it) is super tough. The solution: execute the regex in a Web Worker, and if you haven't received a response within some set amount of time you just kill the Web Worker, effectively stopping the regex execution (sketched below), cool!
2. They kind of allow for blocking on promises. Normally you can't block the event loop while you are waiting for a promise to be resolved; in other words, you can't make an asynchronous function synchronous. Except if you use Web Workers: you can execute the asynchronous function you need on a worker, and then use Atomics.wait on the main thread to block (without melting the computer) until that function resolves, super cool!
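A rough sketch of the timeout pattern in #1 (the worker file name and timeout are made up, and as discussed downthread it's not guaranteed that terminate() interrupts a regex that is already executing):

```
// regex-worker.js: runs the user-provided regex (file name is made up)
onmessage = ({ data: { pattern, input } }) => {
  postMessage(new RegExp(pattern).test(input)); // may backtrack "forever"
};

// main thread (or Node parent): race the worker against a timeout
function safeTest(pattern, input, timeoutMs = 1000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('regex-worker.js');
    const timer = setTimeout(() => {
      worker.terminate();                        // kill the worker, regex and all
      reject(new Error('regex timed out'));
    }, timeoutMs);
    worker.onmessage = (e) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(e.data);
    };
    worker.postMessage({ pattern, input });
  });
}
```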
In #2 you use Atomics.wait in the Worker instead and then it can signal the main thread when done.
We use this to convert async browser APIs into the synchronous callbacks that our C library, compiled to WebAssembly, expects.
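A rough sketch of that worker-side-wait variant (names and the '/answer' endpoint are illustrative; SharedArrayBuffer requires cross-origin isolation in browsers):

```
// worker.js: wants a synchronous answer from an async browser API
const sab = new SharedArrayBuffer(8);
const shared = new Int32Array(sab);     // shared[0] = wake flag, shared[1] = result
postMessage(sab);                       // hand the buffer to the main thread
Atomics.wait(shared, 0, 0);             // block here (allowed in workers) until notified
const result = Atomics.load(shared, 1); // the value the main thread wrote

// main.js: performs the async work and wakes the worker
const worker = new Worker('worker.js');
worker.onmessage = async ({ data: sab }) => {
  const shared = new Int32Array(sab);
  const response = await fetch('/answer');                // any async API, '/answer' is made up
  Atomics.store(shared, 1, (await response.json()).value | 0);
  Atomics.store(shared, 0, 1);                            // flip the flag
  Atomics.notify(shared, 0);                              // wake the blocked worker
};
```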
> _and then use Atomics.wait on the main thread to block_
The main thread is not allowed to use Atomics.wait. I’m not certain what the implementation status of this is because I’ve never used it and I have a vague feeling I heard that _one_ browser shipped it without that restriction, but at the very least you may get a TypeError in some user agents and you can expect to in all user agents at some point in the future when they become sterner about not blocking the main thread.
As for stopping synchronous operations, I’m dubious that would actually work; without actually testing it (and I don’t have time to test it now, though I’d be interested in the result, including across various platforms), I think it’s more likely that the regexp match would be uninterruptable, and that it would just go on munching your CPU until it finished, and _then_ terminate once it returned from the native code to the JavaScript.
I should check what the status of Atomics.wait on Chrome is; currently it seems to work fine under both Node and Deno.
I'll test out the regex thing in the following days as I need it for that exact use case, I just assumed killing the web worker would... work, hopefully that's the case otherwise I'm back to square 0 :D
---
Edit: MDN says: "The terminate() method of the Worker interface immediately terminates the Worker. This does not offer the worker an opportunity to finish its operations; it is stopped at once." so if that's not the case either the engine is wrong or the docs are wrong.
Worker.terminate() aborts the currently running script evaluation, see
https://html.spec.whatwg.org/multipage/workers.html#dom-work...
→
https://html.spec.whatwg.org/multipage/workers.html#terminat...
→
https://html.spec.whatwg.org/multipage/webappapis.html#abort...
, but that’s a fairly fuzzy definition, and it’s not generally reasonable to expect that to interrupt a currently executing piece of _native_ code, because interrupting that (e.g. by sending a signal to the thread and forcibly starting unwinding) could leave data structures in a memory-unsafe state (that is, in the improbable worst case this could be a vector for escaping the sandbox). It’s possible they’ve come up with some way of working around this, but it’s going to be _considerably_ easier and safer to just treat native code as uninterruptible by default, and possibly get known-slow blocking operations to manually periodically check if they’re being asked to stop.
So my gut feeling is that the match operation won't actually be interrupted, and I wouldn't be inclined to _depend_ on it actually being interrupted without explicit documentation of what aborting does, even if all environments I cared about did abort it immediately.
Atomics.wait() cannot be called on the main thread (i.e. when your global is a Window object).
It works fine both in Node and Deno. I haven't tested this out on browsers though, is that supposed to throw even if SharedArrayBuffer is enabled?
Yes, browsers will not allow you to block on the main thread. Atomics.waitAsync is supposed to be used instead.
This is a fairly difficult aspect of multithreading on the web, and it makes things more complicated than on other platforms (like Node and Deno, as you mentioned). For example, in emscripten's pthreads support layer there is code dedicated to doing a sort of careful busy-wait when we have no other option, and all of that is only for the case of the main thread.
But your point is still very relevant, just not on the main thread: if you can run your application in a worker, then you _can_ block on Promises using another worker that does the async operation while the first worker is synchronous. And that's really useful!
There is Atomics.waitAsync, which can be used on the main thread and just returns a promise. It has shipped at least in Chrome.
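For reference, in engines that ship it, it looks roughly like this (a minimal sketch; the worker side is only hinted at in comments):

```
const shared = new Int32Array(new SharedArrayBuffer(4));

// Main thread: never blocks; resolves once some worker notifies index 0
const waited = Atomics.waitAsync(shared, 0, 0);
if (waited.async) {
  waited.value.then((outcome) => console.log(outcome)); // "ok"
} else {
  console.log(waited.value); // "not-equal" or "timed-out", no suspension needed
}

// Meanwhile, in a worker sharing the same buffer:
//   Atomics.store(shared, 0, 1);
//   Atomics.notify(shared, 0);
```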
In the specific case of regular expressions on the server, re2 was written to handle exactly this.
The fact that creating a worker requires you to pass a module (file) name makes them extremely unergonomic to use.
This is clearly visible in the lack of any library ecosystem around them. We don't have any highly used threadpool or executor libraries using workers. Everyone seems to be manually setting up a worker and setting up the job scheduling logic from scratch.
All the boilerplate and restrictions really limit their usage IMO.
I know you can pass a Blob URL, but you can't write libraries this way since you risk running afoul of the downstream consumer's browser CSP.
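For completeness, the Blob URL trick looks something like this (a minimal sketch, subject to the CSP caveat above):

```
const src = `onmessage = (e) => postMessage(e.data * 2);`;
const url = URL.createObjectURL(new Blob([src], { type: 'text/javascript' }));
const worker = new Worker(url);                 // a strict worker-src/script-src CSP can block this
worker.onmessage = (e) => console.log(e.data);  // 42
worker.postMessage(21);
```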
Workers are awesome but you are right, working with them can be painful without the right tooling.
Personally I've written my own libraries for abstracting all this away and I'm having a blast working with workers now, maybe check them out:
- WorkTank [1]: This abstracts away the difference between browser workers and Node worker threads, it makes it easy to make worker pools, and it can transfer simple functions to a worker at runtime too.
- WorkTank loader [2]: This basically abstracts away loading asynchronous functions from a worker: you just add ".worker" to your file name and that file and all its dependencies are transparently moved to a worker (or worker pool); the rest of the app (TS types, for example) won't even notice anything happened, it just works, transparently.
You might want to check out the more popular "comlink" library too, although it didn't work for me for whatever reason when I tried it, and it doesn't support worker pools I believe.
[1]:
https://github.com/fabiospampinato/worktank
[2]:
https://github.com/fabiospampinato/worktank-loader
A big complaint I have is that the built-in scheduling for workers assumes a very specific scenario. You can send messages to a single worker, but the message loop has to be operated by the browser. You also don't know the length of the message queue, even though the browser has it available, so you can't easily send a message to the worker with the least work queued. If a worker starts working on low-priority items, and you want to interject with a high-priority message, you also can't interrupt the worker, nor can the worker loop the messages on its own accord. You also can't re-sort the message queue, it's FIFO.
Basically, any sort of work scheduling that you would _like_ to do to queue high-priority messages, or have workers share a pool of work, is impossible to build with the onmessage-style of WebWorkers. It feels like an API made by someone who had read about threads once, rather than someone who's built an actual many-workers processing system like this. The event loop being unpumpable from user code feels like a giant kick in the face.
My workaround for this was to kill workers working on low-priority items, relaunch them, and re-sort the queue, but that all feels like a mess. Also, at some point a Chrome update made this strategy crash. I ended up just removing the WebWorker code; more trouble than it was worth for marginal improvements.
I know that SharedArrayBuffer (SAB) and atomics add new low-level primitives to support this, but SAB is still poorly supported on widely deployed platforms. And you still need to do the serialization yourself.
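One partial workaround is to keep the queue entirely in your own code and give each worker exactly one job at a time, so priorities are decided on your side rather than by the browser's FIFO queue (a rough sketch; 'job.js' is a placeholder, and this still can't interrupt a job that is already running, which is where the kill-and-relaunch hack above comes in):

```
// Keep the real queue (sortable, priority-aware) on our side and hand each
// worker exactly one job at a time.
const queue = [];                                  // { priority, payload } items
const idle = [new Worker('job.js'), new Worker('job.js')];

function submit(priority, payload) {
  queue.push({ priority, payload });
  queue.sort((a, b) => b.priority - a.priority);   // highest priority first
  pump();
}

function pump() {
  while (idle.length && queue.length) {
    const worker = idle.pop();
    worker.onmessage = () => { idle.push(worker); pump(); }; // collect the result here
    worker.postMessage(queue.shift().payload);
  }
}
```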
> "The fact that creating a worker requires you to pass a module (file) name makes them extremely unergonomic to use."
Can anybody explain how did this ever make it into the official spec and the default implementation?
The browser will make a network call to fetch your Web Worker .js file on every instantiation of your Web Worker. Instantiations of the same Web Worker "module" aren't cached, so in a thread-pool scenario your browser would be fetching the same file over and over again.
What were the people designing this even thinking? This is so wasteful and frustrating. The API simply sucks.
> Can anybody explain how did this ever make it into the official spec and the default implementation?
This and service workers. Both are... weird, to say the least
When I'm seeing half-arsed APIs like this being introduced into so-called "modern" incarnations of the web, I start doubting the technical chops of the people involved and the process as a whole. Is it truly possible that there wasn't anyone with sufficient experience in building concurrency/multi-threading/parallelism APIs involved in building this?
Failing to account for the thread-pool scenario, as an example, is just mind boggling.
"best guess": API decisions are opinionated, reflecting the designers' views of how users should write code.
For example, FileReader API is async in the main thread. There's an equivalent FileReaderSync for sync operations but that is only available in Web Workers. Why isn't FileReaderSync available on the main thread? Because the designers didn't want people to do things that would block the main thread.
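For example, inside a worker (and only there) you can write something like:

```
// Inside a worker script (FileReaderSync is not exposed on the main thread)
onmessage = ({ data: file }) => {
  const reader = new FileReaderSync();
  const text = reader.readAsText(file);  // blocks this worker, not the UI
  postMessage(text.length);
};
```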
The Web Worker argument probably went along the lines of "If we allow users to pass arbitrary function objects, they may try to do something that accesses local variables from the site where the worker is created. That is obviously an error, so we should design the API so that it can't happen. Creating a separate script creates a clear mental separation and avoids that class of error"
My bigger complaint is... how do you cleanly do synchronous-RPC-style calling? Even with clever async/await tricks, the serialization to and from complicated input/output structs seems so expensive.
There are solutions to those problems:
https://github.com/GoogleChromeLabs/comlink
https://github.com/Bnaya/objectbuffer
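Hand-rolled, the promise-over-postMessage pattern that comlink wraps looks roughly like this (a sketch; the file name and the 'api' object are invented, and the arguments still go through structured clone):

```
// main.js: turn each postMessage round trip into a Promise
let nextId = 0;
const pending = new Map();
const worker = new Worker('rpc-worker.js');          // placeholder file name
worker.onmessage = ({ data: { id, result } }) => {
  pending.get(id)(result);
  pending.delete(id);
};
function call(method, ...args) {
  return new Promise((resolve) => {
    const id = nextId++;
    pending.set(id, resolve);
    worker.postMessage({ id, method, args });         // args still pay the structured-clone cost
  });
}
// await call('add', 2, 3) -> 5

// rpc-worker.js
const api = { add: (a, b) => a + b };                 // stand-in for the real methods
onmessage = ({ data: { id, method, args } }) => {
  postMessage({ id, result: api[method](...args) });
};
```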
This past year I've live streamed the development of a small web app that benefits enormously from web workers:
https://github.com/Rezmason/wireworld-player
https://rezmason.github.io/wireworld-player
As a simulation, the main thread asks a web worker to update the world state and then render the new state. By default, this happens once per requestAnimationFrame.
But there's a "Turbo" mode (its UI toggle looks like a radioactive hazard symbol) that, when activated, tells the web worker to update as often as it can per requestAnimationFrame, speeding up the simulation around 72x while keeping the main thread 100% responsive.
The decoupling of the synchronous number crunching work from the main thread has also given me a place to experiment with much more resource intensive algorithms, like
https://jennyhasahat.github.io/hashlife.html
, which fills an enormous cache and can advance the sim by exponential time steps.
Modifying Hashlife to run in the main thread without freezing the app is possible, but it would have made the code much more complicated and slower, and the other cores available to web workers would have gone unused.
If you're interested in leveraging web workers easily for repetitive compute-heavy tasks in a webapp, I've built a little library that takes care of launching and managing worker threads for you:
https://github.com/GitSquared/rinzler
Nice. Do you also support SharedArrayBuffers or does everything need to be serializable that is sent to/from WebWorkers?
By the way, I built something similar (?): A Rust library that mimics the API of the `futures-executor` crate, but each worker thread is a single WebWorker.
https://github.com/wngr/wasm-futures-executor
I have used them for keeping things running in tabs that aren't focused, so they don't lose time sync (not critical though).
Hyperscript supports inline web worker definitions:
https://hyperscript.org/features/worker/
We tried to improve the API to this very cool feature.
I built a little golang playground using web workers and WASM. It's nice to not hang the UI thread.
https://app.qvault.io/playground/go
A little known annoying detail of web workers is that passing large data to them via postMessage is incredibly slow. The browser has to convert your javascript object into some sort of internal binary format and it slows down the thread that's doing the sending.
Depends on the type of large data, but since 2011 there's "Transferable" [1], where objects like ArrayBuffer, MessagePort, and ImageBitmap can be transferred with a low overhead [2]. Now, if you're passing large arbitrary object graphs, you're out of luck.
[1]
https://developers.google.com/web/updates/2011/12/Transferab...
[2]
https://developer.mozilla.org/en-US/docs/Web/API/Worker/post...
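Transferring looks like this: after the call the buffer is detached on the sending side, so it's a move rather than a copy (a minimal sketch; 'process.js' is a placeholder):

```
const worker = new Worker('process.js');            // placeholder file name
const buffer = new Float64Array(1_000_000).buffer;  // ~8 MB of samples
worker.postMessage({ samples: buffer }, [buffer]);  // moved, not copied
console.log(buffer.byteLength);                     // 0: ownership went to the worker
```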
Any libraries that make this nicer/easier to do instead of rewriting the same thing over and over?
https://github.com/Bnaya/objectbuffer
and comlink
Web Workers makes it possible to run a script operation in a background thread separate from the main execution thread of a web application. The advantage of this is that laborious processing can be performed in a separate thread, allowing the main (usually the UI) thread to run without being blocked/slowed down.
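In its most basic form that looks like this (file names are illustrative):

```
// main.js
const worker = new Worker('heavy.js');
worker.onmessage = (e) => console.log('sum:', e.data); // the UI stayed responsive meanwhile
worker.postMessage(1e9);

// heavy.js
onmessage = ({ data: n }) => {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i;  // laborious processing, off the main thread
  postMessage(sum);
};
```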
That can be a great feature in some circumstances, but it also seems like it can undo much of the efficiency improvement browsers have accomplished by throttling and sleeping less used tabs. Do browsers implement any kind of leashing in terms of how much resources any particular site’s workers can use, frequency of running, length of runs, etc?
Any context? Is there a specific reason to post this now? Web workers aren't new to Firefox, are they?
There was a discussion in another thread about Apple stifling PWAs on iOS.
One point was, web workers aren't supported correctly.
IIRC normal workers are supported (dedicated workers) but shared workers are not. Service workers are "supported" but some features that are pretty critical (like push notifications) are not.
I believe this is the other thread:
https://news.ycombinator.com/item?id=29440457
Also I think push notifications are not supported.
Are web workers really good for some costly processes? Would they really be useful for a dashboard with a simple instant data flow?
Depends what's on the dashboard and how the data has to be transformed for display. I've worked on several such dashboards and never had to use Web Workers to solve the performance issues, these were usually caused by UI rendering not optimized for loads of data. Now, the workers can help with rendering too, for example to render a chart off the main thread, but the libraries that support this usually hide the worker magic behind an API.
We used web workers to shift a lot of processing off of the main thread. They helped in keeping the UI feeling responsive and less laggy.
What kind of processing do you mean? Unless you're developing a game or a media player, I can't think of a good example where processing might delay the UI.