On Hubris and Humility: developing an OS for robustness in Rust [video]

Author: dralley

Score: 133

Comments: 63

Date: 2021-12-04 17:48:15

________________________________________________________________________________

brundolf wrote at 2021-12-04 21:54:40:

Unlike a signal-handler, it never alters the control-flow of the code you read on the page...Code you write in a task executes as written, or fails. Nothing in the system will arbitrarily alter your program's control-flow from what appears on the page...Now, when phrased like that, it seems weird to have to say it out loud, like, don't all programmers pretty much assume that the code that they write runs the way they wrote it? And to that I say yes, we absolutely do, and that assumption is often wrong, and leads to common classes of bugs, starting with data races and moving on up.

Wow, this is really insightful

panick21_ wrote at 2021-12-04 22:41:44:

It really puts the task into the hands of the programmer. The way it works (if I understand correctly from the docs), the programmer of a task needs to make sure it checks for notifications when it wants to, and controls that process.

This gives the programmers some rope to hang themselves with, but on the other hand you don't get shot by the OS when you don't want to.
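
A minimal sketch of that explicit model, in plain Rust rather than Hubris's real API: notifications are modeled as bits in an atomic mask, posting only sets bits, and the task sees them only when it chooses to call `take`. The names `Notifications`, `NOTIFY_TIMER`, `NOTIFY_IRQ`, and `task_step` are hypothetical and exist only for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical notification bits; the names are illustrative only.
pub const NOTIFY_TIMER: u32 = 1 << 0;
pub const NOTIFY_IRQ: u32 = 1 << 1;

/// Pending notifications for one task, modeled as a bitmask.
pub struct Notifications {
    pending: AtomicU32,
}

impl Notifications {
    pub const fn new() -> Self {
        Self { pending: AtomicU32::new(0) }
    }

    /// Called by whoever raises a notification. It only sets bits;
    /// it never runs any code in the receiving task.
    pub fn post(&self, bits: u32) {
        self.pending.fetch_or(bits, Ordering::AcqRel);
    }

    /// Called by the task itself, at a point it chooses. Returns and clears
    /// the pending bits selected by `mask`; other bits stay pending.
    pub fn take(&self, mask: u32) -> u32 {
        self.pending.fetch_and(!mask, Ordering::AcqRel) & mask
    }
}

/// One step of a task's main loop: the work runs as written, and only
/// afterwards does the task decide to look at its notifications.
pub fn task_step(n: &Notifications, counter: &mut u32) -> u32 {
    *counter += 1; // stand-in for the task's real work
    n.take(NOTIFY_TIMER | NOTIFY_IRQ)
}
```

The trade-off described above shows up directly: if a task never calls `take`, its notifications simply stay pending.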

bcantrill wrote at 2021-12-04 23:19:09:

If you want a very concrete example to hang your hat on: making the I2C driver interrupt driven was a snap -- and importantly, can easily be done from a library, where the caller merely provides functions to call to enable a specified interrupt and to synchronously wait for interrupts.[0]

[0]

https://github.com/oxidecomputer/hubris/blob/01b5af3d54348ba...
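
To illustrate the shape being described (not the linked driver itself), here is a sketch of a library routine that is interrupt driven only through two functions supplied by the caller: one to enable an interrupt and one to wait for it synchronously. The `Controller` trait, `FakeController`, and `write_bytes` are assumptions invented for this example and do not come from the Hubris source.

```rust
use std::cell::Cell;

/// Hypothetical register-level view of a bus controller.
pub trait Controller {
    /// Start transmitting one byte.
    fn start_write(&mut self, byte: u8);
    /// True once the hardware reports the byte as sent.
    fn is_done(&self) -> bool;
}

/// Library code: knows nothing about how interrupts are enabled or waited
/// on. The caller passes `enable` and `wait`, both keyed by an IRQ number.
pub fn write_bytes<C: Controller>(
    ctrl: &mut C,
    data: &[u8],
    irq: u32,
    mut enable: impl FnMut(u32),
    mut wait: impl FnMut(u32),
) -> usize {
    let mut sent = 0;
    for &byte in data {
        ctrl.start_write(byte);
        while !ctrl.is_done() {
            enable(irq); // re-arm the interrupt
            wait(irq); // block until the caller's wait function returns
        }
        sent += 1;
    }
    sent
}

/// Fake controller for demonstration: each byte reports "busy" on the
/// first two polls and "done" on the third.
pub struct FakeController {
    busy_polls: Cell<u32>,
    pub log: Vec<u8>,
}

impl FakeController {
    pub fn new() -> Self {
        Self { busy_polls: Cell::new(0), log: Vec::new() }
    }
}

impl Controller for FakeController {
    fn start_write(&mut self, byte: u8) {
        self.log.push(byte);
        self.busy_polls.set(2);
    }

    fn is_done(&self) -> bool {
        let left = self.busy_polls.get();
        if left == 0 {
            true
        } else {
            self.busy_polls.set(left - 1);
            false
        }
    }
}
```

Because the waiting is injected, the same `write_bytes` could be driven by a real blocking receive in a task or by no-op closures in a test.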

elcritch wrote at 2021-12-05 03:07:24:

Looks interesting! Though I noticed the bit about the Konami code. The terminology is great, but alas it feels like I've run into as many i2c devices that break the i2c protocol as follow it. Ahem, I'd recommend avoiding Infineon I2C devices to keep your sanity. ;)

Reading the code, is it correct to say the ISR will timeout if the device doesn't respond? It's nice the driver returns the state of the bus locking in the error `Err(drv_i2c_api::ResponseCode::BusLocked);`. Makes me curious how you're doing that. I need to find time to watch the video.

bcantrill wrote at 2021-12-05 04:53:29:

Noted on the Infineon devices! ;)

Yes, it will timeout if the device doesn't respond, thanks to the timeout logic in the STM32H7's I2C block.[0] If the block didn't have that logic, Hubris would still make it relatively easy to do that, though, as one would just perform the closed receive for the interrupt or a timer notification. (An example of a closed receive on a timer can be found at [1].)

[0]

https://github.com/oxidecomputer/hubris/blob/01b5af3d54348ba...

[1]

https://github.com/oxidecomputer/hubris/blob/01b5af3d54348ba...
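
A rough sketch of the software-timeout pattern mentioned above, under the assumption that a "closed receive" here means a blocking receive that accepts only a chosen set of notification bits. The `recv` closure stands in for that receive; `IRQ_BIT`, `TIMER_BIT`, `WaitError`, and `wait_irq_or_timeout` are hypothetical names, not Hubris APIs.

```rust
pub const IRQ_BIT: u32 = 1 << 0;
pub const TIMER_BIT: u32 = 1 << 1;

#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    Timeout,
}

/// Waits for either the interrupt or the timer, whichever is reported first.
/// `recv(mask)` is assumed to block and return the subset of `mask` that fired.
pub fn wait_irq_or_timeout(mut recv: impl FnMut(u32) -> u32) -> Result<(), WaitError> {
    loop {
        let bits = recv(IRQ_BIT | TIMER_BIT);
        if bits & IRQ_BIT != 0 {
            // The device responded; prefer this even if the timer also fired.
            return Ok(());
        }
        if bits & TIMER_BIT != 0 {
            return Err(WaitError::Timeout);
        }
        // Neither bit was set: treat it as a spurious wakeup and wait again.
    }
}
```

The caller would arm a timer before the call; that step is omitted here because it depends on the platform.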

brundolf wrote at 2021-12-04 22:51:18:

Yeah. Though importantly, all of these tasks are (as I understand it) being designed and developed cohesively, by a small team under a single roof. We're not talking about random userspace applications in a general-purpose OS; new tasks can't even be started and stopped at runtime, they're laid out at build time. I think that's why this explicit and cooperative model can work.

panick21_ wrote at 2021-12-04 22:59:50:

I would assume as the ecosystem grows 'standard' tasks will develop that will see a lot of reuse between different people using this OS.

But I agree that, critically, you build and deploy this as a complete system even if many of the tasks are not in-house. You still test the complete system.

tgbugs wrote at 2021-12-04 21:10:24:

For those looking for more on "parse don't validate" mentioned in the talk here are links to the original, and two previous discussions.

https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

https://news.ycombinator.com/item?id=21476261

(original 2019)

https://news.ycombinator.com/item?id=27639890

(revisited 2021)
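
For readers who have not met the phrase, a small Rust sketch of the idea: instead of checking that a list is non-empty and then passing the plain list along, parse it once into a type that cannot be empty, so later code needs no re-check. `NonEmpty` is a type written for this example, not a standard library type.

```rust
/// A list known to be non-empty. The fields are private, so the only way
/// to obtain one is through `parse`.
#[derive(Debug, PartialEq, Eq)]
pub struct NonEmpty<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmpty<T> {
    /// Parsing step: rejects empty input once, at the boundary.
    pub fn parse(mut items: Vec<T>) -> Option<Self> {
        if items.is_empty() {
            return None;
        }
        let head = items.remove(0);
        Some(Self { head, tail: items })
    }

    /// Infallible: the type already proves there is a first element.
    pub fn first(&self) -> &T {
        &self.head
    }

    pub fn len(&self) -> usize {
        1 + self.tail.len()
    }
}
```

A validating version would return `bool` and leave every later caller to handle the empty case again; here `first` has no error path at all.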

bch wrote at 2021-12-05 02:25:37:

“Parse don’t validate” reminds me of John Ousterhout’s “Define errors out of existence.”[0]

[0]

https://youtu.be/bmSAYlu0NcY?t=1315

brundolf wrote at 2021-12-04 22:28:12:

+1, it's become a mantra for me in any code I write that deals with foreign data (which is most of them). Really reshapes your thinking.

cle wrote at 2021-12-05 01:06:58:

No rule of thumb should be a mantra. Parsing when you don’t need to induces unnecessary coupling, which can be quite undesirable in many scenarios. Like any technique, there are times when it’s worth it and times when it’s not.

clmay wrote at 2021-12-05 03:28:51:

Is "No rule of thumb should be a mantra" its own counterexample?

becuz99h wrote at 2021-12-05 03:57:13:

Only the Sith deal in absolutes

bcantrill wrote at 2021-12-04 20:14:29:

I am extraordinarily and unspeakably biased here, but I also say this as someone who didn't see the talk until I myself was an attendee at OSFC: Cliff's talk is as good as any I have ever seen in my career. It is dense with technical detail that itself reflects years of thinking and wisdom, but it remains well-paced with very helpful visuals; highly recommended viewing!

yjftsjthsd-h wrote at 2021-12-05 03:04:41:

Slightly tangential, but you seem like a good person to ask and this seems like as good a place as any: Is the way Hubris handles failed tasks intentionally reminiscent of the way illumos Fault Management Architecture works, or is that a coincidence born of them solving the same task in a relatively obvious way?

bcantrill wrote at 2021-12-05 04:57:43:

That actually hadn't occurred to me, in part because the Hubris model _mainly_ deals with software failure, whereas FMA is _mainly_ dealing with hardware failures. But certainly they both have a shared zeitgeist of robustness in light of errors and faults! So I would say that the commonalities aren't directly correlated, but also not totally unrelated. ;)

lostdog wrote at 2021-12-05 02:59:03:

Very interesting. This reminds me a ton of L4, with its message passing style. I'm curious if it was inspired by L4.

It's a great talk, and my only disappointment was that I was hoping to hear about what comes after. After watching the OS talk that Cantrill references [0], I really want to know the answer to "what does an OS that runs across a bunch of heterogeneous chips look like?" Having Hubris be just a single-node OS was a bit of a let down, so I will be staying tuned for the sequels.

[0]

https://www.youtube.com/watch?v=36myc8wQhLo

EDIT: Went to go read the docs and: "The operating system that originally made this point was L4, and Hubris’s IPC mechanism is directly inspired by L4’s"

bcantrill wrote at 2021-12-05 05:00:59:

Yes, definitely L3/L4 (and QNX) inspired! In terms of the Roscoe talk: I think it's a great talk because he talks about the problems, but I don't particularly agree with the proposed solution. ;) In particular, I don't think it makes sense to have single operating systems spanning these wildly disparate, heterogeneous cores; to me, it makes much more sense for each to have its own (tiny) operating system -- e.g., Hubris.

lostdog wrote at 2021-12-05 19:03:48:

Thanks! I guess I'll check back in 5 years and see who was right.

ysleepy wrote at 2021-12-04 22:51:04:

Love the technical insights, but I comment here to praise the presentation style.

My first judgement was that it seemed a bit rehearsed, but I ended up really enjoying the style. It felt very respectful of my time as a viewer, very concise and well prepared. I'm sure a lot of time went into that and I appreciate it.

So thanks, for both an awesome application of rust and an amazing presentation.

steveklabnik wrote at 2021-12-04 19:21:57:

This is the video referred to in the post of this thread from earlier:

https://news.ycombinator.com/item?id=29390751

ncmncm wrote at 2021-12-05 07:07:00:

This is almost all astonishingly sensible. If static provisioning works at all for the application, it is obviously better.

If you did have a need to have dynamic tasks -- e.g., drivers for an unbounded set of USB peripherals -- one of the static tasks could be the executive for those. They would still be unable to interfere with the other, static tasks.

I have to agree with the rabbi that the Rust evangelism detracts from the talk. E.g., the runtime borrow checking example, which does not rely on any Rust mechanism, would work as well with any language.

sujayakar wrote at 2021-12-04 23:58:56:

I really enjoyed this talk.

How does a system with completely static resource allocation accommodate cases where the underlying hardware is actually dynamic? For example, consider hot-plugging USB devices or storage media.

Would there be a fixed "maximum number of USB devices," with resources reserved at compile time for the maximum? If so, does this preclude high resource utilization in the common case, where the user isn't hitting these limits?
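
One way to picture the "fixed maximum" this question describes, as a sketch only and not a description of how Hubris handles it: a table whose capacity is a compile-time constant, so attaching a device never allocates and can only fail in one well-defined way. `MAX_DEVICES`, `Device`, and `DeviceTable` are hypothetical names.

```rust
/// Capacity chosen at build time; memory for all slots is always reserved.
pub const MAX_DEVICES: usize = 4;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Device {
    pub id: u16,
}

pub struct DeviceTable {
    slots: [Option<Device>; MAX_DEVICES],
}

impl DeviceTable {
    pub const fn new() -> Self {
        Self { slots: [None; MAX_DEVICES] }
    }

    /// Puts the device in the first free slot and returns its index.
    /// If every slot is taken, the device is handed back as the error.
    pub fn attach(&mut self, dev: Device) -> Result<usize, Device> {
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if slot.is_none() {
                *slot = Some(dev);
                return Ok(i);
            }
        }
        Err(dev)
    }

    /// Frees a slot; returns the device that was there, if any.
    pub fn detach(&mut self, index: usize) -> Option<Device> {
        self.slots.get_mut(index)?.take()
    }

    pub fn in_use(&self) -> usize {
        self.slots.iter().filter(|s| s.is_some()).count()
    }
}
```

The cost the question points at is visible here: the unused slots stay reserved even when only one device is attached.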

mkeeter wrote at 2021-12-05 02:07:27:

In the past, I've worked on real-time systems which deliberately run in "worst-case" load all the time, basically to avoid your second concern: if high resource utilization is possible, then you have to design for it, and this is an easy way to make sure the system doesn't fall over.

For example, if you're building a system that controls multiple motors, you'd naively do motion-planning calculations for only the motors that are actively running. If instead you do these calculations for _every motor all the time_, even the ones that aren't moving, you'll have a worst-case performance profile; if it still works, then you can be more confident about the system.

(I work at Oxide, but only for about a month, so this is more past experience than Hubris-specific)
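
A toy sketch of the "always run at worst-case load" idea from the motor example: the planning calculation runs for every motor on every tick, and only the application of the result depends on whether the motor is active. The `Motor` struct, the proportional `plan_step`, and `NUM_MOTORS` are placeholders invented for illustration.

```rust
pub const NUM_MOTORS: usize = 3;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Motor {
    pub active: bool,
    pub position: f32,
    pub target: f32,
}

/// Placeholder for a real motion-planning calculation: move halfway
/// toward the target.
fn plan_step(m: &Motor) -> f32 {
    (m.target - m.position) * 0.5
}

/// One control tick. Returns how many planning calculations were performed,
/// which is the same on every tick regardless of how many motors are active.
pub fn tick(motors: &mut [Motor; NUM_MOTORS]) -> usize {
    let mut planned = 0;
    for m in motors.iter_mut() {
        let step = plan_step(m); // computed even for idle motors
        planned += 1;
        if m.active {
            m.position += step; // only active motors actually move
        }
    }
    planned
}
```

Since the per-tick work is constant, a timing measurement taken with all motors idle already reflects the busiest case.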

alblue wrote at 2021-12-04 23:23:46:

The talk itself is great, but the background to the slides and speakers is constantly moving from side to side and is pretty distracting. I would have preferred to see the video of the slide deck itself without any extraneous background or the speaker’s head.

axegon_ wrote at 2021-12-05 00:13:05:

Feels weird to say it but... Time to give hubris a go.

wwarner wrote at 2021-12-04 22:21:06:

Love the simplicity. I wonder what Hubris would look like on a multicore CPU.

xondono wrote at 2021-12-04 18:44:32:

Great, I was waiting for a recording to be posted on youtube or something.

shmerl wrote at 2021-12-05 05:46:31:

The talk mentions Oxide Computers. Just saw their site, I like the design there:

https://oxide.computer

CyberRabbi wrote at 2021-12-04 21:43:44:

This talk would be more interesting if it weren’t thinly veiled Rust evangelism. I lived through the time when XML and Java evangelism was considered technical content. The older I get, the less substance I see in evangelism of any sort.

Centering language choice as the causal factor for any engineering project signals poor discipline in engineering.

zbentley wrote at 2021-12-05 02:56:44:

Ordinarily I'd agree with you. While I enjoy Rust, rather a lot of projects using it I've seen place the language chosen farther forward in the value proposition than the capabilities or requirements of the project.

But this seems different. The Oxide folks need a tool that does specifically what Rust does in this space, and (at least going from the talk and related announcement blog post) did a substantial amount of research to try to avoid re-inventing the wheel before choosing Rust. Compared to other rewrite-it-in-Rust exercises, this one seems to evince mechanical sympathy with several of Rust's strengths, to the point that I doubt another language/platform would be even in the ballpark of a good fit for Hubris.

And that's true of your other examples as well: sure, a lot of Java evangelism is hot air driven by process-obsessed leadership sold on false promises of model-your-org-chart-in-code interoperability, yielding self-satirizing OO soup. But some projects have the right combination of (for example) non-synchronously-communicating teams, boxes-and-lines design processes, unskilled programmers, and safety requirements. For those projects, Java is the right choice; choosing a tool that doesn't "resonate" with those requirements as well as possible would be unwise.

Similarly, if I needed to work on a data-modeling-intensive project whose requirements were driven by lots of mechanically specified contracts (schemas) and was worked on by non-programmers who were comfortable with data description but not imperative logic, XML might be the right choice. I personally hope that's rarely the case, but the point stands.

staticassertion wrote at 2021-12-04 22:38:33:

They explicitly justify the lack of solutions for memory safety in their space - both in terms of hardware and software - and why they are building their product using specific tools. They even note that this may seem like a strange choice (as opposed to using something off the shelf) but that they were willing and able to invest in these tools, specifically that they were going to build pretty much everything from scratch.

They even call the project 'Hubris' as a joke about the ambition.

Further, they discuss how borrow checking as a model lends itself to the task architecture. It's obviously very relevant.

It seems silly to call this evangelism as opposed to a very self-aware deep dive into their choices.

CyberRabbi wrote at 2021-12-04 23:33:20:

The abstract software techniques they used to achieve certain properties is more substantial and generally applicable than the specific language they used to instantiate those properties. Citing the language as the causal factor in choosing those techniques and not their requirements is unnecessary evangelism.

staticassertion wrote at 2021-12-04 23:44:27:

I don't get your point. They had goals and chose technologies and approaches to achieve those goals. They cite their task model - would you call that some sort of 'task evangelism'? They cite that system calls in their OS are synchronous, and how that enables some optimizations that work well with Rust's borrow checker.

All of this works together and feels relevant.

CyberRabbi wrote at 2021-12-05 01:43:25:

It’s a general talk on OS design centering Rust as a general solution in that space when Rust is just a language not a specific OS design concept or abstraction. The concepts attributed to Rust in this talk can indeed be leveraged in any language (with varying difficulty).

The title of the talk says it all:

        On Hubris and Humility: developing an OS for robustness in Rust

Which seems silly to me when this title works just as well:

        On Hubris and Humility: developing an OS for robustness

Now imagine if the title were:

        On Hubris and Humility: developing an OS for robustness using XML

Now I’m sure it’s possible to use XML to develop a robust OS but the usage of XML specifically is less relevant than the techniques employed using XML. In that case the reference to XML seems like evangelism. I’m sure XML evangelism has an audience but it comes across to me as less substantive (and less interesting) than a talk centered around general principles that were successfully leveraged in OS design. It also makes it hard to tell the extent to which the usage of XML specifically to achieve the desired requirements was necessary.

A reasonable reader would understand that using the title to make my point is only an example of the content that runs throughout the talk which is similarly oriented around Rust.

adgjlsfhk1 wrote at 2021-12-05 05:12:24:

I think the talk was titled as it was to emphasize that Rust here wasn't a tool that they chose to use to develop a robust OS, but a tool without which developing a robust OS is impossible. In a language that doesn't enforce security (like C), it is demonstrably impossible to write a robust OS.

ncmncm wrote at 2021-12-05 06:51:10:

L4 is coded in C. L4 is proven robust.

QED: false.

mkj wrote at 2021-12-05 07:32:57:

The proven robustness isn't C, it's

Isabelle 92.9%

Standard ML 3.0%

Haskell 1.5%

C 0.8%

TeX 0.7%

Python 0.5%

Other 0.6%

https://github.com/seL4/l4v

ncmncm wrote at 2021-12-05 07:47:57:

You make no sense. The proof is obviously not coded in C because C is not a language you can write proofs in. But all of the instructions executed when running L4 were emitted by a C compiler.

(This is not to suggest that I would ever advise coding anything whatsoever in C.)

Unless... maybe you are saying all the C code in L4 was not actually coded by anybody, but was rather emitted by programs written in these other languages, and L4 is properly a program coded in those languages, with just a transitory C representation on the way to machine code?

kobebrookskC3 wrote at 2021-12-05 09:36:49:

i wouldn't mind C if every program written in it was written to the standard of seL4, but alas, that isn't the case, and usually it's not even close. i'm also quite sure that getting even close to it would make you want to use another language instead.

staticassertion wrote at 2021-12-05 06:01:57:

But aspects of their OS are clearly intrinsically related to Rust. Like system calls that you can borrow across.
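
A single-process analogy for "borrowing across" a synchronous call, offered as a sketch and not as Hubris's mechanism (where client and server are separate tasks and the kernel mediates access): because the client blocks until the reply, lending a buffer for the call looks like an ordinary `&mut` borrow that ends on return. `Server`, `handle_read`, and `read_into` are hypothetical names.

```rust
/// Hypothetical server that fills a caller's buffer with a running counter.
pub struct Server {
    next: u8,
}

impl Server {
    pub fn new() -> Self {
        Self { next: 0 }
    }

    /// Handles one request synchronously. The server can write through
    /// `lease` only until this returns; `Server` has no lifetime parameter,
    /// so it cannot store the reference for later use.
    pub fn handle_read(&mut self, lease: &mut [u8]) -> usize {
        for b in lease.iter_mut() {
            *b = self.next;
            self.next = self.next.wrapping_add(1);
        }
        lease.len()
    }
}

/// Client side: a plain call stands in for "send and block until reply".
/// When it returns, `buf` is exclusively the caller's again.
pub fn read_into(server: &mut Server, buf: &mut [u8]) -> usize {
    server.handle_read(buf)
}
```

If the call were asynchronous, the caller could touch `buf` while the server still held it, which is exactly what the borrow rules forbid.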

CyberRabbi wrote at 2021-12-05 06:17:12:

They’re intrinsically related to having the concept of “borrowing” in some form. Rust is an implementation detail / preference.

mlindner wrote at 2021-12-05 07:28:39:

Granted, but do you have another example of a language that does that? I'm not aware of any. If there were then I would think your argument valid.

CyberRabbi wrote at 2021-12-05 08:25:05:

You can model borrow semantics relatively easily in both LISP and C++ (and likely countless other contemporary languages) with varying static/dynamic components, it just takes a bit of infrastructure and perhaps some static or dynamic indirection. It’s not “built in” like it is in Rust. The Chromium team did this in C++ as a proof of concept (with notable caveats) somewhat recently:

https://docs.google.com/document/d/e/2PACX-1vSt2VB1zQAJ6JDMa...

Additionally even though this system uses Rust, there is quite a bit of custom support code to coax the system into building for a particular memory-mapping. While you are doing that I don’t see why it wouldn’t be roughly the same amount of infrastructure work in another language. Again, may just require the user to interact with the protected resources through an indirect API instead of language-native constructs but that is more of an ergonomics issue than a feasibility one.
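
To make the "dynamic" end of that spectrum concrete, here is a sketch of borrow rules enforced purely at run time with counters, the kind of bookkeeping that could be written in most languages. It is not the Chromium design and not Hubris's lease mechanism; `BorrowState` and `BorrowError` are names made up for this example.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum BorrowError {
    AlreadyExclusive,
    AlreadyShared,
}

/// Run-time record of who currently holds a resource:
/// any number of readers, or exactly one writer, never both.
#[derive(Default)]
pub struct BorrowState {
    readers: u32,
    writer: bool,
}

impl BorrowState {
    pub fn acquire_shared(&mut self) -> Result<(), BorrowError> {
        if self.writer {
            return Err(BorrowError::AlreadyExclusive);
        }
        self.readers += 1;
        Ok(())
    }

    pub fn acquire_exclusive(&mut self) -> Result<(), BorrowError> {
        if self.writer {
            return Err(BorrowError::AlreadyExclusive);
        }
        if self.readers > 0 {
            return Err(BorrowError::AlreadyShared);
        }
        self.writer = true;
        Ok(())
    }

    pub fn release_shared(&mut self) {
        self.readers = self.readers.saturating_sub(1);
    }

    pub fn release_exclusive(&mut self) {
        self.writer = false;
    }
}
```

The difference from compiler-checked borrowing is when the mistake surfaces: here a conflicting access is an `Err` at run time, and nothing stops a caller from forgetting to release.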

staticassertion wrote at 2021-12-05 18:52:54:

But... why? Why would they do that? Like why would they go so out of their way not to mention Rust, to the extent that they would discuss borrowing in LISP or C++?

That's insanity.

CyberRabbi wrote at 2021-12-05 21:01:20:

The comment above mine asked for an example. There is a reasonable interpretation of my comment that doesn’t seem like insanity. We can meet there.

bcantrill wrote at 2021-12-05 18:57:10:

At this point, I have lost track of your argument. Do you object to Hubris being written in Rust or do you object to Cliff explaining why Hubris is written in Rust?

CyberRabbi wrote at 2021-12-05 20:58:38:

I don’t object to Hubris being written in Rust, that would be silly. My argument is really simply my opinion that the talk would have been more interesting if it were less focused on how Rust was used to make Hubris and more focused on how specific high level software techniques and concepts were used to make Hubris.

I don’t really need a long winded justification for why this is written in Rust. Everyone knows that ultimately comes down to a mix of personal preference and tool availability. I can fill in the details myself how the high level software techniques presented would map to Rust constructs.

bcantrill wrote at 2021-12-05 21:59:28:

I'm not sure what you can really accuse of being long-winded here, but it's clear that the presentation hits a nerve for you, so probably best left at that.

mattgreenrocks wrote at 2021-12-05 01:26:21:

If you’re looking for technical content then ignore the “Rust evangelism” and talk about the benefits and trade offs of Hubris’ various design choices.

There’s plenty of them to go around. In other words, start the discussion you want to have rather than complaining it wasn’t presented exactly as you’d prefer.

atoav wrote at 2021-12-04 23:09:27:

If civil engineers used a system to calculate bridges that is known to _frequently_ collapse bridges it is _good engineering_ to change to a system that prevents this from happening.

There was a time when civil engineers thought they were geniuses and their own genius would be enough to prevent such mistakes. Guess what: then bridges collapse.

In programming we still have people who think they can prevent stupid mistakes by sheer genius and/or willpower, despite clear evidence to the opposite. Get over yourself and do the right thing.

CyberRabbi wrote at 2021-12-05 03:34:08:

This comment is refuting a point I did not make. By all means, please continue to use Rust in the domains in which you think it excels but no matter how much we agree that there exists domains in which Rust excels it remains true that language evangelism is not very interesting relative to higher level and more general purpose engineering concepts and principles. Well, that is unless you actually prefer evangelism to more substantive technical content.

atoav wrote at 2021-12-05 10:10:37:

> Centering language choice as the causal factor for any engineering project signals poor discipline in engineering.

This is the point I argued against. If the way you do mathematic calculations in your bridge design repeatedly produces fatal flaws, it is indeed poor engineering if you keep that system.

Using tooling that allows you to e.g. produce buffer overflows and blaming a "poor discipline in engineering" is the kind of quote you can read from the engineers that built the biggest collapsing civil structures in history.

The best design of a system is the simplest design that gets you there, while preventing all errors that you can make on the way. And a programming language is always part of that system. This is why my comment is also not about Rust; it is more about recognizing languages as part of the engineering you are doing.

CyberRabbi wrote at 2021-12-05 10:56:21:

Language choice certainly is correlated with certain aspects of the outcome of a project but I continue to hold that it is not a causal factor. Causal factors would be the quality of the team involved, the resources they were allotted, the requirements that were imposed on them, in that order.

E.g. you cannot expect an incompetent and resource-starved team with a poorly specified project to produce a good outcome simply because they were forced to work in Rust (or any language).

Language choice is incidental, not causal. In my experience, anyone who has operated otherwise has failed every single time.

adgjlsfhk1 wrote at 2021-12-05 13:35:45:

languages can't be a causal factor of success, but can be a causal factor of failure. if you write an OS in JavaScript, that will cause it to be slow. if you write an OS in C, that will cause it to have buffer overflows and use after free bugs.

CyberRabbi wrote at 2021-12-05 21:09:26:

> if you write an OS in JavaScript, that will cause it to be slow. if you write an OS in C, that will cause it to have buffer overflows and use after free bugs

A good team will use these languages in that domain in ways that these effects are limited. There is an appropriate place for JavaScript and C.

adgjlsfhk1 wrote at 2021-12-06 00:57:49:

And yet

https://www.cvedetails.com/vendor/33/Linux.html

CyberRabbi wrote at 2021-12-06 01:15:07:

And yet what? What point are you refuting?

avgcorrection wrote at 2021-12-05 10:50:10:

I don’t find comments that dismiss one type of technology over another without any reasons to be very interesting. Apparently there are two kinds of technology that are discussed here: language technology and OS technology. Or at least that’s the two that they want to focus on. Yet you dismiss one of them as pretty much irrelevant.[1] Yet you can do the same for the OS technology, if this was about something that was implemented on OS Y: “It’s not interesting to me that this was developed on OS Y. The abstract techniques that they used are more interesting to me than the specific OS they used.” This would effectively communicate that either (1) the OS is irrelevant, or (2) the OS is uninteresting to you. (1) is doubtful (then threads like this would be irrelevant) and (2) is just an expression of your own proclivities and interests.

It would seem obvious that developing certain things on certain OSes has tradeoffs. And likewise for programming languages. Unless you want to be consistent and subscribe to the ridiculously relativist idea that everything is the same; it’s just a matter of what you do with them.

Complaining about evangelism without offering any kind of criticism of what is supposedly being evangelized betrays just as much bias as the supposed missionaries.

[1] “The abstract software techniques they used to achieve certain properties is more substantial and generally applicable than the specific language they used to instantiate those properties”

CyberRabbi wrote at 2021-12-05 21:04:34:

> Yet you can do the same for the OS technology, if this was about something that was implemented on OS Y: “It’s not interesting to me that this was developed on OS Y. The abstract techniques that they used are more interesting to me than the specific OS they used.”

I would absolutely say that. OS evangelism is as uninteresting as language evangelism, at least in comparison to a discussion about abstract software techniques / concepts and their implications.

> Complaining about evangelism without offering any kind of criticism of what is supposedly being evangelized betrays just as much bias as the supposed missionaries.

I did offer clear and actionable criticism. Evangelism is less interesting than a talk based in general first principles. That’s my opinion and I think it’s a widely held one. I never accused the presenters of bias and if they are biased, I have absolutely no problem with that because all humans have biases. For the record, I am not biased against Rust but perhaps I am not sufficiently biased in favor of Rust to the extent that Rust evangelism would be interesting to me.

adgjlsfhk1 wrote at 2021-12-04 22:29:07:

can you name a single widely used C library that hasn't had a buffer overflow vulnerability?

FuckButtons wrote at 2021-12-04 21:53:13:

Given that the raison d'être of oxide is to use rust to build their product I don’t know that this is fair criticism, unless you want to write their whole enterprise off for the same reason.

panick21_ wrote at 2021-12-04 22:43:56:

I don't think that's their 'raison d'être', their 'raison d'être' is making a better computer.

CyberRabbi wrote at 2021-12-04 23:28:29:

> Given that the raison d'être of oxide is to use rust to build their product

Thanks for proving my point!