A title that actually describes the post, mostly paraphrasing the first paragraph:
_Reasons why this buffer overflow wasn't caught earlier despite doing all the right things_
And then to give those reasons:
- "each component is fuzzed independently" ... "This fuzzer might have produced a SECKEYPublicKey that could have reached the vulnerable code, but as the result was never used to verify a signature, the bug could never be discovered."
- "There is an arbitrary limit of 10000 bytes placed on fuzzed input. There is no such limit within NSS; many structures can exceed this size. This vulnerability demonstrates that errors happen at extremes"
- "combined [fuzzer] coverage metrics [...]. This data proved misleading, as the vulnerable code is fuzzed extensively but by fuzzers that could not possibly generate a relevant input."
The conclusion is, of course, to fix those problems if your code base also has them, but also "even extremely well-maintained C/C++ can have fatal, trivial mistakes".
The whole post is a giant blinking red sign that says (or should say) "Fuzzing is a horribly ineffective workaround for a treacherous language."
No offense to the many bright and capable people who have worked hard on the C/C++ language, tools, compilers, libraries, kernels, etc over the years, but we will someday look back on it as asbestos and wonder why we kept at it for _so damn long_.
No issue with the first sentence of your message at all, but...
> No offense to the many bright and capable people who have worked hard on the C/C++ language, tools, compilers, libraries, kernels, etc over the years, but we will someday look back on it as asbestos and wonder why we kept at it for so damn long.
We won't wonder at all. We will understand that those people are the ONLY ones that stepped up to the task over 50 years to write this kind of software, organize standards bodies for their languages and platforms, get their software packaged as part of mainstream operating systems and out into the world, deal with patches from users, and help millions of other people make a living, enable the internet to happen, etc.
We will wonder why, with all the millions of lines of C/C++ reference code available to be perused and then rewritten in Rust, Pascal, C#, Zig or Nim, and the vociferousness of their advocates, that didn't happen in a reasonable timeframe.
We will wonder why all the Lisp and Haskell programmers who sneer down their noses at working C/C++ programmers in forums like this on a daily basis didn't get off their asses with their One True Language (TM) and come to the world's rescue.
The answer will be: these people aren't doers - they are talkers. It's one thing to get a toy proof-of-concept OS that supports like 5 drivers working in your language of choice. It's another thing to contribute to and build an ecosystem depended on by millions of developers daily. C/C++ people may not always be the sharpest tools in the shed, and they may be a dime a dozen. But they know how to organize themselves in loose groups of more than just a few developers, and work with other people.
We may have issues with the quality of what they ship on a frequent basis, but at least they ship instead of posting endless comments on internet forums about how it's all other people's fault.
> We will wonder why, with all the millions of lines of C/C++ reference code available to be perused and then rewritten in Rust, Pascal, C#, Zig or Nim, and the vociferousness of their advocates, that didn't happen in a reasonable timeframe.
Easy: every single time the .NET team makes advances in that direction, it gets sabotaged by WinDev and their C++ love.
XNA vs DirectXTK, .NET vs COM/WinRT,...
Windows could have turned into something like Android, with managed userspace and a very constrained native layer for restricted use cases, naturally WinDev cannot let that ever happen.
That was simply because the alternatives to Win32 had various serious regressions, be it usability, features, bloat, or other things that are important in one way or other to the people who stick with Win32.
More secure languages and APIs can only win if they are both easier to use and offer the same features.
As proven by mobile OSes, it is not technical features that win the game; rather, it is railroading developers into the future, regardless of their opinions.
The problem at Microsoft is that what Office and WinDev want drives the whole business, no matter what. Those "serious regressions, be it usability, features, bloat, or other things" get fixed if one cares enough to make it happen.
Speaking of which, Office is now a heavy user of JavaScript, which is less performant than .NET, because it needs to be in the cloud.
Whereas Windows, while king of the desktop, has undeniably lost the backend to POSIX clones, even if we consider MS shops with IIS, where languages like Java and Go dominate.
Also, apparently PWAs are cool now as a way to fight ChromeOS, even though from a performance point of view they are much worse than those former Win32 alternatives.
Before I ditched Windows a couple of years ago, I was able to experience first hand how bloated and slow the software that Microsoft rewrote in C# was, so I kind of understand why such rewrites were being sabotaged.
If by some minor miracle a C# or Java GUI app is not slow, then it will use a ton of memory. A whole OS of such apps would be a nightmare.
The new modern/UWP apps in Windows 10, like the calculator and start menu, are written in C++, aren't they? They manage to be horrendously slow and bloated without C#, so maybe C# isn't the problem.
They are. UWP is basically another take on what was being discussed before .NET as the COM evolution path, and Longhorn's failure gave them the wind to pursue it as the Windows foundation.
So since Vista all major Windows APIs are COM based, not always surfaced to .NET, and we are expected to just go through books like ".NET and COM: The Complete Interoperability Guide" and do the needful ourselves.
WinRT as introduced in Windows 8 was then the full reboot, with .NET metadata taking over from TLB files (COM type libraries) and introducing a new base interface, IInspectable.
So the long-term vision pursued by Sinofsky was that .NET Native and C++/CX would take over, with COM fully replacing .NET.
Naturally it all fell down during the borked execution, and now you still have devs pushing for C++/WinRT, the C++/CX replacement with ATL-like tooling, as "modern". Maybe it is modern given the tools they are used to at WinDev, I guess.
Ars even has a nice article on how this reboot took place:
https://arstechnica.com/features/2012/10/windows-8-and-winrt...
It's quite simple really. Bad programmers (or good programmers in bad environments) are able to write slow code in any language, and the current Windows Desktop team seems to be an example of that.
When people write C++ code that performs worse than naive C#, it is really bad.
You literally mention Java GUI apps in response to a post that calls out Android, a mobile OS and application ecosystem implemented in _Java_. The languages are not the issue.
It took Google more than a decade of trying and a VM rewrite to get _close_ to the perceived performance of iOS.
And anything performance intensive was done in the NDK anyway.
All of this because of Java.
Indeed, the big difference is that Google was willing to put money on the table to make it work.
Those improvements are exactly what WinDev sabotaged in regards to Windows.
Also, in case you have forgotten, Objective-C and Swift are also managed languages; check chapter 5 of the Garbage Collection Handbook, or any other CS reference on automatic memory management algorithms.
Android is not Java.
As proven by Midori (used to power Bing in Asia while in development), Android, ChromeOS, iOS/iPadOS, among others, it is possible when everyone works together for a common goal instead of sabotaging others' work.
I don't think this paints an accurate picture. This sort of presupposes that C/C++ were a first iteration, but that's not true.
People chose C and C++ for bad reasons, even with historic context. Languages used to be more correct: Algol used to validate array subscripting and not have null, etc. It was C programmers who pushed languages to be worse because it made it easier to write code.
They very much created this problem and we regressed because of their choices. Things could have been a lot better.
I think it probably makes sense that the languages it's easiest to write code in would dominate in an ecosystem where incentives are encouraging software to eat the world.
The asbestos comparison is pretty apt. Possibly also apt would be steel plants. Yes, steel manufacture is extremely hard on the local environment. Doesn't matter. The world's in the middle of an industrial revolution, and we need steel now. Now now now. We don't have time to wait for the technology to catch up with environmentally-minimized-impact manufacturing. Just condemn a couple cities to twilight noon-times and get on with it.
I don't know if things could have been a lot better _and_ we could be having this conversation on a decentralized network of anonymous machines like we are now. We had to conscript a _lot_ of developers to get here, and make very complex software run on a _lot_ of low-power devices. Erlang, as an example of an alternative, wasn't even open-source until 1998. LISP machines paused for sixty seconds to do a GC cycle in an era contemporary with machines running executables written in C just... not requiring that.
Given the factors involved (private ownership of language resources, complex runtime putting minimum requirements on hardware, etc.), C and C++ might be the only two paths of least resistance for a critical era of the software industry.
> Given the factors involved (private ownership of language resources, complex runtime putting minimum requirements on hardware, etc.), C and C++ might be the only two paths of least resistance for a critical era of the software industry.
Maybe! We'll never know. I just don't think that rewriting history to be "C people were _doers_" is giving any meaningful context to language development, or how much was getting done in languages that were significantly safer.
I agree. The thing of interest here, I think, is the observation that language popularity is path-specific. JavaScript is just the worst, but everyone who does browser work has to know it because it _happened_ to be the language clapped together to demo some nifty ideas for extending HTML and then those ideas stuck. Alternatives to JavaScript existed but were either privately-owned and therefore not trusted (vbscript) or too complicated to get off the ground (at some point, Mozilla had a fascinating proposal for HTML pages selecting features of the rendering engine and possibly even declaring that the whole page should be interpreted via user-definable code modules; had this won the day, you could write your web page in LaTeX for all anybody cared, but no-one was going to invest the time to write a dozen rendering agents when the HTML one was already expensive to develop and keep stable).
"C people were doers" is probably reductive, but C and C++ had an alchemy of factors that made something like, say, LISP or Erlang not be where they are.
If I had to hazard a guess, I'd say the most dominant factors are the language runtime's portability, the language's features allowing for relatively low-level access to the backing hardware (i.e. the thinness of the abstraction between C memory access and microprocessor read-write instructions means things like memory maps or "magic addresses" that trigger hardware effects on mutation could just be handed to user code), and the compatibility of the runtime with what came to be (for a _ton_ of reasons) a dominant model of computation in the commercial and consumer space: the x86 architecture.
I would guess that a huge part of C's popularity is UNIX. I think a number of other languages could have easily competed in the other areas, i.e. thin abstraction.
> People chose C and C++ for bad reasons, even with historic context.
Quite a bit of the original Unixes was written in assembler. There were good reasons for that - memory and CPU cycles were very scarce back then. I don't know if you have written assembler, but it takes about 5..10 times more lines of code than any high level language, and it isn't the nicest thing to read and isn't exactly portable between architectures.
C is an assembler without those problems, while retaining the speed. Admittedly it achieves that, where Algol 60 didn't, by dropping minor things like bounds checking. But to give you a feel for the tradeoff, I could tell you what instructions the compiler would emit for most lines of C code. Best of luck doing that with an Algol 60 thunk.
C hit a sweet spot, in other words, and Algol 60 didn't. C++ was originally called "C with classes". I was there at the time. The magic was Stroustrup's vtables, which gave C programmers OOP (which was all the rage at the time after Simula made it popular, and to be fair was a huge improvement on the abstractions C provided). It ran at almost the same speed as C, and to achieve that it came with all the same disadvantages. But to Stroustrup's credit he did what he could - added stricter type checking (like function prototypes). It was a reasonable improvement. Many of Stroustrup's ideas were backported to C, and C is a much better language for it.
Then Stroustrup added templates. In hindsight perhaps that's where the rot set in, but templates gave us zero cost abstractions that were at least as type safe as the original. It was an impressive achievement. But then there were hints of the darkness that was to fall upon us, with library after library using templates in novel, very useful ways. Stdio was made typesafe. In so many ways.
C++ kept organically growing like that, into the nightmare it is today. I gather no C++ shop uses all of C++ now - all use some subset they can cope with. I lost interest ages ago, as did a lot of other people. They rebelled against the complexity and lack of safety with things like Java. Microsoft rebelled against Java with C#. Those did solve the memory and type safety footguns and complexity of C++, but at the cost of the one thing about C that made it so attractive - low run time overhead and predictability.
In the meantime the ivory towers played with things like ML and Haskell, which with the benefit of hindsight was truly awesome work, but unfortunately for those of us who work close to the metal they are less practical than Java, C# and JavaScript.
And then out of the swamp rose Rust, a language that has the speed of C and the type safety of ML and Haskell. Well sort of.
I still remember when I first read the Rust spec and thought they were kidding themselves - it promised so much and sounded so improbable it looked like a crypto ponzi scheme. Type inference, memory and thread safety, and no GC - what was this heresy? Then I dabbled, wrote a few Rust programs, fought the borrow checker until we made an accommodation, read the standard library and saw all those unsafes - and realised it was real. It was not perfect, but all those compromises to make it work are what reality looks like.
Where were we. Oh yes:
> Things could have been a lot better.
Things _are_ a lot better my friend. It just took us longer than expected to get there.
That wasn't because people made bad choices. They made a whole pile of small choices that solved their particular problem. It pains me to say this now, but after dealing with C++ for years anything would look good, and Java 1.0 did look very good to me. Admittedly it only looked good for a while, but I would never say the people who picked Java up at the time made a bad choice, just like I would never say the people who chose C over assembler made a bad choice.
I don't think this is fair. First-mover advantage is absolutely a thing at the ecosystem level.
There were plenty of first movers before UNIX and C, and later C++. They didn't stick. Now, step up to the plate and fix it.
It is hard to stick against a free-beer OS, available with source code.
Had UNIX been a commercial endeavour it would have failed; unfortunately that wasn't the case, and now C plagues the industry.
> We will understand that those people are the ONLY ones that stepped up to the task over 50 years to write this kind of software
Not to take away from what they've shipped, but it bears repeating that they are also, generally, profligate writers of severe security holes. Some of the most expert programmers in the world have created mighty CVEs.
There's a school of thought that it's a poor craftsman who blames his tools, and your post walks close to that line, but when the tool is full of footguns that even the experts can't avoid, and that cause tremendous damage to bystanders, then perhaps these people who _know how to organize themselves in loose groups of more than just a few developers_ need to prioritize helping us all move to better tools.
Other segments of the programming population embrace a wild west culture of change, for good or bad. If the JS world can insist on a new frontend stack every few years, why can't managers and lead developers put their foot down and steadily back C/C++ into the corner where it belongs, where literally no other tool will do, and where it's doing the absolute minimum possible?
Since they're so good at shipping, ship us better tools! It may not seem a fair criticism -- it doesn't sound fair to me when I say it.
But I'll give you the answer to my question: They don't want to move away from C/C++, because they don't see a problem. The C/C++ devs I've chatted with on this topic have mostly been convinced they know how to write secure code, that they are capable of taming the distinctive complexity that spirals out of an expanding C++ codebase. Some of them might even be right! They wear their capabilities with pride. To admit that their hard-earned skills were invested in a tool that bears replacing is not something they can consider.
To change, first you have to admit you have a problem. And at least among the C/C++ devs I've talked to about it, few have professed a problem. Combined with the first mover advantage, I fear we'll be having this same debate 50 years from now.
Agree there is a problem. Now step up to the plate, and fix it.
I like this man. I think the only thing we'll look back on is, considering how those who came before us delivered so much with so little (compute power, parallelism, safety, tooling), how we fell so far from grace and fucked things right into the ground with slow, unsafe, useless piles of garbage that don't even perform their primary duties well other than collect data.
This is great. We give too much merit to armchair experts instead of to people who are naturally too busy making shit happen.
> C/C++ people may not always be the sharpest tools in the shed, and they may be a dime a dozen.
I feel like you're not entirely serious about it, but i'm not sure about the premise of that statement.
To me, it feels like C/C++ has a barrier of entry that's significantly higher than that of other languages: e.g. everything from JavaScript and Python to those like .NET and Java.
Wouldn't that mean that it'd be easier to learn and be productive in the latter and therefore those devs would be more abundant? Whereas to be capable at C++, you'd need a bit more discipline/patience/whatever and therefore there would be fewer developers?
I see the same in regards to Rust - many talk about it, those who are used to low level development often look into it or adopt it, however those that are used to the languages with a higher level of abstraction, don't venture into actually using it quite as often.
For example, in the JetBrains survey of 2021, about 6% of people used Rust in the past 12 months, but 7% of people are planning to adopt it:
https://www.jetbrains.com/lp/devecosystem-2021/
Contrast that with C#, which was used by 21% and had the interest of 4%, and Java, which was used by 49% and had the interest of 4%. There are probably better data points about this, i just found this vaguely interesting in regards to that argument. Of course, a counterpoint could be that Rust is new, but if it's so popular, then why aren't people adopting it more quickly?
Summary: regardless of the language being discussed (C/C++/Rust/...), i feel that there will always be fewer developers in the languages with lower level of abstraction, since they feel inherently harder.
> To me, it feels like C/C++ has a barrier of entry that's significantly higher than that of other languages: e.g. everything from JavaScript and Python to those like .NET and Java.
Ex C/C++ dev here. I deliberately moved away from working with them in my career (first 10 or so years doing it, last 10 years doing anything but). Not just because "C++ is hard" but because "C++ is intrinsically _unsafe_ in multiple, treacherous ways".
I still have tons of respect for programmers who choose to stay working with C++, but I decided long ago it wasn't for me.
> C/C++ has a barrier of entry that's significantly higher than that of other languages
Especially in this context it's important to mention that C and C++ are actually very different languages ;)
No, it's actually absolutely irrelevant to the argument.
_but i'm not sure about the premise of that statement._
I think it might be a generational thing. I know a lot of 'mediocre' programmers in their 40s and 50s who learnt C++ as their first language and never bothered to really learn anything else (possibly some C#). I'm sure none of them have heard of the JetBrains Survey.
> We will understand that those people are the ONLY ones that stepped up to the task over 50 years to write this kind of software, organize standards bodies for their languages and platforms, get their software packaged as part of mainstream operating systems and out into the world, deal with patches from users, and help millions of other people make a living, enable the internet to happen, etc.
While my comment was written to be respectful, yours clearly wasn't. I'll politely point you to Niklaus Wirth's work on Oberon as just one example of a giant blind spot in the popular consciousness. Once you learn a bit more about the many, many other systems that were built over the years, you can stop spreading a false narrative of heroism and supremacy.
> The answer will be: these people aren't doers - they are talkers.
Ok, now I am thoroughly done with your comment. Niklaus Wirth was anything but a "talker". Look what was built with Oberon and tell us more about how he was a "talker".
It's OK to not know things. But striking out in profound ignorance is completely unnecessary.
This comment misses the point so far as to be completely unnecessary.
It pretty much addresses your core thesis and shows it to be based on a fictional retelling of history.
Which in itself was a distraction from my core point that C is a hazardous material. Instead of having that conversation, they wanted to drag it into mudslinging, making it about good (heroic! doer) and bad (lazy! talker) people, and profoundly misunderstood history.
We don't vilify people who invented asbestos. We don't vilify workers who installed asbestos. They didn't know better. When we did start to know better, there was a period of denial and even willful ignorance, but eventually the evidence became overwhelming. But now, we know about asbestos, and we're stuck with the consequences to exactly the extent we don't engage in careful cleanups. With the dangers of profoundly unsafe code, we haven't even gotten everyone on the same page yet. It's a long process.
As far as talking goes, I had hoped we could avoid the immature shouty part, as I explicitly acknowledged the many people who put in a lot of work to make C/C++ do things and run fast.
For the doing, I guess I'll be getting back to that. It's far more rewarding than these discussions, frankly.
Frankly it's a bit hard to untangle your point from the way you write. Something to consider.
> and profoundly misunderstood history.
There seems to be a good deal of that.
> They didn't know better.
For what it's worth, we always knew better with C and C++. As I mentioned elsewhere languages were often designed to be safer but as C became more popular the design mistakes of C were pushed on other languages. You can look at FORTRAN, Algol, etc, to see that compiler writers took measures to ensure certain bug classes weren't possible, and those languages were overtaken by those who wanted their code to just compile, correct or not.
Your post definitely comes across as very preachy, so it's odd that you say you were trying to "avoid the immature shouty part" - that's what the parent was basically criticizing you for.
And then to end it on the implication, once again, that some people are "doers" while the rest of us are just posting on HN (the same thing you are doing????) kinda exemplifies it.
I don't think that's actually true. I think it's rather that the market rewards performance above security.
Let's see: is there a mature, maintained TLS stack written in a memory-safe, high-performance, cross-platform language that Mozilla could have been using for free, instead of NSS? Your argument is that there isn't one because only C++ coders are "doers" instead of "talkers", but this argument is wrong because such a stack does exist. It's JCA/JSSE and ships with every OpenJDK.
JCA/JSSE is a full, open source TLS and cryptography stack. It's written, even at the lowest levels, in Java, in which this kind of memory error cannot occur. It's usable from C/C++ via JNI. It's commercially maintained and has been maintained for decades. It keeps up with the latest features. Its performance in the latest versions is comparable to OpenSSL even (not always in the past).
Mozilla could have used it. They could have improved it, even. They've shipped browsers with integrated Java before, in times when computers were much less powerful. They chose not to, why, well, the charitable explanation might be download size or performance, but we could just as easily argue it's because C++ guys don't really care about security. Bugs happen, they ship a fix, people let them off the hook. Eh another buffer overflow, who cares. Not like anyone is going to switch products over that, right?
Reality is, our industry could rewrite lots of software in safer languages _and already has_. Java has tons of libraries that do the same thing as C++ libraries, and which could be used today. You can shrink HotSpot down to about 7mb compressed, last time I experimented with this, which is like 10% of the size of a typical mobile app. It's just not a big deal. Perhaps the biggest problem is that JNI is an awkward FFI but there are fixes for that also, the new FFI (Panama) is a lot better than the old one.
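For a sense of what "usable from C/C++ via JNI" looks like in practice, here is a minimal sketch of embedding a JVM from C++ and reaching for JSSE's SSLContext. It is illustrative only: no error handling, no actual handshake, and it assumes a bundled JRE plus jni.h on the include path; it is not code from any real project, and it also gives a flavour of why JNI gets called an awkward FFI.

```cpp
// Minimal sketch: embed a JVM from C++ and obtain a javax.net.ssl.SSLContext.
// Assumes jni.h and a bundled JRE; all error handling omitted for brevity.
#include <jni.h>
#include <cstdio>

int main() {
  JavaVMOption options[1];
  options[0].optionString = const_cast<char *>("-Djava.class.path=.");

  JavaVMInitArgs vm_args{};
  vm_args.version = JNI_VERSION_1_8;
  vm_args.nOptions = 1;
  vm_args.options = options;
  vm_args.ignoreUnrecognized = JNI_FALSE;

  JavaVM *jvm = nullptr;
  JNIEnv *env = nullptr;
  if (JNI_CreateJavaVM(&jvm, reinterpret_cast<void **>(&env), &vm_args) != JNI_OK)
    return 1;

  // SSLContext.getInstance("TLSv1.3") -- the entry point into JSSE.
  jclass ssl_ctx_cls = env->FindClass("javax/net/ssl/SSLContext");
  jmethodID get_instance = env->GetStaticMethodID(
      ssl_ctx_cls, "getInstance",
      "(Ljava/lang/String;)Ljavax/net/ssl/SSLContext;");
  jobject ctx = env->CallStaticObjectMethod(
      ssl_ctx_cls, get_instance, env->NewStringUTF("TLSv1.3"));
  std::printf("SSLContext obtained: %p\n", static_cast<void *>(ctx));

  jvm->DestroyJavaVM();
  return 0;
}
```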
Sincerely, as a C/C++ programmer, if I have to start a new project and you propose to link a Java library, it'd give me the creeps. Let's say I agree: does my project now require installing a Java VM alongside it? Which Java VM? I still don't understand the difference between OpenJDK and the "other one"... JDK? JRE? I only installed it once many years ago because Eclipse CDT required it, and I might have written "sudo update-java-alternatives --set /path/to/java/version" one too many times (AOSP development).
OK. I'll make an installer to install Java with my project. But the customer has another older version, or a competitor version, or ..... I don't care anymore.
Do I have to build this library? What paraphernalia do I have to install and learn in order to automate, build and test a complex Java library? Do I have to maintain a JNI interface too?
Where is the JCA/JSSE source code? (I can't find it for real) And who maintains it? Oracle? What if Oracle one day pulls the plug or comes after me with one of their famous legal moves just because?
These are my concerns. You might try to convince me, but I already learnt about C/C++ libraries, DLL hell, etc, and have all my fixes in place.
> Bugs happen
They will still happen. Put your heart at peace.
I think you're making my point for me, no? Your arguments are all variants on "I don't understand modern Java and would rather keep writing buffer overflows than find out".
To answer your questions:
1. Java is these days like UNIX, there are lots of "distros". OpenJDK is the upstream on which most of them are based. You can just use that unless you have some particular preference for the other vendors. However the compatibility situation is much better, there aren't any compatibility issues and JVMs are all more or less drop-in replacements for each other (Android is an exception but has got a lot better over time).
2. You don't need to install Java alongside your app. The JVM is just a library and a few data files. You can easily bundle it with your app and the user will never know.
3. JCA/JSSE is built in to Java so there's nothing to build. Additionally you don't have to build Java libraries to use them anyway, because binary distribution works so everyone just distributes binary JAR files.
4. The source code is spread around several modules because JCA is pluggable. But for example a lot of the crypto code is found here:
https://github.com/openjdk/jdk/tree/master/src/java.base/sha...
5. Oracle maintains it but other companies contribute, like Amazon, Microsoft, Red Hat etc (well, to Java as a whole). It's released under the GPL with Classpath exception license. They aren't going to come after you for using it - there are 12 million Java developers in the world. The only company that has ever got sued is Google and that's because they violated the (at the time non open source) license of Java to make a mobile version that wasn't from Sun. They knew they were doing it and chose to chance it anyway (which worked out quite well for them actually). Since then Java became fully open source and this no longer applies, and it'd have never applied to normal users of it anyway.
I'm not trying to convince you here of anything, only to point out that everyone saying "there was no alternative" is just wrong. The failure mode here is not sufficiently advanced fuzzers. The failure is that there has been an alternative to NSS for years but C++ shops like Mozilla keep maintaining crappy decades old C libraries with memcpy calls all over the place, because they know and like C. That's it. That's all it boils down to.
> "I don't understand modern Java"
Exactly!
> "and prefer to keep writing buffer overflows than finding out"
Not exactly. I would rather take the _risk_ of writing buffer overflows than use over-bloated infrastructure, be it in code size and/or performance, tooling, or learning time. It's a tradeoff. It's always a tradeoff. Mozilla might have had their own reasons, and probably not on a whim. If I have to write the software for an ECU the approach will be different (no, it won't be Java).
I've been in the industry for a long time now (C/C++, mostly embedded). The pattern is the same. This time it is you with Java; it might be the Rust/Go crowd, the other 4 users with Ada, or the 2 users with FP: everybody loves to hate C/C++ and everybody has their whiteboard reasons, but C/C++ has been pushing the world forward for the last 30 years minimum, and it's not stopping anytime soon. There must be a reason other than "C/C++ programmers are lazy dummies, they prefer to write bugs instead of learning the marvels of [insert random language here], because they are in love with memcpy", don't you think?
The code is there. You can try to link Java JCA/JSSE to Firefox or whatever the project is about. I'm interested in learning how it looks and how it works.
Right, as I said up thread: "I think it's rather that the market rewards performance above security." - so we're not really disagreeing.
In this case I doubt there'd be a big CPU overhead. There'd be some memory overhead, and browsers compete on that, although of course they're kind of memory pigs already.
_"The code is there. You can try to link Java JCA/JSSE to Firefox or whatever the project is about. I'm interested in learning how it looks and how it works."_
I think once Panama (the new FFI) gets released it'd be interesting to experiment with this sort of thing, though I don't care about Firefox personally. That would eliminate a lot of the tedious boilerplate you'd otherwise have to churn out.
That’s the reason Java is used almost exclusively on servers. None of your pain points (well maybe except Oracle getting litigious) apply there.
Technobabble.
>The answer will be: these people aren't doers - they are talkers.
I think it is that the market values features and performance more, with incremental improvement to current technology at a small expense of security. Or that the cost of features and performance with currently available tools and workforce (C/C++) is vastly cheaper than rewriting it in whatever PL that is.
The only reason why we are starting to do it now is that the total computer market is 10x larger than what we had. Tech is also used 10x more, and all of a sudden avoiding the cost of insecurity may be worth the cost to switch.
It is all an economic model, or market interest. Nothing to do with armchairs and doers.
Lmao. Absolute perfection. Software that exists has way more value.
Not when that software and the ecosystem around it, which are entrenched, actively fight against change.
If you think that's not true, C++ programmers, raise your hand if you were ever working on/contributing to a C program and you proposed C++ as a safer/better language and got shot down.
So even C++ is facing the same kind of challenges now faced by newer programming languages.
> So even C++ is facing the same kind of challenges now faced by newer programming languages.
Couldn't agree more with this.
> wonder why all the Lisp and Haskell programmers .. didn't get off their asses
I'm sure all these things exist in these languages, totally unused.
I actually agree with you - in many cases, I am sure they do.
But are they complete, and usuable enough in production? Did they ever acquire enough mindshare?
Did they get enough buy in?
The clear answer is no.
Mindshare and buy-in might be an issue of the ecosystem though. Not enough lisp/Haskell devs.
Maybe that's because while those languages solve particular problems, they, or their proponents, aren't solving the particular problems at hand.
Perhaps, but that's speculation. If no corp will invest outside Fortran, COBOL, C++ or Java, then there's a limit to how much problem solving a small community can do on its own dime.
The point is, I think speculative characterisations of <community>-dev members as lazy/unproductive is off mark.
There are likely an order of magnitude more lines of Java, Python, and JavaScript out there that don’t have this problem. You’ve painted a pretty dumb false dichotomy here with the Haskell vs C trope.
More strongly: the idea that "fuzzing" is "doing all the right things" is insane and disappointing of a narrative. The code snippets I am seeing here are _ridiculous_ with manual, flag-based error handling? There is a use of something like memcpy that isn't hidden behind some range abstraction that manages all of the bounds?! This code probably is riddled with bugs, because they are doing _all_ the wrong things, and fuzzing the result counts for nothing. C++ sucks, but it is way better than this.
I don't think anyone is under the illusion that "fuzzing is doing all the right things" or that it is a replacement for actual unit, integration and E2E tests.
It still is true that fuzzing has managed to find a whole slew of bugs that were otherwise not found, and it is generally easier to automate and add without requiring a ton of engineer time. It is meant to be an addition on top, not a replacement for other techniques.
"All the right things" included, according to the actual narrative in the post:
- Extensive non-fuzzing testing.
- Regular static analysis.
- Being actively maintained and overseen by a competent security team, with a bug bounty program on top.
I believe I've also seen mentions of independent security audits outside the post.
Edit: emphasis is on the fuzzing, because that is how the bug was finally discovered.
> but we will someday look back on it as asbestos and wonder why we kept at it for so damn long.
Maybe. But there is a good reason for C dominance - it's low level and close to systems, as in "syscalls and stuff". And we need that level of systems control, not only for max performance but also for not losing what is available in hw and os.
Asm is the other option to have full functionality :) Maybe it's just a case of available libraries, but still C is currently the only option for feature completeness. And at low levels all you have is bytes and bits and calls, and that allows you to do everything with bits, bytes and calls - Core Wars style - and most languages try to prevent easy use of that by cutting possibilities, e.g. "goto is _so_ nono!11".
And yes, C is not perfect and the bullshit even grows, e.g. in recent years NULL checks get "optimized out"...
C actually should be improved and not fucked up into forced disuse...
> C actually should be improved and not fucked up into forced disuse...
It can't, it won't, it isn't.
The last major update was C99, and even that was minor from a security point of view.
For better or worse, it's frozen.
C isn't dominant in a lot of areas of software, obviously.
As for NSS. It's a pure data manipulation library. Algorithms, data structures, not more. It doesn't need to do syscalls and stuff. There's not much benefit to writing it in C and as we just saw, a lot of downsides. It could have been written in many other languages.
Memory is cheap. (But not free of course!) Scala, Kotlin, and Java exist. The HotSpot VM is amazing, its JIT is mindblowing. (Probably even TS + V8 would do the job for high level stuff.)
It's complete nonsense that C is needed for accessing the HW. High level languages can do low-level bit-banging just fine.
So yes, there's a good reason, but that is inertia. Literally too big to fail, so it keeps on living. Even if we tomorrow outlawed C, drafted people to start rewriting things in a memory-managed language, and so on, it would take decades to get rid of it.
Is it OK to remove modern C++ from your statement? Using a `std::vector<std::byte>` wouldn't cause this problem. Don't know why everyone always berates C++ for vulnerabilities in traditional C code.
It wouldn't cause the problem in itself perhaps, but I find it a bit reductive to look at the type in isolation like that. Sometime, somewhere, someone will call std::vector<T,Allocator>::data on that vector and use the resulting pointer as the src argument to memcpy (or some other function), and someone else will make a change that causes an overflow of the dst buffer. Shit happens and code written in modern C++ also has bugs and some of those bugs have security implications.
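A rough sketch of that failure mode, with made-up names and sizes, just to show how quickly the safety of std::vector evaporates once a raw pointer crosses into a fixed-size destination:

```cpp
// Sketch of the failure mode described above: the source bytes live safely in
// a std::vector, but once .data() feeds a raw memcpy into a fixed-size
// destination, the bounds information is gone again. Names/sizes are made up.
#include <cstddef>
#include <cstring>
#include <vector>

struct VerifyContext {
  unsigned char signature[2048];  // fixed-size buffer, roughly like the NSS case
  // ... other fields (e.g. function pointers) sit right after it in memory ...
};

void store_signature(VerifyContext &ctx, const std::vector<std::byte> &sig) {
  // The vector is bounds-safe; this call is not. If sig.size() ever exceeds
  // sizeof(ctx.signature), the struct is overflowed just like in plain C.
  std::memcpy(ctx.signature, sig.data(), sig.size());
}
```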
_Sometime, somewhere, someone will call ..._
I call this the mold seeping through the wallpaper. C++ tries to paper over C's terrible array model by using collection class templates, but those constructs leak. Too many things need raw pointers.
You're just talking about more C interfaces, not C++. The same thing would happen in Rust if you tried to pass a chunk of memory to C.
> The same thing would happen in Rust if you tried to pass a chunk of memory to C.
In an 'unsafe' block.
The memory corruption would happen in the C code though. By the time that the program is actually affected by the memory error it could be much later, back in Rust code, and now you don't have any tools for debugging memory corruption because "that never happens in Rust".
The very first thing you would do is audit for 'unsafe' though.
Assuming you ever encounter the issue, sure. But this bug was only triggered by long keys, which are outside of the normal operating paradigms. So your Rust code calling into a C API would have had exactly the same security vulnerability as C++.
On the other hand, something like Java via JNI wouldn't, because it copies the data to a different address space as it goes through the language boundary. Horribly inefficient, but at least the C code only causes security issues in the C regions. By the time it gets back into Java it either crashes or it's safe again, no undefined behaviour leaks through the API boundary.
Copying the data before the FFI boundary would have done absolutely nothing to prevent this bug or make it any less difficult to exploit tho
True, except the code will be tainted and it is easy to find it.
Memory corruption doesn't always trigger segfaults. I don't believe it will be obvious why some random other part of your program will start giving intermittent errors even if it is in Rust.
In Rust, or any other systems programming language with unsafe code blocks, all the way back to JOVIAL and ESPOL, one can search for those code blocks.
At very least they provide an initial searching point.
On C, C++ and Objective-C, any line of code is a possible cause for memory corruption, integer overflow, or implicit conversions that lead to data loss.
This is starting from the point of knowing it is a memory corruption issue though. From my experience, memory corruption usually manifests in logic or data behaviour changes. In a Rust program you'd probably spend a few days pulling your hair out trying to understand why some data structure doesn't do what it's supposed to before considering that it's one of the unsafe blocks.
Yeah, but at least you know where to start searching afterwards.
This applies to other languages with unsafe code blocks, note that JOVIAL and ESPOL were the first ones offering such capability.
It would, because ISO std::vector<std::byte> doesn't do bounds checking by default unless you turn on the compiler checks in release builds, use at() everywhere, or create your own `std::checked_vector<std::byte>`.
The default [] for vector doesn't do bounds checking, so I don't feel it helps that much.
One thing in practice I like about Rust over C++ is the "easy option" is safe, and you have to do things like write 'unsafe' to opt out of safety.
Certainly when teaching C++ I wish the defaults were safer. I can teach ".at", but it isn't what you see in most code / tutorials.
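For readers following along, a tiny sketch of the default being discussed; nothing here is project-specific:

```cpp
// Tiny illustration: operator[] does no bounds check (out-of-range access is
// undefined behaviour), while at() throws instead of corrupting memory.
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
  std::vector<int> v(4, 0);

  // int bad = v[10];  // compiles and "runs", but reads past the buffer: UB

  try {
    int checked = v.at(10);  // bounds-checked access
    std::cout << checked << '\n';
  } catch (const std::out_of_range &e) {
    std::cout << "caught: " << e.what() << '\n';
  }
  return 0;
}
```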
-D_GLIBCXX_ASSERTIONS
Good luck advocating for that on release code.
People who memcpy bytes into a destination without caring if there's enough room would also not hesitate to use memcpy to copy bytes into a std::vector though.
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines...
> Using a `std::vector<std::byte>`
And herein lies the problem. Your statement is correct. But that type definition is exactly why C++ feels like a 50's car retrofitted with an electric drivetrain and a DVD player on the dash.
It will almost run like a modern thing but it will fail when it shouldn't.
At least Asbestos is inert once in place.
Isn’t asbestos always inert unless you pair it with a ridiculously strong oxidiser?
In my recollection it dealt purely physical damage, and its inability to react with much of anything is what leads to its accumulation in dwellings.
I believe they mean inert as it relates to health, not chemically.
Unfortunately it's too much work to throw away all code written in unsafe languages. So it's valuable to try and improve tools and techniques that make these languages less unsafe.
One reason why we are getting hardware memory tagging, to transform modern computers into basically C Machines, is that everyone kind of gave up doing it at the software level in regards to C and C++.
Just compile it all to Wasm and we'll run it in virtualization.
Give us memory segments back and I think we have a shot of making this a reality.
Except WASM made the big mistake of not having bounds checking on linear memory, so attacks with input data that corrupt internal state and thereby try to influence the outcome of a WASM module's behaviour are still a good attack vector.
Being sandboxed is good, however if one can start an internal fire from the outside, not so much.
Not sure what you were expecting, but no, Wasm doesn't magically make memory-unsafe languages internally uncorruptible, it limits all corruption to internal state. If you look at the PL research stretching back a couple decades on how to do that for C, you are looking at integer factor performance overheads in the general case. Wasm also doesn't make anything less safe than it was before (ok, ok, modulo the current lack of read-protected memory), and since we wisely chose to make the execution stack non-addressable, has CFI by default.
Wasm's sandboxing makes it impossible to escalate the privilege of a program or acquire capabilities that it didn't have before (i.e. were not explicit imported and granted). That's a strictly stronger primitive than a random userspace Unix process.
I was expecting that people don't oversell WebAssembly as some kind of magic pixie dust of security; anyone that knows a bit about security spots those flaws.
You know what is also sandboxed? An OS process.
Ah, but an OS process has a wider syscall surface - well, let's then bring WASI or JavaScript bindings into the picture.
At least we are on the same page on the level of sandboxing provided. For the record, it was your comment that called it a "big mistake" to not have finer-grained memory protection. I explained why we did that. And now you're saying someone "oversold" it as magic pixie dust? I'm not sure which hype train you're referring to, TBH. It wasn't core members or the CG, as we understood very clearly which security properties we were gaining and which we were not gaining.
One of Bjarne's talks at CppCon 2021 is yet again advocating for the Core Guidelines, because they just keep being ignored by the community at large.
C ABI is still stable after all these years. Programs written in "The C Programming Language" bible still compile and work fine.
That's why.
Imagine maintaining a long-term project in the first place.
Such craziness, like why?
Ironically the STL supports bounds checking, but it is always turned off.
What's special here is the bug is a memory corruption, and memory corruption bugs in such libraries are usually instantly security bugs.
Otherwise, the same story could be told as a generic software testing joke: "unit-tests are short-sighted and coverage lies", i.e. an "extremely well-maintained codebase, with extensive unittest, >98% test coverage and constantly scanned by all-static-analyzers-you-may-come-up" can have fatal, trivial bugs.
Ah, brings to mind one of my favorite Dijkstra quotes, "Program testing can be used to show the presence of bugs, but never to show their absence!"
I've never understood that to mean that he wasn't in favor of automated testing, only that it's got its limits. In this case, they now know a test case that was missing.
Yup, in my previous gig we had an approach that if there is a bug in the code that the tests didn't catch, it's actually 2 bugs: a product and a test issue. Similarly, if a dev needed to have access to the live debugging session to diagnose the issue, it meant there was yet another bug (to improve debuggability of the product). It was quite nice, taught me a lot.
Indeed. This is why I find 100% unit test coverage can actually be harmful: it gives you an illusion of safety.
> What's special here is the bug is a memory corruption, and memory corruption bugs in such libraries are usually instantly security bugs.
Is that special? Are there buffer overflow bugs that are not security bugs? It could be just my bubble as a security consultant, since (to me) "buffer overflow" assumes remote code execution is a given. It's not my area of expertise, though, so perhaps indeed not all reachable buffer overflows are security issues. (Presuming no mitigations, of course, since those are separate from the bug itself.)
As a crude example, there sometimes are off-by-one bugs which allow the buffer to overflow by one byte, and that single overflowing byte is always 0 (the last byte in a zero-terminated string), and it overwrites data in a variable that doesn't affect anything meaningful, giving you a buffer overflow with no security impact.
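A contrived sketch of that kind of off-by-one, with hypothetical names; whether the stray zero byte matters depends entirely on what happens to sit after the buffer:

```cpp
// Contrived sketch: the terminating NUL of a maximum-length string lands one
// byte past the buffer. The impact depends on what the compiler placed after
// `name` -- often padding or a low-order byte that is already zero.
#include <cstddef>
#include <cstring>

struct Record {
  char name[8];
  int flags;  // the stray zero byte may or may not clobber this
};

void set_name(Record &r, const char *s) {
  std::size_t len = std::strlen(s);
  if (len > sizeof(r.name)) return;  // BUG: should be >=, not >
  std::memcpy(r.name, s, len);
  r.name[len] = '\0';                // when len == 8 this writes r.name[8]: one past the end
}
```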
Although single zero byte overflows are sometimes exploitable.
https://googleprojectzero.blogspot.com/2014/08/the-poisoned-...
> Are there buffer overflow bugs that are not security bugs? It could be just my bubble as a security consultant, since (to me) "buffer overflow" assumes remote code execution is a given.
Not necessarily:
1. Most compilers use aligned variables and fields unless forced not to with a flag. Going 3 bytes over an array of 12 bytes can result in an overflow that is never detectable at runtime, because the extra memory is used exclusively by the bug (sketched below).
2. Malloced memory is harder (but not impossible) to turn into unintended execution because the pages may be marked for data only by the OS. The stack pages are ~marked~ [EDIT: NOT marked] as executable.
There's probably millions of buffer overflows that will not only never be exploited, but will also never be detected.
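A sketch of point 1, assuming a typical 64-bit layout; the struct and offsets are made up for illustration:

```cpp
// Sketch: a small overflow that lands entirely in alignment padding is
// invisible at runtime, because no live data is ever stored there. Offsets
// are typical for a 64-bit ABI, not guaranteed by the standard.
#include <cstring>

struct Packet {
  char tag[12];  // bytes 0..11
  // 4 bytes of padding typically inserted here so that `id` is 8-byte aligned
  long long id;  // bytes 16..23
};

void set_tag(Packet &p, const char *src) {
  // Copies 15 bytes into a 12-byte array. The 3 extra bytes land in padding,
  // so nothing observable breaks and the bug never surfaces (until the layout
  // changes, or a sanitizer is attached).
  std::memcpy(p.tag, src, 15);
}
```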
Sorry, I mean the special part is "the bug itself is a memory corruption". The second sentence is a quick explanation for those not in our bubble.
Lots of bugs aren't particularly exploitable. As Tavis notes, one of the major problems here is that there's a structure containing function pointers adjacent to the buffer.
If all you have is a stack overflow you may have your work cut out for you.
Further, while this bug may exist, assertions elsewhere _could have_ meant it was unexploitable. So in isolation it's an overflow but in context it's not reachable. That didn't happen here. This happens a _lot_ actually.
Unit tests aren't really for bug catching, they're to ensure you haven't changed behavior when you don't expect to.
They enable refactoring code in ways not possible without them.
FWIW, this does not match my experience. I have caught lots of bugs with unit tests, especially in code that is fundamentally complex (because it does complex things, not because it needs polishing). OTOH, refactorings often span units because real simplification comes from changing the ways units interact, or even which units exist, so the tests have to be changed anyway.
Granted, even tests that have to be changed have some value in securing a refactoring.
If you are writing tests to check new code in tandem with writing that code (either via TDD, or some other code-test loop), or are writing tests for existing code, you can (and usually will) find and fix bugs. Likewise if you are investigating a problem and write one or more test cases to check the behaviour.
Once those tests have been written, then they act as regression tests like the parent comment notes.
On "unit tests", I view the behaviour of the class/function and all its dependencies as a single unit for the purpose of testing. I've never liked the idea of needing to mock out a class in order to test another class, just because the class being tested makes use of it. The only case where mocks/stubs/etc. should be used is when interacting with an external component like a database or HTTP API. -- You don't see people that do this mocking out a list, string or other library classes, so why should project classes be any different.
To clarify, when I wrote "the ways units interact", I was referring to units that represent external dependencies in some kind (often a database, as you said). Many refactorings change those interactions in some way.
I agree that there is no reason to mock data containers or other classes with fully self-contained behaviour.
> - "There is an arbitrary limit of 10000 bytes placed on fuzzed input. There is no such limit within NSS; many structures can exceed this size. This vulnerability demonstrates that errors happen at extremes"
This is the one that seemed short sighted to me. It's a completely arbitrary (and small!) limit that blinded the fuzzer to this very modest sized buffer overflow.
Oh, the fools! If only they'd built it with 6001 hulls! When will they learn?
Thank you! So many hindsight fortune tellers here.
Predictable Human Behaviour, you can literally count on it! LOL
Anyway the post mortem analysis is interesting because it gives away people's knowledge or lack thereof.
What is it they say? The Devil is in the detail!
Not using small bounds on your fuzzing size isn’t exactly something you can only know via hindsight.
The buffer holds 2K, so this limit alone, which exceeds the buffer by 8K-ish, didn't blind the fuzzer. It's not clear a larger input would've caught anything, due to other "what went wrong" items, specifically "each component is fuzzed independently."
The problem is that the search space grows (exponentially?) as you increase the fuzzer’s limit. So there’s a cost, and likely diminishing returns, to raising that limit.
Coverage-guided fuzzing dramatically mitigates the exponential nature of the search space. It used to be that searching for magic bits was impossible with fuzzing but now it is nearly trivial.
Are they checking every possible overflow up to the max? Like no overflow at 7377 bytes, let's try 7378...
While I can see targeting near natural boundaries (1025 bytes for example), you should be able to skip over most of the search space and verify that it doesn't blow up on enormous values like 16777216 bytes.
It's not that simple though. Single-variable integer overflows can be checked like that, but when the critical byte in a buffer might be at positions 1 through {bufferlength}, you have to do a shotgun approach and see if anything sticks, and at some point the number of possible combinations grows too big even for that.
I'm not an expert on fuzzing myself, but generally I do see the point of having a limit here. Why, then, that limit was not chosen to be the max size for each of the length-checked inputs, I don't know. That does seem a bit more obvious, but also I just read this article so I can't prove that I wouldn't have made the same mistake.
Is it possible to feed a fuzzer with information from static analysis to limit the search space? Such as, if you have a check like "someParameter > 0" in the code, have the fuzzer generate a positive, negative, and zero value for someParameter, but not thousands of them -- at least not based on this check alone -- because they will all behave the same.
There's whitebox fuzzing that's starting to become a thing: when your fuzzer gets stuck you give the stalled corpus to an SMT solver and it'll try to see if there exists an input that could uncover a new path. I'm really excited about these but haven't really followed the advances.
If you increase the limit you lose coverage, as you can only do so many execs/s.
How do you know that it was actually "extremely well-maintained"? Everybody thought OpenSSL was well-maintained since it was used as critical infrastructure by multi-billion dollar megacorporations, but it was actually maintained by two full-time employees and a few part-time volunteers with maybe a quick once-over before a commit if they were lucky. How about sudo, a blindly trusted extremely sensitive program [1], which is maintained by basically one person who has nearly 3,000,000(!) changes[2] over 30 or so years?
Assuming that something is well-maintained because it is important is pure wishful thinking. Absent a specific detailed high quality process, or an audit that they conform to a well-established process that has demonstrably produced objectively high-quality output in a large percentage of audited implementations of that process (thus establishing nearly every instance of the audited process -> high quality output) all evidence indicates that you should assume that these code bases are poorly maintained until proven otherwise[3]. And, even the ones that are demonstrably maintained usually use very low quality processes as demonstrated by the fact that almost nobody working on those projects would be comfortable using their processes on safety-critical systems [4] which is the minimum bar for a high quality process (note the bar is "believe it is okay for safety-critical"). In fact, most would be terrified of the thought and comfort themselves knowing that their systems are not being used in safety-critical systems because they are absolutely not taking adequate precautions, which is a completely reasonable and moral choice of action as they are not designing for those requirements, so it is totally reasonably to use different standards on less important things.
[1]
https://news.ycombinator.com/item?id=25919235
[2]
https://github.com/sudo-project/sudo/graphs/contributors
[3]
[4]
Tavis explains clearly why he thinks it's well maintained in the post, complete with links to source code. To paraphrase:
[...] NSS was one of the very first projects included with oss-fuzz [...]
[...] Mozilla has a mature, world-class security team. They pioneered bug bounties, invest in memory safety, fuzzing and test coverage. [... all links to evidence ...]
Did Mozilla have good test coverage for the vulnerable areas? YES.
Did Mozilla/chrome/oss-fuzz have relevant inputs in their fuzz corpus? YES.
Is there a mutator capable of extending ASN1_ITEMs? YES.
I don't think at any point anyone assumed anything.
OpenSSL is a project which is treated as a "somebody else's problem" dependency by everybody, and to the extent that anybody cares about TLS support, it's basically an "open up a TLS socket, and what do you mean it's more complicated than that" situation.
By contrast, NSS is maintained by Mozilla as part of Firefox, and, furthermore, its level of concern is deep into the "we don't want to enable certain cipher suites, and we have very exacting certificate validation policies that we are part of the effort in _defining_"--that is to say, NSS _isn't_ a "somebody else's problem" dependency for Mozilla but a very "Mozilla's problem" dependency.
That said, this is CERT_VerifyCertificate, not mozilla::pkix, and since this is not used in Firefox's implementation of certificate validation, I would expect that this _particular_ code in the library would be less well-maintained than other parts. But the whole library itself wouldn't be in the same camp as OpenSSL.
_> Everybody thought OpenSSL was well-maintained since it was used as critical infrastructure by multi-billion dollar megacorporations_
I wasn't under the impression anybody who knew the project ever really thought that. Some other people may have assumed that as a default if they hadn't looked into it.
This article spells out a whole bunch of reasoning why this particular library was well maintained though. There's a difference between reasoning based on evidence and assumptions.
I don't know anybody who thought OpenSSL was well-maintained in and before the Heartbleed era (it's a fork of SSLeay, which was Eric Young's personal project). Post-Heartbleed --- nearly a decade ago, longer than the time lapse between SSLeay and OpenSSL --- maintenance of OpenSSL has improved dramatically.
I think you and a sibling comment might be too close to the problem. When heartbleed dropped my Twitter feed had a few crypto engineers saying "I mean, eventually this was going to happen" and a ton of developers whose main language starts with a "p" going "how?? OpenSSL is core plumbing of the internet, it can't be this bad can it???".
Edit: to be clear, not maligning the "p" language developers, I was one myself. Simply saying there are a ton of technical people who are blissfully ignorant of the tiny pillars of clay that hold up the internet.
To be fair, it was unreasonable to expect Prolog coders to keep up with crypto advancements, I mean the language pre-dates SSL by decades.
Care to expand on what Prolog coders have to do with OpenSSL? Not getting the context here.
A joke about "developers whose main language starts with a "p"", meaning presumably perl, php and python. I guess in this context, ruby and JavaScript are also languages starting with "p".
Don't worry, I told both of them what was going on ;)
Pascal does predate the era of SSL by some margin to be fair.
Yeah, those Perl hackers are really out of touch :-)
I think the main surprising thing here is that people are putting smallish arbitrary limits on the sizes of inputs that they let their fuzzer generate.
With the benefit of a little hindsight, that does feel rather like saying "please try not to find any problems involving overflows".
I agree. I haven’t done a lot of fuzzing, but my understanding is that this is how fuzzing can be helpful. Am I wrong? Or is it more complicated than that?
It's a trade-off. Larger input files may slow the fuzzing process, and therefore explore less of the problem space. You usually want to test many different kinds of inputs, not just more of the same.
OTOH file formats often include sizes of fields, which a fuzzer will set to arbitrarily high values. This tests (some) handling of overly large inputs without files being actually that large.
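To make the trade-off concrete, here is a minimal libFuzzer-style harness sketch; parse_record is a hypothetical stand-in for whatever the real target would be, and the hard-coded 10000-byte cap mirrors the kind of arbitrary limit described in the article (the same effect is usually achieved by running the fuzzer with -max_len):

    #include <stddef.h>
    #include <stdint.h>

    // Hypothetical parser under test -- a stand-in for the real target code.
    static int parse_record(const uint8_t *data, size_t len) {
        (void)data; (void)len;
        return 0;
    }

    // libFuzzer entry point: every mutated input comes through here.
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        // An explicit cap like this keeps exec/s high, but it also guarantees
        // that over-length inputs -- exactly the kind that triggered this bug --
        // are never exercised.
        if (size > 10000)
            return 0;
        parse_record(data, size);
        return 0;
    }

Built with something like `clang -fsanitize=fuzzer,address harness.c`, the cap is the knob that trades throughput against ever seeing an oversized input at all.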
How realistic is it that this vulnerability can be exploited for $BAD_THINGS?
https://www.mozilla.org/en-US/security/advisories/mfsa2021-5...
notes "This vulnerability does NOT impact Mozilla Firefox. However, email clients and PDF viewers that use NSS for signature verification, such as Thunderbird, LibreOffice, Evolution and Evince are believed to be impacted.".
I don’t understand why the “lessons learned” doesn’t recommend always* passing the destination buffer size (using memcpy_s or your own wrapper). It has been a long time since I wrote C++, but when I did this would have been instantly rejected in code review.
That's because these are "lessons learned" for how to catch these bugs, instead of "how to write more secure code".
Because you can't.
You catch the bug by flagging the use of memcpy instead of something that takes the dest buffer size (like memcpy_s or whatever).
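A sketch of the kind of wrapper being suggested (the names and the buffer size here are hypothetical, not NSS's actual API): make the destination capacity a required argument and fail instead of overflowing.

    #include <stddef.h>
    #include <string.h>

    // Hypothetical bounds-checked copy: the caller must state how big the
    // destination really is, and the copy refuses to exceed it.
    static int safe_copy(void *dst, size_t dst_cap, const void *src, size_t n) {
        if (n > dst_cap)
            return -1;            // report the error instead of corrupting memory
        memcpy(dst, src, n);
        return 0;
    }

    // Usage sketch: a fixed-size buffer whose capacity travels with the copy.
    // unsigned char buf[2048];
    // if (safe_copy(buf, sizeof buf, sig_data, sig_len) != 0) { /* reject input */ }

A lint rule that flags bare memcpy (or a wrapper like PORT_memcpy) and requires something of this shape is the enforcement half of the idea.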
It seems to me linters have been flagging this kind of thing since forever. This code is using a wrapper, "PORT_memcpy", so a default ruleset isn't going to flag it.
So here I guess no one noticed PORT_memcpy == memcpy (or maybe they noticed but didn't take the initiative to add a lint rule or a deprecation entry, or even to file an issue to at least port the existing code).
Was no one linting the wrapper? The static analysis tools we use wouldn't like memcpy_s either; they would probably create a finding to use an STL container instead.
Counterexample: msgrcv(). This expects you to not be passing raw buffers, but messages with a particular structure: a long mtype, to specify what type of message it is, and then a char (byte, since this is C) array that is the buffer containing the rest of the message. You pass these structures to msgsnd() and msgrcv(), along with a size. But the size is the size of the buffer component of the structure, not the size of the structure as a whole. If you pass the size of the structure, it will read sizeof(long) more than your structure can hold. Been bitten by that...
So, just passing the size of the destination is something that you can still get wrong, in the case of data more complicated than just a single buffer.
[Edit: You can also design an API to be very misleading, even if it has a length parameter...]
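A minimal sketch of the trap being described, assuming a 128-byte payload: msgsnd()/msgrcv() want the size of the mtext member, not sizeof the whole struct.

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct my_msg {
        long mtype;        // message type, required by System V message queues
        char mtext[128];   // payload buffer
    };

    int main(void) {
        int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
        struct my_msg msg = { .mtype = 1 };
        strcpy(msg.mtext, "hello");

        // Correct: pass the size of the payload, not of the whole struct.
        msgsnd(qid, &msg, sizeof msg.mtext, 0);
        msgrcv(qid, &msg, sizeof msg.mtext, 0, 0);

        // Wrong: sizeof(msg) includes the long mtype, so the kernel is allowed to
        // read or write sizeof(long) bytes past the end of mtext.
        // msgrcv(qid, &msg, sizeof msg, 0, 0);

        msgctl(qid, IPC_RMID, NULL);
        printf("%s\n", msg.mtext);
        return 0;
    }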
It's not the case here (I think), but this can be common if you move functions around or expose functions that were previously internal.
For example, maybe your function is only called from another one that performs the appropriate bound checks, so checking again becomes redundant. After a simple refactoring, you can end up exposing your function, and screw up.
Usually people say "oh, it's just another typical failure of writing in memory-unsafe C", but here's a slightly different angle: why is this common error not happening inside a single abstraction like "a data structure that knows its size"? If C allowed for such things, then 100000 programs would be using the same 5-10 standard structures, where the copy-and-overflow bug would already have been fixed.
Languages like Rust, of course, provide basic memory safety out of the box, but most importantly they also provide the means to package unsafe code under a safe API and debug it once and for all. And an ecosystem of easy-to-use packages helps people reuse good code instead of reinventing their own binary buffers every single damn time, as is usually done in C.
So maybe it's not the unsafeness itself, but rather inability to build powerful reusable abstractions that plagues C? Everyone has to step on the same rake again and again and again.
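To make the "structure that knows its size" idea concrete, here is a minimal sketch in plain C (entirely hypothetical names, nothing from NSS): every write goes through one checked entry point, so the overflow check is written, audited, and fixed exactly once.

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    // A buffer that carries its own capacity and length around.
    typedef struct {
        unsigned char *data;
        size_t cap;   // bytes allocated
        size_t len;   // bytes in use
    } sized_buf;

    static bool sized_buf_init(sized_buf *b, size_t cap) {
        b->data = malloc(cap);
        b->cap = b->data ? cap : 0;
        b->len = 0;
        return b->data != NULL;
    }

    // The single checked entry point for copying data in.
    static bool sized_buf_copy_in(sized_buf *b, const void *src, size_t n) {
        if (n > b->cap)
            return false;   // refuse instead of overflowing
        memcpy(b->data, src, n);
        b->len = n;
        return true;
    }

Nothing stops a C project from doing this today; the point above is that the language and ecosystem never made it the path of least resistance.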
But performance! Rust and other languages with bounds checking go out of their way to elide checks once it is proven that they aren't needed. It would be hard to do that as a data structure.
Well, here comes the type system, so your fancy data structure has zero cost. Rust recently got more support for const generics, so you could encode size bounds right in the types and skip unnecessary checks.
Oh that is what I was saying about Rust. I don't think that is possible in C, at least not without a huge amount of effort.
Wow.
We continue to be reminded that it's hard to write fully memory secure code in a language that is not memory secure?
And by hard, I mean, very hard even for folks with lots of money and time and care (which is rare).
My impression is that Apple's imessage and other stacks also have memory unsafe languages in the api/attack surface, and this has led to remote one click / no click type exploits.
Is there a point at which someone says, hey, if it's very security sensitive write it in a language with a GC (golang?) or something crazy like rust? Or are C/C++ benefits just too high to ever give up?
And similarly, that simplicity is a benefit (ie, BoringSSL etc has some value).
It's hard to fault a project written in 2003 for not using Go, Rust, Haskell, etc... It is also hard to convince people to do a ground up rewrite of code that is seemingly working fine.
> _It is also hard to convince people to do a ground up rewrite of code that is seemingly working fine._
I think this is an understatement, considering that it's a core cryptographic library. It appears to have gone through at least five audits (though none since 2010), and includes integration with hardware cryptographic accelerators.
Suggesting a tabula rasa rewrite of NSS would more likely be met with genuine concern for your mental well-being, than by incredulity or skepticism.
> Suggesting a tabula rasa rewrite of NSS would more likely be met with genuine concern for your mental well-being, than by incredulity or skepticism.
In my experience, porting code more or less directly from one language to another is faster and easier than people assume. It's certainly way faster than I assumed. I hand-ported chipmunk2d to JavaScript a few years ago. It's ~30k LOC and it took me about a month to get my JS port working correctly & cleaned up. I sped up throughout the process. By the end I had a bunch of little regexes and things which took care of most of the grunt work.
If we assume that rate (about 1kloc / person-day) then porting boringssl (at 356kloc[1]) to rust would take about one man-year (though maybe much less). This is probably well worth doing. If we removed one heartbleed-style bug from the source code on net as a result, it would be a massive boon.
(But it would probably be much more expensive than that because the new code would need to be re-audited.)
[1]
https://www.openhub.net/p/boringssl/analyses/latest/language...
> In my experience, porting code more or less directly from one language to another is faster and easier than people assume.
That's often true right up to the point where you have to be keenly aware of and exceptionally careful with details such as underlying memory management functionality or how comparisons are performed. With this in mind, cryptographic code is likely a pathological case for porting. It would be very easy to accidentally introduce an exploitable bug by missing, for example, that something _intentionally_ reads from uninitialized memory.
On top of the re-audit being expensive.
> for example, that something intentionally reads from uninitialized memory.
Sounds terrible. This should never happen in any program, so any behavior relying on it is already broken.
I'm way more concerned by memory safety issues than cryptographic issues. Frankly, history has shown that cryptographic bugs are far easier to shake out and manage than memory safety bugs.
> Frankly, history has shown that cryptographic bugs are far easier to shake out and manage than memory safety bugs.
and yet, we had the debian/ubuntu openssl bug of 2008... due to someone not wanting to intentionally read from uninitialized memory. Really, it kind of proved the opposite. Valgrind and other tools can tell you about memory safety bugs. Understanding that the fix would result in a crypto bug was harder.
OpenSSL's use of uninitialized memory to seed entropy was always a terrible idea. The PRNG was fundamentally flawed to begin with.
> Really, it kind of proved the opposite.
Not really. Exploited bugs in cryptographic protocols are extremely rare. Exploited memory safety bugs are extremely common.
> Valgrind and other tools can tell you about memory safety bugs.
Not really.
> Understanding that the fix would result in a crypto bug was harder.
Like I said, OpenSSL's PRNG was brutally flawed already and could have been broken on a ton of machines already without anyone knowing it. A compiler update, an OS update, or just unluckiness could have just as easily broken the PRNG.
Building memory unsafety into the prng was the issue.
Memory safety issues are exploited orders of magnitude more often than crypto bugs.
edit: Also, memory safety bugs typically have higher impact than crypto bugs. An attacker who can read arbitrary memory of a service doesn't _need_ a crypto bug, they can just extract the private key, or take over the system.
Crypto bugs are bad. Memory safety bugs are way, way worse.
If a programs reads from uninitialised memory, I hope for its sake that it does not do it in C/C++. Setting aside that uninitialised memory is a hopelessly broken RNG seed, or the fact that the OS might zero out the pages it gives you before you can read your "uninitialised" zeroes…
Reading uninitialised memory in C/C++ is Undefined Behaviour, plain and simple. That means Nasal Demons, up to and including arbitrary code execution vulnerabilities if you're unlucky enough.
Genuinely curious what the use case(s) of reading from uninitialized memory are. Performance?
It was used as a source of randomness. Someone blindly fixing a "bug" as reported by a linter famously resulted in a major vulnerability in Debian:
https://www.debian.org/security/2008/dsa-1571
This is incorrect.
If they had simply removed the offending line (or, indeed, set a preprocessor flag that was provided explicitly for that purpose) it would have been fine. The problem was that they also removed a similar _looking_ line that was the path providing actual randomness.
> In my experience, porting code more or less directly from one language to another is faster and easier than people assume
Converting code to Rust while keeping the logic one-to-one wouldn't work. Rust isn't ensuring memory safety by just adding some runtime checks where C/C++ aren't. It (the borrow checker) relies on static analysis that effectively tells you that the way you wrote the code is unsound and needs to be redesigned.
Sounds like a feature of the porting process. Not a bug. And I’d like to think that BoringSSL would be designed well enough internally to make that less of an issue.
I agree that this might slow down the process of porting the code though. I wonder how much by?
The article says Chromium replaced this in 2015 in their codebase. (With another memory-unsafe component, granted...)
BoringSSL started as a stripped down OpenSSL. That's very different from a ground-up replacement. The closest attempt here is
https://github.com/briansmith/ring
but even that borrows heavily the cryptographic operations from BoringSSL. Those algorithms themselves are generally considered to be more thoroughly vetted than the pieces like ASN.1 validation.
Ring would be cool if not for
https://github.com/briansmith/ring#versioning--stability
This sounds like a nightmare for any downstream users of this library. Any one of those bullet points in that section would be a major concern for me using it in anything other than a hobby project, but all of them together seem almost willfully antagonistic to users.
This is especially true given it’s a security library, which perhaps more than any other category I would want to be stable, compatible, and free of surprises.
“You must upgrade to the latest release the moment we release it or else you risk security vulnerabilities and won’t be able to link against any library that uses a different version of ring. Also, we don’t ‘do’ stable APIs and remove APIs the instant we create a new one, so any given release may break your codebase. Good luck have fun!”
Note that the readme is outdated. With the upcoming 0.17 release, which has been in the making for almost a year already, you can link multiple versions of ring in one executable:
https://github.com/briansmith/ring/issues/1268
Similarly, while the policy is still that ring only supports the newest Rust compiler version, the fact that there has been no release for months means that in practice you can use it with older compiler versions.
Last, the author used to yank old versions of his library, which caused tons of downstream pain (basically, if you are a library and are using ring, I recommend you have a Cargo.lock checked into git). This yanking stopped about three years ago, too. I don't think this was mentioned in the readme, but I feel it's an important improvement for users.
So a great deal has improved, although I'd say only the first bullet point is a _permanent_ improvement, while the other two might be regressed upon. idk.
Yeah, that is pretty wild. Total prioritization of developer convenience over actual users of the library.
Or rust-crypto
NSS was also generally considered to be thoroughly vetted, though.
There’s a world of difference between ASN.1 validation and validation of cryptographic primitives. The serialization/deserialization routines for cryptographic data formats or protocols are where you typically get problems. Things like AES and ECDSA itself, less so, especially when you’re talking about the code in BoringSSL. Maybe some more obscure algorithms but I imagine BoringSSL has already stripped them and ring would be unlikely to copy those.
Why? Cryptographic primitives don't really have a lot of complexity. It's a bytes-in/bytes-out system with little chance for overflows. The bigger issues are things like side-channel leaks or incorrect implementations. The former is where validation helps, and the latter is validated by round-tripping with one half using a known-working reference implementation. Additionally, the failure mode is typically safe: if you encrypt incorrectly then no one else can read your data (typically), and if you decrypt incorrectly, then decryption will just fail. Ciphers that could encrypt in an unsafe way (i.e. the implementation "encrypts" but the encryption can be broken/key recovered) typically imply the cipher design itself is bad, and I don't think such ciphers are around these days. Now of course something like AES-GCM can still be misused by reusing the nonce, but that has nothing to do with the cipher code itself.
You can convince yourself by looking for CVEs of cryptographic libraries and where they live. I'm not saying it's impossible, but cipher and digest implementations from BoringSSL seem like a much less likely place for vulnerabilities to exist (and thus the security/performance tradeoff probably tilts in a different direction unless you can write code that's both safer and maintains competitive performance).
For symmetric cryptography (ciphers & hashes), I agree. I'd go as far as to say they're stupidly easy to test.
Polynomial hashes, elliptic curves, and anything involving huge numbers, however, are more delicate. Depending on how you implement them, you could have subtle limb overflow issues that occur so rarely by chance that random tests don't catch them. For those you're stuck with either proving that your code does not overflow, or reverting to simpler, slower, safer implementation techniques.
That's a very good point. Thanks for the correction!
The Ring readme doesn't really cover its functionality but sounds like it may be a lower level crypto lib than NSS? And it also seems to be partly written in C.
Anyway, NSS wouldn't necessarily need to be replaced with a Rust component, it could well be an existing lib written in another (possibly GCd) safeish language, or some metaprogramming system or translator that generated safe C or Rust, etc. There might be something to use in Go or .net lands for example.
ring incorporates a barebones ASN.1 parser that also webpki uses, which is probably the crate you want to use if you want to do certificate verification in Rust. webpki is C-free but it does use ring for cryptographic primitives so that will have to be replaced if you don't like ring. More generally, I think Firefox wants to have great control over this specific component so they likely want to write it themselves, or at least maintain a fork.
Perl-generated assembly. Heh…
> Suggesting a tabula rasa rewrite of NSS would more likely be met with genuine concern for your mental well-being, than by incredulity or skepticism.
Why? I don't get it. Maintenance of NSS has to be seriously expensive.
The NSA was caught intentionally complicating that spec. The idea was to ensure it was impossible to implement correctly, and therefore be a bottomless well of zero days for them to exploit.
Gotta love the US government’s war against crypto.
Sorry for sounding like a broken record, but source please?
Can you provide more information on this? I'd be interested to read about this topic.
He's confused it with Dual_EC_DRBG, a backdoored random number generator in a different non-international standard.
SSL is complicated because we didn't understand how to design secure protocols in the 90s. Didn't need help.
No; this predated that by about a decade. They had moles on the committees that codified SSL in the 90’s. Those moles added a bunch of extensions specifically to increase the likelihood of implementation bugs in the handshake.
I’m reasonably sure it was covered in Crypto-Gram a few decades ago. These days, it’s not really discoverable via internet search, since the EC thing drowned it out.
Edit: Here’s the top secret budget for the program from 2013. It alludes to ensuring 4G implementations are exploitable, and to some other project that was adding exploits to something, but ramping down. This is more than a decade after the SSL standards sabotage campaign that was eventually uncovered:
http://s3.documentcloud.org/documents/784159/sigintenabling-...
With SSL, the moles kept vetoing attempts to simplify the spec, and also kept adding complications, citing secret knowledge. It sounds like they did the same thing to 4G.
Note the headcount numbers: Over 100 moles, spanning multiple industries.
To be fair you don't need to rewrite the whole thing at once. And clearly the audits are not perfect, so I don't think it's insane to want to write it in a safer language.
It may be too much work to be worth the time, but that's an entirely different matter.
> may be too much work
I wonder, how many work years could be too much
(What would you or sbd else guess)
>seemingly worked fine
That’s just it though, it never was. That C/C++ code base is like a giant all-brick building on a fault line. It’s going to collapse eventually, and your users/the people inside will pay the price.
>>seemingly worked fine
>That’s just it though, it never was. That C/C++ code base is like a giant all-brick building on a fault line. It’s going to collapse eventually, and your users/the people inside will pay the price.
Sure, but everything is a trade-off[1]. In this particular case (and many others) no user appeared to pay any price, which tells me that the price is a spectrum ranging from 'Nothing' to 'FullyPwned' with graduations in between.
Presumably the project will decide on what trade-off they are willing to make.
[1] If I understand your comment correctly, you are saying that any C/C++ project has a 100% chance of a 'FullyPwned' outcome.
What's somewhat interesting is memory safety is not a totally new concept.
I wonder whether, if memory safety had mattered more, other languages might have caught on a bit more, developed further, etc. Rust is the new kid, but memory safety in a language is not a totally new concept.
The iPhone has gone down the memory-unsafe path, including for high-sensitivity services like messaging (2007+). Apple has enough money to rewrite some of that if they cared to, but they haven't.
Weren't older language like Ada or Erlang memory safe way back?
Memory safe language that can compete with C/C++ in performance and resource usage is a new concept.
AFAIK Ada guarantees memory safety only if you statically allocate memory, and other languages have GC overhead.
Rust is really something new.
There are different classes of memory unsafety: buffer overflows, use after free, and double free being the main ones. We haven't seen a mainstream language capable of preventing use after free and double free without GC overhead until Rust. And that's because figuring out when an object is genuinely not in use anymore, at compile time, is a really hard problem. But a buffer overflow like from the article? That's just a matter of saving the length of the array alongside the pointer and doing a bounds check, which a compiler could easily insert if your language had a native array type. Pascal and its descendants have been doing that for decades.
> That's just a matter of saving the length of the array alongside the pointer and doing a bounds check, which a compiler could easily insert if your language had a native array type. Pascal and its descendants have been doing that for decades.
GCC has also had an optional bounds checking branch since 1995. [0]
GCC and Clang's sanitisation switches also support bounds checking, for the main branches, today, unless the sanitiser can't trace the origin or you're doing double-pointer arithmetic or further away from the source.
AddressSanitizer is also used by both Chrome & Firefox, and failed to catch this very simple buffer overflow from the article. It would have caught the bug, if the objects created were actually used and not just discarded by the testsuite.
[0]
https://gcc.gnu.org/extensions.html
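For illustration, a toy heap overflow (not the NSS bug itself) of the kind those sanitizer switches catch at runtime; the caveat above still applies, since the overflowing line has to actually execute under the sanitizer:

    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *buf = malloc(16);
        // One byte past the allocation: AddressSanitizer aborts with a
        // heap-buffer-overflow report when this line executes.
        memset(buf, 'A', 17);
        free(buf);
        return 0;
    }

    // Build and run (gcc or clang):
    //   cc -g -fsanitize=address overflow.c -o overflow && ./overflow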
> It would have caught the bug, if the objects created were actually used and not just discarded by the testsuite.
They were only testing with AddressSanitizer, not running the built binaries with it? Doing so is slow to say the least, but you can run programs normally with these runtime assertions.
It even has the added benefit of serving as a nice emulator for a much slower system.
> We haven't seen a mainstream language capable of preventing use after free and double free without GC overhead until Rust.
Sorry, that just isn’t the case. It is simple to design an allocator that can detect any double-free (by maintaining allocation metadata and checking it on free), and prevent any use-after-free (by just zeroing out the freed memory). (Doing so efficiently is another matter.) It’s not a language or GC issue at all.
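A toy sketch of the double-free half of that idea (a hypothetical wrapper; it ignores efficiency, thread safety, and the use-after-free part): keep a side table of live allocations and refuse a second free.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_LIVE 1024

    static void *live[MAX_LIVE];   // side table of currently-live allocations
    static size_t live_count;

    static void *checked_malloc(size_t n) {
        void *p = malloc(n);
        if (p && live_count < MAX_LIVE)
            live[live_count++] = p;
        return p;
    }

    static void checked_free(void *p) {
        for (size_t i = 0; i < live_count; i++) {
            if (live[i] == p) {
                live[i] = live[--live_count];   // remove from the table
                free(p);
                return;
            }
        }
        // Not in the table: either a double free or a pointer we never handed out.
        fprintf(stderr, "double free or invalid free of %p\n", p);
        abort();
    }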
> prevent any use-after-free (by just zeroing out the freed memory)
It's not quite that simple if you want to reuse that memory address.
Not reusing memory addresses is a definite option, but it won't work well on 32-bit (you can run out of address space). On 64-bit you may eventually hit limits as well (if you have many pages kept alive by small amounts of usage inside them).
It is however possible to make use-after-free _type_-safe at least, see e.g. Type-After-Type,
https://dl.acm.org/doi/10.1145/3274694.3274705
Type safety removes most of the risk of use-after-free (it becomes equivalent to the indexes-in-an-array pattern: you can use the wrong index and look at "freed" data but you can't view a raw pointer or corrupt one.). That's in return for something like 10% overhead, so it is a tradeoff, of course.
Rust is a definite improvement on the state of the art in this area.
One of the things I like about Zig is that it takes the memory allocator as a kind of “you will supply the correct model for this for your needs/architecture” as a first principle, and then gives you tooling to provide guarantees downstream. You’re not stuck with any assumptions about malloc like you might be with C libs.
On the one hand, you might need to care more about which allocator you use for a given use case. On the other hand, you can make the allocator “conform” to a set of constraints, and as long as it conforms at `comptime`, you can make guarantees downstream to any libraries, with a sort of fundamental “dependency injection” effect that flows through your code at compile time.
Zig is, however, not memory safe, which outweighs all of those benefits in this context.
It can be memory safe. It's up to you to choose memory safety or not. That's a piece of what I was getting at. Unless I misunderstand something. I've only dabbled with Zig.
I'm not aware of any memory safety that works in Zig, other than maybe Boehm GC. The GeneralPurposeAllocator quarantines memory forever, which is too wasteful to work in practice as one allocation is sufficient to keep an entire 4kB page alive.
I mean, if you don't care about efficiency, then you don't need any fancy mitigations: just use Boehm GC and call it a day. Performance is the reason why nobody does that.
Zeroing out freed memory in no way prevents UAFs. Consider what happens if the memory which was freed was recycled for a new allocation?
Maybe an example will help make it clearer? This is in pseudo-C++.
    struct AttackerChosenColor { size_t foreground_color; size_t background_color; };
    struct Array { size_t length; size_t *items; };

    int main() {
        // A program creates an array, uses it, frees it, but accidentally keeps
        // using it anyway. Mistakes happen; this sort of thing happens all of the
        // time in large programs.
        struct Array *array = new Array();
        ...
        free(array);
        // Imagine the allocation is zeroed here, like you said: the array length
        // is 0 and the pointer to the first item is 0.
        ...
        // The allocator can reuse the memory previously used for array and hand
        // it back for an attacker-controlled object. Getting this to happen
        // reliably is sometimes tricky, but it can be done.
        struct AttackerChosenColor *attacker = new AttackerChosenColor();

        // The attacker chooses a foreground color equal to SIZE_MAX. The
        // foreground_color *overlaps* with the array's length, so the stale
        // array now claims to hold SIZE_MAX elements.
        attacker->foreground_color = SIZE_MAX;

        // The attacker chooses a background color of 0. The background_color
        // *overlaps* with the array's items pointer, so the stale array now
        // points at address 0 -- every index becomes an absolute address.
        attacker->background_color = 0;

        // Now the attacker gets the program to reuse the dangling/stale pointer.
        // The memory was zeroed when it was freed, like you suggested, but it has
        // since been recycled as a color pair and filled with attacker-chosen
        // values. The bounds check always passes, and the attacker can write
        // whatever value they want wherever they want in memory: return
        // addresses, stack values, secret cookies, whatever they need to take
        // control of the program. They win.
        if (attacker_chosen_index < array->length) {
            array->items[attacker_chosen_index] = attacker_chosen_value;
        }
    }
> Zeroing out freed memory in no way prevents UAFs.
Maybe they meant it zeroes out all the references on free? This is possible if you have a precise GC, although not sure if it's useful.
The trick, in my view, is not that the language supports a safe approach (C++ has smart pointers / "safe" code in various libraries) but simply that you CAN'T cause a problem even by being an idiot.
This is where the GC languages did OK.
As far as I know, nothing in the C/C++ standard precludes fat pointers with bounds checking. Trying to access outside the bounds of an array is merely undefined behavior, so it would conform to the spec to simply throw an error in such situations.
There's address-sanitizer, although for ABI compatibility the bounds are stored in shadow memory, not alongside the pointer. It is very expensive though. A C implementation with fat pointers would probably have significantly lower overhead, but breaking compatibility is a non-starter. And you still need to deal with use-after-free.
I believe that's how D's betterC compiler [0] works, whilst retaining the memory safe features.
[0]
https://dlang.org/spec/betterc.html
> But a buffer overflow like from the article? That's just a matter of saving the length of the array alongside the pointer and doing a bounds check, which a compiler could easily insert
Both the array length and the index can be computed at runtime based on arbitrary computation/input, in which case doing bounds checks at compile time is impossible.
What new concept? When C and C++ appeared they were hardly useful for game development, which is why most games were coded in assembly.
After C, alongside Turbo Pascal, got a spot among the languages accepted for game development, it took about 5 years for C++ to join that spot, mostly triggered by Watcom C++ on the PC, and the PS2 SDK.
There was no GC overhead on the systems programming being done in JOVIAL, NEWP, PL/I,...
There were OSes written in memory safe languages before two people decided to create a toy OS in assembly for their toy PDP-7.
The issue isn't really that there was a shortage of memory safe languages, it's that there was a shortage of memory safe languages that you can easily use from C/C++ programs. Nobody is going to ship a JVM with their project just so they can have the "fun" experience of using Java FFI to do crypto.
Realistically Rust is still the only memory safe language that you could use, so it's not especially surprising that nobody did it 18 years ago.
> The issue isn't really that there was a shortage of memory safe languages, it's that there was a shortage of memory safe languages that you can easily use from C/C++ programs.
Just as importantly, there was also a shortage of memory safe languages that had good performance.
AFAIK the issue with messaging isn't that the core app itself is written in an unsafe language, but that many components it interacts with are unsafe. E.g. file format parsers using standard libraries to do it.
Granted those should also be rewritten in safer languages but often they're massive undertakings
In 2003 you could have generated C from a DSL, for one. Like yacc and lex had been standard practice (although without security focus) since the 80s.
Or generate C from a safe GP language, eg C targeting Scheme such as Chicken Scheme / Bigloo / Gambit.
People have been shipping software in memory safe languages all this time, since way before stack smashing was popularized in Phrack, after all.
I think a good approach could be what curl is doing. AFAIK they are replacing some security-critical parts of their core code with Rust codebases, importantly without changing the API.
Modula-2 (1978), Object Pascal (1980), .....
My language selection checklist:
1. Does the program need to be fast or complicated? If so, don't use a scripting language like Python, Bash, or Javascript.
2. Does the program handle untrusted input data? If so, don't use a memory-unsafe language like C or C++.
3. Does the program need to accomplish a task in a deterministic amount of time or with tight memory requirements? If so, don't use anything with a garbage collector, like Go or Java.
4. Is there anything left besides Rust?
By 'complicated' in point 1, do you mean 'large'? Because a complex algorithm should be fine -- heck, it should be _better_ in something like Python because it's relatively easy to write, so you have an easier time thinking about what you're doing, avoid making a mistake that would lead to an O(n³) runtime instead of the one you were going for, takes less development time, etc.
I assume you meant 'large' because, as software like WordPress beautifully demonstrates, you can have the simplest program (from a user's perspective) in the fastest language and still make it slow by using a billion function calls to render the default page of a default installation. If avoiding a slow language for large software is what you meant, then I agree.
And as another note, point number 2 basically excludes all meaningful software. Not that I necessarily disagree, but it's a bit on the heavy-handed side.
By complicated I guess I mean "lots of types". Static typing makes up for its cost once I can't keep all the types in my head at the same time.
Point number 2 excludes pretty much all network-connected software, and that's intentional. I suppose single-player games are ok to write in C or C++.
> 2 excludes pretty much all network-connected software
Not caring about local privilege escalation, I see ;). The attack surface might be lower, but from lock screens to terminals there is a lot of stuff out there that doesn't need to be connected to the internet itself for me to find it quite relevant to consider whether it was written in a dumb language.
I suspect Ada would make the cut, with the number of times it's been referenced in these contexts, but I haven't actually taken the time to learn Ada properly. It seems like a language before its time.
It's never too late to start!
https://pyjarrett.github.io/programming-with-ada/
As I understand it, it's only memory safe if you never free your allocations, which is better than C but not an especially high bar. Basically the same as GC'd languages but without actually running the GC.
It does have support for formal verification though unlike most languages.
> if you never free your allocations
Technically, it used to be memory safe before this and they rolled back the restrictions to allow "unchecked deallocation".
Pointers ("Accesses") are also typed, e.g. you can have two incompatible flavors of "Widget*" which can't get exchanged which helps reduce errors, and pointers can only point at their type of pointer unless specified otherwise, are null'd out on free automatically and checked at runtime. In practice, you just wrap your allocations/deallocations in a smart pointer or management type, and any unsafe usages can be found in code by looking for "Unchecked" whether "Unchecked_Access" (escaping an access check) or "Unchecked_Deallocation".
The story is quite different in Ada because of access type checks, it doesn't use null-terminated strings, it uses bounded arrays, has protected types for ensuring exclusive access and the language implicitly passes by reference when needed or directed.
My experience with writing concurrent Ada code has been extremely positive and I'd highly recommend it.
Ada has improved a lot since Ada 83, it is quite easy to use RAII since Ada 2005.
Re 3, people have known how to build real-time GCs since like the 70s and 80s. Lots of Lisp systems were built to handle real-time embedded systems with a lot less memory than our equivalent-environment ones have today. Even Java was originally built for embedded. While it's curious that mainstream GC implementations don't tend to include real-time versions (and for harder guarantees they'd need all their primitives documented with how long they'll execute for as a function of their input, which I don't think Rust has either), it might be worth scheduling 3-6 months of your project's planning to make such a GC for your language of choice if you need it. If you need to be hard real time, though, as opposed to soft, you're likely in for a lot of work regardless of what you do. And you're not likely to be building a mass-market application like a browser on top of various mass-market OSes like Windows, Mac, etc.
If your "deterministic amount of time" can tolerate single-digit microsecond pauses, then Go's GC is just fine. If you're building hard real time systems then you probably want to steer clear of GCs. Also, "developer velocity" is an important criterion for a lot of shops, and in my opinion that rules out Rust, C, C++, and every dynamically typed language I've ever used (of course, this is all relative, but in my experience, those languages are an order of magnitude "slower" than Go et al with respect to velocity, for a wide variety of reasons).
My impression was Go's GC was a heck of a lot slower than "single-digit microsecond pauses." I would love a source on your claim
I had seen some benchmarks several years ago, around the time the significant GC optimizations had been made, and I could've sworn they were on the order of single-digit microseconds; however, I can't find _any_ of those benchmarks today, and indeed benchmarks are hard to come by except for some pathological cases with enormous heaps. Maybe that single-digit µs value was a misremembering on my part. Even if it's sub-millisecond, that's plenty for a 60Hz video game.
If it can really guarantee single-digit microsecond pauses in my realtime thread no matter what happens in other threads of my application, that is indeed a game changer. But I'll believe it when I see it with my own eyes. I've never even used a garbage collector that can guarantee single-digit millisecond pauses.
Have you measured the pause times of free()? Because they are not deterministic, and I have met few people who understand in detail how complex it can be in practice. In the limit, free() can be as bad as GC pause times because of chained deallocation--i.e. not statically bounded.
People don't call free from their realtime threads.
This is true, but for performance's sake, you should not alloc/free in a busy loop, especially not on a real time system.
Allocate in advance, reuse allocated memory.
But you can generally control when free is called.
Not sure about the current state of the art, but Go's worst-case pause time five years ago was 100µs:
https://groups.google.com/g/golang-dev/c/Ab1sFeoZg_8
Discord was consistently seeing pauses in the range of several hundred ms every 2 minutes a couple years ago.
https://blog.discord.com/why-discord-is-switching-from-go-to...
Hard to say without more details, but those graphs look very similar to nproc numbers of goroutines interacting with the Linux-of-the-time's CFS CPU scheduler. I've seen latency graphs improve significantly, sometimes entirely, simply by setting GOMAXPROCS to account for the CFS behavior. Unfortunately the blog post doesn't even make a passing mention of this.
Anecdotally, the main slowdown we saw of Go code running in Kubernetes at my previous job was not "GC stalls" but "CFS throttling". By default[1], the runtime will set GOMAXPROCS to the number of cores on the machine, not the CPU allocation for the cgroup that the container runs in. When you hand out 1 core on a 96-core machine, bad things happen. Well, you end up with non-smooth progress. Setting GOMAXPROCS to ceil(cpu allocation) alleviated a LOT of problems.
Similar problems with certain versions of Java and C#[1]. Java was exacerbated by a tendency for Java to make everything wake up in certain situations, so you could get to a point where the runtime was dominated by CFS throttling, with occasional work being done.
I did some experiments with a roughly 100 Hz increment of a Prometheus counter metric, and with a GOMAXPROCS of 1, the rate was steady at ~100 Hz down to a CPU allocation of about 520 millicores, then dropped off (~80 Hz down to about 410 millicores, ~60 Hz down to about 305 millicores, then I stopped doing test runs).
[1] This MAY have changed, this was a while and multiple versions of the compiler/runtime ago. I know that C# had a runtime release sometime in 2020 that should've improved things and I think Java now also does the right thing when in a cgroup.
AFAIK, it hasn't changed, this exact situation with cgroups is still something I have to tell fellow developers about. Some of them have started using [automaxprocs] to automatically detect and set.
[automaxprocs]:
https://github.com/uber-go/automaxprocs
Ah, note, said program also had one goroutine trying the stupidest-possible way of finding primes (and then not actually doing anything with the found primes, apart from appending them to a slice). It literally trial-divided (well, modded) all numbers between 2 and isqrt(n) to see if n was a multiple. Not designed to be clever, explicitly designed to suck up about one core.
I found this go.dev blog entry from 2018. It looks like the average pause time they were able to achieve was significantly less than 1ms back then.
"The SLO whisper number here is around 100-200 microseconds and we will push towards that. If you see anything over a couple hundred microseconds then we really want to talk to you.."
https://go.dev/blog/ismmkeynote
I believe Java’s ZGC has max pause times of a few milliseconds
Shenandoah is in the the same latency category as well. I haven't seen recent numbers but a few years ago it was a little better latency but a little worse throughput.
3b. Does your program need more than 100Mb of memory?
If no, then just use a GC'd language and preallocate everything and use object pooling. You won't have GC pauses because if you don't dynamically allocate memory, you don't need to GC anything. And don't laugh. Pretty much all realtime systems, especially the hardest of the hard real time systems, preallocate everything.
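A minimal sketch of that pattern in C (hypothetical names; the same idea carries over to GC'd languages by allocating the pool once at startup and recycling objects instead of allocating in the hot path):

    #include <stdbool.h>
    #include <stddef.h>

    #define POOL_SIZE 256

    typedef struct {
        bool in_use;
        double payload[8];   // whatever the real-time loop actually needs
    } pooled_obj;

    // Allocated once, up front; no malloc/free (or GC) in the hot path.
    static pooled_obj pool[POOL_SIZE];

    static pooled_obj *pool_acquire(void) {
        for (size_t i = 0; i < POOL_SIZE; i++) {
            if (!pool[i].in_use) {
                pool[i].in_use = true;
                return &pool[i];
            }
        }
        return NULL;   // pool exhausted: a sizing bug, handled deterministically
    }

    static void pool_release(pooled_obj *obj) {
        obj->in_use = false;
    }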
> My language selection checklist:
1. What are the people who are going to implement this experts in?
Choose that. Nothing else matters.
pretty sure the people who wrote the vulnerable code were experts.
Answering a question with a sincere question: if the answer to 3 is yes to deterministic time, but no to tight memory constraints, does Swift become viable in question 4? I suspect it does, but I don’t know nearly enough about the space to say so with much certainty.
I'm not super familiar with Swift, but I don't see how it could be memory-safe in a multi-threaded context without some sort of borrow checker or gc. So I think it is rejected by question #2.
Swift uses automatic reference counting. From some cursory reading, the major difference from Rust in this regard is that Swift references are always tracked atomically, whereas in Rust they may not be atomic in a single-owner context.
To my mind (again, with admittedly limited familiarity), I would think:
- Atomic operations in general don’t necessarily provide deterministic timing, but I'm assuming (maybe wrongly?) for Rust’s case they’re regarded as a relatively fixed overhead?
- That would seem to hold for Swift as well, just… with more overhead.
To the extent any of this is wrong or missing some nuance, I’m happy to be corrected.
Incrementing an atomic counter every time a reference is copied is a significant amount of overhead, which is why most runtimes prefer garbage collection to reference counting (that, and the inability of referencing counting to handle cycles elegantly).
Rust doesn't rely on reference counting unless explicitly used by the program, and even then you can choose between atomically-reference-counted pointers (Arc) vs non-atomic-reference-counted pointers (Rc) that the type system prevents from being shared between threads.
I promise I’m not trying to be obtuse or argumentative, but I think apart from cycles your response restates exactly what I took from my reading on the subject and tried to articulate. So I’m not sure if what I should take away is:
- ARC is generally avoided by GC languages, which puts Swift in a peculiar position for a language without manual memory management (without any consideration of Swift per se for the case I asked about)
- Swift’s atomic reference counting qualitatively eliminates it from consideration because it’s applied even in single threaded workloads, negating determinism in a way I haven’t understood
- It’s quantitatively eliminated because that overhead has such a performance impact that it’s not worth considering
Swift has a similar memory model to Rust, except that where Rust forbids things Swift automatically copies them to make it work.
People using other languages appear terrified of reference count slowness for some reason, but it usually works well, and atomics are fast on ARM anyway.
It's important to note that while Swift often _allows_ for code similar to Rust to be written, the fact that it silently inserts ARC traffic or copies often means that people are going to write code that do things that Rust won't let them do and realize after the fact that their bottleneck is something that they would never have written in Rust. I wouldn't necessarily call this a language failure, but it's something worth looking out for: idiomatic Swift code often diverges from what might be optimally efficient from a memory management perspective.
To provide some context for my answer, I’ve seen, first hand, plenty of insecure code written in python, JavaScript and ruby, and a metric ton - measured in low vulnerabilities/M LoC - of secure code written in C for code dating from the 80s to 2021.
I personally don’t like the mental burden of dealing with C any more and I did it for 20+ years, but the real problem with vulnerabilities in code once the low hanging fruit is gone is the developer quality, and that problem is not going away with language selection (and in some cases, the pool of developers attached to some languages averages much worse).
Would I ever use C again? No, of course not. I’d use Go or Rust for exactly the reason you give. But to be real about it, that’s solving just the bottom most layer.
C vulnerabilities do have a nasty habit of giving the attacker full code execution though, which doesn’t tend to be nearly so much of a problem in other languages (and would likely be even less so if they weren’t dependent on foundations written in C).
I don’t disagree with you. But just before writing that message I was code reviewing some Python that was vulnerable to the most basic SQL injection. Who needs or wants execution authority when you can dump the users table of salted SHA passwords?
In C you can just use code execution to grab the passwords before they get salted…
One is the whole database.
I am very aware of code execution attacks and persistence around them. C has a class of failures that are not present in most other languages that are in current use, my point is that it really only solves part of the problem.
From a security perspective, the 90% issue is the poor quality of developers in whatever language you choose.
I think the difference is in the difficulty in preventing the bugs. I'd be very surprised if our codebase at work contained SQL injection bugs. We use a library that protects against them by default, and all code gets reviewed by a senior developer. SQL injection is easy to prevent with a simple code review process.
Subtle issues with undefined behaviour, buffer overflows, etc in C are much trickier and frequently catch out even highly experienced programmers. Even with high quality developers your C program is unlikely to be secure.
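For what it's worth, "a library that protects against them by default" comes down to parameterized queries. A sketch with SQLite's C API (error handling omitted; find_user is a hypothetical example, not code from any codebase being discussed):

    #include <sqlite3.h>

    // Look up a user by name without ever splicing untrusted input into SQL.
    int find_user(sqlite3 *db, const char *untrusted_name) {
        sqlite3_stmt *stmt = NULL;
        // The '?' placeholder is filled in by the library, so quotes or SQL
        // fragments in untrusted_name are always data, never syntax.
        sqlite3_prepare_v2(db, "SELECT id FROM users WHERE name = ?", -1, &stmt, NULL);
        sqlite3_bind_text(stmt, 1, untrusted_name, -1, SQLITE_TRANSIENT);

        int id = -1;
        if (sqlite3_step(stmt) == SQLITE_ROW)
            id = sqlite3_column_int(stmt, 0);
        sqlite3_finalize(stmt);
        return id;
    }

Reviewing for "never build SQL by string concatenation" is a one-line rule; reviewing for undefined behaviour in C is not.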
SQL or shell command injection also gives attackers full code execution.
> or something crazy like rust?
There's nothing crazy about rust.
If your example was ATS we'd be talking.
It’s not lost on me that the organization that produced NSS also invented Rust. That implies the knowledge of this need is there, but it’s not so straightforward to do.
> Or are C/C++ benefits just too high to ever give up?
FFI is inherently memory-unsafe. You get to rewrite security critical things from scratch, or accept some potentially memory-unsafe surface area for your security critical things for the benefit that the implementation behind it is sound.
This is true even for memory-safe languages like Rust.
The way around this is through process isolation and serializing/deserializing data manually instead of exchanging pointers across some FFI boundary. But this has non-negligible performance and maintenance costs.
Writing a small wrapper that enforces whatever invariants are needed at the FFI boundary is much, much easier to do correctly than writing a whole program correctly.
You are never going to get 100% memory safety in any program written in any language, because ultimately you are depending on someone to have written your compiler correctly, but you can get much closer than we are now with C/C++.
> FFI is inherently memory-unsafe
Maybe this specific problem needs attention. I wonder, is there a way we can make FFI safer while minimizing overhead? It'd be nice if an OS or userspace program could somehow verify or guarantee the soundness of function calls without doing it every time.
If we moved to a model where everything was compiled AOT or JIT locally, couldn't that local system determine soundness from the code, provided we use things like Rust or languages with automatic memory management?
This is a really hard problem because you have to discard the magic wand of a compiler and look at what is really happening under the hood.
At its most rudimentary level, a "memory safe" program is one that does not access memory that is forbidden to it at any point during execution. Memory safety can be achieved using managed languages or subsets[1] of languages like Rust - but that only works if the language implementations have total knowledge of memory accesses that the program may perform.
The trouble with FFIs is that by definition, the language implementations cannot know anything about memory accesses on the other side of the interface - it is foreign, a black box. The interface/ABI does not provide details about who is responsible for managing this memory, whether it is mutable or not, if it is safe to be reused in different threads, indeed even what the memory "points to."
On top of that, most of the time it's on the programmer to express to the implementation what the FFI even is. It's possible to get it wrong without actually breaking the program. You can do things like ignore signedness and discard mutability restrictions on pointers with ease in an FFI binding. There's nothing a language can do to prevent that, since the foreign code is a black box.
Now there are some exceptions to this, the most common probably being COM interfaces defined by IDL files, which are language agnostic and slightly safer than raw FFI over C functions. In this model, languages can provide some guarantees over what is happening across FFI and which operations are allowed (namely, reference counting semantics).
The way around all of this is simple : don't share memory between programs in different languages. Serialize to a data structure, call the foreign code in an isolated process, and only allow primitive data (like file descriptors) across FFI bounds.
This places an enormous burden on language implementers, fwiw, which is probably why no one does it. FFI is too useful, and it's simpler to tell a programmer not to screw up than to reimplement POSIX in your language's core subroutines.
[1] "safe" rust, vs "unsafe" where memory unsafety is a promise upheld by the programmer, and violations may occur
Is there a way to do FFI without having languages directly mutate each other's memory, but still within the same process? So all 'communication' between the languages happens by serializing data, with no shared memory being used for FFI, but you don't get the massive overhead of having to launch a second process.
You are still depending on the called function not clobbering over random memory. But if the called function is well-behaved you would have a clean FFI.
In practice you run all your process-isolated code in one process as a daemon instead of spawning it per call.
I think you're on the right track in boiling down the problem to "minimize the memory shared between languages and restrict the interface to well defined semantics around serialized data" - which in modern parlance is called a "remote procedure call" protocol (examples are gRPC and JSON RPC).
It's interesting to think about how one could do an RPC call without the "remote" part - keep it all in the same process. What you could do is have a program with an API like this :
int rpc_call (int fd);
where `fd` is the file descriptor (could even be an anonymous mmap) containing the serialized function call, and the caller gets the return value by reading the result back.
One tricky bit is thread safety, so you'd need a thread local file for the RPC i/o.
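A rough sketch of how the caller side of that could look on Linux (memfd_create is Linux-specific; rpc_call is the hypothetical foreign entry point from the signature above, and none of this is an established API):

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    // Hypothetical foreign entry point: reads the serialized request from fd,
    // overwrites it with the serialized response, returns a status code.
    int rpc_call(int fd);

    int call_foreign(const char *request, char *response, size_t response_cap) {
        // The only thing shared with the foreign code is this anonymous file --
        // no raw pointers into our heap ever cross the boundary.
        int fd = memfd_create("rpc", 0);
        if (fd < 0)
            return -1;
        write(fd, request, strlen(request));

        int status = rpc_call(fd);

        // Read back whatever the callee wrote, from the start of the file.
        pread(fd, response, response_cap, 0);
        close(fd);
        return status;
    }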
Presumably as long as FFIs are based on C calling conventions and running native instructions it would be unsafe. You could imagine cleaner FFIs that have significant restrictions placed on them (I'd imagine sandboxing would be required) but if the goal is to have it operate with as little overhead as possible, then the current regime is basically what you end up with and it would be decidedly unsafe.
C/C++ don’t really have “benefits”, they have inertia. In a hypothetical world where both came into being at the same time as modern languages no one would use them.
Sadly, I’m to the point that I think a lot of people are going to have to die off before C/C++ are fully replaced if ever. It’s just too ingrained in the current status quo, and we all have to suffer for it.
On any given platform, C tends to have the only lingua franca ABI. For that reason it will be around until the sun burns out.
The C ABI will outlive C, like the term "lingua franca" outlived the Franks. Pretty much every other language has support for the C ABI.
One complication is that it's not just about ABIs but at least as much about APIs. C headers often make some use of the preprocessor, and preprocessor usage is often even part of the API itself, i.e. APIs expose preprocessor macros to the API consumer.
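As a tiny illustration (a made-up header, not from any real library), none of these macro names exist in the compiled binary, so a purely ABI-level binding never sees them:

    /* example.h - illustrative only */
    #define EX_VERSION_MAJOR 2
    #define EX_FLAG_VERBOSE  (1u << 3)

    /* the documented "function" is really a macro over an internal entry point */
    #define ex_open(path) ex_open_internal((path), EX_VERSION_MAJOR)

    struct ex_handle;   /* opaque type; its layout never appears in the ABI either */
    struct ex_handle *ex_open_internal(const char *path, int abi_version);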
Zig has a built-in C compiler and supposedly you can just "include <header.h>" from within a Zig source file. Rust has a tool called bindgen. There are other tools, I haven't tried either of them, but the fact alone that I'm somewhat familiar with the Windows (and some other platforms') headers makes me not look forward to the friction of interfacing with some tried and true software platforms and libraries from within a different language.
I know there has been some work going on at Microsoft on porting the Windows APIs to a meta-language. Does anyone know how much progress was made on that front?
The Windows meta-API thing is done:
We'll probably continue using C headers as a poor ABI definition format, even without writing programs in C. Sort-of like JSON used outside of JavaScript, or M4 known as autoconf syntax, rather than a standalone language.
But C headers as an ABI definition format are overcomplicated and fragile (e.g. dependent on system headers, compiler-specific built-in defs, and arbitrary contraptions that make config.h), and not expressive enough at the same time (e.g. lacking thread-safety information, machine-readable memory management, explicit feature flags, versioning, etc.).
So I think there's a motivation to come up with a better ABI and a better ABI definition format. For example, Swift has a stable ABI that C can't use, so there's already a chunk of macOS and iOS that has moved past C.
Not disagreeing about the issues of header files and the difficulty of consuming them from other languages (which was my point).
But regarding ABI definitions, I suspect that introducing "thread-safety information, machine-readable memory management, explicit feature flags" will make interoperability at the binary level difficult or impossible, which is even worse.
Why do you think so? These things don't define binary layout, so they shouldn't interfere with it. In the worst case the extra information can be ignored entirely, and you'll have the status quo of poor thread safety, fragility of manual memory management, and crashes when the header version is different from the binary .so version (the last point is super annoying. Even when the OS can version .so libs, C gives you absolutely no guarantee that the header include paths and defines given to the compiler are compatible with the library paths given to the linker).
On Android, ChromeOS, IBM i, z/OS, ClearPath it isn't.
C/C++ will be around for at least a hundred years. Our descendants will be writing C/C++ code on Mars.
I don’t know about that, I can see Rust having a certain aesthetic appeal to martians.
Nah, they definitely use Zig.
Why not write it in a language not written yet?
As I replied elsewhere.
You don't need to explain to _Mozilla_ about rewriting code from C/C++ to Rust.
This is C code.
Stuff like void*, union and raw arrays do not belong in modern C++.
While C++ is compatible with C, it provides ways to write safer code that C doesn't.
Writing C code in a C++ project is similar to writing inline assembly.
Here,
https://android.googlesource.com/platform/ndk/+/refs/heads/m...
Well, there's no time machine.
Also, as far as I know, a full replacement for C doesn't exist yet.
Are you suggesting that this crypto library would not be possible or practical to be built with rust? What features of C enable this library which Rust does not?
There is no time machine to bring rust back to when this was created, but as far as I know, there is no reason it shouldn't be Rust if it was made today.
It's more of whether Rust fits into every workflow, project, team, build chain, executable environment, etc., that C does. Does rust run everywhere C runs? Does rust build everywhere C builds? Can rust fit into every workflow C does? Are there rust programmers with all the same domain expertise as for C programmers?
(Not to mention, the question here isn't whether to write in rust or write in C. It's whether to leave the C code as-is -- zero immediate cost/time/attention or rewrite in rust -- a significant up-front effort with potential long term gains but also significant risk.)
Rust does not run everywhere C runs. At least not yet - there's a couple efforts to allow rust to compile to all platforms GCC supports[1]. But we don't need rust to work everywhere C works to get value out of a native rust port of OpenSSL. Firefox and Chrome (as far as I know) only support platforms which have rust support already.
As I said in another comment, in my experience, porting code directly between two C-like languages is often much faster & cheaper than people assume. You don't have to re-write anything from scratch; just faithfully port each line of code in each method and struct across; with some translation to make the new code idiomatic. 1kloc / day is a reasonable ballpark figure, landing us at about one person-year to port boringssl to rust.
The downside of this is that we'd probably end up with a few new bugs creeping in. My biggest fear with this sort of work is that the fuzzers & static analysis tools might not be mature enough in rust to find all the bugs they catch in C.
[1]
https://lwn.net/Articles/871283/
Speaking as a static analysis person, C and C++ are unholy nightmares for static analysis tools to work with. Even very very basic issues like separate compilation make a mess of things. If the world can successfully shift away from these languages, the supporting tools won't have trouble keeping up.
>"1kloc / day is a reasonable ballpark figure, landing us at about one person-year to port boringssl to rust."
Porting 1k lines a day, testing and catching and fixing errors, into a language with an incompatible memory model, and doing it for a year, is insanity. Programmers who propose this kind of productivity most likely have no idea about the real world.
There’s an old saying attributed to Abe Lincoln: “Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”
I contend that most software engineers (and most engineering houses) don’t spend any time sharpening the axe. Certainly not when it comes to optimizing for development speed. Nobody even talks about deliberate practice in software. Isn’t that weird?
1000 lines of code is way too much code to write from scratch in a day, or code review. I claim it’s not a lot of code to port - especially once you’re familiar with the code base and you’ve built some momentum. The hardest part would be mapping C pointers into rust lifetimes.
> most likely have no idea about real world.
I have no doubt this sort of speed is unheard of in your office, with your processes. That doesn’t tell us anything about the upper bound of porting speed. If you have more real world experience than me, how fast did you port code last time you did it?
Mind you, going hell for leather is probably a terrible idea with something as security sensitive as BoringSSL. It's not the project to skip code review. And code review would probably take as long again as porting the code itself.
I'm also really skeptical that one could maintain 1K lines per day for more than a couple weeks, if that.
There have been a lot of studies that measure average output of new code at only ~15 or so LOC/day. One can manage more on small projects for a short amount of time.
I could believe porting between two C-like languages is 1 order of magnitude easier, but not 2. Std library differences, porting idioms, it adds up.
Even just reading and really understanding 1K lines/day is a lot.
I'd love to see some data about that. Even anecdotal experiences - I can't be the only one here who's ported code between languages.
I agree with that LOC/day output figure for new code. I've been averaging somewhere around that (total) speed for the last few months - once you discard testing code and the code I'll end up deleting. Its slower than I usually code, but I'm writing some very deep, new algorithmic code. So I'm not beating myself up about it.
But porting is very different. You don't need to think about how the code works, or really how to structure the data structures or the program. You don't need much big picture reasoning at all, beyond being aware of how the data structures themselves map to the target language. Most of the work is mechanical. "This array is used as a vec, so it'll become Vec<>. This union maps to an enum. This destructor should actually work like this...". And for something like boringSSL I suspect you could re-expose a rust implementation via a C API and then reuse most of the BoringSSL test suite as-is. Debugging and fuzz testing is also much easier - since you just need to trace along the diff between two implementations until you find a point of divergence. If the library has tests, you can port them across in the same way.
The only data point I have for porting speed is from when I ported chipmunk2d in a month from C to JS. Chipmunk only has a few data structures and it was mostly written by one person. So understanding the code at a high level was reasonably easy. My ported-lines-per-day metric increased dramatically throughout the process as I got used to chipmunk, and as I developed norms around how I wanted to translate different idioms.
I have no idea how that porting speed would translate to a larger project, or with Rust as a target language. As I said, I'd love to hear some stories if people have them. I can't be the only one who's tried this.
> I have no idea how that porting speed would translate to a larger project
IME it doesn't.
I'm not at liberty to discuss all that much detail, but what I can say is: this was a mixture of C and C++ -> Scala. It was pretty ancient code which used goto fairly liberally, etc., so it would often require quite a lot of control flow rewriting -- and that takes a looooong time. I'd be lucky to get through a single moderately complex multiply nested loop per day. (Scala may be a bit of an outlier here because it doesn't offer a C-like for loop, nor does it offer a "native" break statement.)
I've ported / developed from scratch way more than 1000 lines a day on an emergency basis. Doing it reliably and with proper testing every day for a year - thanks, but no thanks.
>"I have no doubt this sort of speed is unheard of in your office, with your processes."
I am my own boss for the last 20+ years and I do not have predefined processes. And I am not in a dick size contest.
Most fuzzers I’m aware of work with Rust. You can use the same sanitizers as well.
Static analysis means a wide range of things, and so some do and some don’t work with Rust. I would be very interested to learn about C static analysis that somehow wouldn’t work with Rust at all; it should be _easier_ to do so in rust because there’s generally so much more information available already thanks to the language semantics.
Eg: the borrow checker is a kind of static analysis that's quite difficult to do soundly for C/C++ (and most other languages)
It only matters if rust runs everywhere that Firefox runs, which it does.
Considering firefox already uses rust components, that seems a safe bet.
I'd like to write go or rust but embedded constraints are tough. I tried and the binaries are just too big!
How big is too big? I haven't run into any size issues writing very unoptimized Go targeting STM32F4 and RP2040 microcontrollers, but they do have a ton of flash. And for that, you use tinygo and not regular go, which is technically a slightly different language. (For some perspective, I wanted to make some aspect of the display better, and the strconv was the easiest way to do it. That is like 6k of flash! An unabashed luxury. But still totally fine, I have megabytes of flash. I also have the time zone database in there, for time.Format(time.RFC3339). Again, nobody does that shit on microcontrollers, except for me. And I'm loving it!)
Full disclosure, Python also runs fine on these microcontrollers, but I have pretty easily run out of RAM on every complicated Python project I've done targeting a microcontroller. It's nice to see if some sensor works or whatever, but for production, Go is a nice sweet spot.
I have 16MB of flash and I wanted to link in some webrtc Go library and the binary was over 1MB. As I had other stuff it seemed like C was smaller.
I took a look at using github.com/pion/webrtc/v3 with tinygo, but it apparently depends on encoding/gob which depends on reflection features that tinygo doesn't implement. No idea why they need that, but that's the sort of blocker that you'll run into.
The smallest binary rustc has produced is 138 bytes.
It is true that it’s not something you just get for free, you have to avoid certain techniques, etc. But rust can fit just fine.
Do you have a link to an article / report about that 138 byte program? I'd be interested how to achieve that.
https://github.com/tormol/tiny-rust-executable
That repo got it down to 137 bytes, but here's a blog post explaining the process of getting it to 151:
http://mainisusuallyafunction.blogspot.com/2015/01/151-byte-...
I'd also like to see the smallest Rust binaries that are achieved by real projects. When the most size-conscious users use Rust to solve real problems, what is the result?
Our embedded OS, Hubris, was released yesterday; with an application as well, the entire image is like, 70k? I'll double check when I'm at a computer.
Yeah, so, using arm-none-eabi-size, one of these apps I have built, the total image size is 74376 bytes. The kernel itself is 29232 of those, the rest are various tasks. RCC driver is 4616, USART driver is 6228.
We have to care about binary size, but we also don't have to _stress_ about it. There are some things we could do to make sizes get even smaller, but there's no reason to go to that effort at the moment. We have the luxury of being on full 32-bit environments, so while there isn't a ton of space by say, desktop or phone standards, it's also a bit bigger than something like a PIC.
Lots of folks also seem to get the impression that Rust binary sizes are bigger than they are because they use ls instead of size; we care deeply about debugging, and so the individual tasks are built with debug information. ls reports that rcc driver is 181kb, but that's not representative of what actually ends up being flashed. You can see a similar effect with desktop Rust software by running strip on the binaries, you'll often save a ton of size from doing that.
This is not true. Lots of people are putting Rust on microcontrollers now - just have to stick to no_std.
I wondered this when I recently saw that flatpak is written in C. In particular for newer projects that don't have any insane performance constraints I wonder why people still stick to non memory-managed languages.
Dumb question: Do we need to use C++ anymore? Can we just leave it to die with video games? How many more years of this crap do we need before we stop using that language. Yes I know, C++ gurus are smart, but, you are GOING to mess up memory management. You are GOING to inject security issues with c/c++.
If you drop the parts of C++ that are that way because of C, it is a much safer language. Weird and inconsistent, but someone who is writing C++ wouldn't make the error in the code in question any more than they would in rust. In C++ we never use unbounded arrays, just vectors and other bounded data structures.
I often see students asking a C++ question, and when I tell them that is wrong they respond that their professor has banned vector. We have a real problem with bad teachers in C++; too many people learn to write C that builds with a C++ compiler and once in a while has a class.
If I'm going to be making code that needs to run fast, works at a bit level, and isn't exposed to the world, then I am picking up C++.
It's more convenient than C. It's easier to use (at the cost of safety) compared to Rust.
Perhaps this will change if I know rust better. But for now C++ is where it's at for me for this niche.
C/C++ is great for AI/ML/Scientific computing because at the end of the day, you have tons of extremely optimized libraries for "doing X". But the thing is, in those use cases your data is "trusted" and not publicly accessible.
Similarly in trading, C/C++ abounds since you really do have such fine manual control. But again, you're talking about usage within internal networks rather than publicly accessible services.
For web applications, communications, etc.? I expect we'll see things slowly switch to something like Rust. The issue is getting the inertia to have Rust available to various embedded platforms, etc.
I'm a big proponent of rust, but I doubt rust will displace nodejs & python for web applications.
Web applications generally care much more about developer onboarding and productivity than performance. And for good reason - nodejs behind nginx is fast enough for the bottom 99% of websites. Rust is much harder to learn, and even once you've learned it, it's more difficult to implement the same program in rust than it is in javascript. Especially if that program relies heavily on callbacks or async/await. For all the hype, I don't see rust taking off amongst application developers.
It's a great language for infrastructure & libraries though. This bug almost certainly wouldn't have existed in rust.
Please....
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=rust
I read "double free", "denial of service", "out-of bounds read", "NULL pointer dereference", etc...
And that's a list of vulnerabilities found for a language that is barely used compared to C/C++ (in the real world).
It won't change. C/C++ dominates and will dominate for a very long time.
You need to read the list more carefully.
• The list is not for Rust itself, but every program ever written in Rust. By itself it doesn't mean much, unless you compare prevalence of issues among Rust programs to prevalence of issues among C programs. Rust doesn't promise to be bug-free, but merely catch certain classes of bugs in programs that don't opt out of that. And it delivers: see how memory unsafety is rare compared to assertions and uncaught exceptions:
https://github.com/rust-fuzz/trophy-case
• Many of the memory-unsafety issues are on the C FFI boundary, which is unsafe due to C lacking expressiveness about memory ownership in its APIs (i.e. it shows how dangerous it is to program where you don't have the Rust borrow checker checking your code).
• Many bugs about missing Send/Sync or evil trait implementations are about type-system loopholes that prevented the compiler from catching code that was already buggy. C doesn't have these guarantees in the first place, so in C you have these weaknesses all the time by design, rather than in exceptional situations.
> The list is not for Rust itself, but every program ever written in Rust.
This is, I think, obvious unless you are talking about C/C++ compiler bugs (which I am not).
But if you think of it that way, the same happens with C/C++! Besides compiler bugs, the published CVEs for C/C++ "are not for C/C++ itself, but every program ever written in C/C++".
Still, potential severe vulnerabilities in Rust stdlib can happen too (
https://www.cvedetails.com/cve/CVE-2021-31162/
or
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-3631...
), so there is a blurry line between what counts as "a bug in Rust" and what doesn't.
> unless you compare prevalence of issues among Rust programs to prevalence of issues among C programs.
I don't know, but you mention an interesting PoV. Do you propose comparing the ratio of the total quantity of code EVER written in C/C++ to the number of "issues" found, against the same ratio for code EVER written in Rust?
I guess nobody can figure out those numbers, but if I had to guess, the ratio of total "quantity of code" to "issues" would be in favor of C/C++.
I don't want to discuss the cause/reason of those Rust vulnerabilities, but the message I read is: _"unaware programmers can still introduce potential UAF, double-free, read uninitialized memory, dereference NULL pointers, etc. Just like with C/C++ but just in smaller quantities"_.
Mitre doesn't tag every program written in C with the C tag.
A few bugs in std happened, but they're also mostly in edge-case situations that in C/C++ would be either straight-up UB or "you're bad for even trying this", like integer overflow, throwing an exception from a destructor, or implementing operator overloading that gives randomized results.
It's not a tag but a keyword search. I don't know projects entirely written in Rust (other than single crates) to look for (by name). So probably there are more Rust-based vulnerabilities around than the ones from the Mitre query from the link.
And yes, edge-cases are the worst. The only concern is that these vulnerabilities were introduced by people who know the language more than anyone (I'd like to think that patches mainlined into std are written, revised and maintained by the best Rust developers). I'm afraid to imagine what kind of vulnerabilities could a person like me introduce in my own Rust programs.
That's the point of splitting Rust into safe and unsafe. If you're not trusting yourself with C-like danger, then stick to writing safe Rust. When you need to do something risky, then it will require unsafe{} blocks, which are a signal to be extra careful, and stand out in code reviews.
Also keep in mind that std is in unusual position, because it provides the unsafe foundation for safe programs. For example, you as a Rust user can't cause any memory unsafety when using the String type, but std had to define it for you from unsafe primitives. This concept of building safe abstraction from unsafe primitives is similar to Python: CPython is written in an unsafe language, but this unsafety is hidden from Python programs.
>"Dumb question..."
Yep
C++ is awesome and fast, don't blame it for human error.
when "human error" happens at a much higher rate than the alternatives, it's fair to blame it.
FWIW, Go absolutely would not stop you writing unbounded data into a bounded struct. Idiomatic Go would be to use byte slices which auto-resize, unlike idiomatic C, but you still have to do it.
Go would stop this from being exploitable. You might be able to make a slice larger than it is "supposed" to be, but it won't overwrite anything else because Go will be allocating new memory for your bigger slice.
But this is hardly a big claim for Go. The reality is that of all current languages in use, _only_ C and C++ will let this mistake happen and have the consequence of overwriting whatever happens to be in the way. Everything else is too memory safe for this to happen.
What's the exploit path assuming no use of unsafe?
I can see situations where I could probably get go to crash, but not sure how I get go to act badly.
Note: Not a go / Haskell / C# expert so understanding is light here.
Go is sometimes considered memory unsafe because of the presence of data races. (This is a controversial semantics.)
Even in the case of data races, you could not develop an exploit like the one discussed in this blog post, right? It's kinda a non-sequitur in this context?
Go allows data races to arbitrarily corrupt metadata, which is the precursor to an exploit like this. A brief rule-of-thumb is if the race allows you to directly touch data that isn't available via APIs, such as the count on a vector–once you do that, you can "resize" it to be larger than its actual size, and run off the end to do whatever you want. (There are many other ways to achieve something similar: type confusion, etc.)
Then Java is also unsafe by the same standard.
Why do you say that? Go's data races can produce memory corruption through bounds checking failures. I'm not aware of Java having that kind of memory corruption.
Yes. Java and C# are unsafe by the same standard. Last I checked, they both allow multiple threads to nonatomically modify a raw variable - which can lead to data races and data corruption.
I haven't heard of anyone getting RCE out of this class of bug in C# or java though.
JVM semantics[1] don't allow this to corrupt program state arbitrarily, just the data variables in question, right? Whereas in Go.. Search for "corruption" in
https://research.swtch.com/gomm
[1] Not just Java, meaning all the many nice JVM languages enjoy this is as well, eg Clojure, Scala etc
It's common to exploit JavaScript engines with this kind of bug, and JS engines probably have better security teams at this point, so I expect you could get an RCE if you really tried.
Yes, but those would be bugs in the runtime itself, rather than in the programming language. Java and JavaScript both define behavior of racy code; Go does not.
Go has no "unsafe" keyword and several parts of the language are unsafe, you're thinking of Rust which has much tighter guarantees.
Go idioms, like accepting data into buffers that are resized by "append", work around the unsafe parts of the language.
Go has an unsafe package.
Is there an example of even "bad" go code that gets you from a overflow to an exploit? I'm curious, folks (usually rust folks) do keep making this claim, is there a quick example?
You can totally do this with bad concurrency in Go: read-after-write of an interface value may cause an arbitrarily bad virtual method call, which is somewhat UB.
I am not aware of single-goroutine exploits, though.
Concurrency issues / bad program flow feel a bit different don't they? I mean, I can store the action to take on a record in a string in any language, then if I'm not paying attention on concurrency someone else can switch to a different action and then when that record is processed I end up deleting instead of editing etc.
I mention this because in SQL folks not being careful end up in all sorts of messed up situations with high concurrency situations.
It's a different kind of bug–changing the type on a record cannot give you a shell, it can just let you do something funny with the record, such as deleting it. Which is bad, of course, but a _bounded_ bad.
Memory corruption is _unbounded_ bad: in general, corruption is arbitrary code execution. Your program might never interact with the shell but it's going to anyways, because an attacker is going to redirect code execution through the libc in your process. This is just not possible in languages like Java* , which provide details of what kinds of (mis)behaviors are permissible when a race occurs. The list of things is always something like "one of the two writes succeeds" or similar, not "¯\_(ツ)_/¯".
*Barring bugs in the runtime, which do exist…but often because they're written in unsafe languages ;) Although, a bug in runtime written in a safe language will also give you arbitrary code execution…but that's because the job of the code is to enable arbitrary code execution.
Is it idiomatic go to memcpy into a struct? I would think that this whole paradigm would be impossible in safe golang code.
That's what I'm trying to understand.
Let's ignore idiomatic code, people do crazy stuff all the time.
What's the go example that gets you from for example an overflow to exploit? That's what I'm trying to follow (not being an expert).
I am skeptical that you could do it without either using the assembler or the unsafe package, but we will see what Julian says.
Idiomatic go would have you using bounded readers though.
Either you read data into a fixed byte[] and stop at its capacity, or you read data into an unbounded byte[] by using append, and Go looks after the capacity, either way, you can't go off the end.
This sounds like a very good argument for switching over to Rust.
Genuine question: How to switch codes written in 2003 to Rust?
Maybe this is not so much about switching individual existing projects to Rust, but about switching the "industry".
Some new projects are still written in C.
Yeah, this makes sense. I'm optimistic and would like to say we're already half-way there. From my PoV very few new projects were written in C in recent years, except those inherently-C (their purpose was to call some other libraries written in C, or libc) and/or embedded (code size and portability requirements).
The same way you would write code written in 2021 over to Rust: by rewriting it from the ground up.
Auto-translation won't work because Rust won't allow you to build it in the same way you would have built it in C. It requires a full up redesign of the code to follow the Rust development model.
> Auto-translation won't work because Rust won't allow you to build it in the same way you would have built it in C.
That is not entirely true, but if you translate the C code to Rust, you get C code, in Rust, with similar issues (or possibly worse).
Of course the purpose would be to clean it up from there on, but it's unclear whether that's a better path than doing the conversion piecemeal by hand. The C2Rust people certainly seem to think so, but I don't know if there are good "client stories" about that path so far, whereas the manual approach does have some (e.g. librsvg), though it's not for the faint of heart.
> That is not entirely true, if you translate the C code to Rust, you get C code, in Rust, with similar issues (or possibly worse).
thus it was basically true after all? Like, sure, Rust is turing-complete so you can simulate whatever C did and thus _technically_ you can translate anything that C can do into Rust. But if it doesn't fix any problems, then have you really translated it into Rust?
> thus it was basically true after all?
No?
> Like, sure, Rust is turing-complete so you can simulate whatever C
It's not simulating anything, and has nothing to do with turing completeness.
> But if it doesn't fix any problems, then have you really translated it into Rust?
Yeees? Unless your definition of "translated" has nothing to do with the word or the concept.
You end up with a project full of rust code which builds using rust's toolchains. That sounds like a translation to me.
The jury's out on whether Rust code with unsafe around all of it is better than C. It's good that you can slowly reduce the unsafe, which is not something you can do with C, but the code is frequently harder to read and may have new bugs.
> The jury's out on whether Rust code with unsafe around all of it is better than C.
The goal definitely isn't to keep it as is, the entire point of C2Rust is to provide a jumping point then not have to shuffle between the two as you perform the conversion.
librsvg 1.0 was in 2001. Federico Mena-Quintero started switching it to Rust in 2017 (well technically October 2016), the rewrite of the core was finished early 2019, though the test suite was only finished converting (aside from the C API tests) late 2020.
So... carefully and slowly.
Thanks, that's exactly what I'm seeking for.
Before that I'd never heard of a project which successfully did a C-to-Rust transition, kept its C API intact, and could be used as a drop-in replacement. Glad to hear that there are already some success stories.
Not quite the same, but maybe of interest:
https://daniel.haxx.se/blog/2020/10/09/rust-in-curl-with-hyp...
By work? It's pure risk-management, do you need it? Is it worth the potential risk/work?
Slowly and steadily.
More than just one memory safe language
But Mozilla is already using Rust. They are a major proponent of Rust, so if they did switch to a memory safe language it would seem like Rust would be the most likely choice for them.
Not very many with 1) no garbage collection and 2) any meaningful open source adoption.
This component (NSS) would work fine with GC.
Rust has been around for more than a decade now and is still very niche. Evidently, it isn't a good enough argument.
I don't understand your argument.
In 1982 C was a decade old and still very niche.
In 1992 C++ was a decade old and still very niche.
In 2002 Python was about a decade old and still very niche.
In 2005 Javascript was a decade old and still very niche (only used on some webpages, the web was usable without javascript for the most part).
I think it's safe to say that all of them went on to enjoy quite a bit of success/widespread use.
Some languages take off really fast and go strong for a long time (php and java come to mind).
Some languages take off really fast and disappear just as fast (scala, clojure).
Some languages get big and have a long tail, fading into obscurity (tcl, perl).
Some languages go through cycles of ascendancy and descendancy (FP languages come to mind for that).
Dismissing a language because of its adoption rate seems kinda silly - no one says "don't use python because it wasn't really popular til it had existed for over a decade".
Languages take off broadly because there's something compelling about them (and it isn't necessarily a technical reason). One of the most compelling reasons for adopting Rust is memory safety, and that may not be terribly compelling.
The comment about it being over a decade old was mostly that it wasn't some new thing that people are unsure about where it can be used. It's mature and has been successful in some niches (and keep in mind that niches can be large).
How is memory safety not a compelling argument when we're literally in a thread about memory unsafety leading to security exploits?
> How is memory safety not a compelling argument when we're literally in a thread about memory unsafety leading to security exploits?
The ratio of { memory-safety-bugs-in-code : bugs-in-code } is too small in many cases to warrant redoing the entire project.
My last C (and C++) role: over the course of 3 years, a large C++ project had over 1000 bug reports closed, of which _one_ turned out to be a memory safety issue that would have been prevented in Rust. My previous C position had a similar rate.
It's hard to convince the company to throw a 5-person team at a rewrite project for 3 years just to avoid the bug that they got in the previous 3 years, in the process incurring _new_ bugs.
If you're the owner of a company, you'd dismiss anyone who tells you that you need to spend a few million redoing work already completed, with additional risk that the redone work will have _new_ errors that were already fixed in the existing work.
But the “vulnerability value” of memory-safety bugs is much higher: it’s almost always a security exploit.
Also, you don’t have to rewrite your code base immediately, or at all: just build new parts in Rust or another compatible memory-safe language.
your code probably isn't worth attacking
> your code probably isn't worth attacking
Maybe, maybe not. What bar do _you_ set for "this product is worth attacking"? Because the product owners and the product users definitely thought that their product was valuable[1].
[1]The products in question were 1) Payment acquirer and processor, and 2) Munitions control software.
a product can be valuable without being worth attacking, like something that only runs on trusted inputs. the bar for software "worth attacking" for me is that there are people who are paid mainly to find exploits on it, and not just incidentally as a product of development.
> a product can be valuable without being worth attacking, like something that only runs on trusted inputs. the bar for software "worth attacking" for me is that there are people who are paid to find exploits on it.
I'm in agreement, but then we get back to the fact that memory safety may not be a compelling argument to use a new language.
After all, _most_ products aren't "worth attacking" until they are large and successful enough, thus reinforcing the decision not to rewrite a product in a new language. This means that memory safety alone is not a compelling argument for switching languages.
why is the "payment processing" software written in c/c++ and not something like java in the first place? i would imagine not needing to care too much about memory would speed up development velocity. was it before java became popular? i can understand the munitions control software possibly running on a constrained device and not needing very sophisticated memory management (cue the old joke about the "ultimate in garbage collection").
> why is the "payment processing" software written in c/c++ and not something like java in the first place?
Until recently[1] payment terminals were memory constrained devices. Even right now, a significant portion of payment terminals are constrained (128MB of RAM, slow 32-bit processors).
Even though Java was around in 2006, the payment terminals I worked on then ran on 16-bit NEC processors (8088, basically).
Even if we _aren't_ looking at payment terminals (or any of the intermediaries), there's still a large and not-insignificant class of devices for which nothing but C or a reduced set of C++ is possible. Having common libraries (zip, tls, etc) written in C means that those libraries are usable on all devices from all manufacturers.
[1] The industry-wide trend right now is a move towards android-based terminals. While this does mean that you can write applications in Java and Kotlin for these terminals, it's still cheaper to port the existing C or C++ codebase to Android and interface via JNI, as that reduces the costs (of which certification is a significant minority).
Even right now, there is more portability in writing the EMV transaction logic in plain C because then it can be used from almost anywhere. A team that went ahead and wrote the core payment acquisition logic in Java would find themselves offering a smaller variety of products and will soon get beaten in the market by those manufacturers who packaged the logic up into a C library.
Don't underestimate how price-sensitive embedded consumers are. A savings of a few dollars per product can absolutely lead to a really large leg up over the competition.
idk, many large companies report about 70% of their security issues are due to memory unsafety. reducing your security bugs by a factor of 3 sounds pretty compelling to me...
Yeah, it's weird. I really don't know why Rust isn't more popular.
Edit: I just read another comment you made about somebody's program not being worth attacking. I think you are on to something with that argument. Most software isn't worth attacking.
> Some languages take off really fast and disappear just as fast (scala, clojure).
I don't know about Clojure but I don't think that Scala has "disappeared". The hype has subsided, certainly. I for one certainly hope that one of the best programming languages in existence doesn't disappear.
Actually, we are talking about the creators of Rust here, the same guys who were championing the idea of rewriting the entire browser in it. The more plausible reason might be that the rewrite to Rust hasn't advanced to this component yet.
Yeah, that could be. I was speaking about the wider development ecosystem. Rust is doing well in a few places and that's enough for it to survive and exist long term, or at least as long as Mozilla is relevant.
Mozilla's relevance hasn't mattered to Rust for a while.
So if Mozilla decided to step back from Rust it wouldn't be a major blow to Rust? I was under the impression that they were still important.
They already stepped back a year ago. They fired most of the rust team.
https://blog.mozilla.org/en/mozilla/changing-world-changing-...
They fired all of their employees who worked on Rust as their job, but that was a very small percentage of the overall Rust team, to be clear. Like, one or two percent.
They are still members of the Rust Foundation though.
A decade seems like an appropriate amount of time for a language to mature and take off. Ruby was very niche for 10 years until rails came out. Rust now seems to be spreading pretty steadily and smaller companies are trying it out.
I guess the ecosystem that might make a language attractive was not built overnight. I'm not sure looking at the popularity since an initial release is the best way to measure how good a language is for a particular purpose.
To me, PORT_Memcpy is one problem here.
There are two buffers and one size -- the amount of memory to copy.
There should be PORT_Memcpy2(pDest, destSize, pSource, numBytesToCopy) (or whatever you want to call it), which at least prompts the programmer to account for the size of the destination buffer.
Then flag all calls to PORT_Memcpy and at least make a dev look at it.
(Same for the various similar functions like strcpy, etc.)
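A rough sketch of what such a wrapper might look like (PORT_Memcpy2 is a hypothetical name, this is not actual NSS code, and the failure behaviour - abort, truncate, or error return - is a separate debate):

    #include <string.h>   /* memcpy */
    #include <stdlib.h>   /* abort */

    void *PORT_Memcpy2(void *pDest, size_t destSize,
                       const void *pSource, size_t numBytesToCopy) {
        if (numBytesToCopy > destSize) {
            /* Copying more than the destination holds: fail loudly
               instead of silently overflowing. */
            abort();
        }
        return memcpy(pDest, pSource, numBytesToCopy);
    }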
Of course it would just end up being
PORT_Memcpy2(cx->u.buffer, sigLen, sig->data, sigLen);
Someone could do that, but the point of having the dest buffer size is to at least give the programmer a chance to try to get it right.
I also wonder if a linter could notice that the dest buffer size passed isn't the actual size of the buffer. (That leads to the next problem in the code, if you look at the definition of that buffer, so that's good.)
numBytesToCopy can still differ from the source buffer's size, though; if it's larger, that's a potential read out of bounds.
Ah; the title means "this shouldn't have happened [because the vender was in fact doing everything right]", not "this shouldn't have happened [because it's so stupid]".
Don't think for a minute this wasn't on purpose.
Project Zero exists for the sole purpose of trashing and defacing Google's competition.
In the absence of actual process failure to report on they just resort to a disparagingly memorable title.
This is baseless. As per the article Google Chrome used NSS by default for years during which this vulnerability existed, so they're admitting their own product was affected. The article goes into detail about how Google's oss-fuzz project neglected to find this bug.
The author was even so kind as to boldface the first sentence here saying "the vendor did everything right":
> This wasn’t a process failure, the vendor did everything right. Mozilla has a mature, world-class security team. They pioneered bug bounties, invest in memory safety, fuzzing and test coverage.
I don't know how anyone could find a more gracious way to find and publish a vulnerability.
>_This wasn’t a process failure, the vendor did everything right. Mozilla has a mature, world-class security team. They pioneered bug bounties, invest in memory safety, fuzzing and test coverage._
Yep, definitely sounds like Project Zero is trashing Mozilla in this blog post.
They checked for process failure didn't they?
Nobody will remember that line. Everyone is going to remember the title.
The title doesn't name a vendor... So you'd have to read the article to see the vendor, where you would presumably read the line where they say they have a "world-class security team" among other praise.
I don't like Google one bit, but my god these are some extraordinary hoops people are jumping through just so they can yell "Google's evil!".
I mean Google has this blog specifically to report on security vulnerabilities.
That is literally like Volvo running a YT channel where they crash test cars from other companies and assess the damage to the dummy. "In the name of safety."
I'm not the one stretching here.
Google's security team is doing exactly what everyone in the security industry considers a best practice: publicly disclosing vulnerabilities. However, since you're putting quotes around 'safety', I'll assume you must not be familiar with the fact that it is widely regarded as a best practice, and suggest you start with [0]. Out of curiosity, would you feel safer if the vulnerabilities Project Zero found were never found or fixed? Would you feel safer if they were found, but kept hidden so that other similar programs/libraries were unaware of the potential vulnerabilities?
The only people that seem to be upset about this are those on HN and Reddit who use every excuse to get angry at Google, context be damned.
Your analogy completely falls apart when you consider that Project Zero reports on their own products (by blog posts - Google: 24, Apple: 28, MS: 36). Make sure to consider the security footprint of the products offered between the three, as more footprint results in more bugs, and you'll see they (comparatively to the footprint) report on Google owned products quite frequently.
A more apt analogy would be if crash safety experts at Volvo spun off their own crash test centre, loosely under the umbrella of Volvo, and tested _all_ (including many Volvo made) available cars for safety then publicly reported the results, data, and methodology. Then they offer suggestions for all car manufacturers, including Volvo, on how to make their cars safer for use.
Personally, I'd be just fine with that, too.
[0]
https://www.schneier.com/essays/archives/2007/01/schneier_fu...
There's probably a disconnect between safety culture and PR culture here. It is true that this shouldn't have happened. It's extremely important to work out how to avoid it happening in a general sense. No-one here is angry at NSS or Mozilla. This safety culture pose reads to you as an attack because you're reading it in PR culture pose.
Ex googler.
Project zero is amazing and well respected. I don't get why the hate.
> Until 2015, Google Chrome used NSS, and maintained their own testsuite and fuzzing infrastructure independent of Mozilla. Today, Chrome platforms use BoringSSL, but the NSS port is still maintained...
> Did Mozilla/chrome/oss-fuzz have relevant inputs in their fuzz corpus? YES.
Since this comes up whenever there is a Project Zero article, here is a summary I made in summer 2020 on the distribution of the bugs they find/report:
Since this always comes up, here's an overview I made several weeks ago about where Project Zero focuses their efforts:
All counts are rough numbers. Project zero posts:
Google: 24
Apple: 28
Microsoft: 36
I was curious, so I poked around the project zero bug tracker to try to find ground truth about their bug reporting:
https://bugs.chromium.org/p/project-zero/issues/list
For all issues, including closed:
product=Android returns 81 results
product=iOS returns 58
vendor=Apple returns 380
vendor=Google returns 145 (bugs in Samsung's Android kernel, etc. are tracked separately)
vendor=Linux return 54
To be fair, a huge number of things make this not an even comparison, including the underlying bug rate, different products and downstream Android vendors being tracked separately. Also, # bugs found != which ones they choose to write about.
I'm not clear. What do these varying counts imply to you?
Because someone will invariably accuse project zero of being hostile to google's competitors.
> I made several weeks ago about where Project Zero focuses their efforts
Are you sure the numbers track where they put effort? I can think of a couple of confounding factors (including P0 methodologies, and number of low-hanging[1] bugs in target products)
1. Relatively speaking
I absolutely agree that there are numerous reasons why the numbers are not equal.
My sole reason for posting is the "P0 is a hit squad against Apple/Mozilla/Microsoft" comments that come up whenever their blog posts end up on HN.
Why isn't static analysis taint-checking the boundedness of data? Unbounded data should be flagged as unbounded and that flag should propagate through checking until it can be proven to be bounded.
The disclosure and test cases:
https://www.openwall.com/lists/oss-security/2021/12/01/4
Kinda tangent, but when I was browsing NSS' repo (
https://hg.mozilla.org/projects/nss
or mirror:
https://github.com/nss-dev/nss/commits/master
) I found that the latest commit has a much older date (7 weeks ago) than the following ones. Why is that? (Sorry I don't know much about git other than push/pull.)
The date of the commit is metadata which can be pushed later than it was made or even altered.
If you look around you can find cute tools to alter your repo history and have the github commit history graph act as a pixelated billboard.
To expand on what others have said, the date shown is when the change was authored, which is not necessarily the date when the commit object was created.
In open source collaborative development, patches are usually shared either old-style on mailing lists or via review software (Phabricator in this case, it seems) as patches which include a date, and are then only applied in the repo once they are reviewed.
You can also get non-monotonic authorship dates without leaving a git repo (and without manually overriding the date) by cherry-picking or rebasing commits onto different branches.
Also, the first link is not a git repository but a mercurial one.
Committed locally long ago and recently pushed?
Good lesson also about how much our security relies on these largish ongoing fuzzing efforts, and makes you think what's going on at even larger fuzzing efforts that are less public.
Always place char arrays at the end of a struct - rule of thumb I heard somewhere, maybe from CERT-C
That way if you do have memory corruption, the memory following your buffer is less predictable.
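For illustration only (a made-up struct, not the one from NSS), the idea is that an overrun of a trailing array spills past the end of the struct rather than over known sibling fields at fixed offsets:

    #include <stddef.h>

    /* Risky layout: overflowing buf tramples len and handler,
       which sit at predictable offsets right after it. */
    struct msg_risky {
        char buf[128];
        size_t len;
        void (*handler)(void);
    };

    /* Rule-of-thumb layout: the char array goes last, so an overflow
       runs off the end of the struct instead of over its own fields. */
    struct msg_safer {
        size_t len;
        void (*handler)(void);
        char buf[128];
    };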
I'm really curious why static analysis didn't catch this. If they weren't doing static analysis, I would probably have asserted that static analysis would catch this fairly easily.
My guess would be too many (false?) positives on bounds-checking causing them to disable that check, but I can't be sure.
Now take a deep look at Annex K of the C11 standard: the bounds-checked extensions. Using these would have definitely avoided the problem. memcpy_s requires the size of dest to be passed explicitly.
The applause goes to the glibc maintainers, who still think they are above that.
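For reference, a sketch of how a call site looks with the Annex K interface - assuming a toolchain or library (e.g. safeclib) that actually provides it, since glibc does not; the function and variable names here are just for illustration:

    #define __STDC_WANT_LIB_EXT1__ 1
    #include <string.h>

    /* With memcpy_s the destination size travels with the pointer, so copying
       more bytes than the destination holds is a constraint violation rather
       than a silent overflow. */
    errno_t copy_sig(unsigned char *dest, rsize_t destSize,
                     const unsigned char *sig, rsize_t sigLen) {
        return memcpy_s(dest, destSize, sig, sigLen);   /* nonzero if sigLen > destSize */
    }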
Annex K is pretty awful[1]. There are plenty of fine solutions here in hindsight, but glibc adopting Annex K isn’t one of them. NSS has a plethora of build targets, including Windows - which does not implement Annex K either (despite inspiring it).
[1]:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm
Wrong. First it's good, not awful. Awful are only the ones not using it.
Second, Windows implements Annex K as the only major provider. The others are some minor embedded targets, plus Android Bionic recently.
Implementations, such as safeclib, are cross-platform; you can use it everywhere. Security and crypto people per se don't use it (incompetence or not-invented-here), but security-aware people do, such as those in embedded and in industry.
Doesn't memcpy_s not take into account the source's size though? You could still read past the end of an object with it, right?
Nope, it checks both. How could you call it secure without checking both sizes? memcpy_s(dest, dmax, src, slen)
And my implementation, safeclib, even checks at compile time, similar to glibc's FORTIFY.
For at least a decade, and in the teams I was in, C"++" written like this would not pass code review precisely because it is incredibly brittle.
Uhuh. On cue a C++ programmer arrives to tell us that a _true_ Scotsman wouldn't have introduced this bug... Where can we see "at least a decade" of this code you and your teams wrote?
Issue #2 Arbitrary size limits.[...]
A reasonable choice might be 2^24-1 bytes, the largest possible certificate
How does one treat untrusted input whose length might exceed available memory? I am working on a patch for a jwks implementation which does not even have upper bounds in the spec. Accepting any valid input until OOMing seems like a suboptimal solution.
In a sense, reducing the error case to the physical limitations of a device is a perfectly "optimal" solution
Generally, taint tracking.
You'd think that in the wake of "goto fail;" the organization that pioneered rust would have rewritten their core TLS certificate checking library in rust by now, hacker news tropes notwithstanding.
Which end-user applications are affected? And what would an attacker have to do to exploit this in the wild?
Why don't we have linters that would complain about copying memory without bounds checking?
The sooner we can rewrite our programs in Go and Rust, the more secure we will be. Our shells, coreutils, mail readers and web browsers have to be written in safer languages.
Also, far, _far_ easier to build than all of these C programs with their own bespoke build systems and implicit dependency management. The more of the software stack that can be built by mere mortals, the better.
Honestly I don't like the build process of most go/rust/javascript software _any_ better than C++.
It's harder to find the dependencies for building the latter, but the former has its own version of dependency hell. I have real trouble building both types of projects, though admittedly C++ a bit more (especially when the build instructions don't work when followed to the letter) than the strategy of "everything is just pulled from GitHub; you only have to make sure you've got gigabytes of free space in ~/.cache/, a build environment that was released in the past four to nine days, and either appropriate isolation or an acceptance that potentially vulnerable or compromised code will be run on your system".
On a rare occasion, I will find a nice and small program using only standard libraries that compiles simply with `cc my.c && ./a.out` or runs simply with `python3 my.py`, demonstrating it doesn't depend on the language to have an easy time building it, but in both categories it's the exception for some reason. I see so much software that needs only standard libraries and runs on literally any python version released in the last decade, but to run it you have to setup some environment or globally install it with setuptools or something.
> everything is just pulled from github
I hear this a lot, but I can't divine any substance from it. Why is GitHub a less-secure repository medium than SourceForge + random website downloads + various Linux package managers? Maybe this is a red herring and your real complaint is that the Rust ecosystem is less secure than the C/++ ecosystem?
> you only have to make sure you've got gigabytes of free space in ~/.cache/
I cleared my $GOCACHE relatively recently (I thought maybe I had a cache issue, but I was mistaken), but it's currently at 75M while my .cargo directory weighs 704M. If these ever really got too big I would just `go clean -cache` and move on with life. If this is one of the biggest issues with Go/Rust/etc then I think you're arguing my point for me.
> a build environment that was released in the past four to nine days
What does this even mean? You can compile Go or Rust programs on any Linux machine with the build tools. On the contrary, C/C++ dependencies are _very tightly coupled_ to the build environment.
> have appropriate isolation or simply not care about potentially vulnerable or compromised code being run on your system
Not sure about Rust programs, but Go programs absolutely don't run arbitrary code at compile/install time. C programs on the other hand absolutely _do_ run arbitrary code (e.g., CMake scripts, Makefiles, random bash scripts, etc).
> I see so much software that needs only standard libraries and runs on literally any python version released in the last decade, but to run it you have to setup some environment or globally install it with setuptools or something.
Yeah, Python package management is a trashfire; however, this is _entirely_ because it is so tightly coupled to C dependencies (many Python libraries are thin wrappers around various C programs, each with their own bespoke build system). Python package management tries to paper over the universe of C packages and it kind of works as long as you're on a handful of well-supported distributions and your dependencies have been well-vetted and well-maintained.
I don't think the exact URL is the problem; the problem is that it is so easy to include dependencies from an external repository.
In Rust every non-trivial library pulls in 10s or even 100s of dependencies.
I don't think anyone can expect that all of these libraries are of good quality but how would one even try to verify that? And you have to verify it every time you update your project.
Then there is the issue of licensing - how do I verify that I am not using some library in violation of its licence, and what happens if the licence changes down the road and I don't notice because I am implicitly using 500 dependencies through my 3 main libraries?
Rust and Go have solved memory safety compared to C and C++ but have introduced dependency hell of yet unknown proportions.
Python and other dynamically typed languages are in a league of their own in that, on top of the dependency hell, they also do not provide compiler checks that would allow the user to see the problem before the exact conditions occur at runtime.
They are good for scripting, but people keep pumping out full applications, and to be honest there is not much difference between a giant Python application, a giant maze of Excel VBA and a giant Node.js heap of code. Of those, Excel VBA is most likely to work for 5 years and across 5 versions of the product, yet it is also the most likely one to receive the most negative comments.
> I don't think the exact URL is the problem, it is the fact that it is so easy to include dependencies from external repository that is the problem. In Rust every non-trivial library pulls in 10s or even 100s of dependencies.
But it's also quite a lot easier to _audit_ those dependencies, even automatically (incidentally, GitHub provides dependency scanning for free for many languages).
> Then there is the issue of licencing - how to verify that I am not using some library in violation of its licence and what happens if the licence changes down the road and I don't notice it because I am implicitly using 500 dependencies due to my 3 main libraries?
This is also an automated task. For example,
https://github.com/google/go-licenses
: "go-licenses analyzes the dependency tree of a Go package/binary. It can output a report on the libraries used and under what license they can be used. It can also collect all of the license documents, copyright notices and source code into a directory in order to comply with license terms on redistribution."
> Rust and Go have solved memory safety compared to C and C++ but have introduced dependency hell of yet unknown proportions.
I mean, it's been a decade and things seem to be going pretty well. Also, I don't think anyone who has actually used these languages seriously has ever characterized their dependency management as "dependency hell"; however, lots of people talk about the "dependency hell" of managing C and C++ dependencies.
> Python and other dynamically typed languages are in a league of their own in that on top of the dependency hell they also do not provide compiler checks that would allow user to see the problem before the exact conditions occur at runtime.
I won't argue with you there.
> In Rust every non-trivial library pulls in 10s or even 100s of dependencies.
You're exaggerating here. The most recent project I've been working on pulls in 6 dependencies. The anyhow crate has no dependencies, regex 3 (recursively!), clap and csv each 8. Only handlebars and palette pull in 10s of dependencies, and I can trim a fair few dependencies of palette by opting out of named color support (dropping the phf crate for perfect hash functions).
It's typical for very large C projects to have perhaps 2-5 dependencies that aren't libc, often something very basic such as zlib, curl or openssl. A Rust CSV parser has 8 dependencies?
Three of those dependencies are not entirely unreasonable: one is the actual "core" CSV implementation (with one recursive dependency), one is the standard "this interface lets you serialize/deserialize arbitrary data to arbitrary formats" dependency, and one is a crate that adds traits to make &[u8]/Vec<u8> work much more like &str/String (which has 4 recursive dependencies, although two of those are already mentioned). The last two dependencies are crates that provide accelerated routines for serializing integers and floating-point numbers, which are IMHO somewhat excessive, but if it's easy to pull in such dependencies, why not? They would be helpful in places where CSV serialization might actually be a bottleneck.
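To make it concrete what those serialization dependencies buy you, here's a minimal sketch of typed CSV reading and writing, assuming the csv and serde crates (with serde's derive feature) in Cargo.toml; the Record type and sample data are invented for illustration and aren't from any project in this thread:

    use serde::{Deserialize, Serialize};

    // Hypothetical record type; serde derives the (de)serialization glue,
    // which is exactly what the serde dependency is for.
    #[derive(Debug, Deserialize, Serialize)]
    struct Record {
        city: String,
        population: u64,
        latitude: f64,
    }

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let data = "city,population,latitude\nBoston,694583,42.36\n";

        // Deserialize rows straight into typed structs.
        let mut reader = csv::Reader::from_reader(data.as_bytes());
        for row in reader.deserialize() {
            let record: Record = row?;
            println!("{:?}", record);
        }

        // Serialize back out; fast integer/float formatting is what the
        // extra "accelerated routines" dependencies are there to speed up.
        let mut writer = csv::Writer::from_writer(std::io::stdout());
        writer.serialize(Record {
            city: "Boston".into(),
            population: 694_583,
            latitude: 42.36,
        })?;
        writer.flush()?;
        Ok(())
    }

The point of the sketch: the serde dependency is what lets every consumer skip writing their own field-to-struct mapping by hand.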
There are two key differences between large projects in ecosystems that have working package managers and those in C/C++. For starters, the basic working 'package' tends to be smaller. Consider something like LLVM (just LLVM itself, none of its subprojects like Clang). This you'd count as "one project" without any dependencies (at least, no required dependencies that aren't vendored, like zlib and googletest). But if you were to make it in something like Rust, it would likely consist of a dozen or more packages: the core IR, analysis passes, command-line tools, transformations, codegen (with each target likely being its own package!). With such fragmentation, you'd have dozens of dependencies that are essentially nothing more than the 'project' itself.
But more important is that, because dependencies are hard to add, code duplication is the norm; it's only when what you want to do is effectively impossible to duplicate that you resort to a dependency. For any large C++ project, I can virtually guarantee that there will exist some form of 'standard library++' dependency in the code that will contain at least an intrusive linked list, better hashtable implementation, std::vector-lookalike that avoids heap allocation for small lists, replacement for iostream, helpers for process creation, custom string formatting routines, and something to make dealing with segfaults sane. In a package ecosystem, each of those kinds of things would be a separate dependency. Is it really necessary for everybody who wants to write something dealing with graphs to have to write their own graph-based data structure, as opposed to just using the ecosystem-standard graph library (e.g., networkx in Python, or petgraph in Rust)? Or for everybody who wants to use a hashtable to have to write their own implementation of a notoriously tricky data structure?
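For the graph example specifically, the ecosystem-standard option really is a few lines away. A minimal petgraph sketch (nodes and weights invented purely for illustration):

    use petgraph::algo::dijkstra;
    use petgraph::graph::Graph;

    fn main() {
        // A small directed graph with string labels and integer edge weights.
        let mut g: Graph<&str, u32> = Graph::new();
        let a = g.add_node("a");
        let b = g.add_node("b");
        let c = g.add_node("c");
        g.add_edge(a, b, 1);
        g.add_edge(b, c, 2);
        g.add_edge(a, c, 10);

        // Shortest-path costs from `a` to every reachable node.
        let costs = dijkstra(&g, a, None, |e| *e.weight());
        println!("cheapest cost a -> c: {:?}", costs.get(&c)); // Some(3)
    }

The point isn't that Dijkstra is hard to write; it's that the graph structure and algorithms get shared, reviewed, and fuzzed once instead of once per project.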
I wasn't attempting to make a value judgement about whether something /should/ have these dependencies, but only to point out that if you think this is comparable to C-based projects, you seem to be pretty out of touch with common practices in that domain.
I take your point that NIH is a large thing driving this sort of divergence in many languages, with some (such as Common Lisp, and C) being more notorious than others.
But I'd say it's just as common in other languages for programmers to pile on needless dependencies. E.g. how many Rust/Node.js/Java etc. projects are using some "standard library++" to e.g. get better memory behavior for strings, but for whom string allocation overhead is never ever going to be an issue?
It would be an interesting metric to check what % of the code in a given library is commonly used by its consumers across languages. Even if you only used 1% you're implicitly assuming some of the maintenance overhead for 100% of it. E.g. if it won't build on an older toolchain that's now a hassle for someone packaging your program, even if it's in the 99% you don't use.
A Go program wouldn't even need curl, zlib, or openssl as it has equivalent implementations in its stdlib.
The CSV crate has five of them:
https://crates.io/crates/csv/1.1.6/dependencies
druid v0.7.0 (\druid\druid)
druid-derive v0.4.0 (proc-macro) (\druid\druid-derive)
proc-macro2 v1.0.32
unicode-xid v0.2.2
quote v1.0.10
proc-macro2 v1.0.32 (_)
syn v1.0.81
proc-macro2 v1.0.32 (_)
quote v1.0.10 (_)
unicode-xid v0.2.2
[dev-dependencies]
druid v0.7.0 (\druid\druid) (_)
float-cmp v0.8.0
trybuild v1.0.52
glob v0.3.0
lazy_static v1.4.0
serde v1.0.130
serde_derive v1.0.130 (proc-macro)
proc-macro2 v1.0.32 (_)
quote v1.0.10 (_)
syn v1.0.81 (_)
serde_json v1.0.69
itoa v0.4.8
ryu v1.0.5
serde v1.0.130 (_)
termcolor v1.1.2
winapi-util v0.1.5
winapi v0.3.9
winapi-x86_64-pc-windows-gnu v0.4.0
toml v0.5.8
serde v1.0.130 (_)
druid-shell v0.7.0 (\druid\druid-shell)
anyhow v1.0.45
cfg-if v1.0.0
instant v0.1.12
cfg-if v1.0.0
keyboard-types v0.5.0
bitflags v1.3.2
kurbo v0.8.2
arrayvec v0.7.2
lazy_static v1.4.0
piet-common v0.5.0-pre1
cfg-if v1.0.0
piet v0.5.0-pre1
kurbo v0.8.2 (_)
unic-bidi v0.9.0
matches v0.1.9
unic-ucd-bidi v0.9.0
unic-char-property v0.9.0
unic-char-range v0.9.0
unic-char-range v0.9.0
unic-ucd-version v0.9.0
unic-common v0.9.0
piet-direct2d v0.5.0-pre1
associative-cache v1.0.1
dwrote v0.11.0
lazy_static v1.4.0
libc v0.2.107
winapi v0.3.9 (_)
wio v0.2.2
winapi v0.3.9 (_)
piet v0.5.0-pre1 (_)
utf16_lit v2.0.2
winapi v0.3.9 (_)
wio v0.2.2 (_)
png v0.17.2
bitflags v1.3.2
crc32fast v1.2.1
cfg-if v1.0.0
deflate v0.9.1
adler32 v1.2.0
encoding v0.2.33
encoding-index-japanese v1.20141219.5
encoding_index_tests v0.1.4
encoding-index-korean v1.20141219.5
encoding_index_tests v0.1.4
encoding-index-simpchinese v1.20141219.5
encoding_index_tests v0.1.4
encoding-index-singlebyte v1.20141219.5
encoding_index_tests v0.1.4
encoding-index-tradchinese v1.20141219.5
encoding_index_tests v0.1.4
miniz_oxide v0.4.4
adler v1.0.2
[build-dependencies]
autocfg v1.0.1
scopeguard v1.1.0
time v0.3.5
tracing v0.1.29
cfg-if v1.0.0
pin-project-lite v0.2.7
tracing-attributes v0.1.18 (proc-macro)
proc-macro2 v1.0.32 (_)
quote v1.0.10 (_)
syn v1.0.81 (_)
tracing-core v0.1.21
lazy_static v1.4.0
winapi v0.3.9 (_)
wio v0.2.2 (_)
[dev-dependencies]
piet-common v0.5.0-pre1 (_)
static_assertions v1.1.0
test-env-log v0.2.7 (proc-macro)
proc-macro2 v1.0.32 (_)
quote v1.0.10 (_)
syn v1.0.81 (_)
tracing-subscriber v0.2.25
ansi_term v0.12.1
winapi v0.3.9 (_)
chrono v0.4.19
libc v0.2.107
num-integer v0.1.44
num-traits v0.2.14
[build-dependencies]
autocfg v1.0.1
[build-dependencies]
autocfg v1.0.1
num-traits v0.2.14 (_)
winapi v0.3.9 (_)
lazy_static v1.4.0
matchers v0.0.1
regex-automata v0.1.10
regex-syntax v0.6.25
regex v1.5.4
regex-syntax v0.6.25
serde v1.0.130 (_)
serde_json v1.0.69 (_)
sharded-slab v0.1.4
lazy_static v1.4.0
smallvec v1.7.0
thread_local v1.1.3
once_cell v1.8.0
tracing v0.1.29 (_)
tracing-core v0.1.21 (_)
tracing-log v0.1.2
lazy_static v1.4.0
log v0.4.14
cfg-if v1.0.0
tracing-core v0.1.21 (_)
tracing-serde v0.1.2
serde v1.0.130 (_)
tracing-core v0.1.21 (_)
unicode-segmentation v1.8.0
fluent-bundle v0.15.2
fluent-langneg v0.13.0
unic-langid v0.9.0
unic-langid-impl v0.9.0
tinystr v0.3.4
fluent-syntax v0.11.0
thiserror v1.0.30
thiserror-impl v1.0.30 (proc-macro)
proc-macro2 v1.0.32 (_)
quote v1.0.10 (_)
syn v1.0.81 (_)
intl-memoizer v0.5.1
type-map v0.4.0
rustc-hash v1.1.0
unic-langid v0.9.0 (_)
intl_pluralrules v7.0.1
tinystr v0.3.4
unic-langid v0.9.0 (_)
rustc-hash v1.1.0
self_cell v0.10.1
smallvec v1.7.0
unic-langid v0.9.0 (_)
fluent-langneg v0.13.0 (_)
fluent-syntax v0.11.0 (_)
fnv v1.0.7
instant v0.1.12 (_)
tracing v0.1.29 (_)
tracing-subscriber v0.2.25 (_)
unic-langid v0.9.0 (_)
unicode-segmentation v1.8.0
xi-unicode v0.3.0
[dev-dependencies]
float-cmp v0.8.0
open v1.7.1
winapi v0.3.9 (_)
piet-common v0.5.0-pre1 (_)
pulldown-cmark v0.8.0
bitflags v1.3.2
memchr v2.4.1
unicase v2.6.0
[build-dependencies]
version_check v0.9.3
tempfile v3.1.0
cfg-if v0.1.10
rand v0.7.3
getrandom v0.1.16
cfg-if v1.0.0
rand_chacha v0.2.2
ppv-lite86 v0.2.15
rand_core v0.5.1
getrandom v0.1.16 (_)
rand_core v0.5.1 (_)
remove_dir_all v0.5.3
winapi v0.3.9 (_)
winapi v0.3.9 (_)
test-env-log v0.2.7 (proc-macro) (_)
tracing-subscriber v0.2.25 (_)
druid-derive v0.4.0 (proc-macro) (\druid\druid-derive) (_)
druid-shell v0.7.0 (\druid\druid-shell) (_)
> What does this even mean? You can compile Go or Rust programs on any Linux machine with the build tools.
I can't build github.com/restic/rest-server because my golang compiler is too old. Admittedly I'm running something relatively old (but still supported) at the moment but this isn't the first time. Such errors are my experience every time I try to build go/js code. If you're actually using a stable distribution you're simply out of luck and need to find another system to contribute to those projects.
> Why is GitHub a less-secure repository medium than SourceForge + random website downloads + various Linux package managers?
I'd trust GitHub a lot more than SourceForge, but GitHub is where everyone hosts everything these days, so I meant it to mean: random repositories from the internet. When trying to build the aforementioned project, I saw it pull from a ton of domains, but also a lot from GitHub (many different authors).
That's the crux: I may trust GitHub itself and the author of this one repository (in this case, the author is actually a friend of a friend; in other cases it might just be a well-known person like cperciva or sircmpwn), and if my OS is compromised then I'm screwed anyway, so the repositories are also considered mostly trusted. But I don't for a minute believe that the author of the code I'm trying to build managed to vet the 600 repositories it's pulling.
Also, six hundred repositories. All this thing does is spawn an http server and handle a few API calls, acting as a dummy storage server (implementing things like 'get me this file' or 'put this file here').
> Go programs absolutely don't run arbitrary code at compile/install time
I didn't actually know that, that's nice. I mean, if I can't trust the binary then it's still of little use to me personally, but any malicious code would presumably have to be explicitly called into at runtime, and build servers will have an easier time serving binaries, so that's a good thing.
Rust programs can in fact run arbitrary code at build time, unlike Go. Pros and cons to both approaches.
To be clear, Cargo doesn’t pull code from GitHub.
Autotools is the de facto build system for most of the GNU system programs. The bit about dependency management mostly fits, but I would argue that letting us figure out how to build and install the dependencies is fairly UNIXy. It’s also unclear to me that centralized package managers are necessarily better for security, though they’re easier to use. Also, a lot of more modern tools I’ve tried to build in recent months do not give a crap about cross compilation as a use case. At least with autotools it’s supported by default unless the library authors did something egregious like hard coding the sysroot or toolchain paths.
EDIT: Just re-read the below and realized it might sound terse and argumentative; apologies, I was typing quickly and didn't mean to be combative. :)
> I would argue that letting us figure out how to build and install the dependencies is fairly UNIXy
Crummy build systems _force_ you to figure out how to build and install dependencies (or die trying). Modern build systems _allow_ you to figure out how to build and install dependencies. If the former is "more UNIXy" than the latter, then I strongly contend that "UNIXy" is not a desirable property.
> It’s also unclear to me that centralized package managers are necessarily better for security, though they’re easier to use.
"Centralized" is irrelevant. Go's package manager is decentralized, for example. Moreover, many folks in the C world rely heavily on centralized repositories. Further, I would be _shocked_ if manually managing your dependencies was somehow _less_ error prone (and thus more secure) than having an expert-developed program automatically pull and verify your dependencies.
> Also a lot of more modern tools I’ve tried to build in recent months do not give a crap about cross compilation as a use case.
I mean, C doesn't care about _anything_, much less cross compilation. It puts the onus on the developer to figure out how to cross compile. Some build system generators (e.g., CMake, Autotools) purport to solve cross compilation, but I've always had problems. Maybe I just don't possess the mental faculties or years of experience required to master these tools, but I think that supports my point. By comparison, cross compilation in Go is trivial (`CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build` works _virtually_ every time from any platform). I haven't done much Rust cross-compilation, but I would be surprised if it were harder than C/C++.
I can't speak for cmake, but in autotools it's always just been "./configure --host=aarch64-linux-gnu" or whatever other target triplet is of relevance. It's similar in Meson.
The annoying factor is gathering the cross compiler and target dependencies, which is fortunately becoming easier with tools such as llvm/clang, and distros such as Debian have always made huge efforts to facilitate it.
> Also a lot of more modern tools I’ve tried to build in recent months do not give a crap about cross compilation as a use case
On the other hand, more modern languages like Rust and Go tend to have a single standard library and compiler that can be used across basically any common operating system, whereas trying to use gcc/libstdc++ or clang/libc++ on Windows can be an ordeal, and using MSVC on Unix is essentially not an option at all. Cross compilation for other architectures might be a bit more work, but the majority of developers don't have to worry about that very much, whereas support across Linux, MacOS, and Windows is a lot more common.
> far, far easier to build than all of these C programs
One of my friends who works on AIX machines without direct Internet access does not share the same view, though.
Why is indirect Internet access less of a problem for C than Rust/Go/etc? Seems like for modern systems, you just run a pre-populated caching proxy on your target and `cargo install` like you normally would. In C, you're manually checking versions and putting files in the right spot on disk for every stage of the build (this can be alleviated a bit if you can find pre-built binaries and so on, but even in the best case it's far behind "advanced" systems).
> Why is indirect Internet access less of a problem for C than Rust/Go/etc?
Because C code tends to have fewer dependencies and a shallower, more "clustered" dependency graph.
To be fair, that's more or less due to dependency management being a 100% pain 0 fun experience.
I agree with your characterization of the dependency graphs, but I don't see how that changes the calculus. Let's say in both cases you're copying a tarball of dependencies onto your friend's AIX machine--why is it harder to copy a tarball with a few large dependencies rather than a tarball with more small dependencies (I also posit that the Rust tarball would be smaller because you're less likely to be bringing in things you don't need--e.g., think of all the things curl does versus a straightforward HTTP lib)?
> a few large dependencies
Unfortunately this does not match my experience :( Rust projects tend to depend on a lot of smaller, single-purpose (or, npm-y) dependencies; for example, Debian's ripgrep package has an X-Cargo-Built-Using field saying:
X-Cargo-Built-Using: rust-aho-corasick (= 0.7.10-1), rust-atty (= 0.2.14-2), rust-base64 (= 0.12.1-1), rust-bitflags (= 1.2.1-1), rust-bstr (= 0.2.12-1), rust-bytecount (= 0.6.0-1), rust-byteorder (= 1.3.4-1), rust-cfg-if-0.1 (= 0.1.10-2), rust-clap (= 2.33.3-1), rust-crossbeam-utils (= 0.7.2-2), rust-encoding-rs (= 0.8.22-1), rust-encoding-rs-io (= 0.1.6-2), rust-fnv (= 1.0.6-1), rust-globset (= 0.4.5-1), rust-grep-cli (= 0.1.5-1), rust-grep (= 0.2.7-1), rust-grep-matcher (= 0.1.4-1), rust-grep-pcre2 (= 0.1.4-2), rust-grep-printer (= 0.1.5-1), rust-grep-regex (= 0.1.8-1), rust-grep-searcher (= 0.1.7-1), rust-ignore (= 0.4.16-2), rust-itoa (= 0.4.3-1), rust-lazy-static (= 1.4.0-1), rust-libc (= 0.2.80-1), rust-log (= 0.4.11-2), rust-memchr (= 2.3.3-1), rust-memmap (= 0.7.0-1), rust-num-cpus (= 1.13.0-1), rust-pcre2 (= 0.2.3-1), rust-pcre2-sys (= 0.2.2-1), rust-regex-automata (= 0.1.8-2), rust-regex (= 1.3.7-1), rust-regex-syntax (= 0.6.17-1), rust-ryu (= 1.0.2-1), rust-same-file (= 1.0.6-1), rust-serde (= 1.0.106-1), rust-serde-json (= 1.0.41-1), rust-strsim (= 0.9.3-1), rust-termcolor (= 1.1.0-1), rust-textwrap (= 0.11.0-1), rust-thread-local (= 1.0.1-1), rust-unicode-width (= 0.1.8-1), rust-walkdir (= 2.3.1-1), rustc (= 1.48.0+dfsg1-2)
Building it using cargo with Internet access is a breeze. Figuring out how to `cargo vendor` is not. And the sheer number of the dependencies makes it not practical to manually do stuff.
In short, what cargo actively supports and everyone uses are great, otherwise it's disaster.
If the trade is getting proper memory safety for the 99.99% of code which never has to be built directly on a mainframe without internet access, in exchange for making the code that does have to be built on mainframes without internet access a bit harder to build, I think I'm fine with that.
A century ago, buildings were quite dangerous, and likely to kill you in all sorts of situations. Wood burns, and concrete and brick don't. Clearly wood is an "unsafe material". But just changing the material didn't result in safer buildings. Buildings made of brick and concrete still killed people.
It turns out that there are a lot of factors that go into building safety. The material is one vulnerability, sure. But there's also the connection method and strength, the calculated loads, shear forces, wind, earthquakes, egress, and a billion other considerations.
What resulted in better building safety was the development of building codes. Even using flammable building materials, people adapted the _way_ they built so that the end result was much safer than before. If you told a builder you'd never buy a wooden home because wood is "unsafe", they'd laugh at you - and then sell you a bunch of "inflammable" crap you don't need.
> A century ago, buildings were quite dangerous, and likely to kill you in all sorts of situations. Wood burns, and concrete and brick don't. Clearly wood is an "unsafe material". But just changing the material didn't result in safer buildings. Buildings made of brick and concrete still killed people.
This will seem trite, but I think it's just literally easier to figure out how to build wooden buildings that are fire resistant than it is to write memory safe code in C/C++.
Well, it did take us a few thousand years to get to safe wooden buildings...
I actually don't think securing C/C++ code is that hard. It's certainly a skill you need to learn, but so is writing linked lists and qsort. I think people just aren't applying themselves. But the language seems to catch the flak rather than the programmer.
From the article:
The bug is that there is simply no bounds checking at all; sig and key are arbitrary-length, attacker-controlled blobs, and cx->u is a fixed-size buffer.
As we can see, the programmer just made no effort to secure the code. But we still blame the language, like blaming wood for being flammable.
Anyway. I'm definitely not against new languages. But I think before a program is _rewritten_, it should be for a reason much better than "I didn't want to secure the code".
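For readers wondering what the missing effort looks like concretely: it's a single length comparison before copying attacker-controlled data into the fixed-size buffer. A rough sketch of that shape, with hypothetical types standing in for the NSS context union (this is not the actual NSS code, and the same check can of course be written in C):

    // Hypothetical stand-ins for a fixed-size verification context and an
    // attacker-controlled signature blob; not the real NSS definitions.
    const MAX_SIG_LEN: usize = 2048;

    struct VerifyContext {
        u: [u8; MAX_SIG_LEN],
    }

    fn load_signature(cx: &mut VerifyContext, sig: &[u8]) -> Result<(), &'static str> {
        // The missing bounds check: reject anything larger than the
        // destination before copying.
        if sig.len() > cx.u.len() {
            return Err("signature too large for context buffer");
        }
        cx.u[..sig.len()].copy_from_slice(sig);
        Ok(())
    }

In safe Rust the unchecked version isn't even expressible: a mismatched slice copy panics instead of scribbling past the end of the buffer.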
All ASN1 parsers need to get replaced with safe Rust code, full stop.
This absolutely should have happened. "Mature, world-class security teams" are, as a general rule, objectively terrible at creating products that meet any meaningful, objective definition of security.
Remember a few years ago when Apple, the world's most valuable company, released a version of macOS that not only let you log into root with no password(!), but actually helpfully created a root account with the password supplied by the first person who tried to log in as root[1]? Zerodium can purchase a vulnerability of similar severity to the one described in the article in Mozilla's premier product, Firefox (which undoubtedly has the best engineers at Mozilla and has had hundreds of millions if not billions spent on its development) for $100k [2]. Even if we lowball the consulting rate for a skilled engineer at ~$500k/year, that means we should expect a single, skilled engineer to, on average, find such a vulnerability with ~2 months of fulltime work; otherwise the supply would have dried up.
By no objective metric does taking 2 months of a single engineer's time to completely defeat the security of a widely used product constitute a meaningful, objective level of security. Even a two order of magnitude underestimation, literally 100x more than needed, still puts it in the range of a small team working for a year which still does not qualify as meaningful security. And, we can verify that this assessment is fairly consistent with the truth because we can ask basically any security professional if they believe a single person or a small team can completely breach their systems and they will invariably be scared shitless by the thought.
The processes employed by the large, public, commercial tech companies that are viewed as leaders in security systemically produce software with security that is not only imperfect, it is not even good; it is terrible and is completely inadequate for any purpose where even just small scale criminal operations can be expected as seen by the rash of modern ransomware. Even the engineers who made these systems openly admit to this state of affairs [3] and many will even claim that it can not be made materially better. If the people making it are saying it is bad as a general rule, you should run away, fast.
Achieving adequate protection against the threat actors who actually go after these products would require not mere 100% improvements; it would require 10,000% or even 100,000% improvements in their processes. To give some perspective on that, people who tout Rust say that if we switch to it we will remove the memory safety defects, which are 70% of all security defects. If we use the quantity of security defects as a proxy for security (which is an okay proxy to first order), that would require 6 successive switches to technologies, each as much better than the last as people who like Rust say Rust is better than C++. That is how far away it all is: the security leaders do not need just a silver bullet, they need a whole silver revolver.
In summary, a vulnerability like this is totally expected and not because they failed to have "world-class security" but because that is what "world-class security" actually means.
[1]
https://arstechnica.com/information-technology/2017/11/macos...
[2]
https://zerodium.com/program.html
(ZERODIUM Payouts for Desktops/Servers:Firefox RCE+LPE)
[3]
[4]
https://www.zdnet.com/article/microsoft-70-percent-of-all-se...
I mostly agree with you. I think it's going to take some rough years or decades before we re-architect all the things we have grown accustomed to.
https://dwheeler.com/essays/apple-goto-fail.html
Buffer overflow is a classic, right? (cue Rust enthusiasts)
"This shouldn't have happened," says user of the only language where this regularly happens.
https://www.theonion.com/no-way-to-prevent-this-says-only-na...
Yep, Rust at best eliminates some already weak excuses to keep doing security-critical parsing in the chainsaw-juggling tradition, when we've known better for 20+ years.
I've been meaning for some time to write one of these (with auto-generation whereas I believe The Onion actually has staff write a new one each time they run this article) for password database loss. [Because if you use Security Keys this entire problem dissipates. Your "password database" is just a bunch of public data, stealing it is maybe mildly embarrassing but has no impact on your users and is of no value to the "thieves"]
But a Rust one for memory unsafety would be good too.
It's wild how ahead of its time Ada was.
They are not wrong
This is what happens when you let people write C who don't know what they are doing. Probably a new grad?
so who wants to tell linus to rewrite everything in rust?
i love the fact that people can read and write whatever they want wherever they want
go and rust make things a bit boring, c makes me feel like a kid
of course the price for freedom is security haha
What went wrong - Issue #0: The library was not re-written in a language that prevents undefined behavior (UB).
I don't think there are any general purpose programming languages with decent performance which outright "prevent undefined behaviour" in something like NSS. Rust, for example, does not.
_safe_ Rust doesn't have undefined behaviour but of course you can (and a large project like this will) use _unsafe_ Rust and then you need the same precautions for that code. This sharply reduces your exposure if you're doing a decent job - and is therefore worthwhile but it is not a silver bullet.
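To make "the same precautions" concrete, the usual discipline is to keep the unsafe block tiny, wrap it behind a safe API, and write down the invariant it relies on. A minimal sketch (the function is invented for illustration, not taken from NSS or any real crate):

    /// Sums the first `n` elements of `xs`.
    ///
    /// The assert guarantees the unchecked indexing below can never read
    /// out of bounds, so callers only ever see a panic, not UB.
    fn sum_prefix(xs: &[u64], n: usize) -> u64 {
        assert!(n <= xs.len(), "prefix length out of bounds");
        let mut total = 0;
        for i in 0..n {
            // SAFETY: i < n <= xs.len(), established by the assert above.
            total += unsafe { *xs.get_unchecked(i) };
        }
        total
    }

The auditing burden then falls on a handful of small, commented blocks instead of the whole codebase, which is what "sharply reduces your exposure" means in practice.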
Outright preventing undefined behaviour is hard. Java outlaws it, but probably not successfully (I believe it's a bug if your Java VM exhibits undefined behaviour, but you may find that unless it was trivial to exploit it goes in the pile of known bugs and nobody is jumping up and down to fix it). Go explains in its deeper documentation that concurrent Go is unsafe (this is one of the places where Rust is safer, _safe_ Rust is still safe concurrently).
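A small illustration of that last claim: in safe Rust, shared mutable state has to go through synchronized types, and the unsynchronized version is rejected at compile time rather than becoming a data race. A sketch (nothing here is from the projects discussed in this thread):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Shared counter behind Arc<Mutex<_>>; dropping the Mutex and
        // mutating a plain u64 from both threads simply does not compile.
        let counter = Arc::new(Mutex::new(0u64));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    for _ in 0..1_000 {
                        *counter.lock().unwrap() += 1;
                    }
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
        println!("{}", *counter.lock().unwrap()); // always 4000
    }
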
Something like WUFFS prevents undefined behaviour and has excellent performance but it has a deliberately limited domain rather than being a general purpose language. _Perhaps_ a language like WUFFS should exist for much of the work NSS does. But Rust does exist, it was created at Mozilla where NSS lives, and NSS wasn't rewritten in Rust so why should we expect it to be rewritten in this hypothetical Wrangling Untrusted Cryptographic Data Safely language?
> I don't think there are any general purpose programming languages with decent performance which outright "prevent undefined behaviour" in something like NSS. Rust, for example, does not.
You know what I meant by "a language that prevents UB". Your comment argues semantics. That's not nice. Please stop.
> safe Rust doesn't have undefined behaviour but of course you can (and a large project like this will) use unsafe Rust and then you need the same precautions for that code.
How can that be true? A large Rust project uses unsafe only because its authors don't care enough. Instead of spending the effort to make the safe code fast enough, they resort to using `unsafe`. It's the same reason that people add code without tests or add APIs without docs.
A special purpose language for writing cryptography code is a terrible idea. Once we have a language suitable for handling untrusted data, we should make it good enough to use for everything and then use it for everything.
I have put some effort into making safe Rust usable for everything. I wrote a safe regex library [0] and a safe async library [1]. I plan to put more effort into these and other libraries. For example, I am working on safe Rust TLS and HTTP libraries. Eventually, safe Rust will be production quality and fast enough for most use-cases.
[0]
https://crates.io/crates/safe-regex
[1]
https://crates.io/crates/safina
> A large Rust project uses unsafe only because its authors don't care enough.
This is not true at all. There are plenty of reasons to occasionally use unsafe code, even if your bar for "is it really worth it" is quite high. One reason, if you're writing a crypto library, is that certain kinds of timing attacks are pretty much impossible to prevent without inline assembly.
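For anyone unfamiliar with the timing-attack point: the classic mistake is comparing a secret (say, a MAC) byte by byte and returning early on the first mismatch, which leaks how many leading bytes were correct. The usual branch-free fix is sketched below; the reason it isn't a complete answer, as noted above, is that without inline assembly, the subtle crate, or compiler-level guarantees, the optimizer is still free to turn it back into an early-exit loop:

    /// Branch-free equality check over equal-length byte slices.
    /// Accumulates the XOR of every byte pair so the loop's duration does
    /// not depend on where the first difference is.
    fn ct_eq(a: &[u8], b: &[u8]) -> bool {
        assert_eq!(a.len(), b.len());
        let mut diff = 0u8;
        for (x, y) in a.iter().zip(b.iter()) {
            diff |= x ^ y;
        }
        diff == 0
    }
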
Preventing timing leaks requires unsafe code because the Rust compiler lacks support for constant-runtime code [0]. Adding that support is one step in making it "good enough to use for everything".
What are several more of the "plenty of reasons to occasionally use unsafe code"? I can think of only one: interfacing with hardware or unsafe OSes that control access to the hardware.
[0]
https://github.com/rust-lang/rfcs/issues/2533
> safe Rust doesn't have undefined behaviour but of course you can (and a large project like this will) use unsafe Rust and then you need the same precautions for that code.
It's the difference between crossing a minefield with 1000 hidden mines that blow up if you happen to step on them, or a clean field with a single hole with a big warning sign next to it.
Sure, neither prevents you from getting injured crossing it, but don't tell me it's the same risk.
When will we switch to memory safe (but reasonably performant) languages like Go/Rust/C#?
Performance critical sections could remain in C/C++, but MOST code doesn’t need the kind of performance C/C++ provides.
Hopefully C/C++ will go the way of assembly: present in TINY doses.
I don't think you need to explain to _Mozilla_ about Just Rewriting It to Rust
Didn't they recently fire the Rust team?
While they did let go of the folks who were working on Rust (see the other comments in this thread), they are still members of the Rust Foundation, and folks still write code in Rust at Mozilla. For example, you have this from last week:
https://groups.google.com/a/mozilla.org/g/dev-platform/c/dVU...