I contributed a number of performance patches to this release of zlib-rs. This was my first time doing perf work on a Rust project, so here are some things I learned:
Even in a project that uses `unsafe` for SIMD and internal buffers, Rust still provided guardrails that made it easier to iterate on optimizations. Abstraction boundaries helped here: a common idiom in the codebase is to cast a raw buffer to a Rust slice for processing, to enable more compile-time checking of lifetimes and array bounds.
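For illustration, a hedged sketch of that idiom (invented names, not the actual zlib-rs code): convert the raw parts to a slice once, at the boundary, then do all processing through ordinary checked slice operations.

    // The unsafe trust boundary is this one conversion; everything after
    // it gets compile-time lifetime checks and bounds-checked indexing.
    fn checksum(ptr: *const u8, len: usize) -> u32 {
        // SAFETY: caller must guarantee `ptr` is valid for `len` bytes
        // for the duration of this call.
        let buf: &[u8] = unsafe { std::slice::from_raw_parts(ptr, len) };
        buf.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32))
    }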
The compiler pleasantly surprised me by doing optimizations I thought I’d have to do myself, such as optimizing away bounds checks for array accesses that could be proven correct at compile time. It also inlined functions aggressively, which enabled it to do common subexpression elimination across functions. Many times, I had an idea for a micro-optimization, but when I looked at the generated assembly I found the compiler had already done it.
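A toy sketch (not from zlib-rs) of the kind of bounds-check elision described: a single assertion up front lets the compiler prove the later accesses are in bounds and drop the per-access checks.

    fn sum_first_four(v: &[u32]) -> u32 {
        assert!(v.len() >= 4);
        // With the assert above, optimized builds elide the individual
        // bounds checks on these four indexing operations.
        v[0] + v[1] + v[2] + v[3]
    }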
Some of the performance improvements came from better cache locality. I had to use C-style structure declarations in one place to force fields that were commonly used together to inhabit the same cache line. For the rare cases where this is needed, it was helpful that Rust enabled it.
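A hedged sketch of what that looks like (field names invented for illustration): #[repr(C)] pins the declaration order, since Rust's default representation is free to reorder fields.

    // Keep hot fields adjacent so they land on the same cache line;
    // #[repr(C)] guarantees the field order stays as written.
    #[repr(C)]
    struct State {
        // accessed together in the hot loop: first, and adjacent
        cursor: u32,
        limit: u32,
        // rarely-touched fields afterwards
        stats: [u64; 8],
    }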
SIMD code is arch-specific and requires unsafe APIs. Hopefully this will get better in the future.
Memory-safety in the language was a piece of the project’s overall solution for shipping correct code. Test coverage and auditing were two other critical pieces.
Interesting! I wonder if you have used PGO in the project? Forcing fields to be located next to each other kind of feels like something that PGO could do for you.
I basically did manual PGO because I was also reducing the size of several integer fields at the same time to pack more into each cache line. I’m excited to try out the rustc+LLVM PGO for future optimizations.
Kidding aside, I thought the purpose of Rust was safety, but the keyword unsafe is sprinkled liberally throughout this library. At what point does it really stop mattering whether this is C or Rust?
Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?
Using unsafe blocks in Rust is confusing when you first see it. The idea is that you have to opt out of compiler safety guarantees for specific sections of code, but they're clearly marked by the unsafe block.
In good practice it’s used judiciously in a codebase where it makes sense. Those sections receive extra attention and analysis by the developers.
Of course you can find sloppy codebases where people reach for unsafe as a way to get around Rust instead of writing code the Rust way, but that’s not the intent.
You can also find die-hard Rust users who think unsafe should never be used and make a point to avoid libraries that use it, but that’s excessive.
TIL also - until today, I thought it was just "mercaptan". Turns out there are actually two variants of that:
> Ethanethiol (EM), commonly known as ethyl mercaptan is used in liquefied petroleum gas (LPG) and resembles odor of leeks, onions, durian, or cooked cabbage
> Methanethiol, commonly known as methyl mercaptan, is added to natural gas as an odorant, usually in mixtures containing methane. Its smell is reminiscent of rotten eggs or cabbage.
...but you can still call it "mercaptan" and be ~ correct in most cases.
Note that that is a doubly linked list, because it is a "soup of ownership" data structure. A singly linked list has clear ownership so it can be modelled in safe Rust.
On modern architectures you shouldn't use either unless you have an extremely niche use case. They are not general-use data structures anymore in a world where cache locality is a thing.
No you don’t. You can use the standard linked list that is already included in the standard library.
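For reference, a minimal sketch of using it (std::collections::LinkedList):

    use std::collections::LinkedList;

    fn main() {
        let mut list: LinkedList<u32> = LinkedList::new();
        list.push_back(2);
        list.push_front(1);
        assert_eq!(list.pop_front(), Some(1));
    }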
Coming up with these niche examples of things you need unsafe for in order to discredit rust’s safety guarantees is just not interesting. What fraction of programmer time is spent writing custom linked lists? Surely way less than 1%. In most of the other 99%, Rust is very helpful.
I think the point is that it's funny that the standard library has to use unsafe to implement a data structure that's like the second data structure you learn in an intro to CS class
C has to make a syscall to the kernel which ultimately results in a BIOS interrupt to implement printf, which you need for the hello world program on page 1 of K&R.
Does that mean that C has no abstraction advantage over directly coding interrupts with asm? Of course not.
> C has to make a syscall to the kernel which ultimately results in a BIOS interrupt to implement printf,
That's not the case since the late 1990s. Other than during early boot, nobody calls into the BIOS to output text, and even then "BIOS interrupt" is not something normally used anymore (EFI uses direct function calls through a function table instead of going through software interrupts).
What really happens in the kernel nowadays is direct memory access and direct manipulation of I/O ports and memory mapped registers. That is, all modern operating systems directly manipulate the hardware for text and graphics output, instead of going through the BIOS.
I love how the most common negative thing I hear about Rust is that a really uncommon data structure, one that no one should write by hand and should almost always import, can only be written using the unsafe language feature. Meanwhile, Rust applications tend in most cases to be considerably faster, more correct, and more enjoyable to maintain than those in other languages. Must be a really awesome technology.
This is far less of a problem than it would be in a C-like language, though.
You can implement that linked list just once, audit the unsafe parts extensively, provide a fully safe API to clients, and then just use that safe API in many different places. You don't need thousands of project-specific linked list reimplementations.
Hydrogen sulfide is highly corrosive (a big problem in sewers and associated infrastructure). I highly doubt you would choose to introduce it to gas pipelines on purpose.
Hydrogen sulfide is highly toxic (it's comparable to carbon monoxide). I doubt anyone in their right mind would put it intentionally in a place where it could leak around humans.
> Hydrogen sulfide is highly toxic (it's comparable to carbon monoxide)
It's a bad comparison, since CO doesn't smell, which is what makes it dangerous, while H2S is detected by our sense of smell at concentrations much lower than the toxic dose (in fact, its biggest danger comes from the fact that at dangerous concentrations it doesn't smell like anything, due to our receptors being saturated).
It's not what's being put in natural gas, but it wouldn't be that dangerous if we did.
Mercaptan is a group of compounds, more than one of which is used as a gas odorant, so in some places gas smells of rotten eggs, similar to H2S, while in others it doesn't smell like that at all, but has a quite distinct smell reminiscent of garlic and durian.
I have. It's worse no doubt. But it's not the smell of rotten eggs. My comment was meant to be tongue-in-cheek to correct the mistake of saying "H2S" in the GP comment.
They are marked as unsafe because there are hundreds and hundreds of intrinsics, some of which do memory access, some have side effects and others are arithmetic only. Someone would have to individually review them and explicitly mark the safe ones.
There was a bug open about it and the rationale was that no one with the expertise (some of these are quite arcane) was stepping up to do it. (edit: other comments in this thread suggest that this effort is now underway and first changes were committed a few weeks ago)
You can do safe SIMD using std::simd but it is nightly only at this point.
For now the caller has to ensure proper alignment of SIMD loads. But in the future a safe API will be made available, once the kinks are ironed out. You can already use it in fact, by enabling a specific compiler feature [1].
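A minimal sketch of what safe SIMD looks like today on nightly (the portable_simd feature gate and its APIs are unstable and may change):

    #![feature(portable_simd)]
    use std::simd::prelude::*;

    // Safe SIMD: no unsafe needed and no alignment obligations on the
    // caller; from_slice panics (safely) if a slice is too short.
    fn xor_mul(a: &[u32], b: &[u32]) -> [u32; 8] {
        let va = u32x8::from_slice(a);
        let vb = u32x8::from_slice(b);
        ((va * vb) ^ va).to_array()
    }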
there are no loads in the above unsafe block, in practice loadu is just as fast as load, and even if you manually use the aligned load or store, you get a crash. it's silly to say that crashes are unsafe.
Well, there's a category difference between a crash as in a panic and a crash as in a CPU exception. Usually, "safe" programming limits crashes to language-level error handling, which allows you to easily reason about the nature of crashes: if the type system is sound and your program doesn't use unsafe, the only way it should crash is by panic, and panics are recoverable and leave your program in a well-defined state. By the time you get to a signal handler, you're too late. Admittedly, there are some cases where this is less important than others... misaligned load/store wouldn't lead to a potential RCE, but if it can bring down a program it still is a potential DoS vector.
Of course, in practice, even in Rust, it isn't strictly true that programs without unsafe can't crash with fatal runtime errors. There's always stack overflows, which will crash you with a SIGABRT or equivalent operating system error.
As you point out later, a SIGABRT or a SIGBUS would both be perfectly safe and really no different than a panic. With enough infra you could convert them to panics anyway (but probably not worth the effort).
Well, that's the thing though: in terms of Rust and Go and other safe programming languages, CPU exceptions are not "safe" even though they are not inherently dangerous. The point is that the subset of the language that is safe can't generate them, period. They are not accounted for in safe code.
There are uses for this, especially since some code will run in environments where you can not simply handle it, but it's also just cleaner this way; you don't have to worry about the different behaviors between operating systems and possibly CPU architectures with regards to error recovery if you simply don't generate any.
Since there are these edge cases where it wouldn't be possible to handle faults easily (e.g. some kernel code) it needs to be considered unsafe in general.
That’s largely true, but there are some exceptions (pun not intended).
In Rust, the CPU exception resulting from a stack overflow is considered safe. The compiler uses stack probing to ensure that as long as there is at least one page of unmapped memory below the stack (guard page), the program will reliably fault on it rather than continuing to access memory further below. In most environments it is possible to set up a guard page, including Linux kernel code if CONFIG_VMAP_STACK is enabled. But there are other environments where it’s not, such as WebAssembly and some microcontrollers. In those environments, the backend would have to add explicit checks to function prologs to ensure enough stack is available. I say “would have to”, not “does”: I’ve heard that on at least the microcontrollers, there are no such checks and Rust is just unsound at the moment. Not sure about WebAssembly.
Meanwhile, Go uses CPU exceptions to handle nil dereferences.
Yeah, I glossed over the Rust stack overflow case. I don't know why: Literally two parent comments up I did bother to mention it.
That said, I actually entirely forgot Go catches nil derefs in a segfault handler. I guess it's not a big deal since Go isn't really suitable for free-standing environments where avoiding CPU exceptions is sometimes more useful, so there's no particular reason why the runtime can't rely on it.
Also, AFAIK panics are not always recoverable in Rust. You can compile your project with `panic = "abort"`, in which case the program will quit immediately whenever a panic is encountered.
Sure, but that is beside the point: if you compile code like that, you're intentionally making panics unrecoverable. The nature of panics from the language perspective is not any different; you're still in a well-defined state when it happens.
It's also possible to go a step further and practice "panic-free" Rust where you write code in such a way that it never links to the panic handler. Seems pretty hard to do, but seems like it might be worth it sometimes, especially if you're in an environment where you don't have anything sensible to do on a panic.
1. You're calling out to what's basically assembly, so buyer beware. This is basically FFI into C/asm.
2. There's no guarantee that what comes out of those 128-bit vectors afterwards follows any sanity or expectations, so... buyer beware. Same reason std::mem::transmute is marked unsafe.
It's really the weakest form of unsafe.
Still entirely within the bounds of a sane person to reason about.
The example here is trivially safe but more general SIMD safety is going to be extremely difficult to analyze for safety, possibly intractable.
For example, it is perfectly legal to dereference a vector pointer that references illegal memory if you mask the illegal addresses. This is a useful trick and common in e.g. idiomatic AVX-512 code. The mask registers are almost always computed at runtime so it would be effectively impossible to determine if a potentially illegal dereference is actually illegal at compile-time.
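A hedged sketch of that masked-load trick (the intrinsic name is taken from Intel's intrinsics guide; recent Rust toolchains expose the AVX-512 intrinsics in std::arch, but availability may vary):

    // SAFETY (for callers): requires AVX-512F at runtime.
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx512f")]
    unsafe fn load_up_to_16(src: &[i32]) -> std::arch::x86_64::__m512i {
        use std::arch::x86_64::*;
        let n = src.len().min(16);
        // Mask off the lanes past the end of the slice; masked-off lanes
        // are never read, so the "past the end" load does not fault.
        let mask: __mmask16 = ((1u32 << n) - 1) as __mmask16;
        _mm512_maskz_loadu_epi32(mask, src.as_ptr())
    }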
I suspect we’ll be hand-rolling unsafe SIMD for a long time. The different ISAs are too different, inconsistent, and weird. A compiler that could make this clean and safe is like fusion power, it has always been 10 years away my entire career.
Presumably a bounds check on the mask could be done, or a safe variant exposed that does that trick under the hood. But yeah, I don't disagree that "safe SIMD" is unlikely to scratch the itch for various applications; hopefully it'll at least scratch enough of them that the remaining unsafe is reduced.
> they've been sitting in nightly unstable for years
So many very useful features of Rust and its core library spend years in "nightly" because the maintainers of those features don't have the discipline to see them through.
Before I started working with Rust, I spent a lot of time using Swift for systems-y/server-side code, outside of the Apple ecosystem. There is a lot I like about that language, but one of the biggest factors that drove me away was just how fast the Apple team was to add more and more compiler-magic features without considering whether they were really the best possible design. (One example: adding compiler-magic derived implementations of specific protocols instead of an extensible macro system like Rust has.) When these concerns were raised on the mailing lists, the response from leadership was "yes, something like that would be better in the long run, but we want to ship this now." Or even in one case, "yes, that tweak to the design would be better, but we already showed off the old design at the WWDC keynote and we don't want to break code we put in a keynote slide."
When I started working in Rust, I'd want some feature or function, look it up, and find it was unstable, sometimes for years. This was frustrating at first, but then I'd go read the GitHub issue thread and find that there was some design or implementation concern that needed to be overcome, and that people were actively working on it and unwilling to stabilize the feature until they were sure it was the best possible design. And the result of that is that features that do get stabilized are well thought out, generalize, and compose well with everything else in the language.
Yes, I really want things like portable SIMD, allocators, generators, or Iterator::intersperse. But programming languages are the one place I really do want perfect to be the enemy of good. I'd rather it take 5+ years to stabilize features than for us to end up with another Swift or C++.
Rust's async model was shipped as an MVP, not in the sense of "this is a bad design and we just want to ship it"; but rather, "we know this is the first step of the eventual design we want, so we can commit to stabilizing these parts of it now while we work on the rest." There's ongoing work to bring together the rest of the pieces and ergonomics on top of that foundational model; async closures & trait methods were recently stabilized, and work towards things like pin ergonomics & simplifying cheap clones like Rc are underway.
Rust uses this strategy of minimal/incremental stabilization quite often (see also: const generics, impl Trait); the difference between this and what drove me away from Swift is that MVPs aren't shipped unless it's clear that the design choices being made now will still be the right choices when the rest of the feature is ready.
IMO shipping async without a standardized API for basic common async facilities (like thread spawning, file/network I/O) was a mistake and basically means that tokio has eaten the whole async side of the language.
Why define runtime independence as a goal, but then make it impossible to write runtime agnostic crates?
>IMO shipping async without a standardized API for basic common async facilities (like thread spawning, file/network I/O) was a mistake and basically means that tokio has eaten the whole async side of the language.
I would argue that it's the opposite of a mistake. If you standardize everything before the ecosystem gets a chance to play with it, you risk making mistakes that you have to live with in perpetuity.
Unless you clearly define how and when you’re going to handle removing a standard or updating it to reflect better use cases.
Language designers admittedly should worry about constant breakage, but it's fine to have some churn, and we shouldn't be so concerned about it that it freezes everything.
My personal opinion is that if you want to contribute a language feature, shit or get off the pot. Leaving around a half-baked solution actually raises the required effort for someone who isn't you to add that feature (or an equivalent) because they now have to either (1) ramp up on the spaghetti you wrote or (2) overcome the barrier of explaining why your thing isn't good enough. Neither of those two things are fun (which is important since writing language features is volunteer work) and those things come in the place of doing what is actually fun, which is writing the relevant code.
The fact that the Rust maintainers allow people to put in half-baked features before they are fully designed is the biggest cultural failing of the language, IMO.
>The fact that the Rust maintainers allow people to put in half-baked features before they are fully designed is the biggest cultural failing of the language, IMO.
In nightly?
Hard disagree. Letting people try things out in the real world is how you avoid half-baked features. Easy availability of nightly compilers with unstable features allows way more people to get involved in the pre-stabilization polishing phase of things and raise practical concerns instead of theoretical ones.
C++ takes the approach of writing and nitpicking whitepapers for years before any implementations are ready and it's hard to see how that has led to better outcomes relatively speaking.
Yeah, we're going to have to agree to disagree on the C++ flow (really the flow for any language that has a written standard) being better. That flow is usually:
1. Big library/compiler does a thing, and people really like it
2. Other compilers and libraries copy that thing, sometimes putting their own spin on it
3. All the kinks get worked out and they write a white paper
4. Eventually the thing becomes standard
That way, everything in the standard library is something that is fully-thought-out and feature-complete. It also gives much more room for competing implementations to be built and considered before someone stakes out a spot in the standard library for their thing.
>That way, everything in the standard library is something that is fully-thought-out and feature-complete
Are C++ features really that much better thought out? Modules were "standardized" half a decade ago, but the list of problems with actually using them in practice is still pretty damn long to the point where adoption is basically non-existent.
I'm not going to pretend to be nearly as knowledgeable about C++ as Rust, but it seems like most new C++ features I hear about are a bit janky or don't actually fit that well with the rest of the language. Something that tends to happen when designing things in an ivory tower without testing them in practice.
They absolutely are. The reason many features are stupid and janky is because the language and its ecosystem has had almost 40 more years to collect cruft.
The fundamental problem with modules is that build systems for C++ have different abstractions and boundaries. C++ modules are like Rust async - something that just doesn't fit well with the language/system and got hammered in anyway.
The reason it seems like they come from nowhere is probably because you don't know where they come from. Most things go through boost, folly, absl, clang, or GCC (or are vendor-specific features) before going to std.
That being said, it's not just C++ that has this flow for adding features to the language. Almost every other major language that is not Rust has an authoritative specification.
Unfortunately C++ in the last set of revisions has gotten that sequence wrong; many ideas are now PDF-implemented before showing up in any compiler years later.
Fully thought out and feature-complete is something that has hardly been happening since C++17.
> maintainers of those features don't have the discipline to see them through.
This take makes me sad. There are a lot of reasons why an open source contributor may not see something through. "Lack of discipline" is only one of them. Others that come to mind are: lack of time, lack of resources, lack of capability (i.e. good at writing code, but struggles to navigate the social complexities of shepherding a significant code change), clinically impaired ability to "stay the course" and "see things through" (e.g. ADHD), or maybe it was a collaborative effort and some of the parties dropped out for any of the aforementioned reasons.
I don't have a solution, but it does kinda suck that open source contribution processes are so dependent on instigators being the responsible party to seeing a change all the way through the pipeline.
simd and allocator_api are the two that irritate me enough to consider a different language for future systems dev projects.
I don't have the personality or time to wade into committee type work, so I have no idea what it would take to get those two across the finish line, but the allocator one in particular makes me question Rust for lower level applications. I think it's just not going to happen.
If Zig had proper ADTs and something equivalent to borrow checker, I'd be inclined to poke at it more.
generic simd abstractions are of quite limited use. I'm not sure what's objectionable about the thing Rust has shipped (in nightly) for this, which is more or less the same as the stuff Zig has shipped for this (in a pre-1.0 compiler version).
I don't read any moralizing in my previous comment. And it seems to mirror the relevant section in the book:
"People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs."
I hope the SIMD intrinsics make it to stable soon so folks can ditch unnecessary unsafes if that's the only issue.
This is not really true. You have to uphold those guarantees yourself. With unsafe preconditions, if you don't, the code will still crash loudly (which is better than undefined behaviour).
With unsafe you get exactly the same kind of semantics as C, if you don't uphold the invariant the unsafe functions expect, you end up with UB exactly like in C.
If you want a clean crash instead on indeterministic behavior, you need to use assert like in C, but it won't save you from compiler optimization removing checks that are deemed useless (again, exactly like in C).
> With unsafe you get exactly the same kind of semantics as C, if you don't uphold the invariant the unsafe functions expect, you end up with UB exactly like in C.
This is not exactly true. Even in production code, unsafe preconditions check if you violate these rules.
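A sketch of those checks in action (behavior as of Rust 1.78+, where the standard library's unsafe-precondition checks fire in builds with debug assertions enabled):

    fn main() {
        let v = [1u32, 2, 3];
        let i = 10;
        // In a debug build this aborts with "unsafe precondition(s)
        // violated" instead of silently reading out of bounds; in a
        // release build it is plain undefined behavior.
        let x = unsafe { *v.get_unchecked(i) };
        println!("{x}");
    }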
> Safe Rust: memory safe, no undefined behavior possible.
> Unsafe Rust: can trigger undefined behavior if preconditions are violated.
So unsafe Rust, from a UB perspective, is no different than C/C++. If preconditions are violated, UB can occur, affecting anywhere in the program. It's unclear how the compiler could check anything about preconditions in a block explicitly used to say that the developer is the one upholding them.
Using references in unsafe Rust is harder than using raw pointers in C.
Using raw pointers in unsafe Rust is easier than using raw pointers in C.
The solution is to not manipulate references in unsafe code. The problem is that in old versions of Rust this was tricky. Modern versions of Rust have addressed this by adding first-class facilities for producing pointers without needing temporary references: https://blog.rust-lang.org/2024/10/17/Rust-1.82.0.html#nativ...
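A small sketch of those facilities (the &raw syntax stabilized in Rust 1.82):

    // Fields of a packed struct may be unaligned; taking an ordinary
    // reference like `&p.b` would be UB, but `&raw const` never creates
    // a reference, only a raw pointer.
    #[repr(C, packed)]
    struct Packed {
        a: u8,
        b: u64,
    }

    fn main() {
        let p = Packed { a: 1, b: 2 };
        let ptr = &raw const p.b; // no intermediate reference
        // SAFETY: ptr points into `p`, which is alive; read_unaligned
        // has no alignment requirement.
        let b = unsafe { ptr.read_unaligned() };
        println!("{b}");
    }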
Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees? As far as I'm aware, inside the unsafe block you can do whatever you want which means all of the nice memory-safety properties of the language go away.
It's like letting a wet dog (who'd just been swimming in a nearby swamp) run loose inside your hermetically sealed cleanroom.
"You can take five actions in unsafe Rust that you can’t in safe Rust, which we call unsafe superpowers. Those superpowers include the ability to:
Dereference a raw pointer
Call an unsafe function or method
Access or modify a mutable static variable
Implement an unsafe trait
Access fields of a union
It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks: if you use a reference in unsafe code, it will still be checked. The unsafe keyword only gives you access to these five features that are then not checked by the compiler for memory safety. You’ll still get some degree of safety inside of an unsafe block.
In addition, unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems: the intent is that as the programmer, you’ll ensure the code inside an unsafe block will access memory in a valid way.
People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs."
This description is still misleading. The preconditions for the correctness of an unsafe block can very much depend on the correctness of the code outside it, and it is easy to find Rust bugs where exactly this was the cause. This is very similar to how C out-of-bounds accesses are often caused by some logic error elsewhere. Also, an unsafe block has to maintain all the invariants the safe Rust part needs to maintain correctness.
So, it's true that unsafe code can depend on preconditions that need to be upheld by safe code.
But using ordinary module encapsulation and private fields, you can scope the code that needs to uphold those preconditions to a particular module.
So the "trusted computing base" for the unsafe code can still be scoped and limited, allowing you to reduce the amount of code you need to audit and be particularly careful about for upholding safety guarantees.
Basically, when writing unsafe code, the actual unsafe operations are scoped to only the unsafe blocks, and they have preconditions that you need to scope to a particular module boundary to ensure that there's a limited amount of code that needs to be audited to ensure it upholds all of the safety invariants.
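A toy sketch of that scoping (invented example, not from any particular library): the invariant lives behind a private field, so only this one module needs auditing.

    mod nonempty {
        // Invariant: the inner Vec is never empty. The field is private,
        // so only code in this module can violate the invariant.
        pub struct NonEmpty(Vec<u32>);

        impl NonEmpty {
            pub fn new(first: u32) -> Self {
                NonEmpty(vec![first])
            }

            pub fn push(&mut self, v: u32) {
                self.0.push(v);
            }

            pub fn first(&self) -> u32 {
                // SAFETY: the module invariant guarantees index 0
                // exists; auditing this module suffices to trust this.
                unsafe { *self.0.get_unchecked(0) }
            }
        }
    }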
Ralf Jung has written a number of good papers and blog posts on this topic.
And you think one cannot modularize C code and encapsulate critical buffer operations in much safer APIs? One can; the problem is that a lot of legacy C code was not written this way. A lot of newly written C code is not written this way either, but the reason is often that people cut corners when they need to get things done with limited time and resources. You will see the same with Rust.
There is no distinction between safe and unsafe code in C, so it's not possible to make that same distinction that you can in Rust.
And even if you try to provide some kind of safer abstraction, you're limited by the much more primitive type system, that can't distinguish between owned types, unique borrows, and shared borrows, nor can it distinguish thread safety properties.
So you're left to convention and documentation for that kind of information, with nothing checking that you're getting it right, making it easy to make mistakes. And even if you get it right at first, a refactor could change your invariants, and without a type system enforcing them, you never know until someone comes along with a fuzzer and figures out that they can pwn you.
There is definitely a distinction between safe and unsafe code in C, it is just not a simple binary distinction. But this does not make it impossible to screen C for unsafe constructions and it also does not mean that detecting unsafe issues in Rust is always trivial.
But this is also easy to protect against if you use the tools available to C programmers. It is part of the Rust hype that we would be completely helpless here, but this is far from the truth.
I assume you are hinting at 'int' is signed here? And, that signed overflow is UB in C? Real question: Ignoring what the ISO C language spec says, are there any modern hardware platforms (say: ARM64 and X86-64) that do not use two's complement to implement signed integers? I don't know any. As I understand, two's complement correctly supports overflow for signed arithmetic.
I might be old, but more than 10 years ago, hardly anyone talked about UB in C and C++ programming. In the last 10 years, it is all the rage, but seems to add very little to the conversation. For example, if you program C or C++ with the Win32 API, there are loads of weird UB-ish things that seem to work fine.
> Ignoring what the ISO C language spec says, are there any modern hardware platforms (say: ARM64 and X86-64) that do not use two's complement to implement signed integers?
This is not how compilers work. Optimization happens based on language semantics, not on what platforms do.
At least in recent C++ standards, integers are defined as two’s complement. As a practical matter what hardware like that may still exist doesn’t have a modern C++ compiler, rendering it a moot point.
UB in C is often found where different real hardware architectures had incompatible behavior. Rather than biasing the language for or against different architectures they left it to the compiler to figure out how to optimize for the cases where instruction behavior diverge. This is still true on current architectures e.g. shift overflow behavior which is why shift overflow is UB.
    int average(int x, int y) {
        long sum = (long)x + y;
        if(sum > INT_MAX || sum < INT_MIN)
            return -1; // or any value that indicates an error/overflow
        return (int)(sum / 2);
    }
There is no guarantee that sizeof(long) > sizeof(int), in fact the GNU libc documentation states that int and long have the same size on the majority of supported platforms.
> return -1; // or any value that indicates an error/overflow
-1 is a perfectly valid average for various inputs. You could return the larger type to encode an error value that is not a valid output or just output the error and average in two distinct variables.
> There is no guarantee that sizeof(long) > sizeof(int), in fact the GNU libc documentation states that int and long have the same size on the majority of supported platforms.
That used to be the case for 32-bit platforms, but most 64-bit platforms in which GNU libc runs use the LP64 model, which has 32-bit int and 64-bit long. That documentation seems to be a bit outdated.
(One notable 64-bit platform which uses 32-bit for both int and long is Microsoft Windows, but that's not one of the target platforms for GNU libc.)
I don't know why this answer was downvoted. It adds valuable information to this discussion. Yes, I know that someone already pointed out that sizeof(int) is not guaranteed on all platforms to be smaller than sizeof(long). Meh. Just change the type to long long, and it works well.
Copypasting a comment into an LLM, and then copypasting its response back is not a useful contribution to a discussion, especially without even checking to be sure it got the answer right. If I wanted to know what an LLM had to say, I can go ask it myself; I'm on HN because I want to know what people have to say.
average(INT_MAX, INT_MAX) should return INT_MAX, but it will get that wrong and return -1.
average(0,-2) should not return a special error-code value, but this code will do just that, making -1 an ambiguous output value.
Even its comment is wrong. We can see from the signature of the function that there can be no value that indicates an error, as every possible value of int may be a legitimate output value.
It's possible to implement this function in a portable and standard way though, along the lines of [0].
sorting floats with NaN? almost anything involving threading and mutation where people either don't realise how important locks are, or don't realise their code has suddenly been threaded?
You're a lot more limited in the kinds of APIs you can safely encapsulate in C. For example, you can't safely encapsulate an interface that shares memory between the library and the caller in C. So you're forced into either:
- Exposing an unsafe API and relying on the caller to manually uphold invariants
- Doing things like defensive copying at a performance cost
In many cases Rust gives you the best of both worlds: sharing memory liberally while still having the compiler enforce correctness.
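A tiny invented sketch of that third option: the returned view borrows the owner, so use-after-free or use-after-realloc simply doesn't compile.

    struct Table {
        rows: Vec<u32>,
    }

    impl Table {
        // Shares memory with the caller: no defensive copy. The slice
        // borrows `self`, so the borrow checker rejects any caller that
        // mutates or drops the Table while the slice is still alive.
        fn rows(&self) -> &[u32] {
            &self.rows
        }
    }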
Which is just a convoluted way of saying that it is possible to write bugs in any language. Still, it's undeniable that some languages make a better job at helping you avoid certain bugs than others.
It's true, but I think it's only fair if you hold Rust to this analysis, other languages should too; the scrutiny you're implying you need in an unsafe Rust block needs to be applied to all C code, because all C code could depend on code anywhere else for its safety characteristics.
In practice (in both languages) you check what the actual unsafe code does (or "all" code in C's case), note code that depends on external actors for safety (it's not all C code, nor is it all unsafe Rust blocks), and check their callers (and callers callers, etc).
What is true is that there are more operations in C which can cause undefined behavior, and those are more densely distributed over the C code, making it harder to screen for undefined behavior. This is true, and Rust certainly has an advantage here, but it is not nearly as big an advantage as the "Rust is safe" (please do not look at all the unsafe blocks we need to make it also fast!) and "all C is unsafe" story wants you to believe.
What Rust provides is a way to build safe abstractions over unsafe code.
Rust's type system (including ownership and borrowing, Sync/Send, etc.), along with its privacy features (allowing types to have private fields that can only be accessed by code in the module that defined them), allows you to create fully safe interfaces around code that uses unsafe; there is provably no combination of uses of the interface which leads to undefined behavior.
Now, yeah, it's possible to also use unsafe in Rust just for applying a local optimisation. And that has fewer benefits than a fully encapsulated safe interface, though is still easier to audit for potential UB than C.
So you're right that it's on a continuum, but the distinction between safe and unsafe code means you can more easily find the specific places where UB could occur, and the encapsulation and type system makes it possible to create safe abstractions over unsafe code.
You sound pretty biased, gotta tell you. That snark is not helping any argument you think you might be doing -- and you are not doing any; you are kind of just making fun of Rust, which is pretty boring and uninformative for any reader.
From my past experiences with Rust, the team never had to think about data races once, or about mutable volatile globals. We all suffered from those decades ago with C, and sometimes C++ as well.
You like those and don't want to migrate? More power to ya! But badmouthing Rust with what seem fairly uninformed comments is just low. Inform yourself first.
The places where undefined behaviour can occur are also limited in scope; you insist that that part isn't true, because operations outside those unsafe blocks can impact their safety.
That's only true at the same level of scrutiny as "all C operations can cause undefined behaviour, regardless of what they are", which I find similarly shallow.
Rust is plenty fast; in fact, there are countless examples of safe Rust trivially beating C in performance because the no-aliasing guarantees enable better vectorization, among other things. It is also simply a more expressive language that allows writing better optimizations (e.g. small-string optimization vs. the absolutely laughable C strings that perform terribly), and you can actually get away with sharing more data in memory instead of making defensive copies everywhere, because it is safe to do so.
And there are not many things we have statistics on in CS, but memory vulnerabilities being absolutely everywhere in unsafe languages, and Rust cleaning up the absolute majority of them even when only the new parts are written in Rust, are some of the few we do know, based on actual real-life projects at Google/Microsoft among others.
A memory safe low-level language is as novel as it gets. Rust is absolutely not just hype, it actually delivers and you might want to get on with the times.
> absolutely laughable c-strings that perform terribly
Not much being said here in 2025. Any good project will quickly switch to a tiny structure that holds a char* and a length. There are plenty of open source libs to help you.
Sure, you can technically just write your own vulnerability for your own program and inject it at an unsafe and see the whole world crumble... but the exact same is true for any form of FFI calls in any language. Is Java memory safe? Yeah, just because I can grab a random pointer and technically break anything I want won't change that.
The fact that a memory vulnerability can either appear anywhere at all, or only within the couple hundred lines of unsafe code throughout the whole project, is a night-and-day difference.
But “Dereference a raw pointer”, in combination with the ability to create raw pointers pointing to arbitrary memory addresses (that, you can do even in safe rust) allows you to write arbitrary memory from unsafe rust.
So, in theory, unsafe rust opens the floodgates. In practice, though, you can use small fragments of unsafe code that programmers can fairly easily check to be safe.
Then, once you’ve convinced yourself that those fragments are safe, you can be assured that your whole program is safe (using ‘safe’ in the rust sense, of course)
So, there may be some small islands of unsafe code that require extra attention from the programmer, but that should be just a tiny fraction of all lines, and you should be able to verify those islands in isolation.
This is where the rubber hits the road. Rust does not allow you to do this, in the sense that this is possibly undefined behavior. That "possibly" is why the compiler allows you to write this code, because by saying "unsafe", you are promising that this specific arbitrary address is legal for you to write to. But that doesn't mean that it's always legal to do so.
The compiler won't allow you to compile such code without the unsafe. The unsafe is *you* promising the compiler that *you* have checked to ensure that the address will always be legal. So that the compiler will allow you to compile the code.
I believe the post you are replying to was referring to the fact that you could take actions in that unsafe block that would compromise the guarantees of rust; eg you could do something silly, leave the unsafe block, then hit an “impossible” condition later in the program.
A simple example might be modifying a const value deep down in some class, where it only becomes apparent later in the program’s execution. Hence their analogy of the wet dog in a clean room - whatever beliefs you have about the structure of memory in your entire program, and guaranteed by the compiler, could have been undone by a rogue unsafe.
Would someone with more experience be able to explain to me why can't these operations be "safe"? What is blocking rust from producing the same machine code in a "safe" way?
Rust's raw pointers are more-or-less equivalent to C pointers, with many of the same types of potential problems like dangling pointers or out-of-bounds access. Rust's references are the "safe" version of doing pointer operations; raw pointers exist so that you can express patterns that the borrow checker can't prove are sound.
Rust encourages using unsafe to "teach" the language new design patterns and data structures; and uses this heavily in its standard library. For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.
Think of unsafe not as "this code is unsafe", but as "I've proven this code to be safe, and the borrow checker can rely on it to prove the safety of the rest of my program."
Why does Vec need to have any unsafe code? If you respond "speed"... then I will scratch my chin.
> For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.
I'm sure you already know this, but you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.
Rust doesn't have compiler-magic support for anything like a vector. The language has syntax for fixed-sized arrays on the stack, and it supports references to variable-length slices; but it has no magic for constructing variable-length slices (e.g. C++'s `new[]` operator). In fact, the compiler doesn't really "know" about the heap at all.
Instead, all that functionality is written as Rust code in the standard library, such as Vec. This is what I mean by using unsafe code to "teach" the borrow checker: the language itself doesn't have any notion of growable arrays, so you use unsafe to define its semantics and interface, and now the borrow checker understands growable arrays. The alternative would be to make growable arrays some kind of compiler magic, but that's both harder to implement correctly and not generalizable.
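To make that concrete, a drastically simplified sketch in the spirit of Vec (toy code for u32 elements only; the real Vec handles zero-sized types, overflow, and allocation failure far more carefully):

    use std::alloc::{alloc, dealloc, realloc, Layout};

    // Toy growable buffer: a raw pointer, a length, and a capacity,
    // wrapped in a safe public interface.
    struct GrowBuf {
        ptr: *mut u32,
        len: usize,
        cap: usize,
    }

    impl GrowBuf {
        fn new() -> Self {
            GrowBuf { ptr: std::ptr::null_mut(), len: 0, cap: 0 }
        }

        fn push(&mut self, v: u32) {
            if self.len == self.cap {
                let new_cap = if self.cap == 0 { 4 } else { self.cap * 2 };
                let new_layout = Layout::array::<u32>(new_cap).unwrap();
                // SAFETY: new_layout is non-zero-sized; on the realloc
                // path, ptr was allocated with the matching old layout.
                self.ptr = unsafe {
                    if self.cap == 0 {
                        alloc(new_layout) as *mut u32
                    } else {
                        let old = Layout::array::<u32>(self.cap).unwrap();
                        realloc(self.ptr as *mut u8, old, new_layout.size()) as *mut u32
                    }
                };
                assert!(!self.ptr.is_null(), "allocation failed");
                self.cap = new_cap;
            }
            // SAFETY: len < cap, so this write is inside the allocation.
            unsafe { self.ptr.add(self.len).write(v) };
            self.len += 1;
        }

        fn as_slice(&self) -> &[u32] {
            if self.len == 0 {
                return &[];
            }
            // SAFETY: ptr is valid for len initialized elements, and the
            // returned lifetime borrows self, preventing use-after-free.
            unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
        }
    }

    impl Drop for GrowBuf {
        fn drop(&mut self) {
            if self.cap != 0 {
                // SAFETY: same layout as the live allocation.
                unsafe { dealloc(self.ptr as *mut u8, Layout::array::<u32>(self.cap).unwrap()) }
            }
        }
    }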
> you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.
That's true and that's a great design pattern in C as well. But there are some crucial differences:
- Rust has no undefined behavior outside of unsafe blocks. This means you only need to audit unsafe blocks (and any invariants they assume) to be sure your program is UB-free. C does not have this property even if you code defensively at interface boundaries.
- In Rust, most of the invariants can be checked at compile time; the need for runtime asserts is less than in C.
- C provides no way to defend against dangling pointers without additional tooling & runtime overhead. For instance, if I write a dynamic vector and get a pointer to the element, there's no way to prevent me from using that pointer after I've freed the vector, or appended an element causing the container to get reallocated elsewhere.
Rust isn't some kind of silver bullet where you feed it C-like code and out comes memory safety. It's also not some kind of high-overhead garbage collected language where you have to write unsafe whenever you care about performance. Rather, Rust's philosophy is to allow you to define fundamental operations out of small encapsulated unsafe building blocks, and its magic is in being able to prove that the composition of these operations is safe, given the soundness of the individual components.
The stdlib provides enough of these building blocks for almost everything you need to do. Unsafe code in library/systems code is rare and used to teach the language of new patterns or data structures that can't be expressed solely in terms of the types exposed by the stdlib. Unsafe in application-level code is virtually never necessary.
Those specific functions are compiler builtin vector intrinsics. The main reason is that they can easily read past ends of arrays and have type safety and aliasing issues.
By the way, the Rust compiler does generate such code, because under the hood LLVM runs an autovectorizer when you turn on optimizations. However, for the autovectorizer to do a good job you have to write code in a very particular way, and you have no way of controlling whether it kicked in, or whether it did a good job once it did.
There’s work on creating safe abstractions (that also transparently scale to the appropriate vector instruction), but progress on that has felt slow to me personally and it’s not available outside nightly currently.
often the unsafe code is at the edges of the type system. e.g. sometimes the proof of safety is that someone read the source code of the c library that you are calling out to. it's not useful to think of machine code as safe or unsafe. safety often refers to whether the types of your data match the lifetime dataflow.
Claiming unsafe invalidates "all of the nice memory-safety properties" is like saying having windows in your house does away with all the structural integrity of your walls.
There's even unsafe usage in the standard library and it's used a lot in embedded libraries.
Where are you more likely get a burglar enter your home? Windows ...
Where are you more likely to develop cracks in your walls? Windows ...
Where are you more likely to develop leaks? Windows (especially roof windows!)...
Sorry but horrible comparison ;)
If you need to rely on unsafe in a memory-safe language for performance reasons, then there is an issue with the language's compiler at that point that needs to be fixed. Simple as that.
The whole memory-safety thing is the bread and butter of the language; the moment you start to bypass it for faster memory operations, you can start doing the same in any other language. I mean, you're literally bypassing the main selling point of the language. ¯\_(ツ)_/¯
> If you need to rely on unsafe in a memory-safe language for performance reasons, then there is a issue with the language compiler at that point, that needs to be fixed. Simple as that.
It actually means "Rust needs to interface with many other systems that are not as stringent as it". Your interpretation has nothing to do with what's actually going on and I am surprised you misinterpreted the situation as hugely as you did.
...And even if everything was written in Rust, `unsafe` would still be needed, because the lower you go [toward the kernel], the more non-determinism you encounter.
This "all or nothing" attitude is boring and tiring. We all wish things were super simple, black and white, and all-or-nothing. They are not.
All safe code in existence running on von Neumann architectures is built on a foundation of unsafe code. The goal of all memory-safe languages is to provide safe abstractions on top of an unsafe core.
Depends on which JVM you are talking about, some are 100% Java, some are a mix of Java and C, others are a mix of Java and C++, in all cases a bit of Assembly as well.
You are right. I should have been more clear. I am talking about the bog standard one that most people use from Oracle/OpenJDK. A long time back it was called "HotSpot JVM". That one has source code available on GitHub. It is mostly C++ with a little bit of C and assembly.
I don't think what something was written in should count. Barring bugs, it should still be memory safe.
But I believe JVM has ffi and as soon as you use ffi you risk messing up that memory safety.
If your unsafe code violates invariants it was supposed to uphold, that can wreck safety properties the compiler was trying to uphold elsewhere. If you can achieve something without unsafe you definitely should (safe, portable simd is available in rust nightly, but it isn't stable yet).
At the same time, unsafe doesn't just turn off all compiler checks, it just gives you tools to go around them, as well as tools that happen to go around them because of the way they work. Rust unsafe is this weird mix of being safer than pure C, but harder to grasp; with lots of nuanced invariants you have to uphold. If you want to ensure your code still has all the nice properties the compiler guarantees (which go way beyond memory safety) you would have to carefully examine every unsafe block. Which few people do, but you generally still end up with a better status quo than C/C++ where any code can in principle break properties other code was trying to uphold.
Jason Ordendorff's talk [1] was probably the first time I truly grokked the concept of unsafe in Rust. The core idea behind unsafe in Rust is not to provide an escape from the guarantees provided by rust. It's to isolate the places where you have no choice but to break the guarantees and rigorously code/test the boundaries there so that anything wrapping the unsafe code can still provide the guarantees.
As soon as you start playing with FFI and raw pointers in Python, NodeJS, Julia, R, C#, etc., you can easily lose the nice memory-safety properties of those languages: undefined behavior, segfaults, etc. I'd say Rust is a lot nicer for checking unsafe correctness than other memory-safe languages, and it also makes it easier to dip down to systems-level programming, yet it seems to get a lot of hate for these features.
Ada is even much better at checking for correctness. It needs to be talked about more. "Safer than C" has been Ada all along; people did not know this before they jumped on the Rust bandwagon.
You lose those guarantees if and only if the code within the unsafe block violates the rules of the Rust language.
Normally in safe code you can’t violate the language rules because the compiler enforces various rules. In unsafe mode, you can do several things the compiler would normally prevent you from doing (e.g. dereferencing a naked pointer). If you uphold all the preconditions of the language, safety is preserved.
What’s unfortunate is that the rules you are required to uphold can be more complex than you might anticipate if you’re trying to use unsafe to write C-like code. What’s fortunate is that you rarely need to do this in normal code and in SIMD which is what the snippet is representing there’s not much danger of violating the rules.
You lose the nice guarantees inside the `unsafe` block, but the point is to write a sound and safe interface over it, that is an API that cannot lead to UB no matter how other safe code calls it. This is basically the encapsulation concept, but for safety.
To continue the analogy of the dog, you let the dog get wet (=you use unsafe), but you put a cleaning room (=the sound and safe API) before your sealed room (=the safe code world)
> Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees?
No, not even close. You only lose Rust's safety guarantees when your unsafe code causes Undefined Behavior. Unsafe code that can be made to cause UB from Safe Rust is typically called unsound, and unsafe code that cannot be made to cause UB from Safe Rust is called sound. As long as your unsafe code is sound, then it does not break any of Rust's guarantees.
For example, unsafe code can still use slices or references provided by Safe Rust, because those are always guaranteed to be valid, even in an unsafe block. However, if from inside that unsafe block you then go on to manufacture an invalid slice or reference using unsafe functions, that is UB and you lose Rust's safety guarantees because of the UB.
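A small sketch of the sound/unsound distinction (toy functions, not from any library):

    // UNSOUND: safe callers can pass an empty slice and trigger UB, so
    // this wrapper does not uphold Rust's guarantees.
    fn first_unsound(v: &[u32]) -> u32 {
        unsafe { *v.get_unchecked(0) }
    }

    // SOUND: the precondition is checked before the unsafe operation,
    // so no safe caller can cause UB through this interface.
    fn first_sound(v: &[u32]) -> Option<u32> {
        if v.is_empty() {
            None
        } else {
            // SAFETY: just checked that index 0 is in bounds.
            Some(unsafe { *v.get_unchecked(0) })
        }
    }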
Is there such a boundary? How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?
The usual retort to these questions is 'well, the standard library uses unsafe code, so everything would need a disclaimer that it uses unsafe code, so that's a useless remark to make', but the basic issue still remains that the only clear boundary is whether a function 'contains' unsafe code, not whether a function 'calls' unsafe code.
If Rust did not have a mechanism to use external code then it would be fine because the only sources of unsafe code would be either the application itself or the standard library so you could just grep for 'unsafe' to find the boundaries.
> Is there such a boundary? How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?
Yes, there is a boundary, and usually it's either the function itself, or all methods of an object. For instance, a function I wrote recently goes somewhat like this:
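    fn read_unaligned_u64_from_byte_slice(src: &[u8]) -> u64 {
        // Safely panics unless the caller passes exactly 8 bytes.
        assert_eq!(src.len(), 8);
        // SAFETY: the assert above guarantees `src` is valid for an
        // 8-byte read, and read_unaligned has no alignment requirement.
        unsafe { std::ptr::read_unaligned(src.as_ptr() as *const u64) }
    }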
The read_unaligned function (https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html) has two preconditions which have to be checked manually. When doing so, you'll notice that the "src" argument must have at least 8 bytes for these preconditions to be met; the "assert_eq!()" call before that unsafe block ensures that (it will safely panic unless the "src" slice has exactly 8 bytes). That is, my "read_unaligned_u64_from_byte_slice" function is safe, even though it calls unsafe code; the function is the boundary between safe and unsafe code. No callers of that function have to worry that it calls unsafe code in its implementation.
> How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?
The point is that you don't need to. The guarantees compose.
> The usual retort to these questions is 'well, the standard library uses unsafe code
It's not about the standard library, it's much more fundamental than that: hardware is not memory safe to access.
> If Rust did not have a mechanism to use external code then it would be fine
This is what GC'd languages with runtimes do. And even they almost always include FFI, which lets you call into arbitrary code via the C ABI, allowing for unsafe things. Rust is a language intended to be used at the bottom of the stack, and so has more first-class support, calling it "unsafe" instead of FFI.
I wouldn’t go that far. Bevy for example, uses unsafe internally but is VERY strict about it, and every use of unsafe requires a comment explaining why the code is safe.
In other words, unsafe works if you use it carefully and keep it contained.
My understanding is that the user who writes an unsafe block in a safe function is responsible for making sure that it doesn't do anything wrong to mess up the safety, and that the function isn't lying about exposing a safe interface. I think at one point before Rust 1.0 there was even a suggestion to rename it trustme. Of course users can easily mess up, but the point is to minimize the use of unsafe so it's easier to check, and to create interfaces that can be used safely.
Can't Rust do safe SIMD? This is just vectorised multiplication and XOR, but it gets labelled as unsafe. I imagine most code that wants to be fast would use SIMD to some extent.
The idea is that you can trivially search the code base for "unsafe" and closely examine all unsafe code, and unless you are doing really low-level stuff there should not be much of it. Higher level code bases should ideally have none.
It tends to be found in drivers, kernels, vector code, and low-level implementations of data structures and allocators and similar things. Not typical application code.
As a general rule it should be avoided unless there's a good reason to do it. But it's there for a reason. It's almost impossible to create a systems language that imposes any kind of rules (like ownership etc.) that covers all possible cases and all possible optimization patterns on all hardware.
So much so that it's even possible to write bare-metal microcontroller firmware in Rust without unsafe, as the embedded-hal ecosystem wraps unsafe hardware interfaces in a modular, fairly universal safe API.
My understanding from Aria Beingessner's writings, and some others', is that unsafe Rust is significantly harder to get right in "non-trivial cases" than C, because the semantics are more complex and less specified.
This is definitely true right now, but I don't think it will always be the case.
Unsafe Rust is currently extremely underspecified and underdocumented, but it's designed to be far more specifiable than C. For example: aliasing rules. When and how you're allowed to alias references in unsafe code is not at all documented and under much active discussion; whereas in C pointer aliasing rules are well defined but also completely insane (casting pointers to a different type in order to reinterpret the bytes of an object is often UB even in completely innocuous cases).
Once Rust's memory model is fully specified and written down, unsafe Rust is trying to go for something much simpler, more teachable, and with fewer footguns than C.
It's hard to compare. Rust has stricter requirements than C, but looser requirements don't mean easier: ever bit-shifted by a variable amount? Hope you never relied on shifting "entirely" out of a variable zeroing it; in C, shifting by at least the bit width is undefined behavior, not a zeroing shift.
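To spell out that contrast, here's a minimal sketch (assuming 64-bit values): Rust treats an over-wide shift as a defined overflow error, with explicit methods for whichever semantics you actually want.

    fn main() {
        let x: u64 = 1;
        // A plain `x << 64` panics when debug assertions are enabled;
        // the explicit methods make the intended semantics visible.
        assert_eq!(x.checked_shl(64), None); // overflow detected, not UB
        assert_eq!(x.wrapping_shl(64), 1);   // shift amount taken mod 64
    }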
Clearly marking unsafe code is no good for safety, if you have many marked areas.
Some codebases, you can grep for "unsafe", find no results, and conclude the codebase is safe... if you trust its dependencies.
This is not one of those codebases. This one uses unsafe liberally, which tells you it's about as safe as C.
"unsafe behaviour is clearly marked" seems to be a thought-stopping cliche in the Rust world. What's the point of marking them, if you still have them? If every pointer dereference in C code had to be marked unsafe (or "please" like in Intercal), that wouldn't make C any better.
While everything you say is true, your reply (and most of its siblings!) entirely misses GP's point.
All languages at some point interface with syscalls or low level assembly that can be done wrong, but one of Rust's selling points is a safe wrapping of low-level interactions. Like safe heap allocation/deallocation with `Box`, or swapping with `swap`, etc. Except... here.
Why does a library like zlib need to go beyond Rust's safe offerings? Why doesn't rust provide safe versions of the constructs zlib needs?
> Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?
rustc uses LLVM just as clang does, so to a first approximation they're the same. For any given LLVM IR you can mostly write equivalent Rust and C++ that causes the respective compiler to emit it (the switch fallthrough thing mentioned in the article is interesting though!) So if you're talking about what's possible (as opposed to what's idiomatic), the question of "which language is faster" isn't very interesting.
Rust's borrow checker still checks within unsafe blocks, so unless you are operating purely on raw pointers across the whole program (rather than dropping to raw pointers in a few small, well-defined blocks), it will be significantly safer than C. Especially given all the other language benefits, like a proper type system that can encode a bunch of invariants, and no footguns at every line/initialization/cast, etc.
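A minimal illustration (this deliberately does not compile): the mutable call sits inside an unsafe block, but the borrow checker still rejects it.

    fn demo(v: &mut Vec<u32>) {
        let first = &v[0];
        unsafe {
            v.push(1); // error[E0502]: cannot borrow `*v` as mutable
        }
        println!("{first}");
    }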
Yes. I think it’s easy to underestimate how much the richer language and library ecosystem chip away at the attack surface area. So many past vulnerabilities have been in code which isn’t dealing with low-level interfaces or weird performance optimizations and wouldn’t need to use unsafe. There’ve been so many vulnerabilities in crypto code which weren’t the encryption or hashing algorithms but things like x509/ASN parsing, logging, or the kind of option/error handling logic a Rust programmer would use the type system to validate.
Yeah, this article about a rust "win" perfectly illustrates why I distrust all good news about it.
Rust zlib is faster than zlib-ng, but the latter isn't a particularly fast C contender. Chrome ships a faster C zlib library which Rust could not beat.
Rust beat C by using pre-optimized code paths and then C function pointers inside unsafe. Plus C SIMD inside unsafe.
I'd summarize the article as: generous chunks of C embedded into unsafe blocks help Rust to be almost as fast as Chrome's C Zlib.
I think the bigger point here is that doing SIMD in Rust is still painful.
There are efforts like portable-simd [1] to make this better, but in practice, many people are dropping down to low-level SIMD intrinsics and/or inline assembly, which are no better than their C equivalents.
The purpose of `unsafe` is for the compiler to assume a block of code is correct. SIMD intrinsics are marked as unsafe because calling them on a CPU that lacks the required target feature is undefined behavior (and some of them also take raw pointers as arguments).
In safe Rust (the default), memory access is validated by the borrow checker and type system. Rust’s goal of soundness means safe Rust should never cause out-of-bounds access, use-after-free, etc; if it does, then there's a bug in the Rust compiler.
And while we're in the hypothetical extreme world somewhat separated from reality, a series of solar flares could flip a memory bit and all the error-correction bits in my ECC ram at once to change a pointer in memory, causing my safe rust to do an out of bounds write.
Until we design perfectly correct computer hardware, processors, and a sun which doesn't produce solar radiation, we can't rely on totally uniform correct execution of our code, so we should give up.
The reality is that while we can't prove the rust compiler is safe, we can keep using it and diligently fix any counter-examples, and that's good enough in practice. Over in the real world, where we can acknowledge "yes, it is impossible to prove the absence of all bugs" and simultaneously say "but things sure seem to be working great, so we can get on with life and fix em if/when they pop up".
Sorry, it's just that I have an allergic reaction to what sounds like people trying to make debate-bro arguments.
Like, when I say "use signal, it's secure", someone could respond "Ahh, but technically you can't prove the absence of bugs, signal could have serious bugs, so it's not secure, you fool", but like everyone reading this already knew "it's secure" means "based on current evidence and my opinion it seems likely to be more secure than alternatives", and it got shortened. Interpreting things as absolutes that are true or false is pointless debate-bro junk which lets you create strawmen out of normal human speech.
When someone says "1+1 = 2", and a debate-bro responds "ahh but in base-2 it's 10 you fool", it's just useless internet noise. Sure, it's correct, but it's irrelevant, everyone already knows it, the original comment didn't mean otherwise.
Responding to "safe Rust should never cause out-of-bounds access, use-after-free" with "ahh but we can't prove the compiler is safe, so rust isn't safe is it??" is a similarly sorta response. Everyone already knows it. It's self-evident. It adds nothing. It sounds like debate-bro "I want to argue with you so I'm saying something that's true, but we both already know and doesn't actually matter".
I think that allergic response came out, apologies if it was misguided in this case and you're not being a debate-bro.
I don't think we can go beyond the 'human limitations' if you will, of any software.
Bugs happen; they're bound to. It's more: what is enforcing the Rust language guarantees, and how do we know it's enforcing them with reasonably high accuracy?
I feel that it can only happen as Rust itself becomes (or perhaps it meaningfully already is) written in pure 100% safe Rust itself. At which point, I believe the matter will be largely settled.
Until then, I don't think it's unreasonable for someone to ask how it verifies its assertions, is all.
There is no possible way for something to be written in 100% memory safe code, no matter what the language, if you include "no unsafe code anywhere in the call stack." Interacting with the hardware is not memory safe. Any useful program must on some level involve unsafety. This is true for every programming language.
I think whenever someone takes the time to walk their audience through the nuances of this question its a big win.
No different than how I asked the Go community how it could produce binaries on any platform for all major platforms it supports (i.e., you don't have to compile your Go code on Linux for it to work on Linux, you only have to set a flag, with the exception, if I recall correctly, of CGO dependencies, but that's a wild horse anyway).
But you have to admit there are misguided Rust zealots, too, who don't know or realize the obviousness of what you just said regarding Rust.
How is it a strawman? Many people have misconceptions regarding Rust, while not even knowing about the existence of Ada/SPARK to begin with. They blindly spout "Rust is saFeEe!44!". If you are not a zealot, then it does not apply to you.
I see about 1000x more anti-rust-zealot strawman arguments than rust zealots on this site. Can you give some examples of the misguided rust zealotry you’re talking about?
Yep! For example, https://github.com/Speykious/cve-rs is an example of a bug in the Rust compiler, which allows something that it shouldn't. It's on its way to being fixed.
> or miss things no?
This is the trickier part! Yes, even proofs have axioms, that is, things that are accepted without proof, that the rest of the proof is built on top of. If an axiom is incorrect, so is the proof, even though we've proven it.
> The standard library will not deviate in naming or type signature of any intrinsic defined by an architecture.
I think this makes sense, just like any other intrinsic: unsafe to use directly, but with safe wrappers.
I believe that there are also some SIMD things that would have to inherently take raw pointers, as they work on pointers that aren't aligned, and/or otherwise not valid for references. In theory you could make only those take raw pointers, but I think the blanket policy of "follow upstream" is more important.
To be fair, there's a safe portable SIMD abstraction brewing in `std::simd` but it's not stable yet. SIMD is just a terrible mess of platform differences in general and making a SIMD-using program safe means ensuring the availability of every single intrinsic used, lest the program is unsound. Of course that's not what C or C++ programs typically do, but in that world unsoundness is the norm anyway.
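For a taste, a minimal sketch of the nightly-only API (assuming `std::simd` as it exists on current nightly): the arithmetic is entirely safe, with no per-architecture intrinsics in sight.

    #![feature(portable_simd)]
    use std::simd::u32x8;

    // Vectorised multiply-and-xor with no `unsafe`: lane counts are
    // type-checked, and the compiler lowers this to whatever the target
    // supports (AVX2, NEON, or a scalar fallback).
    fn mul_xor(a: u32x8, b: u32x8) -> u32x8 {
        (a * b) ^ b
    }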
I thought that the point of Rust is to have safe {} blocks (implicit) as a default and unsafe {} when you need the absolute maximum performance available. You can audit those few lines of unsafe code very easily. With C everything is unsafe and you can just forget to call free() or call it twice and you are done.
> unsafe {} when you need the absolute maximum performance available.
Unsafe code is not inherently faster than safe code, though sometimes, it is. Unsafe is for when you want to do something that is legal, but the compiler cannot understand that it is legal.
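A hypothetical sketch of that situation (names are illustrative, not from any real codebase): the index is in bounds by construction, and `get_unchecked` drops the check in case the optimizer can't prove that on its own.

    fn sum_every_other(xs: &[u64]) -> u64 {
        let mut total = 0;
        let mut i = 0;
        while i < xs.len() {
            // SAFETY: the loop condition guarantees i < xs.len().
            total += unsafe { *xs.get_unchecked(i) };
            i += 2;
        }
        total
    }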
The usual answer is: You only need to verify the unsafe blocks, not every block. Though 'unsafe' in Rust is actually even less safe than regular C, if a bit more predictable, so there's a crossover point where you really shouldn't have bothered.
The Rust compiler is indeed better than the C one, largely because of having more information and doing full-program optimisation. A `vec_foo = vec_foo.into_iter().map(...).collect::<Vec<Foo>>()`, for example, isn't going to do any bounds checks or allocate.
Buggy unsafe blocks can affect code anywhere (through Undefined Behavior, or breaking the API contract).
However, if you verify that the unsafe blocks are correct, and the safe API wrapping them rejects invalid inputs, then they won't be able to cause unsafety anywhere.
This does reduce how much code you need to review for memory safety issues. Once it's encapsulated in a safe API, the compiler ensures it can't be broken.
This encapsulation also prevents combinatorial explosion of complexity when multiple (unsafe) libraries interact.
I can take zlib-rs, and some multi-threaded job executor (also unsafe internally), but I don't need to specifically check how these two interact.
zlib-rs needs to ensure they use slices and lifetimes correctly, the threading library needs to ensure it uses correct lifetimes and type bounds, and then the compiler will check all interactions between these two libraries for me. That's like (M+N) complexity to deal with instead of (M*N).
> I have been told that "unsafe" affects code outside of that block, but hopefully steveklabnik may explain it better (again).
It's due to a couple of different things interacting with each other: unsafe code relies on invariants that safe code must also uphold, and the privacy boundary in Rust is the module.
Before we get into the unsafe stuff, I want you to consider an example. Is this Rust code okay?
    struct Foo {
        bar: usize,
    }

    impl Foo {
        fn set_bar(&mut self, bar: usize) {
            self.bar = bar;
        }
    }
No unsafe shenanigans here. This code is perfectly safe, if a bit useless.
Let's talk about unsafe. The canonical example of unsafe code being affected outside of unsafe itself is the implementation of Vec<T>. Vecs look something like this (the real code is different for reasons that don't really matter in this context):
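    struct Vec<T> {
        ptr: *mut T,
        len: usize,
        capacity: usize,
    }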
The pointer is to a bunch of Ts in a row, the length is the current number of Ts that are valid, and the capacity is the total number of Ts. The length and the capacity are different so that memory allocation is amortized; the capacity is always greater than or equal to the length.
That property is very important! If the length is greater than the capacity, when we try and index into the Vec, we'd be accessing random memory.
So now, this function, which is the same as Foo::set_bar, is no longer okay:
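    impl<T> Vec<T> {
        fn set_len(&mut self, len: usize) {
            self.len = len;
        }
    }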
This is because the unsafe code inside other methods of Vec<T> needs to be able to rely on the fact that len <= capacity. And so you'll find that Vec<T>::set_len in Rust is marked as unsafe, even though it doesn't contain unsafe code: it still requires judicious use to not introduce memory unsafety.
And this is why the module being the privacy boundary matters: the only way to set len directly in safe Rust is from code within the same privacy boundary as the Vec<T> itself. And so, that's the same module, or its children.
> You need to add explicit bounds check or explicitly allocate in C though. It is not there if you do not add it yourself.
Yes — in C you can skip the bounds-checks and allocation, because you can convince yourself they aren't needed; the problem is you may be wrong, either immediately or after later refactoring.
In other memory-safe languages you don't risk the buffer overrun, but it's likely you'll get the bounds checks and allocation, and you have the overhead of GC.
> I have been told that "unsafe" affects code outside of that block, but hopefully steveklabnik may explain it better (again).
Poorly-written unsafe code can have effects extending out into safe code. But correctly-written unsafe code does not have any effects on safe code w.r.t. memory safety. So to ensure memory safety, you just have to verify the correctness of the unsafe code (and any helper functions, etc., it depends on), rather than the entire codebase.
Also, some forms of unsafe code are far less dangerous than others in practice. E.g., most of the SIMD functions are practically safe to call in every situation, but they all have 'unsafe' slapped on them due to being intrinsics.
> You need to add explicit bounds check or explicitly allocate in C though. It is not there if you do not add it yourself.
Unfortunately, you do need to allocate a new buffer in C if you change the type of the elements. The annoying side of strict aliasing is that every buffer has a single type that's set in stone for all time. (Unless you preemptively use unions for everything.)
C has type-changing stores. If you store to a buffer with a new type, it has the new type. Clang does not implement this correctly, though GCC does.
It won't allocate in this case because it's still a vec of foo at the end, so we know it has enough space. If it were a different type, it may or may not allocate, depending on if it had enough capacity.
> I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library.
This is such a widespread misunderstanding… one of the points of rust (there are many other advantages that have nothing to do with safety, but let’s ignore those for now) is that you can build safe interfaces, possibly on top of unsafe code. It’s not that all code is magically safe all the time.
In addition, unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems: the intent is that as the programmer, you’ll ensure the code inside an unsafe block will access memory in a valid way.
Since you say you already know that much Rust, you can be that programmer!
Hard disagree - if you violate the invariants in Rust unsafe code, you can cause global problems with local code. You can cause use-after-free, and other borrow checker violations, with incorrect unsafe code. Nothing will flag it, you will have no idea which unsafe code block is causing the issue, and debugging will be hard.
I have no idea what your definition of encapsulation is, but mine is not this.
It's really only encapsulated in the sense that if you have a finite and small set of unsafe blocks, you can audit them easier and be pretty sure that your memory safety bugs are in there.
This reality doesn't really exist much anymore because of how much unsafe is often used, and since you have to audit all of them, whether they come from a library or not, it's not as useful to claim encapsulation as one thinks.
I do agree in theory that unsafe encapsulation was supposed to be a thing, but i think it's crazy at this point to not admit that unsafe blocks turned out to easily have much more global effects than people expected, in many more cases, and are used more readily than expected.
Saying "scaling reasoning" also implies someone reasoned about it, or can reason about it.
But the practical problem is the same in both cases - someone got the reasoning wrong and nothing flagged it.
Wanna go search GitHub for how many super popular libraries using unsafe had global correctness issues due to local unsafe blocks that a human reasoned incorrectly about, but that something like Miri found? Most of that unsafety that turned out to be buggy was also done for (unnecessary) performance reasons.
What you are saying is just something people tell themselves to make them feel okay about using unsafe all over the place.
If you want global correctness, something has to verify it, ideally not-human.
In the end, the thing C lacks is tools like miri that can be used practically with low false-positives, not "encapsulation" of unsafe code, which is trivially easy to perform in C.
Let's not kid ourselves here and end up building an ecosystem that is just as bad as the C one, but our egos refuse to allow us to admit it. We should instead admit our problems and try to improve.
Unsafe also has legitimate use cases in rust, for sure - but most unsafe code i look at does not need to exist, and is not better than unsafe C.
I'll give you an example:
There are entire popular embedded bluetooth stacks in rust using unsafe global mutable variables and raw pointers and ..., across threads, for everything.
This is not better than the C equivalent - in fact it's worse, because users think it is safe and it's very not.
At least nobody thinks the C version is safe. It will often therefore be shoved in a binary that is highly sandboxed/restricted/etc.
It would be one thing if this was in the process of being ported/translated from C. But it's not.
Using intrinsics that require alignment while the API was still being worked on - probably a reasonable use of unsafe (though it was still easy to cause global problems like buffer overflows if you screwed up the alignment).
The encapsulation referred to here is that you can expose a safe API that is impossible to misuse in a way that leads to undefined behavior. That's the succinct way of putting it anyway.
The `memchr` crate, for example, has an entirely safe API. Nobody needs to use `unsafe` to use any part of it. But its internals have `unsafe` littered everywhere. Could the crate have bugs that result in UB due to a particular use of the `memchr` API? Yes! Doesn't that violate encapsulation? No! A bug inside an encapsulated boundary does not violate the very idea of encapsulation itself.
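To make that shape concrete, here's a toy sketch (nothing to do with memchr's actual internals): the invariant is established once, behind a private field, and the SAFETY comment can appeal to it because no caller can break it through the safe API.

    pub struct Wrapped {
        data: Vec<u8>, // private: only this module can touch it
    }

    impl Wrapped {
        pub fn new(data: Vec<u8>) -> Option<Self> {
            // Establish the invariant once: the buffer is non-empty.
            if data.is_empty() { None } else { Some(Wrapped { data }) }
        }

        pub fn get(&self, i: usize) -> u8 {
            let idx = i % self.data.len();
            // SAFETY: `new` guarantees a non-empty buffer, and `%` by a
            // nonzero length always yields an index in 0..len.
            unsafe { *self.data.get_unchecked(idx) }
        }
    }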
Encapsulation is about blame. It means that if `memchr` exposes a safe API, and if you use `memchr` and you get UB as a result of some `unsafe` code inside of `memchr`, then that means the problem is inside of `memchr`. The problem is definitively not with the caller using the library. That is, they aren't "holding it wrong."
I'm surprised that someone with as much experience as you is missing this nuance. How many times have you run into a C library API that has UB, you report the bug and the maintainer says, "sorry bro, but you're holding that shit wrong, your fault." In Rust, the only way that ought (very specifically using ought and not is) to be true is if the API is tagged with `unsafe`.
Now, there are all sorts of caveats that don't change the overall point. "totally safe transmute" being an obvious demonstration of one of them[1] by fiddling with `/proc/self/mem`. And of course, Rust does have soundness bugs. But neither of these things change the fundamental idea of encapsulation.
And yes, one obvious shortcoming of this approach is that... well... people don't have to follow it! People can lie! I can expose a safe API, you can get UB and I can reject blame and say, "well you're holding it wrong." And thus, we're mostly back into how languages like C deal with these sorts of things. And that is indeed a bummer. And there are for sure examples of that in the ecosystem. But the glaring thing you've left out of your analysis is all of the crates that don't lie and specifically set out to provide a sound API.
The great thing about progress is that we don't have to be perfect. I'm really disappointed that you seem to be missing the forest for the trees here.
"The encapsulation referred to here is that you can expose a safe API that is impossible to misuse in a way that leads to undefined behavior. That's the succinct way of putting it anyway."
Well, no, actually.
At least, not in an (IMHO) useful way.
I can break your safe API by getting the constraints wrong on unsafe code inside that API.
Also, unsafe usage elsewhere is not local. I can break your impossible to misuse API through an unsafe API that someone else used elsewhere, completely outside my control, and then wrapped in a safe API.
Some of these are of course, bugs in rust/compiler, etc. I'm just offering i've yet to hear the view taken that the ability to do this is always a bug in the language/compiler, and will be destroyed on sight.
Beyond that:
To the degree this is useful encapsulation for tracking things down, it is only useful when the amount is small and you can reason about it.
This is simply no longer true in any reasonably sized rust app.
As a result, as you say, it is then only useful for saying who is at fault in the sense of whether i'm holding it wrong. To me, that is basically worthless at scale.
"I'm surprised that someone with as much experience as you is missing this nuance."
I don't miss it - I just don't think it's as useful as claimed.
This level of "encapsulation", which provides no real guarantee except "the set of bugs is caused somewhere by the set of unsafe blocks" is fairly unhelpful at large scale.
I have audited hundreds of thousands of lines of rust code to find bugs caused by unsafe usage. The thing that made it at all tractable was not this form of encapsulation - it was in fact 100% worthless in doing that at scale, because it was still tons and tons of code to try to reason about, across lots of libraries and dependencies. As you say, it only helps assign blame once a bug is found, and blame is not that useful at scale. It does not make the code safer. It does not make bugs easier to track down. It only declares, after i've spent all the time, that it is not my fault. But also nobody has to do anything anyway.
For small programs, this buys you something, as i said, as long as the set of unsafe blocks is small enough to be tractable to audit, cool. You can find bugs easier. In that sense, the tons of hobby programs, small libraries, etc, are a lot less likely to have bugs when written in rust (modulo their dependencies on unsafe code).
But like, your position seems to be that it is fairly useful that i can go to a library and tell them "your crap is broken", and be right about it.
To me, this does not buy a lot in the kinds of large complex systems rust hopes to replace in C/C++.
(it also might be false)
In actually tracking down the bug, which is what i care about, the thing that was useful is that i could run miri and lots of other tools on it and get useful results that pointed me towards the most likely causes of issues.
So don't get me wrong - this is overall better than C, but writing lots of rust (i haven't written C/C++ at all in a while, actually), I still tire of the constant claims about the degree of rust's safety. You are the rare rust person who understands the nuance and is willing to admit there is any flaw or non-perfection whatsoever.
As you say, there are lots of things that ought to be true in rust that are not.
You have a good understanding of this nuance, and where it fails.
But it is you, i believe, who is missing the forest for the trees, because most do not have this.
I'll be concrete and i guess controversial in a way you are 100% free to disagree with, but might as well throw a stake in the ground - it's hacker news, might as well have fun making a comment someone can beat me over the head with later: if nothing changes, and the rust ecosystem grows by a factor of 100x while changing nothing about how it behaves WRT unsafe usage, and no tooling gets significantly better, Rust will not end up better than C in practice. I don't mean it will not have fewer bugs/vulnerabilities - i think it would, by far!
But whether you have 100 billion of them, or 1 billion of them, and thus made a 100x improvement, i don't think matters too much when it's still a billion :)
Meanwhile, if the rust ecosystem got worse about unsafe, but made tools like Miri 50x faster (and made more tools like it that help verification in practice), it will still end up better than C.
To me - it is the tooling, and not this sort of encapsulation, that will make a practical difference or not at scale.
The idea that you will convince people not to write broken unsafe code, in ways that breaks safe APIs, or that the ability to assign blame matters, is very strange to me, and is no better than C. As systems grow, the likelihood of totally safe transmutes growing in them is basically 100% :)
FWIW - I also agree you don't have to be perfect, nor do I fault rust for not being perfect. Instead, i simply disagree that at scale, this sort of ability to place blame is useful.
To me, it's the ability to find the bugs quickly and as automated as possible that is useful.
I need to find the totally safe transmutes causing issues in my system, not hand it to someone else after determining it couldn't be my fault.
> I can break your safe API by getting the constraints wrong on unsafe code inside that API.
This doesn't make any sense at all as a broader point. Of course you can break the safe API by introducing a bug inside the implementation! I honestly just cannot figure out how you have a misunderstanding of this magnitude, and I'm forced to conclude that we are mis-communicating at some level.
I did read the rest of your comment, and the most significant point I can take away from it is that you're making a claim about scale. I think the dissonance introduced with comments like the one above makes it very hard for me to trust your experience here and the conclusions you've drawn from it. But I will note that whether Rust's safety story scales is from my perspective a different thing entirely from the factual claim that Rust enables safe encapsulation of `unsafe` usage.
You may say that just because Rust enables safe encapsulation doesn't mean programmers using Rust actually follow through with that in practice. And yes, absolutely, it doesn't. You can't derive an is from an ought. But in my experience, it totally does. I do work on lots of "hobby" stuff in Rust (although I try to treat it professionally, I just mean that I am not directly paid for it beyond donations), but I am also paid to write Rust too. I do not have your experience with Rust at scale, so I cannot refute it. But you've said enough questionable things here that I can't trust it either.
Are you writing lots of FFI and/or embedded code? Those are the main places I see unsafe being used a lot.
The tooling and the encapsulation go hand in hand.
> The idea that you will convince people not to write broken unsafe code, in ways that breaks safe APIs, or that the ability to assign blame matters, is very strange to me, and is no better than C. As systems grow, the likelihood of totally safe transmutes growing in them is basically 100% :)
To be honest this doesn't track with my experience at all. Unsafe just isn't that commonly used in projects I contribute to. When it is, it is aggressively encapsulated.
> It's really only encapsulated in the sense that if you have a finite and small set of unsafe blocks, you can audit them easier and be pretty sure that your memory safety bugs are in there. This reality doesn't really exist much anymore because of how much unsafe is often used, and since you have to audit all of them, whether they come from a library or not, it's not as useful to claim encapsulation as one thinks.
Is it? I've written hundreds of thousands of lines of production Rust, and I've only sparingly used unsafe. It's more common in some domains than others, but the observed trend I've seen is for people to aggressively encapsulate unsafe code.
Unsafe Rust is quite difficult to write correctly. (The &mut provenance rules are a bit scary!) But once a safe abstraction has been built around it and the unsafe code has passed Miri, in practice I've seen people be able to not worry about it any more.
By the way I maintain cargo-nextest, and we've added support for Miri to make its runs many times faster [1]. So I'm doing my part here!
> and we've added support for Miri to make its runs many times faster
Whoa. This might be the kick in the ass I needed to give cargo-nextest a whirl in my projects. Miri being slow is the single biggest annoyance I have with it!
Would love to hear how it goes! Miri is generally single-threaded, but because nextest is process-per-test, each test gets a completely separate Miri context. A few projects have switched their Miri runs over to nextest and are seeing dramatic improvements in CI times, e.g. [1].
Eh. Good C programmers know what's safe and what's not. Often comments call out sketchy stuff. Just because it's not a language keyword doesn't mean it's not called out.
Bad C programmers though? Their stuff is more dangerous and they don't know when and don't call it out and should probably stick to Rust.
No, it's been proven over and over that simply knowing invariants is not enough, in long-term projects built by large teams where team members change over time. Even the most experienced C developers are going to fail every so often. You need tooling that automates those invariants, and you need that tooling to fail closed.
I take a hard line on this stuff because we can either keep repeating the fundamental mistake of believing things like "willpower" to write correct code are real, or we can move on and adopt better tooling.
C's safe subset is so small as to be basically useless, and especially it's impossible to encapsulate behavior into a safe interface, in fact it's fairly easy in C to make an interface which is impossible to use correctly (gets() and the like).
I wonder why writing SIMD in high-level languages hasn't been figured out yet for CPUs (it has been the norm for GPUs since forever). Auto-vectorization universally sucks, and so do OpenMP directives.
There was ispc, a separate C-like programming language just for SIMD, but I don't understand why regular compilers can't generate high-quality vectorized code.
Why do you say that? I would say SIMD is pretty well figured out in well-written code, e.g. small, tight loops over vectors. Unrolling and vectorizing a loop is not that hard and happens constantly on all our phones for signal processing, for example.
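For example (a sketch, not from any particular codebase), this is the kind of loop compilers tend to auto-vectorize reliably at higher optimization levels:

    pub fn scale(dst: &mut [f32], src: &[f32], k: f32) {
        // Hoisting the bound lets the compiler drop per-iteration
        // bounds checks and emit SIMD loads and stores.
        let n = dst.len().min(src.len());
        for i in 0..n {
            dst[i] = src[i] * k;
        }
    }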
That's just syntactic sugar (and a bit of architecture independence) over intrinsics. You can get the same in C++ just with wrapping intrinsics in classes, and a few ifdefs.
The key difference is that there are invariants you can rely on as a user of the library, and they'll be enforced by the compiler outside the unsafe blocks. The corresponding C invariants mostly aren't enforced by the compiler. Worse, many C programmers will actively argue that some amount of undefined behavior is "fine".
Rust's code emitter is LLVM, the same backend Clang uses for C on Apple's platforms. I wouldn't expect any miracles there, as the Rust authors have zero influence over it. If any compiler is using any secret LLVM magic, that would be Swift or Objective-C, since they are developed by Apple.
You can choose unsafe rust which has many more optimizations and is much faster than safe rust. Both are legitimate dialects of the language. Should you not feel confident with a library that is too “unsafe” you can use another crate. The rust ecosystem is quite big by now.
Personally, I would still rather use unsafe Rust than raw C, which has more edge cases. Also, when I’m not on the critical path, I can always use safe Rust.
> Kidding aside, I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library. At what point does it really stop mattering if this is C or Rust?
Kidding aside the 150-comment Unsafe Rust subthread was inevitable.
> Is the Rust compiler a better optimizing compiler than C compilers?
First, I assume that the main Rust compiler uses LLVM. I also assume (big leap here!) that the LLVM optimization process is language agnostic (ChatGPT agrees, whatever that is worth). As long as the language frontend can compile to LLVM's language-independent intermediate representation (IR), then all languages can equally benefit from the optimizer.
> At what point does it really stop mattering if this is C or Rust?
If I read TFA correctly, they came up with a library that is API compatible with the C one, but they've measured to be faster.
At that point I think in addition to safety benefits in other parts of the library (apart from unsafe micro optimizations as quoted), what they're leveraging is better compiler technology. Intuitively, I start to assume that the rust compiler can perhaps get away with more optimizations that might not be safe to assume in C.
There are certain optimizations you can only make with unsafe, because the borrow checker is smart, but not all-knowing. There have been countless discussions about how unsafe isn't the ideal name; it means something closer to "trust the programmer: they checked this manually".
That being said, most rust programs don't ever need to use unsafe directly. If you go very low level or tune for performance, it might become useful, however.
Or if you're lazy and just want to stop the borrow checker from saving your ass.
...at least outside of loads/stores. From a bit of looking at the code, though, it seems like a good amount of those should be doable in a safe way with some abstractions.
You can use 'unsafe' blocks to delineate places on the hot path where you need to take the limiters off, then trust that the rest of the code will be safe. In C, all your code is unsafe.
We will see more and more Rust libraries trounce their C counterparts in speed, because Rust is more fun to work in thanks to the above. Rust has democratized high-speed and concurrent systems programming. Projects in it will attract a larger, more diverse developer base -- developers who would be loath to touch a C code base for (very justified) fear of breaking something.
> At what point does it really stop mattering if this is C or Rust?
That depends. If, for you, safety is something relative and imperfect rather than absolute, guaranteed and reliable, then - the answer is that once you have the first non-trivial unsafe block that has not gotten standard-library-level of scrutiny. But if that's your view, you should not be all that starry-eyed about how "Rust is a safe language!" to begin with.
On the other hand, if you really do want to rely on Rust's strong safety guarantees, then the answer is: From the moment you use any library with unsafe code.
There is an option to not link to it, for cases like OS development and embedded. Writing everything in pure Rust without libc is entirely possible, even if it's an exercise in losing your sanity when you're reimplementing every syscall you need from scratch.
But even then, your code is calling out to kernel functions which are probably written in C or assembly, and therefore "dangerous."
Rust code safety is overhyped frequently, but reducing an attack surface is still an improvement over not doing so.
I agree, and binary exploitation/vulnerability research is my area of expertise. The whole "let's port everything to Rust" push is so misguided. Binary exploitation has already gotten 20x harder than, say, ten years ago. Even so, most big breaches happen because people reuse their password or just give it out. Nation states are pretty much the only parties capable of delivering full kill chains that exploit, say, Chrome. That is why I moved to the embedded space. Still so insecure...
Ironically using C without libc turns out to be easier (except for portability of course). The kernel ABI is much more sane than <stdio.h>. The only useful parts of libc are DNS resolution and text formatting, both of which it does rather poorly.
zlib-ng is pretty much assembly - with a bit of C.
There is this quote: "...but was not entirely fair because our rust implementation could assume that certain SIMD capabilities would be available, while zlib-ng had to check for them at runtime"
zlib-ng can be compiled to whatever target arch is necessary, and the original post doesn't mention how it was compiled and what architecture and so on.
Nevertheless, Russinovich actually says something along the lines of "simple rewriting in rust made some of our code 5-15% faster (without deliberate optimizations)":
https://www.youtube.com/watch?v=1VgptLwP588&t=351s
Without analysis as to what caused that, that statement is meaningless.
For example, he says they didn’t set out to improve the code, but they were porting decades-old C code to rust. Given the subject (truetype font parsing and rendering), my guess would be that the original code had more memory copies pulling data out of the font, because rust makes it easier to safely avoid that (in which case the conclusion would be “C could be as fast, but with a lot more effort”), but it could also be that they spent a day figuring out what some code did, only to realize it wasn’t necessary on anything after Windows 95, and stripped it out rather than porting it.
I understand their improvement figures exactly as you wrote, "C could be as fast, but with a lot more effort".
Yes, if your code in Lang-X is faster than C, it's almost certainly a skill issue somewhere in the C implementation.
However, in the day-to-day, if I can make my code run faster in Lang-X than C, especially if I'm using Lang-X for only a couple of months and C potentially for decades, that is absolutely meaningful. Sure, we can make the C code just as fast, but it's not viable to spend that much time and expertise on every small issue.
Outside of "which lang is better" discussions on online forums, it doesn't matter how fast you can theoretically make your program, it matters how fast you actually make it with the constraints your business have (time usually).
> Now, the Rust version took me about five times as long as the Go version
> The Go one performed almost identically well
Now this was for netcode rather than number crunching. But I actually had a similar surprise with number crunching, with C# and C++. I wrote the same program (rational approximation of Pi), line for line, in both languages, and the C# version ran faster. Apparently C# aggressively optimizes hot code paths while running, whereas to get that behavior in C++, you need to collect profiler data and use a special compiler flag.
Well, what I said is also true for Rust and Go. Sure if your Go code is faster than your Rust code, one could argue you have skill issues in Rust, but if to get the Rust program faster than your Go program requires 10x time (or more), it’s fair to say that Go is faster and simpler, even if it would be more precise to say that the Go code you can write performs as well as the Rust code you can write.
I’m sure I’m missing context, and presumably there are other benefits, but 5-15% improvement is such a small step to justify rewriting codebases.
I also wonder how much of an improvement you’d get by just asking for a “simple rewrite” in the existing language. I suspect there are often performance improvements to be had with simple changes in the existing language
Far better justification for a rewrite like this is if it eases maintenance, or simplifies building/testing/distribution. Taking an experienced and committed team of C developers with a mature code base, and retraining them to rewrite their project in Rust for its own sake is pretty absurd. But if you have a team that’s more comfortable in Rust, then doing so could make a lot of sense - and, yes, make it easier to ensure the product is secure and memory-safe.
Disagree - a rewrite for “maintainability” is an engineer saying they want to rewrite in their preferred language. I wouldn’t allow someone on my team to rewrite a core dependency for “maintainability”, but I absolutely would if they suggested it would be faster and safer.
We’re talking about a rust rewrite of a fairly core level library. I don’t think C is inherently unsuitable or difficult to hire for. If the library was in Fortran then maybe.
But yes you are technically correct, congratulations.
I was responding to a general claim. In any case, I certainly disagree that C is suitable in 2025 for the vast majority of possible use-cases. For fun? Sure, but not for shipping code you want to rely on.
Obviously the code isn't going anywhere, and obviously we DO have reliable code we've built with C. But acting like C and Rust deliver equivalent value is simply farcical: you choose C for rapid development and cheap devs (or some other niche concern, like using an obscure embedded arch), and you choose rust to solve the problems that C introduced.
I agree that simple rewriting could have given some if not all perf benefits, but can it be the case that rust forces us to structure code in a way that is for some reason more performant in some cases?
5-15% is a big deal for a low-level foundational code, especially if you get it along with some other guarantees, which may be of greater importance.
There are hopefully very few things that can be done to low level building blocks. A 15% improvement is absolutely worth it for a library as widely used as a compression library.
Even 5% on a hot path are quite the big gains, actually.
Furthermore, they said that they did not expect any performance gains. They did the rewrite for other reasons and got the unexpected bonus of extra performance.
One big part I've noticed when working in rust is that, because the compilation and analysis checks you're given are so much stronger than in C or C++, and because the ecosystem of crates is so easy to make use of, I'll generally be able to make use of more advanced algorithms and methods.
I'm currently working with ~150 dependencies in my current project which I know would be a major hurdle in previous C or C++ projects.
Everything you said is correct of course, but the idea of auditing 150 dependencies makes me feel ill. It's essentially impossible for a single person.
The effort is _roughly_ proportional - if you need to parse JSON in either language you can write it yourself or use an existing library. Both of those are the same amount of work in c++ and rust.
This has generally been the case, but a systems language like Rust has access to optimisations that C simply won't have, due to the compiler having so much more information (e.g. being able to skip run-time array bounds checks because the compiler can prove out-of-bounds access cannot occur).
I heard that aliasing in C prevents the compiler from optimizing aggressively. I can believe Rust's compiler can optimize more aggressively if there's no aliasing problem.
Which is so underused that the whole compiler feature was buggy as hell, and was only recently fixed because compiling Rust, where it is the norm, exposed the bugs.
My understanding is that noalias isn't fully utilized by LLVM, just that it's less buggy now, so there's some uncertainty leaning in favor of Rust in terms of future Rust-specific optimizations. Certainly a language like Fortran, with its restrictions, delivers accordingly on optimization, so I imagine Rust has plenty of room to grow similarly.
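A small sketch of why the aliasing information matters (illustrative; actual codegen depends on the compiler): with plain C pointers, the compiler must generally assume the accumulator might alias the input and reload it on every iteration, while Rust's &mut guarantees exclusivity.

    // `total` cannot alias `xs`: `&mut u64` is exclusive, so the
    // compiler is free to keep the accumulator in a register for
    // the whole loop instead of reloading it each iteration.
    fn sum_into(total: &mut u64, xs: &[u64]) {
        for &x in xs {
            *total += x;
        }
    }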
> I wonder why no one has written high performance library code in assembly yet at this point?
What do you mean by that?
There is plenty of hand-rolled assembly in low-level libraries, whether you look at OpenBLAS (17%), GMP (36%), BoringSSL (25%), WolfSSL (14%) -- all of these numbers are based on looking at Github's language breakdown (which is measured on a per-file basis, so doesn't count inline asm or heavy use of intrinsics).
There are contexts where you want better performance guarantees than the compiler will give you. If you're dealing with cryptography, you probably want to guard against timing attacks via constant-time code. If you're dealing with math, maybe you really do want to eke out as much performance as possible, autovectorization just isn't doing what you want it to do, and your intrinsic-based code just isn't using all your registers as efficiently as you'd like.
C++ surpassed C performance decades ago. While C still has some lingering cachet from its history of being “fast”, most software engineers have not worked at a time when it was actually true. C has never been that amenable to scalable optimization, due mostly to very limited abstractions and compile-time codegen.
Doesn’t it say something if Rust programmers routinely feel more comfortable making aggressive optimizations and have more time to do so? We maintain code for longer than the time taken to write the first version and not having to pay as much ongoing overhead cost is worth something.
How can it not? Experts in C taking longer to make a slower and less safe implementation than experts in Rust? It's not conclusive but it most certainly says something about the language.
The thing is, Rust allows you to casually code things that are fast. A few years back I took part in an "all programming languages allowed" competition on a popular hacker blog in my country. The topic was who writes the fastest tokenizer (a thing splitting sentences into words).
I took 15 minutes to write one in Rust (a language I had just learned by that point) using a "that should work" approach and took second place, with some high-effort C implementations being slower and a highly optimized assembler variant taking first place.
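The spirit of that approach (a reconstruction of the shape, not the actual entry): lean on the standard library's iterator adapters and let them do the work.

    fn tokenize(sentence: &str) -> Vec<&str> {
        // Splits on Unicode whitespace and returns zero-copy subslices.
        sentence.split_whitespace().collect()
    }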
Since then I programmed a lot more in C and C++ as well (for other reasons) and got more experience. Rust is not automatically faster, but the defaults and std library of Rust is so well put together that a common-sense approach will outperform most C code without even trying – and it does so while having typesafety and memory safety. This is not nothing in my book and still extremely impressive.
The best thing about learning Rust, however, was how much I learned for all the other languages. Because what you learn there is not just how to use Rust, but how to program well. Understanding the way the Rust borrow checker works 1000% helped me avoid nasty bugs in C/C++ by realizing when I violate ownership rules (e.g. by having multiple writers).
zlib itself seems pretty antiquated/outdated these days, but it does remain popular, even as a basis for newer parallel-friendly formats such as https://www.htslib.org/doc/bgzip.html
The benchmarks in the parent post are comparing to zlib-ng, which is substantially faster than zlib. The zippy claims are against "zlib found on a fresh Linux install" which at least for Debian is classic zlib.
Thanks (to all correctors). FWIW, that zlib-ng discussion page you link to has way more information about what machine the benchmarks were run on than TFA. It's also a safe bet that Google timed their chromium lib (which seems really close) on a much larger diversity of core architectures than these 3..4 guys have with zlib-rs. So, you know, very early days in terms of perf claims, IMO.
Also, FWIW, that zippy Nim library has essentially zero CPU-specific optimizations that I could find. Maybe one tiny one in some checksumming bit. Optimization is specialization. So, I'd guess it's probably a little slower than zlib-ng now that this is pointed out, but as @hinkley observed, portability can also be a meaningful goal/axis.
Zlib is unapologetically written to be portable rather than fast. It is absolutely no wonder that a Rust implementation would be faster. The Rust implementation, by contrast, runs on a pathetically small number of systems. This is not a dig at Rust, it’s an acknowledgement of how many systems exist out there, once you include embedded, automotive, aerospace, telecom, industrial control systems, and mainframes.
Richard Hipp denounces claims that SQLite is the widest-used piece of code in the world and offers zlib as a candidate for that title, which I believe he is entirely correct about. I’ve been consciously using it for almost thirty years, and for a few years before that without knowing I was.
I think performance is an underappreciated benefit of safe languages that compile to machine code.
If you're writing your program in C, you're afraid of shooting yourself in the foot and introducing security vulnerabilities, so you'll naturally tend to avoid significant refactorings or complicated multithreading unless necessary. If you have Rust's memory safety guarantees, Go's channels and lightweight goroutines, or the access to a test runner from either of those languages, that's suddenly a lot less of a problem.
The compiler guarantees you get won't hurt either. Just to give a simple example, if your Rust function receives an immutable reference to a struct, it can rely on the fact that a member of that struct won't magically be mutated by a call to some random function through spooky action at a distance. It can just keep it on the stack / in a callee-saved register instead of fetching it from memory at every loop iteration, if that's more optimal.
Then there's the easy access to package ecosystems and extensive standard libraries. If there's a super popular do_foo package, you can almost guarantee that it was a bottleneck for somebody at some point, so it's probably optimized to hell and back. It's certainly more optimized than your simple 10-line do_foo function that you would have written in C, because that's easier than dealing with yet another third-party library and whatever build system it uses.
Chromium is kind of stuck with zlib because it's the algorithm that's in the standards, but if you're making your own protocol, you can do even better than this by picking a better algorithm. Zstandard is faster and compresses better. LZ4 is much faster, but not quite as small.
(As an aside, at my last job container pushes / pulls were in the development critical path for a lot of workflows. It turns out that sha256 and gzip are responsible for a lot of the time spent during container startup. Fortunately, Zstandard is allowed, and blake3 digests will be allowed soon.)
However, keep in mind that zstd also needs much more memory. IIRC, it uses by default 8 megabytes as its buffer size (and can be configured to use many times more than that), while zlib uses at most 32 kilobytes, allowing it to run even on small 16-bit processors.
Yeah I just discovered this a few days ago. All the docker-era tools default to gzip but if using, say, bazel rules_oci instead of rules_docker you can turn on zstd for large speedups in push/pull time.
Zlib-ng is between a couple and several times slower than the state of the art[1]; it’s just that nobody has yet done the (hard) work of adapting libdeflate[2] to a richer API than “complete buffer in, complete buffer out”.
"Barely" or not is completely irrelevant. The fact is that it's measurably faster than the C implementation with the more common parameters. So the point that you're trying to make isn't clear tbh.
Also I'm pretty sure that the C implementation had more man hours put into it than the Rust one.
I think that would be really hard to measure. In particular, for this sort of very optimized code, we’d want to separate out the time spent designing the algorithms (which the Rust version benefits from as well). Actually I don’t think that is possible at all (how will we separate out time spent coding experiments in C, then learning from them).
Fortunately these “which language is best” SLOC measuring contests are just frivolous little things that only silly people take seriously.
It's... basically written in C. I'm no expert on zlib/deflate or related algorithms, but digging around https://github.com/trifectatechfoundation/zlib-rs/ almost every block with meaningful logic is marked unsafe. There's raw allocation management, raw slicing of arrays, etc... This code looks and smells like C, and very much not like rust. I don't know that this is a direct transcription of the C code, but if you were to try something like that this is sort of what it would look like.
I think there's lots of value in wrapping a raw/unsafe implementation with a rust API, but that's not quite what most people think of when writing code "in rust".
The things I’ve seen broadly adopted in the industry (i.e. sanitizers) are equally available in Rust. & Rust’s testing infrastructure is standardized so tests are actually common to see in every library.
The number of tools matters less than the quality of the tools. Rust’s inherent guarantees + miri + software verification tools mean that in practice Rust code, even with unsafe, ends up being higher quality.
Miri is the closest to a UB specification for Rust that there is, coming in the form of a tool so you can run it. It's really cool but Valgrind, which is a C tool that also supports Rust, also supports Rust code that calls to C and that does I/O, both pretty common things for programs to do.
Are there examples you're thinking about? The only good ones I can think of are bits about undefined behavior semantics, which frankly are very well covered in modern C code via tools like ubsan, etc...
That doesn't seem responsive. The question wasn't whether Rust and C are literally the same language ("duh", as it were), it was effectively "are there meaningful safety features provided to the unsafe zlib-rs code in question in that aren't already available in C toolchains/ecosystems?"
And there really aren't. The abbreviated/limited safety environment being exploited by this non-idiomatic Rust code seems to me to be basically isomorphic to the way you'd solve the problem in C.
> it was effectively "are there meaningful safety features provided to the unsafe zlib-rs code in question that aren't already available in C toolchains/ecosystems?"
Ah, so that was like, not in your comment, but in a parent.
> And there really aren't.
I mean, not all of the code is unsafe. From a cursory glance, there's surely way more here than I see in most Rust packages, but that doesn't mean that you get no advantages. I picked a random file, and chose some random code out of it, and see this:
    pub fn copy<'a>(
        dest: &mut MaybeUninit<DeflateStream<'a>>,
        source: &mut DeflateStream<'a>,
    ) -> ReturnCode {
        // SAFETY: source and dest are both mutable references, so guaranteed not to overlap.
        // dest being a reference to maybe uninitialized memory makes a copy of 1 DeflateStream valid.
        unsafe {
            core::ptr::copy_nonoverlapping(source, dest.as_mut_ptr(), 1);
        }
        // ...
The semantics of safe code, `&mut T`, provide the justification for why the unsafe code is okay. Heck, this code wouldn't even be legal in C, thanks to strict aliasing. (Well, I guess you could argue that in C code they'd be of the same type, since you don't have "might be uninitialized" in C's typesystem, but again, this is an invariant encoded in the type system that C can't do, so it's not possible to express in C for that reason either.)
Isn't that exactly my point though? This is just a memcpy(). In C, you do some analysis to prove to yourself that the pointers are valid[1]. In this unsafe Rust code, the author did some analysis to prove the same thing. I mean, sure, the specific analyses use words and jargon that are different. I don't think that's particularly notable. This is C code, written in Rust.
[1] FWIW, memcpy() arguments are declared restrict post-C99, so the strict aliasing thing doesn't apply, for exactly the reason you're imagining.
> In C, you do some analysis to prove to yourself that the pointers are valid[1]
Right, and in Rust, you don't have to do it yourself: the language does it for you. If the signature were in C, you'd have to analyze the callers to make sure that this property is upheld when invoked. In Rust, the compiler does that for you.
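To make that concrete, here is a tiny sketch (`copy_into` is a made-up name, not the zlib-rs function): the borrow checker rejects any call site that would pass overlapping mutable references, so the "analyze the callers" step happens at compile time.

    fn copy_into(dest: &mut u32, source: &mut u32) {
        *dest = *source;
    }

    fn main() {
        let mut a = 1;
        let mut b = 2;
        copy_into(&mut a, &mut b); // fine: provably disjoint borrows
        assert_eq!(a, 2);
        // copy_into(&mut a, &mut a); // rejected at compile time with
        // error[E0499]: cannot borrow `a` as mutable more than once
    }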
> the strict aliasing thing doesn't apply
Yes, this is the case in this specific instance due to it being literally memcpy, but if it were any other function with the same signature, the problem would exist. Again, I picked some code at random, I'm not saying this one specific instance is even the best one. The broader point of "Rust has a type system that lets you encode more invariants than C's" is still broadly true.
No it doesn't? That comment is expressing a human analysis. The compiler would allow you to stuff any pointer in that you want, even ones that overlap. You're right that some side effects of the runtime can be exploited to do that analysis. But that's true of C too! (Like, "these are two separate heap blocks", or "these are owned by two separate objects", etc...). Still human analysis.
Frankly you're overselling hard here. A human author can absolutely mess that analysis up, which is the whole reason Rust calls it "unsafe" to begin with.
I think you're misunderstanding what I'm claiming is being checked. I don't mean the unsafe block directly. I mean that &mut Ts do not alias. That is checked by the compiler.
I'm saying that even in a codebase with a lot of unsafe, the checks that are still performed have value.
Sure, but C++ objects returned from operator new are likewise guaranteed not to alias. There's "value" there, but not a lot of value. And I repeat, you're overselling hard here. People who write rust like this are going to produce roughly the same amount of memory safety bugs, and pretending otherwise is frankly dangerous, IMHO.
In safe Rust there is no way to call the function in question if that sort of aliasing has happened. This means that if you get a bug from your copy, it's in the copy method - the possibility that it's been used inappropriately has been eliminated.
It reduces the search space for problems from: everywhere that created a pointer that is ultimately used in the copy, to: the copy function itself.
It reduces the number of programmers who have to keep the memory semantics of that copy in their head from "potentially everyone" to just "those who directly implement and check copy".
This comment summarizes the difference of unsafe Rust quite well. Basically, mostly safe Rust, but with few exceptions, fewer than one would imagine: https://news.ycombinator.com/item?id=43382176
C is not assembly, nor is it portable assembly at all in this century, so your phrasing is very off.
C code goes through a huge amount of transformations by the compiler, and unless you are a compiler expert you will have no idea what the resulting code looks like. It's not targeting the PDP-11 anymore.
I mentioned this under another comment - and I consider myself versed enough in deflate - but comparing the library to zlib-ng is quite odd, as the latter is largely hand-written assembly. Beating it would take some oddity in the test itself.
I'm not sure why people say this about certain languages (it is sometimes said about Haskell, as well).
The code has a C style to it, but that doesn't mean it wasn't actually written in Rust -- Rust deliberately has features to support writing this kind of code, in concert with safer, stricter code.
> Imagine if we applied this standard to C code. "Zlib-NG is basically written in assembler, not C..."
We absolutely should, if someone claimed/implied-via-headline that naive C was natively as fast as hand-tuned assembly! This kind of context matters.
FWIW: I'm not talking about the assembly in zlib-rs, I was specifically limiting my analysis to the rust layers doing memory organization, etc... Discussing Rust is just exhausting. It's one digression after another, like the community can't just take a reasonable point ("zlib-rs isn't a good example of idiomatic rust performance") on its face.
I'm not sure anyone really believes `zlib-rs` is a good example of idiomatic Rust performance, though.
Maybe the reason I think that is because I've written Rust for a variety of purposes (web application, database bindings, high performance parser) so I account for the "register" of Rust that is appropriate without thinking about it.
It might be that a simple description like the headline leads some people to believe they could write Rust the easy way and get code that's as fast as writing "Rust the hard way".
However, that is different than what you earlier said -- "It's... basically written in C.". I have actually written Rust programs where some parts were literally written in C and linked in -- in order to build functioning plugins -- and there is a world of difference with that.
Regarding:

> Discussing Rust is just exhausting. It's one digression after another, like the community can't just take a reasonable point ("zlib-rs isn't a good example of idiomatic rust performance") on its face.
I'm just not sure what to say to this. What do you expect from me, here?
It doesn't exploit (and in fact deliberately evades) Rust's signature memory safety features. The impression from the headline is "Rust is as fast as C now!", but in fact the subset of the language that has been shown to be as fast as C is the subset that is basically isomorphic to C.
The impression a naive reader might take is that idiomatic/safe/best-practices Rust has now closed the performance gap. But clearly that's not happening here.
But again, not exploited by the code in question. This isn't using the Rust runtime heap, it's doing its own thing with raw pointers/indexing, and even seems to have its own allocator.
That is not correct; in another comment you can see where the code takes advantage of the rust-specific &mut notation to use a fast memcpy for non-overlapping pointers.
I would argue compile time changes don't matter much, as the amount of data going through zlib all across the world is so large that any performance gain should more than compensate for any additional compilation time (and zlib-rs compiles in a couple of seconds anyway on my laptop).
As for dependencies: zlib, zlib-ng and zlib-rs all obviously need some access to OS APIs for filesystem access if compiled with that functionality. At least for zlib-rs: if you provide an allocator and don't need any of the file IO you can compile it without any dependencies (not even standard library or libc, just a couple of core types are needed). zlib-rs does have some testing dependencies though, but I think that is fair. All in: all of them use almost exactly the same external dependencies (i.e.: nothing aside from libc-like functionality).
zlib-rs is a bit bigger by default (around 400KB), with some of the Rust machinery. But if you change some of that (i.e. panic=abort), use a nightly compiler (unfortunately still needed for the right flags) and add the right flags both libraries are virtually the same size, with zlib at about 119KB and zlib-rs at about 118KB.
One of the things I like about C is I can download a statically-compiled native GCC for use on a computer with modest amounts of memory, storage and a relatively old, slow CPU. Total size uncompressed is 242.3MB.
Using this I can statically compile a cross-compiler. Total size uncompressed 169.4MB.
I use GCC to compile zlib and a wide variety of other software. I can build an operating system from the ground up.
Perhaps someday during my lifetime it will be possible to compile programs written in Rust using inexpensive computers with modest amounts of memory, storage and relatively slow CPUs. Meanwhile, there is C.
Does this performance have anything to do with Rust itself, or is it just more optimized than the other C-language versions (more SIMD instructions / raw assembly code)? I ask because there is a canonical use case where C++ can consistently outperform C -- sorting, because the comparison operator in C++ allows for more compiler optimization compared to the C version: qsort(). I am wondering if there is something similar here for Rust vs C.
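For what it's worth, the same effect shows up in Rust (a sketch; the inlining story is the textbook explanation, not something measured here): the comparator is a generic closure rather than a function pointer, so it can be monomorphized and inlined into the sort, whereas qsort() is stuck with an indirect call per comparison.

    fn main() {
        let mut v = vec![3.0_f64, 1.0, 2.0];
        // The closure is a zero-sized type; sort_unstable_by is
        // monomorphized for it, so each comparison can be inlined.
        v.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
        assert_eq!(v, [1.0, 2.0, 3.0]);
    }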
If you're dealing with a compiled systems language, the language itself is going to make almost no difference in speed, especially if they are all being optimized by LLVM.
An optimized version that controls allocations, has good memory access patterns, uses SIMD and uses multi-threading can easily be 100x faster or more. Better memory access alone can speed a program up 20x or more.
New native code implementation of zlib faster than old native code version. So what? Rust has a lot to recommend it, but it's not automatically faster than C.
Why can’t something be faster than C? If a language is able to convey more information to a backend like LLVM, the backend could use that to produce more optimised code than what it could do for C.
For example, if the language is able to say, for any two pointers, the two pointers will not overlap - that would enable the backend to optimise further. In C this requires an explicit restrict keyword. In Rust, it’s the default.
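A small sketch of what that buys (a hypothetical function, not from zlib-rs):

    // `dst` and `src` cannot overlap, because `&mut` is exclusive.
    // The optimizer gets the no-aliasing guarantee for free, where C
    // would need `float *restrict dst, const float *restrict src`.
    pub fn add_into(dst: &mut [f32], src: &[f32]) {
        for (d, s) in dst.iter_mut().zip(src) {
            *d += *s; // a candidate for autovectorization
        }
    }

    fn main() {
        let mut dst = [1.0_f32; 8];
        let src = [2.0_f32; 8];
        add_into(&mut dst, &src);
        assert_eq!(dst, [3.0; 8]);
    }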
grep (C) is about 5-10x slower than ripgrep (Rust). That’s why ripgrep is used to execute all searches in VS Code and not grep.
Or a different tack. If you wrote a program that needed to sort data, the Rust version would probably be faster thanks to the standard library sort being the fastest, across languages (https://github.com/rust-lang/rust/pull/124032). Again, faster than C.
Happy to give more examples if you’re interested.
There’s nothing special about C that entitles it to the crown of “nothing faster”. This would have made sense in 2005, not 2025.
First, I would say that "ripgrep is generally faster than GNU grep" is a true statement. But sometimes GNU grep is faster than ripgrep and in many cases, performance is comparable or only a "little" slower than ripgrep.
Secondly, VS Code using ripgrep because of its speed is only one piece of the picture. Licensing was also a major consideration. There is an issue about this where they originally considered ripgrep (and ag if I recall correctly), but I'm on mobile so I don't have the link handy.
The kind of code you can write in rust can indeed be faster than C, but someone will wax poetic about how anything is possible in C and they would be valid.
The major reason that rust can be faster than C though, is because due to the way the compiler is constructed, you can lean on threading idiomatically. The same can be true for Go, coroutines vs no coroutines in some cases is going to be faster for the use case.
You can write these things to be the same speed or even faster in C, but you won’t, because it’s hard and you will introduce more bugs per KLOC in C with concurrency vs Go or Rust.
> but someone will wax poetic about how anything is possible in C and they would be valid.
Not at all would that be valid.
C has a semantic model which was close to how early CPUs worked, but a lot has changed since. It's more like CPUs deliberately expose an API so that C programmers could feel at home, but stuff like SIMD and the like is non-existent in C besides as compiler extensions. But even just calling conventions, the stack, etc are all stuff you have no real control over in the C language, and a more optimal version of your code might want to do so. Sure, the compiler might be sufficiently smart, but then it might as well convert my Python script to that ultra-efficient machine code, right?
So no, you simply can't write everything in C, something like simd-json is just not possible. Can you put inline assembly into C? Yeah, but I can also call inline assembly from Scratch and JS, that's not C at all.
Also, Go is not even playing in the same ballpark as C/C++/Rust.
If you don't count manual SIMD intrinsics or inline assembly as C, then Rust and FORTRAN can be faster than C.
This is mainly thanks to having pointer aliasing guarantees that C doesn't have. They can get autovectorization optimizations where C's semantics get in the way.
Of course many things can be faster than C, because C is very far from modern hardware. If you compile with optimisation flags, the generated machine code looks nothing like what you programmed in C.
It is quite easy for C++ and Rust to both be faster than C in things larger than toy projects. C is hardly a paragon of efficiency, and the language makes useful things very hard to do efficiently.
You can contort C to trick it into being fast[1], but it quickly becomes an unmaintainable nightmare so almost nobody does.
1: eg, correct use of restrict, manually creating move semantics, manually creating small string optimizations, etc...
We were presumably talking about an ideal massless space [Minkowski] in which the speed of light in a vacuum is considered -- that is what c is defined as.
Fortran has been faster than C because C permits pointer aliasing, which prevents optimizations. For decades this was why, for some applications, Fortran was just faster.
It's not just "a sufficiently smart compiler", without completely unrealistic (as in "halting problem" unrealistic, in the general case) "smartness".
So no, C is inherently slower than some other languages.
Besides the famous "C is not a low-level language" blog post.. I don't even get what you are thinking. C is not even the performance queen for large programs (the de facto standard today is C++ for good reasons), let alone for tiny ultra hot loops like codecs and stuff, which are all hand-written assembly.
It's not even hard to beat C with something like Rust or C++, because you can properly do high level optimizations as the language is expressive enough for that.
While AI can certainly produce code that's faster or otherwise better than human-written code, I am utterly skeptical of LLMs doing that. My own experience with LLMs is that they can't do anything a skilled human couldn't; they're just faster. I believe we should look at non-LLM AI technologies for going beyond what a skilled human programmer can expect to do. The most famous example of AI doing that is https://www.nature.com/articles/s41586-023-06004-9 where no LLM is involved.
A long-standing issue with rustc's PGO was just recently fixed: https://github.com/rust-lang/rust/pull/133250
Unsafe is a very distinct code smell. Like the hydrogen sulfide added to natural gas to allow folks to smell a gas leak.
If you smell it when you're not working on the gas lines, that's a signal.
There's no standard recipe for natural gas odorant, but it's typically a mixture of various organosulfur compounds, not hydrogen sulfide. See:
https://en.wikipedia.org/wiki/Odorizer#Natural_gas_odorizers
TIL!
Someone mentioned to me that for something as simple as a linked list you have to use unsafe in Rust.
Update: it's how the std lib does it: https://doc.rust-lang.org/src/alloc/collections/linked_list....
On modern architectures you shouldn't use linked lists (singly or doubly linked) unless you have an extremely niche use-case. They are no longer general-purpose data structures in a world where cache locality is a thing.
No you don’t. You can use the standard linked list that is already included in the standard library.
Coming up with these niche examples of things you need unsafe for in order to discredit rust’s safety guarantees is just not interesting. What fraction of programmer time is spent writing custom linked lists? Surely way less than 1%. In most of the other 99%, Rust is very helpful.
I think the point is that it's funny that the standard library has to use unsafe to implement a data structure that's like the second data structure you learn in an intro to CS class
No, that's how the feature is supposed to work.
You design an abstraction which is unsafe inside, and exposes a safe API to users. That is really how unsafe is meant to be used.
Of course the standard library uses unsafe. This is where you want unsafe to be, not in random user code. That's what it was made for.
Yeah, but Rust just proves the point here that (doubly) linked lists
a) are surprisingly nontrivial to get right,
b) have almost no practical uses, and
c) are only taught because they're conceptually nice and demonstrate pointers and O(1) vs O(n) tradeoffs.
Note that safe Rust has no problems with singly-linked lists or in general any directed tree structure.
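For instance, a toy singly-linked list needs no unsafe at all; a minimal sketch:

    // Each node exclusively owns its tail, so ownership forms a
    // straight line and the borrow checker is perfectly happy.
    enum List<T> {
        Cons(T, Box<List<T>>),
        Nil,
    }

    fn main() {
        use List::*;
        let list: List<i32> = Cons(1, Box::new(Cons(2, Box::new(Nil))));
        let mut cur = &list;
        while let Cons(value, next) = cur {
            println!("{value}");
            cur = &**next; // step to the owned tail
        }
    }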
Why is it particularly funny?
C has to make a syscall to the kernel which ultimately results in a BIOS interrupt to implement printf, which you need for the hello world program on page 1 of K&R.
Does that mean that C has no abstraction advantage over directly coding interrupts with asm? Of course not.
> C has to make a syscall to the kernel which ultimately results in a BIOS interrupt to implement printf,
That's not the case since the late 1990s. Other than during early boot, nobody calls into the BIOS to output text, and even then "BIOS interrupt" is not something normally used anymore (EFI uses direct function calls through a function table instead of going through software interrupts).
What really happens in the kernel nowadays is direct memory access and direct manipulation of I/O ports and memory mapped registers. That is, all modern operating systems directly manipulate the hardware for text and graphics output, instead of going through the BIOS.
Thanks for the information (I mean that genuinely, not sarcastically — I do really find it interesting). But it doesn’t really impact my point.
I love how the most common negative thing I hear about Rust is that a really uncommon data structure, which no one should write by hand and should almost always import, can be written using the unsafe Rust language feature. Meanwhile, Rust applications tend in most cases to be considerably faster, more correct and more enjoyable to maintain than those in other languages. Must be a really awesome technology.
Doesn’t Arc and Weak work for doubly linked lists? Rust docs recommend Weak as a way to break pointer cycles: https://doc.rust-lang.org/std/sync/struct.Arc.html#breaking-...
This is far less of a problem than it would be in a C-like language, though.
You can implement that linked list just once, audit the unsafe parts extensively, provide a fully safe API to clients, and then just use that safe API in many different places. You don't need thousands of project-specific linked list reimplementations.
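As a sketch of that encapsulation pattern (a toy single-value Box, nothing from the linked-list code): the raw pointer never escapes the module, so no amount of safe calling code can misuse it.

    use std::alloc::{alloc, dealloc, Layout};

    pub struct OwnedU32 {
        ptr: *mut u32, // private: clients never touch the raw pointer
    }

    impl OwnedU32 {
        pub fn new(value: u32) -> Self {
            let layout = Layout::new::<u32>();
            // SAFETY: the layout has non-zero size; null is checked below.
            let ptr = unsafe { alloc(layout) } as *mut u32;
            assert!(!ptr.is_null(), "allocation failed");
            // SAFETY: ptr is valid and suitably aligned for a u32.
            unsafe { ptr.write(value) };
            OwnedU32 { ptr }
        }

        pub fn get(&self) -> u32 {
            // SAFETY: ptr stays valid and initialized while self lives.
            unsafe { *self.ptr }
        }
    }

    impl Drop for OwnedU32 {
        fn drop(&mut self) {
            // SAFETY: ptr came from alloc with this same layout.
            unsafe { dealloc(self.ptr as *mut u8, Layout::new::<u32>()) }
        }
    }

    fn main() {
        let x = OwnedU32::new(42);
        assert_eq!(x.get(), 42);
    } // x dropped here; the allocation is freed exactly once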
Hydrogen sulfide is highly corrosive (a big problem in sewers and associated infrastructure), so I highly doubt you would choose to introduce it to gas pipelines on purpose.
Hydrogen sulfide is highly toxic (it's comparable to carbon monoxide). I doubt anyone in their right mind would put it intentionally in a place where it could leak around humans.
But it can occur naturally in natural gas.
I assume GP was referring to mercaptan, or similar. i.e. Something with a distinctive bad smell.
https://en.m.wikipedia.org/wiki/Methanethiol
> Hydrogen sulfide is highly toxic (it's comparable to carbon monoxide)
It's a bad comparison, since CO doesn't smell, which is what makes it dangerous, while H2S is detected by our sense of smell at concentrations much lower than the toxic dose (in fact, one of its biggest dangers is that at dangerous concentrations it doesn't smell like anything at all, because our receptors get saturated).
It's not what's being put in natural gas, but it wouldn't be that dangerous if we did.
> Like the hydrogen sulfide added to natural gas to allow folks to smell a gas leak.
I am 100% sure that the smell they add to natural gas does not smell like rotten eggs.
They add mercaptan which is like 1000x the rotten egg smell of H2S.
Mercaptan is a group of compounds, more than one of which are used as gas odorants. So in some places gas smells of rotten eggs, similar to H2S, while in others it doesn't smell like that at all, but has a quite distinct smell reminiscent of garlic and durian.
You are lucky to not have smelled mercaptan (which is what is actually put in). Much, much worse than H2S.
I have. It's worse no doubt. But it's not the smell of rotten eggs. My comment was meant to be tongue-in-cheek to correct the mistake of saying "H2S" in the GP comment.
If that is the case (and I have no reason to believe otherwise), I apologise. Should work on detecting tone better.
Look, no. Just go read the unsafe block in question. It's just SIMD intrinsics. No memory access. No pointers. It's unsafe in name only.
No need to get all moral about it.
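For flavor, here's a sketch of that kind of block (not the zlib-rs code, just baseline-SSE2 intrinsics): nothing here touches memory beyond a local array, yet every intrinsic call must sit in an unsafe block today.

    #[cfg(target_arch = "x86_64")]
    fn demo() {
        use std::arch::x86_64::*;
        // SAFETY: these intrinsics only require SSE2, which is part of
        // the x86_64 baseline -- "unsafe" in name only on this target.
        unsafe {
            let a = _mm_set1_epi32(0b1010);
            let b = _mm_set1_epi32(0b0110);
            let x = _mm_xor_si128(a, b);
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, x);
            assert!(out.iter().all(|&lane| lane == 0b1100));
        }
    }

    fn main() {
        #[cfg(target_arch = "x86_64")]
        demo();
    }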
By your line of reasoning, SIMD intrinsics functions should not be marked as unsafe in the first place. Then why are they marked as unsafe?
They are in the process of marking them safe, which is enabled through the target_feature 1.1 RFC.
In fact, it has already been merged two weeks ago: https://github.com/rust-lang/stdarch/pull/1714
The change is already visible on nightly: https://doc.rust-lang.org/nightly/core/arch/x86/fn._mm_xor_s...
Compared to stable: https://doc.rust-lang.org/core/arch/x86/fn._mm_xor_si128.htm...
So this should be stable in 1.87 on May 15 (Rust's 10 year anniversary since 1.0)
They are marked as unsafe because there are hundreds and hundreds of intrinsics, some of which do memory access, some have side effects and others are arithmetic only. Someone would have to individually review them and explicitly mark the safe ones.
There was a bug open about it and the rationale was that no one with the expertise (some of these are quite arcane) was stepping up to do it. (edit: other comments in this thread suggest that this effort is now underway and first changes were committed a few weeks ago)
You can do safe SIMD using std::simd but it is nightly only at this point.
For now the caller has to ensure proper alignment of SIMD loads. But in the future a safe API will be made available, once the kinks are ironed out. You can already use it, in fact, by enabling a specific compiler feature [1].
[1] https://doc.rust-lang.org/std/simd/index.html
There are no loads in the above unsafe block; in practice loadu is just as fast as load, and even if you manually use the aligned load or store, you get a crash. It's silly to say that crashes are unsafe.
Well, there's a category difference between a crash as in a panic and a crash as in a CPU exception. Usually, "safe" programming limits crashes to language-level error handling, which allows you to easily reason about the nature of crashes: if the type system is sound and your program doesn't use unsafe, the only way it should crash is by panic, and panics are recoverable and leave your program in a well-defined state. By the time you get to a signal handler, you're too late. Admittedly, there are some cases where this is less important than others... misaligned load/store wouldn't lead to a potential RCE, but if it can bring down a program it still is a potential DoS vector.
Of course, in practice, even in Rust, it isn't strictly true that programs without unsafe can't crash with fatal runtime errors. There's always stack overflows, which will crash you with a SIGABRT or equivalent operating system error.
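A quick sketch of that distinction: the panic below is ordinary language-level control flow, so the program can observe it and carry on in a well-defined state.

    use std::panic;

    fn main() {
        // An out-of-bounds index panics; it never touches invalid
        // memory, and the unwind can be caught.
        let result = panic::catch_unwind(|| {
            let v = vec![1, 2, 3];
            v[99]
        });
        assert!(result.is_err()); // execution continues, state well-defined
    }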
As you point out later, a SIGABRT or a SIGBUS would both be perfectly safe and really no different than a panic. With enough infra you could convert them to panics anyway (but probably not worth the effort).
Well, that's the thing though: in terms of Rust and Go and other safe programming languages, CPU exceptions are not "safe" even though they are not inherently dangerous. The point is that the subset of the language that is safe can't generate them, period. They are not accounted for in safe code.
There are uses for this, especially since some code will run in environments where you can not simply handle it, but it's also just cleaner this way; you don't have to worry about the different behaviors between operating systems and possibly CPU architectures with regards to error recovery if you simply don't generate any.
Since there are these edge cases where it wouldn't be possible to handle faults easily (e.g. some kernel code) it needs to be considered unsafe in general.
That’s largely true, but there are some exceptions (pun not intended).
In Rust, the CPU exception resulting from a stack overflow is considered safe. The compiler uses stack probing to ensure that as long as there is at least one page of unmapped memory below the stack (guard page), the program will reliably fault on it rather than continuing to access memory further below. In most environments it is possible to set up a guard page, including Linux kernel code if CONFIG_VMAP_STACK is enabled. But there are other environments where it’s not, such as WebAssembly and some microcontrollers. In those environments, the backend would have to add explicit checks to function prologs to ensure enough stack is available. I say “would have to”, not “does”: I’ve heard that on at least the microcontrollers, there are no such checks and Rust is just unsound at the moment. Not sure about WebAssembly.
Meanwhile, Go uses CPU exceptions to handle nil dereferences.
Yeah, I glossed over the Rust stack overflow case. I don't know why: Literally two parent comments up I did bother to mention it.
That said, I actually entirely forgot Go catches nil derefs in a segfault handler. I guess it's not a big deal since Go isn't really suitable for free-standing environments where avoiding CPU exceptions is sometimes more useful, so there's no particular reason why the runtime can't rely on it.
Also, AFAIK panics are not always recoverable in Rust. You can compile your project with `panic = "abort"`, in which case the program will quit immediately whenever a panic is encountered.
Sure, but that is beside the point: if you compile code like that, you're intentionally making panics unrecoverable. The nature of panics from the language perspective is not any different; you're still in a well-defined state when it happens.
It's also possible to go a step further and practice "panic-free" Rust where you write code in such a way that it never links to the panic handler. Seems pretty hard to do, but seems like it might be worth it sometimes, especially if you're in an environment where you don't have anything sensible to do on a panic.
There's no standardization of SIMD in Rust yet; it's been sitting in nightly unstable for years:
https://doc.rust-lang.org/std/intrinsics/simd/index.html
So I suspect it's a matter of two things:
1. You're calling out to what's basically assembly, so buyer beware. This is basically FFI into C/asm.
2. There's no guarantee that what comes out of those 128-bit vectors afterwards follows any sanity or expectations, so... buyer beware. Same reason std::mem::transmute is marked unsafe.
It's really the weakest form of unsafe.
Still entirely within the bounds of a sane person to reason about.
The example here is trivially safe but more general SIMD safety is going to be extremely difficult to analyze for safety, possibly intractable.
For example, it is perfectly legal to dereference a vector pointer that references illegal memory if you mask the illegal addresses. This is a useful trick and common in e.g. idiomatic AVX-512 code. The mask registers are almost always computed at runtime so it would be effectively impossible to determine if a potentially illegal dereference is actually illegal at compile-time.
I suspect we’ll be hand-rolling unsafe SIMD for a long time. The different ISAs are too different, inconsistent, and weird. A compiler that could make this clean and safe is like fusion power, it has always been 10 years away my entire career.
Presumably a bounds check on the mask could be done, or a safe variant exposed that does that trick under the hood. But yeah, I don't disagree that "safe SIMD" is unlikely to scratch the itch for various applications; hopefully it'll at least scratch enough of them that the remaining unsafe is reduced.
No, a bounds check defeats the purpose of SIMD in these cases.
Not necessarily if you can hoist the bounds check outside of the loop somehow.
> There's no standardization of simd in Rust yet
Of safe SIMD, but some stuff in core::arch is stabilized. Here's the first bit called in the example of the OP: https://doc.rust-lang.org/core/arch/x86/fn._mm_clmulepi64_si...
> they've been sitting in nightly unstable for years
So many very useful features of Rust and its core library spend years in "nightly" because the maintainers of those features don't have the discipline to see them through.
Before I started working with Rust, I spent a lot of time using Swift for systems-y/server-side code, outside of the Apple ecosystem. There is a lot I like about that language, but one of the biggest factors that drove me away was just how fast the Apple team was to add more and more compiler-magic features without considering whether they were really the best possible design. (One example: adding compiler-magic derived implementations of specific protocols instead of an extensible macro system like Rust has.) When these concerns were raised on the mailing lists, the response from leadership was "yes, something like that would be better in the long run, but we want to ship this now." Or even in one case, "yes, that tweak to the design would be better, but we already showed off the old design at the WWDC keynote and we don't want to break code we put in a keynote slide."
When I started working in Rust, I'd want some feature or function, look it up, and find it was unstable, sometimes for years. This was frustrating at first, but then I'd go read the GitHub issue thread and find that there was some design or implementation concern that needed to be overcome, and that people were actively working on it and unwilling to stabilize the feature until they were sure it was the best possible design. And the result of that is that features that do get stabilized are well thought out, generalize, and compose well with everything else in the language.
Yes, I really want things like portable SIMD, allocators, generators, or Iterator::intersperse. But programming languages are the one place I really do want perfect to be the enemy of good. I'd rather it take 5+ years to stabilize features than for us to end up with another Swift or C++.
> the response from leadership was "yes, something like that would be better in the long run, but we want to ship this now."
Sounds like the Rust's async story.
Rust's async model was shipped as an MVP, not in the sense of "this is a bad design and we just want to ship it"; but rather, "we know this is the first step of the eventual design we want, so we can commit to stabilizing these parts of it now while we work on the rest." There's ongoing work to bring together the rest of the pieces and ergonomics on top of that foundational model; async closures & trait methods were recently stabilized, and work towards things like pin ergonomics & simplifying cheap clones like Rc are underway.
Rust uses this strategy of minimal/incremental stabilization quite often (see also: const generics, impl Trait); the difference between this and what drove me away from Swift is that MVPs aren't shipped unless it's clear that the design choices being made now will still be the right choices when the rest of the feature is ready.
IMO shipping async without a standardized API for basic common async facilities (like thread spawning, file/network I/O) was a mistake and basically means that tokio has eaten the whole async side of the language.
Why define runtime independence as a goal, but then make it impossible to write runtime agnostic crates?
(Well, there's the "agnostic" crate at least now)
>IMO shipping async without a standardized API for basic common async facilities (like thread spawning, file/network I/O) was a mistake and basically means that tokio has eaten the whole async side of the language.
I would argue that it's the opposite of a mistake. If you standardize everything before the ecosystem gets a chance to play with it, you risk making mistakes that you have to live with in perpetuity.
Unless you clearly define how and when you’re going to handle removing a standard or updating it to reflect better use cases.
Language designers admittedly should worry about constant breakage but it’s fine to have some churn, and we shouldn’t be so concerned of it that it freezes everything
Async went through years of work before being stabilized. This isn't true.
My personal opinion is that if you want to contribute a language feature, shit or get off the pot. Leaving around a half-baked solution actually raises the required effort for someone who isn't you to add that feature (or an equivalent) because they now have to either (1) ramp up on the spaghetti you wrote or (2) overcome the barrier of explaining why your thing isn't good enough. Neither of those two things are fun (which is important since writing language features is volunteer work) and those things come in the place of doing what is actually fun, which is writing the relevant code.
The fact that the Rust maintainers allow people to put in half-baked features before they are fully designed is the biggest cultural failing of the language, IMO.
>The fact that the Rust maintainers allow people to put in half-baked features before they are fully designed is the biggest cultural failing of the language, IMO.
In nightly?
Hard disagree. Letting people try things out in the real world is how you avoid half-baked features. Easy availability of nightly compilers with unstable features allows way more people to get involved in the pre-stabilization polishing phase of things and raise practical concerns instead of theoretical ones.
C++ takes the approach of writing and nitpicking whitepapers for years before any implementations are ready and it's hard to see how that has led to better outcomes relatively speaking.
Yeah, we're going to have to agree to disagree on the C++ flow (really the flow for any language that has a written standard) being better. That flow is usually:
1. Big library/compiler does a thing, and people really like it
2. Other compilers and libraries copy that thing, sometimes putting their own spin on it
3. All the kinks get worked out and they write a white paper
4. Eventually the thing becomes standard
That way, everything in the standard library is something that is fully-thought-out and feature-complete. It also gives much more room for competing implementations to be built and considered before someone stakes out a spot in the standard library for their thing.
>That way, everything in the standard library is something that is fully-thought-out and feature-complete
Are C++ features really that much better thought out? Modules were "standardized" half a decade ago, but the list of problems with actually using them in practice is still pretty damn long to the point where adoption is basically non-existent.
I'm not going to pretend to be nearly as knowledgeable about C++ as Rust, but it seems like most new C++ features I hear about are a bit janky or don't actually fit that well with the rest of the language. Something that tends to happen when designing things in an ivory tower without testing them in practice.
They absolutely are. The reason many features are stupid and janky is because the language and its ecosystem has had almost 40 more years to collect cruft.
The fundamental problem with modules is that build systems for C++ have different abstractions and boundaries. C++ modules are like Rust async - something that just doesn't fit well with the language/system and got hammered in anyway.
The reason it seems like they come from nowhere is probably because you don't know where they come from. Most things go through boost, folly, absl, clang, or GCC (or are vendor-specific features) before going to std.
That being said, it's not just C++ that has this flow for adding features to the language. Almost every other major language that is not Rust has an authoritative specification.
Since C++17, hardly anything goes through boost, folly, absl, clang, or GCC (or vendor-specific features) before going to std.
What's a Rust feature that you think suffered from their process in a way that C++ would not have?
Unfortunately, C++ in its last set of revisions has gotten that sequence wrong; many ideas are now PDF-implemented (standardized on paper) before showing up in any compiler, years later.
"Fully-thought-out and feature-complete" is something that has hardly been happening since C++17.
> maintainers of those features don't have the discipline to see them through.
This take makes me sad. There are a lot of reasons why an open source contributor may not see something through. "Lack of discipline" is only one of them. Others that come to mind are: lack of time, lack of resources, lack of capability (i.e good at writing code, but struggles to navigate the social complexities of sheparding a significant code change), clinically impaired ability to "stay the course" and "see things through" (e.g. ADHD), or maybe it was a collaborative effort and some of the parties dropped out for any of the aforementioned reasons.
I don't have a solution, but it does kinda suck that open source contribution processes are so dependent on instigators being the responsible party to seeing a change all the way through the pipeline.
simd and allocator_api are the two that irritate me enough to consider a different language for future systems dev projects.
I don't have the personality or time to wade into committee type work, so I have no idea what it would take to get those two across the finish line, but the allocator one in particular makes me question Rust for lower level applications. I think it's just not going to happen.
If Zig had proper ADTs and something equivalent to borrow checker, I'd be inclined to poke at it more.
generic simd abstractions are of quite limited use. I'm not sure what's objectionable about the thing Rust has shipped (in nightly) for this, which is more or less the same as the stuff Zig has shipped for this (in a pre-1.0 compiler version).
The issue is that it's sitting in nightly for years. Many many many years.
I don't write software targetting nightly, for good reason.
SIMD intrinsics are unsafe because they are available only under some CPU features.
I don't read any moralizing in my previous comment. And it seems to mirror the relevant section in the book:
"People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs."
I hope the SIMD intrinsics make it to stable soon so folks can ditch unnecessary unsafes if that's the only issue.
This is not really true. You have to uphold those guarantees yourself. With unsafe preconditions, if you don't, the code will still crash loudly (which is better than undefined behaviour).
With unsafe you get exactly the same kind of semantics as C, if you don't uphold the invariant the unsafe functions expect, you end up with UB exactly like in C.
If you want a clean crash instead of nondeterministic behavior, you need to use asserts like in C, but they won't save you from compiler optimizations removing checks that are deemed useless (again, exactly like in C).
> With unsafe you get exactly the same kind of semantics as C, if you don't uphold the invariant the unsafe functions expect, you end up with UB exactly like in C.
This is not exactly true. Even in production code, unsafe precondition checks catch violations of these rules.
Here: https://doc.rust-lang.org/core/macro.assert_unsafe_precondit... And here: https://google.github.io/comprehensive-rust/unsafe-rust/unsa...
Quoted from your link
> Safe Rust: memory safe, no undefined behavior possible. Unsafe Rust: can trigger undefined behavior if preconditions are violated.
So unsafe Rust from a UB perspective is no different than C/C++. If preconditions are violated, UB can occur, affecting anywhere in the program. It's unclear how the compiler could check anything about preconditions in a block explicitly used to say that the developer is the one upholding the preconditions.
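A deliberately wrong sketch of what "violating a precondition" means here (do not ship this; it exists to be flagged):

    fn main() {
        let v = [1, 2, 3];
        // SAFETY: none -- this violates get_unchecked's precondition
        // (index in bounds). It is undefined behaviour, exactly as an
        // out-of-bounds read is in C. Miri flags it; a release build
        // may appear to "work".
        let x = unsafe { *v.get_unchecked(10) };
        println!("{x}");
    }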
The rust compiler was written by chuck norris.
> With unsafe you get exactly the same kind of semantics as C
People seem to disagree.
Unsafe Rust Is Harder Than C
https://chadaustin.me/2024/10/intrusive-linked-list-in-rust/
https://news.ycombinator.com/item?id=41944121
Using references in unsafe Rust is harder than using raw pointers in C.
Using raw pointers in unsafe Rust is easier than using raw pointers in C.
The solution is to not manipulate references in unsafe code. The problem is that in old versions of Rust this was tricky. Modern versions of Rust have addressed this by adding first-class facilities for producing pointers without needing temporary references: https://blog.rust-lang.org/2024/10/17/Rust-1.82.0.html#nativ...
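A sketch of that facility (`Header` and `len_ptr` are hypothetical): since Rust 1.82, `&raw mut` produces a raw pointer to a place directly, so no intermediate `&mut` (with aliasing rules you'd then have to uphold) is ever created.

    struct Header {
        len: u32,
    }

    fn len_ptr(h: *mut Header) -> *mut u32 {
        // SAFETY: this only computes a field address; `h` must point to
        // a valid allocation, which our caller guarantees below.
        unsafe { &raw mut (*h).len }
    }

    fn main() {
        let mut header = Header { len: 7 };
        let p = len_ptr(&mut header);
        // SAFETY: p points into `header`, which is live and unaliased here.
        unsafe { *p += 1 };
        assert_eq!(header.len, 8);
    }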
> clearly marked by the unsafe block.
Rust has macros; are macros prohibited from generating unsafe blocks, so that macro invocations don't have to be suspected of harboring unsafe code?
No. Just like function bodies can contain unsafe blocks.
Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees? As far as I'm aware, inside the unsafe block you can do whatever you want which means all of the nice memory-safety properties of the language go away.
It's like letting a wet dog (who'd just been swimming in a nearby swamp) run loose inside your hermetically sealed cleanroom.
It seems like you've got it backwards. Even unsafe rust is still more strict than C. Here's what the book has to say (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html)
"You can take five actions in unsafe Rust that you can’t in safe Rust, which we call unsafe superpowers. Those superpowers include the ability to:
It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks: if you use a reference in unsafe code, it will still be checked. The unsafe keyword only gives you access to these five features that are then not checked by the compiler for memory safety. You’ll still get some degree of safety inside of an unsafe block. In addition, unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems: the intent is that as the programmer, you’ll ensure the code inside an unsafe block will access memory in a valid way.
People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs."
This description is still misleading. The preconditions for the correctness of an unsafe block can very much depend on the correctness of the code outside and it is easy to find Rust bugs where exactly this was the cause. This is very similar where often C out of bounds accesses are caused by some logic error elsewhere. Also an unsafe block has to maintain all the invariants the safe Rust part needs to maintain correctness.
So, it's true that unsafe code can depend on preconditions that need to be upheld by safe code.
But using ordinary module encapsulation and private fields, you can scope the code that needs to uphold those preconditions to a particular module.
So the "trusted computing base" for the unsafe code can still be scoped and limited, allowing you to reduce the amount of code you need to audit and be particularly careful about for upholding safety guarantees.
Basically, when writing unsafe code, the actual unsafe operations are scoped to only the unsafe blocks, and they have preconditions that you need to scope to a particular module boundary to ensure that there's a limited amount of code that needs to be audited to ensure it upholds all of the safety invariants.
Ralf Jung has written a number of good papers and blog posts on this topic.
And you think one cannot modularize C code and encapsulate critical buffer operations in much safer APIs? One can; the problem is that a lot of legacy C code was not written this way. A lot of newly written C code is not written this way either, but the reason is often that people cut corners when they need to get things done with limited time and resources. You will see the same with Rust.
There is no distinction between safe and unsafe code in C, so it's not possible to make that same distinction that you can in Rust.
And even if you try to provide some kind of safer abstraction, you're limited by the much more primitive type system, that can't distinguish between owned types, unique borrows, and shared borrows, nor can it distinguish thread safety properties.
So you're left to convention and documentation for that kind of information, but nothing checking that you're getting it right, making it easy to make mistakes. And even if you get it right at first, a refactor could change your invariants, and without a type system enforcing them, you never know until someone comes along with a fuzzer and figures out that they can pwn you
There is definitely a distinction between safe and unsafe code in C; it is just not a simple binary distinction. But this does not make it impossible to screen C for unsafe constructions, and it also does not mean that detecting unsafe issues in Rust is always trivial.
Even innocent looking C code can be chock-full of UBs that can invalidate your "local reasoning" capabilities. So, not even close.
Care to share an example?
But this is also easy to protect against if you use the tools available to C programmers. It is part of the Rust hype that we would be completely helpless here, but this is far from the truth.
I assume you are hinting at 'int' being signed here? And that signed overflow is UB in C? Real question: ignoring what the ISO C language spec says, are there any modern hardware platforms (say: ARM64 and X86-64) that do not use two's complement to implement signed integers? I don't know any. As I understand it, two's complement correctly supports overflow for signed arithmetic.
I might be old, but more than 10 years ago, hardly anyone talked about UB in C and C++ programming. In the last 10 years, it is all the rage, but seems to add very little to the conversation. For example, if you program C or C++ with the Win32 API, there are loads of weird UB-ish things that seem to work fine.
> Ignoring what the ISO C language spec says, are there any modern hardware platforms (say: ARM64 and X86-64) that do not use two's complement to implement signed integers?
This is not how compilers work. Optimization happens based on language semantics, not on what platforms do.
At least in recent C++ standards, integers are defined as two’s complement. As a practical matter what hardware like that may still exist doesn’t have a modern C++ compiler, rendering it a moot point.
UB in C is often found where different real hardware architectures had incompatible behavior. Rather than biasing the language for or against different architectures they left it to the compiler to figure out how to optimize for the cases where instruction behavior diverge. This is still true on current architectures e.g. shift overflow behavior which is why shift overflow is UB.
AI rewrote to avoid undefined behavior:
> long sum = (long)x + y;
There is no guarantee that sizeof(long) > sizeof(int), in fact the GNU libc documentation states that int and long have the same size on the majority of supported platforms.
https://www.gnu.org/software/libc/manual/html_node/Range-of-...
> return -1; // or any value that indicates an error/overflow
-1 is a perfectly valid average for various inputs. You could return the larger type to encode an error value that is not a valid output or just output the error and average in two distinct variables.
AI and C seem like a match made in hell.
> There is no guarantee that sizeof(long) > sizeof(int), in fact the GNU libc documentation states that int and long have the same size on the majority of supported platforms.
That used to be the case for 32-bit platforms, but most 64-bit platforms in which GNU libc runs use the LP64 model, which has 32-bit int and 64-bit long. That documentation seems to be a bit outdated.
(One notable 64-bit platform which uses 32-bit for both int and long is Microsoft Windows, but that's not one of the target platforms for GNU libc.)
I’m not convinced that solution is much better. It can be improved to x/2 + y/2 (which still gives the wrong answer if both inputs are odd).
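For what it's worth, there is a well-known overflow-free formulation; a sketch in Rust, where `>>` on signed integers is a well-defined arithmetic shift (in C, right-shifting a negative value is implementation-defined, so this is cleaner in Rust):

    // x + y == 2*(x & y) + (x ^ y), so the floor average is
    // (x & y) + floor((x ^ y) / 2) -- no intermediate overflow.
    // Note it rounds toward negative infinity, not toward zero.
    fn average(x: i32, y: i32) -> i32 {
        (x & y) + ((x ^ y) >> 1)
    }

    fn main() {
        assert_eq!(average(i32::MAX, i32::MAX), i32::MAX);
        assert_eq!(average(0, -2), -1);
        assert_eq!(average(3, 4), 3);
    }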
We're about to see a huge uptick in bugs worldwide, aren't we?
I don't know why this answer was downvoted. It adds valuable information to this discussion. Yes, I know that someone already pointed out that sizeof(int) is not guaranteed on all platforms to be smaller than sizeof(long). Meh. Just change the type to long long, and it works well.
Copypasting a comment into an LLM, and then copypasting its response back is not a useful contribution to a discussion, especially without even checking to be sure it got the answer right. If I wanted to know what an LLM had to say, I can go ask it myself; I'm on HN because I want to know what people have to say.
It literally returns a valid output value as an error.
An error value is valid output in both cases.
The code is unarguably wrong.
average(INT_MAX, INT_MAX) should return INT_MAX, but it will get that wrong and return -1.
average(0,-2) should not return a special error-code value, but this code will do just that, making -1 an ambiguous output value.
Even its comment is wrong. We can see from the signature of the function that there can be no value that indicates an error, as every possible value of int may be a legitimate output value.
It's possible to implement this function in a portable and standard way though, along the lines of [0].
[0] https://stackoverflow.com/a/61711253/ (Disclosure: this is my code.)
> Meh. Just change the type to long long, and it works well.
C libraries tend to support a lot of exotic platforms. zlib for example supports Unicos, where int, long int and long long int are all 64 bits large.
Sorting floats with NaN? Almost anything involving threading and mutation where people either don't realise how important locks are, or don't realise their code has suddenly been threaded?
https://www.ioccc.org/years.html
You're a lot more limited in the kinds of APIs you can safely encapsulate in C. For example, you can't safely encapsulate an interface that shares memory between the library and the caller in C. So you're forced into either:
- Exposing an unsafe API and relying on the caller to manually uphold invariants
- Doing things like defensive copying at a performance cost
In many cases Rust gives you the best of both worlds: sharing memory liberally while still having the compiler enforce correctness.
Rust is better at this yes, but the practical advantage is not necessarily that huge.
Which is just a convoluted way of saying that it is possible to write bugs in any language. Still, it's undeniable that some languages make a better job at helping you avoid certain bugs than others.
It's true, but if you hold Rust to this analysis, it's only fair to hold other languages to it too: the scrutiny you're implying an unsafe Rust block needs would have to be applied to all C code, because all C code could depend on code anywhere else for its safety characteristics.
In practice (in both languages) you check what the actual unsafe code does (or "all" code in C's case), note code that depends on external actors for safety (it's not all C code, nor is it all unsafe Rust blocks), and check their callers (and callers' callers, etc).
What is true is that there are more operations in C which can cause undefined behavior, and those are more densely distributed over the C code, making it harder to screen for undefined behavior. This is true, and Rust certainly has an advantage here, but it is not nearly as big an advantage as the "Rust is safe" (please do not look at all the unsafe blocks we need to make it also fast!) and "all C is unsafe" story wants you to believe.
What Rust provides is a way to build safe abstractions over unsafe code.
Rust's type system (including ownership and borrowing, Sync/Send, etc.), along with its privacy features (allowing types to have private fields that can only be accessed by code in the module that defined them), allows you to create fully safe interfaces around code that uses unsafe; there is provably no combination of uses of the interface which leads to undefined behavior.
Now, yeah, it's possible to also use unsafe in Rust just for applying a local optimisation. And that has fewer benefits than a fully encapsulated safe interface, though is still easier to audit for potential UB than C.
So you're right that it's on a continuum, but the distinction between safe and unsafe code means you can more easily find the specific places where UB could occur, and the encapsulation and type system makes it possible to create safe abstractions over unsafe code.
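A toy example of such a safe abstraction (hypothetical, not from zlib-rs):

```rust
mod nonempty {
    /// A byte buffer guaranteed to be non-empty. The field is private,
    /// so only code in this module can construct or mutate one, and
    /// that code upholds the invariant.
    pub struct NonEmpty(Vec<u8>);

    impl NonEmpty {
        pub fn new(v: Vec<u8>) -> Option<NonEmpty> {
            if v.is_empty() { None } else { Some(NonEmpty(v)) }
        }

        pub fn first(&self) -> u8 {
            // SAFETY: the only constructor rejects empty vectors, and no
            // public method removes elements, so index 0 is always in
            // bounds. No safe caller can break this.
            unsafe { *self.0.get_unchecked(0) }
        }
    }
}

fn main() {
    let buf = nonempty::NonEmpty::new(vec![7, 8]).unwrap();
    assert_eq!(buf.first(), 7);
}
```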
You sound pretty biased, gotta tell you. That snark is not helping any argument you think you might be making -- and you are not making any; you are kind of just making fun of Rust, which is pretty boring and uninformative for any reader.
From my past experience with Rust, the team never had to think about data races even once, or about mutable volatile globals. And all of us there had suffered from those decades ago with C, and sometimes C++ as well.
You like those and don't want to migrate? More power to ya! But badmouthing Rust with what seem like fairly uninformed comments is just low. Inform yourself first.
The places where undefined behaviour can occur are also limited in scope; you insist that that part isn't true, because operations outside those unsafe blocks can impact their safety.
That's only true at the same level of scrutiny as "all C operations can cause undefined behaviour, regardless of what they are", which I find similarly shallow.
Rust is plenty fast; in fact, there are countless examples of safe Rust trivially beating C in performance, because the no-aliasing guarantees enable better vectorization, among other things. It's also simply a more expressive language that allows writing better optimizations: small-string optimization instead of the absolutely laughable C strings that perform terribly, for example, and you can actually get away with sharing more stuff in memory instead of doing defensive copies everywhere, because it is safe to do so.
And there are not many things we have statistics on in CS, but memory vulnerabilities being absolutely everywhere in unsafe languages is one of them -- as is Rust cleaning up the absolute majority of them even when only the new parts are written in Rust. That is based on actual, real-life projects at Google and Microsoft, among others.
A memory safe low-level language is as novel as it gets. Rust is absolutely not just hype, it actually delivers and you might want to get on with the times.
I take that you consider most major projects written in C to not be "good"?
Most major software projects are not good, no matter what language.
This is technically correct, but a bit pedantic.
Sure, you can technically write your own vulnerability for your own program and inject it in an unsafe block and watch the whole world crumble... but the exact same is true for any form of FFI call in any language. Is Java memory safe? Yeah -- the fact that I could technically grab a random pointer and break anything I want doesn't change that.
The fact that a memory vulnerability can appear either nowhere at all, or only within the couple hundred lines of unsafe code scattered throughout the whole project, is a night-and-day difference.
No. Correctness of code outside unsafe blocks depends on correctness inside those blocks, not the other way around.
But “dereference a raw pointer”, in combination with the ability to create raw pointers pointing to arbitrary memory addresses (which you can do even in safe Rust), allows you to write to arbitrary memory from unsafe Rust.
So, in theory, unsafe rust opens the floodgates. In practice, though, you can use small fragments of unsafe code that programmers can fairly easily check to be safe.
Then, once you’ve convinced yourself that those fragments are safe, you can be assured that your whole program is safe (using ‘safe’ in the rust sense, of course)
So, there may be some small islands of unsafe code that require extra attention from the programmer, but that should be just a tiny fraction of all lines, and you should be able to verify those islands in isolation.
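Concretely, the split looks like this (a minimal sketch):

```rust
fn main() {
    let x = 42u32;
    let p = &x as *const u32;               // creating a raw pointer: safe
    let bogus = 0xdead_beef as *const u32;  // even this is safe to create
    let _ = bogus;

    // Only the dereference needs unsafe. This one is sound because `p`
    // points to a live, aligned u32; dereferencing `bogus` would be UB.
    let v = unsafe { *p };
    assert_eq!(v, 42);
}
```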
> allows you
This is where the rubber hits the road. Rust does not allow you to do this, in the sense that this is possibly undefined behavior. That "possibly" is why the compiler allows you to write this code, because by saying "unsafe", you are promising that this specific arbitrary address is legal for you to write to. But that doesn't mean that it's always legal to do so.
The compiler won't allow you to compile such code without the unsafe. The unsafe is *you* promising the compiler that *you* have checked to ensure that the address will always be legal. So that the compiler will allow you to compile the code.
Right, I'm saying "allow" has two different connotations, and only one of them, the one that you're talking about, applies.
I gotcha. I misread and misunderstood. Yes, we agree.
I believe the post you are replying to was referring to the fact that you could take actions in that unsafe block that would compromise the guarantees of rust; eg you could do something silly, leave the unsafe block, then hit an “impossible” condition later in the program.
A simple example might be modifying a const value deep down in some class, where it only becomes apparent later in the program’s execution. Hence their analogy of the wet dog in a clean room - whatever beliefs you have about the structure of memory in your entire program, and guaranteed by the compiler, could have been undone by a rogue unsafe.
Would someone with more experience be able to explain to me why can't these operations be "safe"? What is blocking rust from producing the same machine code in a "safe" way?
Rust's raw pointers are more-or-less equivalent to C pointers, with many of the same types of potential problems like dangling pointers or out-of-bounds access. Rust's references are the "safe" version of doing pointer operations; raw pointers exist so that you can express patterns that the borrow checker can't prove are sound.
Rust encourages using unsafe to "teach" the language new design patterns and data structures; and uses this heavily in its standard library. For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.
Think of unsafe not as "this code is unsafe", but as "I've proven this code to be safe, and the borrow checker can rely on it to prove the safety of the rest of my program."
Why does Vec need to have any unsafe code? If you respond "speed"... then I will scratch my chin.
I'm sure you already know this, but you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.
Rust doesn't have compiler-magic support for anything like a vector. The language has syntax for fixed-sized arrays on the stack, and it supports references to variable-length slices; but it has no magic for constructing variable-length slices (e.g. C++'s `new[]` operator). In fact, the compiler doesn't really "know" about the heap at all.
Instead, all that functionality is written as Rust code in the standard library, such as Vec. This is what I mean by using unsafe code to "teach" the borrow checker: the language itself doesn't have any notion of growable arrays, so you use unsafe to define its semantics and interface, and now the borrow checker understands growable arrays. The alternative would be to make growable arrays some kind of compiler magic, but that's both harder to implement correctly and not generalizable.
> you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.
That's true and that's a great design pattern in C as well. But there are some crucial differences:
- Rust has no undefined behavior outside of unsafe blocks. This means you only need to audit unsafe blocks (and any invariants they assume) to be sure your program is UB-free. C does not have this property even if you code defensively at interface boundaries.
- In Rust, most of the invariants can be checked at compile time; the need for runtime asserts is less than in C.
- C provides no way to defend against dangling pointers without additional tooling & runtime overhead. For instance, if I write a dynamic vector and get a pointer to the element, there's no way to prevent me from using that pointer after I've freed the vector, or appended an element causing the container to get reallocated elsewhere.
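A minimal sketch of that last difference: the use-after-realloc pattern simply doesn't compile in safe Rust:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // borrow into the vector's buffer
    println!("{first}");
    v.push(4); // fine here: `first` is never used again

    // Reordering so the push happens while `first` is still live fails
    // to compile ("cannot borrow `v` as mutable because it is also
    // borrowed as immutable"), so the potential reallocation can never
    // leave a dangling reference.
}
```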
Rust isn't some kind of silver bullet where you feed it C-like code and out comes memory safety. It's also not some kind of high-overhead garbage collected language where you have to write unsafe whenever you care about performance. Rather, Rust's philosophy is to allow you to define fundamental operations out of small encapsulated unsafe building blocks, and its magic is in being able to prove that the composition of these operations is safe, given the soundness of the individual components.
The stdlib provides enough of these building blocks for almost everything you need to do. Unsafe code in library/systems code is rare and used to teach the language of new patterns or data structures that can't be expressed solely in terms of the types exposed by the stdlib. Unsafe in application-level code is virtually never necessary.
Those specific functions are compiler builtin vector intrinsics. The main reason is that they can easily read past ends of arrays and have type safety and aliasing issues.
By the way, the Rust compiler does generate such code, because under the hood LLVM runs an autovectorizer when you turn on optimizations. However, for the autovectorizer to do a good job you have to write code in a very specific way, and you have no way of controlling whether it kicked in, or whether it did a good job once it did.
There’s work on creating safe abstractions (that also transparently scale to the appropriate vector instruction), but progress on that has felt slow to me personally and it’s not available outside nightly currently.
For example, many autovectorizers get upset if you put control flow in your loop.
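A hypothetical illustration of that:

```rust
// Likely to autovectorize: a straight-line reduction over a slice.
fn sum(xs: &[i32]) -> i32 {
    let mut acc: i32 = 0;
    for &x in xs {
        acc = acc.wrapping_add(x);
    }
    acc
}

// The data-dependent early exit commonly defeats the autovectorizer,
// even though the loop body is otherwise identical.
fn sum_until_negative(xs: &[i32]) -> i32 {
    let mut acc: i32 = 0;
    for &x in xs {
        if x < 0 {
            break;
        }
        acc = acc.wrapping_add(x);
    }
    acc
}

fn main() {
    assert_eq!(sum(&[1, 2, 3]), 6);
    assert_eq!(sum_until_negative(&[1, 2, -1, 3]), 3);
}
```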
Often the unsafe code is at the edges of the type system. E.g., sometimes the proof of safety is that someone read the source code of the C library that you are calling out to. It's not useful to think of machine code as safe or unsafe. Safety often refers to whether the types of your data match the lifetime dataflow.
The way I have heard it described that I think is a bit more succinct is "unsafe admits undefined behavior as though it was safe."
Claiming unsafe invalidates "all of the nice memory-safety properties" is like saying having windows in your house does away with all the structural integrity of your walls.
There's even unsafe usage in the standard library and it's used a lot in embedded libraries.
Where are you more likely to have a burglar enter your home? Windows... Where are you more likely to develop cracks in your walls? Windows... Where are you more likely to develop leaks? Windows (especially roof windows!)...
Sorry but horrible comparison ;)
If you need to rely on unsafe in a memory-safe language for performance reasons, then there is an issue with the language's compiler at that point that needs to be fixed. Simple as that.
The whole memory-safety thing is the bread and butter of the language; the moment you start to bypass it for faster memory operations, you could be doing the same in any other language. I mean, you're literally bypassing the main selling point of the language. ¯\_(ツ)_/¯
So static typing is stupid because at the end of the line your program must interface with a stream of untyped bits (I/O)?
Once you can internalize that you could unlock the power of encapsulation.
> If you need to rely on unsafe in a memory-safe language for performance reasons, then there is an issue with the language's compiler at that point that needs to be fixed. Simple as that.
It actually means "Rust needs to interface with many other systems that are not as stringent as it". Your interpretation has nothing to do with what's actually going on and I am surprised you misinterpreted the situation as hugely as you did.
...And even if everything was written in Rust, `unsafe` would still be needed, because the lower you get [to the kernel], the more non-determinism you run into.
This "all or nothing" attitude is boring and tiring. We all wish things were super simple, black and white, and all-or-nothing. They are not.
What language is the JVM written in?
All safe code in existence running on von Neumann architectures is built on a foundation of unsafe code. The goal of all memory-safe languages is to provide safe abstractions on top of an unsafe core.
Depends on which JVM you are talking about, some are 100% Java, some are a mix of Java and C, others are a mix of Java and C++, in all cases a bit of Assembly as well.
I like your second paragraph. It is well written.
> Depends on which JVM you are talking about, some are 100% Java, some are a mix of Java and C, others are a mix of Java and C++, in all cases a bit of Assembly as well.
You are right. I should have been more clear. I am talking about the bog standard one that most people use from Oracle/OpenJDK. A long time back it was called "HotSpot JVM". That one has source code available on GitHub. It is mostly C++ with a little bit of C and assembly.
Define mostly, https://github.com/openjdk/jdk
- Java 74.1%
- C++ 14.0%
- C 7.9%
- Assembly 2.7%
And those values have been increasing for Java with each OpenJDK release.
JDK≠JVM
If you are only talking about libjvm.so you would be right, then again that alone won't do much help for Java developers.
I don't think what something was written in should count. Barring bugs, it should still be memory safe. But I believe the JVM has FFI, and as soon as you use FFI you risk messing up that memory safety.
Does it help to think of "safe Rust" as a language that's written in "unsafe Rust"? That's basically what it is.
If your unsafe code violates invariants it was supposed to uphold, that can wreck safety properties the compiler was trying to uphold elsewhere. If you can achieve something without unsafe you definitely should (safe, portable simd is available in rust nightly, but it isn't stable yet).
At the same time, unsafe doesn't just turn off all compiler checks; it gives you tools to go around them, as well as tools that happen to go around them because of the way they work. Rust's unsafe is this weird mix of being safer than pure C but harder to grasp, with lots of nuanced invariants you have to uphold. If you want to ensure your code still has all the nice properties the compiler guarantees (which go way beyond memory safety), you have to carefully examine every unsafe block. Few people do, but you generally still end up with a better status quo than C/C++, where any code can in principle break properties other code was trying to uphold.
If you have 1 unsafe block, and you have a memory related crash/issue, where in your Rust code do you think the root cause is located?
This isn't a wet dog in a cleanroom. This is cleanroom complex that has a very small outhouse that is labeled as dangerous.
Jason Ordendorff's talk [1] was probably the first time I truly grokked the concept of unsafe in Rust. The core idea behind unsafe in Rust is not to provide an escape from the guarantees provided by rust. It's to isolate the places where you have no choice but to break the guarantees and rigorously code/test the boundaries there so that anything wrapping the unsafe code can still provide the guarantees.
[1]: https://www.youtube.com/watch?v=rTo2u13lVcQ
Rust isn't the only memory-safe language.
As soon as you start playing with FFI and raw pointers in Python, NodeJS, Julia, R, C#, etc. you can easily lose the nice memory-safety properties of those languages -- create undefined behavior, segfaults, etc. I'd say Rust is a lot nicer for checking unsafe correctness than other memory-safe languages, and also makes it easier to dip down to systems-level programming, yet it seems to get a lot of hate for these features.
Ada is even better at checking for correctness. It needs to be talked about more. Ada was "safer than C" long before Rust; people just did not know this before they jumped on the Rust bandwagon.
You lose those guarantees if and only if the code within the unsafe block violates the rules of the Rust language.
Normally in safe code you can’t violate the language rules because the compiler enforces various rules. In unsafe mode, you can do several things the compiler would normally prevent you from doing (e.g. dereferencing a naked pointer). If you uphold all the preconditions of the language, safety is preserved.
What’s unfortunate is that the rules you are required to uphold can be more complex than you might anticipate if you’re trying to use unsafe to write C-like code. What’s fortunate is that you rarely need to do this in normal code and in SIMD which is what the snippet is representing there’s not much danger of violating the rules.
You lose the nice guarantees inside the `unsafe` block, but the point is to write a sound and safe interface over it, that is an API that cannot lead to UB no matter how other safe code calls it. This is basically the encapsulation concept, but for safety.
To continue the analogy of the dog, you let the dog get wet (=you use unsafe), but you put a cleaning room (=the sound and safe API) before your sealed room (=the safe code world)
> Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees
Inside that block, both yes and no. You have to enforce those nice guarantees yourself. Code that violates it will still crash.
It's more like letting a wet dog who you are watching closely quickly pass from your front door to the shower.
> Isn't it the case that once you use unsafe even a single time, you lose all of Rust's nice guarantees?
No, not even close. You only lose Rust's safety guarantees when your unsafe code causes Undefined Behavior. Unsafe code that can be made to cause UB from Safe Rust is typically called unsound, and unsafe code that cannot be made to cause UB from Safe Rust is called sound. As long as your unsafe code is sound, then it does not break any of Rust's guarantees.
For example, unsafe code can still use slices or references provided by Safe Rust, because those are always guaranteed to be valid, even in an unsafe block. However, if from inside that unsafe block you then go on to manufacture an invalid slice or reference using unsafe functions, that is UB and you lose Rust's safety guarantees because of the UB.
> unsafe even a single time, you lose all of Rust's nice guarantees
Not sure why one use would result in losing all of them. One of Rust's advantages is the clear boundary between safe and unsafe code.
Is there such a boundary? How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?
The usual retort to these questions is 'well, the standard library uses unsafe code, so everything would need a disclaimer that it uses unsafe code, so that's a useless remark to make', but the basic issue still remains that the only clear boundary is whether a function 'contains' unsafe code, not whether a function 'calls' unsafe code.
If Rust did not have a mechanism to use external code then it would be fine because the only sources of unsafe code would be either the application itself or the standard library so you could just grep for 'unsafe' to find the boundaries.
> Is there such a boundary? How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?
Yes, there is a boundary, and usually it's either the function itself, or all methods of an object. For instance, a function I wrote recently goes somewhat like this:
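```rust
// (Reconstructed sketch; the original snippet was elided from the
// comment. The shape follows the description below.)
fn read_unaligned_u64_from_byte_slice(src: &[u8]) -> u64 {
    assert_eq!(src.len(), 8);
    // SAFETY: the assert above guarantees `src` holds exactly 8 bytes,
    // so the pointer is valid for an 8-byte read, and read_unaligned
    // places no alignment requirement on it.
    unsafe { std::ptr::read_unaligned(src.as_ptr() as *const u64) }
}
```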
The read_unaligned function (https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html) has two preconditions which have to be checked manually. When doing so, you'll notice that the "src" argument must have at least 8 bytes for these preconditions to be met; the "assert_eq!()" call before that unsafe block ensures that (it will safely panic unless the "src" slice has exactly 8 bytes). That is, my "read_unaligned_u64_from_byte_slice" function is safe, even though it calls unsafe code; the function is the boundary between safe and unsafe code. No callers of that function have to worry that it calls unsafe code in its implementation.
> How do you know a function doesn't call unsafe code without looking at every function called in it, and every function those functions call, and so on?
The point is that you don't need to. The guarantees compose.
> The usual retort to these questions is 'well, the standard library uses unsafe code
It's not about the standard library, it's much more fundamental than that: hardware is not memory safe to access.
> If Rust did not have a mechanism to use external code then it would be fine
This is what GC'd languages with runtimes do. And even they almost always include FFI, which lets you call into arbitrary code via the C ABI, allowing for unsafe things. Rust is a language intended to be used at the bottom of the stack, and so has more first-class support, calling it "unsafe" instead of FFI.
I wouldn’t go that far. Bevy for example, uses unsafe internally but is VERY strict about it, and every use of unsafe requires a comment explaining why the code is safe.
In other words, unsafe works if you use it carefully and keep it contained.
Right, the point is raising awareness, and the assumption that it's not a 100-or-0 problem.
My understanding is that the user who writes an unsafe block in a safe function is responsible for making sure that it doesn't do anything to mess up safety, and that the function isn't lying about exposing a safe interface. I think at one point before Rust 1.0 there was even a suggestion to rename the keyword to `trustme`. Of course users can easily mess up, but the point is to minimize the use of unsafe so it's easier to check, and to create interfaces that can be used safely.
Where did you even get that weird extreme take from?
O_o
Can’t rust do safe simd? This is just vectorised multiplication and xor, but it gets labelled as unsafe. I imagine most code that wants to be fast would use simd to some extent.
It's still nightly-only.
Is this a sloppy codebase? I browsed through a few random files, and easily 90% of functions are marked unsafe.
The idea is that you can trivially search the code base for "unsafe" and closely examine all unsafe code, and unless you are doing really low-level stuff there should not be much of it. Higher level code bases should ideally have none.
It tends to be found in drivers, kernels, vector code, and low-level implementations of data structures and allocators and similar things. Not typical application code.
As a general rule it should be avoided unless there's a good reason to do it. But it's there for a reason. It's almost impossible to create a systems language that imposes any kind of rules (like ownership etc.) that covers all possible cases and all possible optimization patterns on all hardware.
It's even possible to write bare-metal microcontroller firmware in Rust without unsafe, as the embedded HAL ecosystem wraps unsafe hardware interfaces in a modular, fairly universal safe API.
My understanding from Aria Beingessner's and some other writings is that unsafe{} rust is significantly harder to get right in "non-trivial cases" than C, because the semantics are more complex and less specified.
This is definitely true right now, but I don't think it will always be the case.
Unsafe Rust is currently extremely underspecified and underdocumented, but it's designed to be far more specifiable than C. For example: aliasing rules. When and how you're allowed to alias references in unsafe code is not at all documented and under much active discussion; whereas in C pointer aliasing rules are well defined but also completely insane (casting pointers to a different type in order to reinterpret the bytes of an object is often UB even in completely innocuous cases).
Once Rust's memory model is fully specified and written down, unsafe Rust is trying to go for something much simpler, more teachable, and with less footguns than C.
Huge props to Ralf Jung and the opsem team who are working on answering these questions & creating a formal specification: https://github.com/rust-lang/unsafe-code-guidelines/issues
It's hard to compare. Rust has stricter requirements than C, but looser requirements don't mean easier: ever bit-shifted by a variable amount? Hope you never relied on shifting "entirely" out of a variable to zero it.
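Rust, by contrast, defines the behavior and makes you pick the semantics explicitly (a small sketch; `checked_shl` and `wrapping_shl` are std methods):

```rust
fn main() {
    let x: u32 = 1;
    let n: u32 = 32; // shifting a 32-bit value by 32 is UB in C

    // The plain `<<` operator panics on an out-of-range amount in debug
    // builds; the named methods let you choose the semantics you want.
    assert_eq!(x.checked_shl(n), None); // out-of-range shift -> None
    assert_eq!(x.wrapping_shl(n), 1);   // masks the amount (n % 32 == 0)
}
```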
It is also an idea that traces back to 1960s systems languages, one that apparently was unknown at Bell Labs.
Clearly marking unsafe code is no good for safety, if you have many marked areas.
Some codebases, you can grep for "unsafe", find no results, and conclude the codebase is safe... if you trust its dependencies.
This is not one of those codebases. This one uses unsafe liberally, which tells you it's about as safe as C.
"unsafe behaviour is clearly marked" seems to be a thought-stopping cliche in the Rust world. What's the point of marking them, if you still have them? If every pointer dereference in C code had to be marked unsafe (or "please" like in Intercal), that wouldn't make C any better.
While everything you say is true, your reply (and most of its siblings!) entirely misses GP's point.
All languages at some point interface with syscalls or low level assembly that can be done wrong, but one of Rust's selling points is a safe wrapping of low-level interactions. Like safe heap allocation/deallocation with `Box`, or swapping with `swap`, etc. Except... here.
Why does a library like zlib need to go beyond Rust's safe offerings? Why doesn't rust provide safe versions of the constructs zlib needs?
> Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?
rustc uses LLVM just as clang does, so to a first approximation they're the same. For any given LLVM IR you can mostly write equivalent Rust and C++ that causes the respective compiler to emit it (the switch fallthrough thing mentioned in the article is interesting though!) So if you're talking about what's possible (as opposed to what's idiomatic), the question of "which language is faster" isn't very interesting.
Rust's borrow checker still checks within unsafe blocks, so unless you are operating exclusively on raw pointers across the whole program (rather than accessing certain references as raw pointers only in small, well-defined blocks), it will be significantly safer than C. Especially given all the other language benefits, like a proper type system that can encode a bunch of invariants, and no footguns at every line, initialization, or cast.
Yes. I think it’s easy to underestimate how much the richer language and library ecosystem chip away at the attack surface area. So many past vulnerabilities have been in code which isn’t dealing with low-level interfaces or weird performance optimizations and wouldn’t need to use unsafe. There’ve been so many vulnerabilities in crypto code which weren’t the encryption or hashing algorithms but things like x509/ASN parsing, logging, or the kind of option/error handling logic a Rust programmer would use the type system to validate.
Yeah, this article about a rust "win" perfectly illustrates why I distrust all good news about it.
Rust zlib is faster than zlib-ng, but the latter isn't a particularly fast C contender. Chrome ships a faster C zlib library which Rust could not beat.
Rust beat C by using pre-optimized code paths and then C function pointers inside unsafe. Plus C SIMD inside unsafe.
I'd summarize the article as: generous chunks of C embedded into unsafe blocks help Rust to be almost as fast as Chrome's C Zlib.
Yay! Rust sure showed it's superiority here!!!!1!1111
Did you even read the article? They compare specifically against the Chrome zlib library and beat it at 10 out of 13 chunk sizes considered.
> I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library.
Which is exactly the point, other languages have unsafe implicitly sprinkled in every single line.
Rust tries to bound and explicitly delimit where unsafe code is, to make review and verification efforts precise.
Others have already addressed the "unsafe" smell.
I think the bigger point here is that doing SIMD in Rust is still painful.
There are efforts like portable-simd [1] to make this better, but in practice, many people are dropping down to low-level SIMD intrinsics and/or inline assembly, which are no better than their C equivalents.
[1]: https://github.com/rust-lang/portable-simd
The purpose of `unsafe` is for the compiler to assume a block of code is correct. SIMD intrinsics are marked as unsafe because they take raw pointers as arguments.
In safe Rust (the default), memory access is validated by the borrow checker and type system. Rust’s goal of soundness means safe Rust should never cause out-of-bounds access, use-after-free, etc; if it does, then there's a bug in the Rust compiler.
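For example, a 16-byte XOR written directly against the SSE2 intrinsics (a minimal sketch, not code from zlib-rs): the operation is easy to check by hand, but the raw-pointer loads and stores still force an `unsafe` block:

```rust
#[cfg(target_arch = "x86_64")]
fn xor16(dst: &mut [u8; 16], src: &[u8; 16]) {
    use std::arch::x86_64::*;
    // SAFETY: both arrays are exactly 16 bytes, the unaligned
    // load/store intrinsics have no alignment requirement, and SSE2
    // is part of the x86_64 baseline.
    unsafe {
        let a = _mm_loadu_si128(dst.as_ptr() as *const __m128i);
        let b = _mm_loadu_si128(src.as_ptr() as *const __m128i);
        _mm_storeu_si128(dst.as_mut_ptr() as *mut __m128i, _mm_xor_si128(a, b));
    }
}

#[cfg(target_arch = "x86_64")]
fn main() {
    let mut a = [0xff_u8; 16];
    let b = [0x0f_u8; 16];
    xor16(&mut a, &b);
    assert!(a.iter().all(|&byte| byte == 0xf0));
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```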
How do we know if Rust is safe unless Rust is written purely in safe Rust?
Is that not true? Even validators have bugs or miss things no?
And while we're in the hypothetical extreme world somewhat separated from reality, a series of solar flares could flip a memory bit and all the error-correction bits in my ECC ram at once to change a pointer in memory, causing my safe rust to do an out of bounds write.
Until we design perfectly correct computer hardware, processors, and a sun which doesn't produce solar radiation, we can't rely on totally uniform correct execution of our code, so we should give up.
The reality is that while we can't prove the rust compiler is safe, we can keep using it and diligently fix any counter-examples, and that's good enough in practice. Over in the real world, where we can acknowledge "yes, it is impossible to prove the absence of all bugs" and simultaneously say "but things sure seem to be working great, so we can get on with life and fix em if/when they pop up".
I’m simply positing how do we know the safety guarantees hold, not a hypothetical extreme. Not really sure where the extreme comes in.
If you take Rust at face value, then this seems to me like an obvious question to ask.
Sorry, it's just that I have an allergic reaction to what sounds like people trying to make debate-bro arguments.
Like, when I say "use signal, it's secure", someone could respond "Ahh, but technically you can't prove the absence of bugs, signal could have serious bugs, so it's not secure, you fool", but like everyone reading this already knew "it's secure" means "based on current evidence and my opinion it seems likely to be more secure than alternatives", and it got shortened. Interpreting things as absolutes that are true or false is pointless debate-bro junk which lets you create strawmen out of normal human speech.
When someone says "1+1 = 2", and a debate-bro responds "ahh but in base-2 it's 10 you fool", it's just useless internet noise. Sure, it's correct, but it's irrelevant, everyone already knows it, the original comment didn't mean otherwise.
Responding to "safe Rust should never cause out-of-bounds access, use-after-free" with "ahh but we can't prove the compiler is safe, so rust isn't safe is it??" is a similarly sorta response. Everyone already knows it. It's self-evident. It adds nothing. It sounds like debate-bro "I want to argue with you so I'm saying something that's true, but we both already know and doesn't actually matter".
I think that allergic response came out, apologies if it was misguided in this case and you're not being a debate-bro.
I don't think we can go beyond the 'human limitations' if you will, of any software.
Bugs happen; they're bound to. It's more: what is enforcing the Rust language guarantees, and how do we know it's enforcing them with reasonably high accuracy that one can ascertain?
I feel that it can only happen as Rust itself becomes (or perhaps it meaningfully already is) written in pure, 100% safe Rust. At which point, I believe the matter will be largely settled.
Until then, I don't think it's unreasonable for someone to ask how it verifies its assertions, is all.
There is no possible way for something to be written in 100% memory safe code, no matter what the language, if you include "no unsafe code anywhere in the call stack." Interacting with the hardware is not memory safe. Any useful program must on some level involve unsafety. This is true for every programming language.
I wasn't asking for 100%, I am asking for a reasonable proof of assertions.
You may like my next blog post.
I think whenever someone takes the time to walk their audience through the nuances of this question its a big win.
No different than how I asked the Go community how it could produce binaries on any platform for all major platforms it supports (i.e., you don't have to compile your Go code on Linux for it to work on Linux; you only have to set a flag -- with the exception, if I recall correctly, of CGO dependencies, but that's a wild horse anyway).
But you have to admit that the Rust zealots are misguided too -- the ones who don't know or realize the obviousness of what you just said with regard to Rust.
Such a rust zealot is a strawman, though please don't let me stop you from enjoying burning such a strawman.
How is it a strawman? Many people have misconceptions regarding Rust, while not even knowing about the existence of Ada/SPARK to begin with. They blindly spout "Rust is saFeEe!44!". If you are not a zealot, then it does not apply to you.
I see about 1000x more anti-rust-zealot strawman arguments than rust zealots on this site. Can you give some examples of the misguided rust zealotry you’re talking about?
I deleted my initial response, but FWIW you do not have to go far, take a look at the title of this submission.
> Even validators have bugs
Yep! For example, https://github.com/Speykious/cve-rs is an example of a bug in the Rust compiler, which allows something that it shouldn't. It's on its way to being fixed.
> or miss things no?
This is the trickier part! Yes, even proofs have axioms, that is, things that are accepted without proof, that the rest of the proof is built on top of. If an axiom is incorrect, so is the proof, even though we've proven it.
Out of curiosity, why do they take raw pointers as arguments, rather than references?
From the RFC: https://rust-lang.github.io/rfcs/2325-stable-simd.html
> The standard library will not deviate in naming or type signature of any intrinsic defined by an architecture.
I think this makes sense, just like any other intrinsic: unsafe to use directly, but with safe wrappers.
I believe that there are also some SIMD things that would have to inherently take raw pointers, as they work on pointers that aren't aligned, and/or otherwise not valid for references. In theory you could make only those take raw pointers, but I think the blanket policy of "follow upstream" is more important.
Which also doesn't preclude someone else writing an abstraction on top that provides an API using references.
Absolutely, that's important too, thanks.
To be fair, there's a safe portable SIMD abstraction brewing in `std::simd` but it's not stable yet. SIMD is just a terrible mess of platform differences in general and making a SIMD-using program safe means ensuring the availability of every single intrinsic used, lest the program is unsound. Of course that's not what C or C++ programs typically do, but in that world unsoundness is the norm anyway.
I thought that the point of Rust is to have safe {} blocks (implicit) as a default and unsafe {} when you need the absolute maximum performance available. You can audit those few lines of unsafe code very easily. With C everything is unsafe and you can just forget to call free() or call it twice and you are done.
> unsafe {} when you need the absolute maximum performance available.
Unsafe code is not inherently faster than safe code, though sometimes, it is. Unsafe is for when you want to do something that is legal, but the compiler cannot understand that it is legal.
True, however I only saw this happens to achieve max perf. I have very limited experience so this is confirmation bias from my end.
An example of unsafe not for performance is when interacting with hardware directly.
It’s not about performance, it’s about undefined behavior.
The usual answer is: You only need to verify the unsafe blocks, not every block. Though 'unsafe' in Rust is actually even less safe than regular C, if a bit more predictable, so there's a crossover point where you really shouldn't have bothered.
The Rust compiler is indeed better than the C one, largely because of having more information and doing full-program optimisation. A `vec_foo = vec_foo.into_iter().map(...).collect::<Vec<Foo>>()`, for example, isn't going to do any bounds checks or allocate.
I have been told that "unsafe" affects code outside of that block, but hopefully steveklabnik may explain it better (again).
> isn't going to do any bounds checks or allocate.
You need to add explicit bounds check or explicitly allocate in C though. It is not there if you do not add it yourself.
Buggy unsafe blocks can affect code anywhere (through Undefined Behavior, or breaking the API contract).
However, if you verify that the unsafe blocks are correct, and the safe API wrapping them rejects invalid inputs, then they won't be able to cause unsafety anywhere.
This does reduce how much code you need to review for memory safety issues. Once it's encapsulated in a safe API, the compiler ensures it can't be broken.
This encapsulation also prevents combinatorial explosion of complexity when multiple (unsafe) libraries interact.
I can take zlib-rs, and some multi-threaded job executor (also unsafe internally), but I don't need to specifically check how these two interact. zlib-rs needs to ensure they use slices and lifetimes correctly, the threading library needs to ensure it uses correct lifetimes and type bounds, and then the compiler will check all interactions between these two libraries for me. That's like (M+N) complexity to deal with instead of (M*N).
> I have been told that "unsafe" affects code outside of that block, but hopefully steveklabnik may explain it better (again).
It's due to a couple of different things interacting with each other: unsafe relies on invariants that safe code must also uphold, and that the privacy boundary in Rust is the module.
Before we get into the unsafe stuff, I want you to consider an example. Is this Rust code okay?
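```rust
// (Reconstruction of the elided snippet: a plain struct with a setter.)
struct Foo {
    bar: usize,
}

impl Foo {
    fn set_bar(&mut self, bar: usize) {
        self.bar = bar;
    }
}
```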
No unsafe shenanigans here. This code is perfectly safe, if a bit useless.
Let's talk about unsafe. The canonical example of unsafe code being affected outside of unsafe itself is the implementation of Vec<T>. Vecs look something like this (the real code is different for reasons that don't really matter in this context):
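```rust
// (Reconstruction of the elided snippet.)
struct Vec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
}
```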
The pointer is to a bunch of Ts in a row, the length is the current number of Ts that are valid, and the capacity is the total number of Ts. The length and the capacity are different so that memory allocation is amortized; the capacity is always greater than or equal to the length.
That property is very important! If the length is greater than the capacity, when we try and index into the Vec, we'd be accessing random memory.
So now, this function, which is the same as Foo::set_bar, is no longer okay:
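```rust
// (Reconstruction of the elided snippet.)
impl<T> Vec<T> {
    fn set_len(&mut self, len: usize) {
        self.len = len;
    }
}
```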
This is because the unsafe code inside of other methods of Vec<T> needs to be able to rely on the fact that len <= capacity. And so you'll find that Vec<T>::set_len in Rust is marked as unsafe, even though it doesn't contain unsafe code: it still requires judicious use to not introduce memory unsafety.
And this is why the module being the privacy boundary matters: the only way to set len directly in safe Rust code is code within the same privacy boundary as the Vec<T> itself. And so, that's the same module, or its children.
> You need to add explicit bounds check or explicitly allocate in C though. It is not there if you do not add it yourself.
Yes — in C you can skip the bounds-checks and allocation, because you can convince yourself they aren't needed; the problem is you may be wrong, either immediately or after later refactoring.
In other memory-safe languages you don't risk the buffer overrun, but it's likely you'll get the bounds checks and allocation, and you have the overhead of GC.
Rust is close to alone in doing both.
> I have been told that "unsafe" affects code outside of that block, but hopefully steveklabnik may explain it better (again).
Poorly-written unsafe code can have effects extending out into safe code. But correctly-written unsafe code does not have any effects on safe code w.r.t. memory safety. So to ensure memory safety, you just have to verify the correctness of the unsafe code (and any helper functions, etc., it depends on), rather than the entire codebase.
Also, some forms of unsafe code are far less dangerous than others in practice. E.g., most of the SIMD functions are practically safe to call in every situation, but they all have 'unsafe' slapped on them due to being intrinsics.
> You need to add explicit bounds check or explicitly allocate in C though. It is not there if you do not add it yourself.
Unfortunately, you do need to allocate a new buffer in C if you change the type of the elements. The annoying side of strict aliasing is that every buffer has a single type that's set in stone for all time. (Unless you preemptively use unions for everything.)
C has type-changing stores. If you store to a buffer with a new type, it has the new type. Clang does not implement this correctly though, but GCC does.
Won't the final result allocate?
It won't allocate in this case because it's still a vec of foo at the end, so we know it has enough space. If it were a different type, it may or may not allocate, depending on if it had enough capacity.
> I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library.
This is such a widespread misunderstanding… one of the points of rust (there are many other advantages that have nothing to do with safety, but let’s ignore those for now) is that you can build safe interfaces, possibly on top of unsafe code. It’s not that all code is magically safe all the time.
To quote the Rust book (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html):
> ...unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems: the intent is that as the programmer, you'll ensure the code inside an unsafe block will access memory in a valid way.
Since you say you already know that much Rust, you can be that programmer!
I feel like C programmers had the same idea, and well, we see how that works out in practice.
No, C lacks encapsulation of unsafe code. This is very important. Encapsulation is the only way to scale local reasoning into global correctness.
Hard disagree - if you violate the invariants in Rust unsafe code, you can cause global problems with local code. You can cause use-after-free, and other borrow checker violations, with incorrect unsafe code. Nothing will flag it, you will have no idea which unsafe code block is causing the issue, and debugging will be hard.
I have no idea what your definition of encapsulation is, but mine is not this.
It's really only encapsulated in the sense that if you have a finite and small set of unsafe blocks, you can audit them more easily and be pretty sure that your memory safety bugs are in there. This reality doesn't really exist much anymore because of how much unsafe is used, and since you have to audit all of it, whether it comes from a library or not, it's not as useful to claim encapsulation as one thinks.
I do agree in theory that unsafe encapsulation was supposed to be a thing, but I think it's crazy at this point not to admit that unsafe blocks turned out to have much more global effects than people expected, in many more cases, and are used more readily than expected.
Saying "scaling reasoning" also implies someone reasoned about it, or can reason about it.
But the practical problem is the same in both cases - someone got the reasoning wrong and nothing flagged it.
Wanna go search github for how many super popular libraries using unsafe had global correctness issues due to local unsafe blocks that a human reasoned incorrectly about, but something like miri found? Most of that unsafety that turned out to be buggy also was done for (unnecessary) performance reasons.
What you are saying is just something people tell themselves to make them feel okay about using unsafe all over the place.
If you want global correctness, something has to verify it, ideally not-human.
In the end, the thing C lacks is tools like miri that can be used practically with low false-positives, not "encapsulation" of unsafe code, which is trivially easy to perform in C.
Let's not kid ourselves here and end up building an ecosystem that is just as bad as the C one, but our egos refuse to allow us to admit it. We should instead admit our problems and try to improve.
Unsafe also has legitimate use cases in rust, for sure - but most unsafe code I look at does not need to exist, and is not better than unsafe C.
I'll give you an example: There are entire popular embedded bluetooth stacks in rust using unsafe global mutable variables and raw pointers and ..., across threads, for everything.
This is not better than the C equivalent - in fact it's worse, because users think it is safe and it's very not.
At least nobody thinks the C version is safe. It will often therefore be shoved in a binary that is highly sandboxed/restricted/etc.
It would be one thing if this was in the process of being ported/translated from C. But it's not.
Using intrinsics that require alignment while the API was still being worked on - probably a reasonable use of unsafe (though it's still easy to cause global problems like buffer overflows if you screw up the alignment).
The bluetooth example - unreasonable.
The encapsulation referred to here is that you can expose a safe API that is impossible to misuse in a way that leads to undefined behavior. That's the succinct way of putting it anyway.
The `memchr` crate, for example, has an entirely safe API. Nobody needs to use `unsafe` to use any part of it. But its internals have `unsafe` littered everywhere. Could the crate have bugs that result in UB due to a particular use of the `memchr` API? Yes! Doesn't that violate encapsulation? No! A bug inside an encapsulated boundary does not violate the very idea of encapsulation itself.
Encapsulation is about blame. It means that if `memchr` exposes a safe API, and if you use `memchr` and you get UB as a result of some `unsafe` code inside of `memchr`, then that means the problem is inside of `memchr`. The problem is definitively not with the caller using the library. That is, they aren't "holding it wrong."
I'm surprised that someone with as much experience as you is missing this nuance. How many times have you run into a C library API that has UB, you report the bug and the maintainer says, "sorry bro, but you're holding that shit wrong, your fault." In Rust, the only way that ought (very specifically using ought and not is) to be true is if the API is tagged with `unsafe`.
Now, there are all sorts of caveats that don't change the overall point. "totally safe transmute" being an obvious demonstration of one of them[1] by fiddling with `/proc/self/mem`. And of course, Rust does have soundness bugs. But neither of these things change the fundamental idea of encapsulation.
And yes, one obvious shortcoming of this approach is that... well... people don't have to follow it! People can lie! I can expose a safe API, you can get UB and I can reject blame and say, "well you're holding it wrong." And thus, we're mostly back into how languages like C deal with these sorts of things. And that is indeed a bummer. And there are for sure examples of that in the ecosystem. But the glaring thing you've left out of your analysis is all of the crates that don't lie and specifically set out to provide a sound API.
The great thing about progress is that we don't have to perfect. I'm really disappointed that you seem to be missing the forest for the trees here.
[1]: https://github.com/ben0x539/totally-safe-transmute/blob/main...
"The encapsulation referred to here is that you can expose a safe API that is impossible to misuse in a way that leads to undefined behavior. That's the succinct way of putting it anyway."
Well, no, actually. At least, not in an (IMHO) useful way.
I can break your safe API by getting the constraints wrong on unsafe code inside that API.
Also, unsafe usage elsewhere is not local. I can break your impossible-to-misuse API through an unsafe API that someone else used elsewhere, completely outside my control, and then wrapped in a safe API. Some of these are, of course, bugs in rust/the compiler, etc. I'm just offering that I've yet to hear the view taken that the ability to do this is always a bug in the language/compiler, and will be destroyed on sight.
Beyond that:
To the degree this is useful encapsulation for tracking things down, it is only useful when the amount is small and you can reason about it.
This is simply no longer true in any reasonably sized rust app.
As a result, as you say, it is then only useful for saying who is at fault in the sense of whether i'm holding it wrong. To me, that is basically worthless at scale.
"I'm surprised that someone with as much experience as you is missing this nuance."
I don't miss it - I just don't think it's as useful as claimed.
This level of "encapsulation", which provides no real guarantee except "the set of bugs is caused somewhere by the set of unsafe blocks" is fairly unhelpful at large scale.
I have audited hundreds of thousands of lines of rust code to find bugs caused by unsafe usage. The thing that made it at all tractable was not this form of encapsulation - it was in fact 100% worthless in doing that at scale, because it was still tons and tons and tons of code to try to reason about, across lots of libraries and dependencies. As you say, it only helps provide blame once found, and blame is not that useful at scale. It does not make the code safer. It does not make it easier to track down. It only declares, after I've spent all that time, that it is not my fault. But also nobody has to do anything about it anyway.
For small programs, this buys you something, as I said: as long as the set of unsafe blocks is small enough to be tractable to audit, cool, you can find bugs more easily. In that sense, the tons of hobby programs, small libraries, etc. are a lot less likely to have bugs when written in rust (modulo their dependencies on unsafe code).
But your position seems to be that it is fairly useful that I can go to a library and tell them "your crap is broken", and be right about it. To me, this does not buy a lot in the kinds of large, complex systems rust hopes to take over from C/C++. (It also might be false.)
In actually tracking down the bug, which is what I care about, the thing that was useful is that I could run miri and lots of other things on it and get useful results that pointed me towards the most likely causes of issues.
So don't get me wrong - this is overall better than C, but writing lots of rust (I haven't written C/C++ at all in a while, actually), I still tire of the constant claims about the amount of safety rust provides. You are the rare rust person who understands the nuance and is willing to admit there is any flaw or non-perfection whatsoever.
As you say, there are lots of things that ought to be true in rust that are not. You have a good understanding of this nuance, and of where it fails.
But it is you, i believe, who is missing the forest for the trees, because most do not have this.
I'll be concrete and I guess controversial in a way you are 100% free to disagree with, but might as well throw a stake in the ground - it's hacker news, might as well have fun making a comment someone can beat me over the head with later: if nothing changes, and the rust ecosystem grows by a factor of 100x while changing nothing about how it behaves WRT unsafe usage, and no tooling gets significantly better, Rust will not end up better than C in practice. I don't mean it will not have fewer bugs/vulnerabilities - I think it will, by far!
But whether you have 100 billion of them, or 1 billion of them, and thus made a 100x improvement, I don't think matters too much when it's still a billion :)
Meanwhile, if the rust ecosystem got worse about unsafe, but made tools like Miri 50x faster (and made more tools like it that help verification in practice), it will end up better than C.
To me - it is the tooling, and not this sort of encapsulation, that will make a practical difference or not at scale.
The idea that you will convince people not to write broken unsafe code, in ways that breaks safe APIs, or that the ability to assign blame matters, is very strange to me, and is no better than C. As systems grow, the likelihood of totally safe transmutes growing in them is basically 100% :)
FWIW - I also agree you don't have to be perfect, nor do I fault rust for not being perfect. Instead, i simply disagree that at scale, this sort of ability to place blame is useful. To me, it's the ability to find the bugs quickly and as automated as possible that is useful.
I need to find the totally safe transmutes causing issues in my system, not hand it to someone else after determining it couldn't be my fault.
> I can break your safe API by getting the constraints wrong on unsafe code inside that API.
This doesn't make any sense at all as a broader point. Of course you can break the safe API by introducing a bug inside the implementation! I honestly just cannot figure out how you have a misunderstanding of this magnitude, and I'm forced to conclude that we are mis-communicating at some level.
I did read the rest of your comment, and the most significant point I can take away from it is that you're making a claim about scale. I think the dissonance introduced with comments like the one above makes it very hard for me to trust your experience here and the conclusions you've drawn from it. But I will note that whether Rust's safety story scales is from my perspective a different thing entirely from the factual claim that Rust enables safe encapsulation of `unsafe` usage.
You may say that just because Rust enables safe encapsulation doesn't mean programmers using Rust actually follow through with that in practice. And yes, absolutely, it doesn't. You can't derive an is from an ought. But in my experience, it totally does. I do work on lots of "hobby" stuff in Rust (although I try to treat it professionally, I just mean that I am not directly paid for it beyond donations), but I am also paid to write Rust too. I do not have your experience with Rust at scale, so I cannot refute it. But you've said enough questionable things here that I can't trust it either.
Are you writing lots of FFI and/or embedded code? Those are the main places I see unsafe being used a lot.
The tooling and the encapsulation go hand in hand.
> The idea that you will convince people not to write broken unsafe code, in ways that breaks safe APIs, or that the ability to assign blame matters, is very strange to me, and is no better than C. As systems grow, the likelihood of totally safe transmutes growing in them is basically 100% :)
To be honest this doesn't track with my experience at all. Unsafe just isn't that commonly used in projects I contribute to. When it is, it is aggressively encapsulated.
> It's really only encapsulated in the sense that if you have a finite and small set of unsafe blocks, you can audit them easier and be pretty sure that your memory safety bugs are in there. This reality really doesn't exist much anymore because of how much unsafe is often used, and since you have to audit all of them, whether they come from a library or not, it's not as useful to claim encapsulation as one thinks.
Is it? I've written hundreds of thousands of lines of production Rust, and I've only sparingly used unsafe. It's more common in some domains than others, but the observed trend I've seen is for people to aggressively encapsulate unsafe code.
Unsafe Rust is quite difficult to write correctly. (The &mut provenance rules are a bit scary!) But once a safe abstraction has been built around it and the unsafe code has passed Miri, in practice I've seen people be able to not worry about it any more.
By the way I maintain cargo-nextest, and we've added support for Miri to make its runs many times faster [1]. So I'm doing my part here!
[1] https://nexte.st/docs/integrations/miri/
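(For anyone curious, the invocation is a drop-in swap - per the linked docs, it should be just

    cargo miri nextest run

instead of the usual `cargo miri test`.)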
> and we've added support for Miri to make its runs many times faster
Whoa. This might be the kick in the ass I needed to give cargo-nextest a whirl in my projects. Miri being slow is the single biggest annoyance I have with it!
Would love to hear how it goes! Miri is generally single-threaded, but because nextest is process-per-test, each test gets a completely separate Miri context. A few projects have switched their Miri runs over to nextest and are seeing dramatic improvements in CI times, e.g. [1].
[1] https://bsky.app/profile/lukaswirth.bsky.social/post/3lkg2sl...
Eh. Good C programmers know what's safe and what's not. Often comments call out sketchy stuff. Just because it's not a language keyword doesn't mean it's not called out.
Bad C programmers though? Their stuff is more dangerous; they don't know when it's sketchy, don't call it out, and should probably stick to Rust.
No, it's been proven over and over that simply knowing invariants is not enough, in long-term projects built by large teams where team members change over time. Even the most experienced C developers are going to fail every so often. You need tooling that automates those invariants, and you need that tooling to fail closed.
I take a hard line on this stuff because we can either keep repeating the fundamental mistake of believing things like "willpower" to write correct code are real, or we can move on and adopt better tooling.
And where can I find this mythical "Good C programmer"?
True! Only, Good C programmers don’t exist.
Dunno why this is being downvoted, obviously no true Scotsman would ever use memory after freeing it.
C's safe subset is so small as to be basically useless, and in particular it's impossible to encapsulate behavior in a safe interface; in fact, it's fairly easy in C to make an interface which is impossible to use correctly (gets() and the like).
the problem in those cases is that C can’t help but be unsafe always.
People can write memory safe code, just not 100% of the time.
Awesome find. This really means:
Assembly language is faster than C. And faster than Rust. Assembly can be very fast.
I wonder why writing SIMD in high-level languages hasn't been figured out yet for CPUs (it has been the norm for GPUs since forever). Auto-vectorization universally sucks, and so do OpenMP directives.
There was ISPC, a separate C-like programming language just for SIMD, but I don't understand why regular compilers can't generate high-quality vectorized code.
Why do you say that? I would say SIMD is pretty well figured out in well-written code, e.g. small, tight loops over vectors. Unrolling and vectorizing a loop is not that hard and happens constantly on all our phones for signal processing, for example.
.NET (C#) is getting there with Vector<T>.
That's just syntactic sugar (and a bit of architecture independence) over intrinsics. You can get the same in C++ just with wrapping intrinsics in classes, and a few ifdefs.
The key difference is that there are invariants you can rely on as a user of the library, and they'll be enforced by the compiler outside the unsafe blocks. The corresponding C invariants mostly aren't enforced by the compiler. Worse, many C programmers will actively argue that some amount of undefined behavior is "fine".
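To make that concrete, here's a minimal sketch (a hypothetical type, not taken from zlib-rs) of an invariant the compiler enforces everywhere outside the single unsafe block:

    pub struct NonEmpty<T> {
        items: Vec<T>, // invariant: never empty
    }

    impl<T> NonEmpty<T> {
        pub fn new(first: T) -> Self {
            NonEmpty { items: vec![first] }
        }

        pub fn first(&self) -> &T {
            // SAFETY: every constructor stores an element and no public
            // method removes the last one, so index 0 is always in bounds.
            unsafe { self.items.get_unchecked(0) }
        }
    }

Callers can only obtain a NonEmpty via new(), so the unsafe block's precondition is upheld by the compiler at every call site rather than by convention.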
Rust's code emitter is Clang, the same one that Apple uses for C on their platforms. I wouldn't expect any miracles there, as the Rust authors have zero influence over it. If any compiler is using any secret Clang magic, that would be Swift or Objective-C, since they are developed by Apple.
You’re conflating clang and LLVM.
Yes, you are right, should be 'code emitter is LLVM, the same that Clang uses for C'
You can choose unsafe Rust, which has many more optimizations and is much faster than safe Rust. Both are legitimate dialects of the language. Should you not feel confident in a library that is too "unsafe", you can use another crate. The Rust ecosystem is quite big by now.
Personally, I would still rather use unsafe Rust than raw C, which has more edge cases. Also, when I'm not on the critical path, I can always use safe Rust.
> Kidding aside, I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library. At what point does it really stop mattering if this is C or Rust?
Kidding aside the 150-comment Unsafe Rust subthread was inevitable.
It goes both ways: many C folks call files full of inline assembly and compiler-specific extensions "C".
> At what point does it really stop mattering if this is C or Rust?
If I read TFA correctly, they came up with a library that is API-compatible with the C one, which they've measured to be faster.
At that point, I think that in addition to safety benefits in other parts of the library (apart from the unsafe micro-optimizations as quoted), what they're leveraging is better compiler technology. Intuitively, I start to assume that the Rust compiler can perhaps get away with optimizations that might not be safe to assume in C.
> I thought the purpose of Rust was for safety but the keyword unsafe is sprinkled liberally throughout this library.
What's wrong with that?
Oddly enough, that's not the most optimized version of crc32 - e.g. it's not an AVX512 variant.
There are certain optimizations you can only make with unsafe, because the borrow checker is smart, but not all-knowing. There have been countless discussions about how unsafe isn't the ideal name; it should be read more as "trust the programmer, they checked this manually".
That being said, most Rust programs don't ever need to use unsafe directly. If you go very low-level or tune for performance, it might become useful, however.
Or if you're lazy and just want to stop the borrow checker from saving your ass.
Looks like as of 2 weeks ago the unsafe block should no longer be required: https://github.com/rust-lang/stdarch/pull/1714
...at least outside of loads/stores. From a bit of looking at the code, though, it seems like a good amount of those should be doable in a safe way with some abstractions.
You can use 'unsafe' blocks to delineate places on the hot path where you need to take the limiters off, then trust that the rest of the code will be safe. In C, all your code is unsafe.
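A sketch of that pattern (a hypothetical function, just to show the idiom - one checked precondition up front, unchecked accesses inside the loop):

    fn sum_pairs(xs: &[u32]) -> u64 {
        assert!(xs.len() % 2 == 0); // the single runtime check
        let mut total: u64 = 0;
        for i in (0..xs.len()).step_by(2) {
            // SAFETY: i + 1 < xs.len() because the length is even and i < len
            let (a, b) = unsafe { (*xs.get_unchecked(i), *xs.get_unchecked(i + 1)) };
            total += a as u64 + b as u64;
        }
        total
    }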
We will see more and more Rust libraries trounce their C counterparts in speed, because Rust is more fun to work in thanks to the above. Rust has democratized high-speed and concurrent systems programming. Projects in it will attract a larger, more diverse developer base - developers who would be loath to touch a C codebase for (very justified) fear of breaking something.
> At what point does it really stop mattering if this is C or Rust?
That depends. If, for you, safety is something relative and imperfect rather than absolute, guaranteed and reliable, then the answer is: once you have the first non-trivial unsafe block that hasn't gotten standard-library-level scrutiny. But if that's your view, you should not be all that starry-eyed about how "Rust is a safe language!" to begin with.
On the other hand, if you really do want to rely on Rust's strong safety guarantees, then the answer is: From the moment you use any library with unsafe code.
My 2 cents, anyway.
Not to mention they link to libc... all Rust code does, last I checked.
There is an option to not link to it for instances like OS writing and embedded. Writing everything in pure Rust without libc is entirely possible, even if an effort in losing sanity when you're reimplementing every syscall you need from scratch.
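As a sketch of what "no libc" looks like in practice (a hypothetical minimal x86-64 Linux binary; assumes linking with -C link-arg=-nostartfiles):

    #![no_std]
    #![no_main]

    use core::arch::asm;
    use core::panic::PanicInfo;

    #[panic_handler]
    fn panic(_: &PanicInfo) -> ! {
        loop {}
    }

    #[no_mangle]
    pub extern "C" fn _start() -> ! {
        // exit(0) by hand: raw syscall 60 on x86-64 Linux, no libc anywhere
        unsafe {
            asm!("syscall", in("rax") 60usize, in("rdi") 0usize, options(noreturn));
        }
    }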
But even then, your code is calling out to kernel functions which are probably written in C or assembly, and therefore "dangerous."
Rust code safety is overhyped frequently, but reducing an attack surface is still an improvement over not doing so.
I agree, and binary exploitation/vulnerability research is my area of expertise... The whole "Lets port everything to Rust" is so misguided. Binary exploitation has already gotten 20x harder than, say, ten years ago... Even so, most big breaches happen because people reuse their password or just give it out... Nation states are pretty much the only parties capable of delivering full kill chains that exploit, say, Chrome... That is why I moved to the embedded space... Still so insecure...
> The whole "Lets port everything to Rust" is so misguided.
Well, good thing that nobody sane is saying that then.
https://github.com/embassy-rs/embassy
Ironically using C without libc turns out to be easier (except for portability of course). The kernel ABI is much more sane than <stdio.h>. The only useful parts of libc are DNS resolution and text formatting, both of which it does rather poorly.
By text formatting, do you mean printf and the like? It is pretty powerful in my experience.
Also, DNS resolution isn't part of the C standard, it's a POSIX interface I think.
"faster than C" almost always boils down to different designs, implementations, algorithms, etc.
Perhaps it is faster than already-existing implementations, sure, but not "faster than C", and it is odd to make such claims.
zlib-ng is pretty much assembly - with a bit of C. There is this quote from the post:
> but was not entirely fair because our rust implementation could assume that certain SIMD capabilities would be available, while zlib-ng had to check for them at runtime
zlib-ng can be compiled to whatever target arch is necessary, and the original post doesn't mention how it was compiled and what architecture and so on.
It's another reason not to trust microbenchmarks.
Nevertheless, Russinovich actually says something along the lines of "simply rewriting in Rust made some of our code 5-15% faster (without deliberate optimizations)": https://www.youtube.com/watch?v=1VgptLwP588&t=351s
Without analysis as to what caused that, that statement is meaningless.
For example, he says they didn't set out to improve the code, but they were porting decades-old C code to Rust. Given the subject (TrueType font parsing and rendering), my guess would be that the original code made more memory copies when pulling data out of the font file, because Rust makes it easier to safely avoid that (in which case the conclusion would be "C could be as fast, but with a lot more effort"), but it could also be that they spent a day figuring out what some code did, only to realize it wasn't necessary on anything after Windows 95, and stripped it out rather than porting it.
I understand their improvement figures exactly as you wrote, "C could be as fast, but with a lot more effort".
Yes, if your code in Lang-X is faster than C, it's almost certainly a skill issue somewhere in the C implementation.
However, in the day-to-day, if I can make my code run faster in Lang-X than C, especially if I'm using Lang-X for only a couple of months and C potentially for decades, that is absolutely meaningful. Sure, we can make the C code just as fast, but it's not viable to spend that much time and expertise on every small issue.
Outside of "which lang is better" discussions on online forums, it doesn't matter how fast you can theoretically make your program, it matters how fast you actually make it with the constraints your business have (time usually).
The skill issue part is a pretty interesting part of the conversation.
I'm always reminded of this video, where the author writes the same program in Rust and Go.
https://www.youtube.com/watch?v=Z0GX2mTUtfo
> Now, the Rust version took me about five times as long as the Go version
> The Go one performed almost identically well
Now this was for netcode rather than number crunching. But I actually had a similar surprise with number crunching, with C# and C++. I wrote the same program (rational approximation of Pi), line for line, in both languages, and the C# version ran faster. Apparently C# aggressively optimizes hot code paths while running, whereas to get that behavior in C++, you need to collect profiler data and use a special compiler flag.
Well, what I said is also true for Rust and Go. Sure if your Go code is faster than your Rust code, one could argue you have skill issues in Rust, but if to get the Rust program faster than your Go program requires 10x time (or more), it’s fair to say that Go is faster and simpler, even if it would be more precise to say that the Go code you can write performs as well as the Rust code you can write.
I’m sure I’m missing context, and presumably there are other benefits, but 5-15% improvement is such a small step to justify rewriting codebases.
I also wonder how much of an improvement you'd get by just asking for a "simple rewrite" in the existing language. I suspect there are often performance improvements to be had with simple changes in the existing language.
Far better justification for a rewrite like this is if it eases maintenance, or simplifies building/testing/distribution. Taking an experienced and committed team of C developers with a mature code base, and retraining them to rewrite their project in Rust for its own sake is pretty absurd. But if you have a team that’s more comfortable in Rust, then doing so could make a lot of sense - and, yes, make it easier to ensure the product is secure and memory-safe.
Disagree - a rewrite for “maintainability” is an engineer saying they want to rewrite in their preferred language. I wouldn’t allow someone on my team to rewrite a core dependency for “maintainability”, but I absolutely would if they suggested it would be faster and safer.
> a rewrite for “maintainability” is an engineer saying they want to rewrite in their preferred language
Not necessarily—sometimes languages are especially poorly suited for tasks or difficult to hire for.
We’re talking about a rust rewrite of a fairly core level library. I don’t think C is inherently unsuitable or difficult to hire for. If the library was in Fortran then maybe.
But yes you are technically correct, congratulations.
I was responding to a general claim. In any case, I certainly disagree that C is suitable in 2025 for the vast majority of possible use-cases. For fun? Sure, but not for shipping code you want to rely on.
Obviously the code isn't going anywhere, and obviously we DO have reliable code we've built with C. But acting like C and Rust deliver equivalent value is simply farcical: you choose C for rapid development and cheap devs (or some other niche concern, like using an obscure embedded arch), and you choose rust to solve the problems that C introduced.
Million dollar question: why Rust over <insert any memory safe language>? Common Lisp? OCaml, Ada / SPARK, etc. if not C?
> if you have a team that’s more comfortable in
As is the case with any language, of course - so this is not a point in favor of (nor against) Rust.
I agree that simple rewriting could have given some if not all perf benefits, but can it be the case that rust forces us to structure code in a way that is for some reason more performant in some cases?
5-15% is a big deal for a low-level foundational code, especially if you get it along with some other guarantees, which may be of greater importance.
> 5-15% improvement is such a small step to justify rewriting codebases
They hadn't expected any perf improvements at all. Quite the opposite, in fact. They were surprised that they saw perf improvements right away.
There are hopefully very few things that can be done to low level building blocks. A 15% improvement is absolutely worth it for a library as widely used as a compression library.
Even 5% on a hot path is quite a big gain, actually.
Furthermore, they said that they did not expect any performance gains. They did the rewrite for other reasons and got the unexpected bonus of extra performance.
One big part I've noticed when working in rust is that, because the compilation and analysis checks you're given are so much stronger than in C or C++, and because the ecosystem of crates is so easy to make use of, I'll generally be able to make use of more advanced algorithms and methods.
I'm currently working with ~150 dependencies in my current project which I know would be a major hurdle in previous C or C++ projects.
Everything you said is correct of course, but the idea of auditing 150 dependencies makes me feel ill. It's essentially impossible for a single person.
This is why sharing code is so important; it doesn't fall on one person, but instead, on the overall community.
For example, cargo-vet and cargo-crev allow you to rely on others you trust to help audit dependencies.
Oh, absolutely. Software cannot scale without trust. No single person is capable of auditing their browser or operating system either.
The effort is _roughly_ proportional - if you need to parse JSON in either language you can write it yourself or use an existing library. Both of those are the same amount of work in c++ and rust.
This has generally been the case, but a systems language like Rust has access to optimisations that C simply won't have, due to the compiler having so much more information (e.g. being able to skip runtime bounds checks because the compiler can prove out-of-bounds access cannot occur).
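A small example of the elision being described (a sketch; the exact output depends on the optimizer, but this loop shape is one LLVM handles reliably):

    // The loop bound is provably xs.len(), so the per-access bounds check
    // on xs[i] is redundant and gets optimized away.
    fn sum(xs: &[u32]) -> u64 {
        let mut total = 0u64;
        for i in 0..xs.len() {
            total += xs[i] as u64;
        }
        total
    }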
If anything, this should be “zlib-rs is faster than zlib-ng”, but not “$library is faster than $programming_language”.
I heard that aliasing in C prevents the compiler from optimizing aggressively. I can believe Rust's compiler can optimize more aggressively if there's no aliasing problem.
C has the restrict type qualifier to express non-aliasing, hence it shouldn’t be a fundamental impediment.
Which is so underused that the whole compiler feature was buggy as hell, and it was only recently fixed because compiling Rust - where noalias is the norm - exposed the bugs.
I believe it was mostly an LLVM problem; GCC supports restrict fine, thanks to its Fortran support.
My understanding is that noalias isn't fully utilized by LLVM, just that it's less buggy now, so there's some uncertainty leaning in favor of Rust in terms of future Rust-specific optimizations. Certainly a language like Fortran, with its restrictions, delivers accordingly on optimization, so I imagine Rust has plenty of room to grow similarly.
> fundamental impediment
This is an interesting word. I wonder why no one has written high performance library code in assembly yet at this point?
Well, most of it is written in C/C++ and Fortran, is it not?
> I wonder why no one has written high performance library code in assembly yet at this point?
What do you mean by that?
There is plenty of hand-rolled assembly in low-level libraries, whether you look at OpenBLAS (17%), GMP (36%), BoringSSL (25%), WolfSSL (14%) -- all of these numbers are based on looking at Github's language breakdown (which is measured on a per-file basis, so doesn't count inline asm or heavy use of intrinsics).
There are contexts where you want better performance guarantees than the compiler will give you. If you're dealing with cryptography, you probably want to guard against timing attacks via constant-time code. If you're dealing with math, maybe you really do want to eke out as much performance as possible, autovectorization just isn't doing what you want it to do, and your intrinsic-based code just isn't using all your registers as efficiently as you'd like.
The fact that it's faster than the C implementation that surely had more time and effort put into it doesn't look good for C here.
C++ surpassed C performance decades ago. While C still has some lingering cachet from its history of being “fast”, most software engineers have not worked at a time when it was actually true. C has never been that amenable to scalable optimization, due mostly to very limited abstractions and compile-time codegen.
It says absolutely nothing about the programming language though.
Doesn’t it say something if Rust programmers routinely feel more comfortable making aggressive optimizations and have more time to do so? We maintain code for longer than the time taken to write the first version and not having to pay as much ongoing overhead cost is worth something.
How can it not? Experts in C taking longer to make a slower and less safe implementation than experts in Rust? It's not conclusive but it most certainly says something about the language.
I think you'll find that if you re-write an application, feature-for-feature, without changing its language, the re-written version will be faster.
This is known as the Second System Effect: where Great Rewrites always succeed in making a more performant thing.
I am not sure if the semantics have drifted over the decades to what you say, but this seems not quite right according to wikipedia: https://en.wikipedia.org/wiki/Second-system_effect
EDIT: but I do agree that starting greenfield from an old code base is often a path towards performance.
The thing is, Rust allows you to casually code things that are fast. A few years back I took part in an "all programming languages allowed" competition on a popular hacker blog in my country. The topic was who writes the fastest tokenizer (a thing splitting sentences into words).
I took 15 minutes to write one in Rust (a language I had just learned at that point) using a "that should work" approach, and took second place, with some high-effort C implementations being slower and a highly optimized assembler variant taking first place.
Since then I have programmed a lot more in C and C++ as well (for other reasons) and gotten more experience. Rust is not automatically faster, but the defaults and standard library of Rust are so well put together that a common-sense approach will outperform most C code without even trying - and it does so while having type safety and memory safety. This is not nothing in my book and still extremely impressive.
The best thing about learning Rust, however, was how much I learned for all the other languages. Because what you learn there is not just how to use Rust, but how to program well. Understanding the way the Rust borrow checker works 1000% helped me avoid nasty bugs in C/C++, by realizing that I was violating ownership rules (e.g. by having multiple writers).
... because by "C" we mean handwritten inline assembler.
Typical real-world C code uses \0-terminated strings, and repeated strlen() calls easily turn an O(len) scan into O(len^2).
I think this may not be a very high bar. zippy in Nim claims to be about 1.5x to 2.0x faster than zlib: https://github.com/guzba/zippy I think there are also faster zlib's around in C than the standard install one, such as https://github.com/ebiggers/libdeflate (EDIT: also mentioned elsethread https://news.ycombinator.com/item?id=43381768 by mananaysiempre)
zlib itself seems pretty antiquated/outdated these days, but it does remain popular, even as a basis for newer parallel-friendly formats such as https://www.htslib.org/doc/bgzip.html
The bar here is not zlib, it's zlib-ng, which aims primarily for performance.
libdeflate is an impressive library, but it doesn't help if you need to stream data rather than having it all in memory at once.
The benchmarks in the parent post are comparing to zlib-ng, which is substantially faster than zlib. The zippy claims are against "zlib found on a fresh Linux install" which at least for Debian is classic zlib.
They're comparing against zlib-ng, not zlib. zlib-ng is more than twice as fast as zlib for decompression. https://github.com/zlib-ng/zlib-ng/discussions/871
libdeflate is not zlib compatible. It doesn't support streaming decompression.
Thanks (to all correctors). FWIW, that zlib-ng discussion page you link to has way more information about what machine the benchmarks were run on than TFA. It's also a safe bet that Google timed their chromium lib (which seems really close) on a much larger diversity of core architectures than these 3..4 guys have with zlib-rs. So, you know, very early days in terms of perf claims, IMO.
Also, FWIW, that zippy Nim library has essentially zero CPU-specific optimizations that I could find. Maybe one tiny one in some checksumming bit. Optimization is specialization. So, I'd guess it's probably a little slower than zlib-ng now that this is pointed out, but as @hinkley observed, portability can also be a meaningful goal/axis.
Zlib is unapologetically written to be portable rather than fast. It is absolutely no wonder that a Rust implementation would be faster. Rust, by contrast, runs on a pathetically small number of systems. This is not a dig at Rust; it's an acknowledgement of how many systems exist out there once you include embedded, automotive, aerospace, telecom, industrial control systems, and mainframes.
Richard Hipp denounces claims that SQLite is the widest-used piece of code in the world and offers zlib as a candidate for that title, which I believe he is entirely correct about. I’ve been consciously using it for almost thirty years, and for a few years before that without knowing I was.
Except this comparison isn’t against zlib, it’s against zlib-ng [0]. The readme states:
> The result is a better performing and easier to maintain zlib-ng.
So they’re comparing a first pass rewrite against a variation of zlib designed for performance
[0] https://github.com/zlib-ng/zlib-ng
I think performance is an underappreciated benefit of safe languages that compile to machine code.
If you're writing your program in C, you're afraid of shooting yourself in the foot and introducing security vulnerabilities, so you'll naturally tend to avoid significant refactorings or complicated multithreading unless necessary. If you have Rust's memory safety guarantees, Go's channels and lightweight goroutines, or the access to a test runner from either of those languages, that's suddenly a lot less of a problem.
The compiler guarantees you get won't hurt either. Just to give a simple example, if your Rust function receives an immutable reference to a struct, it can rely on the fact that a member of that struct won't magically be mutated by a call to some random function through spooky action at a distance. It can just keep it on the stack / in a callee-saved register instead of fetching it from memory at every loop iteration, if that's more optimal.
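A sketch of that (hypothetical function): because `scale` is behind a shared reference and `values` is an exclusive one, nothing in the loop can change `*scale`, so it can stay in a register:

    fn scale_all(values: &mut [f64], scale: &f64) {
        for v in values.iter_mut() {
            // *scale need not be reloaded from memory on every iteration;
            // in C, values and scale could alias and force a reload.
            *v *= *scale;
        }
    }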
Then there's the easy access to package ecosystems and extensive standard libraries. If there's a super popular do_foo package, you can almost guarantee that it was a bottleneck for somebody at some point, so it's probably optimized to hell and back. It's certainly more optimized than your simple 10-line do_foo function that you would have written in C, because that's easier than dealing with yet another third-party library and whatever build system it uses.
Chromium is kind of stuck with zlib because it's the algorithm that's in the standards, but if you're making your own protocol, you can do even better than this by picking a better algorithm. Zstandard is faster and compresses better. LZ4 is much faster, but not quite as small.
Some reading: https://jolynch.github.io/posts/use_fast_data_algorithms/
(As an aside, at my last job container pushes / pulls were in the development critical path for a lot of workflows. It turns out that sha256 and gzip are responsible for a lot of the time spent during container startup. Fortunately, Zstandard is allowed, and blake3 digests will be allowed soon.)
`Content-Encoding: zstd` was added to Chromium a while ago: https://chromestatus.com/feature/6186023867908096
You can still use deflate for compression, but Brotli and Zstd have been available in all modern browsers for quite some time.
Safari doesn't support zstd, that means if you want to use it you have to support multiple formats.
> Zstandard is faster and compresses better.
However, keep in mind that zstd also needs much more memory. IIRC, it uses by default 8 megabytes as its buffer size (and can be configured to use many times more than that), while zlib uses at most 32 kilobytes, allowing it to run even on small 16-bit processors.
Chromium supports brotli and zstd
Yeah I just discovered this a few days ago. All the docker-era tools default to gzip but if using, say, bazel rules_oci instead of rules_docker you can turn on zstd for large speedups in push/pull time.
It's barely faster. I would say it's more accurate to say it's as fast as C, which is still a great achievement.
But it is faster. The closer to theoretical maximum the smaller the gains become.
Zlib-ng is anywhere from a couple to several times slower than the state of the art[1]; it's just that nobody has yet done the (hard) work of adapting libdeflate[2] to a richer API than "complete buffer in, complete buffer out".
[1] https://github.com/zlib-ng/zlib-ng/issues/1486
[2] https://github.com/ebiggers/libdeflate
"Barely" or not is completely irrelevant. The fact is that it's measurably faster than the C implementation with the more common parameters. So the point that you're trying to make isn't clear tbh.
Also I'm pretty sure that the C implementation had more man hours put into it than the Rust one.
I think that would be really hard to measure. In particular, for this sort of very optimized code, we’d want to separate out the time spent designing the algorithms (which the Rust version benefits from as well). Actually I don’t think that is possible at all (how will we separate out time spent coding experiments in C, then learning from them).
Fortunately these “which language is best” SLOC measuring contests are just frivolous little things that only silly people take seriously.
It's... basically written in C. I'm no expert on zlib/deflate or related algorithms, but digging around https://github.com/trifectatechfoundation/zlib-rs/ almost every block with meaningful logic is marked unsafe. There's raw allocation management, raw slicing of arrays, etc... This code looks and smells like C, and very much not like rust. I don't know that this is a direct transcription of the C code, but if you were to try something like that this is sort of what it would look like.
I think there's lots of value in wrapping a raw/unsafe implementation with a rust API, but that's not quite what most people think of when writing code "in rust".
> basically written in C
Unsafe Rust still has to conform to many of Rust’s rules. It is meaningfully different than C.
It also has far less tooling available than C for analyzing its safety.
The things I've seen broadly adopted in the industry (e.g. sanitizers) are equally available in Rust. And Rust's testing infrastructure is standardized, so tests are actually common to see in every library.
The number of tools matters less than the quality of the tools. Rust’s inherent guarantees + miri + software verification tools mean that in practice Rust code, even with unsafe, ends up being higher quality.
Miri is better than any C tool I'm aware of for runtime UB detection.
Miri is the closest to a UB specification for Rust that there is, coming in the form of a tool so you can run it. It's really cool but Valgrind, which is a C tool that also supports Rust, also supports Rust code that calls to C and that does I/O, both pretty common things for programs to do.
Are there examples you're thinking about? The only good ones I can think of are bits about undefined behavior semantics, which frankly are very well covered in modern C code via tools like ubsan, etc...
They're just fundamentally different languages. There's semantics that exist in all four of these quadrants:
* defined in C, undefined in Rust
* undefined in C, undefined in Rust
* defined in Rust, undefined in C
* defined in Rust, defined in C
That doesn't seem responsive. The question wasn't whether Rust and C are literally the same language ("duh", as it were), it was effectively "are there meaningful safety features provided to the unsafe zlib-rs code in question in that aren't already available in C toolchains/ecosystems?"
And there really aren't. The abbreviated/limited safety environment being exploited by this non-idiomatic Rust code seems to me to be basically isomorphic to the way you'd solve the problem in C.
> it was effectively "are there meaningful safety features provided to the unsafe zlib-rs code in question in that aren't already available in C toolchains/ecosystems?"
Ah, so that was like, not in your comment, but in a parent.
> And there really aren't.
I mean, not all of the code is unsafe. From a cursory glance, there's surely way more here than I see in most Rust packages, but that doesn't mean that you get no advantages. I picked a random file, and chose some random code out of it, and see this:
The semantics of safe code, `&mut T`, provide the justification for why the unsafe code is okay. Heck, this code wouldn't even be legal in C, thanks to strict aliasing. (Well, I guess you could argue that in C code they'd be of the same type, since you don't have "might be uninitialized" in C's type system, but again, this is an invariant encoded in the type system that C can't do, so it's not possible to express in C for that reason either.)
Isn't that exactly my point, though? This is just a memcpy(). In C, you do some analysis to prove to yourself that the pointers are valid[1]. In this unsafe Rust code, the author did some analysis to prove the same thing. I mean, sure, the specific analyses use words and jargon that are different. I don't think that's particularly notable. This is C code, written in Rust.
[1] FWIW, memcpy() arguments are declared restrict post-C99, the strict aliasing thing doesn't apply, for exactly the reason you're imagining.
> In C, you do some analysis to prove to yourself that the pointers are valid[1]
Right, and in Rust, you don't have to do it yourself: the language does it for you. If the signature were in C, you'd have to analyze the callers to make sure that this property is upheld when invoked. In Rust, the compiler does that for you.
> the strict aliasing thing doesn't apply
Yes, this is the case in this specific instance due to it being literally memcpy, but if it were any other function with the same signature, the problem would exist. Again, I picked some code at random, I'm not saying this one specific instance is even the best one. The broader point of "Rust has a type system that lets you encode more invariants than C's" is still broadly true.
> In Rust, the compiler does that for you.
No it doesn't? That comment is expressing a human analysis. The compiler would allow you to stuff any pointer in that you want, even ones that overlap. You're right that some side effects of the runtime can be exploited to do that analysis. But that's true of C too! (Like, "these are two separate heap blocks", or "these are owned by two separate objects", etc...). Still human analysis.
Frankly you're overselling hard here. A human author can absolutely mess that analysis up, which is the whole reason Rust calls it "unsafe" to begin with.
I think you're misunderstanding of what I'm claiming is being checked. I don't mean the unsafe block directly. I mean that &mut Ts do not alias. That is checked by the compiler.
I'm saying that even in a codebase with a lot of unsafe, the checks that are still performed have value.
Sure, but C++ objects returned from operator new are likewise guaranteed not to alias. There's "value" there, but not a lot of value. And I repeat, you're overselling hard here. People who write rust like this are going to produce roughly the same amount of memory safety bugs, and pretending otherwise is frankly dangerous, IMHO.
The difference is:
In C++ I could do something like:

    object* x_ptr = new object();
    object* y_ptr = x_ptr;
    copy(x_ptr, y_ptr);
In safe Rust there is no way to call the function in question if that sort of aliasing has happened. This means that if you get a bug from your copy, it's in the copy method - the possibility that it's been used inappropriately has been eliminated.
It reduces the search space for problems from: everywhere that created a pointer that is ultimately used in the copy, to: the copy function itself.
It reduces the number of programmers who have to keep the memory semantics of that copy in their head from "potentially everyone" to just "those who directly implement and check copy".
Pretending that has no value is absurd.
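A hedged Rust counterpart to the C++ snippet above (hypothetical `copy`, same shape; `&mut` on both parameters is exactly what makes the aliasing call unrepresentable):

    fn copy(src: &mut [u8], dst: &mut [u8]) {
        let n = src.len().min(dst.len());
        dst[..n].copy_from_slice(&src[..n]);
    }

    fn main() {
        let mut a = [0u8; 8];
        let mut b = [1u8; 8];
        copy(&mut a, &mut b); // disjoint buffers: fine
        // let x = &mut a[..];
        // let y = &mut a[..]; // error[E0499]: cannot borrow `a` as mutable
        // copy(x, y);         //   more than once at a time
    }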
This comment summarizes the difference of unsafe Rust quite well. Basically, mostly safe Rust, but with few exceptions, fewer than one would imagine: https://news.ycombinator.com/item?id=43382176
C is not assembly, nor is it portable assembly at all in this century, so your phrasing is very off.
C code will go through a huge amount of transformations by the compiler, and unless you are a compiler expert you will have no idea what the resulting code looks like. It's not targeting the PDP-11 anymore.
I mentioned this under another comment - and I consider myself versed enough in deflate - comparing the library to zlib-ng is quite weird, as the latter is generally hand-written assembly. To beat it would take some oddity in the test itself.
I'm not sure why people say this about certain languages (it is sometimes said about Haskell, as well).
The code has a C style to it, but that doesn't mean it wasn't actually written in Rust -- Rust deliberately has features to support writing this kind of code, in concert with safer, stricter code.
Imagine if we applied this standard to C code. "Zlib-NG is basically written in assembler, not C..." https://github.com/zlib-ng/zlib-ng/blob/50e9ca06e29867a9014e...
> Imagine if we applied this standard to C code. "Zlib-NG is basically written in assembler, not C..."
We absolutely should, if someone claimed/implied-via-headline that naive C was natively as fast as hand-tuned assembly! This kind of context matters.
FWIW: I'm not talking about the assembly in zlib-rs, I was specifically limiting my analysis to the rust layers doing memory organization, etc... Discussing Rust is just exhausting. It's one digression after another, like the community can't just take a reasonable point ("zlib-rs isn't a good example of idiomatic rust performance") on its face.
I'm not sure anyone really believes `zlib-rs` is a good example of idiomatic Rust performance, though
Maybe the reason I think that is because I've written Rust for a variety of purposes (web application, database bindings, high performance parser) so I account for the "register" of Rust that is appropriate without thinking about it.
https://en.wikipedia.org/wiki/Register_(sociolinguistics)
It might be that a simple description like the headline leads some people to believe they could write Rust the easy way and get code that's as fast as writing "Rust the hard way".
However, that is different than what you earlier said -- "It's... basically written in C.". I have actually written Rust programs where some parts were literally written in C and linked in -- in order to build functioning plugins -- and there is a world of difference with that.
Regarding
> Discussing Rust is just exhausting. It's one digression after another, like the community can't just take a reasonable point ("zlib-rs isn't a good example of idiomatic rust performance") on its face.
I'm just not sure what to say to this. What do you expect from me, here?
It does actually seem like what a C -> Rust transpiler would spit out.
I can't understand your complaint. It's written in Rust, but to you it looks like C. So what?
So, it is basically like it was written in C.
Yes, it's possible to write Rust as if it were C; this code is an example of that. You can even use an automatic code converter to translate C into Rust.
It doesn't exploit (and in fact deliberately evades) Rust's signature memory safety features. The impression from the headline is "Rust is as fast as C now!", but in fact the subset of the language that has been shown to be as fast as C is the subset that is basically isomorphic to C.
The impression a naive reader might take is that idiomatic/safe/best-practices Rust has now closed the performance gap. But clearly that's not happening here.
Rust's many memory safety features (including the borrow checker) are still enabled in unsafe Rust blocks.
For more information: https://news.ycombinator.com/item?id=43382176
But again, not exploited by the code in question. This isn't using the Rust runtime heap, it's doing its own thing with raw pointers/indexing, and even seems to have its own allocator.
That is not correct; in another comment you can see where the code takes advantage of the rust-specific &mut notation to use a fast memcpy for non-overlapping pointers.
> This isn't using the Rust runtime heap,
Rust does not have a specific "Rust runtime heap."
It does, it has a default global heap allocator.
That's not part of Rust, that's a feature of its standard library. This is the same as C, where a freestanding implementation doesn't include malloc.
Put another way, there's no issues with a library using its own heap if it wants to.
That's not a "Rust runtime", that's an extension point. The default setting is `malloc()`.
> The C code is able to use switch implicit fallthroughs to generate very efficient code. Rust does not have an equivalent of this mechanism
Rust very much can emulate this, with `break` + nested labeled blocks - but not if you also add in `goto` to previous branches.
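For the curious, a sketch of the labeled-block emulation (Rust 1.65+; `do_a`/`do_b` are hypothetical stand-ins for the case bodies):

    enum State { A, B }

    fn do_a() { /* case A body */ }
    fn do_b() { /* case B body */ }

    // C's `case A: do_a(); /* fall through */ case B: do_b();`
    fn step(state: State) {
        'b: {
            match state {
                State::A => {}        // fall through into do_a() below
                State::B => break 'b, // jump straight to do_b()
            }
            do_a();
        }
        do_b();
    }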
Which library compiles faster?
Which library has fewer dependencies?
Is each library the same size? Which one is smaller?
I would argue compile-time differences don't matter much, as the amount of data going through zlib all across the world is so large that any performance gain should more than compensate for any additional compilation time (and zlib-rs compiles in a couple of seconds anyway on my laptop).
As for dependencies: zlib, zlib-ng and zlib-rs all obviously need some access to OS APIs for filesystem access if compiled with that functionality. At least for zlib-rs: if you provide an allocator and don't need any of the file IO you can compile it without any dependencies (not even standard library or libc, just a couple of core types are needed). zlib-rs does have some testing dependencies though, but I think that is fair. All in: all of them use almost exactly the same external dependencies (i.e.: nothing aside from libc-like functionality).
zlib-rs is a bit bigger by default (around 400KB), with some of the Rust machinery. But if you change some of that (i.e. panic=abort), use a nightly compiler (unfortunately still needed for the right flags) and add the right flags both libraries are virtually the same size, with zlib at about 119KB and zlib-rs at about 118KB.
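For reference, a sketch of the kind of size-focused settings alluded to above (these are standard Cargo profile keys; the exact nightly flags for rebuilding std are a separate step I won't guess at here):

    # Cargo.toml
    [profile.release]
    panic = "abort"   # drop the unwinding machinery
    opt-level = "z"   # optimize for size
    lto = true
    codegen-units = 1
    strip = true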
One of the things I like about C is I can download a statically-compiled native GCC for use on a computer with modest amounts of memory, storage and a relatively old, slow CPU. Total size uncompressed is 242.3MB.
Using this I can statically compile a cross-compiler. Total size uncompressed 169.4MB.
I use GCC to compile zlib and a wide variety of other software. I can build an operating system from the ground up.
Perhaps someday during my lifetime it will be possible to compile programs written in Rust using inexpensive computers with modest amounts of memory, storage and relatively slow CPUs. Meanwhile, there is C.
Does this performance have anything to do with Rust itself, or is it just more optimized than the other C-language versions (more SIMD instructions / raw assembly code)? I ask because there is a canonical use case where C++ can consistently outperform C -- sorting, because the comparison operator in C++ allows for more compiler optimization compared to the C version: qsort(). I am wondering if there is something similar here for Rust vs C.
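There is a direct analogue: like C++'s std::sort with a comparison operator, Rust's sort takes the comparator as a generic parameter, so it is monomorphized and inlined, whereas qsort() pays an indirect call per comparison. A trivial sketch:

    fn main() {
        let mut v = vec![3, 1, 2];
        // the closure has its own type, so this sort is specialized to it
        // and the comparison inlines; qsort() would call through a pointer
        v.sort_unstable_by(|a, b| a.cmp(b));
        assert_eq!(v, [1, 2, 3]);
    }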
These are facts about the C and C++ stdlib sort functions, which nobody should really use.
Rust folks love to compare Rust to C, but C folks seldom compare C to Rust.
Not that surprising, Rust folks are more likely to be familiar with C than the reverse.
Finally, now is the day - today - when Rust is faster than C.
If you're dealing with a compiled system language the language is going to make almost no difference in speed, especially if they are all being optimized by LLVM.
An optimized version that controls allocations, has good memory access patterns, uses SIMD and uses multi-threading can easily be 100x faster or more. Better memory access alone can speed a program up 20x or more.
New native code implementation of zlib faster than old native code version. So what? Rust has a lot of recommend it, but it's not automatically faster than C.
You mean the implementation is faster than the one in C. Because nothing is “faster than C”.
Why can’t something be faster than C? If a language is able to convey more information to a backend like LLVM, the backend could use that to produce more optimised code than what it could do for C.
For example, if the language is able to say, for any two pointers, the two pointers will not overlap - that would enable the backend to optimise further. In C this requires an explicit restrict keyword. In Rust, it’s the default.
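A sketch of what that default buys (hypothetical function): the `&mut`/`&` pair tells the backend the slices cannot overlap, the same promise `restrict` makes in C:

    fn add_assign(dst: &mut [f32], src: &[f32]) {
        // no possible overlap between dst and src, so the loop is free
        // to autovectorize without a runtime overlap check
        for (d, s) in dst.iter_mut().zip(src) {
            *d += *s;
        }
    }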
By the way this isn’t theoretical. Image decoders written in Rust are faster than ones written in C, probably because the backend is able to autovectorise better. (https://www.reddit.com/r/rust/comments/1ha7uyi/memorysafe_pn...).
grep (C) is about 5-10x slower than ripgrep (Rust). That’s why ripgrep is used to execute all searches in VS Code and not grep.
Or a different tack. If you wrote a program that needed to sort data, the Rust version would probably be faster thanks to the standard library sort being the fastest, across languages (https://github.com/rust-lang/rust/pull/124032). Again, faster than C.
Happy to give more examples if you’re interested.
There’s nothing special about C that entitles it to the crown of “nothing faster”. This would have made sense in 2005, not 2025.
Narrow correction on two points:
First, I would say that "ripgrep is generally faster than GNU grep" is a true statement. But sometimes GNU grep is faster than ripgrep and in many cases, performance is comparable or only a "little" slower than ripgrep.
Secondly, VS Code using ripgrep because of its speed is only one piece of the picture. Licensing was also a major consideration. There is an issue about this where they originally considered ripgrep (and ag if I recall correctly), but I'm on mobile so I don't have the link handy.
The kind of code you can write in rust can indeed be faster than C, but someone will wax poetic about how anything is possible in C and they would be valid.
The major reason that rust can be faster than C though, is because due to the way the compiler is constructed, you can lean on threading idiomatically. The same can be true for Go, coroutines vs no coroutines in some cases is going to be faster for the use case.
You can write these things to be the same speed or even faster in C, but you won’t, because it’s hard and you will introduce more bugs per KLOC in C with concurrency vs Go or Rust.
> but someone will wax poetic about how anything is possible in C and they would be valid.
Not at all would that be valid.
C has a semantic model that was close to how early CPUs worked, but a lot has changed since. It's more that CPUs deliberately expose an API so that C programmers can feel at home, while stuff like SIMD is non-existent in C except as compiler extensions. Even calling conventions, the stack, etc. are things you have no real control over in the C language, though a more optimal version of your code might want to control them. Sure, the compiler might be sufficiently smart, but then it might as well convert my Python script to that ultra-efficient machine code, right?
So no, you simply can't write everything in C, something like simd-json is just not possible. Can you put inline assembly into C? Yeah, but I can also call inline assembly from Scratch and JS, that's not C at all.
Also, Go is not even playing in the same ballpark as C/C++/Rust.
If you don't count manual SIMD intrinsics or inline assembly as C, then Rust and FORTRAN can be faster than C. This is mainly thanks to having pointer aliasing guarantees that C doesn't have. They can get autovectorization optimizations where C's semantics get in the way.
Of course many things can be faster than C, because C is very far from modern hardware. If you compile with optimisation flags, the generated machine code looks nothing like what you programmed in C.
It is quite easy for C++ and Rust to both be faster than C in things larger than toy projects. C is hardly a panacea of efficiency, and the language makes useful things very hard to do efficiently.
You can contort C to trick it into being fast[1], but it quickly becomes an unmaintainable nightmare so almost nobody does.
1: eg, correct use of restrict, manually creating move semantics, manually creating small string optimizations, etc...
On the off chance this is a speed-of-light joke, I'll pedantically add that C isn't the speed of light. Mathematics/physics symbols are case-sensitive.
Nothing is faster than C in a vacuum, but depending on the context (medium?) that can happen.
In other words, someone should name a language Cerenkov
Gravitation is slightly faster than c in vacuum.
In GR, the speed of gravitational waves is _exactly equal_ to c.
In reality, the speed of light is slightly lower than the speed of gravitation, because gravitation slows light down.
We were presumably talking about an ideal massless space [Minkowski] in which the speed of light in a vacuum is considered - that is what c is defined as.
Fortran has been faster than C because C allows aliasing, which prevents optimizations. For decades this was why, for some applications, Fortran was simply faster.
It's not something "a sufficiently smart compiler" can fix without completely unrealistic (as in "halting problem" unrealistic, in the general case) smartness.
So no, C is inherently slower than some other languages.
Wtf, since when?
Besides the famous "C is not a low-level language" blog post... I don't even get what you are thinking. C is not even the performance queen for large programs (the de facto standard today is C++, for good reasons), let alone for tiny ultra-hot loops like codecs and such, which are all hand-written assembly.
It's not even hard to beat C with something like Rust or C++, because you can properly do high level optimizations as the language is expressive enough for that.
Tachyons?
Maybe if you reverse the beam polarity and route them through the main deflector array.
But that requires rerouting auxiliary power from life support to the shield generators. In Rust you would need to use unsafe for that.
C after an optimizing compiler has chewed through it is faster than C
Bravo. Now Rust has its existence justified.
While AI can certainly produce code that's faster or otherwise better than human-written code, I am utterly skeptical of LLMs doing that. My own experience with LLMs is that humans can do everything they do - LLMs are just faster than humans. I believe we should look at non-LLM AI technologies for going beyond what a skilled human programmer can expect to do. The most famous example of AI doing that is https://www.nature.com/articles/s41586-023-06004-9 where no LLM is involved.