Did I miss something? I got to the end and still don't know what -n % n actually accomplishes in practice, nor why it's mainly used in high-performance code
It computes (2^b) % n, assuming n is an unsigned b-bit integer. You can't do this directly, since 2^b itself doesn't fit into a b-bit integer.
First it's important to note that `n` is unsigned; if it's signed the value of `-n % n` is 0, intuitively.
For unsigned n, the value is MAX - n + 1 (where MAX is the maximum representable value in the type of n, e.g., UINT_MAX). The article explains this nicely. (I thought of 2's complement when reasoning through this, but you don't actually need to assume 2's complement to follow the reasoning.)
So, `-n % n` computes `(MAX - n + 1) % n` efficiently, without needing to worry about corner cases.
I suspect this is useful when you want to generate random numbers with a limited range, where the range doesn't cleanly divide UINT_MAX. You need to cut a bit off the top from your underlying random number generator.
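That is exactly how the classic rejection trick uses it (OpenBSD's arc4random_uniform works this way). A minimal sketch, assuming int is no wider than 32 bits; fake_next is a made-up deterministic stand-in for a real generator:

```c
#include <stdint.h>

/* 2^32 % bound of the generator's outputs must be discarded so the
   remaining values split into equal-sized groups of `bound`. */
uint32_t bounded(uint32_t (*next)(void), uint32_t bound) {
    /* -bound wraps to 2^32 - bound, so this is 2^32 % bound. */
    uint32_t threshold = -bound % bound;
    uint32_t r;
    do {
        r = next();            /* reject values below the threshold */
    } while (r < threshold);
    return r % bound;          /* now unbiased */
}

/* Hypothetical deterministic "generator", just to exercise the code. */
static uint32_t counter;
static uint32_t fake_next(void) { return counter++; }
```

With bound = 7, the threshold is 2^32 % 7 = 4, so the first four outputs of fake_next are rejected.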
-n % n (where n is unsigned) suffers from the problem that -n calculates a two's complement. That value is implementation-defined, due to the implementation-defined width of the unsigned type.
It's calculating ((-n) mod (2^bits)) mod n, where bits is compiler/platform-dependent.
  ;; TXR Lisp
  1> (defun -n%n (n bits) (mod (mod (- n) (expt 2 bits)) n))
  -n%n
  2> (-n%n 7 8)
  4
  3> (-n%n 7 16)
  2
  4> (-n%n 7 17)
  4
  5> (-n%n 7 18)
  1
  6> (-n%n 7 32)
  4
  7> (-n%n 7 64)
  2
I would not use this, except possibly with a precise-width type like uint32_t.
> -n % n (where n is unsigned) suffers from the problem that -n calculates a two's complement. That value is implementation-defined, due to the implementation-defined width of the unsigned type.
In C a "computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." (C11 6.2.5p9) In other words, it behaves as-if it were two's complement. So the "problem" of assuming two's complement isn't actually a problem at all. (On machines that use signed-magnitude representation for integer types, for example, such as some only recently retired Unisys mainframes, C compilers actually emulate unsigned arithmetic in the generated code.)
While the number of value and padding bits of integer types (other than the fixed-width types) is implementation-defined, that dilemma is precisely what the construct -n % n is intended to deal with, and to do so in a type-safe manner (i.e. no risk of somebody changing the underlying types without changing UFOO_MAX to UBAR_MAX, presuming the macros even exist).
Your alternative construct just assumes "bits" out of thin air, but because the number of padding bits is implementation-defined in C, you can only deduce this from UFOO_MAX[1]. In the next C version there should be new macros that specify the number of value bits, but there's still the type-safety problem of the compiler being unable to enforce the constraint that a macro (or any other parameter, for that matter) represents a characteristic of the type your primary expression actually uses. This can be problematic even in languages where you can directly query properties of a type.
-n % n is well suited to its purpose. Elegant, even. And not just in the context of C.
[1] EDIT: Or, alternatively, (unsigned type)-1.
Woah, there's even a scary gotcha for types that aren't unsigned. E.g., check out what happens with "unsigned short":
https://gcc.godbolt.org/z/4xWo1G
It looks like -n is implicitly converted (to signed int, maybe?), so unless you explicitly cast it back to "unsigned short", you get something unexpected.
If n is unsigned short then -n means n is first promoted to either int, or unsigned int: the first of those two types which holds all values of unsigned short. Then the - operation is taking place on the resulting value in that promoted type.
An example where unsigned short promotes to unsigned int is a platform where sizeof(short) == sizeof(int): e.g., 16-bit systems where short and int are 16 bits and long is 32, such as compilers for the 8086.
If the promotion goes to int, the subsequent - is safe; there is no way the resulting value can be the problematic most negative int of two's complement.
On two's complement systems, even if the - is signed, if you convert the value back to unsigned short, you will get the two's complement value, as if the calculation was done in the unsigned type all along.
For instance a 16 bit unsigned short value of 65535 (0xFFFF) will go to 65535 of type int, which will go to -65535, which will then map to 1 if converted to unsigned short.
Right - I mean, I’m not confused about what’s going on, now that I’ve seen the result. But I do think it is surprising that `unsigned int` and `unsigned short` behave differently in this expression, even if I basically see the logic in the standard.
Well, it computes "(MAX - n + 1) % n", where MAX obviously depends on the type.
  let threshold = (0 &- range) % range
It is somewhat inconvenient that Swift forces us to write so much code, but we must admit that the result is probably less likely to confuse a good programmer.
Oh come on, it's basically just one extra character.
_warning C4146: unary minus operator applied to unsigned type, result still unsigned_
  #ifdef _MSVC
  x = ~x + 1; // "Manual" two's complement to avoid warning.
  #else
  x = -x;     // Regular two's complement any good C coder knows
  #endif
If you're going to go to the effort of writing the complement version instead of pragma-suppressing the warning on that line, just remove the ifdef and use that with all compilers.
Then I would look like someone who doesn't know that unary minus on an unsigned int obtains the two's complement. That aspect can be taken care of by a comment. Why I might go for the #ifdef is to avoid changing working code, except for that one compiler that is complaining.
This only works on two's complement machines. The C++ standard only started requiring two's complement with C++20.
That being said I am not sure I've ever programmed a one's complement machine.
>This only works on two's complement machines.
No it does not. The behavior of -n where n is an unsigned integer is defined by the standard to be the result of subtracting n from 2^(number of bits in the type), independent of the characteristics of the machine.
This is true of both C and C++, and has been true since at least C99 and C++03.
Random C++ standard draft from 2005:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n190...
Section 5.3.1
>The operand of the unary - operator shall have arithmetic or enumeration type and the result is the negation of its operand. Integral promotion is performed on integral or enumeration operands. The negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the promoted operand. The type of the result is the type of the promoted operand.
Random C standard draft from 2007:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
Section 6.2.5
>The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
I don’t read those extracts as implying the same result as you do. The result type (promoted type in the C++ case) will still be unsigned, yet will be in most cases (especially small ones) full of 1s where there were zeros, and won’t compute the expected modulus. Draw out the bit patterns of 1’s vs 2’s complement for (say) n=5 and n=65535 and see if I have it wrong.
Not sure what you're trying to say. The point of what I wrote, and what the article says, is that `-n` is equivalent to `2^N - n`, and thus `-n % n` is equivalent to `(2^N - n) % n`.
If 2^N - n happens to be a number that's "full of 1s where there were zeros", then that's what it's supposed to be. I don't understand why you think it "won't compute the expected modulus". That's exactly what it will do.
If N is 32 and n is 5, then -n is 4294967291 and -n % n is 1. It doesn't matter whether that was calculated on a ones complement or twos complement machine.
The ampersand (%) in this expression
s/ampersand/percent sign/
I read that too. And it was also where I decided to stop.
> And it was also where I decided to stop.
I can see why you might think this is a good heuristic---don't trust blog authors who don't know the right names for common typographic symbols---but in this case I think you went wrong. Daniel is an excellent computer scientist, with a talent for writing straightforward papers about difficult topics. He's also a non-native English speaker who is usually very gracious about accepting corrections to his normally excellent prose. In this case, it was probably just a typo. It's already been corrected---likely because of a comment on the post. But if you ignore everything that contains small temporary errors like this, you're probably missing out on a lot of otherwise good articles.
Well, in this instance your knee-jerk isn't helping you. Take a look at Dr. Lemire's list of papers:
https://lemire.me/en/#publications
There's a lot of high-performance machine code knowledge to be gained there.
It's zero in Python, which makes perfect sense to me. Quotient is negative, remainder is zero.
Because:
> When the variable range is a signed integer, then the expression -range % range is zero. In a programming language with only signed integers, like Java, this expression is always zero.
Not because of your intuition on the quotient. E.g. try -8 % 5, -8 % 6, and -8 % 7 and this feeling of perfect sense compared to the languages with unsigned types should melt away.
  #include <stdio.h>

  int main(void) {
      unsigned int n = 7;
      printf("%i", (-n % n));
      return 0;
  }
this gives 4 :(
Note that %i requires an _int_ argument, not _unsigned_; you want %u.
If your int is 32 bits wide, 4 is exactly what it should give.
Apart from the overall oddness of this, the result of applying the modulo operator to signed arguments is technically undefined behavior in C.
This isn't correct, section 6.5.5 states merely that "The operands of the % operator shall have integer type."
Ah, my bad, it seems this was the case for K&R C but the behavior was since standardized.