Having used both stricter (Rust) and looser (JS) programming languages, I can add that stricter APIs have made me think more about edge cases than less-strict APIs do. In the less-strict languages I often find myself debugging some edge case after the fact, whereas in the stricter languages I am required to think about edge cases up front (including those I know will not be relevant, so in some cases this causes more work than necessary!). Still, I much prefer the stricter APIs.
I like Rusty Russell's classic "Hard to Misuse Interface Levels". The first couple of items are
0: Impossible to get wrong/DWIM
1: Compiler/linker won't let you get it wrong
...through
9: Read the correct LKML thread and you'll get it right - SET_MODULE_OWNER
...all the way up to 17. I'll let you discover the rest yourself :-)
https://ozlabs.org/~rusty/ols-2003-keynote/img39.html
pure gold
I think that's a trick question. If "substring()" has "substring(int start, int end)" semantics, then I prefer the strict version: the arguments must satisfy 0 <= start < len(s) and start <= end <= len(s); otherwise it should throw an exception.
But if the semantics is "substring(int start, int length)", then I prefer _partial_ forgiveness _for the length_ parameter: if start + length > len(s), assume length was actually len(s) - start; but if length < 0, still throw an exception.
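For illustration, a minimal Java sketch of that partially forgiving (start, length) variant (the helper name is hypothetical; the strict (start, end) check is exactly what the JDK code quoted further down does):

    // Partial forgiveness for (start, length): overshoot past the end
    // of the string is clamped, but a bad start or a negative length
    // still throws.
    static String substringLenientLength(String s, int start, int length) {
        if (start < 0 || start > s.length() || length < 0)
            throw new StringIndexOutOfBoundsException(
                "start " + start + ", length " + length);
        // Long arithmetic so start + length cannot overflow int.
        int end = (int) Math.min((long) start + length, s.length());
        return s.substring(start, end);
    }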
Why should the out of bounds behavior be different depending on the semantics of the second parameter?
Because if you want to slice to the end, with (start, end) semantics you can just pass the source's length, which is free to obtain in most languages (though not in C or in list-based languages). With the "result length" version, that end position either has to be computed from the collection length or it's a _constraint_. In the latter case especially, it's annoying to have to do awkward arithmetic on the length just to clamp your existing constraint in case the result would already fit.
Only if the stricter one really requires you to use it differently (e.g. because of checked exceptions). If not, you will have to guess for each function whether it is strict or not, which potentially causes even more trouble if you guess wrong.
Well, that's one of the reasons I sometimes find myself writing unit tests for third-party library functions: did I read the docs right? What about some obvious yet still undocumented edge cases? With particularly convoluted APIs those tests sometimes end up staying in the project's tree.
In my experience, forgiving APIs don't play well together. I've inadvertently tried it and wound up having to make everything 'forgiving in the same way'. Otherwise out-of-range inputs mean different things to different functions, and using multiple APIs requires tracking all the forgiving behavior.
It's not always binary. On one hand, you want to throw an error at even the slightest irregularity. This will annoy users, but over time, as bugs and edge cases are fixed, it will lead to a very robust program.
Meanwhile, you want to save time and work for the caller: you don't want the caller to need many lines of boilerplate just to set up the call.
For reference, this is what String.substring(int, int) does[1] in Java:
    public String substring(int beginIndex, int endIndex) {
        int length = length();
        checkBoundsBeginEnd(beginIndex, endIndex, length);
        ...
Where checkBoundsBeginEnd[2] does:
    static void checkBoundsBeginEnd(int begin, int end, int length) {
        if (begin < 0 || begin > end || end > length) {
            throw new StringIndexOutOfBoundsException(
                "begin " + begin + ", end " + end + ", length " + length);
        }
    }
(as it should, in my opinion).
[1]
https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/f0ef2826d...
[2]
https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/f0ef2826d...
As someone who writes APIs for a living, this is a question I find a lot of division on:
Say you have an API that is documented to accept params A, B, or C, i.e. /v1/api?A=1&B=2&C=3
What should happen if you pass a param D to it, i.e. /v1/api?A=1&B=2&C=3&D=4?
The two most common schools of thought are:
1) Ignore D
2) Throw an error
Both present their own problems, especially when D may be closely related to A, B, and C. It's interesting how API design also tends to follow personal preferences for strictness or leniency.
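As a rough sketch of option 2 in Java (hypothetical endpoint and helper, not tied to any framework):

    import java.util.Map;
    import java.util.Set;

    final class QueryParams {
        // The parameters this endpoint documents.
        private static final Set<String> ALLOWED = Set.of("A", "B", "C");

        // Strict variant: reject any parameter we don't know about
        // instead of silently ignoring it.
        static void validate(Map<String, String> params) {
            for (String name : params.keySet()) {
                if (!ALLOWED.contains(name)) {
                    throw new IllegalArgumentException(
                        "Unknown query parameter: " + name);
                }
            }
        }
    }

Option 1 is simply the absence of this check, which is exactly why the choice tends to come down to taste.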
In the second case you may break other people's software because of your expectations about how the world should work.
For example, Twitter pages not working if there is a fbclid parameter.
FBCLID and UTM tags are their own kind of craziness. They shouldn't have been present in the first place.
What's the old adage - be lenient in what you accept, be strict in what you produce? That ensures you can operate in a wide variety of scenarios with little to no interaction and coordination with the clients.
What if D is "always-use-utf8"? Or, in general, what if the option being requested is something that places much more stringent restrictions on the semantics?
There are two main reasons that users might validly pass unknown options. The first is that the option has valid semantics in different versions/implementations of the protocol. The other is that you're reusing an option block for many API calls, and some of the options may not be relevant for some of those APIs. For the first use case, erroring out (or at least warning) is usually the superior solution: you don't know what the option means, so you can't guarantee that you'll handle it correctly. For the latter use case, ignoring can be a safe solution.
FWIW, there's been a pushback against Postel's Law (which is what the adage you cite is usually called) in more recent times. In particular, it should be emphasized that the law is generally most applicable when you're dealing with multiple interpretations of an ambiguous specification, and is least applicable when the standards are prescribing or proscribing particular behavior.
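One way to cover both cases at once is a "must-understand" flag of the kind some protocol designs use (a sketch; the names here are hypothetical):

    import java.util.Set;

    record Option(String name, boolean critical) {}

    class OptionHandler {
        // Unknown options are ignored unless the sender marked them
        // critical, in which case we must refuse rather than guess.
        static void apply(Iterable<Option> options, Set<String> understood) {
            for (Option opt : options) {
                if (understood.contains(opt.name())) {
                    // ... apply the option ...
                } else if (opt.critical()) {
                    throw new UnsupportedOperationException(
                        "Critical option not understood: " + opt.name());
                }
                // Non-critical unknown options are skipped.
            }
        }
    }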
The strict vs. lenient question in this case also comes down to extensibility. HTML, for example, is notoriously forgiving, and HTML parsers are quite complex because of it, but that forgiveness allowed HTML to be extended and improved upon while maintaining backwards compatibility.
If the first web browser had been extremely strict about the tags and structure it supported, then the web could not have evolved the way that it did.
This may be unrelated, but every time I see a timestamp as a string in JSON I wonder why. There is so much less to go wrong with a numeric Unix timestamp, yet 9 times out of 10 a developer will use a string.
If you are going to make things strict, at least make it hard to mess up.
My 2 cents... For the vast majority of the world, where legacy (or at least not-programmed-to-super-modern-standards) systems are still in play, clarity is the most important factor in software. What happens when a legacy system is actually returning the number of _seconds_ since the epoch versus _milliseconds_? What if it's returning ticks (looking at you, .NET...) because nobody realized ticks are inconsistent across different systems at the time the system was written?
I don't think you can get any clearer than an ISO timestamp (WITH time zone). Unix timestamps are great and efficient, but I personally value the absolute clarity of ISO above all else.
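A quick java.time illustration of that ambiguity (the numeric value is made up):

    import java.time.Instant;

    public class TimestampDemo {
        public static void main(String[] args) {
            long raw = 1604102029L;
            // Interpreted as seconds since the epoch:
            System.out.println(Instant.ofEpochSecond(raw)); // 2020-10-30T23:53:49Z
            // The same number interpreted as milliseconds:
            System.out.println(Instant.ofEpochMilli(raw));  // 1970-01-19T13:35:02.029Z
            // An ISO 8601 string parses to exactly one instant, no guessing:
            System.out.println(Instant.parse("2020-10-30T23:53:49Z"));
        }
    }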
Unix timestamps aren't human readable
Use better tools.
What do you use when you debug XHR in chrome?
Or what do you use to tail logs?
most logfiles beg to differ. (bleh!)
they're also... unix ... timestamps.
An ISO date with timezone can be useful, if you would like to know the timezone when a particular bit of data was recorded.
If you mean ISO 8601 and its "2007-04-05T12:30-02:00" timezone thingy, not really.
It doesn't store a timezone, but a UTC offset. Which is a problem because the UTC offset of a timezone may change.
This happens rarely enough that people do indeed use "ISO 8601 + offset" to store "timezoned" timestamps, but often enough that it'll probably end up corrupting your data sooner or later.
Well, usually what you want is actually the UTC offset, not the timezone, because that's what you need to get the actual wall-clock timestamp of the event. If all you have is a timezone, then you need accurate historical tzdata, _and_ you also need to be sure that the system had accurate tzdata itself when it produced the timestamp.
Of course, if you're storing the timestamp of a _future, planned_ event, then yes, you need a timezone. But for historical records, store the UTC offset that was in effect when the event happened.
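In java.time terms, that's the difference between OffsetDateTime (records the offset in effect; good for historical records) and ZonedDateTime (records a named zone whose rules can change; needed for future events). A sketch, with output assuming reasonably recent tzdata:

    import java.time.LocalDateTime;
    import java.time.OffsetDateTime;
    import java.time.ZoneId;
    import java.time.ZoneOffset;
    import java.time.ZonedDateTime;

    public class OffsetVsZone {
        public static void main(String[] args) {
            LocalDateTime wall = LocalDateTime.of(2020, 10, 30, 12, 30);

            // Fixed offset: pins the event to the UTC timeline forever.
            OffsetDateTime recorded = wall.atOffset(ZoneOffset.ofHours(-2));
            System.out.println(recorded); // 2020-10-30T12:30-02:00

            // Named zone: the offset is resolved from tzdata rules,
            // which can change (e.g. a country abolishing DST).
            ZonedDateTime planned = wall.atZone(ZoneId.of("America/Sao_Paulo"));
            System.out.println(planned); // 2020-10-30T12:30-03:00[America/Sao_Paulo]
        }
    }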
2038 called, they'd like to talk.
JavaScript numbers (as in JSON) can safely represent integers much larger than a 32-bit integer.
Ha! Almost. You can get slight rounding errors at nanosecond resolution, so be aware of this if you're using Go's time.Time internally and converting it to/from float64 seconds since the Unix epoch.
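The arithmetic behind both caveats: a double has a 53-bit mantissa, so integers are exact only up to 2^53 = 9007199254740992 (comfortably above millisecond timestamps, hence the grandparent's point), while seconds-since-epoch around 1.6e9 leave only about 2^-21 s, roughly half a microsecond, of resolution, hence the nanosecond rounding. A quick check:

    public class Float64Precision {
        public static void main(String[] args) {
            // ~19 significant digits needed; a double keeps ~15-16.
            double seconds = 1_604_102_029.123456789;
            // The trailing nanosecond digits are not preserved exactly.
            System.out.printf("%.9f%n", seconds);

            // Integer milliseconds stay far below 2^53 and are exact.
            System.out.println(1L << 53); // 9007199254740992
        }
    }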
Something I read in the RFC for the email message format (or maybe elsewhere, but I have it associated with email for some reason) is: be strict in the output you produce, and lenient in the input you accept.
I think it makes sense for data formats, if you want maximum compatibility between independently developed software that processes the data. Not sure about APIs, though. Having them fail fast on unexpected input is pretty valuable.
That’s Postel’s robustness principle. Many people now believe it was a harmful idea that actually achieves the _opposite_ of what it was intended to achieve.
https://tools.ietf.org/html/draft-iab-protocol-maintenance-0...
expounds. (I confess I rather liked the earlier name of the draft, postel-was-wrong.)
Plenty of people have also argued that being strict on input would be bad
https://web.archive.org/web/20060613193727/http://diveintoma...
https://friendlybit.com/html/why-xhtml-is-a-bad-idea/
Those arguments are inapplicable here. They’re arguing against XHTML’s _particular form_ of strictness and its failure mode, when compared with HTML, given that both are available to you the web developer; they are not arguing against strictness itself. (Also: if browsers had _always_ rejected invalid documents, rather than having the HTML/XHTML split where one was liberal and one strict, I think we’d have had fewer problems, especially around injection bugs, and people would have been more careful about how they wrote their documents; but that’s fairly subjective.)
If you can choose whether your platform supports strict or forgiving behaviour, security interests will side with strict every time.
And indeed, HTML is no longer loose but rather strict in this protocol sense that we’re talking about, as it defines how _all_ inputs should be parsed, leaving no scope for being liberal in what you accept. (It may surprise people, but among specs of at least moderate complexity, HTML is by far the strictest out there that I know of. I wish more specs were as strict. Actually, JavaScript is probably fairly close.)
Also from the links of dissenting opinions at the start of the _dive into mark_ link,
https://web.archive.org/web/20060616150034/http://bitworking...
has a good discussion of just where Postel’s law may seem most reasonable to be applicable and inapplicable, with its two-axis (text–binary, data–language) diagram. It’s worth reading and contemplating.
I used to be a big fan of 'strict send, lenient accept', but looking at systems built on it (HTML, email), I think being lenient around format deviation just turns everything into a big mess.
Granted those are two of the most successful technologies ever, so perhaps it was the right call =).
But html and email are also incredibly successful... so hmm?
I believe being lenient in accepting input is what leads to attacks like HTTP request smuggling (via disagreeing `transfer-encoding` and `content-length` headers).
It seems like "forgiving with warnings" should be another option listed here. It also lets API creators do neat things like warn people ahead of time about future deprecations, etc.
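A sketch of what that might look like for the substring example (hypothetical method; warn-and-clamp rather than throw-or-swallow):

    import java.util.logging.Logger;

    public class LenientSubstring {
        private static final Logger log = Logger.getLogger("api");

        static String substring(String s, int begin, int end) {
            int b = Math.max(0, Math.min(begin, s.length()));
            int e = Math.max(b, Math.min(end, s.length()));
            if (b != begin || e != end) {
                // Forgive the caller, but leave a trail for them.
                log.warning("substring(" + begin + ", " + end + ") clamped to ("
                    + b + ", " + e + ") for length " + s.length());
            }
            return s.substring(b, e);
        }
    }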
If I had to choose I'd say prefer strict... But reading this made me wonder: what if you could be strict during development but switch to forgiving for deployment?
> what if you could be strict during development but switch to forgiving for deployment?
That sounds like a debugging nightmare, where your production system is subtly different in behaviour from your dev system.
Also, how do you know that the forgiving behaviour is correct in production? Maybe it prevents a couple of scary error messages, but it could just as easily allow incorrect inputs to be processed and stored, creating a data cleanup nightmare later.
That sounds like the modern JavaScript toolchain. During development, run a linter to make sure you're not using error-prone or outdated functionality, run a type checker to make sure you're passing sane data around, but when running in the browser, don't blow up in the user's face.
This is what we do, but we warn rather than just swallow it. I find it the best of both.
That's the idea behind assert in C/C++.
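Java has the same facility: assert statements run only when assertions are enabled (java -ea), which is typically dev/test, and are skipped in a default production run:

    public class Assertions {
        static String firstChars(String s, int n) {
            // Strict in development: fails loudly under -ea.
            assert n >= 0 && n <= s.length() : "bad n: " + n;
            // Forgiving in production: clamp and carry on.
            return s.substring(0, Math.min(Math.max(n, 0), s.length()));
        }

        public static void main(String[] args) {
            // Prints "hello" without -ea; throws AssertionError with -ea.
            System.out.println(firstChars("hello", 99));
        }
    }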
If it accepts a list of items [] but only a single string "" is provided, accept it.
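For example (assuming the Jackson JSON library), that exact leniency exists as an opt-in deserialization feature:

    import java.util.List;
    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.DeserializationFeature;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class SingleAsArray {
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper()
                // Accept a bare value where a JSON array was expected.
                .enable(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY);
            List<String> items = mapper.readValue(
                "\"just-one\"", new TypeReference<List<String>>() {});
            System.out.println(items); // [just-one]
        }
    }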
What is preferred by API consumers?
The author's opinion in TFA is "strict". That's my preference as well, for the same reasons they state.
Obligatory link to Postel's law[1] and criticism:
https://en.wikipedia.org/wiki/Robustness_principle
Suppose it's the early 1990s and you're James Gosling implementing String.substring(int, int) for the first time. What should happen when the index arguments are out of range? Should these tests pass? Or throw?
It depends entirely on what the rules of the API are. Function signatures in most languages lack any form of compiler enforcement of rules, so you must implement them in code, and then list the rules in the function's description. The strictness you apply doesn't matter as much as your description of what argument range is allowed, and how the behaviour is affected.
For example, substring could allow overshoot with the description "If the substring would go beyond the end of the string, the remainder of the string is returned".
What you should be concentrating on is the 80% use case of your API. What will 80% of your users need? If the lack of length overshoot support would be cumbersome to the 80%, you support overshoot. If it's useless to the 80%, you leave it out. You can also implement things as layered APIs, with more general lower level functions, and then higher level functions that are more strict. Then the 20% can use the lower level functions for their esoteric use cases, and the 80% can stick to your easy-to-use and hard-to-screw-up high level API.
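A compact sketch of that layering for the substring case (names hypothetical):

    public final class Slices {
        private Slices() {}

        // Lower-level, general-purpose: documented as "if the substring
        // would go beyond the end of the string, the remainder of the
        // string is returned".
        public static String sliceToEnd(String s, int start, int length) {
            if (start < 0 || start > s.length() || length < 0)
                throw new IllegalArgumentException(
                    "start " + start + ", length " + length);
            int end = (int) Math.min((long) start + length, s.length());
            return s.substring(start, end);
        }

        // Higher-level, strict: the 80% case, hard to misuse.
        public static String slice(String s, int start, int end) {
            if (start < 0 || start > end || end > s.length())
                throw new StringIndexOutOfBoundsException(
                    "start " + start + ", end " + end + ", length " + s.length());
            return s.substring(start, end);
        }
    }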