Strict APIs vs. Forgiving APIs

Author: ajabhish

Score: 53

Comments: 47

Date: 2020-10-28 14:14:00

________________________________________________________________________________

misterdata wrote at 2020-10-28 16:23:38:

Having used both stricter (Rust) and looser (JS) programming languages, I can add that stricter APIs have made me think more about edge cases than the less-strict APIs do. In the less-strict languages I often find myself debugging some edge case, whereas in the stricter languages I am more often required to think about edge cases up front (including those that I know will not be relevant, so in some cases this causes more work than necessary!). Still, I much prefer the stricter APIs.

Karellen wrote at 2020-10-28 17:01:33:

I like Rusty Russell's classic "Hard to Misuse Interface Levels". The first couple of items are

0: Impossible to get wrong/DWIM

1: Compiler/linker won't let you get it wrong

...through

9: Read the correct LKML thread and you'll get it right - SET_MODULE_OWNER

...all the way up to 17. I'll let you discover the rest yourself :-)

https://ozlabs.org/~rusty/ols-2003-keynote/img39.html

an_d_rew wrote at 2020-10-29 05:14:57:

pure gold

Joker_vD wrote at 2020-10-28 16:04:37:

I think that's a trick question. If "substring()" has "substring(int start, int end)" semantics, then I prefer the strict version: it must be 0 <= start < len(s), start <= end <= len(s), otherwise it'll throw an exception.

But if the semantics is "substring(int start, int length)", then I prefer the _partial_ forgiveness _for the length_ parameter: if start + length > len(s), then assume length was actually len(s) - start; but if length < 0, still throw an exception.
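A minimal sketch of that partially forgiving variant (the class and method names are hypothetical; whether start == len(s) should be allowed is a judgment call, and this sketch permits it, as Java's own substring does):

```java
// Hypothetical sketch of the partially forgiving substring(start, length):
// start and a negative length are checked strictly, but an overshooting
// length is clamped to the end of the string.
class PartialForgiveness {
    static String substring(String s, int start, int length) {
        if (start < 0 || start > s.length()) {
            throw new StringIndexOutOfBoundsException("start " + start);
        }
        if (length < 0) {
            throw new StringIndexOutOfBoundsException("length " + length);
        }
        // Forgive overshoot: treat length as len(s) - start.
        if (length > s.length() - start) {
            length = s.length() - start;
        }
        return s.substring(start, start + length);
    }
}
```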

gruez wrote at 2020-10-28 16:26:09:

Why should the out of bounds behavior be different depending on the semantics of the second parameter?

masklinn wrote at 2020-10-28 16:47:04:

Because if you want to slice to the end, you can just ask for the source's length, which is free in most languages (it's not in C or in list-based languages). With the "result length" version, the length either has to be computed from the collection length or it's a _constraint_. In the latter case especially, it's annoying to have to do weird computations on the length in order to clamp your existing constraint in cases where the result would already fit.

misterdata wrote at 2020-10-28 16:26:32:

Only if the stricter one really requires you to use it differently (e.g. because of checked exceptions). If not, you will have to guess, for each function, whether it is strict or not, which potentially causes even more trouble if you guess wrong.

Joker_vD wrote at 2020-10-29 00:56:27:

Well, that's one of the reasons why I sometimes find myself writing unit tests for third-party library functions: did I read the docs right? What about some obvious yet still undocumented edge cases? With particularly convoluted APIs, those tests sometimes actually end up staying in the project's tree.

aimor wrote at 2020-10-28 17:15:00:

In my experience, forgiving APIs don't play well together. I've inadvertently tried it and wound up having to make everything 'forgiving in the same way'. Otherwise, out-of-range inputs mean different things to different functions, and using multiple APIs requires tracking all the forgiving behavior.

z3t4 wrote at 2020-10-28 18:00:05:

It's not always binary. On the one hand, you want to throw an error at even the slightest problem. This will annoy users, but over time, as bugs and edge cases are fixed, it will lead to a very robust program.

On the other hand, you want to save time and work for the caller. You don't want the caller to need many lines of boilerplate just to set up the call.

thamer wrote at 2020-10-28 18:41:00:

For reference, this is what String.substring(int, int) does[1] in Java:

public String substring(int beginIndex, int endIndex) {
    int length = length();
    checkBoundsBeginEnd(beginIndex, endIndex, length);
    ...

Where checkBoundsBeginEnd[2] does:

static void checkBoundsBeginEnd(int begin, int end, int length) {
    if (begin < 0 || begin > end || end > length) {
        throw new StringIndexOutOfBoundsException(
            "begin " + begin + ", end " + end + ", length " + length);
    }
}

(as it should, in my opinion).

[1]

https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/f0ef2826d...

[2]

https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/f0ef2826d...

madmax108 wrote at 2020-10-28 17:10:28:

As someone who writes APIs for a living, this is something I find a lot of interesting division on:

Say you have an API that is documented to accept params A,B or C i.e /v1/api?A=1&B=2&C=3

What should happen if you pass a param D to it? i.e. /v1/api?A=1&B=2&C=3&D=4

The two most common schools of thought are:

1) Ignore D

2) Throw an error

Both present their own problems, especially when D may be closely related to A, B, and C. It's interesting how API design also tends to follow personal preferences for strictness or leniency.
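A sketch of school (2), rejecting anything outside the documented parameter set (the parameter names come from the example above; the validator itself is a hypothetical helper):

```java
// Hypothetical validator for the strict school: any query parameter
// outside the documented set {A, B, C} is rejected instead of being
// silently ignored, so a stray or misspelled parameter fails loudly.
import java.util.Map;
import java.util.Set;

class StrictParams {
    static final Set<String> KNOWN = Set.of("A", "B", "C");

    static void validate(Map<String, String> queryParams) {
        for (String name : queryParams.keySet()) {
            if (!KNOWN.contains(name)) {
                throw new IllegalArgumentException("unknown parameter: " + name);
            }
        }
    }
}
```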

rvnx wrote at 2020-10-28 17:30:31:

In the second case you may break other people's software because of your expectations of how the world should work.

For example, Twitter pages not working if there is a fbclid parameter.

selfhoster11 wrote at 2020-10-29 14:03:51:

FBCLID and UTM tags are their own kind of craziness. They shouldn't have been present in the first place.

taylodl wrote at 2020-10-28 18:04:29:

What's the old adage - be lenient in what you accept, be strict in what you produce? That ensures you can operate in a wide variety of scenarios with little to no interaction and coordination with the clients.

jcranmer wrote at 2020-10-28 19:48:59:

What if D is "always-use-utf8"? Or, in general, the option being requested is something that places much more stringent restrictions on semantics.

There are two main reasons that users might validly use unknown options. The first is that it's an option with valid semantics in different versions/implementations of the protocol. The other is that you're reusing an option block for many API calls, and some of the options may not be relevant for some of those APIs. For the first use case, erroring out (or at least warning) is usually the superior solution: you don't know what it means, so you can't guarantee that you'll implement it correctly. For the latter use case, ignoring can be a safe solution.

FWIW, there's been a pushback against Postel's Law (which is what the adage you cite is usually called) in more recent times. In particular, it should be emphasized that the law is generally most applicable when you're dealing with multiple interpretations of an ambiguous specification, and is least applicable when the standards are prescribing or proscribing particular behavior.

wvenable wrote at 2020-10-28 19:06:49:

The strict-vs-lenient question in this case also comes down to extensibility. HTML, for example, is notoriously forgiving, and HTML parsers are quite complex because of it, but that forgiveness allowed HTML to be extended and improved upon while maintaining backwards compatibility.

If the first web browser had been extremely strict about the tags and structure it supported, the web could not have evolved the way it did.

postalrat wrote at 2020-10-28 18:41:56:

This may be unrelated, but every time I see a timestamp as a string in JSON I wonder why. There is so much less to go wrong with a numeric Unix timestamp, yet 9 times out of 10 a developer will use a string.

If you are going to make things strict at least make it hard to mess up.

shhsshs wrote at 2020-10-28 23:37:32:

My 2 cents... For the vast majority of the world where legacy (or at least not-programmed-using-super-modern-standards) systems are still in play, clarity is the most important factor in software. What happens when a legacy system is actually returning the number of _seconds_ since epoch versus _milliseconds_? What if they're returning ticks (looking at you .NET...) because nobody realized ticks are inconsistent across different systems at the time the system was written?

I don't think you can get any more clear than an ISO timestamp (WITH time zone). Unix timestamps are great and efficient, but I personally value the absolute clarity of ISO above all else.

genidoi wrote at 2020-10-28 18:47:56:

Unix timestamps aren't human readable

postalrat wrote at 2020-10-28 21:18:46:

Use better tools.

mewpmewp2 wrote at 2020-10-29 08:32:37:

What do you use when you debug XHR in chrome?

Or what do you use to tail logs?

m463 wrote at 2020-10-28 18:49:54:

most logfiles beg to differ. (bleh!)

they're also... unix ... timestamps.

bartvk wrote at 2020-10-28 19:25:49:

An ISO date with timezone can be useful, if you would like to know the timezone when a particular bit of data was recorded.

crote wrote at 2020-10-28 19:32:11:

If you mean ISO 8601 and its "2007-04-05T12:30-02:00" timezone thingy, not really.

It doesn't store a timezone, but a UTC offset. Which is a problem because the UTC offset of a timezone may change.

This happens rarely enough that people do indeed use "ISO 8601 + offset" to store "timezoned" timestamps, but often enough that it'll probably end up corrupting your data sooner or later.

Joker_vD wrote at 2020-10-28 21:45:26:

Well, usually what you want is actually the UTC offset, not the timezone, because that's what you need to get the actual wall-clock timestamp of the event. If all you have is a timezone, then you need accurate historical tzdata _and_ you also need to be sure that the system had accurate tzdata itself when it produced the timestamp.

Of course, if you're storing the timestamp of a _future, planned_ event, then yes, you need a timezone. But for historical records, store the UTC offset that was in effect when the event happened.
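In java.time terms, this is roughly the distinction between OffsetDateTime (the offset in effect at the event) and ZonedDateTime (a named zone whose rules can change later). A small sketch of the historical-record case (the class name is hypothetical):

```java
// Parsing the "ISO 8601 + offset" form keeps the UTC offset that was in
// effect when the event happened; no tzdata lookup is needed to recover
// either the wall-clock time or the absolute instant.
import java.time.OffsetDateTime;

class HistoricalTimestamp {
    static OffsetDateTime parse(String iso) {
        return OffsetDateTime.parse(iso);
    }
}
```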

remus wrote at 2020-10-28 19:00:43:

2038 called, they'd like to talk.

postalrat wrote at 2020-10-28 21:23:09:

Javascript numbers (as in JSON) can safely represent integers much higher than a 32 bit integer.

Joker_vD wrote at 2020-10-28 21:50:43:

Ha! Almost. You can get slight rounding errors at nanosecond resolutions, so be aware of this if you're using Go's time.Time internally and converting it to/from float64 seconds since the Unix epoch.
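The boundary is easy to see: a double carries 53 bits of mantissa, so integer milliseconds (around 2^40 today) survive exactly, while float64 seconds near 1.6e9 are spaced a few hundred nanoseconds apart. A sketch of the loss (the sample value is arbitrary but representative):

```java
// Round-tripping nanoseconds-since-epoch through float64 *seconds* is
// lossy near the present day: doubles around 1.6e9 have a spacing of
// roughly 240 ns. Integer milliseconds are far below 2^53 and stay exact.
class TimestampPrecision {
    static long throughFloatSeconds(long epochNanos) {
        double seconds = epochNanos / 1e9; // lossy: only ~53 bits of mantissa
        return Math.round(seconds * 1e9);  // back to nanoseconds
    }
}
```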

megous wrote at 2020-10-28 16:00:23:

Something I read in the RFC for the email message format (or maybe elsewhere, but I have it associated with email for some reason) is: be strict in the output you produce, and lenient in the input you accept.

I think it makes some sense for data formats, if you want max compatibility of independently developed software that processes the data. Not sure about APIs though. Having them fail fast on unexpected input is pretty valuable.

chrismorgan wrote at 2020-10-28 16:48:54:

That’s Postel’s robustness principle. Many people now believe it was a harmful idea that actually achieves the _opposite_ of what it was intended to achieve.

https://tools.ietf.org/html/draft-iab-protocol-maintenance-0...

expounds. (I confess I rather liked the earlier name of the draft, postel-was-wrong.)

greggman3 wrote at 2020-10-28 19:11:38:

Plenty of people have also argued that being strict on input would be bad

https://web.archive.org/web/20060613193727/http://diveintoma...

https://friendlybit.com/html/why-xhtml-is-a-bad-idea/

chrismorgan wrote at 2020-10-28 19:46:10:

Those arguments are inapplicable here. They’re arguing against XHTML’s _particular form_ of strictness and its failure mode, when compared with HTML, given that both are available to you the web developer; they are not arguing against strictness itself. (Also: if browsers had _always_ rejected invalid documents, rather than having the HTML/XHTML split where one was liberal and one strict, I think we’d have had fewer problems, especially around injection bugs, and people would have been more careful about how they wrote their documents; but that’s fairly subjective.)

If you can choose whether your platform supports strict or forgiving behaviour, security interests will side with strict every time.

And indeed, HTML is no longer loose but rather strict in this protocol sense that we’re talking about, as it defines how _all_ inputs should be parsed, leaving no scope for being liberal in what you accept. (It may surprise people, but among specs of at least moderate complexity, HTML is by far the strictest out there that I know of. I wish more specs were as strict. Actually, JavaScript is probably fairly close.)

Also from the links of dissenting opinions at the start of the _dive into mark_ link,

https://web.archive.org/web/20060616150034/http://bitworking...

has a good discussion of just where Postel’s law may seem most reasonable to be applicable and inapplicable, with its two-axis (text–binary, data–language) diagram. It’s worth reading and contemplating.

showerst wrote at 2020-10-28 16:03:01:

I used to be a big fan of 'strict send, lenient accept', but looking at systems built on it (HTML, email), I think being lenient about format deviation just turns everything into a big mess.

Granted those are two of the most successful technologies ever, so perhaps it was the right call =).

megous wrote at 2020-10-28 19:10:32:

But html and email are also incredibly successful... so hmm?

tynorf wrote at 2020-10-28 16:08:02:

I believe being lenient in accepting input is what leads to SSRF attacks (HTTP request smuggling via disagreeing `transfer-encoding` and `content-length` headers).

CharlesW wrote at 2020-10-28 15:57:58:

It seems like "forgiving with warnings" should be another option listed here. This also lets API creators do neat things like warn people ahead of time of future deprecations, etc.

dicroce wrote at 2020-10-28 15:53:36:

If I had to choose I'd say prefer strict... But reading this made me wonder: what if you could be strict during development but switch to forgiving for deployment?

avianlyric wrote at 2020-10-28 17:32:05:

> what if you could be strict during development but switch to forgiving for deployment?

That sounds like a debugging nightmare, where your production system is subtly different in behaviour from your dev system.

Also how do you know that the forgiving behaviour is correct in production? Maybe it prevents a couple of scary error messages, but it could just as equally allow incorrect inputs to be processed and stored, creating a data cleanup nightmare later.

Vinnl wrote at 2020-10-28 16:44:37:

That sounds like the modern JavaScript toolchain. During development, run a linter to make sure you're not using error-prone or outdated functionality, run a type checker to make sure you're passing sane data around, but when running in the browser, don't blow up in the user's face.

4thwaywastrel wrote at 2020-10-29 04:09:49:

This is what we do, but we warn rather than just swallowing it. I find it the best of both worlds.

gpderetta wrote at 2020-10-28 17:35:29:

that's the idea behind assert in C/C++.

RocketSyntax wrote at 2020-10-28 18:17:44:

if it accepts a list of items ([]) but only a single string ("") is provided, accept it
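That kind of forgiveness is easiest to keep consistent when it lives in one coercion helper that every endpoint shares, so everything is forgiving in the same way. A hypothetical sketch:

```java
// Hypothetical coercion helper: accept either a bare string or a list of
// strings, and normalise both to a list so downstream code sees one shape.
import java.util.List;

class Coerce {
    @SuppressWarnings("unchecked")
    static List<String> asStringList(Object value) {
        if (value instanceof String) {
            return List.of((String) value); // forgive: wrap the scalar
        }
        if (value instanceof List) {
            return (List<String>) value;
        }
        throw new IllegalArgumentException("expected a string or a list of strings");
    }
}
```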

ajabhish wrote at 2020-10-28 14:14:00:

What is preferred by API consumers?

CharlesW wrote at 2020-10-28 15:54:57:

The author's opinion in TFA is "strict". That's my preference as well, for the same reasons they state.

m463 wrote at 2020-10-28 18:52:12:

obligatory link to Postel's law[1] and criticism

https://en.wikipedia.org/wiki/Robustness_principle

kstenerud wrote at 2020-10-28 16:24:27:

Suppose it's the early 1990's and you're James Gosling implementing String.substring(int, int) for the first time. What should happen when the index arguments are out-of-range? Should these tests pass? Or throw?

It depends entirely on what the rules of the API are. Function signatures in most languages lack any form of compiler enforcement of rules, so you must implement them in code, and then list the rules in the function's description. The strictness you apply doesn't matter as much as your description of what argument range is allowed, and how the behaviour is affected.

For example, substring could allow overshoot with the description "If the substring would go beyond the end of the string, the remainder of the string is returned".

What you should be concentrating on is the 80% use case of your API. What will 80% of your users need? If the lack of length overshoot support would be cumbersome to the 80%, you support overshoot. If it's useless to the 80%, you leave it out. You can also implement things as layered APIs, with more general lower level functions, and then higher level functions that are more strict. Then the 20% can use the lower level functions for their esoteric use cases, and the 80% can stick to your easy-to-use and hard-to-screw-up high level API.
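The layered idea might look like the sketch below (all names are hypothetical): a forgiving low-level primitive whose clamping is part of its documented contract, plus a strict high-level wrapper for the callers who want overshoot to be an error.

```java
// Layered API sketch: "take" forgives length overshoot per its doc comment
// ("if the substring would go beyond the end, the remainder is returned"),
// while "takeExact" is the strict version built on top of it.
class LayeredSubstring {
    /** Forgiving: an overshooting length returns the remainder of the string. */
    static String take(String s, int start, int length) {
        if (start < 0 || start > s.length() || length < 0) {
            throw new StringIndexOutOfBoundsException(
                "start " + start + ", length " + length);
        }
        int end = (length > s.length() - start) ? s.length() : start + length;
        return s.substring(start, end);
    }

    /** Strict: any request past the end of the string is an error. */
    static String takeExact(String s, int start, int length) {
        if (length > s.length() - start) {
            throw new StringIndexOutOfBoundsException(
                "length " + length + " overshoots the end of the string");
        }
        return take(s, start, length);
    }
}
```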