Why Typing Erlang is Hard: Standard Erlang

Author: leostera

Score: 125

Comments: 43

Date: 2020-11-06 12:10:33

Web Link

________________________________________________________________________________

dnautics wrote at 2020-11-06 15:38:30:

I'm currently actively working on a typechecking system for elixir (three packages:

https://github.com/ityonemo/selectrix

for compile-time interface, ../mavis for type logic, and ../mavis_inference for performing type inference on compiled code).

The chief difficulties I am running into are:

1) How to do intersections on functions and maps (this is in the article):

f(integer) -> integer is not usable as f(any) -> integer

2) not all of the erlang bytecode instructions are well documented

3) I still haven't figured out how to deal with circular dependencies at the module level.

Erlang makes this easy in these ways:

1) compiled module structure is simple and the way branching works makes the mechanics of type inference trivial.

2) erlangs concurrency makes dealing with, for example in-module dependencies trivial

3) there are only a handful of types in erlang so, aside from functions, their subtyping relationships are very straightforward.

Elixir makes this easy in these ways:

1) Typechecking structs can be more straightforward than records.

2) Writing a composable type logic (mavis) is super easy with elixir's protocols

3) I built (and am refining) a very easy dsl that makes creating and testing logic for the beam opcodes less of a hair-tearing experience:

https://github.com/ityonemo/mavis_inference/blob/main/lib/ty...

FrankBro wrote at 2020-11-06 17:09:38:

I'm going to shill Gradualizer[1] in here as well.

1. I think you have to see typing as helpful information. Like:

-spec add(integer(), integer()) -> integer().

add(A, B) -> A + B.

Technically, nothing prevents add from accepting number() or float(). It's too much specification.

2. I was surprised how much there actually is but also disappointed that a lot of errors were in until OTP 23. Gradualizer also overrides some definitions [2]

3. Haven't encountered a problem with this yet with gradualizer, would need to test.

[1]

https://github.com/josefs/Gradualizer

[2]

https://github.com/josefs/Gradualizer/blob/master/priv/prelu...

dnautics wrote at 2020-11-06 17:27:22:

Yeah, I had thought that development of Gradualizer had halted, glad to see it's not. Part of why I wanted to do Mavis is because I wanted to make it easy for developers to make their own custom overrides and ship it with their libraries, especially for Elixir protocols[0]. still haven't figured out how to architect that, but I have an idea.

> I think you have to see typing as helpful information

Indeed. Selectrix will allow you to chain your inference with fallbacks, so, maybe the first level is a fast cache, second level is checking files for user overrides, third level is dialyzer specs, fourth level is inference.

I think it should also be possible to create an addon module with Mavis that lets you type inbound messages for GenServers, given a set of constraints (all possible calls and casts are present in the module itself as an API)

[0] it's a big problem that in elixir, Enumerable protocol destroys typing information, so I want to make an typing override for Enumerable an "example" typing override system.

FrankBro wrote at 2020-11-06 18:32:43:

I think it did partially stopped. I'm the only one committing right now but some of the original devs will review PRs and are EXTREMELY helpful. Great bunch.

dmix wrote at 2020-11-06 18:51:55:

The sytnax and integration look great here, I like it.

Although not entirely the same Dialyzer always seemed to be strange and quirky in its design choices - in the same way other parts of Erlang syntax can be a bit odd sometimes. The kind of thing I assume inspired Elixir in the first place (although in general I like Erlang as a language, it’s small enough and my complaints are few - it just feels syntactically quirky sometimes).

But I never felt inspired to use Dialyzer largely for these reasons. Never seemed to “fit” the syntax entirely and made it quite ugly IMO.

Which is why I’m happy to see these modern attempts, they look much better.

Good luck with the project. It should be an excellent learning process regardless.

layer8 wrote at 2020-11-06 23:54:36:

> 3) I still haven't figured out how to deal with circular dependencies at the module level.

That is usually dealt with by distinguishing between interface and implementation (cf. dependency inversion principle), which should eliminate the circularity. Disclaimer: I know very little about Erlang and Elixir.

dnautics wrote at 2020-11-07 04:18:41:

Totally different.

This the question that needs resolution:

    f = (x) => (...) ? Foo : g(x)

    g = (x) => (...) ? Bar : f(x)

Obviously the type of the result of f should be Foo | Bar but figuring that out is not trivial because there is a circular dependency.

Joker_vD wrote at 2020-11-06 14:35:41:

Interestingly enough, the article doesn't say anything about propagation of types, which is the crux of the type checking and what makes any typing system useful or useless, obnoxious or helpful. Consider the example from the article:

% a good old list length counter
    count(A=[_|_]) -> count(A, 0).

    count([], C) -> C;
    count([_|T], C) -> count(T, C+1).

What is the type of count/2? Arguably, it's "(list, any | number) -> any | number", which means that count/1 has type "list -> any | number". Well, that's bloody pointless, isn't it? What you probably want is "(list, any) -> any | (list, number) -> number" for count/2, so you could give count/1 its useful type "list -> number", but intersection function types are notoriously difficult to infer or typecheck, even with manual annotations.

dnautics wrote at 2020-11-06 15:47:29:

If you drill down to the beam opcodes it should be possible to infer correctly that the type of count/2 can only take number in the second argument.

Joker_vD wrote at 2020-11-06 16:03:10:

No, count/2 can sometimes take non-numbers in the second argument, behold:

    1> c(test_count).
    {ok,test_count}
    2> test_count:count([], bang).
    bang

yellowapple wrote at 2020-11-06 19:24:01:

I'd imagine that in this context (i.e. one where a 1-arity function initializes a tail-recursive 2-arity function) the 1-arity function is the only one that'd be exposed outside of the module, in which case you can know with reasonable certainty that only things within the current module will ever call it.

If you really do need to export the 2-arity version, you could always do a meaningless addition to force it to typecheck to a number, like so:

      count([], C) -> C+0;
    count([_|T], C) -> count(T, C+1);

Which should then result in the following:

      erl> c(test_count).
    {ok,test_count}
    erl> test_count:count([], bang).
    ** exception error: an error occurred when evaluating an arithmetic expression
         in function  test_count:count/2 (test_count.erl, line 8)

Or alternately, use guards (though in the context of BEAM-level introspection I haven't the slightest idea if this would produce an equivalent check):

      count([],    C) when is_integer(C) -> C;
    count([_|T], C) when is_integer(C) -> count(T, C+1);

Which then gets you a somewhat clearer error message:

      10> c(test_count).
    {ok,test_count}
    11> test_count:count([], bang).
    ** exception error: no function clause matching test_count:count([],bang) (test_count.erl, line 8)

Joker_vD wrote at 2020-11-06 20:26:09:

Yes, if you look at my reply to the sibling comment, you'll see that if count/2 is unexported, TypEr (apparently, that's the proper capitalization) infers the types with non_negative_integer().

In fact, I've just looked at the original TypEr paper [0], and wow, they use exactly this example in the section 6: "One might even jump to the conclusion that success typings are unnecessarily general and, as such, quite useless. However, notice that this success typing succinctly captures all possible applications of length_3 which will not result in a type error. [...] Still, we also find the situation sub-optimal and we will improve on it as explained below", and then they use exactly your proposal, recognizing that non-exported functions are only called from the exported functions, and perform dataflow analysis to constrain the types even further.

[0] Tobias Lindahl and Konstantinos Sagonas, "Practical type inference based on success typings",

https://www.it.uu.se/research/group/hipe/dialyzer/publicatio...

dnautics wrote at 2020-11-06 16:09:54:

Yeah, I mean that it will properly identify that the only time you can do that is when you have an empty list. The final spec will be (in elixir syntax):

      (nonempty_list, number -> number) | ([], any -> any)

It should also be able to correctly infer that the count/1 function must be

      (nonempty_list -> pos_integer) | ([] -> 0)

Joker_vD wrote at 2020-11-06 16:33:38:

I am using an old version of erlang, so typer infers the following types:

        -spec count([any()],_) -> any().
    -spec count([any(),...]) -> any().

If I unexport count/2, it infers

        -spec count([any(),...]) -> non_neg_integer().
    -spec count([any()],non_neg_integer()) -> non_neg_integer().

dnautics wrote at 2020-11-06 16:57:51:

I did say "should be able to" without regard to any of the existing type inference engines =D I'll be sure to make this as explicit test in Mavis Inference, which is the one I'm working on.

https://github.com/ityonemo/mavis_inference/issues/47

Snitch-Thursday wrote at 2020-11-06 15:49:12:

I'm watching Gleam[1] for my 'I want to learn Erlang but dont want to deal with having to do runtime type checks so I wish there was a Statically-Typed Erlang' dreams.

[1]

https://gleam.run/news/gleam-v0.12-and-gleam-otp-v0.1-releas...

mastermind1014 wrote at 2020-11-06 18:17:53:

Check out Caramel [1] as well -

[1]

https://github.com/AbstractMachinesLab/caramel

dnautics wrote at 2020-11-06 16:13:31:

I think that devon estes has a point that it is likely that trying to graft a foreign typesytem onto the BEAM is not a good idea.

I'm glad someone is trying, though.

modernerd wrote at 2020-11-06 19:34:19:

I think the tweet was deleted but it's cached here, for others like me who had not seen Devon's take:

https://webcache.googleusercontent.com/search?q=cache:JkQRku...

flubert wrote at 2020-11-06 23:11:10:

Maybe also add Idris?

https://github.com/chrrasmussen/Idris2-Erlang

noxplode wrote at 2020-11-06 18:38:01:

Just in case anyone here missed or hasn't seen this news: Facebook is working on a statically typed Erlang.

https://twitter.com/robertoaloi/status/1304085677252935681

ketzo wrote at 2020-11-06 19:49:15:

Wow, that's really exciting. I spent some time over this summer working on a toy project in Elixir and _really_ enjoyed the language, but the bigger it got the more I felt the lack of all my familiar TypeScript tools. I look forward to seeing this soon.

dmix wrote at 2020-11-06 18:10:17:

all sorts of pattern matching

Outside of the famous builtin process model/BEAM (plus the subsequent ‘crash first’ style design of programs which adds resiliency) the pattern matching is what makes Erlang one of the best languages and the one thing I wish every new language copied.

You kinda get it in Haskell and Rust with the maybe style pattern matching. Rust in particular makes good use of it.

But it’s one thing I miss the most not writing Erlang/Elixir for a living... so far. It makes functions and data handling into these clean trees instead of conditionals (not to start a language flamewar but boo Golangs conditional obsession which seems to be the polar opposite in many ways for error handling, or at least it used to be I haven’t tried it recently).

There’s a lot to like and learn from in Erlang. Static typing will give it wider adoption IMO. And I’m happy to see so many developers working on it.

skohan wrote at 2020-11-06 14:54:51:

I really want to like Erlang because the priorities and USP's of the language are super compelling for a lot of use-cases. I tried to kick the tires on it a couple years ago, and what I found is that the error messages were just so vexing, and it took a minute to understand what they meant, even in toy examples. Maybe it gets better with time, or there is tooling to help with this, but it seemed a bit behind the times with the standard which has been set by more modern programming languages.

ramchip wrote at 2020-11-06 15:07:44:

It's true it can feel cryptic at times. The good news is improving error messages and ease of use is a major focus area in next year's release. This PR from a few days ago looks like a very nice improvement in that direction:

https://github.com/erlang/otp/pull/2849

the_only_law wrote at 2020-11-06 15:37:28:

Ive had so many projects I've wanted to use erlang with because it seems a nearly perfect choice until I do more research. I really miss things like the bitstring syntax when stuck working in most other things.

cpursley wrote at 2020-11-06 15:03:00:

Give Elixir a spin.

dognotdog wrote at 2020-11-06 15:44:18:

I'd second that, Elixir does away with the sometimes vexing comma and dot syntax used for scoping. Also, it does have mutable variables.

You do have to get to grips with pattern matching and processes as objects being such core concepts, but I found that once I got it, it is wonderful.

macintux wrote at 2020-11-06 16:47:17:

The scoping punctuation is very easy if you think about it. I begrudgingly accept “I don’t like the syntax” as a reason to not use Erlang, but there’s nothing challenging about it. It’s a much smaller syntax than Elixir.

The punctuation rules are very similar to English: comma means you’re not through with your thought yet, semi-colon means you’re introducing a new clause, period means you’re done with the entire thought.

spinningslate wrote at 2020-11-06 20:05:12:

>The punctuation rules are very similar to English: comma means you’re not through with your thought yet, semi-colon means you’re introducing a new clause, period means you’re done with the entire thought.

Agree, at least when looking at code statically. It's a bit of an inconvenience when editing though. For example, re-ordering lines in a function / adding a new line to the end. That leaves a full stop at the end of an intermediate line and/or a semi-colon at the end. Compiling/running that without fixing (assuming you didn't spot it) can lead to cryptic compiler errors.

I find this hits me a surprisingly large number of times. Maybe it's just the way I write code. But it is an annoyance (for me anyway).

To be fair, any decent editor/IDE will highlight the error. None that I'm aware of has an auto-correct though.

It's a minor annoyance in the great scheme of things. Erlang remains one of my favourite languages. Writing REST services as OTP applications on top of Cowboy* is a very rewarding experience and generates rock-solid, highly scalable, production-quality binaries.

*Webmachine was even better though sadly now not actively maintained.

dnautics wrote at 2020-11-06 16:03:23:

Just to clarify: Values in elixir are still immutable.

The core team will insist that variables are immutable but bindings are mutable, but in my mind variables _are_ their bindings (there is not a third category distinct from the value and the lexical binding).

I think there is huge debate as to whether this is good or bad - I think though you really really want to have mutable variables in the REPL. In your static code it's trivial to check for variable rebinding and be sure you're not doing it (credo makes this easy to set up)

sodapopcan wrote at 2020-11-06 22:57:35:

> in my mind variables are their bindings

This feels like flawed thinking to me. I know what you're saying, but claiming that variables are mutable would lead me to believe we could have things like counters. I would say it's best to think of variables as the memory address of their binding. We are allow to rebind the name. That's how I see it, at least.

wtetzner wrote at 2020-11-06 16:54:00:

Variables in Elixir are not mutable, they can just be rebound. Think 'let' in Scheme or ML.

dnautics wrote at 2020-11-06 16:58:32:

> in my mind variables _are_ their bindings (there is not a third category distinct from the value and the lexical binding)

wtetzner wrote at 2020-11-06 17:28:51:

I was responding to this:

> The core team will insist that variables are immutable but bindings are mutable

The documentation on the Elixir website refers to it as rebinding, and I don't see any references to bindings being considered mutable.

Rebinding means you've created a _new_ binding with the same name. The original isn't being mutated, it's just not accessible anymore because of the scoping rules.

dnautics wrote at 2020-11-06 18:14:06:

My point is I don't know what that means. The variable _is the binding_ changing the binding by throwing away the old one is not distinct from mutating it because the binding "isn't a real thing, just conceptual", just as the variable isn't "a real thing"; unlike a value which is actually a thing sitting in a memory slot in your vm. If you can give me an example of where there is a meaningful distinction between the variable and its lexical binding I am happy to be enlightened.

wtetzner wrote at 2020-11-07 02:15:37:

In a scenario where you're just rebinding, you can't do this, for example (pseudo code):

                foo = 12
    for x in xs {
      foo = x + foo
    }

It would _always_ create a new foo binding in the for loop with value (12 + x);

I think using Lisp syntax might help clarify, because it makes the binding scope explicit:

                (let [foo 15]
      (println foo) ; prints 15
      (let [foo 12]
        ; prints 12
        (println foo)
      ; prints 15
      (println foo)))

> unlike a value which is actually a thing sitting in a memory slot in your vm.

A mutable variable _is_ a memory slot that can be mutated, though.

lfischer wrote at 2020-11-06 21:00:25:

I believe an example of where binding differs from assignment is if you define a function which references a binding in the outer scope, the create a new binding with the same name in the outer scope, the value inside the function does not change:

a = 1

f = fn -> a end

f.() # returns 1

a = 2

f.() # still returns 1

bcrosby95 wrote at 2020-11-06 16:30:49:

If your problem is errors then Elixir can still be pretty confusing, especially with respect to things like GenServer.

sharno wrote at 2020-11-06 17:36:47:

WhatsApp is supposed to release an Erlang typechecker this month, I thinking it’d be good to see how they handle this

Ndymium wrote at 2020-11-06 20:39:14:

It has been delayed:

https://elixirforum.com/t/facebook-is-writing-a-new-statical...

FrankBro wrote at 2020-11-06 17:02:16:

I replied to your post on twitter but I highly recommend Gradualizer[1].

Discovered it last year and started using it. The refinement portion handles unions and guards very well. It's not perfect but I'm integrating it in our codebase with some interesting successes. I'm hoping more people use it on their codebase and can find any bugs they have with it so it can be improved further.

[1]

https://github.com/josefs/Gradualizer

huseyinkeles wrote at 2020-11-06 15:33:39:

I'd recommend anyone who have troubles with outdatedness (?) of the Erlang, to try Elixir. It runs on Erlang virtual machine, very modern and pleasure to code in.