💾 Archived View for dmerej.info › blog › 0071-giving-mypy-a-go.gmi captured on 2021-12-04 at 18:04:22. Gemini links have been rewritten to link to archived content

View Raw

More Information

⬅️ Previous capture (2021-11-30)

➡️ Next capture (2022-01-08)

-=-=-=-=-=-=-

2018, Jun 11 - Dimitri Merejkowsky
License: CC By 4.0

Introduction

This is a follow up to the I don't need types[1] article.

1: I don't need types

I left a teaser explaining I'll be giving concrete examples using a Python project, so here goes.

As you may have guessed, I'm going to talk about type annotations and the mypy[2] static type checker.

2: http://mypy-lang.org/

How mypy works

Here's a quick overview of how mypy works, in case you are not already familiar with it.

Let's use a rather contrived example:

def is_odd(num):
    return num % 2 == 0


if is_odd("this sentence contains %s"):
    print("ok")

We have a function `is_odd`, which obviously only works with numbers, and we call it with a string.

But since the string contains `%s`, Python will happily assume we are trying to format it, and the bug will go unnoticed. `is_odd` will simply return False, because we are *also* allowed to compare strings and numbers with `==` in Python.

Without type annotations, mypy detects nothing:

$ mypy foo.py
<nothing>

But if we add type annotations and re-run mypy, we do get an error message:

def is_odd(num: int) -> bool:
    return num % 2 == 0


if is_odd("this sentence contains %s"):
    print("ok")

$ mypy foo.py
error: Argument 1 to "is_odd" has incompatible type "str"; expected "int"

Thus, you can use mypy in a loop:

This is called "gradual typing".

Designing an experiment

The goal of the experiment is to check if going through the trouble of adding type annotations is worth it.

Some definitions

Let me define what I mean by "worth it".

There are many things said about types annotations like:

Those may be true, but they are hard to measure.

So I asked myself something else:

I know the second question is a bit subjective. I'm still going to use this metric because it's one that really matters to me.

Choosing a project

To test this hypothesis, I needed an existing project. I used tsrc[3], the command-line tool we use at work[4] to manage multiple git repositories and automate GitHub and GitLab review automation.

3: https://github.com/dmerejkowsky/tsrc

4: https://tanker.io

I chose it because:

The protocol

So here is the protocol I used:

Before I continue, I should tell you I used mypy with three important options:

Looking at patches

Let's look at trivial patches first:

Truthy string

def find_workspace_path() -> Path:
    head = os.getcwd()
     tail = True
     while tail:
         tsrc_path = os.path.join(head, ".tsrc")
         if os.path.isdir(tsrc_path):
             return Path(head)
         else:
             head, tail = os.path.split(head)
     raise tsrc.Error("Could not find current workspace")

This function starts by checking there is a `.tsrc` hidden directory in the working path. If not, it goes through every parent directory and check if the `.tsrc` directory is here. If it reaches the root of the file system (the second value of `os.path.split` is None), it raises an exception.

mypy did not like that `tail` started as a boolean, and then was assigned to a string.

We can fix that by using a non-empty string, with a value that reveals the intention:

- tail = True
+ tail = "a truthy string"
     while tail:

An other way would be to use a type like `Union[bool,str]` but that would be more confusing I think.

Anyway, I'm not sure the quality of the code improved there. No point for mypy.

save_config

class Options:
    def __init__(self, url, shallow=False) -> None:
        self.shallow: bool = shallow
        self.url: str = url


def save_config(options: Options):
    config = dict()
    config["url"] = options.url
    config["shallow"] = options.shallow

    with self.cfg_path.open("w") as fp:
        ruamel.yaml.dump(config, fp)

We are using `save_config` to serialize a "value object" (the `Options` class) into a YAML file.

mypy saw the first two lines, `config = dict()`, `config["url"] = options.ul` and wrongly deduced that `config` was a dict from strings to strings.

Then it complained about `config["shallow"]` that was assigned to a boolean.

We can fix that by forcing the `config` type to be a dict from string to `Any`: `Any` is a "magic" type we can use precisely for this kind of situation[5].

5: http://mypy.readthedocs.io/en/latest/kinds_of_types.html#the-any-type

- config = dict()
+ config: Dict[str, Any] = dict()

This a bit annoying, but the type annotation makes it clearer what the `config` is. 1 point for mypy.

Encoding project names

When you use the GitLab API, you often have to use the 'project id'. The doc says you can use the `<namespace>/<project_name>` string if it is "url encoded", like this:

Get info about the Foo project in the FooFighters namespace:
GET  /api/v4/projects/FooFighters%2Foo

In python, the naive approach does not work:

import urllib.parse

>>> urllib.parse.quote("FooFighters/Foo")
FooFighters/Foo

Instead you have to specify a list of "safe" characters, that is characters that you *don't* need to encode.

The doc says the default value is "/".

Thus, here's what we use in `tsrc.gitlab`:

project_id = "%s/%s" % (namespace, project)
encoded_project_id = urllib.parse.quote(project_name, safe=list())

mypy saw that the default value was a string, so he complained we were using a list instead. We fixed it by:

-        encoded_project_name = urllib.parse.quote(project_name, safe=list())
+        encoded_project_name = urllib.parse.quote(project_name, safe="")

Here, mypy forced us to follow an implicit convention. There are two ways to represent a list of characters in Python. A real list: `['a', 'b', 'c']`, or a string: `"abc"`. The authors of the `urllib.quote()` function decided to use the second form, so it's a good thing we follow this convention too.

An other win for mypy.

Bugs found

We already use a lot of tools in the hope of catching bugs:

Despite of this, mypy found bugs that tests, all the linters and all the reviewers did not find.

Here are a few of them. Feel free to try and find the bug yourself, the answer will be given after the ⁂ symbol.

handle_stream_errors

class GitLabAPIError(GitLabError):
    def __init__(self, url: str, status_code: int, message: str) -> None:
    ...

def handle_stream_errors(response: requests.models.Response) -> None:
     if response.status_code >= 400:
        raise GitLabAPIError(
            response.url, "Incorrect status code:", response.status_code)

This code wraps calls to the GitLab API, and arrange for a particular `GitLabAPIError` to be raised, when we make a "stream" request. (For instance, to download a GitLab CI artifact):

<center>⁂</center>

The problem here is that we inverted the `status_code` and `message` parameter. Easy to fix:

-        raise GitLabAPIError(
-           response.url, "Incorrect status code:", response.status_code)
+        raise GitLabAPIError(
+           response.url, response.status_code, "Incorrect status code")

The bug was not caught because the code in question was actually copy/pasted from a CI script (and you usually don't write tests for CI scripts).

We actually don't need streamed responses anywhere in tsrc, so this is in fact dead code (and it is now gone from the code base)

handle_json_errors

def handle_json_errors(response: requests.models.Response):

    try:
         json_details = response.json()
     except ValueError:
     json_details["error"] = (
         "Expecting json result, got %s" % response.text
     )

    ...

    url = response.url
    if 400 <= status_code < 500:
        for key in ["error", "message"]:
            if key in json_details:
                raise GitLabAPIError(url, status_code, json_details[key])
         raise GitLabAPIError(url, status_code, json_details)
    if status_code >= 500:
        raise GitLabAPIError(url, status_code, response.text)

This one is slightly more interesting. It is located near the previous one and handles errors for the calls to the GitLab API which returns JSON objects.

We of course have to catch 500 errors, which hopefully do not happen often.

In case of a status code between 400 and 499, we know there was a problem in the request we made, but we need to tell the user why the request was rejected.

Most of the time the GitLab API returns a JSON object containing a `error` or `message` key, but sometimes neither key is found in the returned object, and sometimes the text of the response is not even valid JSON code.

So we have to check for both keys in the JSON object, and if not found (if we exit the for loop), just store the entire JSON response in the exception.

<center>⁂</center>

The bug is in the second `raise GitLabAPIError`. We are passing an entire object where the `GitLabAPIError` expected a string.

The fix was:

- raise GitLabAPIError(url, status_code, json_details)
+ raise GitLabAPIError(url, status_code, json.dumps(json_details, indent=2))

Again, this was hard to catch with tests. The case where the JSON returned by GitLab did *not* contain a `error` or `message` key only happened once in the lifetime of the project (which explain why the code was written), so manual QA and unit tests did not need to check this code path.

Anyway, note we did not blindly wrote something like `str(json_details)` to convert the JSON object to a string. We found out it was used in a message displayed to the end user, thus we use `json.dumps(json_details), indent=2)` to make sure the message contains neatly indented JSON and is easy to read.

LocalManifest

class LocalManifest:
    def __init__(self, workspace_path: Path) -> None:
        hidden_path = workspace_path.joinpath(".tsrc")
        self.clone_path = hidden_path.joinpath("manifest")
        self.cfg_path = hidden_path.joinpath("manifest.yml")
        self.manifest: Optional[tsrc.manifest.Manifest] = None

    def update(self) -> None:
        ui.info_2("Updating manifest")
        tsrc.git.run_git(self.clone_path, "fetch")
        tsrc.git.run_git(self.clone_path, "reset", "--hard", branch)

    def load(self) -> None:
        yml_path = self.clone_path.joinpath("manifest.yml")
        if not yml_path.exists():
            message = "No manifest found in {}. Did you run `tsrc init` ?"
            raise tsrc.Error(message.format(yml_path))
        self.manifest = tsrc.manifest.load(yml_path)

    @property
    def copyfiles(self) -> List[Tuple[str, str]]:
        return self.manifest.copyfiles

    def get_repos(self) -> List[tsrc.Repo]:
        assert self.manifest, "manifest is empty. Did you call load()?"
        return self.manifest.get_repos(groups=self.active_groups)

After you run `tsrc init git@example.com/manifest.git`, the manifest is cloned inside `<workspace>/.tsrc/manifest`.

Thus, the contents of the `manifest.yml` is in `<workspace>/.tsrc/manifest/manifest.yml>`.

The `LocalManifest` class represent this manifest repository.

Here's what happens when you run `tsrc sync`:

<center>⁂</center>

It's not really a bug, but mypy forced us to acknowledge that `LocalManifest.manifest` starts as None, and only gets its real value after `.load()` has been called.

We already have an `assert` in place in `get_repos()`, but mypy forced us to add a similar check in the `copyfiles` getter:

    @property
    def copyfiles(self) -> List[Tuple[str, str]]:
+       assert self.manifest, "manifest is empty. Did you call load()?"
        return self.manifest.copyfiles

GitServer

Some of the tests for tsrc are what we call end-to-end tests:

Here's how they work:

Thus, we don't have to mock file systems or git commands (which is doable but pretty hard), and things are easy to debug because in case of problem we can just `cd` to the test directory and inspect the state of the git repositories by hand.

Any way, those tests are written with a test helper called `GitServer`.

You can use this class to create git repositories, push files on some branches, and so on:

Here's what the helper looks like:

    def add_repo(self, name: str, default_branch="master") -> str:
        ...
        url = self.get_url(name)
        return url

    def push_file(self, name: str, file_path: str, *,
                  contents="", message="") -> None:
        ...
        full_path = ...
        full_path.parent.makedirs_p()
        full_path.touch()
        if contents:
            full_path.write_text(contents)
        commit_message = message or ("Create/Update %s" % file_path)
        run_git("add", file_path)
        run_git("commit", "--message", commit_message)
        run_git("push", "origin", "--set-upstream", branch)

    def tag(self, name: str, tag_name: str) -> None:
        run_git("tag", tag_name)
        run_git("push", "origin", tag_name)

    def get_tags(self, name: str) -> List[str]:
         src_path = self.get_path(name)
         rc, out = tsrc.git.run_git_captured(src_path, "tag", "--list")
         return out

You can use it like this:

def test_tsrc_sync(tsrc_cli: CLI, git_server: GitServer) -> None:
    git_server.add_repo("foo/bar")
    git_server.add_repo("spam/eggs")
    manifest_url = git_server.manifest_url
    tsrc_cli.run("init", manifest_url)
    git_server.push_file("foo/bar", "bar.txt", contents="this is bar")

    tsrc_cli.run("sync")

    bar_txt_path = workspace_path.joinpath("foo", "bar", "bar.txt")
    assert bar_txt_path.text() == "this is bar"

<center>⁂</center>

The bug is in `get_tags`:

    def get_tags(self):
         rc, out = tsrc.git.run_git(src_path, "tag", raises=False)
-        return out
+        return out.splitlines()

The `get_tags` methods is also dead code. It has an interesting story.

The need for the `GitServer` helper occurred when I started writing end-to-end tests for tsrc and discovered I needed a test framework to write those end-to-end test.

It was vital that the `GitServer` implementation uses clean code.

So to have a clean implementation of the `GitServer` I of course used the best technique I know of: TDD.

yo dawg tests [IMG]

You can find them in the [test_git_server.py](https://github.com/dmerejkowsky/tsrc/blob/main/tsrc/test/helpers/test_git_server.py[6] file.

6: https://github.com/dmerejkowsky/tsrc/blob/main/tsrc/test/helpers/test_git_server.py

Anyway, I was writing an end-to-end test for tsrc that involved tags.

I thought: "OK, I need a `.tag()` method in `GitServer`. So I also need a `get_tags()` method to check the tag was actually pushed". Thus I wrote the `get_tags` method, forgetting to write a failing test for `GitServer` first (that still happens after 5 years of TDD, so don't worry if it happens to you too.). At that point I only had my end-to-end test failing, so I made it pass and completely forgot about the `get_tags()` method. Oh well.

Executors and Tasks

In the implementation of tsrc we often have to loop over a list of items, perform actions an each of them and record which one failed, and when the loop is done, display the error messages to the user.

For instance:

tsrc sync

remote: Counting objects: 3, done.
...
Updating 62a5d28..39eb3bd
error: The following untracked files would be overwritten by merge:
    bar.txt
Please move or remove them before you merge.
Aborting

Already up to date.
Error: Synchronize workspace failed

Or:

$tsrc foreach
:: Running `ls foo` on every repo

$ ls foo
bar.txt

$ ls foo
ls: cannot access 'foo': No such file or directory
Error: Running `ls foo` on every repo failed

So in order to keep things <abbr title="Don't Repeat Yourself">DRY</abbr>, we have some high-level code that only deals with loop and error handling:

First, a generic `Task` method.

(Note that you can have your own generic types[7] with mypy. This is *awesome* because without it you get laughed at by C++ programmers)

7: https://mypy.readthedocs.io/en/latest/generics.html


class Task(Generic[T], metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def description(self) -> str:
        pass

    @abc.abstractmethod
    def display_item(self, item: T) -> str:
        pass

    @abc.abstractmethod
    def process(self, item: T) -> None:
        pass

And then, a generic `SequentialExecutor` who knows how to execute a given task on a list of items and display the outcome:

class SequentialExecutor(Generic[T]):
    def __init__(self, task: Task[T]) -> None:
        self.task = task
        self.errors: List[Tuple[Any, tsrc.Error]] = list()

    def process(self, items: List[T]) -> None:
        if not items:
            return True
        ui.info_1(self.task.description())

        self.errors = list()
        for item in enumerate(items):
            ...
            self.process_one(item)

        if self.errors:
            self.handle_errors()

    def process_one(self, item: T) -> None:
        try:
            self.task.process(item)
        except tsrc.Error as error:
            self.errors.append((item, error))

    def handle_errors(self) -> None:
        # Display nice error message
        ...
        raise ExecutorFailed()

Thus we can inherit from `Task` to implement `tsrc sync`:

class Syncer(tsrc.executor.Task[Repo]):

    def process(self, repo: Repo) -> None:
        ui.info(repo.src)
        self.fetch(repo)
        self.check_branch(repo)
        self.sync_repo_to_branch(repo)

    def check_branch(self, repo: Repo):
        current_branch = tsrc.git.get_current_branch(repo_path)
        if not current_branch:
            raise tsrc.Error("Not on any branch")

Ditto for `tsrc foreach`:

class CmdRunner(tsrc.executor.Task[Repo]):

    def process(self, repo: Repo) -> None:
        full_path = self.workspace.joinpath(repo.src)
        rc = subprocess.call(cmd, ...)
        if rc != 0:
            raise CommandFailed

Did you spot the bug?

<center>⁂</center>

The bug is in here:

    def process(self, items: List[T]) -> None:
         if not items:
             return True

This is the result of a bad refactoring. `Executor` used to track success of task by looking at the return value the `process()` method.

After a while, I found out it was clearer to just use exceptions for that, mostly when I implemented the `Syncer` class. (You can just raise an exception instead of adding lots of `ifs`).

But the early `return True` was left. Here `mypy` found something that would have puzzled an external reader. Almost everything function or method dealing with Executors and Tasks in tsrc either return None or raise Exception. What is that boolean doing here?

There were of course no tests left that checked the return value of `process` (they got refactored at the same time of the rest of the code), so the bug went unnoticed.

run_git

The last changed required by mypy was also pretty interesting. Here's the deal.

In tsrc, we often need to run `git` commands, but we can run them in two very different ways:

Here's what the implementation looked like:

def run_git(working_path, *cmd, raises=True):
    """ Run git `cmd` in given `working_path`

    If `raises` is True and git return code is non zero, raise
    an exception. Otherwise, return a tuple (returncode, out)

    """
    git_cmd = list(cmd)
    git_cmd.insert(0, "git")
    options = dict()
    if not raises:
        options["stdout"] = subprocess.PIPE
        options["stderr"] = subprocess.STDOUT

    process = subprocess.Popen(git_cmd, cwd=working_path, **options)

    if raises:
        process.wait()
    else:
        out, _ = process.communicate()
        out = out.decode("utf-8")

    returncode = process.returncode
    if raises:
        if returncode != 0:
            raise GitCommandError(working_path, cmd)
    else:
        return returncode, out

And here's how to use it:

# run `git fetch`, leaving the output as-is
run_git(foo_path, "fetch")

# run git remote get-url origin
rc, out = run_git(foo_path, "remote", "get-url", "origin", raises=False):
if rc == 0:
    # Handle the case where there is no remote called 'origin'
    ...
else:
    # Do something with out

Here's an other place we used `run_git`: we have a command named `tsrc status`, which displays a summary of the whole workspace. Here what's its output looks like:

tsrc status

And here is the implementation:

class GitStatus:

    def update_remote_status(self):
        _, ahead_rev = run_git(
            self.working_path,
            "rev-list", "@{upstream}..HEAD",
            raises=False
        )
        self.ahead = len(ahead_rev.splitlines())

        _, behind_rev = run_git(
            self.working_path,
            "rev-list", "HEAD..@{upstream}",
            raises=False
        )
        self.behind = len(behind_rev.splitlines())

Found the bug yet?

<center>⁂</center>

The code in `update_remote_status()` assumes that `git rev-list` outputs a list of commits, and then count the lines in the output.

This works well if there *is* a remote branch configured:

$ git revlist HEAD..@{upstream}
30f729a6a0ec3926cf063f5f8a3953b89d7b252e
ef564f0ef38a163beb3db52474ac4e256a6c2cd4

But if not remote is configured, the git command will fail with a message looking like:

$ git revlist HEAD..@{upstream}
fatal: no upstream configured for branch 'dm/foo'

Since `update_remote_status()` does not check the return code, `self.ahead_rev` and `self.behind_rev` both get set to `1`, and the output looks like:


Oops.

<center>⁂</center>

But there is more to it than just this bug.

Notice the type of the return value of `run_git` depends *on the value of the `raises` parameter*.

At first I tried to annotate the function with a `Union` type, like this:

def run_git(working_path: Path, *cmd: str, raises=True) ->
      Union[Tuple[int, str], None]:
    pass

But then mypy complained for each and every line that used `run_git` with `raises=False`:

rc, out = run_git(..., raises=False)

foo.py:39: error: 'builtins.None' object is not iterable

I thought about this and found out it was cleaner to split `run_git` into `run_git` and `run_git_captured`:

def run_git(working_path: Path, *cmd: str) -> None:
    """ Run git `cmd` in given `working_path`

    Raise GitCommandError if return code is non-zero.
    """
    git_cmd = list(cmd)
    git_cmd.insert(0, "git")

    returncode = subprocess.call(git_cmd, cwd=working_path)
    if returncode != 0:
        raise GitCommandError(working_path, cmd)


def run_git_captured(working_path: Path, *cmd: str,
                     check: bool = True) -> Tuple[int, str]:
    """ Run git `cmd` in given `working_path`, capturing the output

    Return a tuple (returncode, output).

    Raise GitCommandError if return code is non-zero and check is True
    """
    git_cmd = list(cmd)
    git_cmd.insert(0, "git")
    options: Dict[str, Any] = dict()
    options["stdout"] = subprocess.PIPE
    options["stderr"] = subprocess.STDOUT

    returncode = process.returncode
    if check and returncode != 0:
        raise GitCommandError(working_path, cmd, output=out)
    return returncode, out

True, there's a bit more code and slightly duplication but:

Now the intent about whether you want to capture the output or not is encoded *in the name of the function* (`run_git_captured` instead of `run_git`).

Also note that you can't forget about checking the return code anymore.

You can either write:

_, out = run_git_captured(repo_path, "some-cmd")
# We know `out` is *not* an error message, because if `some-cmd` failed, an
# exception would have been raised before `out` was set.

Or you can be explicit about your error handling:

rc, out = run_git_captured(repo_path, "some-cmd", check=False)
if rc == 0:
    # Ditto, out can't be an error message
else:
    # Need to handle error here

That alone would be a good reason to use mypy I think :)

What's next

Well, the `mypy` changes have been merged[8] and the CI now runs `mypy` in strict mode on every pull request.

8: https://github.com/dmerejkowsky/tsrc/pull/110

I'm still curious about the other benefits of type annotations I could not check (maintainability, code comprehension, ease of refactorings ...).

I guess we'll see how the next refactoring in tsrc goes.

Conclusion

We saw how mypy, while stilling making relative few false positives, still found inconsistencies, a few bugs, and even design problems.

So, my advice is for you to use it if you get a chance. Cheers!

----

Back to Index

Contact me