2021-06-07
This is the first in a series of posts on using the tree-sitter parsing framework. The primary audience, at least for these first posts, is people who want to use _existing grammars_ to write program analysis tools that work across a variety of programming languages. I'm not (yet) writing about how to create a _new_ grammar for a new programming language.
We're going to start by keeping it very simple. In this post, we're going to install the "tree-sitter" command-line program and the Python grammar, and then use those to parse and syntax highlight some Python code.
Your first goal is to get tree-sitter itself installed. You have a few options:
Some platforms have tree-sitter available in the native package manager. For instance, on Arch Linux, you can install tree-sitter using "pacman":
$ sudo pacman -S tree-sitter
$ tree-sitter --version
tree-sitter 0.19.5
Similarly, if you're using a Mac, there's a Homebrew formula available:
$ brew install tree-sitter
$ tree-sitter --version
tree-sitter 0.19.5
If your platform doesn't package tree-sitter (or if it does, but it's out of date), you can download a precompiled binary from tree-sitter's releases page on GitHub.
tree-sitter releases [github.com]
The "tree-sitter" command-line program is a static binary with no dependencies, so you just need to download it, unpack it, and place it somewhere in your $PATH:
$ curl -OL https://github.com/tree-sitter/tree-sitter/releases/download/v0.19.5/tree-sitter-linux-x64.gz
$ mkdir -p $HOME/bin
$ gunzip -c tree-sitter-linux-x64.gz > $HOME/bin/tree-sitter
$ chmod u+x $HOME/bin/tree-sitter
$ export PATH=$HOME/bin:$PATH
$ tree-sitter --version
tree-sitter 0.19.5 (8d8690538ef0029885c7ef1f163b0e32f256a5aa)
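If you would rather script the manual install, the unpack and chmod steps can be sketched in Python. This is only an illustration: the function name `install_binary` and its arguments are made up here, and it assumes the `.gz` file has already been downloaded and that you are on a POSIX system.

```python
import gzip
import shutil
import stat
from pathlib import Path


def install_binary(gz_path, dest):
    """Decompress a gzipped binary to `dest` and mark it executable.

    Hypothetical helper mirroring the mkdir / gunzip / chmod commands.
    """
    dest = Path(dest)
    # Same effect as `mkdir -p` on the target directory.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the decompressed bytes into the destination file.
    with gzip.open(gz_path, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    # Same effect as `chmod u+x`: add the owner-execute bit.
    dest.chmod(dest.stat().st_mode | stat.S_IXUSR)
    return dest
```

For example, `install_binary("tree-sitter-linux-x64.gz", Path.home() / "bin" / "tree-sitter")` would reproduce the layout used above. You still need that directory on your $PATH.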
The command-line program is also available in the NPM registry via the "tree-sitter-cli" package:
$ npm install tree-sitter-cli
Since this places the command-line program into your `node_modules` directory, you use "npx" to run it:
$ npx tree-sitter --version
tree-sitter 0.19.4 (6dd41e2e45f8b4a00fda21f28bc0ebc6b172ffed)
(This option is especially useful when you're editing a grammar, since it's the easiest way to install tree-sitter as part of a CI build in your grammar repository.)
At this point, you should have the "tree-sitter" program installed. If we try to parse some Python code, however, it won't work!
$ tree-sitter --version
tree-sitter 0.19.5
$ cat example.py
import utils

def add_four(x):
    return x + 4

print(add_four(5))
$ tree-sitter parse example.py
No language found
This is because tree-sitter does not install any language grammars by default. After all, we have no idea which particular languages you want to parse and analyze!
That means that if we want to parse Python code, we need to install the tree-sitter Python grammar. The "tree-sitter" program has a nice feature where it will automatically generate and compile language parsers for you; all you have to do is check out the grammar's git repository into a well-known location.
To set this up, we first need to generate a configuration file for the command-line program. This config file will tell "tree-sitter" where to find the language grammars that you want to use. If you run the following:
$ tree-sitter init-config
then "tree-sitter" will create a new configuration file for you at "$HOME/.tree-sitter/config.json". Open that file in your editor of choice, and you will see a `parser-directories` section at the top:
$ head -n 6 ~/.tree-sitter/config.json
{
  "parser-directories": [
    "/home/dcreager/github",
    "/home/dcreager/src",
    "/home/dcreager/source"
  ],
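If you want to inspect that setting from a script, a small Python sketch like the following reads it back out. The helper name `parser_directories` is my own invention, and the default path is simply the one mentioned above for tree-sitter 0.19.x; other versions may keep the file elsewhere.

```python
import json
from pathlib import Path

# Location described in this post for tree-sitter 0.19.x (an assumption
# for any other version).
DEFAULT_CONFIG = Path.home() / ".tree-sitter" / "config.json"


def parser_directories(config_path=DEFAULT_CONFIG):
    """Return the configured parser directories as Path objects."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Treat a missing key as "no directories configured".
    return [Path(d).expanduser() for d in config.get("parser-directories", [])]
```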
You can choose whatever directories you want to hold your grammar definitions. The "tree-sitter" program will assume that any subdirectory in those locations whose name matches the pattern "tree-sitter-[language]" holds a grammar definition. It will automatically generate and compile those grammars, if needed, each time it starts up.
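To make the naming rule concrete, here is a rough Python sketch that applies the same "tree-sitter-[language]" pattern to a list of directories. It is not how the CLI itself is implemented; the function name `find_grammars` and the "first match wins" behavior are assumptions for illustration only.

```python
from pathlib import Path

PREFIX = "tree-sitter-"


def find_grammars(parser_dirs):
    """Map language name -> directory for each tree-sitter-* subdirectory."""
    grammars = {}
    for parent in map(Path, parser_dirs):
        if not parent.is_dir():
            continue  # a listed directory does not have to exist
        for child in sorted(parent.glob(PREFIX + "*")):
            language = child.name[len(PREFIX):]
            if child.is_dir() and language:
                # Keep the first directory seen for a language (an assumption).
                grammars.setdefault(language, child)
    return grammars
```

Calling `find_grammars(["/home/dcreager/src"])` after cloning the Python grammar there would be expected to report a single `python` entry.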
For this to work, you must also have Node.js and a C compiler installed (since the grammar definitions are written in a JavaScript-based DSL, and the generated parsers are implemented in C).
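A quick way to check for both prerequisites is to look for them on your $PATH. The sketch below assumes the Node.js executable is called `node` and that your C compiler is reachable as `cc`, `gcc`, or `clang`; the helper name `missing_prerequisites` is hypothetical, and your system may use different names.

```python
import shutil


def missing_prerequisites(compilers=("cc", "gcc", "clang")):
    """Return a list naming whichever prerequisites are not on $PATH."""
    missing = []
    if shutil.which("node") is None:
        missing.append("node")
    # Any one of the candidate compiler names is enough.
    if not any(shutil.which(name) for name in compilers):
        missing.append("C compiler")
    return missing


if __name__ == "__main__":
    print(missing_prerequisites() or "all prerequisites found")
```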
Given all of this, you need to clone the Python grammar into one of the directories listed in the config file. (If you decide to change your config file to use a different directory, make sure to change the commands below accordingly.)
$ mkdir -p ~/src
$ cd ~/src
$ git clone https://github.com/tree-sitter/tree-sitter-python
Having done that, the "tree-sitter parse" command should now print out a parse tree for our example file:
$ tree-sitter parse example.py
(module [0, 0] - [6, 0]
  (import_statement [0, 0] - [0, 12]
    name: (dotted_name [0, 7] - [0, 12]
      (identifier [0, 7] - [0, 12])))
  (function_definition [2, 0] - [3, 16]
    name: (identifier [2, 4] - [2, 12])
    parameters: (parameters [2, 12] - [2, 15]
      (identifier [2, 13] - [2, 14]))
    body: (block [3, 4] - [3, 16]
      (return_statement [3, 4] - [3, 16]
        (binary_operator [3, 11] - [3, 16]
          left: (identifier [3, 11] - [3, 12])
          right: (integer [3, 15] - [3, 16])))))
  (expression_statement [5, 0] - [5, 18]
    (call [5, 0] - [5, 18]
      function: (identifier [5, 0] - [5, 5])
      arguments: (argument_list [5, 5] - [5, 18]
        (call [5, 6] - [5, 17]
          function: (identifier [5, 6] - [5, 14])
          arguments: (argument_list [5, 14] - [5, 17]
            (integer [5, 15] - [5, 16])))))))
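Each node in this output follows the same shape: an optional field name, the node type, and start and end positions given as [row, column] pairs. As a minimal sketch, assuming the output always looks like the sample above, a regular expression is enough to pull those pieces out in Python. The names `Node` and `nodes` are mine, and this flattens the tree into a list rather than rebuilding the nesting.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches e.g. "name: (identifier [2, 4] - [2, 12]"; the field name is optional.
NODE = re.compile(r"(?:(\w+): )?\((\w+) \[(\d+), (\d+)\] - \[(\d+), (\d+)\]")


class Node(NamedTuple):
    field: Optional[str]      # e.g. "name", or None when the node has no field
    type: str                 # e.g. "function_definition"
    start: Tuple[int, int]    # (row, column), zero-based as printed
    end: Tuple[int, int]


def nodes(output):
    """Return every node mentioned in `tree-sitter parse` output, in order."""
    result = []
    for m in NODE.finditer(output):
        field, kind, r1, c1, r2, c2 = m.groups()
        result.append(Node(field, kind, (int(r1), int(c1)), (int(r2), int(c2))))
    return result
```

With this, something like `[n for n in nodes(text) if n.type == "call"]` would list both calls in the example file along with their positions.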
You can play around further by parsing example files from other languages: first clone the necessary language grammar into the same "$HOME/src" directory, and then use "tree-sitter parse".