From: Yuan Fu
parent field: (child (grandchild (…))) +parent field: (node (child (…)))
child, grand, grand-grandchild, etc., are nodes that -begin at point. parent is the parent node of child. +
where node, child, etc, are nodes which begin at point. +parent is the parent of node. node is displayed in +bold typeface. field-names are field names of node and +child, etc.
-If there is no node that starts at point, i.e., point is in the middle -of a node, then the mode-line only displays the smallest node that -spans the position of point, and its immediate parent. +
If no node starts at point, i.e., point is in the middle of a node, +then the mode line displays the earliest node that spans point, and +its immediate parent.
-This minor mode doesn’t create parsers on its own. It simply uses the
-first parser in (treesit-parser-list) (see Using Tree-sitter Parser).
+
This minor mode doesn’t create parsers on its own. It uses the first
+parser in (treesit-parser-list) (see Using Tree-sitter Parser).
Sometimes, the source of a programming language could contain snippets @@ -76,8 +75,22 @@ example. In that case, text segments written in different languages need to be assigned different parsers. Traditionally, this is achieved by using narrowing. While tree-sitter works with narrowing (see narrowing), the recommended way is -instead to set regions of buffer text in which a parser will operate. +instead to set regions of buffer text (i.e., ranges) in which a parser +will operate. This section describes functions for setting and +getting ranges for a parser. +
+Lisp programs should call treesit-update-ranges to make sure
+the ranges for each parser are correct before using parsers in a
+buffer, and call treesit-language-at to figure out the language
+responsible for the text at some position. These two functions don’t
+work by themselves; they need major modes to set
+treesit-range-settings and
+treesit-language-at-point-function, which do the actual work.
+These functions and variables are explained in more detail towards the
+end of the section.
This function sets up parser to operate on ranges. The
@@ -126,24 +139,6 @@ ranges, the return value is nil.
Like treesit-parser-set-included-ranges, this function sets
-the ranges of parser-or-lang to ranges. Conveniently,
-parser-or-lang could be either a parser or a language. If it is
-a language, this function looks for the first parser in
-(treesit-parser-list) for that language in the current buffer,
-and sets the ranges for it.
-
This function returns the ranges of parser-or-lang, like
-treesit-parser-included-ranges. And like
-treesit-set-ranges, parser-or-lang can be a parser or
-a language symbol.
-
This function matches source with query and returns the
@@ -166,57 +161,56 @@ range in which this function queries.
treesit-query-error error if query is malformed.
This variable holds the list of range functions. Font-locking and -indenting code use functions in this list to set correct ranges for -a language parser before using it. -
-The signature of each function in the list should be: -
-(start end &rest _) -
where start and end specify the region that is about to be -used. A range function only needs to (but is not limited to) update -ranges in that region. +
It should suffice for general Lisp programs to call the following two +functions in order to support program sources that mix multiple +languages.
-The functions in the list are called in order. -
This function is used by font-lock and indentation to update ranges -before using any parser. Each range function in -treesit-range-functions is called in-order. Arguments -start and end are passed to each range function. +
This function updates ranges for parsers in the buffer. It makes sure
+the parsers’ ranges are set correctly between beg and end,
+according to treesit-range-settings. If omitted, beg
+defaults to the beginning of the buffer, and end defaults to the
+end of the buffer.
+
For example, fontification functions use this function before querying +for nodes in a region.
This function tries to figure out which language is responsible for
-the text at buffer position pos. Under the hood it just calls
-treesit-language-at-point-function.
-
Various Lisp programs use this function. For example, the indentation
-program uses this function to determine which language’s rule to use
-in a multi-language buffer. So it is important to provide
-treesit-language-at-point-function for a multi-language major
-mode.
+
This function returns the language of the text at buffer position
+pos. Under the hood it calls
+treesit-language-at-point-function and returns its return
+value. If treesit-language-at-point-function is nil,
+this function returns the language of the first parser in the returned
+value of treesit-parser-list. If there is no parser in the
+buffer, it returns nil.
Normally, in a set of languages that can be mixed together, there is a -major language and several embedded languages. A Lisp program usually -first parses the whole document with the major language’s parser, sets -ranges for the embedded languages, and then parses the embedded +host language and one or more embedded languages. A Lisp +program usually first parses the whole document with the host +language’s parser, retrieves some information, sets ranges for the +embedded languages with that information, and then parses the embedded languages.
-Suppose we need to parse a very simple document that mixes -HTML, CSS and JavaScript: +
Take a buffer containing HTML, CSS and JavaScript
+as an example. A Lisp program will first parse the whole buffer with
+an HTML parser, then query the parser for
+style_element and script_element nodes, which
+correspond to CSS and JavaScript text, respectively. Then
+it sets the ranges of the CSS and JavaScript parsers to the
+ranges that their corresponding nodes span.
+
Given a simple HTML document:
<html> @@ -225,8 +219,8 @@ languages. </html>
We first parse with HTML, then set ranges for CSS -and JavaScript: +
a Lisp program will first parse with an HTML parser, then set +ranges for CSS and JavaScript parsers:
;; Create parsers. @@ -251,10 +245,76 @@ and JavaScript: (treesit-parser-set-included-ranges js js-range)
We use a query pattern (style_element (raw_text) @capture)
-to find CSS nodes in the HTML parse tree. For how
-to write query patterns, see Pattern Matching Tree-sitter Nodes.
+
Emacs automates this process in treesit-update-ranges. A
+multi-language major mode should set treesit-range-settings so
+that treesit-update-ranges knows how to perform this process
+automatically. Major modes should use the helper function
+treesit-range-rules to generate a value that can be assigned to
+treesit-range-settings. The settings in the following example
+directly translate into operations shown above.
(setq-local treesit-range-settings
+            (treesit-range-rules
+             :embed 'javascript
+             :host 'html
+             '((script_element (raw_text) @capture))
+
+             :embed 'css
+             :host 'html
+             '((style_element (raw_text) @capture))))
+
This function is used to set treesit-range-settings. It +takes care of compiling queries and other post-processing, and outputs +a value that treesit-range-settings can have. +
+It takes a series of query-specs, where each query-spec is +a query preceded by zero or more pairs of keyword and +value. Each query is a tree-sitter query in either the +string, s-expression or compiled form, or a function. +
+If query is a tree-sitter query, it should be preceded by two
+:keyword value pairs, where the :embed keyword
+specifies the embedded language, and the :host keyword
+specifies the host language.
+
treesit-update-ranges uses query to figure out how to set
+the ranges for parsers for the embedded language. It runs
+query against a host language parser, computes the ranges that
+the captured nodes span, and applies these ranges to the embedded
+language parsers.
+
If query is a function, it doesn’t need any :keyword and +value pair. It should be a function that takes 2 arguments, +start and end, and sets the ranges for parsers in the +current buffer in the region between start and end. It is +fine for this function to set ranges in a larger region that +encompasses the region between start and end. +
This variable helps treesit-update-ranges in updating the
+ranges for parsers in the buffer. It is a list of settings
+where the exact format of a setting is considered internal. You
+should use treesit-range-rules to generate a value that this
+variable can have.
+
This variable’s value should be a function that takes a single
+argument, pos, which is a buffer position, and returns the
+language of the buffer text at pos. This variable is used by
+treesit-language-at.
+
font-lock-keyword face.
treesit-major-mode-setup.
This function is used to set treesit-font-lock-settings. It takes care of compiling queries and other post-processing, and outputs a value that treesit-font-lock-settings accepts. Here’s an @@ -129,13 +129,18 @@ example: "(script_element) @font-lock-builtin-face")
This function takes a list of text or s-exp queries. Before each
-query, there are :keyword-value pairs that configure
-that query. The :lang keyword sets the query’s language and
-every query must specify the language. The :feature keyword
-sets the feature name of the query. Users can control which features
-are enabled with font-lock-maximum-decoration and
-treesit-font-lock-feature-list (see below).
+
This function takes a series of query-specs, where each +query-spec is a query preceded by multiple pairs of +:keyword and value. Each query is a tree-sitter +query in either the string, s-expression or compiled form. +
+For each query, the :keyword and value pairs add
+meta information to it. The :lang keyword declares
+query’s language. The :feature keyword sets the feature
+name of query. Users can control which features are enabled
+with font-lock-maximum-decoration and
+treesit-font-lock-feature-list (described below). These two
+keywords are mandatory.
Other keywords are optional:
@@ -148,7 +153,7 @@ are enabled withfont-lock-maximum-decoration and
keepLisp programs mark patterns in the query with capture names (names +
Lisp programs mark patterns in query with capture names (names
that starts with @), and tree-sitter will return matched nodes
tagged with those same capture names. For the purpose of
fontification, capture names in query should be face names like
@@ -230,9 +235,10 @@ these common features.
A list of settings for tree-sitter based font lock. The exact format
-of this variable is considered internal. One should always use
+of each setting is considered internal. One should always use
treesit-font-lock-rules to set this variable.
-
Multi-language major modes should provide range functions in
treesit-range-functions, and Emacs will set the ranges
diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
index 2fdb50df7c1..5ea1f9bc332 100644
--- a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
+++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
@@ -106,7 +106,8 @@ the current line to matcher; if it returns non-nil, this
rule is applicable. Then Emacs passes the node to anchor, which
returns a buffer position. Emacs takes the column number of that
position, adds offset to it, and the result is the indentation
-column for the current line.
+column for the current line. offset can be an integer or a
+variable whose value is an integer.
The matcher and anchor are functions, and Emacs provides
convenient defaults for them.
@@ -117,8 +118,8 @@ arguments: node, parent, and bol. The argument
position of the first non-whitespace character after the beginning of
the line. The argument node is the largest (highest-in-tree)
node that starts at that position; and parent is the parent of
-node. However, when that position is on a whitespace or inside
-a multi-line string, no node that starts at that position, so
+node. However, when that position is in a whitespace or inside
+a multi-line string, no node can start at that position, so
node is nil. In that case, parent would be the
smallest node that spans that position.
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the first non-whitespace charater on the previous line. +
+point-min ¶This anchor is a function that is called with 3 arguments: node, +parent, and bol, and returns the beginning of the buffer. +This is useful as the beginning of the buffer is always at column 0.
nil, it looks for smallest named child.
This function traverses the subtree of node (including
node itself), looking for a node for which predicate
returns non-nil. predicate is a regexp that is matched
-(case-insensitively) against each node’s type, or a predicate function
-that takes a node and returns non-nil if the node matches. The
-function returns the first node that matches, or nil if none
-does.
+against each node’s type, or a predicate function that takes a node
+and returns non-nil if the node matches. The function returns
+the first node that matches, or nil if none does.
By default, this function only traverses named nodes, but if all
is non-nil, it traverses all the nodes. If backward is
@@ -279,9 +278,9 @@ down the tree.
Like treesit-search-subtree, this function also traverses the
parse tree and matches each node with predicate (except for
-start), where predicate can be a (case-insensitive) regexp
-or a function. For a tree like the below where start is marked
-S, this function traverses as numbered from 1 to 12:
+start), where predicate can be a regexp or a function.
+For a tree like the below where start is marked S, this function
+traverses as numbered from 1 to 12:
12 @@ -336,8 +335,8 @@ as intreesit-search-forward.It takes the subtree under root, and combs it so only the nodes that match predicate are left. Like previous functions, the predicate can be a regexp string that matches against each -node’s type case-insensitively, or a function that takes a node and -return non-
nil if it matches. +node’s type, or a function that takes a node and returns non-nil +if it matches. For example, for a subtree on the left that consist of both numbers and letters, if predicate is “letter only”, the returned tree