Thread View: gmane.emacs.devel
6 messages
Started by Yuan Fu
Fri, 01 Sep 2023 22:01
Update on tree-sitter structure navigation
Author: Yuan Fu
Date: Fri, 01 Sep 2023 22:01
71 lines
3483 bytes
Hey guys,

In the months after wrapping up tree-sitter stuff in emacs-29, I was thinking about how to implement structural navigation and extracting information from the parser with tree-sitter. In emacs-29 we have things like treesit-beginning/end-of-defun, and treesit-defun-name. I was thinking maybe we can generalize this to support getting arbitrary "thing" at point, move around them, and getting information like the name of a defun, its arglist, parent of a class, type of a variable declaration, etc, in a language-agnostic way.

Also, at the time, we only support defining things by a regexp matching a node's type, which is often not enough.

And it would be nice to somehow take advantage of the tree-sitter queries for the features I mentioned above. Tree-sitter query is what every other editor uses for virtually all tree-sitter related features. But in Emacs, we mostly only use it for font-lock.

Here's the progress as of now:

- Functions like treesit-search-forward, treesit-induce-sparse-tree, treesit-thing-at-point, treesit--navigate-thing, etc, support a richer set of predicates now. Besides regexp matching the type, the predicate can also be a predicate function, or (REGEXP . FUNC), or compound predicates like (or PRED PRED) or (not PRED).

- There's now a variable treesit-thing-settings, which holds definitions for things. Then, instead of passing the predicate to the functions I mentioned above, you can save the predicate in treesit-thing-settings under a symbol, say `sexp', and pass the symbol instead, just like thing-at-point.el. (We'll work on integrating with thing-at-point.el later.)

- I can't think of a good way to integrate tree-sitter queries with the navigation functions we have right now. Most importantly, a tree-sitter query always searches top-down, and you can't limit the depth it searches. OTOH, our navigation functions work by traversing the tree node-to-node.

- There's no progress on getting information like name and type, etc, in a language-agnostic way. I haven't come up with a good interface and/or implementation. I encourage interested folks to give it some thought. Bonus points for reusing the query files neovim folks have accumulated :-)

Some other things on the TODO list that people can take a stab at:

- Query-based indentation (neovim's implementation can be a source of inspiration)
- Improve c-ts-mode (indentation styles, other cc-mode features, etc) and other tree-sitter modes
- Solve the grammar versioning/breaking-change problem: tree-sitter grammars don't have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.
- Major mode fallback/inheritance, this has been discussed many times, no good solution emerged.
- Isolated ranges. For many embedded languages, each block should be independent from another, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I'll probably work on this one next.

Finally, feel free to send me an email or send to emacs-devel and CC me, if there are things treesit.c and treesit.el can do better, or when there are nice things in neovim and other editors that Emacs ought to have, too.

Yuan
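For illustration, the predicate forms and thing definitions described above could look roughly like the following. This is only a sketch, assuming the interface as it later shipped in Emacs 30; the node type names are guesses for a C grammar and may differ between grammar versions, and `my-c-top-level-p' is a hypothetical helper, not something from treesit.el.

```elisp
;; Sketch only.  Node type names are assumptions about a C grammar.

(defun my-c-top-level-p (node)
  "Return non-nil if NODE is a direct child of the root node.
Hypothetical helper used as the FUNC part of a (REGEXP . FUNC) predicate."
  (let ((parent (treesit-node-parent node)))
    (and parent (null (treesit-node-parent parent)))))

;; Normally set in the major mode's setup function.
(setq-local treesit-thing-settings
            `((c
               ;; Plain regexp, matched against the node type.
               (comment "\\`comment\\'")
               ;; (REGEXP . FUNC): the type must match and FUNC must
               ;; return non-nil for the node.
               (defun ("\\`function_definition\\'" . my-c-top-level-p))
               ;; Compound predicates: (not PRED) and (or PRED PRED).
               (sexp (not ,(rx bos (or "{" "}" "(" ")" "," ";") eos)))
               (statement (or "\\`expression_statement\\'"
                              "\\`return_statement\\'")))))

;; Pass the symbol instead of spelling out the predicate each time.
(treesit-thing-at-point 'defun 'nested)
```

The last form needs a buffer with a C parser; it returns the enclosing node that satisfies the `defun' definition, or nil if there is none.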
Re: Update on tree-sitter structure navigation
Author: Ihor Radchenko
Date: Sat, 02 Sep 2023 06:52
114 lines
4946 bytes
Yuan Fu <casouri@gmail.com> writes:

> In the months after wrapping up tree-sitter stuff in emacs-29, I was
> thinking about how to implement structural navigation and extracting
> information from the parser with tree-sitter. In emacs-29 we have
> things like treesit-beginning/end-of-defun, and treesit-defun-name. I
> was thinking maybe we can generalize this to support getting arbitrary
> "thing" at point, move around them, and getting information like the
> name of a defun, its arglist, parent of a class, type of a variable
> declaration, etc, in a language-agnostic way.

Note that Org mode also does all of these using
https://orgmode.org/worg/dev/org-element-api.html

It would be nice if we could converge to a more consistent interface
across all the modes. For example, by extending `thing-at-point' to handle
parsed elements, not just simplistic regexp-based "thing" boundaries
exposed by `thing-at-point' now.

Org approaches getting name/begin/end/arguments using a common API:

(org-element-property :begin NODE)
(org-element-property :end NODE)
(org-element-property :contents-begin NODE)
(org-element-property :contents-end NODE)
(org-element-property :name NODE)
(org-element-property :args NODE)

Language-agnostic "thing"s will certainly be welcome, especially given
that tree-sitter grammars use inconsistent naming schemes, which have to
be learned separately, and may even change with grammar versions.

I think that both NODE types and attributes can be standardized.

> Also, at the time, we only support defining things by a regexp
> matching a node's type, which is often not enough.
>
> And it would be nice to somehow take advantage of the tree-sitter
> queries for the features I mentioned above. Tree-sitter query is what
> every other editor uses for virtually all tree-sitter related
> features. But in Emacs, we mostly only use it for font-lock.

I recall one user asking about something like VIM's textobjects via
tree-sitter queries. Example:
https://github.com/nvim-treesitter/nvim-treesitter-textobjects/blob/master/queries/cpp/textobjects.scm

> Here's the progress as of now:
>
> - Functions like treesit-search-forward, treesit-induce-sparse-tree,
> treesit-thing-at-point, treesit--navigate-thing, etc, support a richer
> set of predicates now. Besides regexp matching the type, the predicate
> can also be a predicate function, or (REGEXP . FUNC), or compound
> predicates like (or PRED PRED) or (not PRED).

Slightly unrelated, but do you have any idea if it can be faster to use
Emacs' regexp search combined with treesit-thing-at-point vs. pure
tree-sitter query?

> - There's now a variable treesit-thing-settings, which holds
> definitions for things. Then, instead of passing the predicate to the
> functions I mentioned above, you can save the predicate in
> treesit-thing-settings under a symbol, say `sexp', and pass the symbol
> instead, just like thing-at-point.el. (We'll work on integrating with
> thing-at-point.el later.)

This sounds similar to textobjects I linked above.
One question: how will it integrate with multiple parsers in one buffer?

> - I can't think of a good way to integrate tree-sitter queries with
> the navigation functions we have right now. Most importantly, a
> tree-sitter query always searches top-down, and you can't limit the
> depth it searches. OTOH, our navigation functions work by traversing
> the tree node-to-node.

Could you elaborate on the difficulties you encountered?

> Some other things on the TODO list that people can take a stab at:
>
> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don't have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.

May we somehow get a hash of the library? That way, we can at least
detect if something has changed.

> - Major mode fallback/inheritance, this has been discussed many times, no good solution emerged.

I think that integration of tree-sitter with navigation functions might
be a step towards solving this problem. If common Emacs commands can
automatically choose between tree-sitter and classic implementations, it
might become easier to unify foo-ts-mode with foo-mode.

> - Isolated ranges. For many embedded languages, each block should be independent from another, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I'll probably work on this one next.

Do you mean that a single parser sees a subsequent block as a continuation
of the previous one?

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
Re: Update on tree-sitter structure navigation
Author: Hugo Thunnissen
Date: Sat, 02 Sep 2023 10:50
54 lines
2260 bytes
Ihor Radchenko <yantar92@posteo.net> writes:

> Yuan Fu <casouri@gmail.com> writes:
>
>> In the months after wrapping up tree-sitter stuff in emacs-29, I was
>> thinking about how to implement structural navigation and extracting
>> information from the parser with tree-sitter. In emacs-29 we have
>> things like treesit-beginning/end-of-defun, and treesit-defun-name. I
>> was thinking maybe we can generalize this to support getting arbitrary
>> "thing" at point, move around them, and getting information like the
>> name of a defun, its arglist, parent of a class, type of a variable
>> declaration, etc, in a language-agnostic way.
>
> Note that Org mode also does all of these using
> https://orgmode.org/worg/dev/org-element-api.html
>
> It would be nice if we could converge to a more consistent interface
> across all the modes. For example, by extending `thing-at-point' to handle
> parsed elements, not just simplistic regexp-based "thing" boundaries
> exposed by `thing-at-point' now.
>
> Org approaches getting name/begin/end/arguments using a common API:
>
> (org-element-property :begin NODE)
> (org-element-property :end NODE)
> (org-element-property :contents-begin NODE)
> (org-element-property :contents-end NODE)
> (org-element-property :name NODE)
> (org-element-property :args NODE)
>
> Language-agnostic "thing"s will certainly be welcome, especially given
> that tree-sitter grammars use inconsistent naming schemes, which have to
> be learned separately, and may even change with grammar versions.
>
> I think that both NODE types and attributes can be standardized.
>

It would be great to see standardization that can work with more than
just tree-sitter. Depending on how extensive such a generic NODE type
and accompanying API are, I could see standardization of a lot of things
that are currently implemented in major modes, to name a few:

- indentation
- fontification
- thing-at-point
- imenu
- simple forms of completion (variables, function names in buffer)

I have some idea of the underpinnings, but I have never implemented a
full major mode so it is hard for me to judge the practicality of
this. How much would be practical to standardize, without needlessly
complicated/resource-heavy abstractions?
Re: Update on tree-sitter structure navigation
Author: Yuan Fu
Date: Sat, 02 Sep 2023 15:09
167 lines
6459 bytes
> On Sep 1, 2023, at 11:52 PM, Ihor Radchenko <yantar92@posteo.net> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> In the months after wrapping up tree-sitter stuff in emacs-29, I was
>> thinking about how to implement structural navigation and extracting
>> information from the parser with tree-sitter. In emacs-29 we have
>> things like treesit-beginning/end-of-defun, and treesit-defun-name. I
>> was thinking maybe we can generalize this to support getting arbitrary
>> "thing" at point, move around them, and getting information like the
>> name of a defun, its arglist, parent of a class, type of a variable
>> declaration, etc, in a language-agnostic way.
>
> Note that Org mode also does all of these using
> https://orgmode.org/worg/dev/org-element-api.html
>
> It would be nice if we could converge to a more consistent interface
> across all the modes. For example, by extending `thing-at-point' to handle
> parsed elements, not just simplistic regexp-based "thing" boundaries
> exposed by `thing-at-point' now.
>
> Org approaches getting name/begin/end/arguments using a common API:
>
> (org-element-property :begin NODE)
> (org-element-property :end NODE)
> (org-element-property :contents-begin NODE)
> (org-element-property :contents-end NODE)
> (org-element-property :name NODE)
> (org-element-property :args NODE)
>
> Language-agnostic "thing"s will certainly be welcome, especially given
> that tree-sitter grammars use inconsistent naming schemes, which have to
> be learned separately, and may even change with grammar versions.
>
> I think that both NODE types and attributes can be standardized.

If we come up with a thing-at-point interface that provides more information than the current (BEG . END), tree-sitter surely can support it as a backend. Just need SomeOne to come up with it :-) But I don't see how this interface can support semantic information like the arglist of a defun, or the type of a declaration; these things are not universal to all "nodes".

>
>> Also, at the time, we only support defining things by a regexp
>> matching a node's type, which is often not enough.
>>
>> And it would be nice to somehow take advantage of the tree-sitter
>> queries for the features I mentioned above. Tree-sitter query is what
>> every other editor uses for virtually all tree-sitter related
>> features. But in Emacs, we mostly only use it for font-lock.
>
> I recall one user asking about something like VIM's textobjects via
> tree-sitter queries. Example:
> https://github.com/nvim-treesitter/nvim-treesitter-textobjects/blob/master/queries/cpp/textobjects.scm

I think that's something that can be implemented with thing definitions.

>> Here's the progress as of now:
>>
>> - Functions like treesit-search-forward, treesit-induce-sparse-tree,
>> treesit-thing-at-point, treesit--navigate-thing, etc, support a richer
>> set of predicates now. Besides regexp matching the type, the predicate
>> can also be a predicate function, or (REGEXP . FUNC), or compound
>> predicates like (or PRED PRED) or (not PRED).
>
> Slightly unrelated, but do you have any idea if it can be faster to use
> Emacs' regexp search combined with treesit-thing-at-point vs. pure
> tree-sitter query?

Not really.

>
>> - There's now a variable treesit-thing-settings, which holds
>> definitions for things. Then, instead of passing the predicate to the
>> functions I mentioned above, you can save the predicate in
>> treesit-thing-settings under a symbol, say `sexp', and pass the symbol
>> instead, just like thing-at-point.el. (We'll work on integrating with
>> thing-at-point.el later.)
>
> This sounds similar to textobjects I linked above.
> One question: how will it integrate with multiple parsers in one buffer?

This is only concerned with checking whether a node satisfies the definition of a "thing", and doesn't care how you get the node. Retrieving a node through either treesit-node-at or other functions already works with multiple parsers. Also the "thing" definition is language-specific.

>
>> - I can't think of a good way to integrate tree-sitter queries with
>> the navigation functions we have right now. Most importantly, a
>> tree-sitter query always searches top-down, and you can't limit the
>> depth it searches. OTOH, our navigation functions work by traversing
>> the tree node-to-node.
>
> Could you elaborate on the difficulties you encountered?

Ideally I'd like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don't work like that. They search the node and return all the matches within that node, which could be potentially wasteful.

>
>> Some other things on the TODO list that people can take a stab at:
>>
>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don't have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.
>
> May we somehow get a hash of the library? That way, we can at least
> detect if something has changed.

All we get is a binary dynamic library. So I don't think so.

>
>> - Major mode fallback/inheritance, this has been discussed many times, no good solution emerged.
>
> I think that integration of tree-sitter with navigation functions might
> be a step towards solving this problem. If common Emacs commands can
> automatically choose between tree-sitter and classic implementations, it
> might become easier to unify foo-ts-mode with foo-mode.

Unifying tree-sitter and non-tree-sitter modes creates many problems. I'm rather thinking about some way to share some configuration between the two modes. We've had many discussions before with no fruitful conclusion.

>
>> - Isolated ranges. For many embedded languages, each block should be independent from another, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I'll probably work on this one next.
>
> Do you mean that a single parser sees a subsequent block as a continuation
> of the previous one?

Exactly.

Yuan
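To make the query limitation above concrete, here is a rough emulation of "does this query match this node" on top of the existing query functions. It is only a sketch: `my-node-matches-query-p' is a hypothetical name, not part of treesit.el, and it assumes the Emacs 29 signatures of `treesit-query-capture' and `treesit-node-eq'.

```elisp
(require 'seq)

(defun my-node-matches-query-p (node query)
  "Return non-nil if QUERY captures NODE itself.
Hypothetical helper.  The query is run over the whole subtree of
NODE, and every capture other than NODE is thrown away, which is
the potentially wasteful part."
  (seq-some (lambda (captured) (treesit-node-eq captured node))
            ;; Fifth argument NODE-ONLY = t: return bare nodes
            ;; instead of (CAPTURE-NAME . NODE) pairs.
            (treesit-query-capture node query nil nil t)))
```

In a buffer with a suitable parser, a call could look like (my-node-matches-query-p (treesit-node-at (point)) '((identifier) @id)), where the `identifier' node type is an assumption about the grammar. The cost grows with the size of NODE's subtree, whereas a node-to-node predicate only inspects NODE.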
Re: Update on tree-sitter structure navigation
Author: Yuan Fu
Date: Sat, 02 Sep 2023 15:12
75 lines
2785 bytes
> On Sep 2, 2023, at 1:50 AM, Hugo Thunnissen <devel@hugot.nl> wrote:
>
> Ihor Radchenko <yantar92@posteo.net> writes:
>
>> Yuan Fu <casouri@gmail.com> writes:
>>
>>> In the months after wrapping up tree-sitter stuff in emacs-29, I was
>>> thinking about how to implement structural navigation and extracting
>>> information from the parser with tree-sitter. In emacs-29 we have
>>> things like treesit-beginning/end-of-defun, and treesit-defun-name. I
>>> was thinking maybe we can generalize this to support getting arbitrary
>>> "thing" at point, move around them, and getting information like the
>>> name of a defun, its arglist, parent of a class, type of a variable
>>> declaration, etc, in a language-agnostic way.
>>
>> Note that Org mode also does all of these using
>> https://orgmode.org/worg/dev/org-element-api.html
>>
>> It would be nice if we could converge to a more consistent interface
>> across all the modes. For example, by extending `thing-at-point' to handle
>> parsed elements, not just simplistic regexp-based "thing" boundaries
>> exposed by `thing-at-point' now.
>>
>> Org approaches getting name/begin/end/arguments using a common API:
>>
>> (org-element-property :begin NODE)
>> (org-element-property :end NODE)
>> (org-element-property :contents-begin NODE)
>> (org-element-property :contents-end NODE)
>> (org-element-property :name NODE)
>> (org-element-property :args NODE)
>>
>> Language-agnostic "thing"s will certainly be welcome, especially given
>> that tree-sitter grammars use inconsistent naming schemes, which have to
>> be learned separately, and may even change with grammar versions.
>>
>> I think that both NODE types and attributes can be standardized.
>>
>
> It would be great to see standardization that can work with more than
> just tree-sitter. Depending on how extensive such a generic NODE type
> and accompanying API are, I could see standardization of a lot of things
> that are currently implemented in major modes, to name a few:
>
> - indentation
> - fontification
> - thing-at-point
> - imenu
> - simple forms of completion (variables, function names in buffer)
>
> I have some idea of the underpinnings, but I have never implemented a
> full major mode so it is hard for me to judge the practicality of
> this. How much would be practical to standardize, without needlessly
> complicated/resource-heavy abstractions?

I don't know which level of standardization you are thinking about, but aren't they already standardized?

- indentation: indent-line/region-function
- fontification: font-lock-defaults
- thing-at-point: thing-at-point function
- imenu: imenu-create-index-function
- completion: completion-at-point-functions

Yuan
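As a small illustration of the existing per-mode hooks listed above, a toy major mode might wire them up as below. This is a sketch with placeholder logic: every `my-demo-' name and the "def NAME" syntax are made up for the example, and nothing here uses tree-sitter.

```elisp
(defvar my-demo-font-lock-keywords
  '(("\\_<def\\_>" . font-lock-keyword-face))
  "Font-lock keywords for the toy `my-demo-mode'.")

(defun my-demo-indent-line ()
  "Indent the current line to column 0 (placeholder logic)."
  (indent-line-to 0))

(defun my-demo-imenu-index ()
  "Return an imenu alist of (NAME . POSITION) for \"def NAME\" lines."
  (let (index)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "^def \\([[:alnum:]_]+\\)" nil t)
        (push (cons (match-string-no-properties 1) (match-beginning 0))
              index)))
    (nreverse index)))

(defun my-demo-capf ()
  "Complete the symbol at point from names defined in the buffer."
  (let ((bounds (bounds-of-thing-at-point 'symbol)))
    (when bounds
      (list (car bounds) (cdr bounds)
            (mapcar #'car (my-demo-imenu-index))))))

(define-derived-mode my-demo-mode prog-mode "Demo"
  "Toy mode that fills in the standard indentation, font-lock,
imenu and completion hooks."
  (setq-local indent-line-function #'my-demo-indent-line)
  (setq-local font-lock-defaults '(my-demo-font-lock-keywords))
  (setq-local imenu-create-index-function #'my-demo-imenu-index)
  (add-hook 'completion-at-point-functions #'my-demo-capf nil t))
```

Each hook is a plain variable or function slot, so a tree-sitter backend and a regexp backend can fill the same slots; what the slots do not provide is a shared representation of the parsed node itself.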
Re: Update on tree-sitter structure navigation
Author: Dmitry Gutov
Date: Sun, 03 Sep 2023 03:56
23 lines
1276 bytes
Hi Yuan,

On 02/09/2023 08:01, Yuan Fu wrote:
> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don't have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.

I don't have a better idea than basically copying NeoVim and others: maintain the URLs of the parser repositories and the ref of the latest known good revision, for the current version of the major mode.

That info could be filled in by major modes themselves, e.g. in an autoload block (similarly to how auto-mode-alist is appended to).

> Finally, feel free to send me an email or send to emacs-devel and CC me, if there are things treesit.c and treesit.el can do better, or when there are nice things in neovim and other editors that Emacs ought to have, too.

Something I mentioned previously: there is a notion of scopes in the tree-sitter docs, see the Local Variables section here: https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables

Basically, to know which symbols are defined and for how long, the parser needs additional help from the major mode author. Neovim's definition here: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/ruby/locals.scm