Using Tree-sitter in Emacs
About
在写 CFFI 的时候手动写一堆的 cffi:defcfun
实在是头痛,
而使用 c2ffi 却并没有那么的省心.
不过话又说回来了, 反正都是从 *.h
文件中读取函数的定义,
然后按照一定的规则去生成代码 cffi:defcfun
.
一个简单的想法是写一个 parser, 然后把函数名和类型等提取,
最后根据规则进行代码的生成 – 但是为什么不直接利用 Emacs
的 Tree-sitter 的集成 (Parsing Program Source) 呢?
Requirement
首先需要确保 Tree-sitter 在 Emacs 中是可用的:
(treesit-available-p)
t
并且 C parser 已经被正确地安装了:
(treesit-install-language-grammar 'c)
注: 在 Emacs 29 之后 Tree-sitter 就已经被内置在了 Emacs 中, 假如你的 Emacs 版本较早, 可以参考 emacs-tree-sitter 进行配置.
*.h
\(→\) *.lisp
via Tree-sitter
Parse *.h
假设有一个 foo.h
头文件被打开了 (其 buffer 名称为 foo.h
),
创建一个 Tree-sitter 的 parser:
foo.h
#include "stdio.h"
typedef int bar;
void print_help_and_exit();
int some_funcions(bar *foo);
(defvar parser (treesit-parser-create 'c (get-buffer "foo.h")))
注: 更多请参考 Using Tree-sitter Parser.
Extract infomation from Tree
这里差不多用到了 Retrieving Nodes 和 Accessing Node Information 这两个文档中说明的特性.
使用 treesit-node-children
可以很方便地实现一个 node
的历遍:
(defun treesit-node-sexp (node)
"Turn Tree-sitter NODE as S-expression. "
(let ((children (treesit-node-children node)))
(if (null children)
(list (treesit-node-type node)
(substring-no-properties (treesit-node-text node)))
(cons (treesit-node-type node)
(mapcar #'treesit-node-sexp children)))))
其效果类似如下
(treesit-node-sexp (treesit-parser-root-node parser))
("translation_unit" ("preproc_include" ("#include" "#include") ("string_literal" ("\"" "\"") ("string_content" "stdio.h") ("\"" "\""))) ("type_definition" ("typedef" "typedef") ("primitive_type" "int") ("type_identifier" "bar") (";" ";")) ("declaration" ("primitive_type" "void") ("function_declarator" ("identifier" "print_help_and_exit") ("parameter_list" ("(" "(") (")" ")"))) (";" ";")) ("declaration" ("primitive_type" "int") ("pointer_declarator" ("*" "*") ("function_declarator" ("identifier" "some_funcions") ("parameter_list" ("(" "(") ("parameter_declaration" ("type_identifier" "bar") ("pointer_declarator" ("*" "*") ("identifier" "foo"))) ("," ",") ("parameter_declaration" ("storage_class_specifier" ("static" "static")) ("type_qualifier" ("const" "const")) ("primitive_type" "int") ("identifier" "num")) (")" ")")))) (";" ";")))
不过其实也可以在 Buffer 中使用 treesit-explore-mode
来进行预览.
效果如下
(translation_unit (preproc_include #include path: (string_literal " (string_content) ")) (declaration type: (primitive_type) declarator: (function_declarator declarator: (identifier) parameters: (parameter_list ( ))) ;) (declaration type: (primitive_type) declarator: (function_declarator declarator: (identifier) parameters: (parameter_list ( (parameter_declaration type: (primitive_type) declarator: (pointer_declarator * declarator: (identifier))) ))) ;))
但是不经过筛选的 node 的处理有点麻烦, 虽然在前面的历遍函数上面添加一些判断,
也能够让结果比较好用. 但是考虑到为啥不直接用现成的 treesit-query-capture
呢?
(mapcar (lambda (query)
(let ((func (cdr query)))
(substring-no-properties (treesit-node-text func))))
(treesit-query-capture (treesit-parser-root-node parser)
'((declaration) @declaration)))
("void print_help_and_exit();" "int some_funcions(bar *foo);")
现在你已经学会了从最简单的 tree 中提取节点了, 那么使用更加复杂一些的规则:
(treesit-query-capture (treesit-parser-root-node parser)
'((declaration
declarator: (_ declarator: (_)))
@declaration))
((declaration . #<treesit-node declaration in 39-66>) (declaration . #<treesit-node declaration in 68-119>))
一个小小的 Tip
可以使用 treesit-query-validate
来对 query
进行合法性进行判断:
(treesit-query-validate 'c
'((declaration
declarator: (function_declarator !body))
@declaration))
QUERY is valid
既然已经能够写出这样稍微复杂一些的 query 规则…
于是就可以写一个简单的函数来提取一个头文件中的所有函数定义,
即对于找到的 declaration
, 提取其中的函数返回类型, 函数名称以及函数参数.
(type function &rest (type name))
这里有一个需要注意的点: 因为 Tree-sitter 会把 *func
变成
(pointer_declarator * declarator: (_))
的形式,
所以需要做一个简单的操作来把函数的类型 type
变成 (:pointer type)
这样的形式.
(defun treesit-c--ptr-type (type declarator)
"Return pointer noted TYPE and DECLARATOR. "
(let ((pointer-p (string= "*" (treesit-node-text
(treesit-node-child declarator 0)))))
(if pointer-p
(treesit-c--ptr-type
(list :pointer type)
(treesit-node-child-by-field-name declarator "declarator"))
(cons type declarator))))
以及一些信息的提取:
(defun treesit-node-text-no-property (node)
(substring-no-properties (treesit-node-text node)))
于是如下实现:
(lexical-let ((func-query (treesit-query-compile
'c
'((declaration
type: (_) @type
declarator: (_ declarator: (_)) @declarator))))
(para-query (treesit-query-compile 'c
'((parameter_declaration
type: (_) @type
declarator: (_) @declarator)))))
(defun treesit-c--function-declarations (root)
(cl-loop with matched = (treesit-query-capture root func-query)
for ((_ . type) (_ . declarator)) on matched by #'cddr
for (type* . declare) = (treesit-c--ptr-type
(treesit-node-text-no-property type)
declarator)
for name = (treesit-node-child-by-field-name declare "declarator")
for params =
(cl-loop with p* = (treesit-query-capture declare para-query)
for ((_ . type) (_ . declarator)) on p* by #'cddr
for (type* . declare) = (treesit-c--ptr-type
(treesit-node-text-no-property type)
declarator)
collect (list type* (treesit-node-text-no-property declare)))
collect `(,type*
,(treesit-node-text-no-property name)
,@params))))
于是最终的效果就如下所示:
(treesit-c--function-declarations (treesit-parser-root-node parser))
(("void" "print_help_and_exit") ((:pointer "int") "some_funcions" ((:pointer "bar") "foo") ("int" "num")))
Tree to cffi:defcfun
于是就可以用这样的规则来生成 cffi:defcfun
的函数定义了:
(mapconcat
(lambda (definition)
(cl-destructuring-bind (type name . params) definition
(concat (format "(cffi:defcfun (%s %S) %s"
(string-param-case name) name
(c-type-to-lisp-name type))
(if params "\n" "")
(string-join (cl-loop for (type var) in params
collect (format " (%s %s)"
(string-param-case var)
(c-type-to-lisp-name type)))
"\n")
")")))
definitions
"\n\n")
(cffi:defcfun (print_help_and_exit "print_help_and_exit") :void)
(cffi:defcfun (some_funcions "some_funcions") (:pointer :int)
(foo (:pointer bar))
(num :int))
这里用到的一些其他的函数
(defvar c-type-lisp-name-alist
'(("int" . :int)
("float" . :float)
("void" . :void)
("char" . :char)
("size_t" . :size)))
(defun c-type-to-lisp-name (type)
(if (atom type)
(or (cdr (assoc type c-type-lisp-name-alist #'string=))
type)
(list (car type) (c-type-to-lisp-name (cadr type)))))
;; TODO
(defun string-param-case (string) string)
注: 这里参考 6.1 Built-In Types 作为 c-type-lisp-name-alist
的定义.
实际上的效果还行, 如果愿意处理一下更多的边缘条件的话, 感觉可以做得更好一些.
那么挑战一下稍微复杂一些的真实环境中的头文件的解析
这里用的是苹果的 MLX (一个接下来想做的东西) 作为测试例子:
(setq parser (treesit-parser-create 'c (get-buffer "array.h")))
最终的效果如下:
(cffi:defcfun (mlx_dtype_size "mlx_dtype_size") :size
(dtype mlx_dtype))
(cffi:defcfun (mlx_array_tostring "mlx_array_tostring") :int
(str (:pointer mlx_string))
(arr mlx_array))
(cffi:defcfun (mlx_array_new "mlx_array_new") mlx_array)
(cffi:defcfun (mlx_array_free "mlx_array_free") :int
(arr mlx_array))
(cffi:defcfun (mlx_array_new_bool "mlx_array_new_bool") mlx_array
(val bool))
(cffi:defcfun (mlx_array_new_int "mlx_array_new_int") mlx_array
(val :int))
(cffi:defcfun (mlx_array_new_float32 "mlx_array_new_float32") mlx_array
(val :float))
(cffi:defcfun (mlx_array_new_float "mlx_array_new_float") mlx_array
(val :float))
(cffi:defcfun (mlx_array_new_float64 "mlx_array_new_float64") mlx_array
(val double))
(cffi:defcfun (mlx_array_new_double "mlx_array_new_double") mlx_array
(val double))
(cffi:defcfun (mlx_array_new_complex "mlx_array_new_complex") mlx_array
(real_val :float)
(imag_val :float))
(cffi:defcfun (mlx_array_new_data "mlx_array_new_data") mlx_array
(data (:pointer :void))
(shape (:pointer :int))
(dim :int)
(dtype mlx_dtype))
(cffi:defcfun (mlx_array_set "mlx_array_set") :int
(arr (:pointer mlx_array))
(src mlx_array))
(cffi:defcfun (mlx_array_set_bool "mlx_array_set_bool") :int
(arr (:pointer mlx_array))
(val bool))
(cffi:defcfun (mlx_array_set_int "mlx_array_set_int") :int
(arr (:pointer mlx_array))
(val :int))
(cffi:defcfun (mlx_array_set_float32 "mlx_array_set_float32") :int
(arr (:pointer mlx_array))
(val :float))
(cffi:defcfun (mlx_array_set_float "mlx_array_set_float") :int
(arr (:pointer mlx_array))
(val :float))
(cffi:defcfun (mlx_array_set_float64 "mlx_array_set_float64") :int
(arr (:pointer mlx_array))
(val double))
(cffi:defcfun (mlx_array_set_double "mlx_array_set_double") :int
(arr (:pointer mlx_array))
(val double))
(cffi:defcfun (mlx_array_set_complex "mlx_array_set_complex") :int
(arr (:pointer mlx_array))
(real_val :float)
(imag_val :float))
(cffi:defcfun (mlx_array_set_data "mlx_array_set_data") :int
(arr (:pointer mlx_array))
(data (:pointer :void))
(shape (:pointer :int))
(dim :int)
(dtype mlx_dtype))
(cffi:defcfun (mlx_array_itemsize "mlx_array_itemsize") :size
(arr mlx_array))
(cffi:defcfun (mlx_array_size "mlx_array_size") :size
(arr mlx_array))
(cffi:defcfun (mlx_array_nbytes "mlx_array_nbytes") :size
(arr mlx_array))
(cffi:defcfun (mlx_array_ndim "mlx_array_ndim") :size
(arr mlx_array))
(cffi:defcfun (mlx_array_shape "mlx_array_shape") (:pointer :int)
(arr mlx_array))
(cffi:defcfun (mlx_array_strides "mlx_array_strides") (:pointer :size)
(arr mlx_array))
(cffi:defcfun (mlx_array_dim "mlx_array_dim") :int
(arr mlx_array)
(dim :int))
(cffi:defcfun (mlx_array_dtype "mlx_array_dtype") mlx_dtype
(arr mlx_array))
(cffi:defcfun (mlx_array_eval "mlx_array_eval") :int
(arr mlx_array))
(cffi:defcfun (mlx_array_item_bool "mlx_array_item_bool") :int
(res (:pointer bool))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_uint8 "mlx_array_item_uint8") :int
(res (:pointer uint8_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_uint16 "mlx_array_item_uint16") :int
(res (:pointer uint16_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_uint32 "mlx_array_item_uint32") :int
(res (:pointer uint32_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_uint64 "mlx_array_item_uint64") :int
(res (:pointer uint64_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_int8 "mlx_array_item_int8") :int
(res (:pointer int8_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_int16 "mlx_array_item_int16") :int
(res (:pointer int16_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_int32 "mlx_array_item_int32") :int
(res (:pointer int32_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_int64 "mlx_array_item_int64") :int
(res (:pointer int64_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_float32 "mlx_array_item_float32") :int
(res (:pointer :float))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_float64 "mlx_array_item_float64") :int
(res (:pointer double))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_complex64 "mlx_array_item_complex64") :int
(res (:pointer :float))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_float16 "mlx_array_item_float16") :int
(res (:pointer float16_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_item_bfloat16 "mlx_array_item_bfloat16") :int
(res (:pointer bfloat16_t))
(arr mlx_array))
(cffi:defcfun (mlx_array_data_bool "mlx_array_data_bool") (:pointer bool)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_uint8 "mlx_array_data_uint8") (:pointer uint8_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_uint16 "mlx_array_data_uint16") (:pointer uint16_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_uint32 "mlx_array_data_uint32") (:pointer uint32_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_uint64 "mlx_array_data_uint64") (:pointer uint64_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_int8 "mlx_array_data_int8") (:pointer int8_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_int16 "mlx_array_data_int16") (:pointer int16_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_int32 "mlx_array_data_int32") (:pointer int32_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_int64 "mlx_array_data_int64") (:pointer int64_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_float32 "mlx_array_data_float32") (:pointer :float)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_float64 "mlx_array_data_float64") (:pointer double)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_complex64 "mlx_array_data_complex64") (:pointer :float)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_float16 "mlx_array_data_float16") (:pointer float16_t)
(arr mlx_array))
(cffi:defcfun (mlx_array_data_bfloat16 "mlx_array_data_bfloat16") (:pointer bfloat16_t)
(arr mlx_array))
(cffi:defcfun (_mlx_array_is_available "_mlx_array_is_available") :int
(res (:pointer bool))
(arr mlx_array))
(cffi:defcfun (_mlx_array_wait "_mlx_array_wait") :int
(arr mlx_array))
(cffi:defcfun (_mlx_array_is_contiguous "_mlx_array_is_contiguous") :int
(res (:pointer bool))
(arr mlx_array))
(cffi:defcfun (_mlx_array_is_row_contiguous "_mlx_array_is_row_contiguous") :int
(res (:pointer bool))
(arr mlx_array))
(cffi:defcfun (_mlx_array_is_col_contiguous "_mlx_array_is_col_contiguous") :int
(res (:pointer bool))
(arr mlx_array))
感觉挺好的, 这样的话, 只需要进行一些简单的例外处理即可,
而对应的数据类型 (比如 mlx_array
) 可以自己做 wrapping,
这样的话就会比 c2ffi 多一些自由度了.
Ending
花了一天多才写好的简单小功能, 感觉最近编程的动力极其的弱… 啥也不想干, 这毕设真是害人啊, 坏了我的作息也磨灭了我的心情…