About

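These are notes on my attempt to reimplement an objective-c-bridge. I am following the LispWorks API. CCL also has an objective-c-bridge, but because CCL currently has no ARM port, I cannot test how it behaves on my machine, and I am not someone who is good at reading documentation.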

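Note: my only goals here are to avoid paying for a LispWorks licence and to add some fun, non-gaming way of passing the time to a rather dull life. So this project will be put on hold if I get approval to buy a licence with the school's money, or if I get very busy.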

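My current plan is:

  1. Build a binding to the Objective-C Runtime with CFFI
  2. Learn how to write code against the Objective-C Runtime
  3. Emulate the LispWorks ObjC functions
  4. Add other libraries, or do some higher-level wrapping
  5. Try to implement a CLIM backend, or emulate the CAPI API?

This will probably be a very long-running project…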

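The main topic of this post, though, is how to scrape the documentation from Apple's website and parse it into a form that CFFI can use. It uses a slightly improved version of my earlier WebDriver protocol code (gist).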

A few small wrappers on top of the WebDriver library
(defmacro map-find-elems ((elem selector &optional (node '*session*))
                          &body body)
  `(mapcar (lambda (,elem) ,@body) (wd:find-elems ,node ,selector)))

(defun find-text (selector &optional (node *session*) (retry 5) (wait 1.0))
  (handler-case (wd:text (wd:find-elem node selector))
    (wd::webdriver-error (err)
      (cond ((> retry 0)
             (sleep wait)
             (find-text selector node (1- retry) wait))
            (T
             (error err))))))
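
The retry logic in find-text can also be written as a standalone helper, which makes it easy to try out without a browser session. The following is only a minimal sketch: call-with-retry is a hypothetical name that is not part of the WebDriver library or the original gist, and it retries on any error rather than only on wd::webdriver-error.

```lisp
;; Hypothetical helper (an assumption for illustration, not from the gist).
(defun call-with-retry (thunk &key (retry 5) (wait 1.0))
  "Call THUNK. On an error, sleep WAIT seconds and try again,
up to RETRY extra times, then re-signal the last error."
  (loop
    (handler-case (return (funcall thunk))
      (error (err)
        ;; No retries left: pass the error on to the caller.
        (when (<= retry 0)
          (error err))
        (decf retry)
        (sleep wait)))))

;; Usage: the thunk fails twice and succeeds on the third call.
(let ((attempts 0))
  (call-with-retry (lambda ()
                     (incf attempts)
                     (if (< attempts 3)
                         (error "not ready")
                         attempts))
                   :wait 0))
;; => 3
```

With such a helper, find-text would reduce to wrapping the wd:text call in a lambda.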

Reading the Objective-C documentation

Create a new WebDriver session:

(defparameter *session* (wd:make-webdriver-session))

Then visit Apple's Objective-C Runtime page:

(wd:navigate *session* "https://developer.apple.com/documentation/objectivec/objective-c-runtime?language=objc")

Section

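The page is split into several sections, so we can collect them into sections and extract the links to each section's sub-documents, which lets the implementation proceed section by section later on: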

(defparameter sections
  (map-find-elems (section "div.contenttable-section")
    (list (find-text "h3.contenttable-title" section)
          (map-find-elems (link "a:not(.deprecated):has(code)" section)
            (cons (wd:text link) (wd:property link "href"))))))

Note: the :has(code) CSS selector picks the links that have a code child node, rather than the links that lead to other explanatory documents.

The result looks like this:

Section                               Link Count
Working with Classes                  30
Adding Classes                        5
Instantiating Classes                 3
Working with Instances                10
Obtaining Class Definitions           6
Working with Instance Variables       3
Associative References                3
Sending Messages                      5
Working with Methods                  13
Working with Libraries                3
Working with Selectors                4
Working with Protocols                15
Working with Properties               4
Using Objective-C Language Features   9
Class-Definition Data Structures      9
Instance Data Types                   3
Boolean Value                         1
Associative References                1
Constants                             0
Related Documentation                 0
Reference                             0

Function, Type Alias, Structure

For a single document, for example:

(defparameter link
  (let ((section (first sections)))
    (destructuring-bind (title (code . rest)) section
      (declare (ignore rest))
      (format t ";;; ~A~%" title)
      (format t "~A~%~A~%" (car code) (cdr code))
      (cdr code))))
;;; Working with Classes
class_getName
https://developer.apple.com/documentation/objectivec/class_getname(_:)?language=objc

The page has several useful pieces of information:

  • div.topictitle: the category and a short summary, objc-doc-type and objc-doc-short
    (defun objc-doc-type  (session) (find-text "div.topictitle > span" session))
    (defun objc-doc-short (session) (find-text "div.abstract" session))
    
  • pre.source > code: lambda list objc-doc-lambda
    (defun objc-doc-lambda (session) (find-text "pre.source > code" session))
    

    Example:

    extern const char * class_getName(Class cls);
        

    The interesting part here is how to parse this objc-lambda.

  • #parameters: the parameter descriptions, objc-doc-params
    (defun objc-doc-params (session)
      (let ((param (first (wd:find-elems session "#parameters + dl"))))
        (when param
          (mapcar #'cons
                  (map-find-elems (name "dt" param)
                    (get-nickname (wd:text name)))
                  (map-find-elems (doc-paras "dd" param)
                    (map-find-elems (para "p" doc-paras)
                      (objc-doc-text para)))))))
    

    Example:

    class    A class object.

    A note on objc-doc-text

    It converts HTML into readable Lisp documentation:

    (defparameter *objc-nickname-alist*
      '(("cls"      . "class")
        ("Class"    . "objc-class")
        ("Method"   . "objc-method")
        ("IMP"      . "objc-imp")
        ("SEL"      . "objc-sel")
        ("Protocol" . "objc-protocol")))
    
    (defun get-nickname (key)
      (let ((cons (assoc key *objc-nickname-alist* :test #'equal)))
        (if cons (cdr cons) (str:param-case key))))
    
    (defun objc-doc-text (node)
      (let ((dom (plump:parse (wd:property node "innerHTML"))))
        (flet ((parse (node)
                 (if (or (plump:text-node-p node)
                         (and (string= (plump:tag-name node) "code")
                              (/= (length (plump:children node)) 1)))
                     (plump:text node)
                     (format nil "`~A'" (get-nickname (plump:text node))))))
          (str:join "" (map 'list #'parse (plump:children dom))))))
    
  • #return-value: the return value, objc-doc-return-value
    (defun objc-doc-return-value (session)
      (map-find-elems (elem "#return-value ~ p" session)
        (objc-doc-text elem)))
    

    Example:

    The name of the class, or the empty string if class is nil.
  • #Discussion: additional notes, objc-doc-discussion
    (defun objc-doc-discussion (session)
      (map-find-elems (elem "#Discussion ~ p" session)
        (objc-doc-text elem)))
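
The naming step used by get-nickname can be tried on its own, without loading the str library. This is only a rough sketch: sketch-param-case and sketch-get-nickname are hypothetical stand-ins, and sketch-param-case merely approximates str:param-case for the identifiers seen in this post, so it may differ on other inputs.

```lisp
;; Hypothetical stand-in for str:param-case (an assumption, not the real one).
(defun sketch-param-case (name)
  "Lowercase NAME, turning underscores and lower-to-upper case
boundaries into hyphens."
  (with-output-to-string (out)
    (loop for prev = nil then ch
          for ch across name
          do (cond ((char= ch #\_)
                    (write-char #\- out))
                   ((and prev (upper-case-p ch) (lower-case-p prev))
                    (write-char #\- out)
                    (write-char (char-downcase ch) out))
                   (t
                    (write-char (char-downcase ch) out))))))

;; A shortened copy of the nickname table from this post.
(defparameter *sketch-nickname-alist*
  '(("cls"   . "class")
    ("Class" . "objc-class")
    ("SEL"   . "objc-sel")))

(defun sketch-get-nickname (key)
  "Look KEY up in the table first, else derive a hyphenated name."
  (or (cdr (assoc key *sketch-nickname-alist* :test #'equal))
      (sketch-param-case key)))

(sketch-get-nickname "class_getName") ; => "class-get-name"
(sketch-get-nickname "cls")           ; => "class"
(sketch-get-nickname "objc_msgSend")  ; => "objc-msg-send"
```

The table lookup comes first so that short C parameter names such as cls get a readable Lisp name instead of a mechanical conversion.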
    

With these pieces, the objc-doc-parse-url step can be implemented (here as objc-doc-parse-name-url-cons):

(defun objc-doc-parse-name-url-cons (cons)
  (let ((name (car cons))
        (url  (cdr cons)))
    (unless (string= (wd:url *session*) url)
      (wd:navigate *session* url))
    (list :name   name
          :type   (objc-doc-type   *session*)
          :doc    (objc-doc-short  *session*)
          :lambda (objc-doc-lambda *session*)
          :params (objc-doc-params *session*)
          :return (objc-doc-return-value *session*)
          :discussion (objc-doc-discussion *session*))))

Example:

(objc-doc-parse-name-url-cons (first (second (first sections))))
(:name "class_getName" :type "Function" :doc "Returns the name of a class."
 :lambda "extern const char * class_getName(Class cls);" :params
 (("class" "A class object.")) :return
 ("The name of the class, or the empty string if `class' is `nil'.")
 :discussion nil)

Parse Function Lambda

Tokenize

(defparameter *objc-keywords-alist*
  '(("extern"   . :extern)
    ("struct"   . :struct)
    ("unsigned" . :unsigned)
    ("const"    . :const)))

(defparameter *objc-type-alist*
  '(("char"     . :char)
    ("int"      . :int)
    ("void"     . :void)
    ("uint8_t"  . :uint8)
    ("size_t"   . :size)
    ;; Note: this punches a hole in the parser (a hard-coded special case)
    ("void (*)(id)" . (:function objc-id))))

(defun objc-token-regexp ()
  "void \\(\\*\\)\\(id\\)|[a-zA-Z][a-zA-Z0-9_]*|\\(|\\)|\\;|\\,|\\*")

(defun objc-lexer (str)
  (flet ((tokenrize (token)
           (cond ((string= token ";") (values :eol        :eol))
                 ((string= token "*") (values :pointer    :pointer))
                 ((string= token "(") (values :args-start :args-start))
                 ((string= token ")") (values :args-end   :args-end))
                 ((string= token ",") (values :comma      :comma))
                 ((assoc token *objc-keywords-alist* :test #'equal)
                  (let ((token (cdr (assoc token *objc-keywords-alist* :test #'equal))))
                    (values token token)))
                 ((assoc token *objc-type-alist* :test #'equal)
                  (let ((token (cdr (assoc token *objc-type-alist* :test #'equal))))
                    (values 'type token)))
                 (T (values 'name (intern (str:upcase (get-nickname token))))))))
    (let ((search 0)
          (regexp (ppcre:create-scanner (objc-token-regexp))))
      (lambda ()
        (multiple-value-bind (start end) (ppcre:scan regexp str :start search)
          (when end (setf search end))
          (if start
              (tokenrize (str:substring start end str))
              (values nil nil)))))))

(defun objc-tokenrize (str)
  (let ((lexer (objc-lexer str)))
    (loop for (terminal value) = (multiple-value-list (funcall lexer))
          while terminal collect (list terminal value))))

The tokenizer implementation is fairly trivial. The set of inputs it has to handle is limited, so a dirty special-case hack is acceptable.

Example:

:extern       :extern
name          ivar
name          class-get-instance-variable
:args-start   :args-start
name          objc-class
name          class
:comma        :comma
:const        :const
type          :char
:pointer      :pointer
name          name
:args-end     :args-end
:eol          :eol

Grammar Parser

Using cl-yacc as the parser generator:

(yacc:define-parser *objc-lambda-parser*
  (:start-symbol objc-lambda)
  (:terminals (name type :comma :extern :const :struct :unsigned
                         :pointer :args-start :args-end :eol))
  (objc-lambda
   (:extern types name args :eol
            (lambda (extern type name args eol)
              (declare (ignore extern eol))
              (list (list name type) args))))

  (types
   type
   name
   (:const    types (lambda (const type)
                      (declare (ignore const))
                      type))
   (:unsigned types (lambda (unsigned type)
                      (list unsigned type)))
   (:struct   name  (lambda (struct name)
                      (list struct name)))
   (types  :pointer (lambda (type pointer)
                      (list pointer type))))

  (args
   (:args-start pair* :args-end (lambda (a pairs c)
                                  (declare (ignore a c))
                                  pairs))
   (:args-start :args-end       (constantly nil)))

  (pair*
   (types name              (lambda (type name)
                              (list (list name type))))
   (types name :comma pair* (lambda (type name comma pairs)
                              (declare (ignore comma))
                              (cons (list name type) pairs)))))

(defun objc-parse-lambda (objc-lambda)
  (yacc:parse-with-lexer (objc-lexer objc-lambda) *objc-lambda-parser*))
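
A problem with the grammar design

There is a small problem with the types :pointer rule in the types node: it is left-recursive, which I worried might send the parser into an infinite loop. (cl-yacc generates LALR(1) parsers, which normally handle left recursion without trouble, so in practice this should be harmless.) But as long as it runs, that is good enough?

Example: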

((class-get-name :char) ((class objc-class)))
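
For comparison, the same declarations can be parsed by hand without a parser generator. The following is a sketch of a swapped-in technique (recursive descent), not the cl-yacc grammar above: it reads a prefix type first and then consumes pointer suffixes in a loop, so no left recursion is involved. All sketch-* names are hypothetical, and the token list is written out by hand in the (terminal value) shape that objc-tokenrize returns. This sketch keeps the pointer wrapper in the return type.

```lisp
(defun sketch-parse-prefix (tokens)
  "Parse a type without pointer suffixes.
Return the type and the remaining tokens."
  (destructuring-bind ((terminal value) . more) tokens
    (case terminal
      (:const    (sketch-parse-prefix more)) ; const is dropped
      (:unsigned (multiple-value-bind (type after) (sketch-parse-prefix more)
                   (values (list :unsigned type) after)))
      (:struct   (values (list :struct (second (first more))) (rest more)))
      ((type name) (values value more))
      (t (error "Unexpected token ~S" terminal)))))

(defun sketch-parse-type (tokens)
  "Parse a prefix type, then wrap it once per trailing :pointer token."
  (multiple-value-bind (type more) (sketch-parse-prefix tokens)
    (loop while (eq (first (first more)) :pointer)
          do (setf type (list :pointer type)
                   more (rest more)))
    (values type more)))

(defun sketch-parse-lambda (tokens)
  "Parse :extern type name ( args ) into ((name type) ((arg type) ...))."
  (assert (eq (first (pop tokens)) :extern))
  (multiple-value-bind (return-type more) (sketch-parse-type tokens)
    (let ((name (second (pop more)))
          (args '()))
      (assert (eq (first (pop more)) :args-start))
      (loop until (eq (first (first more)) :args-end)
            do (multiple-value-bind (arg-type after) (sketch-parse-type more)
                 (push (list (second (first after)) arg-type) args)
                 (setf more (rest after))
                 (when (eq (first (first more)) :comma)
                   (pop more))))
      (list (list name return-type) (nreverse args)))))

;; Tokens for: extern const char * class_getName(Class cls);
(sketch-parse-lambda
 '((:extern :extern) (:const :const) (type :char) (:pointer :pointer)
   (name class-get-name) (:args-start :args-start)
   (name objc-class) (name class)
   (:args-end :args-end) (:eol :eol)))
;; => ((class-get-name (:pointer :char)) ((class objc-class)))
```

Like the grammar above, this sketch does not handle function-pointer arguments or a bare (void) argument list.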

Lisp-Spider, Go!

(defparameter *objc-runtime-doc*
  (dolist-bind-collect ((title elem) sections)
    (list title (dolist-bind-collect ((method . url) elem)
                  (let ((property (objc-doc-parse-name-url-cons (cons method url))))
                    (when (string= (getf property :type) "Function")
                      (setf (getf property :cffi)
                            (objc-parse-lambda (getf property :lambda))))
                    property)))))
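
The dolist-bind-collect macro used above is not defined in this post. Judging only from how it is called, it presumably destructures each list element and collects the body's values. The definition below is a guess under that assumption, not the actual code from the gist.

```lisp
;; Assumed definition, reconstructed from the call sites above.
(defmacro dolist-bind-collect ((bind list) &body body)
  "Destructure each element of LIST against BIND and collect
the value of BODY for every element."
  (let ((elem (gensym "ELEM")))
    `(mapcar (lambda (,elem)
               (destructuring-bind ,bind ,elem
                 ,@body))
             ,list)))

;; Usage with a dotted pattern, like the (method . url) case above.
(dolist-bind-collect ((name . url) '(("a" . "u1") ("b" . "u2")))
  (list name url))
;; => (("a" "u1") ("b" "u2"))
```

Both patterns used above fit this shape: (title elem) matches a two-element list, and (method . url) matches a cons.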

End

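The final processed result can be found in the gist. In theory, a few simple format operations would be enough to generate the CFFI bindings from it. However, the auto-generated bindings may not be all that usable (I cannot vouch for their correctness, since I am not very familiar with CFFI or the ObjC Runtime), so I have decided to first look at some Hello World examples for the ObjC Runtime, and then try to build an emulation that is compatible with the LispWorks ObjC-Bridge API.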
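
As a rough idea of what that generation step could look like, the sketch below turns one parsed entry into a defcfun form as plain list data. It rests on several assumptions: the sketch-* names are hypothetical, every non-keyword type (Class, SEL, and so on) is treated as an opaque :pointer, and the resulting form is meant to be printed into a file whose package uses CFFI. CFFI itself is not needed to run it.

```lisp
(defun sketch-cffi-type (type)
  "Map a parsed type to a CFFI type specifier (assumed mapping)."
  (cond ((keywordp type) type)                  ; :char, :int, :void, ...
        ((and (consp type) (eq (first type) :pointer)) :pointer)
        ((and (consp type) (eq (first type) :unsigned))
         ;; (:unsigned :char) -> :unsigned-char
         (intern (format nil "UNSIGNED-~A" (symbol-name (second type)))
                 :keyword))
        ((and (consp type) (eq (first type) :struct))
         type)                                  ; assumes a matching defcstruct
        (t :pointer)))                          ; Class, SEL, ...: assumption

(defun sketch-defcfun-form (c-name parsed)
  "Build a DEFCFUN form from C-NAME and a ((name type) args) list."
  (destructuring-bind ((lisp-name return-type) args) parsed
    `(defcfun (,c-name ,lisp-name) ,(sketch-cffi-type return-type)
       ,@(loop for (arg-name arg-type) in args
               collect (list arg-name (sketch-cffi-type arg-type))))))

(sketch-defcfun-form "class_getName"
                     '((class-get-name (:pointer :char))
                       ((class objc-class))))
;; => (defcfun ("class_getName" class-get-name) :pointer (class :pointer))
```

With the scraped data, c-name would come from the :name property and parsed from the :cffi property of each entry.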