ObjC 0: Downloading the ObjC Runtime Documentation with WebDriver
About
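A record of my attempt to reimplement an objective-c-bridge. I am following the LispWorks API. CCL also has an objective-c-bridge, but CCL currently lacks an ARM port, so I cannot test how it behaves on my machine, and I am not someone who is good at reading documentation.
Note: my only goals here are to avoid paying for a LispWorks licence, and to add some fun and a non-gaming pastime to a rather dull life. The project will therefore be put on hold if I get approval to buy a licence with school funds, or whenever I get busy.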
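My current plan is:
- Build a binding to the Objective-C Runtime with CFFI
- Learn how to program against the Objective-C Runtime
- Emulate the LispWorks ObjC functions
- Add other libraries, or do some higher-level wrapping
- Try to implement a CLIM backend, or emulate the CAPI API?
This will probably be a very long-running project…
The main topic of this post, though, is how to scrape the documentation from Apple's website and parse it into a form that CFFI can use. It relies on a slightly improved version of my earlier WebDriver protocol code (gist).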
A few small wrappers around the WebDriver library
;; Map BODY over every element under NODE that matches SELECTOR.
(defmacro map-find-elems ((elem selector &optional (node '*session*))
                          &body body)
  `(mapcar (lambda (,elem) ,@body) (wd:find-elems ,node ,selector)))

;; Read the text of the first match, retrying while the page is still loading.
(defun find-text (selector &optional (node *session*) (retry 5) (wait 1.0))
  (handler-case (wd:text (wd:find-elem node selector))
    (wd::webdriver-error (err)
      (cond ((> retry 0)
             (sleep wait)
             (find-text selector node (1- retry) wait))
            (T
             (error err))))))
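The retry logic in find-text can be pulled out into a generic helper. The following is only a sketch: call-with-retries is a hypothetical name that does not appear in the original code, and unlike find-text it catches every error rather than only wd::webdriver-error.

```lisp
;; Hypothetical helper (not part of the original post): call THUNK and,
;; if it signals an error, try again up to RETRY more times, sleeping
;; WAIT seconds between attempts. The last error is re-signalled.
(defun call-with-retries (thunk &key (retry 5) (wait 1.0))
  (loop
    (handler-case (return (funcall thunk))
      (error (err)
        (if (plusp retry)
            (progn (decf retry) (sleep wait))
            (error err))))))

;; Usage: a thunk that fails twice before succeeding on the third call.
(defun demo-retries ()
  (let ((attempts 0))
    (call-with-retries
     (lambda ()
       (incf attempts)
       (if (< attempts 3)
           (error "not ready yet")
           attempts))
     :wait 0)))
```

With such a helper, find-text would reduce to wrapping the wd:text call in a lambda, at the cost of retrying on errors that have nothing to do with page loading.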
Reading the Objective-C Documentation
Create a new WebDriver session:
(defparameter *session* (wd:make-webdriver-session))
Then visit Apple's Objective-C Runtime page:
(wd:navigate *session* "https://developer.apple.com/documentation/objectivec/objective-c-runtime?language=objc")
Section
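The page is divided into several sections, so they can be extracted into sections, together with the links to each section's sub-documents. These links are used later to work through the implementation section by section: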
(defparameter sections
  (map-find-elems (section "div.contenttable-section")
    (list (find-text "h3.contenttable-title" section)
          (map-find-elems (link "a:not(.deprecated):has(code)" section)
            (cons (wd:text link) (wd:property link "href"))))))
Note: the :has(code) CSS selector keeps only the links that have a code child node, and skips links that lead to other explanatory articles.
The result looks like this:
Section | Link Count |
Working with Classes | 30 |
Adding Classes | 5 |
Instantiating Classes | 3 |
Working with Instances | 10 |
Obtaining Class Definitions | 6 |
Working with Instance Variables | 3 |
Associative References | 3 |
Sending Messages | 5 |
Working with Methods | 13 |
Working with Libraries | 3 |
Working with Selectors | 4 |
Working with Protocols | 15 |
Working with Properties | 4 |
Using Objective-C Language Features | 9 |
Class-Definition Data Structures | 9 |
Instance Data Types | 3 |
Boolean Value | 1 |
Associative References | 1 |
Constants | 0 |
Related Documentation | 0 |
Reference | 0 |
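The counts in the table above can be derived from sections with a short loop. The sketch below uses a small hand-written stand-in for the scraped data. The name section-link-counts, the sample list, and the example.invalid URLs are illustrative assumptions, not part of the original code.

```lisp
;; Stand-in for the scraped SECTIONS value: each entry is
;; (title ((name . url) ...)). The URLs are placeholders.
(defparameter *sample-sections*
  '(("Working with Classes"
     (("class_getName" . "https://example.invalid/class_getname")
      ("class_getSuperclass" . "https://example.invalid/class_getsuperclass")))
    ("Constants" ())))

;; Hypothetical helper: pair each section title with its number of links.
(defun section-link-counts (sections)
  (loop for (title links) in sections
        collect (cons title (length links))))

;; (section-link-counts *sample-sections*)
;; returns (("Working with Classes" . 2) ("Constants" . 0))
```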
Function, Type Alias, Structure
For a single document, for example:
(defparameter link
  (let ((section (first sections)))
    (destructuring-bind (title (code . rest)) section
      (declare (ignore rest))
      (format t ";;; ~A~%" title)
      (format t "~A~%~A~%" (car code) (cdr code))
      (cdr code))))
;;; Working with Classes
class_getName
https://developer.apple.com/documentation/objectivec/class_getname(_:)?language=objc
The page contains several useful pieces of information:
div.topictitle and div.abstract: the kind of entry and its short description, read by objc-doc-type and objc-doc-short
(defun objc-doc-type (session)
  (find-text "div.topictitle > span" session))

(defun objc-doc-short (session)
  (find-text "div.abstract" session))
pre.source > code: the lambda list, read by objc-doc-lambda
(defun objc-doc-lambda (session)
  (find-text "pre.source > code" session))
Example:
extern const char * class_getName(Class cls);
An interesting problem here is how to parse this objc-lambda.
#parameters: the parameter descriptions, read by objc-doc-params
(defun objc-doc-params (session)
  (let ((param (first (wd:find-elems session "#parameters + dl"))))
    (when param
      (mapcar #'cons
              (map-find-elems (name "dt" param)
                (get-nickname (wd:text name)))
              (map-find-elems (doc-paras "dd" param)
                (map-find-elems (para "p" doc-paras)
                  (objc-doc-text para)))))))
Example:
class | A class object. |
A note on objc-doc-text: it converts the HTML into readable Lisp documentation:
(defparameter *objc-nickname-alist*
  '(("cls"      . "class")
    ("Class"    . "objc-class")
    ("Method"   . "objc-method")
    ("IMP"      . "objc-imp")
    ("SEL"      . "objc-sel")
    ("Protocol" . "objc-protocol")))

(defun get-nickname (key)
  (let ((cons (assoc key *objc-nickname-alist* :test #'equal)))
    (if cons (cdr cons) (str:param-case key))))

(defun objc-doc-text (node)
  (let ((dom (plump:parse (wd:property node "innerHTML"))))
    (flet ((parse (node)
             (if (or (plump:text-node-p node)
                     (and (string= (plump:tag-name node) "code")
                          (/= (length (plump:children node)) 1)))
                 (plump:text node)
                 (format nil "`~A'" (get-nickname (plump:text node))))))
      (str:join "" (map 'list #'parse (plump:children dom))))))
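When a name is not in the alist, get-nickname falls back to str:param-case, which turns C-style names into Lisp-style ones. To show what kind of conversion this relies on, here is a dependency-free approximation. simple-param-case is a hypothetical name, and it is not the str library's implementation, so it may differ from str:param-case on inputs such as runs of capital letters.

```lisp
;; Rough approximation of a param-case conversion (assumption: this only
;; handles underscores and lower-to-upper case boundaries).
(defun simple-param-case (string)
  (with-output-to-string (out)
    (loop for prev = nil then ch
          for ch across string
          do (cond ((char= ch #\_)
                    ;; snake_case separator becomes a hyphen
                    (write-char #\- out))
                   ((and (upper-case-p ch)
                         prev
                         (or (lower-case-p prev) (digit-char-p prev)))
                    ;; camelCase boundary: insert a hyphen before the capital
                    (write-char #\- out)
                    (write-char (char-downcase ch) out))
                   (t
                    (write-char (char-downcase ch) out))))))

;; (simple-param-case "class_getName") returns "class-get-name"
```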
#return-value: the return value, read by objc-doc-return-value
(defun objc-doc-return-value (session)
  (map-find-elems (elem "#return-value ~ p" session)
    (objc-doc-text elem)))
Example:
The name of the class, or the empty string if class is nil.
#Discussion: additional notes, read by objc-doc-discussion
(defun objc-doc-discussion (session)
  (map-find-elems (elem "#Discussion ~ p" session)
    (objc-doc-text elem)))
With these pieces in place, the page parser objc-doc-parse-name-url-cons can be implemented:
(defun objc-doc-parse-name-url-cons (cons)
  (let ((name (car cons))
        (url (cdr cons)))
    ;; Only navigate when the session is not already on the page.
    (unless (string= (wd:url *session*) url)
      (wd:navigate *session* url))
    (list :name name
          :type (objc-doc-type *session*)
          :doc (objc-doc-short *session*)
          :lambda (objc-doc-lambda *session*)
          :params (objc-doc-params *session*)
          :return (objc-doc-return-value *session*)
          :discussion (objc-doc-discussion *session*))))
Example:
(objc-doc-parse-name-url-cons (first (second (first sections))))
(:name "class_getName"
 :type "Function"
 :doc "Returns the name of a class."
 :lambda "extern const char * class_getName(Class cls);"
 :params (("class" "A class object."))
 :return ("The name of the class, or the empty string if `class' is `nil'.")
 :discussion nil)
Parse Function Lambda
Tokenize
(defparameter *objc-keywords-alist*
  '(("extern"   . :extern)
    ("struct"   . :struct)
    ("unsigned" . :unsigned)
    ("const"    . :const)))

(defparameter *objc-type-alist*
  '(("char"    . :char)
    ("int"     . :int)
    ("void"    . :void)
    ("uint8_t" . :uint8)
    ("size_t"  . :size)
    ;; Note: a hole punched in the parser for this one function-pointer type
    ("void (*)(id)" . (:function objc-id))))

(defun objc-token-regexp ()
  "void \\(\\*\\)\\(id\\)|[a-zA-Z][a-zA-Z0-9_]*|\\(|\\)|\\;|\\,|\\*")

;; Return a closure that yields (values terminal value) for each token,
;; and (values nil nil) at the end of input, as cl-yacc expects.
(defun objc-lexer (str)
  (flet ((tokenrize (token)
           (cond ((string= token ";") (values :eol :eol))
                 ((string= token "*") (values :pointer :pointer))
                 ((string= token "(") (values :args-start :args-start))
                 ((string= token ")") (values :args-end :args-end))
                 ((string= token ",") (values :comma :comma))
                 ((assoc token *objc-keywords-alist* :test #'equal)
                  (let ((token (cdr (assoc token *objc-keywords-alist* :test #'equal))))
                    (values token token)))
                 ((assoc token *objc-type-alist* :test #'equal)
                  (let ((token (cdr (assoc token *objc-type-alist* :test #'equal))))
                    (values 'type token)))
                 (T (values 'name (intern (str:upcase (get-nickname token))))))))
    (let ((search 0)
          (regexp (ppcre:create-scanner (objc-token-regexp))))
      (lambda ()
        (multiple-value-bind (start end) (ppcre:scan regexp str :start search)
          (when end (setf search end))
          (if start
              (tokenrize (str:substring start end str))
              (values nil nil)))))))

(defun objc-tokenrize (str)
  (let ((lexer (objc-lexer str)))
    (loop for (terminal value) = (multiple-value-list (funcall lexer))
          while terminal collect (list terminal value))))
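This tokenizer is fairly trivial. The set of inputs it has to handle is limited, so punching a dirty hole in it for the odd special case is acceptable.
Example (the token stream for class_getInstanceVariable):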
:extern | :extern |
name | ivar |
name | class-get-instance-variable |
:args-start | :args-start |
name | objc-class |
name | class |
:comma | :comma |
:const | :const |
type | :char |
:pointer | :pointer |
name | name |
:args-end | :args-end |
:eol | :eol |
Grammar Parser
Using cl-yacc as the parser generator:
(yacc:define-parser *objc-lambda-parser*
  (:start-symbol objc-lambda)
  (:terminals (name type :comma :extern :const :struct :unsigned
               :pointer :args-start :args-end :eol))

  (objc-lambda
   (:extern types name args :eol
    (lambda (extern type name args eol)
      (declare (ignore extern eol))
      (list (list name type) args))))

  (types
   type
   name
   (:const types (lambda (const type)
                   (declare (ignore const))
                   type))
   (:unsigned types (lambda (unsigned type)
                      (list unsigned type)))
   (:struct name (lambda (struct name)
                   (list struct name)))
   (types :pointer (lambda (type pointer)
                     (list pointer type))))

  (args
   (:args-start pair* :args-end (lambda (a pairs c)
                                  (declare (ignore a c))
                                  pairs))
   (:args-start :args-end (constantly nil)))

  (pair*
   (types name (lambda (type name)
                 (list (list name type))))
   (types name :comma pair* (lambda (type name comma pairs)
                              (declare (ignore comma))
                              (cons (list name type) pairs)))))

(defun objc-parse-lambda (objc-lambda)
  (yacc:parse-with-lexer (objc-lexer objc-lambda) *objc-lambda-parser*))
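A Problem in the Grammar Design
There is a small issue with the left-recursive types :pointer rule under the types node: it may cause the parser to fall into an infinite loop. But as long as it runs, that is good enough? (cl-yacc generates LALR(1) parsers, which cope with left recursion as such, so the more likely symptom is a shift/reduce conflict between this rule and the :const types and :unsigned types rules.)
Example: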
((class-get-name :char) ((class objc-class)))
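One way to sidestep any worry about the recursive types :pointer rule is to consume postfix pointers with a plain loop. The sketch below is a hand-written alternative, not the cl-yacc grammar from this post. parse-base-type and parse-types are hypothetical names, and the input is assumed to be a list of (terminal value) pairs in the shape that objc-tokenrize produces.

```lisp
;; Parse the prefix part of a type: const is dropped, unsigned and struct
;; wrap what follows, and a bare TYPE or NAME token is returned as-is.
;; Returns the parsed type and the remaining tokens as two values.
(defun parse-base-type (tokens)
  (destructuring-bind ((terminal value) &rest more) tokens
    (case terminal
      (:const (parse-base-type more))
      (:unsigned (multiple-value-bind (type after) (parse-base-type more)
                   (values (list :unsigned type) after)))
      (:struct (values (list :struct (second (first more))) (rest more)))
      ((type name) (values value more))
      (t (error "Unexpected token ~S" terminal)))))

;; Parse a full type: a base type followed by any number of postfix
;; :pointer tokens, handled by iteration instead of a recursive rule.
(defun parse-types (tokens)
  (multiple-value-bind (type more) (parse-base-type tokens)
    (loop while (eq (first (first more)) :pointer)
          do (setf type (list :pointer type)
                   more (rest more)))
    (values type more)))

;; (parse-types '((:const :const) (type :char) (:pointer :pointer) (name name)))
;; returns (:pointer :char) and the remaining tokens ((name name))
```

Because each loop iteration consumes one token, this version always terminates on finite input.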
Lisp-Spider, Go!
(defparameter *objc-runtime-doc*
  (dolist-bind-collect ((title elem) sections)
    (list title
          (dolist-bind-collect ((method . url) elem)
            (let ((property (objc-doc-parse-name-url-cons (cons method url))))
              ;; Only function entries carry a lambda list worth parsing.
              (when (string= (getf property :type) "Function")
                (setf (getf property :cffi)
                      (objc-parse-lambda (getf property :lambda))))
              property)))))
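The dolist-bind-collect macro is not defined anywhere in this post. Judging only from how it is called, it destructures each list element against a pattern and collects the body's value. The definition below is a guess at a compatible macro, not the original helper.

```lisp
;; Assumed definition: for each element of LIST, bind PATTERN to it with
;; destructuring-bind, evaluate BODY, and collect the results in order.
(defmacro dolist-bind-collect ((pattern list) &body body)
  (let ((item (gensym "ITEM")))
    `(loop for ,item in ,list
           collect (destructuring-bind ,pattern ,item ,@body))))

;; Usage with a dotted pattern, as in the ((method . url) elem) call:
;; (dolist-bind-collect ((a . b) '((1 . 2) (3 . 4))) (+ a b))
;; returns (3 7)
```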
End
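The final processed result can be found in the gist. In theory, a few simple format operations are enough to generate the CFFI bindings. However, automatically generated bindings may not be all that usable (I cannot vouch for their correctness, since I am not very familiar with CFFI or the ObjC Runtime), so I have decided to first look at some Hello World examples for the ObjC Runtime, and then try to build an emulation compatible with the LispWorks ObjC-Bridge API.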
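As a rough idea of what those format operations might look like, the sketch below turns one parsed signature into the text of a cffi:defcfun form. The names cffi-type and defcfun-string are hypothetical. The type mapping is a crude assumption (every non-keyword type is treated as an opaque pointer), and the output is only a string, so CFFI itself does not need to be loaded.

```lisp
;; Assumed type mapping: keywords such as :char or :void pass through,
;; pointer types and runtime handles (objc-class, objc-sel, ...) become
;; :pointer. A real binding needs a proper table, since not every
;; runtime type is a pointer (BOOL, for instance).
(defun cffi-type (type)
  (cond ((keywordp type) type)
        ((symbolp type) :pointer)
        ((eq (first type) :pointer) :pointer)
        (t (error "No CFFI mapping sketched for ~S" type))))

;; Build the text of a defcfun form from the C name and a parse result
;; shaped like ((lisp-name return-type) ((arg-name arg-type) ...)).
(defun defcfun-string (c-name parsed)
  (destructuring-bind ((lisp-name return-type) args) parsed
    (format nil "(cffi:defcfun (~S ~(~A~)) ~(~S~)~{ (~(~A ~S~))~})"
            c-name lisp-name (cffi-type return-type)
            (loop for (arg-name arg-type) in args
                  append (list arg-name (cffi-type arg-type))))))

;; (defcfun-string "class_getName"
;;                 '((class-get-name (:pointer :char)) ((class objc-class))))
;; returns
;; "(cffi:defcfun (\"class_getName\" class-get-name) :pointer (class :pointer))"
```

Returning :pointer for a const char * is the conservative choice here. Whether a string-typed return would be more convenient depends on who owns the memory, which the scraped documentation does not encode.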