About

这是一个 Categories for AI 的快速阅读笔记, 目标是写一个简单的实现以检验我自己真的学会了.

注: 为了快速了解, 我没有选择看视频而是看 Slides 与对应的 paper. 可能会有部分内容有缺漏或者不正确.

免责声明: 不是数学系的, 也不是计算机系的, 所以数学证明或者相关的部分讲究一个差不多得了和一眼 obvious, 计算机部分讲究一个能跑就行别问太多.

全文省流版

Theory
- Category = Object + Morphism \(→\) 提供高层的抽象
- Fuctor: Cat \(→\) Cat
- Monoidal: abstract for parallel process

2. 3.

Essential Building Blocks

Category Therory

省流版:

范畴 = 对象 + 对象间的映射关系 (morphism)
四要素: [identity, compose, id compose, (h * (g * f)) = (h * g) * f]
- 不同种类的 object
  - initial object I -> {a1, a2, a3}
  - terminal object {a1, a2, a3} -> T
- 不同种类的 morphism
  - epimorphism (满射)
  - monomorphism (单射)
函子: 范畴间的映射

小节

哦, 对了, 这学期开始的时候貌似还想要读点范畴论的数学书? 虽然最后鸽了… 不过这就当作我已经差不多读完了范畴论的书了吧.

Category = Objects + Morphisms

b首先是一个范畴论的快速入门?

Objects

Definition: \(T\) is a terminal object in category \(C\) if, \(∀ A ∈ C, ∃! f : A → T\), aka., \(|Hom(A, T)| = 1\).
- unit set is a terminal object in set
Definition: \(I\) is an initial object in category \(C\) if, \(∀ A ∈ C, ∃! f : I → A\), aka., \(|Hom(I, A)| = 1\).
- \(∅ → A\) is a initial object in set, though it is not callable

不一定正确的理解

从图上看, 大概所有点都汇聚 (指向) \(T\), 从 \(I\) 指向所有点, 但是只有一条.

/_img/cat-for-ai/digraph-terminal-initial.svg

Definition: object operations
- cartesian product (笛卡尔积) \(A, B ∈ \boldsymbol{C}, A × B ∈ \boldsymbol{C}\)
  - projection morphisms \(p_A : A × B → A, p_B : A × B → B\)
  - coproduct \(A \coprod B\)
- exponential object: \(B^A\)
  
  where:
  - \(v : B^A × A → B\)
  - \(e : X × A → B\)
  - unique morphism \(u : X → B^A\)
  example:
  - Currying: \(f : X × Y → Z, f_c : X → Z^Y\)
```
(defun curry-rest-args (fn &rest args)
  "Curry arguments with `args'.

Example:

     ##################
     #         +---+  #
 arg ----------*   |  #
     #         | f *------ result
     # args ---*   |  #
     #         +---+  #
     ##################
  (curry-rest-args f args)
"
  (lambda (arg) (apply fn (cons arg args))))

;; Example:
(let ((1+ (curry-rest-args #'+ 1))
      (2* (curry-rest-args #'* 2)))
  (list (funcall 1+ 1) (funcall 1+ 2)
        (funcall 2* 1) (funcall 2* 2)))
```
```
(2 3 2 4)
            
```
    一个想法
    所以应该是学会了数学再去学应用, 还是学会了应用再去学数学呢? 虽然我现在总有一种: 感觉不如先学点理论对应的实物再去接触理论. 比如在看了 SDF (Software Designed for Flexibility) 的第一章后, 感觉对这个 curry 完全可以理解, 反而是直接看理论的说明看不懂…
    可是难道不看这个理论, 我会想到有这种联系?

Examples

object (type) 的运算 (example from cats4progs):

tuple types
能让我想到的大概就两个:
- list of values
- multiple-values

sum types

能让我想到的:

class and superclasses
```
(defclass b (a1 a2 a3) ())
```

type declaration

(lambda (x)
  (declare ((or symbol list) x))
  ;; ...
  x)

当然也可以用:

(deftype basic-component ()
  '(or symbol list))

不过虽然在 Programming with Categories 这本书里面用的是 Haskell, 并且感觉里面注重的内容是强调 Haskell 的 type 是 object. 实际上范畴的概念应该可以用到各种地方吧.

exponential objects

如上, 是 carray

同理, 有 uncarray

(defun uncarray (fn)
  "uncarray : (x -> (y -> z)) -> ((x, y) -> z)"
  (lambda (a b)
    (funcall (funcall fn a) b)))

Morphisms

Definition: A morphism \(f : A → B\) is an epimorphism (epic morphism) if, for any morphism pair \(g, h : B → C\):
\[(h ˆ f = g ˆ f) ⇒ (h = g) \]
Definition: A morphism \(f : A → B\) is an monomorphism if, for any morphism pair \(g, h : B → C\):
\[(f ˆ h = f ˆ g) ⇒ (h = g)\]
\(C\) 和 \(C^{op}\) 中, epimorphism 和 monomorphism 互相对应
set category 中的例子
- epimorphism 就是满射 (surjections)
- monomorphism 就是单射 (injections)
所以这里的操作就相当于是在用范畴论的方式把集合范畴中的满射和单射一般化. 但是需要注意的是, 因为范畴, 或者关系图中的箭头是单方向有向的, 所以并不显然存在 “反” 向的箭头, 即不存在显然的 “一一映射” (isomorphism)

Examples

Definition: category Rel
- objects \(←\) sets
- morphisms \(←\) relations between sets
- \(⇒\) initial and terminal objects is \(∅\)
Definition: category Groups
Definition: category Vect
- objects are sets \(V\) (linear space) of vectors \(\boldsymbol{v}\)
- morphisms are linear transformations between these spaces

Functors

Definition: \(F : \boldsymbol{C} → \boldsymbol{D}\) maps objects in \(\boldsymbol{C}\) to \(\boldsymbol{D}\).
- \(F(\mathrm{id}_A) = \mathrm{id}_{F(A)}\)
- \(F(g ˆ f) = F(g) ˆ F(f)\)

Example: implementation in CL (from cats4prog)

a functor maps types to types, and functions to functions

;; F: f -> F(f)
(defmethod F ((f function)) ...)

;; F: integer -> F(integer)
(defmethod F ((x integer)) ...)

;; which is equal to
(defun functor (x)
  (typecase x
    (function  ...)
    (integer   ...)
    ;; else
    (otherwise ...)))

Definition: bifunctor \(\mathcal{C} × \mathcal{C} → \mathcal{C}\) (假设对于 \(\mathcal{C}\) 中所有的对象都存在 product)

(defun bifunctor (product f g)
  "Bifunctor:

    ###########
 a --- f -+   #
    #     |   #
    # product --->
    #     |   #
 b --- g -+   #
    ###########
"
  (lambda (a b)
    (funcall product (funcall f a) (funcall g b))))

;; equal to (+ (1+ 2) (2* 3))
(funcall (bifunctor #'+ #'1+ #'2*) 2 3) ; => 9

一个简单的例子:

/_img/cat-for-ai/bifunctor-example.svg

Definition: profunctors

Message passing, categorically \(←\) Graph Neural Networks are Dynamic Programmers

注: 这部分是 Slide + Paper 的组合, 不过感觉在

省流版:

monoidal categories 用于描述 parallel process

Categorical Dataflow

该部分是 Slide + Paper + Implementation 的组合, 我应该会着重 implementation 的复刻.

注: 虽然但是, 为了能够看懂这个 Monoidal 确实很花时间.

Monoidal Theory

注: 我到现在还是不太理解 Monoidal 的这个概念. 是一个局域闭包用于状态/变量值的传递/储存?

Monoial

Definition: string diagrams as category
- objects: strings
- morphisms: boxes

Example: but for programming

将 string 视为输入的参数
将 boxes 视为对参数进行运算的函数
哦, 你说多返回值? (values v1 v2 v3)

其他的一些 string 的类型:

string 不会缠绕 (就算缠在一起也视为直接连接两点)

split strings

(defun split-call (f g)
  "Split string.

     ########
     #      #
     # +- f --- out
 in ---+    #
     # +- g --- out
     ########
"
  (lambda (in)
    (values (funcall f in) (funcall g in))))

(funcall (split-call #'1+ #'2*) 2)      ; => 3, 4

end strings

(defun string-terminate (input)
  "End string.

  input ----[*]"
  (declare (ignore input)))

start strings

in 1 ------+
           +----- out
in 2 ------+

Definition: a monodial category is a tuple
\[(\boldsymbol{C}, ⊗, \boldsymbol{1}, α, λ, ρ)\]

consisting of
- category \(\boldsymbol{C}\)
- functor \(⊗\)
- object \(\boldsymbol{1} ∈ \boldsymbol{C}\) called monoidal unit
- associativity isomorphism \(α\) :
  \[(X ⊗ Y) ⊗ Z \xrightarrow[≅]{α_{X,Y,Z}} X ⊗ (Y ⊗ Z)\]
- left/right unit isomorphism \(λ_X, ρ_X\)
  \[\boldsymbol{1} ⊗ X \xrightarrow[≅]{λ_x} X \ \mathrm{and} \ X ⊗ \boldsymbol{1} \xrightarrow[≅]{ρ_X} X\]
s.t.:
- Unity Axioms:
  
  Note that \(λ_{\boldsymbol{1}} = ρ_{\boldsymbol{1}} : \boldsymbol{1} ⊗ \boldsymbol{1} \xrightarrow{≅} \boldsymbol{1}\).
  Examples
  
  一些图示:
  - \((X ⊗ Y) ⊗ W \xrightarrow{f ⊗ \boldsymbol{1}_W} Z ⊗ W \xrightarrow{g} M\)
  - \((f ˆ g) ⊗ (h ˆ i) = (f ⊗ h) ˆ (g ⊗ i)\)
- Pentagon Axiom:
Definition: comonoid

注 & Examples Graphs

这部分我感觉比 cat4prog 写得要好. 感觉学习 Haskell 的同学真难啊, 既要学一门新语言的语法, 还要学范畴论, 真是不容易啊 (笑).

Examples:

List
- \(⊗ : \mathrm{list} × \mathrm{list} → \mathrm{list}\) append
- \(\boldsymbol{1} : \mathrm{list}\) ()
\((\mathbb{R}, +, 0)\) and \((\mathbb{R}, ×, 1)\) (实数域上的加法和乘法)
\((\mathbb{B}, ∧, \mathrm{TRUE})\) (布尔代数)
\((\mathrm{Euc}, ×, \boldsymbol{1})\) (Euclid 空间) \(\mathbb{R}^N × \mathbb{R}^K \xrightarrow{f × g} \mathbb{R}^M × \mathbb{R}^L\)
范畴 FinStoch
- objects: finite sets
- morphisms: Markov kernels
- \((\mathrm{FinStoch}, ⊗, \boldsymbol{1})\)

Lens Category

Definition: the category \(\mathrm{Len}(\mathcal{C})\)
- objects: pairs \(\left( \begin{matrix}A \ A'\end{matrix} \right)\) of objects in \(\mathcal{C}\)
- morphism: \(\left( \begin{matrix}A \ A'\end{matrix} \right) → \left( \begin{matrix}B \ B'\end{matrix} \right)\) is a pair \(\left( \mathrm{view}, \mathrm{upd} \right)\) where: \(\begin{matrix} \mathrm{view} : A → B \ \mathrm{and} \ \mathrm{upd} : A × B' → A' \end{matrix}\) are morphisms in \(\mathcal{C}\).
将其记为 \(\left( \begin{matrix}A \ A'\end{matrix} \right) \xrightarrow{\mathrm{view}, \mathrm{upd}} \left( \begin{matrix}B \ B'\end{matrix} \right)\).
- compositiion: \(\left( \left( \begin{matrix}A \ A'\end{matrix} \right) \xrightarrow{(v_1, u_1)} \left( \begin{matrix}B \ B'\end{matrix} \right) \right) ˆ \left( \left( \begin{matrix}B \ B'\end{matrix} \right) \xrightarrow{(v_2, u_2)} \left( \begin{matrix}C \ C'\end{matrix} \right) \right) = \left( \begin{matrix}A \ A'\end{matrix} \right) \xrightarrow{(v, u)} \left( \begin{matrix}C \ C'\end{matrix} \right)\)
  - \(v = v_2 ˆ v_1\)
  - \(u(a, c) = u_1(a, u_2(v_1(a), c))\)
Lens category is a category where morphisms have a forward and a backward component
差不多的感觉就是 Lens 包含前向和反向.

好了, 理论到此为止, 来点实际应用的:

X as Lens

Derivatives as Lens: \(\left( \begin{matrix} \mathbb{R}^2 \ \mathbb{R}^2 \end{matrix} \right) \xrightarrow{(f, ∇ f)} \left( \begin{matrix} \mathbb{R} \ \mathbb{R} \end{matrix} \right)\)
- Chain Rule as Lens composition
- Backprop: functor \(\mathrm{Euc} → \mathrm{Lens}(\mathrm{Euc})\)
Optimisers as Lens: \(\left( \begin{matrix}\mathbb{R}^p × \mathbb{R}^p \ \mathbb{R}^p × \mathbb{R}^p\end{matrix} \right) \xrightarrow{(v, u)} \left( \begin{matrix}\mathbb{R}^p \ \mathbb{R}^p\end{matrix} \right)\)
- Gradient Descent
  - \(v(w) = w\)
  - \(u(w, Δ w) = w - α Δ w\)
- Momentum
  - \(\mathrm{view}(v_{t-1}, w_t) = w_t - γ v_{t - 1}\)
  - \(\mathrm{upd}(v_{t-1}, w_t, Δ w_t) = (v_t, w_{t+1})\)
  - \(v_t = γ v_{t-1} + d Δ w_t\)
  - \(w_{t+1} = w_t - v_t\)

If you want to implement Lens

感觉实现 Lens 并不是一个非常困难的事情, 虽然理论听起来很复杂, 但是实现起来竟然异常的简单:

(defclass lens ()
  ((view :initarg :view
         :initform (error "Missing :view")
         :documentation "view : A → B")
   (upd  :initarg :upd
         :initform (error "Missing `:upd'")
         :documentation "upd : A × B' → A'"))
  (:documentation
   "Monomorphic lenses, stored as a pair of maps. "))

(defun lens (view upd)
  "Make a `lens' object. "
  (declare (function view upd))
  (make-instance 'lens :view view :upd upd))

(defmacro let-lens (bindings &body body)
  "Bindings like (view upd lens). "
  `(let ,(loop for (v u lens) in bindings
               collect `(,v (slot-value ,lens 'view))
               collect `(,u (slot-value ,lens 'upd)))
     ,@body))

;; f ○ g = f(g(x))
(defmethod compose ((f function) (g function))
  (lambda (x) (funcall f (funcall g x))))

;; Lens1 ○ Lens2 : v = v2 ○ v1; u(a, c') = u1(a, u2(v1(a), c))
(defmethod compose ((lens1 lens) (lens2 lens))
  (let-lens ((v1 u1 lens1) (v2 u2 lens2))
    (lens (compose v2 v1)
          (lambda (x y) (funcall u1 x (funcall u2 (funcall v1 x) y))))))

(defmethod call ((lens lens) pair)
  (let-lens ((v u lens))
    (let ((a (car pair))
          (b (cdr pair)))
      (cons (funcall v a)
            (funcall u a b)))))

Derivatives as lens

(flet ((f (vec)
         (apply (lambda (x1 x2) (+ (* 3 (* x1 x1)) (* 7 x2))) vec))
       (df (vec dy)
         (apply (lambda (x1 x2) (declare (ignore x2))
                  (list (* 6 x1 dy) (* 7 dy)))
                vec)))
  (call (lens #'f #'df) (cons (list 1 2) 0.1))) ; (17 . (0.7 0.1))

Chain rule as lens composition

(flet ((f (vec)
         (apply (lambda (x1 x2) (+ (* 3 (* x1 x1)) (* 7 x2))) vec))
       (df (vec dy)
         (apply (lambda (x1 x2) (declare (ignore x2))
                  (list (* 6 x1 dy) (* 7 dy)))
                vec))
       (dcos (x dy)
         (* (sin x) dy)))
  (call (compose (lens #'f   #'df)
                 (lens #'cos #'dcos))
        (cons (list 1 2) 0.1))) ; (-0.27516335 . (-0.5768385 -0.6729783))

即想要构造一个 backprop, 只需要构造:
Optimisers as lenses

注: 但是我并不觉得这个的实现做得比较好, 之后还是得想点方法来重新实现一下. 不过这种写法你完全可以直接迁移到其他编程语言, 比如 C 语言中去:

typedef struct lens {
  void* (*view) (void *arg);
  void* (*upd)  (void *arg);
} Lens;

虽然感觉会很麻烦就是了.

注: 这里的实现并不好用, 实际使用的实现见 cat4ai.

Optics Category

Definition: the category \(\mathrm{Optic}(\mathcal{C})\)
- objects: pairs \(\left( \begin{matrix}A \ A'\end{matrix} \right)\) of objects in \(\mathcal{C}\)
- morphism: \(\left( \begin{matrix}A \ A'\end{matrix} \right) \xrightarrow{(M, f, l)} \left( \begin{matrix}B \ B'\end{matrix} \right)\)
  - \(M ∈ \mathcal{C}\) object of \(\mathcal{C}\)
  - \(f : A → M ⊗ B\) morphisms
  - \(l : M ⊗ B' → A'\) morphisms
- compose:
  - \((M_1, f_1, l_1) ˆ (M_2, f_2, l_2) ⇒ (M_1 ⊗ M_2, f_1 ˆ (f_2 ⊗ M), (M ⊗ l_2) ˆ l_1)\)
WHY Optics?
- probabilitst bidrectional transformations
- bidirectional transformations with side-effect
- bidirectional transformations that operate on containers
Examples

Categorical Foundations of Gradient-Based Learning

接下来的具体的代码实现可以参考 cat4ai 仓库.

Definition: Parametrized Category para
Definition: Lenses Category lens

Definition: Parametric Lenses Category param-lens

其他吐槽

理论理论大好き?
在导师点评的时候我意识到了这样的一个 “理论之所以学起来痛苦” 原因. 大家想要知道/学的东西是那种非常具体的东西. 比如说, 有人说自己搞出了万物理论, 或者做了很牛逼的东西.

但是在向你介绍的时候, 每快要逼近这个最终的结论的时候, 就要说一句: “等等, 先让我介绍一下原理/前提”. 哇, 这种也太痛苦了…

这简直就是拼多多的再砍一刀啊. 这么搞很容易就会让大家失去兴趣了… 要是有能够将理论和实践结合起来的方法就好了. 比如讲完实例后, 再说: “欸, 其实我们刚刚做的东西的背后是这个理论. “, 然后再举几个例子, 最后再一般化…

但是感觉要求很高… 而且感觉, 尤其是数学方面折腾的例子, 感觉和 “应用” 和 “事例” 关系并不大. 真是难呢.

File Configures

Graphviz Configures

rankdir=LR;
bgcolor="transparent";
fontname="Arial";
color="#888888";
node [shape="circle", fontname="Arial", color="#888888"];
edge [fontname="Arial", color="#888888"];

color="#888888:invis:#888888"

感觉自己纯手工敲这些 Graphviz 的绘图代码实在是太蠢了, 不过感觉自己以后应该也用不上画交换图这种技能, 这里暂时就忍了.