About

这是一个 Categories for AI 的快速阅读笔记, 目标是写一个简单的实现以检验我自己真的学会了.

注: 为了快速了解, 我没有选择看视频而是看 Slides 与对应的 paper. 可能会有部分内容有缺漏或者不正确.

免责声明: 不是数学系的, 也不是计算机系的, 所以数学证明或者相关的部分讲究一个差不多得了和一眼 obvious, 计算机部分讲究一个能跑就行别问太多.

全文省流版

  1. Theory
    • Category = Object + Morphism \(→\) 提供高层的抽象
    • Fuctor: Cat \(→\) Cat
    • Monoidal: abstract for parallel process

2. 3.

Essential Building Blocks

Category Therory

省流版:

  1. 范畴 = 对象 + 对象间的映射关系 (morphism)

    四要素: [identity, compose, id compose, (h * (g * f)) = (h * g) * f]

    • 不同种类的 object
      • initial object I -> {a1, a2, a3}
      • terminal object {a1, a2, a3} -> T
    • 不同种类的 morphism
      • epimorphism (满射)
      • monomorphism (单射)
  2. 函子: 范畴间的映射
小节

哦, 对了, 这学期开始的时候貌似还想要​读点范畴论的数学书? 虽然最后鸽了… 不过这就当作我已经差不多读完了范畴论的书了吧.

Category = Objects + Morphisms

b首先是一个范畴论的快速入门?

Category

  • Definition: a category is a universe of objects, and morphisms between them, s.t.:

    所谓的​_范畴_​包含了​*对象*​以及对象间的​*映射(态射)*​关系, 其满足:

    • \(∀ A\) (\(A\) is an object), there is a unique identity morphism \(\mathrm{id}_A : A → A\)
    • \(∀ f : A → B, g : B → C\) (\(f\) and \(g\) is morphism), there is a composition \(g ˆ f : A → C\)
    • \(∀ f : A → B\), it holds that \(\mathrm{id}_B ˆ f = f ˆ \mathrm{id}_A = f\)
    • \(∀ f, g, h\), we have \(h ˆ (g ˆ f) = (h ˆ g) ˆ f\) (结合律)
    不正确的简单解释考虑这样的一个有向图:

    /_img/cat-for-ai/digraph-of-category.svg

    • 所有的节点都有从自己出发回到自己的通路
    • 若从一个节点出发可以到达另一个节点, 那么这两个节点就该被连在一起

了解, 咱研究的是对象间的关系而非对象本身.

  • Definition: the collection of morphisms between \(A\) and \(B\) is often denoted \(\mathrm{Hom}(A, B)\) (read as home-set).
  • 通常使用交换图 (commutative diagrams) 描述
  • 研究的是结构的关系 (connections) 而非对象的性质 (do not assume knowledge of what's in the objects)
  • Definition: the opposite category \(C^{\mathrm{op}}\) for a given category \(C\) is:
    • objects 相同
    • \(∀ f : A → B\) in \(C\), \(∃ f : B → A\) in \(C^{\mathrm{op}}\)
    • \(∀ g ˆ f : A → C\) in \(C\), \(∃ f ˆ g : C → A\) in \(C^{\mathrm{op}}\)

    相当于是交换图上把箭头全部反向

Example: set category and some implementation
  • objects are sets
  • morphisms are functions between them
  • Definition: a unit is a set with one element, denoted as \(()\)
    • \(f : A_i → ()\)
    • \(g_i : () → B\) (使得 \(()\) 的元素映射到 \(B\) 上, 类似于 \(\{ g_i () \} ≅ B\)).

为什么用 set (集合) 来作为例子? 因为熟, 并且范畴和对象无关, 所以用集合来理解完全没问题.

那么我们可以用程序来进行一个类比:

  • objects are values sets of different types, for example, in CL:
    • (unsigned-byte 32)
    • double-float
    • integer
    • null (Note: (class-of nil) got me common-lisp:null in SBCL)
  • morphisms are functions on objects, for example:
    (lambda (x) x)                          ; id morphism
    
  • morphism composition
    (defun compose (f g)                    ; morphism compose
      (lambda (&rest args)
        (funcall f (apply g args))))
    
  • example from cats4progs compose:
    (str:words "Lucky Me")                  ; => ("Lucky" "Me")
    
    (defun concat (strings)
      (str:join "" strings))
    (concat '("Lucky" "Me"))                ; => "LuckyMe"
    
    (funcall (compose #'concat #'str:words) "Lucky Me") ; "LuckyMe"
    ;; equal to (concat (str:words "Lucky Me"))
    

    虽然可能看起来非常的奇怪, 但是在 CLOS 中的 :after 方法就类似这样的操作:

    (defmethod print-object :after ((hist histogram) stream)
      (format stream "Hello World will be print afer"))
    

    或者你也可以用 :around 或者 Python 中的 decorator 来模拟类似的操作:

    def compose_with(f):
        def compose(g):
            def h(x):
                return f(g(x))
            return h
        return compose
    
    def one_plus(x):
        return x + 1
    
    @compose_with(one_plus)
    def two_times(x):
        return x * 2
    
    two_times(2)                    # => 5
    

Objects

  • Definition: \(T\) is a terminal object in category \(C\) if, \(∀ A ∈ C, ∃! f : A → T\), aka., \(|Hom(A, T)| = 1\).
    • unit set is a terminal object in set
  • Definition: \(I\) is an initial object in category \(C\) if, \(∀ A ∈ C, ∃! f : I → A\), aka., \(|Hom(I, A)| = 1\).
    • \(∅ → A\) is a initial object in set, though it is not callable
不一定正确的理解

从图上看, 大概所有点都汇聚 (指向) \(T\), 从 \(I\) 指向所有点, 但是只有一条.

/_img/cat-for-ai/digraph-terminal-initial.svg

  • Definition: object operations
    • cartesian product (笛卡尔积) \(A, B ∈ \boldsymbol{C}, A × B ∈ \boldsymbol{C}\)
      • projection morphisms \(p_A : A × B → A, p_B : A × B → B\)
      • coproduct \(A \coprod B\)

      /_img/cat-for-ai/cartesian-product.svg

    • exponential object: \(B^A\)

      /_img/cat-for-ai/exponential-objects.svg

      where:

      • \(v : B^A × A → B\)
      • \(e : X × A → B\)
      • unique morphism \(u : X → B^A\)

      example:

      • Currying: \(f : X × Y → Z, f_c : X → Z^Y\)
        (defun curry-rest-args (fn &rest args)
          "Curry arguments with `args'.
        
        Example:
        
             ##################
             #         +---+  #
         arg ----------*   |  #
             #         | f *------ result
             # args ---*   |  #
             #         +---+  #
             ##################
          (curry-rest-args f args)
        "
          (lambda (arg) (apply fn (cons arg args))))
        
        ;; Example:
        (let ((1+ (curry-rest-args #'+ 1))
              (2* (curry-rest-args #'* 2)))
          (list (funcall 1+ 1) (funcall 1+ 2)
                (funcall 2* 1) (funcall 2* 2)))
        
        (2 3 2 4)
                    
        一个想法所以应该是学会了数学再去学应用, 还是学会了应用再去学数学呢? 虽然我现在总有一种: 感觉不如先学点理论对应的实物再去接触理论. 比如在看了 SDF (Software Designed for Flexibility) 的第一章后, 感觉对这个 curry 完全可以理解, 反而是直接看理论的说明看不懂…

        可是难道不看这个理论, 我会想到有这种联系?

Examples

object (type) 的运算 (example from cats4progs):

  • tuple types

    能让我想到的大概就两个:

    • list of values
    • multiple-values
  • sum types

    能让我想到的:

    • class and superclasses
      (defclass b (a1 a2 a3) ())
      
    • type declaration
      (lambda (x)
        (declare ((or symbol list) x))
        ;; ...
        x)
      

      当然也可以用:

      (deftype basic-component ()
        '(or symbol list))
      

不过虽然在 Programming with Categories 这本书里面用的是 Haskell, 并且感觉里面注重的内容是强调 Haskell 的 type 是 object. 实际上范畴的概念应该可以用到各种地方吧.

  • exponential objects
    • 如上, 是 carray
    • 同理, 有 uncarray
      (defun uncarray (fn)
        "uncarray : (x -> (y -> z)) -> ((x, y) -> z)"
        (lambda (a b)
          (funcall (funcall fn a) b)))
      

Morphisms

  • Definition: A morphism \(f : A → B\) is an epimorphism (epic morphism) if, for any morphism pair \(g, h : B → C\):

    \[(h ˆ f = g ˆ f) ⇒ (h = g) \]

  • Definition: A morphism \(f : A → B\) is an monomorphism if, for any morphism pair \(g, h : B → C\):

    \[(f ˆ h = f ˆ g) ⇒ (h = g)\]

  • \(C\) 和 \(C^{op}\) 中, epimorphism 和 monomorphism 互相对应
    set category 中的例子
    • epimorphism 就是满射 (surjections)
    • monomorphism 就是单射 (injections)

    /_img/cat-for-ai/set-surjections-and-injections.svg

    所以这里的操作就相当于是在用范畴论的方式把集合范畴中的满射和单射一般化. 但是需要注意的是, 因为范畴, 或者关系图中的箭头是单方向有向的, 所以并不显然存在 “反” 向的箭头, 即不存在显然的 “一一映射” (isomorphism)

Examples

Examples
  • Definition: category Rel
    • objects \(←\) sets
    • morphisms \(←\) relations between sets
    • \(⇒\) initial and terminal objects is \(∅\)
  • Definition: category Groups
  • Definition: category Vect
    • objects are sets \(V\) (linear space) of vectors \(\boldsymbol{v}\)
    • morphisms are linear transformations between these spaces

Functors

  • Definition: \(F : \boldsymbol{C} → \boldsymbol{D}\) maps objects in \(\boldsymbol{C}\) to \(\boldsymbol{D}\).
    • \(F(\mathrm{id}_A) = \mathrm{id}_{F(A)}\)
    • \(F(g ˆ f) = F(g) ˆ F(f)\)

    /_img/cat-for-ai/category-functor.svg

Example: implementation in CL (from cats4prog)
  • a functor maps types to types, and functions to functions
    ;; F: f -> F(f)
    (defmethod F ((f function)) ...)
    
    ;; F: integer -> F(integer)
    (defmethod F ((x integer)) ...)
    
    ;; which is equal to
    (defun functor (x)
      (typecase x
        (function  ...)
        (integer   ...)
        ;; else
        (otherwise ...)))
    
    • Definition: bifunctor \(\mathcal{C} × \mathcal{C} → \mathcal{C}\) (假设对于 \(\mathcal{C}\) 中所有的对象都存在 product)
      (defun bifunctor (product f g)
        "Bifunctor:
      
          ###########
       a --- f -+   #
          #     |   #
          # product --->
          #     |   #
       b --- g -+   #
          ###########
      "
        (lambda (a b)
          (funcall product (funcall f a) (funcall g b))))
      
      ;; equal to (+ (1+ 2) (2* 3))
      (funcall (bifunctor #'+ #'1+ #'2*) 2 3) ; => 9
      

      一个简单的例子:

      /_img/cat-for-ai/bifunctor-example.svg

    • Definition: profunctors

Message passing, categorically \(←\) Graph Neural Networks are Dynamic Programmers

注: 这部分是 Slide + Paper 的组合, 不过感觉在

省流版:

  1. monoidal categories 用于描述 parallel process

Categorical Dataflow

该部分是 Slide + Paper + Implementation 的组合, 我应该会着重 implementation 的复刻.

注: 虽然但是, 为了能够看懂这个 Monoidal 确实很花时间.

Monoidal Theory

注: 我到现在还是不太理解 Monoidal 的这个概念. 是一个局域闭包用于状态/变量值的传递/储存?

Monoial

  • Definition: string diagrams as category
    • objects: strings
    • morphisms: boxes
Example: but for programming
  • string 视为输入的参数
  • boxes 视为对参数进行运算的函数
  • 哦, 你说多返回值? (values v1 v2 v3)

其他的一些 string 的类型:

  • string 不会缠绕 (就算缠在一起也视为直接连接两点)
  • split strings
    (defun split-call (f g)
      "Split string.
    
         ########
         #      #
         # +- f --- out
     in ---+    #
         # +- g --- out
         ########
    "
      (lambda (in)
        (values (funcall f in) (funcall g in))))
    
    (funcall (split-call #'1+ #'2*) 2)      ; => 3, 4
    
  • end strings
    (defun string-terminate (input)
      "End string.
    
      input ----[*]"
      (declare (ignore input)))
    
  • start strings
    in 1 ------+
               +----- out
    in 2 ------+
        
  • Definition: a monodial category is a tuple

    \[(\boldsymbol{C}, ⊗, \boldsymbol{1}, α, λ, ρ)\]

    consisting of

    • category \(\boldsymbol{C}\)
    • functor \(⊗\)
    • object \(\boldsymbol{1} ∈ \boldsymbol{C}\) called monoidal unit
    • associativity isomorphism \(α\) :

      \[(X ⊗ Y) ⊗ Z \xrightarrow[≅]{α_{X,Y,Z}} X ⊗ (Y ⊗ Z)\]

    • left/right unit isomorphism \(λ_X, ρ_X\)

      \[\boldsymbol{1} ⊗ X \xrightarrow[≅]{λ_x} X \ \mathrm{and} \ X ⊗ \boldsymbol{1} \xrightarrow[≅]{ρ_X} X\]

    s.t.:

    • Unity Axioms:

      /_img/cat-for-ai/unity-aximons.svg

      Note that \(λ_{\boldsymbol{1}} = ρ_{\boldsymbol{1}} : \boldsymbol{1} ⊗ \boldsymbol{1} \xrightarrow{≅} \boldsymbol{1}\).

      Examples

      一些图示:

      • \((X ⊗ Y) ⊗ W \xrightarrow{f ⊗ \boldsymbol{1}_W} Z ⊗ W \xrightarrow{g} M\)

        /_img/cat-for-ai/moncat-example.svg

      • \((f ˆ g) ⊗ (h ˆ i) = (f ⊗ h) ˆ (g ⊗ i)\)

        /_img/cat-for-ai/moncat-is-functor.svg

    • Pentagon Axiom:

      /_img/cat-for-ai/pentagon-digram.svg

  • Definition: comonoid
注 & Examples Graphs

这部分我感觉比 cat4prog 写得要好. 感觉学习 Haskell 的同学真难啊, 既要学一门新语言的语法, 还要学范畴论, 真是不容易啊 (笑).

Examples:

  • List
    • \(⊗ : \mathrm{list} × \mathrm{list} → \mathrm{list}\) append
    • \(\boldsymbol{1} : \mathrm{list}\) ()
  • \((\mathbb{R}, +, 0)\) and \((\mathbb{R}, ×, 1)\) (实数域上的加法和乘法)
  • \((\mathbb{B}, ∧, \mathrm{TRUE})\) (布尔代数)
  • \((\mathrm{Euc}, ×, \boldsymbol{1})\) (Euclid 空间) \(\mathbb{R}^N × \mathbb{R}^K \xrightarrow{f × g} \mathbb{R}^M × \mathbb{R}^L\)
  • 范畴 FinStoch
    • objects: finite sets
    • morphisms: Markov kernels
    • \((\mathrm{FinStoch}, ⊗, \boldsymbol{1})\)

Lens Category

  • Definition: the category \(\mathrm{Len}(\mathcal{C})\)

    /_img/cat-for-ai/lens-graph.svg

    • objects: pairs \(\left( \begin{matrix}A \ A'\end{matrix} \right)\) of objects in \(\mathcal{C}\)
    • morphism: \(\left( \begin{matrix}A \ A'\end{matrix} \right) → \left( \begin{matrix}B \ B'\end{matrix} \right)\) is a pair \(\left( \mathrm{view}, \mathrm{upd} \right)\) where: \(\begin{matrix} \mathrm{view} : A → B \ \mathrm{and} \ \mathrm{upd} : A × B' → A' \end{matrix}\) are morphisms in \(\mathcal{C}\).

    将其记为 \(\left( \begin{matrix}A \ A'\end{matrix} \right) \xrightarrow{\mathrm{view}, \mathrm{upd}} \left( \begin{matrix}B \ B'\end{matrix} \right)\).

    • compositiion: \(\left( \left( \begin{matrix}A \ A'\end{matrix} \right) \xrightarrow{(v_1, u_1)} \left( \begin{matrix}B \ B'\end{matrix} \right) \right) ˆ \left( \left( \begin{matrix}B \ B'\end{matrix} \right) \xrightarrow{(v_2, u_2)} \left( \begin{matrix}C \ C'\end{matrix} \right) \right) = \left( \begin{matrix}A \ A'\end{matrix} \right) \xrightarrow{(v, u)} \left( \begin{matrix}C \ C'\end{matrix} \right)\)
      • \(v = v_2 ˆ v_1\)
      • \(u(a, c) = u_1(a, u_2(v_1(a), c))\)
  • Lens category is a category where morphisms have a forward and a backward component

    差不多的感觉就是 Lens 包含前向和反向.

好了, 理论到此为止, 来点实际应用的:

X as Lens

  • Derivatives as Lens: \(\left( \begin{matrix} \mathbb{R}^2 \ \mathbb{R}^2 \end{matrix} \right) \xrightarrow{(f, ∇ f)} \left( \begin{matrix} \mathbb{R} \ \mathbb{R} \end{matrix} \right)\)
    • Chain Rule as Lens composition
    • Backprop: functor \(\mathrm{Euc} → \mathrm{Lens}(\mathrm{Euc})\)
  • Optimisers as Lens: \(\left( \begin{matrix}\mathbb{R}^p × \mathbb{R}^p \ \mathbb{R}^p × \mathbb{R}^p\end{matrix} \right) \xrightarrow{(v, u)} \left( \begin{matrix}\mathbb{R}^p \ \mathbb{R}^p\end{matrix} \right)\)
    • Gradient Descent
      • \(v(w) = w\)
      • \(u(w, Δ w) = w - α Δ w\)
    • Momentum
      • \(\mathrm{view}(v_{t-1}, w_t) = w_t - γ v_{t - 1}\)
      • \(\mathrm{upd}(v_{t-1}, w_t, Δ w_t) = (v_t, w_{t+1})\)
      • \(v_t = γ v_{t-1} + d Δ w_t\)
      • \(w_{t+1} = w_t - v_t\)
If you want to implement Lens

感觉实现 Lens 并不是一个非常困难的事情, 虽然理论听起来很复杂, 但是实现起来竟然异常的简单:

(defclass lens ()
  ((view :initarg :view
         :initform (error "Missing :view")
         :documentation "view : A → B")
   (upd  :initarg :upd
         :initform (error "Missing `:upd'")
         :documentation "upd : A × B' → A'"))
  (:documentation
   "Monomorphic lenses, stored as a pair of maps. "))

(defun lens (view upd)
  "Make a `lens' object. "
  (declare (function view upd))
  (make-instance 'lens :view view :upd upd))

(defmacro let-lens (bindings &body body)
  "Bindings like (view upd lens). "
  `(let ,(loop for (v u lens) in bindings
               collect `(,v (slot-value ,lens 'view))
               collect `(,u (slot-value ,lens 'upd)))
     ,@body))

;; f ○ g = f(g(x))
(defmethod compose ((f function) (g function))
  (lambda (x) (funcall f (funcall g x))))

;; Lens1 ○ Lens2 : v = v2 ○ v1; u(a, c') = u1(a, u2(v1(a), c))
(defmethod compose ((lens1 lens) (lens2 lens))
  (let-lens ((v1 u1 lens1) (v2 u2 lens2))
    (lens (compose v2 v1)
          (lambda (x y) (funcall u1 x (funcall u2 (funcall v1 x) y))))))

(defmethod call ((lens lens) pair)
  (let-lens ((v u lens))
    (let ((a (car pair))
          (b (cdr pair)))
      (cons (funcall v a)
            (funcall u a b)))))
  • Derivatives as lens
    (flet ((f (vec)
             (apply (lambda (x1 x2) (+ (* 3 (* x1 x1)) (* 7 x2))) vec))
           (df (vec dy)
             (apply (lambda (x1 x2) (declare (ignore x2))
                      (list (* 6 x1 dy) (* 7 dy)))
                    vec)))
      (call (lens #'f #'df) (cons (list 1 2) 0.1))) ; (17 . (0.7 0.1))
    
  • Chain rule as lens composition
    (flet ((f (vec)
             (apply (lambda (x1 x2) (+ (* 3 (* x1 x1)) (* 7 x2))) vec))
           (df (vec dy)
             (apply (lambda (x1 x2) (declare (ignore x2))
                      (list (* 6 x1 dy) (* 7 dy)))
                    vec))
           (dcos (x dy)
             (* (sin x) dy)))
      (call (compose (lens #'f   #'df)
                     (lens #'cos #'dcos))
            (cons (list 1 2) 0.1))) ; (-0.27516335 . (-0.5768385 -0.6729783))
    
  • 即想要构造一个 backprop, 只需要构造:

    /_img/cat-for-ai/backprop-and-lens.svg

  • Optimisers as lenses

注: 但是我并不觉得这个的实现做得比较好, 之后还是得想点方法来重新实现一下. 不过这种写法你完全可以直接迁移到其他编程语言, 比如 C 语言中去:

typedef struct lens {
  void* (*view) (void *arg);
  void* (*upd)  (void *arg);
} Lens;

虽然感觉会很麻烦就是了.

注: 这里的实现并不好用, 实际使用的实现见 cat4ai.

Optics Category

  • Definition: the category \(\mathrm{Optic}(\mathcal{C})\)
    • objects: pairs \(\left( \begin{matrix}A \ A'\end{matrix} \right)\) of objects in \(\mathcal{C}\)
    • morphism: \(\left( \begin{matrix}A \ A'\end{matrix} \right) \xrightarrow{(M, f, l)} \left( \begin{matrix}B \ B'\end{matrix} \right)\)
      • \(M ∈ \mathcal{C}\) object of \(\mathcal{C}\)
      • \(f : A → M ⊗ B\) morphisms
      • \(l : M ⊗ B' → A'\) morphisms
    • compose:
      • \((M_1, f_1, l_1) ˆ (M_2, f_2, l_2) ⇒ (M_1 ⊗ M_2, f_1 ˆ (f_2 ⊗ M), (M ⊗ l_2) ˆ l_1)\)
  • WHY Optics?
    • probabilitst bidrectional transformations
    • bidirectional transformations with side-effect
    • bidirectional transformations that operate on containers
  • Examples

Categorical Foundations of Gradient-Based Learning

接下来的具体的代码实现可以参考 cat4ai 仓库.

  • Definition: Parametrized Category para

    /_img/cat-for-ai/para-cat-plot.svg

  • Definition: Lenses Category lens
  • Definition: Parametric Lenses Category param-lens

其他吐槽

  • 理论理论大好き?

    在导师点评的时候我意识到了这样的一个 “理论之所以学起来痛苦” 原因. 大家想要知道/学的东西是那种非常具体的东西. 比如说, 有人说自己搞出了万物理论, 或者做了很牛逼的东西.

    但是在向你介绍的时候, 每快要逼近这个最终的结论的时候, 就要说一句: “等等, 先让我介绍一下原理/前提”. 哇, 这种也太痛苦了…

    这简直就是拼多多的再砍一刀啊. 这么搞很容易就会让大家失去兴趣了… 要是有能够将理论和实践结合起来的方法就好了. 比如讲完实例后, 再说: “欸, 其实我们刚刚做的东西的背后是这个理论. “, 然后再举几个例子, 最后再一般化…

    但是感觉要求很高… 而且感觉, 尤其是数学方面折腾的例子, 感觉和 “应用” 和 “事例” 关系并不大. 真是难呢.

File Configures

Graphviz Configures

rankdir=LR;
bgcolor="transparent";
fontname="Arial";
color="#888888";
node [shape="circle", fontname="Arial", color="#888888"];
edge [fontname="Arial", color="#888888"];
color="#888888:invis:#888888"

感觉自己纯手工敲这些 Graphviz 的绘图代码实在是太蠢了, 不过感觉自己以后应该也用不上画交换图这种技能, 这里暂时就忍了.