July | 2013 | 技术奇异点

Programming in Lua (6): Continuation

2013/07/14

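Earlier posts in this series (parts 3 and 5) discussed the concept of continuations in the Lua C API. The VM implementation and API design of Lua continuations can fairly be called an "inevitable and perfect design": an embedded/extendable language that supports coroutines simply has to be designed this way. Those posts, however, did not fully explain how continuations fit into the coroutine mechanism as a whole.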

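Figure 6-1 shows an example of the stacks while the Lua VM is running. The Lua VM has a stackless design. In the figure, the Lua stack and the C runtime stack (CRT stack) stand side by side, and the correspondence between their stack frames is marked. The orange parts are functions on the CRT stack with which the Lua VM maintains its own state; they do not correspond to any particular frame on the Lua stack. A few more points are worth noting: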

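The Lua stack is a dual stack: stack frames and stack entries are stored separately inside struct lua_State, the former in a linked list of CallInfo and the latter in an array of TValue. The figure shows only the Lua stack frames (not drawn as a linked list) and omits the stack entries, which does not affect the discussion.
To keep the figure simple, some minor Lua VM functions are not drawn on the CRT stack, although they do appear there when the code actually runs.
In the figure, "CallInfo (Lua)" denotes the stack frame of a Lua function, and "CallInfo (C)" denotes the stack frame of a C function called from Lua. However many "CallInfo (Lua)" frames there are, as long as they are adjacent on the Lua stack with no "CallInfo (C)" between them, they correspond to a single level of luaV_execute() on the CRT stack. This is what stackless means.
However deep the function calls inside a C function go, it occupies only one "CallInfo (C)" on the Lua stack. A C function can of course call another C function indirectly through lua_*call* [1], in which case several adjacent "CallInfo (C)" frames appear on the Lua stack, but this post does not cover that case.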
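
The stackless rule above can be illustrated with a small toy model. This is only a sketch and not Lua source code: FrameKind and count_execute_levels() are hypothetical names, and the model assumes exactly one luaV_execute() activation per maximal run of adjacent Lua frames.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the kind of each Lua stack frame, listed bottom to top. */
typedef enum { FRAME_LUA, FRAME_C } FrameKind;

/* Hypothetical helper: count how many luaV_execute() activations the CRT
   stack would hold for the given frames, assuming one activation per
   maximal run of adjacent "CallInfo (Lua)" frames. */
size_t count_execute_levels(const FrameKind *frames, size_t n) {
    size_t levels = 0;
    assert(frames != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* A run starts at the bottom or right above a C frame. */
        int starts_run = (i == 0) || (frames[i - 1] == FRAME_C);
        if (frames[i] == FRAME_LUA && starts_run)
            levels++;
    }
    return levels;
}
```

Under this model, three adjacent Lua frames cost one level, while the sequence Lua, Lua, C, Lua, Lua costs two, because the C frame in the middle splits the run.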

Next, let us look at how the two stacks grow and are restored at run time:

When luaV_execute() fetches an OP_CALL instruction, it calls luaD_precall(), which calls next_ci() to grow the Lua stack. If the callee is a Lua function, the CRT stack stays unchanged.
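When luaV_execute() fetches an OP_RETURN instruction, it calls luaD_poscall(), which restores the Lua stack to its state before the call. Note that only Lua functions have OP_RETURN instructions; a C function simply returns and restores the CRT stack.
If luaD_precall() finds that the callee is a C function, it calls that function and, after it returns, calls luaD_poscall() to restore the Lua stack to its state before the call.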
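When a C function calls a Lua function through the lua_*call* APIs, lua_*call* indirectly calls luaD_precall(), which calls next_ci() to grow the Lua stack, and luaV_execute() is then called to run the Lua function. At this point several luaV_execute() frames appear on the CRT stack. Only two situations put more than one level of luaV_execute() on the CRT stack: this is one of them, and the other is coroutines, explained below. In no other case does the CRT stack hold more than one luaV_execute() frame.
If a Lua function was called directly by a C function, then when it returns, the luaV_execute() that ran it returns as well [2]. The C function, as the caller, then continues to run.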

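Now consider what happens once coroutines are involved, starting with the case where only Lua functions are present [3]. In Figure 6-2 there is initially a single Lua stack, belonging to the main thread (lower left of the figure). The main thread creates a coroutine and calls coroutine.resume() on it. coroutine.resume() in turn calls luaV_execute(), so the CRT stack now holds two levels of luaV_execute(), one for the main thread and one for the coroutine. Because the argument of the topmost luaV_execute() is the coroutine's struct lua_State, the "current" Lua stack switches from the main thread to the coroutine. Note that Lua has no explicit data structure that records the "current" thread: whatever the luaV_execute() at the top of the CRT stack is running is the current thread and the current Lua stack [4].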

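When a yield happens inside the coroutine, the Lua VM calls longjmp() to get back to the point where lua_resume() was running. This restores the CRT stack to its state at the time of the resume, so the luaV_execute() belonging to the main thread continues. As long as no C function calls a Lua function and every yield happens inside a Lua function, switching coroutines is simply a matter of calling luaV_execute() and longjmp().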

Now consider calling a C function inside a coroutine. Figure 6-3 shows a Lua function in the coroutine calling a C function, which in turn calls a Lua function.

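If coroutine.yield() is called at this point, longjmp() restores the CRT stack (the grey shaded part is destroyed) and the call stack of the C function is lost. After the main thread resumes the coroutine again, the stacks look like Figure 6-4.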

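The problem now is what to do with the "CallInfo (C)" in the coroutine. Two things make it impossible to restore the state of this C function. First, its original CRT stack frames were lost during the yield. Second, the current luaV_execute() was called by resume rather than by the original lua_*call*(), so the topmost luaV_execute() cannot return to the right place in the C function's code.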
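
The first of these two reasons can be reproduced with plain setjmp()/longjmp(), independent of Lua. The following is a minimal sketch under that assumption; every name in it (toy_resume(), toy_c_function(), and so on) is hypothetical and only stands in for the corresponding piece of the VM.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf resume_point;        /* stands in for the jump buffer set up at resume */
static int c_function_resumed = 0;  /* set only if the C function gets past its call */

/* Stands in for Lua code that ends up calling coroutine.yield(). */
static void toy_lua_function_that_yields(void) {
    longjmp(resume_point, 1);  /* unwind the CRT stack back to toy_resume() */
}

/* Stands in for a C function that calls into Lua without a continuation. */
static void toy_c_function(void) {
    toy_lua_function_that_yields();
    c_function_resumed = 1;  /* never reached: longjmp() discarded this frame */
}

/* Returns 1 if the "coroutine" yielded, 0 if it ran to completion. */
int toy_resume(void) {
    c_function_resumed = 0;
    if (setjmp(resume_point) != 0) {
        /* We arrived here through longjmp(), i.e. through a yield. */
        assert(c_function_resumed == 0);
        return 1;
    }
    toy_c_function();
    return 0;
}

/* Reports whether the C function ever continued after its call into "Lua". */
int toy_c_function_resumed(void) {
    return c_function_resumed;
}
```

Calling toy_resume() reports a yield, and toy_c_function_resumed() stays 0 afterwards: the statement after the call inside toy_c_function() is simply gone with its frame, and nothing in C can jump back into it. That remaining work is what a continuation function has to carry.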

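This is where continuations come in. As mentioned above, two situations put more than one level of luaV_execute() on the CRT stack: one is a C function calling a Lua function, where luaD_call() runs luaD_precall() and then luaV_execute(); the other is resume. The difference is this. In the former case, once luaV_execute() returns, luaD_call() does a little bookkeeping and returns too. In the latter case, once luaV_execute() returns, the unroll() function is called and enters a loop [5]. The loop checks whether the top of the current Lua stack is a "CallInfo (C)", and if so, calls that CallInfo's u.c.k field. This field is the continuation argument accepted by lua_callk()/lua_pcallk() [6][7].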

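Figure 6-5 shows a continuation at work. Compared with Figure 6-4, the two "CallInfo (Lua)" frames at the top of the coroutine's Lua stack are gone, because their Lua functions have returned. Since the next frame at the top of the Lua stack is a "CallInfo (C)", the topmost luaV_execute() returns. resume() then calls unroll(), which calls u.c.k of the CallInfo at the top of the stack. For the moment the coroutine has no luaV_execute() of its own on the CRT stack. Once the continuation function finishes and returns, the loop in unroll() calls luaV_execute() again to run the remaining Lua code in the coroutine, as shown in Figure 6-6. The loop in unroll() keeps examining the frames on the Lua stack [8] and, accordingly, either runs a continuation function or calls luaV_execute() to run Lua functions. So no matter how many levels of C-to-Lua calls a coroutine contains, coroutine switching works correctly as long as each of them supplies a proper continuation.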
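
The alternation between continuation functions and luaV_execute() can be sketched as a toy loop. This is not the code of unroll() in the Lua sources: ToyFrame, ToyThread, and all toy_* functions are hypothetical, "running" a frame is modelled as popping it, and the frame of coroutine.yield() itself is ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ToyContinuation)(void);  /* stands in for the u.c.k field */

typedef struct {
    int is_lua;         /* 1 for "CallInfo (Lua)", 0 for "CallInfo (C)" */
    ToyContinuation k;  /* continuation; only meaningful for C frames */
} ToyFrame;

typedef struct {
    ToyFrame frames[16];
    size_t depth;            /* live frames; frames[depth - 1] is the top */
    int execute_calls;       /* how often the toy luaV_execute() ran */
    int continuation_calls;  /* how often a continuation ran */
} ToyThread;

void toy_init(ToyThread *t) {
    t->depth = 0;
    t->execute_calls = 0;
    t->continuation_calls = 0;
}

void toy_push(ToyThread *t, int is_lua, ToyContinuation k) {
    assert(t->depth < 16);
    t->frames[t->depth].is_lua = is_lua;
    t->frames[t->depth].k = k;
    t->depth++;
}

/* Toy luaV_execute(): one call consumes every adjacent Lua frame at the top. */
static void toy_execute(ToyThread *t) {
    t->execute_calls++;
    while (t->depth > 0 && t->frames[t->depth - 1].is_lua)
        t->depth--;
}

/* Toy unroll(): alternate between continuations and toy_execute(). */
void toy_unroll(ToyThread *t) {
    while (t->depth > 0) {
        ToyFrame *top = &t->frames[t->depth - 1];
        if (top->is_lua) {
            toy_execute(t);
        } else {
            assert(top->k != NULL);  /* assumed: every C frame has a continuation */
            top->k();
            t->continuation_calls++;
            t->depth--;              /* the C frame is now finished */
        }
    }
}

/* A do-nothing continuation for experiments. */
int toy_noop_continuation(void) { return 0; }
```

For a stack built bottom to top as Lua, C, Lua, Lua (roughly the situation of Figure 6-4), toy_unroll() runs toy_execute() twice and the continuation once: the first toy_execute() consumes the two Lua frames at the top, the continuation finishes the C frame, and the second toy_execute() runs the remaining Lua frame.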

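As this shows, the continuation is a concept designed specifically for coroutines. Code running in the main thread does not need to supply a continuation when a C function calls a Lua function.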

Footnotes:

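[1] This post uses lua_*call*() to refer to the API family lua_call(), lua_pcall(), lua_callk(), and lua_pcallk().
[2] A .lua file is treated as a Lua function once it has been loaded into the VM.
[3] Lua programs call C functions all the time at run time, but these C functions usually return quickly, so there is normally no "CallInfo (C)" on the Lua stack when a yield happens.
[4] A given Lua VM always has exactly one CRT stack.
[5] This difference applies only to a coroutine that is being resumed again after at least one yield. The status field of lua_State indicates whether a coroutine is being resumed for the first time or resumed after a yield.
[6] In fact, at yield time the Lua VM checks whether every "CallInfo (C)" corresponding to the part of the CRT stack about to be destroyed has a continuation function, and raises an error if one does not.
[7] The yield discussed in this post happens inside a Lua function (there is a C function on the stack, but not at the top). If the yield happens inside a C function, the situation is similar, except that lua_yieldk() is used instead of lua_callk()/lua_pcallk().
[8] Of course, unroll() does not examine every single frame. One run of luaV_execute() "eats up" all the adjacent "CallInfo (Lua)" frames at the top of the Lua stack.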