#navi_header|C言語系|

Task: investigate how the time(1) command launches a program and measures its running time.

Note: this chapter does not appear in the book 「デーモン君のソース探検」. msakamoto-sf looked into the topic out of personal interest and added it to this reading-notes series as an "Appendix".

#more||
#outline||
----
* Trying out time(1)

Check where it lives and look at its man page:
 $ which time
 /usr/bin/time
 $ man 1 time

A quick test run:
timetest.c:
#code|c|>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        char buf[200];
        printf("Input 3 times.\n");
        printf("1 > ");
        if (NULL != fgets(buf, sizeof(buf), stdin)) {
                printf("1 : %s", buf);
        }
        printf("2 > ");
        if (NULL != fgets(buf, sizeof(buf), stdin)) {
                printf("2 : %s", buf);
        }
        printf("3 > ");
        if (NULL != fgets(buf, sizeof(buf), stdin)) {
                printf("3 : %s", buf);
        }
        printf("sleeping 3 seconds...\n");
        sleep(3);
        return 0;
}
||<

#pre||>
$ gcc -o timetest timetest.c
$ time ./timetest
Input 3 times.
1 > abc
1 : abc
2 > def
2 : def
3 > ghi
3 : ghi
sleeping 3 seconds...

real    0m6.912s
user    0m0.000s
sys     0m0.013s
||<

Note that the "real 0m6.912s" layout above is the format printed by a shell's built-in time (bash, for example), not the format produced by the /usr/bin/time source examined below. To be sure of running the binary, invoke it by its full path.

* Source code of time(1)

Where is the source code?
#pre||>
$ locate time
...
/usr/src/usr.bin/time
/usr/src/usr.bin/time/CVS
/usr/src/usr.bin/time/CVS/Entries
/usr/src/usr.bin/time/CVS/Repository
/usr/src/usr.bin/time/CVS/Root
/usr/src/usr.bin/time/CVS/Tag
/usr/src/usr.bin/time/Makefile
/usr/src/usr.bin/time/time.1
/usr/src/usr.bin/time/time.c
...
||<

It is not very long, so let's skim through it. The variable declarations at the top of main:
#code|c|>
int
main(argc, argv)
        int argc;
        char **argv;
{
        int pid;
        int ch, status;
        int lflag, portableflag;
        const char *decpt;
        const struct lconv *lconv;
        struct timeval before, after;
        struct rusage ru;
||<
lflag and portableflag correspond to the "-l" and "-p" options.

Next comes option parsing. Nothing tricky here; it reads straightforwardly.
#code|c|>
lflag = portableflag = 0;
while ((ch = getopt(argc, argv, "lp")) != -1) {
        switch (ch) {
        case 'p':
                portableflag = 1;
                break;
        case 'l':
                lflag = 1;
                break;
        case '?':
        default:
                usage();
        }
}
argc -= optind;
argv += optind;
||<

Then comes the main processing, which runs the specified command and obtains the elapsed time.
#code|c|>
gettimeofday(&before, (struct timezone *)NULL);
switch(pid = vfork()) {
case -1:                        /* error */
        perror("vfork");
        exit(EXIT_FAILURE);
        /* NOTREACHED */
case 0:                         /* child */
        /* LINTED will return only on failure */
        execvp(*argv, argv);
        perror(*argv);
        _exit((errno == ENOENT) ? 127 : 126);
        /* NOTREACHED */
}

/* parent */
(void)signal(SIGINT, SIG_IGN);
(void)signal(SIGQUIT, SIG_IGN);
while (wait3(&status, 0, &ru) != pid);
gettimeofday(&after, (struct timezone *)NULL);
if (!WIFEXITED(status))
        fprintf(stderr, "Command terminated abnormally.\n");
timersub(&after, &before, &after);
||<

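First, gettimeofday() saves the current time in "before".
Next, vfork() is called to create a child process. The source does not say why vfork(), which shares the parent's memory, is used instead of fork(). A likely reason is that the child calls execvp() immediately, so copying the address space as fork() does would be wasted work.
The child immediately calls execvp(3) to run the program given on the command line.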

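The parent sets SIGINT and SIGQUIT to be ignored and waits for the child to terminate with wait3(2).
Waiting with wait3(2) is the key point. We will look at wait3(2) in a little more detail later.
For now, let's keep reading the main source.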

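Once the child's termination is detected, gettimeofday() is called again and the time at that point is saved in "after".
If the child did not exit normally (WIFEXITED(status) is false), a message is printed to stderr. In either case, timersub() then computes the difference after - before and stores it back into "after".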
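
The flow described above can be condensed into a small helper. This is a minimal sketch, not the NetBSD code: the names struct timed_run and timed_exec() are made up for illustration, it uses fork() instead of vfork(), it waits for the specific child with wait4() instead of looping on wait3(), and it leaves out the signal handling.

```c
/* wait4() is a BSD extension; on glibc it is only declared with this macro. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <errno.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical container for the result of one timed run. */
struct timed_run {
        struct timeval real;    /* wall-clock time */
        struct rusage ru;       /* resource usage of the child */
        int status;             /* raw wait status */
};

/*
 * Runs argv[0] with the argument vector argv and fills in *out.
 * Returns 0 on success, -1 if fork() or wait4() fails.
 */
int
timed_exec(char *const argv[], struct timed_run *out)
{
        struct timeval before, after;
        pid_t pid;

        gettimeofday(&before, NULL);
        pid = fork();
        if (pid == -1)
                return -1;
        if (pid == 0) {
                /* child: execvp() returns only on failure */
                execvp(argv[0], argv);
                _exit(errno == ENOENT ? 127 : 126);
        }

        /* parent: wait for this child and collect its rusage in one call */
        while (wait4(pid, &out->status, 0, &out->ru) == -1) {
                if (errno != EINTR)
                        return -1;
        }
        gettimeofday(&after, NULL);

        /* real = after - before, borrowing from tv_sec when needed */
        out->real.tv_sec = after.tv_sec - before.tv_sec;
        out->real.tv_usec = after.tv_usec - before.tv_usec;
        if (out->real.tv_usec < 0) {
                out->real.tv_sec--;
                out->real.tv_usec += 1000000;
        }
        return 0;
}
```

A caller would pass a NULL-terminated argument vector and then read out.real, out.ru.ru_utime and out.ru.ru_stime for the real, user and sys times.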

timersub() is defined in sys/time.h as follows.
#pre||>
#define timersub(tvp, uvp, vvp)                                         \
        do {                                                            \
                (vvp)->tv_sec = (tvp)->tv_sec - (uvp)->tv_sec;          \
                (vvp)->tv_usec = (tvp)->tv_usec - (uvp)->tv_usec;       \
                if ((vvp)->tv_usec < 0) {                               \
                        (vvp)->tv_sec--;                                \
                        (vvp)->tv_usec += 1000000;                      \
                }                                                       \
        } while (/* CONSTCOND */ 0)
||<
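
To see the borrow in the macro at work, here is a small sketch that wraps timersub(). The helper name elapsed_usec() is hypothetical, and since timersub() is a BSD extension and not a standard function, the feature-test macro at the top is an assumption about a glibc build environment.

```c
/* timersub() is a BSD extension; on glibc it is only declared with this macro. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <sys/time.h>

/* Returns end - start in microseconds. */
long long
elapsed_usec(const struct timeval *start, const struct timeval *end)
{
        struct timeval diff;

        /* diff = end - start; the macro borrows one second if tv_usec goes negative */
        timersub(end, start, &diff);
        return (long long)diff.tv_sec * 1000000LL + diff.tv_usec;
}
```

With start = 1.900000 s and end = 3.100000 s, the raw subtraction gives 2 s and -800000 microseconds. The borrow turns that into 1 s and 200000 microseconds, so the function returns 1200000.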

From here on, all that remains is to print the results according to the "-l" and "-p" options.
#code|c|>
if (portableflag) {
        fprintf (stderr, "real %9ld%s%02ld\n",
                (long)after.tv_sec, decpt, (long)after.tv_usec/10000);
        fprintf (stderr, "user %9ld%s%02ld\n",
                (long)ru.ru_utime.tv_sec, decpt, (long)ru.ru_utime.tv_usec/10000);
        fprintf (stderr, "sys  %9ld%s%02ld\n",
                (long)ru.ru_stime.tv_sec, decpt, (long)ru.ru_stime.tv_usec/10000);
} else {

        fprintf(stderr, "%9ld%s%02ld real ",
                (long)after.tv_sec, decpt, (long)after.tv_usec/10000);
        fprintf(stderr, "%9ld%s%02ld user ",
                (long)ru.ru_utime.tv_sec, decpt, (long)ru.ru_utime.tv_usec/10000);
        fprintf(stderr, "%9ld%s%02ld sys\n",
                (long)ru.ru_stime.tv_sec, decpt, (long)ru.ru_stime.tv_usec/10000);
}

if (lflag) {
        int hz = (int)sysconf(_SC_CLK_TCK);
/* omitted */
}

exit(WIFEXITED(status) ? WEXITSTATUS(status) : EXIT_FAILURE);
/* NOTREACHED */
}
/* end of main() */
||<
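
The "%ld%s%02ld" conversions above print whole seconds, a decimal point string, and tv_usec divided by 10000, that is, hundredths of a second. The following sketch isolates that formatting. format_hundredths() is a hypothetical helper, the field width of 9 is dropped for brevity, and the decimal point is passed in by the caller because the quoted excerpt does not show how decpt is initialized.

```c
#include <stdio.h>
#include <sys/time.h>

/*
 * Formats tv as "<seconds><decpt><hundredths>" into buf.
 * Returns the value of snprintf().
 */
int
format_hundredths(char *buf, size_t len, const struct timeval *tv,
    const char *decpt)
{
        /* tv_usec is 0..999999, so dividing by 10000 leaves 0..99 (truncated) */
        return snprintf(buf, len, "%ld%s%02ld",
            (long)tv->tv_sec, decpt, (long)tv->tv_usec / 10000);
}
```

For a value of 6 s and 912345 microseconds with decpt ".", buf becomes "6.91". The value is truncated, not rounded.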

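After that, the usage() function is defined, and that is the end of time.c.

* Waiting for child termination and obtaining rusage with wait3()

Now let's take a brief look at wait3(2), the key point this time.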
 $ man 2 wait3
#pre||>
WAIT(2)                   NetBSD Programmer's Manual                   WAIT(2)

NAME
     wait, waitpid, wait4, wait3 - wait for process termination

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <sys/wait.h>

     pid_t
     wait(int *status);

     pid_t
     waitpid(pid_t wpid, int *status, int options);

     #include <sys/resource.h>

     pid_t
     wait3(int *status, int options, struct rusage *rusage);

     pid_t
     wait4(pid_t wpid, int *status, int options, struct rusage *rusage);
||<

The wait(2) family are system calls that detect the termination of a child process, but only wait3() and wait4() take a "struct rusage *rusage" argument.
The details of "struct rusage" are described in the getrusage(2) man page.
 $ man -k rusage
 getrusage (2) - get information about resource utilization
#pre||>
GETRUSAGE(2)              NetBSD Programmer's Manual              GETRUSAGE(2)

NAME
     getrusage - get information about resource utilization

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <sys/resource.h>
     #define   RUSAGE_SELF     0
     #define   RUSAGE_CHILDREN     -1

     int
     getrusage(int who, struct rusage *rusage);

DESCRIPTION
     getrusage() returns information describing the resources utilized by the
     current process, or all its terminated child processes.  The who parame-
     ter is either RUSAGE_SELF or RUSAGE_CHILDREN.  The buffer to which rusage
     points will be filled in with the following structure:

     struct rusage {
             struct timeval ru_utime; /* user time used */
             struct timeval ru_stime; /* system time used */
             long ru_maxrss;          /* max resident set size */
...
||<

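A process's resource information is stored in struct rusage. In addition, the "int who" argument of getrusage(2) selects whether the information is taken from the calling process itself (RUSAGE_SELF) or from its terminated child processes (RUSAGE_CHILDREN).

On BSD, using wait3()/wait4() apparently lets you obtain the resource information at the same time as waiting for the child to terminate, without calling getrusage(2).

Let's dig a little deeper into wait3()/wait4().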
 $ locate wait3
 /usr/share/man/cat2/wait3.0
 /usr/share/man/man2/wait3.2
 /usr/src/lib/libc/gen/wait3.c

/usr/src/lib/libc/gen/wait3.c:
#code|c|>
/* ... omitted ... */

pid_t
wait3(istat, options, rup)
        int *istat;
        int options;
        struct rusage *rup;
{
        return (wait4(WAIT_ANY, istat, options, rup));
}
||<

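This shows that in NetBSD 1.6, wait3(2) is really wait4(2) called with WAIT_ANY.
(The explanation below has been substantially revised based on the experience from [[849]].)

Let's look for the wait4 system call. Based on the experience from [[849]], looking at "src/sys/kern/init_sysent.c" should reveal the symbol name.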
src/sys/kern/init_sysent.c:
#pre||>
struct sysent sysent[] = {
        { 4, s(struct sys_wait4_args), 0,
            sys_wait4 },                        /* 7 = wait4 */
||<
Found it. "struct sys_wait4_args" is defined in sys/syscallargs.h.
#pre||>
struct sys_wait4_args {
        syscallarg(int) pid;
        syscallarg(int *) status;
        syscallarg(int) options;
        syscallarg(struct rusage *) rusage;
};
||<

Grepping for "sys_wait4" under src/sys/kern/ shows that it is defined in kern_exit.c.
#pre||>
int
sys_wait4(struct proc *q, void *v, register_t *retval)
{
struct sys_wait4_args /* {
    syscallarg(int)                 pid;
    syscallarg(int *)               status;
    syscallarg(int)                 options;
    syscallarg(struct rusage *)     rusage;
} */ *uap = v;
struct proc     *p, *t;
int             nfound, status, error, s;

(omitted)
if (SCARG(uap, rusage) &&
    (error = copyout((caddr_t)p->p_ru,
    (caddr_t)SCARG(uap, rusage),
    sizeof(struct rusage))))
    return (error);
||<

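"p" is a pointer to struct proc, the process information structure defined in sys/proc.h. Its "p_ru" member is declared as a pointer to a "struct rusage".
This identifies the place where wait4() hands the rusage information back to the caller.

Let's summarize at this point.
The time(1) command starts the program given on the command line with vfork(2) + execvp(3) and then waits for the child to terminate with wait3(2). Once the child has terminated, it prints the real, user and sys times using gettimeofday() and the struct rusage filled in by wait3(2).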

* Implementation on Linux (CentOS 5.x)

So how is it implemented on Linux? This time we trace it roughly with strace.
#pre||>
(CentOS 5.x)
$ strace time /bin/ls
...
gettimeofday({1290408739, 104680}, NULL) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f03708) = 22855

(the output of ls appears interleaved here)

--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={0, 999}, ru_stime={0, 999}, ...}) = 22855
gettimeofday({1290408739, 110785}, NULL) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {SIG_IGN}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, {SIG_IGN}, 8) = 0
write(2, "0.00", 40.00)                     = 4
write(2, "u", 1u)                        = 1
...
||<
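In the last two write(2) lines, the text that time itself wrote to stderr ("0.00" and "u") is interleaved with strace's output, which is why the length arguments read 40.00 and 1u.
Two differences stand out. Apart from these, it looks largely equivalent to time(1) on NetBSD 1.6.
- It uses clone(2) to start the child process.
- It uses wait4().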

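First, clone(2) is a Linux-specific system call for creating a child process, and it can share part of the parent's resources with the child. In that respect it resembles vfork(2). Note, however, that the flags in the trace (CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD) include neither CLONE_VM nor CLONE_VFORK, so memory is not shared here. This is the clone() call that glibc issues for an ordinary fork().
Next, wait4() is also available on the CentOS 5.x system used here, so presumably wait3() is either a macro or calls wait4() directly. The prototypes of wait3()/wait4() on CentOS 5.4 are shown below. They are the same as the NetBSD prototypes.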
 pid_t wait3(int *status, int options, struct rusage *rusage);
 pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage);

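** wait3()/wait4() and standards

Checking whether the Open Group says anything about how time(1) is implemented, the specification describes the reported times in terms of the following fields of the "struct tms" structure obtained with the "times(2)" function:
 tms_utime
 tms_cutime
 tms_stime
 tms_cstime
Specifically, user time is described as equivalent to the sum of tms_utime and tms_cutime, and system time as the sum of tms_stime and tms_cstime. times(2) is a system call that returns the time information of the calling process and its child processes packed into a struct tms.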
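
As a rough illustration of the times(2) approach, the sketch below measures a child's CPU time from the tms_cutime and tms_cstime fields. child_cpu_times() is a hypothetical helper, not something taken from any time(1) implementation. It uses fork() and waitpid(), and converts clock ticks to seconds with sysconf(_SC_CLK_TCK).

```c
#include <sys/times.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs argv and stores the child's user and system CPU time in seconds.
 * Returns 0 on success, -1 on failure.
 */
int
child_cpu_times(char *const argv[], double *user_sec, double *sys_sec)
{
        struct tms before, after;
        long ticks = sysconf(_SC_CLK_TCK);      /* clock ticks per second */
        int status;
        pid_t pid;

        if (ticks <= 0 || times(&before) == (clock_t)-1)
                return -1;
        pid = fork();
        if (pid == -1)
                return -1;
        if (pid == 0) {
                execvp(argv[0], argv);
                _exit(127);
        }
        /* tms_cutime/tms_cstime only include children that have been waited for */
        if (waitpid(pid, &status, 0) == -1)
                return -1;
        if (times(&after) == (clock_t)-1)
                return -1;

        *user_sec = (double)(after.tms_cutime - before.tms_cutime) / ticks;
        *sys_sec = (double)(after.tms_cstime - before.tms_cstime) / ticks;
        return 0;
}
```

Taking the difference between two times() calls isolates this one child even if the caller has already reaped other children. The resolution is limited to one clock tick.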

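So where did wait3()/wait4() go? The current Open Group specification does not define them. wait3() appeared in older versions of the Single UNIX Specification but was removed in SUSv3, and wait4() has never been standardized. Both remain BSD-derived functions.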

----
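That reveals how time(1) obtains the child's times. The basis of the mechanism is waiting for the child to terminate with wait3()/wait4() and obtaining a struct rusage.
However, wait3()/wait4() are not specified by the Open Group, so they are not portable across every UNIX. To be portable, you would use times(2) or getrusage(2).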
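
A getrusage(2)-based variant might look like the following. Again this is only a sketch under assumptions: child_rusage_times() and tv_seconds() are hypothetical names, and the child is reaped with plain waitpid() before RUSAGE_CHILDREN is read.

```c
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Converts a struct timeval to seconds as a double. */
static double
tv_seconds(const struct timeval *tv)
{
        return (double)tv->tv_sec + (double)tv->tv_usec / 1e6;
}

/*
 * Runs argv and stores the child's user and system CPU time in seconds.
 * Returns 0 on success, -1 on failure.
 */
int
child_rusage_times(char *const argv[], double *user_sec, double *sys_sec)
{
        struct rusage before, after;
        int status;
        pid_t pid;

        if (getrusage(RUSAGE_CHILDREN, &before) == -1)
                return -1;
        pid = fork();
        if (pid == -1)
                return -1;
        if (pid == 0) {
                execvp(argv[0], argv);
                _exit(127);
        }
        /* RUSAGE_CHILDREN covers only terminated children that were waited for */
        if (waitpid(pid, &status, 0) == -1)
                return -1;
        if (getrusage(RUSAGE_CHILDREN, &after) == -1)
                return -1;

        *user_sec = tv_seconds(&after.ru_utime) - tv_seconds(&before.ru_utime);
        *sys_sec = tv_seconds(&after.ru_stime) - tv_seconds(&before.ru_stime);
        return 0;
}
```

Compared with wait3()/wait4(), this needs two extra getrusage() calls around the wait, but it relies only on interfaces that the Open Group specification defines.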

That's all for this task.

#navi_footer|C言語系|