UNIX Process Control

The concept of a process

A process is a program in execution.

Each process has a unique identifier, a non-negative integer, which we refer to as the "process id".

Some process identifiers are reserved for special purposes.

0 for the scheduler (often called the swapper), a system process that is part of the kernel rather than a program on disk.

The scheduler decides which processes to run and which should wait.

1 for init, which the kernel invokes at the end of the bootstrap procedure.

	init brings the UNIX system to a working state, and may read the rc scripts in the /etc directory.
	The program file is /etc/init on older systems and /sbin/init on newer ones.
	Now let's take a look at what Linux does.

2 for the page daemon (on some virtual memory implementations).

	Sometimes called the pager.
	A kernel process that supports the paging of the virtual memory system.

Process information

getpid

Get the process id.

getppid

Get the parent process id.

getuid

Get the real user id of the calling process.

geteuid

Get the effective user id of a process.

getgid

Get the real group id of the calling process.

getegid

Get the effective group id of a process.

Now let's write a program to print out this information.

Process creation

fork function

Apart from the special processes the kernel creates at boot time, the fork function is the only way to create a new process in a UNIX environment.

The fork function is called once, but returns twice.

A child process is created by calling fork. The child is a copy of the parent; the difference that matters to the program is the return value of fork.

	In the parent, fork returns the process id of the child.
	In the child, fork returns 0.

Both processes continue execution with the statement after the fork call.

The two processes DO NOT share data -- the child gets its own copy of the parent's data, heap, and stack.

Now let's try the textbook example fork1.c.

	The variables are different copies.
	The process ids are different.
	When we run it with output to a terminal, "before fork" appears only once.
	When the output is redirected to a disk file, "before fork" appears twice. Standard output is line buffered on a terminal but fully buffered for a file, so the line is still in the standard I/O buffer when fork is called; the buffer is copied into the child, and both processes flush it when they exit.
	The output of write appears only once, since write is a system call and is not buffered.

Information sharing

Some information is shared between the parent and the child.

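	Open file descriptors. The child's descriptors share the parent's file table entries, including the file offsets, so the two processes can share an open file.
	User and group ids (real, effective, and supplementary).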
	Working directory.
	Environment.
	Resource limits.
	Refer to the table on page 192 for a complete list of entries that the child inherits from the parent.

Some data are different between parent and child.

	Process id.
	Return value from fork.
	Refer to the table on page 192 for a complete list of entries that the child does not inherit from the parent.

Purpose of process creation

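To create a duplicate of itself, so that the parent and the child execute different sections of code at the same time.

This is done by checking the return value of fork and placing different sections of code in each branch.

To run a different program.

	This is usually done by calling one of the exec functions in the child right after the fork.
	Some systems combine the fork and the exec into a single operation (sometimes called spawn) to improve efficiency.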

vfork function

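vfork is meant for creating a child that will immediately run a new program with exec. The child runs in the address space of the parent until it calls exec or _exit, so no memory copying is necessary.

The child is guaranteed to run first; the parent is suspended until the child calls exec or _exit.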

The textbook example vfork1.c

	Notice the values the parent prints out.
	What will happen if we replace the _exit with exit?

Process termination

exit function

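	To terminate a process with an exit code.
	Notice that exit is a library function (it runs exit handlers and flushes the standard I/O streams), while _exit is a system call that enters the kernel immediately.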

Normal termination

	The main program returns.
	The program calls exit.
	The program calls _exit.

Abnormal termination

	The program calls abort.
	The process is terminated by a signal.

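No matter how a process terminates, the same code in the kernel is eventually executed. It does the following.

	Closes all open file descriptors.
	Releases the memory that the process was using.
	Frees the rest of the process's resources, keeping just enough in the process table for the parent to fetch the termination status later.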

The exit code

	The exit code lets the child notify the parent about its execution status.
	The process reports an exit status; the kernel converts it, adding extra information, into the termination status.
	The parent retrieves the termination status with one of the wait functions.

Anomaly

When the parent terminates before the child, the init process becomes the parent of this orphan process.

When the child terminates before the parent has waited for it, the child becomes a zombie.

	A zombie is a dead entity, but not completely dead. :-)
	The kernel must keep sufficient information about the child in the process table so that later, when its parent wants to fetch its status, it is able to do so.
	The information a zombie keeps in the process table includes process id, termination status, and accounting information.
	One can use ps to find out the status of all processes, including zombies.

Process synchronization

wait families

wait

	A process can call wait to wait for the child process to complete.
	wait takes a pointer to an integer in which the termination status is stored.
	wait blocks if all of the children are still running, and returns -1 immediately if the process has no children.
	The return value is the process id of the child process.

waitpid

	A process can call waitpid to wait for a particular child process to complete.
	waitpid can be made non-blocking with the WNOHANG option.

Termination status

There is a set of macros to retrieve information from the termination status.

WIFEXITED

	true if the child terminated normally.
	WEXITSTATUS then gives the actual exit code (the low-order 8 bits of the value passed to exit or _exit).

WIFSIGNALED

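	true if the child terminated because of a signal that it did not catch.
	WTERMSIG gives the number of that signal.
	WCOREDUMP (not defined on all systems) is true if a core file was generated.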

WIFSTOPPED

	true if the child is currently stopped.
	WSTOPSIG gives the number of the signal that stopped it.

Now we try the textbook example wait1.c

Can we generate the coredump file?

Three cases are tested in this example.

	normal termination with exit.
	abnormal termination with abort.
	abnormal termination with arithmetic exception.

Another textbook example: fork2.c.

	The parent forks a first child, which forks a second child and exits right away. The parent waits for the first child, and the orphaned second child is adopted by init.
	This is useful when the parent does not want to wait for the child and we do not want the child to become a zombie either.

Race condition

Multiple processes running simultaneously can produce very strange errors.

If the correctness of a program depends on the order in which its processes run, then we have a race condition.

The race condition is difficult to debug since the error may not appear when we want it to.

The (proc/fork2.c) example has a race condition.

	We cannot guarantee that the first child exits before the second child runs.
	If the second child runs first, its parent is still the first child, so it has not yet been adopted by init.

Textbook example tellwait1.c

	Standard output is set to unbuffered, so every character is output by a separate write.
	The outputs from the parent and the child are interleaved.

Process synchronization

	To avoid race conditions, we need to synchronize the processes.
	tellwait1.c gives an example of process synchronization. We will discuss its implementation when we cover IPC and signals.

Exec family

The exec family runs a specific program, which replaces the text, data, heap, and stack of the calling process; the process id does not change.

In Linux only execve is a system call; all the others are library functions built on top of execve. See the figure on page 211 for details.

Here is a list of all the functions.

	execl
	execv
	execle
	execve
	execlp
	execvp

Here is the trick for remembering the names (page 209).

program filename

	no letter means the program is given as a pathname.
	p means the program is given as a filename and searched for in PATH.

argument passing

	l means the arguments are passed as argument list.
	v means the arguments are passed as an array of pointers (argv).

environment

	no letter means the environment is taken from the external environ variable.
	e means the environment is passed as an array of pointers (envp).

When the program is given as a filename (the p variants)

No slash found

Search for it in the directories listed in PATH.

Slash found

Treated as a pathname.

Try the textbook example exec1.c

	The echoall.c echoes all command line arguments.
	The main program runs the echoall program.

Interpreter file

An interpreter file is the input to an interpreter.

An interpreter file is a text file, not a binary executable.

An interpreter file starts with "#!", followed by the pathname of the interpreter, then by optional arguments.

When the interpreter is a shell (in most cases it is), the interpreter file is usually called a shell script -- namely a script that will be run by the shell.

Try the textbook example.

We write an interpreter file testinterp, which will execute the echoarg program.

Notice that the kernel passes the pathname of the interpreter file to the interpreter (in this case, echoarg), placing it after the optional argument from the "#!" line and before the remaining arguments of the exec call.

Now we compile and run exec2.c.

	The exec2 executes the testinterp, which executes the echoarg.
	Notice that the prompt disappeared!

Now try to use awk as the interpreter.

awk is a very useful script interpreter. The basic syntax is "awk -f file", where file is the name of the awk script.

Now we write an awk script to print the first word of every line.

Now we try the textbook awk script that prints all the arguments.

There are several files here.

	The interpreter program awk.
	The interpreter file awkexample.

When awkexample is invoked from a shell, the shell forks a child process, and the child executes the interpreter file.

The kernel sees the "#!" line and executes the interpreter (awk) instead, using the '-f' mechanism: awk receives the pathname of the interpreter file as an argument, along with the other command line arguments from the shell.

Notice that awk is given five arguments: the '-f' from the interpreter file, the pathname of the interpreter file (added by the kernel), and the last three arguments from the command line that invoked awkexample.

See page 220 for a complete illustration.

Reasons for using interpreter files.

	Hide the fact that a command is a script, not a binary executable.
	Efficiency gain.
	Write shell scripts for shells other than /bin/sh.

system function

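A simple way to use system facilities from a program, just like typing a command into a shell.

The return value tells whether the command was executed successfully: it is the termination status of the shell, as if the shell had called exit(127) if the shell could not be executed, or -1 if fork or waitpid failed.

system takes a command string and passes it to /bin/sh for execution. Now examine the following implementation (system.c).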

	First we create a new process by fork.
	Then we use execl to execute /bin/sh and ask it to run the command string for us. The command string is passed to /bin/sh by way of the -c option, which tells /bin/sh to take the command from the argument immediately after '-c'.
	Finally the parent (the caller of system) waits for sh to complete.
	Note that the child uses _exit instead of exit, so that it does not flush the standard I/O buffers copied from the parent.

Now we try the textbook example (systest1.c).

system and setuid programs

A setuid program should never use system, since its effective uid is carried into the child process and into the command that the shell runs.

Consider two files tsys and printuids.

	tsys is a program that uses system to run the command given to it.
	printuids is a program that prints the real and effective uids.

If tsys has its set-user-ID bit on, then printuids, run through system, also runs with the effective uid of the owner of tsys.

Process accounting

	The superuser can turn process accounting on, so that the kernel writes an accounting record to a file every time a process terminates.
	This information can be read back from the file with ordinary file I/O.

Process times

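The command time reports the real, user, and system time used by a command.

The system call times retrieves the user and system CPU times of a process and of its terminated children that have been waited for.

The function returns the wall clock time, in clock ticks since an arbitrary point in the past, each time it is called; only the difference between two calls is meaningful.

Now try the textbook example (times1.c).

The function pr_times computes the difference between two tms records and reports the values, converting clock ticks to seconds with sysconf(_SC_CLK_TCK).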