Files and Directories

 
Tar utility introduction
I assume that everyone understands the concepts of file and directory. We will examine a simple file utility called tar.
The tar utility saves a set of files into an archive, and can retrieve the files from this archive.
Let's first examine the format of tar.
A tar file consists of a series of blocks.
Each file has a header (which is exactly one block), followed by the contents of the file (which occupy zero or more blocks).
Each regular file has a header block, plus the contents of the file.
Each directory has only the header block.
The information obtained from stat is stored in the header block, including:
The name of the file
The user id and group id of the file
The last modification time
The access bits
A checksum
Here we take an example tar file, man.tar, and use a binary editor to look at it.
All numbers are written as octal digits in ASCII, so that the archive can be read on any machine.
The checksum lets the reader detect a corrupted header.
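To make the "octal written in ASCII" idea concrete, here is a minimal sketch of storing a number into a fixed-width header field and reading it back. The function names put_octal and get_octal are hypothetical and do not come from the lecture source code; the field width is whatever the header layout assigns to that field.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Store value as zero-padded octal ASCII digits in a field of `width`
 * bytes; the last byte holds the terminating '\0'.
 * Returns 0 on success, -1 if the value does not fit. */
int put_octal(char *field, size_t width, unsigned long value)
{
    int n = snprintf(field, width, "%0*lo", (int)(width - 1), value);

    return (n < 0 || (size_t)n >= width) ? -1 : 0;
}

/* Read an octal ASCII number back from a field of `width` bytes.
 * The field is copied first because it need not be '\0'-terminated. */
unsigned long get_octal(const char *field, size_t width)
{
    char tmp[32];

    if (width >= sizeof tmp)
        width = sizeof tmp - 1;
    memcpy(tmp, field, width);
    tmp[width] = '\0';
    return strtoul(tmp, NULL, 8);
}
```

For example, storing the access bits 0644 into an 8-byte field yields the seven characters 0000644 followed by '\0', which is what a binary editor shows in the header.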
A simple directory reading example to warm up
examples/myls.c
This program simply reads a directory and prints its contents.
Library routines used. They are similar to open, read, and close, but operate on directories, not regular files.
opendir
readdir
closedir
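The source of examples/myls.c is not reproduced here. As a rough sketch of how the three routines fit together (the function name list_directory is hypothetical), a directory can be read like this:

```c
#include <dirent.h>
#include <stdio.h>

/* Print the name of every entry in the directory `path`, one per line.
 * Returns the number of entries printed, or -1 if the directory
 * cannot be opened. */
int list_directory(const char *path)
{
    DIR *dirp = opendir(path);
    struct dirent *entry;
    int count = 0;

    if (dirp == NULL) {
        perror(path);
        return -1;
    }
    /* readdir returns NULL when there are no more entries. */
    while ((entry = readdir(dirp)) != NULL) {
        printf("%s\n", entry->d_name);
        count++;
    }
    closedir(dirp);
    return count;
}
```

Calling list_directory(".") prints the contents of the current directory, typically including the entries "." and "..".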
Tar implementation
source code
tar/tar/const.h
Define constants.
In UNIX we use '/' as pathname separator.
tar/tar/err.h
tar/tar/err.c
Error handling routines.
tar/tar/misc.h
tar/tar/misc.c
tar/tar/my_tar.c
Notice how the main program grabs the command-line arguments.
The usr_fatal function is similar to printf in that it can take an arbitrary number of arguments. We will explain the mechanism when we deal with variable-length argument lists.
Note the way we use command_name to give meaningful error messages.
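The real usr_fatal lives in err.c and is not shown here. The sketch below only illustrates the general printf-like mechanism with stdarg; the names format_error and fatal_error are hypothetical, and command_name is set to a fixed string here for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed for this sketch; a real program would set it from argv[0]. */
static const char *command_name = "my_tar";

/* Format "command_name: message" into buf, printf style.
 * Returns the length of the full message, or -1 on error. */
int format_error(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int used, n;

    used = snprintf(buf, size, "%s: ", command_name);
    if (used < 0 || (size_t)used >= size)
        return -1;
    va_start(args, format);              /* start walking the "..." */
    n = vsnprintf(buf + used, size - (size_t)used, format, args);
    va_end(args);
    return n < 0 ? -1 : used + n;
}

/* Print a printf-style message prefixed by the command name, then exit. */
void fatal_error(const char *format, ...)
{
    va_list args;

    fprintf(stderr, "%s: ", command_name);
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    fputc('\n', stderr);
    exit(EXIT_FAILURE);
}
```

A call such as fatal_error("cannot open %s", filename) then reports which command failed and why, before terminating the program.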
tar/tar/traverse.h
tar/tar/traverse.c
Use command_name to link to the name of the executable.
Use the function pointer to recursively call tar_it.
Use opendir to open this directory.
Use readdir on dirp to obtain the entries under this directory one at a time.
Close the directory with closedir when we are done.
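The steps above can be sketched as follows. This is not the code in traverse.c; the names traverse, visit_fn, and count_visit are hypothetical, and the callback stands in for tar_it.

```c
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* The action applied to every file and directory that is found. */
typedef void (*visit_fn)(const char *path, const struct stat *info);

/* Call visit on path and, if path is a directory, on everything below it.
 * Returns 0 on success, -1 if something could not be examined. */
int traverse(const char *path, visit_fn visit)
{
    struct stat info;
    struct dirent *entry;
    DIR *dirp;
    char child[PATH_MAX];
    int n, status = 0;

    if (lstat(path, &info) == -1)
        return -1;
    visit(path, &info);
    if (!S_ISDIR(info.st_mode))
        return 0;

    if ((dirp = opendir(path)) == NULL)
        return -1;
    while ((entry = readdir(dirp)) != NULL) {
        /* Skip "." and ".." to avoid infinite recursion. */
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        n = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof child) {
            status = -1;                 /* path name too long */
            continue;
        }
        if (traverse(child, visit) == -1)
            status = -1;
    }
    closedir(dirp);
    return status;
}

/* Example callback: print each path and count the visits. */
static int visited = 0;

static void count_visit(const char *path, const struct stat *info)
{
    (void)info;
    printf("%s\n", path);
    visited++;
}
```

Passing the action as a function pointer keeps the directory walk independent of what is done with each entry, so the same traversal could drive tar_it or a simple listing.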
tar/tar/tar.h
Note how we implement a header file that can safely be included multiple times.
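The usual technique is an include guard. The sketch below shows the pattern with hypothetical names (TAR_EXAMPLE_H, EXAMPLE_BLOCK_SIZE, struct example_block); the actual tar.h may use different names and contents.

```c
/* tar_example.h: a hypothetical header protected by an include guard. */
#ifndef TAR_EXAMPLE_H
#define TAR_EXAMPLE_H

#define EXAMPLE_BLOCK_SIZE 512      /* tar blocks are 512 bytes */

/* Without the guard, including this file twice would define the
 * structure twice, which is a compile error. */
struct example_block {
    char data[EXAMPLE_BLOCK_SIZE];
};

#endif /* TAR_EXAMPLE_H */
```

The first inclusion defines TAR_EXAMPLE_H, so every later inclusion in the same source file is skipped by the preprocessor.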
tar/tar/tar.c
Note that we implement the tar standard header.
tar_it
We deal with only regular files and directories.
We use fileno to obtain the file descriptor of the tar file, then use the file descriptor to obtain its stat information. This is only a precaution so that we do not tar the tar file itself.
Call build_header to construct the header.
If this is a directory then we are done; otherwise we read the file one block at a time and write it to the tar file.
Finally, end the tar file with a sufficient number of zero blocks.
build_header
For a directory, we append '/' to the name when it is stored in the header.
We use a series of sprintf calls to convert the numbers.
We deal with only regular files and directories.
Notice that we compute the checksum only after the checksum field has been filled with blanks, as the specification says.
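As an illustration of the checksum rule, the sketch below works on a raw 512-byte header block. It assumes the standard tar layout, in which the checksum field is 8 bytes long and starts at offset 148, and the common convention of storing six octal digits followed by a '\0' and a blank. The function names are hypothetical and this is not the code of build_header.

```c
#include <stdio.h>
#include <string.h>

enum { BLOCK_SIZE = 512, CHKSUM_OFFSET = 148, CHKSUM_LENGTH = 8 };

/* Sum all bytes of the header, counting the checksum field as blanks. */
unsigned int header_checksum(const unsigned char *header)
{
    unsigned int sum = 0;
    int i;

    for (i = 0; i < BLOCK_SIZE; i++) {
        if (i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH)
            sum += ' ';
        else
            sum += header[i];
    }
    return sum;
}

/* Fill in the checksum field after all other fields are in place. */
void store_checksum(unsigned char *header)
{
    char digits[CHKSUM_LENGTH];
    unsigned int sum;

    memset(header + CHKSUM_OFFSET, ' ', CHKSUM_LENGTH);
    sum = header_checksum(header);
    /* The largest possible sum, 512 * 255, fits in six octal digits. */
    snprintf(digits, sizeof digits, "%06o", sum);
    /* Copy six digits and the '\0'; the eighth byte stays a blank. */
    memcpy(header + CHKSUM_OFFSET, digits, 7);
}
```

Because header_checksum always counts the field as blanks, the same function can be used later to verify a header that already contains its checksum.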
fill_end_blocks
Nothing serious here. We just make sure that the total file length is a multiple of TarBlockSize. See the specification for details.
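The arithmetic involved is small. The following sketch assumes TarBlockSize is 512, which is the standard tar block size; the helper names padding_needed and write_zero_blocks are hypothetical and are not taken from fill_end_blocks.

```c
#include <stdio.h>
#include <string.h>

#define TarBlockSize 512    /* name from the notes; the value is assumed */

/* Number of zero bytes needed to round length up to the next
 * multiple of TarBlockSize. */
long padding_needed(long length)
{
    long remainder = length % TarBlockSize;

    return remainder == 0 ? 0 : TarBlockSize - remainder;
}

/* Write count whole blocks of zeros to out.
 * Returns 0 on success, -1 on a write error. */
int write_zero_blocks(FILE *out, int count)
{
    char block[TarBlockSize];

    memset(block, 0, sizeof block);
    while (count-- > 0) {
        if (fwrite(block, sizeof block, 1, out) != 1)
            return -1;
    }
    return 0;
}
```

A file of 513 bytes, for instance, needs 511 bytes of padding so that its contents occupy exactly two blocks.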
Library functions used
lstat
sprintf
memset
fileno
fstat
fopen
fread
fwrite
fclose
UnTar implementation
source code
tar/untar/const.h
tar/untar/err.h
tar/untar/err.c
tar/untar/misc.h
tar/untar/misc.c
tar/untar/tar.h
This time we formally put the header structure in a header file.
This is the recommended way, since tar and untar use the same structure.
tar/untar/untar.c
This is the main driver program. It simply calls untar to do all the work.
tar/untar/untar_it.h
We have three functions here. The function untar is the interface to the main driver program, untar_it extracts one file or directory from the archive, and is_zero_block tests whether a block consists of all zeros.
tar/untar/untar_it.c
untar
The program checks for zero blocks. If the next block is not all zeros, then it is the header that begins a file.
Once no more data is found, the program checks for the number of zero blocks that the specification requires.
untar_it
Process a file -- it could be a directory or a regular file.
First it uses a series of sscanf calls to parse the data in the header block.
The checksum is verified to ensure data integrity. Note that the checksum is first read from the header; the checksum field is then filled with blanks and the checksum is recomputed.
If this is a directory, we simply call mkdir to create it.
If this is a regular file, we create the file and fill it with the contents read from the tar file.
After the file is created, we restore its correct ownership, permission, and time.
ownership
Use chown to correct the user id and group id.
time information
Since the tar file keeps only the modification time, we also use it as the last access time.
permission bits.
We should restore the permission bits with chmod here, but I forgot to do it. :-)
is_zero_block
Simply check for all zeros for a buffer.
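A test of this kind can be as short as the sketch below. The signature is an assumption; the real is_zero_block in untar_it.c may take different parameters.

```c
#include <stddef.h>

/* Returns 1 if all size bytes of block are zero, 0 otherwise. */
int is_zero_block(const char *block, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++) {
        if (block[i] != 0)
            return 0;
    }
    return 1;
}
```

A valid header block can never be mistaken for a zero block, because at least its name field and its checksum field contain non-zero bytes.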
Library functions used
sscanf
mkdir
Create a new directory.
chown
Change the ownership of a file. We need to use this to restore the user and group id.
BSD allows only the superuser to do this, but System V allows any user to change the ownership of the files he owns. POSIX lets the system choose, and indicates the choice with the constant _POSIX_CHOWN_RESTRICTED, usually defined in a system header. If the restriction is in effect, then one can chown a file only if:
The caller is the superuser, or
The process owns the file (its effective user id equals the user id of the file), the user id of the file is not changed, and the new group id equals the effective group id of the process or one of its supplementary group ids.
utime
UNIX has three file times.
Time of last access.
We can use ls -u to check it.
Affected by, for example, read.
Time of modification.
The default time reported by ls.
Affected by write().
Time of last i-node change. (We will talk about this in the next lecture)
Check it with ls -c.
Affected by, for example, chmod.
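The utime function can set the access and modification times, and only if you are the superuser or you own the file. The i-node change time cannot be set directly; it is updated automatically when utime is called.
A struct utimbuf is used to set the times. If the pointer to it is NULL, the access and modification times are set to the current time. If the pointer is not NULL, they are set to the values in the buffer.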
chmod
Change the permission bits of a file.
The caller must be the superuser, or its effective user id must equal the user id of the file.
The sticky bit
On an executable regular file it historically meant that the program text would "stick" in the swap area after the program exited, so that it could be loaded into memory much faster the next time.
On a directory it means that removing files from that directory is restricted. One can remove a file there only if one has write permission for the directory and one of the following holds:
The user owns the file.
The user owns the directory.
The user is the superuser.
The restriction on deletion makes sure that no one deletes other users' files, especially in directories that everybody can write to (e.g. /tmp).