Building a Minimal Shell in C
This walkthrough builds a minimal shell in C, following Stephen Brennan's tutorial. We break the code down step by step and explain what each part does. By the end, you'll have a simple but functional shell that can run external programs and provides three built-in commands: cd, help, and exit.
The code for the shell described here is available on GitHub.
Everything lives in a single main.c file, which contains all the functions of a very basic shell, including the cd, help, and exit built-ins.
Run it
- Clone the repository
- Make sure you are in a Unix-like environment (any Linux distro will do)
- Compile it from a terminal:
gcc -o main ./src/main.c
- Then run it:
./main
Table of Contents
- Introduction
- Required Libraries and Headers
- Main Function and the Shell Loop
- Reading Input
- Parsing Input
- Executing Commands
- Built-in Commands
- Some clarifications
- Conclusion
1. Introduction
A shell is a command-line interpreter that allows users to interact with the operating system by executing commands. In this tutorial, we'll create a basic shell that can handle simple commands and demonstrate the fundamentals of shell programming.
2. Required Libraries and Headers
At the beginning of our code, we include several standard headers. Section 8 lists what each one provides:
#include <sys/wait.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
3. Main Function and the Shell Loop
Our main function initializes the shell loop, which continuously prompts for and executes commands until the user decides to exit.
int main(int argc, char **argv)
{
    // run the command loop until the user exits the shell
    lsh_loop();
    return 0;
}
You may wonder where lsh_loop comes from and what it really does. It handles the core loop of our shell:
void lsh_loop(void)
{
    char *line;
    char **args;
    int status;
    // each pass: prompt, read, split, execute
    do
    {
        printf("> ");                // print the prompt
        line = lsh_read_line();      // read a line of input
        args = lsh_split_line(line); // split the line into arguments
        status = lsh_execute(args);  // run the command; 0 means "exit"
        // the tokens in args point into line, so two frees are enough
        free(line);
        free(args);
    } while (status);
}
4. Reading Input
The lsh_read_line function reads one line of input from the user, one character at a time. It starts with a buffer of LSH_RL_BUFSIZE bytes and grows it by the same amount whenever the input fills it. At end of input (for example Ctrl-D) it exits the shell, because returning an empty line would make the loop prompt forever.
#define LSH_RL_BUFSIZE 1024
char *lsh_read_line(void)
{
    int bufsize = LSH_RL_BUFSIZE;
    int position = 0;
    char *buffer = malloc(sizeof(char) * bufsize);
    int c; // int, not char, so that EOF can be told apart from real characters
    if (!buffer)
    {
        fprintf(stderr, "lsh: allocation error\n");
        exit(EXIT_FAILURE);
    }
    while (1)
    {
        // read a character
        c = getchar();
        if (c == EOF)
        {
            // end of input: leave the shell instead of looping forever
            exit(EXIT_SUCCESS);
        }
        else if (c == '\n')
        {
            // end of line: terminate the string and return it
            buffer[position] = '\0';
            return buffer;
        }
        else
        {
            buffer[position] = c;
        }
        position++;
        // if we have filled the buffer, grow it
        if (position >= bufsize)
        {
            bufsize += LSH_RL_BUFSIZE;
            buffer = realloc(buffer, bufsize);
            if (!buffer)
            {
                fprintf(stderr, "lsh: allocation error\n");
                exit(EXIT_FAILURE);
            }
        }
    }
}
// Note: C libraries that implement POSIX.1-2008 (glibc, for example) provide a getline() function in stdio.h that does most of the work we just implemented
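As a sketch of what the note above describes, here is a hypothetical getline()-based variant. The name lsh_read_line_getline and the stream parameter are assumptions made for illustration (the shell itself always reads from stdin). Unlike the hand-written version, it returns NULL at end of input, so a caller would need to check for that.

```c
#define _POSIX_C_SOURCE 200809L /* asks <stdio.h> to declare getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper, not part of the original shell.
 * Reads one line from `stream` and returns it without the trailing
 * newline. The caller frees the result. Returns NULL on EOF or error. */
char *lsh_read_line_getline(FILE *stream)
{
    char *line = NULL;  /* getline() allocates the buffer when this is NULL */
    size_t bufsize = 0; /* ...and the reported capacity is 0 */
    ssize_t len = getline(&line, &bufsize, stream);

    if (len == -1)
    {
        /* nothing was read: release the buffer and report "no line" */
        free(line);
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
    {
        /* drop the newline so the result matches lsh_read_line */
        line[len - 1] = '\0';
    }
    return line;
}
```

In lsh_loop this could be called as lsh_read_line_getline(stdin), treating a NULL result as a request to leave the loop.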
5. Parsing Input
The lsh_split_line function breaks the input line into tokens (words): the command and its arguments. It splits on whitespace only, so quoting and backslash escapes are not supported.
#define LSH_TOK_BUFSIZE 64
#define LSH_TOK_DELIM " \t\r\n\a"
char **lsh_split_line(char *line)
{
    int bufsize = LSH_TOK_BUFSIZE, position = 0;
    char **tokens = malloc(bufsize * sizeof(char *));
    char *token;
    if (!tokens)
    {
        fprintf(stderr, "lsh: allocation error\n");
        exit(EXIT_FAILURE);
    }
    // strtok writes '\0' over each delimiter and returns pointers into line
    token = strtok(line, LSH_TOK_DELIM);
    while (token != NULL)
    {
        tokens[position] = token;
        position++;
        // grow the pointer array when it is full
        if (position >= bufsize)
        {
            bufsize += LSH_TOK_BUFSIZE;
            tokens = realloc(tokens, bufsize * sizeof(char *));
            if (!tokens)
            {
                fprintf(stderr, "lsh: allocation error\n");
                exit(EXIT_FAILURE);
            }
        }
        token = strtok(NULL, LSH_TOK_DELIM);
    }
    // NULL-terminate the list, as execvp expects
    tokens[position] = NULL;
    return tokens;
}
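To make the behaviour of strtok more concrete, here is a small sketch that splits a line into a fixed-size array. The names demo_split, DEMO_DELIM, and DEMO_MAX_TOKENS are hypothetical and exist only for this illustration. It assumes the caller passes a writable buffer, because strtok modifies the string in place.

```c
#include <string.h>

#define DEMO_DELIM " \t\r\n\a"
#define DEMO_MAX_TOKENS 8

/* Hypothetical helper: splits `line` in place, stores at most max - 1
 * token pointers in `out`, NULL-terminates the list, and returns the
 * number of tokens stored. `max` must be at least 1. */
int demo_split(char *line, char **out, int max)
{
    int n = 0;
    char *token = strtok(line, DEMO_DELIM);

    while (token != NULL && n < max - 1)
    {
        out[n++] = token; /* points into `line`, nothing is copied */
        token = strtok(NULL, DEMO_DELIM);
    }
    out[n] = NULL;
    return n;
}
```

Because every token points into the original buffer, freeing the line and the pointer array is enough; the individual tokens must not be freed.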
6. Executing Commands
The lsh_execute function checks if the command is a built-in function or an external command and executes it accordingly.
int lsh_execute(char **args)
{
    int i;
    if (args[0] == NULL)
    {
        // an empty command was entered
        return 1;
    }
    for (i = 0; i < lsh_num_builtins(); i++)
    {
        if (strcmp(args[0], builtin_str[i]) == 0)
        {
            return (*builtin_func[i])(args);
        }
    }
    return lsh_launch(args);
}
The lsh_launch function is responsible for launching external commands using fork and execvp.
int lsh_launch(char **args)
{
    // pid_t is the type used for process IDs on Unix-like systems
    pid_t pid, wpid;
    int status;
    // create a child process; fork returns in both parent and child
    pid = fork();
    if (pid == 0)
    {
        // child process: replace this process image with the command
        if (execvp(args[0], args) == -1)
        {
            perror("lsh");
        }
        // only reached if execvp failed
        exit(EXIT_FAILURE);
    }
    else if (pid < 0)
    {
        // error forking
        perror("lsh");
    }
    else
    {
        // parent process: wait until the child exits or is killed by a signal
        do
        {
            wpid = waitpid(pid, &status, WUNTRACED);
            if (wpid == -1)
            {
                // status is not valid if waitpid itself failed
                perror("lsh");
                break;
            }
        } while (!WIFEXITED(status) && !WIFSIGNALED(status));
    }
    return 1;
}
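lsh_launch ignores how the child ended. As a hedged sketch of the same fork, execvp, and waitpid pattern, the hypothetical function demo_run below also reports the child's exit code. The name, the return convention, and the use of _exit(127) for a failed exec are assumptions of this example, not part of the original shell.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs args[0] with the given arguments and
 * returns its exit code, or -1 if it could not be started, waited
 * for, or was terminated by a signal. */
int demo_run(char **args)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
    {
        perror("fork");
        return -1;
    }
    if (pid == 0)
    {
        /* child: execvp only returns if it failed */
        execvp(args[0], args);
        perror(args[0]);
        _exit(127); /* _exit avoids flushing stdio buffers copied from the parent */
    }
    /* parent: block until this particular child terminates */
    if (waitpid(pid, &status, 0) == -1)
    {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status))
    {
        return WEXITSTATUS(status); /* the child's own exit code */
    }
    return -1; /* terminated by a signal */
}
```

A shell could keep this value around to implement something like the $? variable of larger shells.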
7. Built-in Commands
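We define three built-in commands: cd, help, and exit. They run inside the shell process itself rather than in a child. This matters for cd and exit in particular: a chdir made in a child process would not change the shell's own working directory, and a child cannot end its parent's loop.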
// function declarations for built-in shell commands
// a forward declaration declares (but doesn't define) something, so that you can use its name before you define it
int lsh_cd(char **args);
int lsh_help(char **args);
int lsh_exit(char **args);
// list of built-in command names; the order must match builtin_func below
char *builtin_str[] = {
    "cd",
    "help",
    "exit"};
// built-in commands can be added simply by modifying these two arrays
int (*builtin_func[])(char **) = {
    &lsh_cd,
    &lsh_help,
    &lsh_exit};
int lsh_num_builtins(void)
{
    return sizeof(builtin_str) / sizeof(char *);
}
// built-in functions implementation
// each returns 1 to keep the shell running, or 0 to stop it
int lsh_cd(char **args)
{
    if (args[1] == NULL)
    {
        fprintf(stderr, "lsh: expected argument to \"cd\"\n");
    }
    else
    {
        if (chdir(args[1]) != 0)
        {
            perror("lsh");
        }
    }
    return 1;
}
int lsh_help(char **args)
{
    int i;
    printf("Stephen Brennan's LSH\n");
    printf("Type program names and arguments, and hit enter.\n");
    printf("The following are built in:\n");
    for (i = 0; i < lsh_num_builtins(); i++)
    {
        printf(" %s\n", builtin_str[i]);
    }
    printf("Use the man command for information on other programs.\n");
    return 1;
}
int lsh_exit(char **args)
{
    // returning 0 ends the do-while loop in lsh_loop
    return 0;
}
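To illustrate how the two parallel arrays drive dispatch, here is a self-contained sketch with a single hypothetical built-in, pwd. All demo_ names, the 4096-byte buffer, and the -1 "not a built-in" return value are assumptions made for this example. In the real shell you would instead append "pwd" to builtin_str and a matching function pointer to builtin_func.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int demo_pwd(char **args);

/* the name at index i is handled by the function at index i */
char *demo_builtin_str[] = {"pwd"};
int (*demo_builtin_func[])(char **) = {&demo_pwd};

int demo_num_builtins(void)
{
    return sizeof(demo_builtin_str) / sizeof(char *);
}

/* Hypothetical built-in: prints the current working directory. */
int demo_pwd(char **args)
{
    char cwd[4096]; /* assumed to be large enough for this sketch */

    (void)args; /* unused */
    if (getcwd(cwd, sizeof(cwd)) != NULL)
    {
        printf("%s\n", cwd);
    }
    else
    {
        perror("lsh");
    }
    return 1; /* keep the shell running */
}

/* Looks up args[0] in the table and calls the matching function.
 * Returns -1 when the command is not a built-in. */
int demo_dispatch(char **args)
{
    int i;

    for (i = 0; i < demo_num_builtins(); i++)
    {
        if (strcmp(args[0], demo_builtin_str[i]) == 0)
        {
            return (*demo_builtin_func[i])(args);
        }
    }
    return -1;
}
```

Keeping names and functions in separate arrays is simple, but the two must stay in the same order; an array of structs pairing each name with its function would be one way to remove that risk.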
8. Some clarifications
Some functions and types may not be clear at first glance. Here is what each header provides:
#include <sys/wait.h>
- waitpid() and its associated macros (WUNTRACED, WIFEXITED, WIFSIGNALED): make the parent process wait for the child process and check how it ended
#include <unistd.h>
- chdir(): change the current working directory
- fork(): create a new process. It returns twice: once in the parent process and once in the child process
- execvp(): one of the exec() family of functions; it replaces the current process image (here, the child's) with a new program, searching PATH for it
- pid_t: integer type used on Unix-like operating systems (including Linux) to represent process IDs; it is defined in <sys/types.h>, which the code also includes
#include <stdlib.h>
- malloc(): allocate memory
- realloc(): resize a previously allocated block of memory
- free(): release allocated memory
- exit(): terminate the process
- EXIT_SUCCESS, EXIT_FAILURE: status values passed to exit() to report success or failure
#include <stdio.h>
- fprintf(): print to a stream of your choice, such as a file or stderr, for purposes such as logging, error reporting, or output redirection
- printf(): print to standard output
- stderr: the standard error output stream
- getchar(): read one character from standard input
- perror(): print a message describing the last error (errno) to stderr
#include <string.h>
- strcmp(): compare two strings
- strtok(): split the input line into tokens based on the specified delimiters
Some macros:
#define LSH_TOK_BUFSIZE 64
This defines a macro LSH_TOK_BUFSIZE with a value of 64. It sets the initial capacity of the token array, counted in tokens (pointers) rather than bytes, and also the amount by which the array grows each time it fills up.
#define LSH_TOK_DELIM " \t\r\n\a"
This defines a macro LSH_TOK_DELIM with a string value " \t\r\n\a". This string contains delimiters used to split the input line into tokens. The delimiters include:
- Space (' ')
- Tab ('\t')
- Carriage return ('\r')
- Newline ('\n')
- Alert/bell ('\a')
Some function parameters may not be obvious at first glance, for example:
int main(int argc, char **argv)
{
    // code...
}
Here the ** in char **argv indicates that argv is a pointer to a pointer to char. Let's break down what this means in detail:
- char *: a pointer to a character, often used to represent a string in C.
- char **: a pointer to a pointer to char. In the context of argv, it represents an array of strings (an array of pointers to characters).
The args arrays passed around inside the shell have the same shape and end with a NULL pointer, which is the form execvp expects.
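As a small illustration of the char ** shape, the sketch below walks a NULL-terminated array of strings, which is the form both argv and the shell's args take. The function names demo_count_args and demo_print_args are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: counts the strings in a NULL-terminated array. */
int demo_count_args(char **args)
{
    int n = 0;

    while (args[n] != NULL) /* args[n] is a char *, i.e. one string */
    {
        n++;
    }
    return n;
}

/* Hypothetical helper: prints each string with its index. */
void demo_print_args(char **args)
{
    int i;

    for (i = 0; args[i] != NULL; i++)
    {
        /* args[i][0] would be the first character of string i */
        printf("args[%d] = %s\n", i, args[i]);
    }
}
```

Calling demo_print_args(argv) from main would list the program name followed by its command-line arguments.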
9. Conclusion
Here, we have built a minimal shell in C. This shell can read input, parse it, and execute commands, including some basic built-in commands. This is a great starting point for understanding how shells work and can be expanded with more features and commands.
By following this step-by-step guide, you should now have a basic shell that you can customize and build upon. Happy coding!