Building a Minimal Shell in C

Here we will walk through the process of building a minimal shell in C, following Stephen Brennan's tutorial. We'll break down the code step by step, explaining the purpose of each part. By the end, you'll have a simple but functional shell that can run external programs and handle three built-in commands: cd, help, and exit.

The code for the shell described here is available on GitHub.

It is a single main.c file containing all the functions of a very basic shell, with three built-in commands: cd, help, and exit.

Run it

Clone the repository.
Make sure you are in a Unix-like environment (any Linux distribution will do).
Compile from the repository root: gcc -o main ./src/main.c
Then start the shell: ./main

Introduction
Required Libraries and Headers
Main Function and the Shell Loop
Reading Input
Parsing Input
Executing Commands
Built-in Commands
Some clarifications
Conclusion

1. Introduction

A shell is a command-line interpreter that allows users to interact with the operating system by executing commands. In this tutorial, we'll create a basic shell that can handle simple commands and demonstrate the fundamentals of shell programming.

2. Required Libraries and Headers

At the beginning of our code, we include several POSIX and standard C library headers. What each one provides is summarized in the "Some clarifications" section below:

#include <sys/wait.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

3. Main Function and the Shell Loop

Our main function just starts the shell loop, which continuously prompts for and executes commands until the user decides to exit.

int main(int argc, char **argv)
{
    // run the prompt/read/execute loop until a command tells the shell to exit
    lsh_loop();
    return 0;
}

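You may wonder where lsh_loop comes from and what it does. It is the core of the shell: it prints a prompt, reads a line, splits it into arguments, and executes them, repeating for as long as lsh_execute returns a non-zero status: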

void lsh_loop(void)
{
    char *line;
    char **args;
    int status;

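    // read, parse and execute lines until a command returns 0
    do
    {
        printf("> ");                // print the prompt
        fflush(stdout);              // the prompt has no '\n', so flush it explicitly
        line = lsh_read_line();      // read one line of input
        args = lsh_split_line(line); // split it into arguments
        status = lsh_execute(args);  // run it; 0 means "exit the shell"

        // the tokens in args point into line, so both are freed only now
        free(line);
        free(args);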

    } while (status);
}
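The status value is what keeps this do/while loop going. As a minimal sketch of that convention without reading from a terminal, the code below feeds a fixed list of commands through the same loop shape. fake_execute and run_script are hypothetical names for illustration only; fake_execute just mimics lsh_execute by returning 0 for exit and 1 for everything else.

```c
#include <string.h>

/* Hypothetical stand-in for lsh_execute: 0 for "exit", 1 for anything else. */
static int fake_execute(const char *cmd)
{
    return strcmp(cmd, "exit") != 0;
}

/* Feeds cmds[0..n-1] through the same do/while pattern as lsh_loop and
   returns how many commands ran before the loop stopped. */
int run_script(const char *cmds[], int n)
{
    int i = 0;
    int status;

    if (n <= 0)
        return 0;

    do
    {
        status = fake_execute(cmds[i]);
        i++;
    } while (status && i < n); /* stop on status 0 or when input runs out */

    return i;
}
```

Because the condition is tested after the body, the loop always runs at least once, just as lsh_loop always prints the first prompt.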

4. Reading Input

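The lsh_read_line function reads one line from standard input, one character at a time with getchar(), until it sees a newline or end of file. It starts with a buffer of LSH_RL_BUFSIZE bytes and grows it with realloc() whenever the line does not fit, so input of any length is accepted.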

#define LSH_RL_BUFSIZE 1024
char *lsh_read_line(void)
{
    int bufsize = LSH_RL_BUFSIZE;
    int position = 0;
    char *buffer = malloc(sizeof(char) * bufsize);
    int c;

    if (!buffer)
    {
        fprintf(stderr, "lsh: allocation error\n");
        exit(EXIT_FAILURE);
    }

    while (1)
    {
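        // read a character (c is an int so that EOF can be told apart from any char)
        c = getchar();

        // end of input before anything was typed (e.g. Ctrl-D at the prompt): leave the shell
        if (c == EOF && position == 0)
        {
            free(buffer);
            exit(EXIT_SUCCESS);
        }

        // at a newline or EOF, terminate the string and return it
        if (c == EOF || c == '\n')
        {
            buffer[position] = '\0';
            return buffer;
        }
        else
        {
            buffer[position] = c;
        }
        position++;

        // if the buffer is full, grow it by another LSH_RL_BUFSIZE bytes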
        if (position >= bufsize)
        {
            bufsize += LSH_RL_BUFSIZE;
            buffer = realloc(buffer, bufsize);
            if (!buffer)
            {
                fprintf(stderr, "lsh: allocation error\n");
                exit(EXIT_FAILURE);
            }
        }
    }
}
// Note: POSIX.1-2008 systems also provide getline() in <stdio.h>, which does most of the work we just implemented
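For comparison, here is a sketch of the same job done with getline(). It assumes a POSIX.1-2008 system, and the function name lsh_read_line_getline and its FILE * parameter are hypothetical (the tutorial's lsh_read_line takes no arguments and reads stdin); taking a stream just makes it easy to try on a file. Unlike lsh_read_line, it returns NULL at end of input so the caller can decide to exit.

```c
#define _POSIX_C_SOURCE 200809L /* expose getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical getline()-based reader. Returns a malloc'd line without its
   trailing "\n" or "\r\n", or NULL at end of input. The caller frees it. */
char *lsh_read_line_getline(FILE *in)
{
    char *line = NULL;
    size_t cap = 0; /* getline() allocates and grows the buffer itself */
    ssize_t len = getline(&line, &cap, in);

    if (len == -1)
    {
        free(line); /* the caller must free the buffer even on failure */
        if (feof(in))
            return NULL; /* end of input */
        perror("lsh: getline");
        exit(EXIT_FAILURE);
    }

    line[strcspn(line, "\r\n")] = '\0'; /* cut off the line ending */
    return line;
}
```

In lsh_loop this would be called as lsh_read_line_getline(stdin), with a NULL result ending the loop.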

5. Parsing Input

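The lsh_split_line function breaks the input line into tokens (words), which are the command and its arguments. It uses strtok(), which works in place: it writes '\0' over the delimiter that ends each token and returns pointers into line, so the tokens stay valid only as long as line does. Quoting and backslash escapes are not supported, so an argument cannot contain whitespace.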

#define LSH_TOK_BUFSIZE 64
#define LSH_TOK_DELIM " \t\r\n\a"
char **lsh_split_line(char *line)
{
    int bufsize = LSH_TOK_BUFSIZE, position = 0;
    char **tokens = malloc(bufsize * sizeof(char *));
    char *token;

    if (!tokens)
    {
        fprintf(stderr, "lsh: allocation error\n");
        exit(EXIT_FAILURE);
    }

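    // the first call gets the first token and replaces the delimiter after it with '\0'
    token = strtok(line, LSH_TOK_DELIM);
    while (token != NULL)
    {
        // store a pointer into line; nothing is copied
        tokens[position] = token;
        position++;

        // grow the array when it is full
        if (position >= bufsize)
        {
            bufsize += LSH_TOK_BUFSIZE;
            tokens = realloc(tokens, bufsize * sizeof(char *));
            if (!tokens)
            {
                fprintf(stderr, "lsh: allocation error\n");
                exit(EXIT_FAILURE);
            }
        }
        // passing NULL continues tokenizing the same line
        token = strtok(NULL, LSH_TOK_DELIM);
    }
    // NULL-terminate the list, as execvp() requires
    tokens[position] = NULL;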
    return tokens;
}
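Because strtok() works in place, the tokens are not copies. The hypothetical helper split_fixed below (not part of the tutorial) uses a fixed-size array instead of realloc so the effect is easy to see: the first token starts at the very beginning of the input buffer, and the delimiter that ended it has been overwritten with '\0'. This is why lsh_loop must not free line until lsh_execute is done with args.

```c
#include <string.h>

#define DEMO_DELIM " \t\r\n\a" /* same characters as LSH_TOK_DELIM */

/* Hypothetical fixed-capacity version of lsh_split_line.
   Stores at most max - 1 tokens (max must be at least 1), then a NULL.
   The stored pointers point into `line`, which strtok() modifies. */
int split_fixed(char *line, char **out, int max)
{
    int n = 0;
    char *tok = strtok(line, DEMO_DELIM);

    while (tok != NULL && n < max - 1) /* keep one slot for the NULL */
    {
        out[n++] = tok;
        tok = strtok(NULL, DEMO_DELIM);
    }
    out[n] = NULL; /* execvp() expects a NULL-terminated argument list */
    return n;
}
```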

6. Executing Commands

The lsh_execute function checks whether args[0] names a built-in command or an external program and runs it accordingly. Its return value is the status that drives lsh_loop: 1 keeps the shell running, 0 makes it exit.

int lsh_execute(char **args)
{
    int i;

    if (args[0] == NULL)
    {
        // an empty command was entered
        return 1;
    }

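    // the built-in tables (see Built-in Commands below) must be declared above this function in main.c
    for (i = 0; i < lsh_num_builtins(); i++)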
    {
        if (strcmp(args[0], builtin_str[i]) == 0)
        {
            return (*builtin_func[i])(args);
        }
    }
    return lsh_launch(args);
}
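lsh_execute relies on builtin_str and builtin_func listing their entries in the same order. A common alternative, sketched here as an assumption rather than the tutorial's design, is a single table of {name, handler} pairs so a name can never drift out of sync with its function. The names builtin_fn, struct builtin, find_builtin, demo_help and demo_exit are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Every built-in has the same signature as lsh_cd, lsh_help and lsh_exit. */
typedef int (*builtin_fn)(char **args);

/* One table of {name, handler} pairs instead of two parallel arrays. */
struct builtin
{
    const char *name;
    builtin_fn func;
};

/* Stand-in handlers for the example; the real ones are lsh_help and lsh_exit. */
static int demo_help(char **args) { (void)args; return 1; } /* keep running */
static int demo_exit(char **args) { (void)args; return 0; } /* stop the loop */

static const struct builtin builtins[] = {
    {"help", demo_help},
    {"exit", demo_exit},
};

/* Returns the handler for args[0], or NULL if it is not a built-in,
   in which case lsh_execute would fall through to lsh_launch. */
builtin_fn find_builtin(char **args)
{
    size_t n = sizeof(builtins) / sizeof(builtins[0]);
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(args[0], builtins[i].name) == 0)
            return builtins[i].func;
    }
    return NULL;
}
```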

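The lsh_launch function runs external commands. fork() creates a child process; the child replaces itself with the requested program using execvp(), while the parent waits for the child to finish using waitpid().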

int lsh_launch(char **args)
{
    // pid_t is the type used for process IDs on Unix-like systems
    pid_t pid, wpid;
    int status;

    // duplicate this process: fork() returns 0 in the new child process,
    // the child's PID in the parent, or -1 if no child could be created
    pid = fork();
    if (pid == 0)
    {
        // child process
        if (execvp(args[0], args) == -1)
        {
            perror("lsh");
        }
        exit(EXIT_FAILURE);
    }
    else if (pid < 0)
    {
        // error forking
        perror("lsh");
    }
    else
    {
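        // parent process: wait until the child exits or is killed by a signal
        // (WUNTRACED also reports stopped children, so keep waiting in that case)
        do
        {
            wpid = waitpid(pid, &status, WUNTRACED);
            if (wpid == -1)
            {
                perror("lsh");
                break;
            }
        } while (!WIFEXITED(status) && !WIFSIGNALED(status));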
    }
    return 1;
}
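lsh_launch always returns 1 and discards the child's exit status. As a sketch of how that status could be recovered (an extension, not something the tutorial's shell does), launch_status below is a hypothetical variant that returns the child's exit code via WEXITSTATUS. It waits with options 0 instead of WUNTRACED, so waitpid() only returns once the child has terminated, and the child calls _exit() after a failed execvp() so it does not flush stdio buffers it inherited from the parent.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical variant of lsh_launch: returns the child's exit code,
   or -1 if fork/waitpid fails or the child was killed by a signal. */
int launch_status(char **args)
{
    pid_t pid = fork();

    if (pid == 0)
    {
        execvp(args[0], args); /* returns only if it failed */
        perror("lsh");
        _exit(127); /* conventional "command not found" status */
    }
    if (pid < 0)
    {
        perror("lsh");
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
    {
        perror("lsh");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, launch_status((char *[]){"sh", "-c", "exit 3", NULL}) returns 3, and a command that is not on PATH returns 127, the status POSIX shells conventionally use for "command not found".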

7. Built-in Commands

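We define three built-in commands: cd, help, and exit, each backed by its own function. They run inside the shell instead of being launched as separate programs because some of them must change the shell process itself: if cd ran in a child process, chdir() would change only the child's working directory, and exiting a child would not end the shell.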
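The reason cd has to be a built-in can be checked directly. The sketch below uses a hypothetical helper, child_chdir_is_isolated (not part of the shell), that lets a forked child chdir() into another directory and then confirms the parent's working directory has not moved. The 4096-byte path buffers are an assumption that is large enough for typical paths.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child process successfully changes into
   `target` while the parent's working directory stays the same, else 0. */
int child_chdir_is_isolated(const char *target)
{
    char before[4096], after[4096];
    int status;

    if (getcwd(before, sizeof before) == NULL)
        return 0;

    pid_t pid = fork();
    if (pid == 0)
    {
        /* child: this chdir() affects only the child's own working directory */
        _exit(chdir(target) == 0 ? 0 : 1);
    }
    if (pid < 0)
        return 0;

    /* parent: make sure the child really did change directory */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 0;

    if (getcwd(after, sizeof after) == NULL)
        return 0;
    return strcmp(before, after) == 0; /* the parent is still where it was */
}
```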

// function declarations for built-in shell commands
// forward declaration is when you declare (but don’t define) something, so that you can use its name before you define it
int lsh_cd(char **args);
int lsh_help(char **args);
int lsh_exit(char **args);

// list of built-in commands followed by their corresponding functions
char *builtin_str[] = {
    "cd",
    "help",
    "exit"};

// built-in commands can be added simply by modifying these arrays
int (*builtin_func[])(char **) = {
    &lsh_cd,
    &lsh_help,
    &lsh_exit};

int lsh_num_builtins(void)
{
    return sizeof(builtin_str) / sizeof(builtin_str[0]);
}

// built-in functions implementation
int lsh_cd(char **args)
{
    if (args[1] == NULL)
    {
        fprintf(stderr, "lsh: expected argument to \"cd\"\n");
    }
    else
    {
        if (chdir(args[1]) != 0)
        {
            perror("lsh");
        }
    }
    return 1;
}

int lsh_help(char **args)
{
    int i;
    printf("Stephen Brennan's LSH\n");
    printf("Type program names and arguments, and hit enter.\n");
    printf("The following are built in:\n");

    for (i = 0; i < lsh_num_builtins(); i++)
    {
        printf(" %s\n", builtin_str[i]);
    }

    printf("Use the man command for information on other programs.\n");
    return 1;
}

int lsh_exit(char **args)
{
    return 0; // 0 tells lsh_loop to stop, which ends the shell
}
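As the comment on builtin_func says, new built-ins only need an entry in both arrays. As a hypothetical example (pwd is not one of the tutorial's built-ins, and most systems already ship an external pwd program), lsh_pwd prints the current working directory with getcwd(). To wire it in, "pwd" would be appended to builtin_str, &lsh_pwd to builtin_func at the same position, and a prototype added next to the others. The 4096-byte buffer is an assumption.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical extra built-in with the same int (char **) signature. */
int lsh_pwd(char **args)
{
    char cwd[4096];

    (void)args; /* this sketch ignores any arguments */
    if (getcwd(cwd, sizeof cwd) == NULL)
        perror("lsh");
    else
        printf("%s\n", cwd);
    return 1; /* keep the shell running */
}
```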

8. Some clarifications

Some functions, macros, and types used above may not be clear at first glance. Here they are, grouped by the header that declares them:

`#include <sys/wait.h>`

waitpid() and its associated macros (WUNTRACED, WIFEXITED, WIFSIGNALED) let the parent process wait for a child process to change state

`#include <unistd.h>`

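chdir() change the current working directory
fork() create a new (child) process. It returns twice: 0 in the child, and the child's PID in the parent (or -1 if no child could be created)
execvp() one of the exec family of functions: replace the current process image with a new program, looking up the program name in PATH
pid_t data type used on Unix-like operating systems (including Linux) to represent process IDs; it is defined in <sys/types.h>, and <unistd.h> makes it available too

`#include <stdlib.h>`

malloc() allocate memory
realloc() resize a previously allocated block of memory
free() release memory allocated with malloc() or realloc()
exit() terminate the process with a given status
EXIT_SUCCESS, EXIT_FAILURE macros for the status passed to exit() to signal success or failure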

`#include <stdio.h>`

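fprintf() write formatted output to a chosen stream: a file, stderr, or any other FILE *, e.g. for logging, error reporting, or output redirection
printf() write formatted output to standard output
stderr the standard error stream
getchar() read the next character from standard input; it returns an int so that it can also return EOF
perror() print an error message to stderr, followed by a description of the current errno value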

`#include <string.h>`

strcmp() compare two strings
strtok() tokenize the input line based on the specified delimiters

Some macros:

`#define LSH_TOK_BUFSIZE 64`

This defines a macro LSH_TOK_BUFSIZE with a value of 64. It sets the initial number of slots in the token array, and it is also the amount the array grows by each time it fills up.

`#define LSH_TOK_DELIM " \t\r\n\a"`

This defines a macro LSH_TOK_DELIM with a string value " \t\r\n\a". This string contains delimiters used to split the input line into tokens. The delimiters include:

Space (' ')
Tab ('\t')
Carriage return ('\r')
Newline ('\n')
Alert/bell ('\a')
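A quick way to see how these delimiters behave is to count tokens. count_tokens is a hypothetical helper using the same delimiter string: runs of several delimiters produce no empty tokens, and quotes get no special treatment, so echo "hello world" becomes three tokens rather than two.

```c
#include <string.h>

/* Same characters as LSH_TOK_DELIM: space, tab, carriage return, newline, bell. */
#define DELIMS " \t\r\n\a"

/* Hypothetical helper: counts the tokens strtok() finds in `line`.
   Like lsh_split_line, it modifies `line` in place. */
int count_tokens(char *line)
{
    int n = 0;
    for (char *t = strtok(line, DELIMS); t != NULL; t = strtok(NULL, DELIMS))
        n++;
    return n;
}
```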

Some function parameters may not be very clear at first, for example:

int main(int argc, char **argv)
{
    //code...
}

Here the ** in char **argv indicates that argv is a pointer to a pointer to char. Let's break down what this means in detail:

char *: A pointer to a character, commonly used in C to represent a null-terminated string.
char **: A pointer to a pointer to char. For argv, it points to the first element of an array of strings (an array of char * pointers), and argv[argc] is always a null pointer. The args array built by lsh_split_line has the same NULL-terminated layout, which is what execvp() expects.
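Since both argv and the shell's args array end with a NULL entry, code can walk them without knowing their length in advance. count_args is a hypothetical helper that does exactly that; for main's argv it returns the same value as argc.

```c
#include <stddef.h>

/* Hypothetical helper: counts the strings in a NULL-terminated array,
   the layout shared by main's argv and the args from lsh_split_line. */
int count_args(char **argv)
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Inside main, count_args(argv) == argc holds, because the C standard guarantees that argv[argc] is a null pointer.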

9. Conclusion

Here, we have built a minimal shell in C. This shell can read input, parse it, and execute commands, including some basic built-in commands. This is a great starting point for understanding how shells work and can be expanded with more features and commands.

By following this step-by-step guide, you should now have a basic shell that you can customize and build upon. Happy coding!
