C Argument Parsing

GNU Getopt Reference

Argument parsing in C can sound like an intimidating task. C is a formidable language for beginning and novice programmers, and extensions beyond the core language can be hard to wrap your head around. Fortunately, the POSIX getopt() function and the GNU getopt_long() extension provide a relatively easy way to implement argument parsing.

This article describes how to handle the common variations of argument input using the C programming language. It is accompanied by fully documented GPLv3 demonstration code that you can grab from GitHub:

git clone git://github.com/retrospectivelyobvious/argparse-all-over.git

Handling Short Form Options

Short form options are those that are represented by a single character (e.g. ‘-c’). Arguments in C are most easily handled by using the ‘getopt’ function. To use getopt(), you need a basic program structure that looks like the following:

#include <unistd.h> //needed for getopt()
 
int main(int argc, char** argv){
    /* (1) Setup Code */
    int arg; // holds the value returned by each getopt() call
    /* (2) The getopt() Call */
    while( (arg = getopt(argc, argv, "abc:")) != -1 ){
        switch(arg){
            /* (3) Argument Processing */
        }
    }
    /* (4) Extra Argument Handling */
 
    /* main program logic here */
 
    return 0;
}

FULL EXAMPLE

getopt()’s Guts

Let’s walk through this scaffold one piece at a time. The getopt() function is designed to have the program’s argc/argv pair passed directly to it, along with a simple specification string identifying which arguments to recognize. You call the function in a loop, and with each successive iteration it recognizes one of the arguments from the command line. The recognized argument is returned by the function and processed with a simple switch statement that usually saves the information off to some data structure for later usage.

We’ll start by looking at getopt()’s function signature and arguments:

int getopt(int argc, char** argv, const char* options)
Type | Name | Purpose
int | (return val) | The option character recognized by getopt(), ‘?’ if an error occurs, or -1 once there are no more options to process
int | argc | The number of input arguments; this is usually passed through from main()’s argc
char** | argv | The argument vector for getopt() to parse; this is usually passed through from main()’s argv
const char* | options | Dictates which characters getopt() will recognize as options, and whether or not they take parameters

The ‘options’ String

The return value and the first two arguments are simple enough to handle. The third argument, the options string, requires a little decoding. The options string is a concatenation of the switches that the program will need to accept, with some decorators to indicate behavior. The two distinguishing features of the option string are a series of characters, and the use of a colon. Each character in the options string indicates that the given character can be passed as an option to the program. The inclusion of a colon after the character within the string indicates that the particular option takes a textual argument as well. This is most easily illustrated by example.

Examples:

Program Call | Appropriate Options String
prog -c -a -q somefile.txt | "acq:"
prog -a 12M -b outputfile -c | "a:b:c"
prog -a -b gofast -c | "ab:c"

Other getopt() I/O

In addition to arguments passed directly to the function, there is a set of global variables that influence getopt()’s behavior. These variables are declared in the ‘unistd.h’ header which you’ll need to include in order to use getopt().

Direction | Type | Name | Function
Input | int | opterr | If non-zero (the default), getopt() prints an error message when it encounters an unrecognized option or a missing parameter. If 0, getopt() prints nothing in these cases.
Output | int | optopt | When getopt() encounters an unknown option, or an option that is missing its required parameter, it returns ‘?’ and stores the offending option character in optopt.
Output | int | optind | Index into argv of the next argument to be processed. Once getopt() has finished processing options, this is the index of the first non-option argument.
Output | char* | optarg | If the option takes a parameter (-o somefile.txt), optarg points to the parameter string (somefile.txt).

Supporting Structure

There’s a minimal bit of scaffolding we need to set up around getopt() in order to make it go. We’ll walk through a larger example piece by piece here. You can grab the source for all of these examples from GitHub.

First, variables for storing information returned from getopt() are established. This includes setting opterr, which indicates to getopt() how we want it to handle unrecognized options. If opterr’s value is non-zero (default), getopt() will print an error message when it encounters an unrecognized option. If the value is zero, on the other hand, getopt() will not print an error message when it encounters unrecognized options.

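#include <stdio.h>  //needed for fprintf()/printf()
#include <stdlib.h> //needed for exit()
#include <unistd.h> //needed for getopt()

int main (int argc, char **argv){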
 
    int a = 0, b = 0;
    char *cstr = NULL;
    int arg;
    opterr = 0;

Next is the main getopt() processing loop. Each time we call getopt(), it processes a single option from the argv list. getopt() compares that option against the list of possible options passed to it in the options string, and returns either (1) the recognized character or (2) the ‘?’ character. Once no options remain it returns -1, which ends the loop. Using the return value from the function, the program can determine which options were passed.

    while ((arg = getopt (argc, argv, "abc:")) != -1){
        switch (arg){
            case 'a':
                // Detected the 'a' option, mark a flag to indicate
                a = 1;
                break;
            case 'b':
                // Detected the 'b' option, mark a flag to indicate
                b = 1;
                break;
            case 'c':
                // Detected the 'c' option, save the location of its
                // argument (which getopt populates in the 'optarg' global).
                cstr = optarg;
                break;
            case '?':
                // Encountered an unknown or improperly formatted flag,
                // deal with these appropriately.
                //
                // getopt() populates the flag character into the 'optopt' 
                // global.
                if (optopt == 'c'){
                    fprintf(stderr, "-c requires an argument.\n");
                }else{
                    fprintf(stderr, "Unknown flag '%c'.\n", optopt);
                }
                //Fall-through here is intentional 
            default:
                //Hit some unexpected case - quit.
                exit(EXIT_FAILURE);
        }
    }

The remainder of the program deals with accounting for unhandled arguments. These are arguments that were passed to the program, but were not connected to any of the specified option flags. It is important to know that after each call, getopt() sets the optind variable to the index (into argv) of the next argument to be parsed. Once getopt() has finished parsing, optind therefore points to the first non-option argument.

    // cstr stays NULL when -c was not given; passing NULL to %s is undefined
    fprintf(stdout, "a = %d\nb = %d\ncstr = %s\n", a, b, cstr ? cstr : "(none)");
 
    for (int i = optind; i < argc; i++){
        printf("Additional argument %s\n", argv[i]);
    }
 
    return EXIT_SUCCESS;
}

A full, compilable version of this sample program is available on GitHub.

Handling Long Form Options

Long form options (e.g. --long-option) are not much harder to process than their short form cousins. The main difference is that we use a slightly different function, getopt_long(). We’ll also need to set up some data structures to communicate the available options to getopt_long(). Calling the function and processing the return codes are nearly identical to getopt().

We’ll start again with the function signature:

int getopt_long(int argc, char** argv, const char* shortopts, const struct option* longopts, int* indexptr)
Type | Name | Purpose
int | (return val) | The ‘val’ of a recognized long option, the character of a recognized short option, ‘?’ if an error occurs, or -1 once there are no more options to process
int | argc | The number of input arguments; this is usually passed through from main()’s argc
char** | argv | The argument vector for getopt_long() to parse; this is usually passed through from main()’s argv
const char* | shortopts | Indicates which short options will be recognized; it follows the same conventions as the options string of getopt()
const struct option* | longopts | Indicates which long options will be recognized. The format of the structure is detailed below.
int* | indexptr | Points to an int in which getopt_long() stores the index of the matched entry in the longopts array. It may be NULL if you do not need this information.

Let’s walk through another example:

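#include <stdio.h>  //needed for fprintf()/printf()
#include <stdlib.h> //needed for exit()
#include <getopt.h> //needed for getopt_long() and struct option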
int main (int argc, char **argv){
    int a = 0, b = 0, e = 0;
    char *cstr = NULL, *dstr = NULL;
    int arg;
    int index = 0;
 
    opterr = 0;
 
    struct option opts[] = /* const char*   int                int*    int 
                              name          has_arg            flag    val    */
                         {  { "alpha",      no_argument,       NULL,   'a' },
                            { "bravo",      no_argument,       NULL,   'b' },
                            { "charlie",    required_argument, NULL,   'c' },
                            { "delta",      required_argument, NULL,   'd' },
                            { "echo",       no_argument,       NULL,   'e' },
                            { NULL,         0,                 NULL,    0  }
                         };

Note first that we require an additional header, getopt.h, because getopt_long() is a GNU extension rather than part of POSIX or the C standard library. Everything else is the same as in the previous example, except that we’ve added a few more carrier variables for option information and have inserted the data structure which describes the options. Let’s go through the structure’s fields:

Type | Member Name | Purpose
const char* | name | The option name string, without the leading dashes
int | has_arg | A flag indicating if the option takes an argument. These flags are predefined in the header: ‘no_argument’ indicates that the option does not take an argument, while ‘required_argument’ indicates that one must be provided. A third value, ‘optional_argument’, also exists.
int* | flag | This is safe to ignore for now: set it to NULL and check out the manual if you’re interested in some extra fancy features
int | val | The value that getopt_long() returns when it recognizes this option (as long as flag is NULL)

Note that the array of structures needs to be terminated by an all-zero entry (NULL name), as shown in the example.

The processing for getopt_long() is nearly identical to getopt(). First notice that the function can process both short and long options, so it takes the ‘shortopts’ string, which is identical to getopt()’s options field. For the long options, note that the defining structure includes a ‘val’ field, which indicates the character to be returned when the option is recognized. The ‘val’ codes can overlap those from the short options string, allowing for short and long versions of the same option. When they do overlap, make sure both versions agree on whether the option takes an argument.

The index argument acts as another output for the function. By providing it the address of a holding variable, the function can indicate which entry in the long options structure it matched. It will write the index of that entry within the longopts structure in the ‘index’ variable. Be aware that if a long option is not found (i.e. if a short option is found, or no option at all is recognized), the index field will not be changed/reset, so a value left over from an earlier match may still be there.

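    // getopt_long() leaves 'index' untouched when it matches a short
    // option, so keep it at a sentinel value between calls.
    index = -1;
    while ((arg = getopt_long(argc, argv, "abc:d:", opts, &index)) != -1){
        switch (arg){
            case 'a':
                a = 1;
                break;
            case 'b':
                b = 1;
                break;
            case 'c':
                cstr = optarg;
                break;
            case 'd':
                // "d:" in the short options string matches --delta's
                // required_argument, so -d and --delta behave alike.
                if (index >= 0){
                    fprintf(stdout, "Got long argument: %s.\n", opts[index].name);
                }
                dstr = optarg;
                break;
            case 'e':
                // 'e' is not in the short options string, so only --echo
                // reaches this case and 'index' is always valid here.
                fprintf(stdout, "Got long argument: %s.\n", opts[index].name);
                e = 1;
                break;
            case '?':
                if (optopt == 'c'){
                    fprintf(stderr, "-c/--charlie requires an argument.\n");
                }else if (optopt == 'd'){
                    fprintf(stderr, "-d/--delta requires an argument.\n");
                }else if (optopt != 0){
                    fprintf(stderr, "Unknown flag '%c'.\n", optopt);
                }else{
                    // glibc reports an unrecognized long option with
                    // optopt set to 0; show the offending word instead.
                    fprintf(stderr, "Unknown option '%s'.\n", argv[optind - 1]);
                }
                //Fall-through here is intentional 
            default:
                //Hit some unexpected case - quit.
                exit(EXIT_FAILURE);
        }
        index = -1; // reset the sentinel before the next call
    }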

The remainder of the program is no different from the short version:

    /* Display the values of the found arguments */
    // cstr/dstr stay NULL when their options were not given
    fprintf(stdout, "a = %d\nb = %d\ncstr = %s\ndstr = %s\ne = %d\n",
            a, b, cstr ? cstr : "(none)", dstr ? dstr : "(none)", e);
 
    // Print out any remaining arguments unhandled by getopt() 
    for (int i = optind; i < argc; i++){
        printf("Additional argument %s\n", argv[i]);
    }
 
    return EXIT_SUCCESS;
}
 
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US).
