Getting Interactive Input in C


In C, input and output are modelled via files. There is a file for the input stream called stdin, and another one called stdout for the output stream. The standard I/O library stdio provides a multitude of functions for accessing these streams, reading data from them or writing data to them.

However, there are a couple of pitfalls in using these streams correctly, especially if one wants to write a fairly robust program. On this page, I'll discuss some of them.

Contents

  1. Reading an Integer
  2. Getting Rid of Type-Ahead Input
  3. Prompting
  4. Safely Reading a double

Reading an Integer

We'll use this simple problem to demonstrate the problems in connection with getting input from a user. Of course, everything said here applies equally well to getting other input - a sequence of characters (e.g. a file name), or floating point values, or anything else.

Using scanf to get input

This is the simple way of getting nearly any input from the user. It's convenient, especially for quick'n'dirty hacks, and most C programmers "know" it - they've learned about printf, and scanf is sooo similar...

Don't be fooled.

Let's look at some differences between printf and scanf first.

scanf needs pointers as arguments

Simple as it may seem, this is a very common error beginners make. They've seen printf used like this:
int user_id;
printf ("User number: %d\n", user_id);
and now they think that they can use scanf in the same way to read input:
scanf ("%d", user_id); /* This is wrong! */
Of course, this doesn't work. scanf expects pointers to the variables into which it should store the values read:
scanf ("%d", &user_id);
Handling illegal input

The above statement will work - as long as the user does enter a number. But what happens if s/he does not? Suppose we want to read in several numbers in a loop:
#define N   10

int numbers[N];
int i;

for (i = 0; i < N; i++) {
  scanf ("%d", &numbers[i]);
}
What happens if the user enters an invalid input in the first scanf? Suppose the user entered
hello<return>
instead of a number. This doesn't match the format of a number, so scanf fails and does not set numbers[i]. It also leaves hello<return> in the input stream! Therefore, the second time round, the same thing happens! And so on, until we've run 10 times through the loop. Not a single number was read!
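
To make the "stuck" input visible, here is a small sketch. The function name count_successful_reads is my own invention for this illustration, and it reads from an arbitrary FILE * rather than stdin only so that it is easy to try out:

```c
#include <stdio.h>

/* Tries to read one int from 'f' up to 'attempts' times without ever
   discarding bad input. Returns how many attempts succeeded. */
int count_successful_reads (FILE *f, int attempts)
{
  int value, ok = 0, i;

  for (i = 0; i < attempts; i++) {
    /* fscanf returns the number of items converted: 1 on success,
       0 on a matching failure, EOF at end of input. */
    if (fscanf (f, "%d", &value) == 1) ok++;
  }
  return ok;
}
```

If the stream starts with hello, every attempt fails, and the next character read from the stream afterwards is still the 'h'.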

What can we do? We could check scanf's return value, which tells us how many items have been read. But we'd still be left with the problem of getting rid of that erroneous input. We might try something like this:
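int items_read;

i = 0;
while (i < N) {
  items_read = scanf ("%d", &numbers[i]);
  if (items_read == 1) {
    i++;
  } else if (items_read == EOF) {
    /**** End of input or read error: give up. */
    break;
  } else {
    /**** Erroneous input, get rid of it and retry! */
    scanf ("%*[^\n]");
  }
}
Right, this works. Note that we test for items_read == 1: scanf returns EOF (a negative value) at end of input, and a plain if (items_read) would mistake that for success. The format string "%*[^\n]" means: read any character as long as it's not a newline ('\n'), and then just discard the characters read (the '*' in the format specifier). But the user still might enter the following: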
1 2 3 4 5 6 7 8 9 10<return>
although we had wanted him or her to enter only one number at a time. The two cases are indistinguishable for our little input loop. And if the input is
1 2 3 4 5 hello 7 8 9 10<return>
things get even hairier.

Because of these and related problems, even halfway serious applications usually read a whole line of input into a buffer and only then parse it using other techniques. This approach will be discussed next.

Reading Into a Buffer

This is a very general technique for handling input. Instead of dealing with formatted input directly, we simply read in a whole line, which we then analyze. Reading in a line is simple:

#define BUFFER_LENGTH   40

char line[BUFFER_LENGTH];
char *p;

p = gets (line);  /* Uh-oh. Don't panic. See below. */
if (p == NULL) {
  /* Some error occurred, or we reached end-of-file. No input was read. */
} else {
  /* Parse the line. */
}

That's the general approach. But what happens if the user enters BUFFER_LENGTH or more characters? The gets() function cannot guard against this case - it doesn't know how long the array pointed to by its argument is. Therefore it will simply write beyond line's boundary, thus corrupting memory. (For exactly this reason, gets was removed from the language in the 2011 revision of the C standard.)

Luckily, the workaround for this problem is simple: we just have to use fgets instead of gets. fgets has one big advantage: we can tell it how long our buffer is! We just change the first line to

p = fgets (line, BUFFER_LENGTH, stdin);

And voilà, we have safe input! fgets reads at most BUFFER_LENGTH-1 characters, and appends a NUL character as string terminator. There is a small problem though: fgets also reads the <return> the user typed to terminate his or her input! Quite often this is not what is wanted. So how can we get rid of a possible '\n' at the end of line?

The Newline Character

There are several ways to lose that newline character. The simplest one is this:

char *get_line (char *s, size_t n, FILE *f)
{
  char *p = fgets (s, n, f);

  if (p != NULL) {
    s[strlen(s)-1] = '\0';  /* This is wrong! */
  }
  return p;
}

And that's wrong! The string read by fgets does not necessarily contain a '\n'! If the input is longer than n-1 characters, the last character before the terminating NUL character is some character other than '\n', and most probably we don't want to lose it!

The correct way is therefore the following:

char *get_line (char *s, size_t n, FILE *f)
{
  char *p = fgets (s, n, f);

  if (p != NULL) {
    size_t len = strlen (s);

    /* Check len first: strlen (s) - 1 would wrap around for an empty string. */
    if (len > 0 && s[len-1] == '\n') s[len-1] = '\0';
  }
  return p;
}

Now this may seem trivial to you, but it's a surprisingly common error!
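
To see why the check matters, here is a little sketch of how fgets hands out a line that is longer than the buffer. The deliberately tiny 5-byte buffer and the name chunks_for_one_line are just assumptions for this illustration:

```c
#include <stdio.h>
#include <string.h>

/* Counts how many fgets calls with a 5-byte buffer are needed to
   consume one complete line (up to and including its newline). */
int chunks_for_one_line (FILE *f)
{
  char chunk[5];
  int  calls = 0;

  while (fgets (chunk, sizeof chunk, f) != NULL) {
    calls++;
    if (strchr (chunk, '\n') != NULL) break;  /* end of line reached */
  }
  return calls;
}
```

For the input line abcdefgh<return>, this takes three calls: the chunks are "abcd", "efgh" and "\n". Only the last chunk ends in a newline; blindly chopping off the last character of the first two would lose the 'd' and the 'h'.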

Don't Be Too Tricky!

There is a quite common "trick" to get rid of a possible trailing newline, illustrated by the following implementation of my get_line function:

char *get_line (char *s, size_t n, FILE *f)
{
  char *p = fgets (s, n, f);

  if (p != NULL) strtok (s, "\n");
  return p;
}
(Note that the second argument of strtok really is a string, not a character! It's not a typo!)

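If you study the strtok function, you'll see that this overwrites a trailing '\n' with a NUL character if there is a newline, and does nothing if there isn't one. There is one exception, though: if the line is empty, i.e. the string is just "\n", strtok skips the leading delimiter, finds no token at all, and leaves the newline in place.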

But this is dangerous ground! For strtok is not reentrant, and thus calling get_line in a loop may have the doggonest effects if that loop somehow also uses strtok. Consider the following example:

We implement a command loop. There may be multiple commands on a line, we'll treat them one after the other. Some commands may require more user interaction. (Note: this is rather poor interface design, but unfortunately I can't come up with a more convincing example right now. Would you just believe me that similar situations can easily occur in real life? Thank you.)

#define CMD_LENGTH  100
#define NAME_LENGTH  50

char cmd[CMD_LENGTH], name[NAME_LENGTH];
char *curr_cmd;

while (get_line (cmd, CMD_LENGTH, stdin)) {
  /* Now parse the command line. Commands are separated by semicolons. */
  curr_cmd = strtok (cmd, ";");                       /* (1) */
  while (curr_cmd != NULL) {
    if (strcmp (curr_cmd, "new_client") == 0) {
      printf ("Enter the name of the new client: ");
      fflush (stdout);
      if (get_line (name, NAME_LENGTH, stdin)) {      /* (2) */
         ...
      }
      ...
    } else if (...) {
      ...
    }
    ...
    curr_cmd = strtok (NULL, ";");  /* Find next command (3) */
  }
  ...
}

This won't work because strtok is not reentrant. The calls at the points marked (1) and (3) belong together - they are both supposed to operate on cmd. However, the intervening call to strtok hidden in get_line at point (2) scrambles the data structures set up at (1), with the effect that the call at (3) erroneously operates on name!
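
The effect can be reproduced without any user interaction. In the following sketch, the function count_commands and its parameters are made up for the demonstration; the inner strtok call stands in for the one hidden in get_line:

```c
#include <stdio.h>
#include <string.h>

/* Counts the ';'-separated commands in 'cmd'. If 'other' is not NULL,
   an unrelated strtok call on it is made inside the loop, just as the
   strtok-based get_line would do. Both buffers are modified. */
int count_commands (char *cmd, char *other)
{
  int   count = 0;
  char *curr = strtok (cmd, ";");

  while (curr != NULL) {
    count++;
    if (other != NULL) strtok (other, "\n");  /* clobbers strtok's state */
    curr = strtok (NULL, ";");                /* now continues in 'other' */
  }
  return count;
}
```

With cmd holding "new_client;list;quit", the function counts three commands if other is NULL, but only one if other holds "Bob\n": after the inner call, strtok has forgotten all about cmd.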

Because of such hidden effects, which can prove disastrous, I usually avoid strtok, and in particular the strtok (str, "\n") trick. The function is simply poorly designed and is not suitable for real applications.
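
If you like one-liners, there is an alternative that keeps no hidden state: strcspn. This is not the method used elsewhere on this page, just a sketch of a variant of get_line:

```c
#include <stdio.h>
#include <string.h>

char *get_line (char *s, size_t n, FILE *f)
{
  char *p = fgets (s, (int) n, f);

  /* strcspn returns the index of the first '\n', or the string length
     if there is none, so this assignment is always within the string. */
  if (p != NULL) s[strcspn (s, "\n")] = '\0';
  return p;
}
```

Unlike the strtok trick, this also handles an empty line (just "\n") correctly, and it can be called from anywhere without disturbing anybody else.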

Skipping the Rest of the Line

In one of the examples above, we've already seen how to skip a line in the input. However, there are a few pitfalls to avoid when doing this in a general setting.

The basic method of skipping input is to use the '*' to indicate assignment suppression:

scanf ("%*[^\n]");

The format specification [^\n] matches any sequence of characters up to a newline. The '*' tells scanf to skip the matching input without assigning it to any variable (consequently, there is no corresponding variable in the argument list of scanf).

But this doesn't read the newline itself! We have to read it explicitly ourselves. I'll show you first a wrong way to attempt to do this:

scanf ("%*[^\n]%*c"); /* This is wrong! */

Why is this wrong? Well, what happens if the rest of the line contains only a newline character? In that case, there's no matching input for the [^\n] format specifier, and therefore scanf fails before it even processes the %c specification! This means that the newline is still left in the input...

The common way to solve this problem is simply to use two calls. Either we use

scanf ("%*[^\n]"); scanf ("%*c");

or

scanf ("%*[^\n]"); (void) getchar ();
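
A third possibility, which avoids scanf altogether, is a plain character loop. The following is only a sketch; the name skip_rest_of_line is my own:

```c
#include <stdio.h>

/* Discards everything up to and including the next newline in 'f'.
   Returns the last value read: '\n', or EOF if the stream ended first. */
int skip_rest_of_line (FILE *f)
{
  int c;

  do {
    c = getc (f);
  } while (c != '\n' && c != EOF);
  return c;
}
```

Because the newline is consumed by the same loop, the case of a line consisting only of a newline needs no special treatment here.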

Putting all this together, we can now design a function which safely reads as much as it can into a fixed-size buffer and then discards the rest of that line of the input:

char *read_line (char *buf, size_t length, FILE *f)
  /**** Read at most 'length'-1 characters from the file 'f' into
        'buf' and zero-terminate this character sequence. If the
        line contains more characters, discard the rest.
   */
{
  char *p;

  if ((p = fgets (buf, (int) length, f)) != NULL) {
    size_t len = strlen (buf);

    if (len > 0 && buf[len-1] == '\n') {
      /**** Discard the trailing newline */
      buf[len-1] = '\0';
    } else {
      /**** There's no newline in the buffer, therefore there must be
            more characters on that line: discard them!
       */
      fscanf (f, "%*[^\n]");
      /**** And also discard the newline... */
      (void) fgetc (f);
    } /* end if */
  } /* end if */
  return p;
} /* end read_line */
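
Once the line is safely in a buffer, we still have to parse it. For the integer we set out to read, one possibility is strtol. The following is a sketch under my own assumptions: the name parse_int is invented, and I have decided that the whole string must be a number, with nothing after it.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts the whole string 's' to an int. Returns 1 on success and 0
   if 's' is empty, has trailing garbage, or is out of range for int. */
int parse_int (const char *s, int *result)
{
  char *end;
  long  value;

  errno = 0;
  value = strtol (s, &end, 10);
  if (end == s || *end != '\0') return 0;   /* no digits, or junk after them */
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return 0;
  *result = (int) value;
  return 1;
}
```

Combined with read_line, this gives a loop that asks again as long as parse_int returns 0. The 1 2 3 4 5 hello case from above is then simply rejected as one bad line.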

Getting Rid of Type-Ahead Input

This is an eternal debate. How do you get rid of type-ahead input, i.e. input the user typed before your program even prompted him or her to do so?

There is no solution to this problem in standard C. It's not possible. In particular, the following solutions are no good:

fflush (stdin)
Forget about that immediately. fflush is defined only on output streams, and stdin is an input stream. The effects of flushing an input stream are undefined - anything or nothing at all might happen.

rewind (stdin)
This also is a hack that you should forget about right now. First of all, it would be the wrong paradigm: rewinding a stream means to go back to its beginning, so if anything at all should happen, it would mean to un-read all input read so far, so that subsequent reads re-read the stream from the beginning. Obviously, that's not what's wanted.

But there's a more compelling reason not to try it: the effects simply are undefined. rewind(f) is equal to (void)fseek (f, 0L, SEEK_SET), and fseek is defined only on files that can support positioning requests. The standard explicitly says [ISO 7.9.3, 1st paragraph]:
"If a file can support positioning requests (such as a disk file, as opposed to a terminal), ..."
So it won't work. Get used to it. (And don't tell me "but it works on my machine". It's not C.)

Simply read the type-ahead input
This looks like a good idea, but it suffers from a major problem: how do you recognize when you're done reading?? Even if you could tell, how would you know that there weren't any characters left in some obscure buffers in the operating system?

Making stdin unbuffered
No, I don't think you really mean this. You'd have to handle all kinds of control characters yourself (e.g. deleting input when the user hits the backspace key). Also, there's no standard way of making stdin unbuffered. True, there is the setvbuf function, but all the standard says about unbuffered input is [ISO 7.9.3, 3rd paragraph]:
"When a stream is unbuffered, characters are intended to appear from the source or to the destination as soon as possible."
Now, what does "as soon as possible" mean? While you might turn off buffering in the C library using setvbuf, there's no standard way to tell e.g. your terminal not to buffer characters. And quite often terminals do buffer input. Of course, there might be system specific routines that let you tell your terminal not to buffer input, but this hasn't got anything to do with standard C.

So, usually you'll have to resort to ways specific to your operating system to get truly unbuffered input - if this is at all possible! And if it is, you'll get a whole load of other problems with it.
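
Just to show what such a system specific way can look like: on POSIX systems, the terminal interface offers tcflush. Note well that this is not standard C, that it only works if the descriptor really refers to a terminal, and that it knows nothing about characters that have already made it into the buffer of the C library. The wrapper name discard_terminal_input is made up for this sketch:

```c
#include <termios.h>
#include <unistd.h>

/* POSIX only, not standard C. Asks the terminal driver to drop input
   it has received but the program has not yet read. Returns 0 on
   success, -1 on failure (e.g. if 'fd' is not a terminal). */
int discard_terminal_input (int fd)
{
  return tcflush (fd, TCIFLUSH);
}
```

A program would call discard_terminal_input (STDIN_FILENO) right before prompting, and must be prepared for the call to fail, e.g. when its input is redirected from a file or a pipe.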

Prompting

When your program expects input from the user, it usually will not just sit there and wait for him or her to type something. Normally, it'll tell the user that s/he is expected to input something. The program will print a "prompt" on the screen (or more correctly: to stdout).

Now if stdout is buffered (which is the normal case), there's no guarantee whatsoever that this prompt appears immediately. This can be particularly annoying when one wants to print some kind of progress indicator (e.g. a dot for each kilobyte of data that has been processed).

Normally, prompting causes no problems: most implementations try to do the right thing and flush the buffer of stdout before a call to an input function on stdin. But this is not guaranteed, and it doesn't help in the progress indicator case. The solution is simple, but apparently many novices don't know it: flush stdout's buffers "by hand" by calling fflush whenever you really want your output to appear, just like in the following example:

char buf[80], *p;

printf ("Enter your name, please: ");
fflush (stdout);
p = fgets (buf, 80, stdin);
...

The same also is true for a progress indicator:

while (things_to_do) {
   do_a_bit_of_work ();
   putc ('.', stdout);
   fflush (stdout);
}

Without the fflush, the dots would accumulate in the buffer of stdout and only be written once that buffer was full. This wouldn't give a good impression of progress, would it?
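
Since it is easy to forget the fflush, one might wrap prompt and read into a single function. Here's a minimal sketch; the name prompt_line and the explicit stream parameters are assumptions of mine, chosen so that the function can be used with streams other than stdin and stdout:

```c
#include <stdio.h>

/* Writes 'msg' to 'out', flushes it so it is visible, then reads one
   line from 'in' into 'buf'. Returns 'buf', or NULL on EOF or error. */
char *prompt_line (const char *msg, char *buf, int size, FILE *in, FILE *out)
{
  fputs (msg, out);
  if (fflush (out) == EOF) return NULL;
  return fgets (buf, size, in);
}
```

The example above then becomes p = prompt_line ("Enter your name, please: ", buf, 80, stdin, stdout);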


Safely Reading a double

Now let's put all this together to see if and how we can safely read a double. We've seen most of the techniques above, so this is mainly a matter of integration - though there are some tricky details, as you'll see.

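Our goal is to write a function that prompts the user, reads a whole line of input, converts it to a double, reports any error together with the position at which it was found, and asks again until the input is valid or the input stream ends.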

We'll assume to have the following declarations available:

#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strlen, strerror */

#define BUF_SIZE   42

#define SUCCESS     1
#define FAILURE     0

static char buf[BUF_SIZE];

int error_msg (const char *format, ...)
{
  va_list args;
  int     res;

  va_start (args, format);
  res = vfprintf (stderr, format, args);
  va_end (args);
  return res;
}

Given the above list of features, it is quite clear that our function must look more or less like this:

int read_double (double *d)
  /**** Returns SUCCESS or FAILURE, if FAILURE, the value of '*d' is
        undefined. */
{
  size_t  length;

  for (;;) {
    /**** Prompt */
    printf ("Please enter a floating point number: ");
    fflush (stdout);
    /**** Read into buffer */
    if (!fgets (buf, sizeof (buf), stdin)) return FAILURE;
    /**** Remove trailing newline, if any */
    length = strlen (buf); /* (1) */
    if (buf[length-1] == '\n') buf[--length] = '\0';
    /**** Convert the value, setting error indicators. */
    ...
    if (/* no error */ ...) break;
    /**** Print error message */
    ...
  } /* end for */
  return SUCCESS;
} /* end read_double */

Now all that's left is to fill in the three empty places marked above. Ah yes, and there's a remark to make (position marked by the number in the comment):

  1. length is guaranteed to be >= 1 here. Why? Well, even if the user tries to enter an empty string, we actually have the newline in the buffer, i.e., strcmp (buf, "\n") == 0. There is no way for the user to enter something that'll leave us with no characters at all in the string unless s/he signals EOF on the input stream. (This can be done by typing Ctrl-D on some terminals.) But if we reach EOF without having read at least one little character, fgets returns NULL, and thus we leave the whole loop.

One way to convert the buffer's contents is sscanf. We can use the return value to find out whether a conversion actually happened, i.e. if the string started with a floating point number. However, we also want to catch an input of "67o" (instead of "670") as an error, but the "%lf" format specifier would just convert the "67" into a double. To find out which part of the buffer actually was converted, we'll also have to use the "%n" specifier. This specifier says that the corresponding argument is a pointer to an int into which the number of characters consumed so far will be stored. If all characters of the string in buf have been consumed, it's ok; otherwise we have an error.

That gives us the following:

  ...
  {
    int n_items, n_chars;

    /**** Convert the value, setting error indicators. */
    n_items = sscanf (buf, "%lf%n", d, &n_chars); /* (1) */
    /**** Break the loop if no error. */
    if ((n_items == 1) && ((size_t) n_chars == length)) break;
    /**** Print an error message. */
    if (n_items != 1) n_chars = 0; /* (2) */
    error_msg ("Illegal input: there's an error at the position "
               "indicated below:\n");
    error_msg ("    %s\n", buf);
    error_msg ("    %*c\n", n_chars+1, '^'); /* (3) */
  }
  ...
} /* end for */

Note three points (marked above by numbers in comments):

  1. We pass d, not &d, as argument to sscanf. d already is a pointer to a double, so there's no need to take its address. We don't want to pass a pointer to a pointer to double!
  2. If no conversion at all was done, sscanf terminated before it even processed the "%n" directive, so n_chars is still undefined. This happens if the string in buf doesn't begin with a valid floating point number, e.g. if it is just "hello" (sscanf returns 0), or if it is empty (sscanf returns EOF, which is not zero!). Therefore, we set n_chars to zero whenever n_items is not 1.
  3. We print a little caret (^) under the character where we found the error. To align this caret correctly, we print it in a field of width n_chars + 1, and since the character will be right-aligned by default, it'll be printed at the correct place. The "*" in the format specifier means that the field width is the next argument.
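
Stripped of prompting and error messages, the "%lf%n" technique can be packed into a small function of its own. This is only a sketch; the name parse_double_sscanf and the err_pos parameter are assumptions made for the illustration:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 and stores the value in '*d' if all of 's' is one floating
   point number; otherwise returns 0 and stores the offset of the first
   unconverted character in '*err_pos'. */
int parse_double_sscanf (const char *s, double *d, int *err_pos)
{
  int n_chars = 0;
  int n_items = sscanf (s, "%lf%n", d, &n_chars);

  if (n_items != 1) n_chars = 0;   /* 0 or EOF: nothing was converted */
  if (n_items == 1 && (size_t) n_chars == strlen (s)) return 1;
  *err_pos = n_chars;
  return 0;
}
```

For "670" this succeeds; for "67o" it fails with an error position of 2, and for "hello" or an empty string it fails with an error position of 0.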

Another way to achieve nearly the same effect is to use strtod, giving the following:

  ...
  {
    char *end;

    errno = 0;
    /**** Convert the value, setting error indicators. */
    *d = strtod (buf, &end);
    /**** Break the loop if no error. */
    if (!errno && length && !*end) break;  /* (1) */
    /**** Print an error message. */
    if (errno != 0) { /* (2) */
      error_msg ("Illegal input: %s\n", strerror (errno));
      error_msg ("The error was detected at the position "
                 "indicated below:\n");
    } else {
      error_msg ("Illegal input: there's an error at the position "
                 "indicated below:\n");
    } /* end if */
    error_msg ("    %s\n", buf);
    error_msg ("    %*c\n", (int) (end - buf) + 1, '^'); /* (3) */
  }
} /* end for */

Again, note the following points:

  1. The test on length is to catch empty inputs, which we regard as an error. If *end == '\0', the whole string was consumed.
  2. errno is set by strtod if there was an error, so we can use it to detect error conditions, provided we set it to zero before the call to strtod!
  3. end points to the first character that was not converted. Using the difference between end and buf, we can find out how many characters were converted. But note: the result of this pointer difference is of type ptrdiff_t, which is not necessarily int! Since the "*" field width expects an int argument, we need to cast that value to type int explicitly.
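
If the "%*c" business looks like magic, here is the caret alignment in isolation. The helper make_marker is hypothetical, and it uses snprintf (available since C99) instead of writing to stderr so that the result can be inspected:

```c
#include <stdio.h>

/* Writes 'pos' spaces followed by a caret into 'out' (of size 'n'),
   so that the caret lines up under character 'pos' of the input. */
void make_marker (char *out, size_t n, int pos)
{
  /* Field width pos + 1, right-aligned: pos blanks, then the '^'. */
  snprintf (out, n, "%*c", pos + 1, '^');
}
```

With pos == 2, the buffer receives two blanks followed by the caret, which puts the caret under the 'o' of "67o".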

Also note that we can't use errno for error detection in the sscanf case because sscanf is not required to set it to any meaningful value. In fact, it is allowed to set it to any value it likes:

[ISO 7.1.4, p.97, last paragraph]
"The value of errno may be set to nonzero by a library function call whether or not there is an error, provided the use of errno is not documented in the description of the function in this International Standard."

And errno is never mentioned in the descriptions of sscanf or fscanf...

In particular, since we cannot rely on errno, we have no way to detect overflows if we're using sscanf. Using strtod we can, because this function sets errno to ERANGE upon overflow and returns HUGE_VAL (with the appropriate sign). With sscanf, by contrast, the standard leaves the behavior undefined if the converted value cannot be represented.
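
The overflow check is easy to try out in isolation. Again, this is just a sketch under my own assumptions: the name parse_double_strtod and the range_error flag are made up, and I test end != s instead of the string length to catch empty input.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 1 if all of 's' converts to a double that is in range,
   0 otherwise. On a range error, '*range_error' is set to 1. */
int parse_double_strtod (const char *s, double *d, int *range_error)
{
  char *end;

  errno = 0;                        /* must be cleared before the call */
  *d = strtod (s, &end);
  *range_error = (errno == ERANGE);
  return !*range_error && end != s && *end == '\0';
}
```

An input such as "1e999" is far beyond the range of an IEEE double, so the function returns 0 with the range error flag set, whereas "67o" is rejected without it.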

One small problem remains: what if the user enters more characters than fit into the buffer, i.e. more than BUF_SIZE-2 (the newline and the terminating NUL need room, too)? We can detect this by the fact that there's no newline in the buffer, and then just swallow the extra characters to avoid confusing the next reads. (Note that this is something completely different from getting rid of type-ahead!)

This gives us finally the following function:

int read_double (double *d)
  /**** Returns SUCCESS or FAILURE, if FAILURE, the value of '*d' is
        undefined. */
{
  size_t length;

  for (;;) {
    /**** Prompt */
    printf ("Please enter a floating point number: ");
    fflush (stdout);
    /**** Read into buffer */
    if (!fgets (buf, sizeof (buf), stdin)) return FAILURE;
    /**** Remove trailing newline, if any */
    length = strlen (buf);
    if (buf[length-1] == '\n') {
      buf[--length] = '\0';
      /**** Attempt the conversion. */
      {
        char *end;

        errno = 0;
        /**** Convert the value, setting error indicators. */
        *d = strtod (buf, &end);
        /**** Break the loop if no error. */
        if (!errno && length && !*end) break;
        /**** Print an error message. */
        if (errno != 0) {
          error_msg ("Illegal input: %s\n", strerror (errno));
          error_msg ("The error was detected at the position "
                     "indicated below:\n");
        } else {
          error_msg ("Illegal input: there's an error at the position "
                     "indicated below:\n");
        } /* end if */
        error_msg ("    %s\n", buf);
        error_msg ("    %*c\n", (int) (end - buf) + 1, '^');
      }
    } else {
      /**** There was no newline in the buffer: swallow extra
            characters. */
      scanf ("%*[^\n]");
      /**** We have the newline as dessert... */
      (void) getchar ();
      /**** Tell the user not to try to trick us: */
      /**** The buffer must also hold the newline and the NUL. */
      error_msg ("Input too long. Don't type more than %d characters!\n",
                 BUF_SIZE - 2);
    } /* end if */
  } /* end for */
  return SUCCESS;
} /* end read_double */