Exercise 1-22

Write a program to “fold” long input lines into two or more shorter lines after the last non-blank character that occurs before the n-th column of input. Make sure your program does something intelligent with very long lines, and if there are no blanks or tabs before the specified column.

Our first instinct might be to create a variable that keeps track of which column we are on, and print a newline if it exceeds a certain constant.

    #include <stdio.h>
    
    #define MAXLEN 80   /* column limit for output lines */
    #define TABWIDTH 4  /* indent size */
    
    main()
    {
        int c;
        int col;    /* current column */
    
        col = 0;
        while ((c = getchar()) != EOF) {
            if (col >= MAXLEN) {
                putchar('\n');
                col = 0;
            }
            putchar(c);
            if (c == '\t')
                col = col + (TABWIDTH - col % TABWIDTH);
            else if (c == '\n')
                col = 0;
            else
                ++col;
        }
        return 0;
    }
However, this approach does not work as intended, because it also places line breaks in the middle of words.

In order for our program to function correctly, it needs to be able to recognize words. A technique that comes in handy when developing programs of higher complexity is to write the steps of a program in pseudocode. Let us try doing that with our program.

    while (character is not end-of-file indicator)
        read in text until whitespace or until equal to column limit
        if (printing the text passes column limit)
            move to next line
        print text
        while (next character is blank or tab and column limit has not been passed)
            read in and print character
        if (column limit has been passed)
            read in remaining whitespace
            move to next line

It may be a bit unclear as to how we want our program to work, so let us look at a few examples. For now, assume that the column limit is ten.

    > hello, world
    
    hello, 
    world

We first read in hello, and check if printing it means crossing the column limit. This is not the case, so we just print the text. Next, we print the space. Finally, we read in world. If we were to print it out, we would pass the column limit, so we move to the next line before printing world.

Let us see how we will deal with words longer than the column limit.

    > abcdefghijklm nopqrstuvwxyz
    
    abcdefghij 
    klm 
    nopqrstuvw 
    xyz

To begin, we only read in abcdefghij, because at that point, the length of the text is equal to the column limit. Printing it out does not mean we pass the column limit, so we just print the string. The next character is not whitespace, so the entire bottom half of the loop gets skipped. Then, we read in klm. Printing it out would mean crossing the column limit, so we move to the next line and then print it. We then print out the whitespace. Next, we read in nopqrstuvw. Printing it would cross the column limit, so like before, we move to the next line prior to printing it out. There is no whitespace afterward, so the next part gets skipped. Finally, we read in xyz. Once again, printing it would mean passing the column limit, so we move to the next line first.

What about when we have long sequences of whitespace?

    > Lorem                    \
    ipsum                    dolor

    Lorem     
    ipsum     
    dolor

First, we read in and print Lorem. Then we keep reading in and printing the spaces until we pass the column limit, at which point we exit the loop. Then, we skip past the remaining whitespace. Next, we read in ipsum. Printing it would mean crossing the column limit, so we print a newline first. We then read in as much of the whitespace as we can, and then skip past the rest. Like before, printing dolor would mean passing the column limit, so we move to the next line first.

Notice how all the examples so far leave trailing whitespace. We could try to avoid this, but that would only complicate things, since we would need to read one word ahead in order to know whether the whitespace will be trailing. Plus, we have already written a program that deals with that situation!
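
The earlier program mentioned here is not reproduced on this page. As a rough reminder of the idea, here is a minimal sketch of stripping trailing blanks and tabs from a line held in a character array. The name rstrip is hypothetical and is not taken from the exercise; strlen comes from <string.h>.

```c
#include <string.h>

/* rstrip: remove trailing blanks and tabs from s, keeping a final newline
   if there is one; returns the new length (hypothetical helper) */
int rstrip(char s[])
{
    int i, newline;

    i = (int) strlen(s);
    newline = (i > 0 && s[i - 1] == '\n');
    if (newline)
        --i;            /* step over the newline for now */
    while (i > 0 && (s[i - 1] == ' ' || s[i - 1] == '\t'))
        --i;            /* back up over trailing whitespace */
    if (newline)
        s[i++] = '\n';  /* put the newline back */
    s[i] = '\0';
    return i;
}
```

Running each folded line through a function like this would remove the trailing whitespace, at the cost of buffering a whole output line before printing it.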

Now that we have the steps down, we need to convert our pseudocode into actual code. We start with the following template.

    #include <stdio.h>
    
    #define MAXLEN 80   /* column limit for output lines */
    #define TABWIDTH 4  /* indent size */
    
    main()
    {
        int c;
    
        while ((c = getchar()) != EOF)
            /* code */;
        return 0;
    }

In order to read in a word, we create the character array word to store all the characters we read in until we reach a whitespace character or the word's length reaches MAXLEN. We also create the integer variable len to store the word's length, which is incremented for every character read in. Finally, we must not forget to null-terminate the string once we have finished reading into it!

    #include <stdio.h>
    
    #define MAXLEN 80   /* column limit for output lines */
    #define TABWIDTH 4  /* indent size */
    
    main()
    {
        int c;
        char word[MAXLEN + 1];  /* current word */
        int len;                /* current word length */
    
        while ((c = getchar()) != EOF) {
            for (len = 0; c != ' ' && c != '\t' && c != '\n' && len < MAXLEN;
                ++len) {
                word[len] = c;
                c = getchar();
            }
            word[len] = '\0';
        }
        return 0;
    }

Note: it is good practice to keep lines of your code shorter than a certain character limit so that it stays readable on smaller displays (eighty characters is a common benchmark). Remember, unlike some other languages, C is not particularly strict about whitespace, and many constructs can span multiple lines. After we finish writing this program, you can have it do this for you!

Note: the size of word is MAXLEN + 1 in order to allocate space for the additional null character.
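
To see why the extra element matters, the word-reading loop can be tried in isolation. The sketch below is a variant under stated assumptions: it reads from a string instead of standard input, and the names take_word and LIMIT are hypothetical, chosen only for illustration (LIMIT stands in for MAXLEN and is kept small).

```c
#define LIMIT 10    /* stand-in for MAXLEN, kept small for the example */

/* take_word: copy characters from s into word until a blank, tab, newline,
   end of string, or LIMIT characters; returns the word length */
int take_word(const char s[], char word[])
{
    int len;

    for (len = 0; s[len] != '\0' && s[len] != ' ' && s[len] != '\t' &&
        s[len] != '\n' && len < LIMIT; ++len)
        word[len] = s[len];
    word[len] = '\0';   /* may land on word[LIMIT], hence LIMIT + 1 elements */
    return len;
}
```

When the input word is longer than LIMIT, the loop stops with len equal to LIMIT, so the null character is written to word[LIMIT]. An array of only LIMIT elements would be overrun by one.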

Printing the word is fairly straightforward: we use printf and the %s format specifier. We will also need to create the integer variable col to keep track of our column by incrementing it by len for every word. Before printing a word, if col + len is greater than MAXLEN, we will need to print a newline and reset col to zero.

    #include <stdio.h>
    
    #define MAXLEN 80   /* column limit for output lines */
    #define TABWIDTH 4  /* indent size */
    
    main()
    {
        int c;
        int col;                /* current column */
        char word[MAXLEN + 1];  /* current word */
        int len;                /* current word length */
    
        col = 0;
        while ((c = getchar()) != EOF) {
            for (len = 0; c != ' ' && c != '\t' && c != '\n' && len < MAXLEN;
                ++len) {
                word[len] = c;
                c = getchar();
            }
            word[len] = '\0';
            if (col + len > MAXLEN) {
                putchar('\n');
                col = 0;
            }
            printf("%s", word);
            col = col + len;
        }
        return 0;
    }

After the word is printed, we need to deal with the whitespace that comes after it. We want our program to be able to deal with sequences of whitespace, so we use a loop that continues to run as long as the next character is a blank or tab and col is less than or equal to the column limit. During every iteration, we print the whitespace character, increment col by the corresponding amount, and read in the next character.

    #include <stdio.h>
    
    #define MAXLEN 80   /* column limit for output lines */
    #define TABWIDTH 4  /* indent size */
    
    main()
    {
        int c;
        int col;                /* current column */
        char word[MAXLEN + 1];  /* current word */
        int len;                /* current word length */
    
        col = 0;
        while ((c = getchar()) != EOF) {
            for (len = 0; c != ' ' && c != '\t' && c != '\n' && len < MAXLEN;
                ++len) {
                word[len] = c;
                c = getchar();
            }
            word[len] = '\0';
            if (col + len > MAXLEN) {
                putchar('\n');
                col = 0;
            }
            printf("%s", word);
            col = col + len;
            for (len = 0; (c == ' ' || c == '\t') && col <= MAXLEN; ++len) {
                if (c == ' ')
                    ++col;
                else if (c == '\t')
                    col = col + (TABWIDTH - col % TABWIDTH);
                putchar(c);
                c = getchar();
            }
        }
        return 0;
    }
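
The tab arithmetic in the whitespace loop is easy to get wrong, so it may help to check it on its own. The following sketch wraps the same expression in a function; the name next_tab_stop is hypothetical, and TABWIDTH is assumed to be 4 as in the listing. With columns counted from zero, a tab at column 0 or 3 advances to column 4, while a tab that is already on a stop, such as column 4, advances a full TABWIDTH to column 8.

```c
#define TABWIDTH 4  /* indent size, as in the listing above */

/* next_tab_stop: column reached after printing a tab at column col
   (columns counted from zero; hypothetical helper) */
int next_tab_stop(int col)
{
    return col + (TABWIDTH - col % TABWIDTH);
}
```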
Finally,

    if (column limit has been passed)
        read in remaining whitespace
        move to next line

directly translates to

    if (col > MAXLEN) {
        while (c == ' ' || c == '\t')
            c = getchar();
        putchar('\n');
        col = 0;
    }

so we can add that to our program.

    #include <stdio.h>
    
    #define MAXLEN 80   /* column limit for output lines */
    #define TABWIDTH 4  /* indent size */
    
    main()
    {
        int c;
        int col;                /* current column */
        char word[MAXLEN + 1];  /* current word */
        int len;                /* current word length */
    
        col = 0;
        while ((c = getchar()) != EOF) {
            for (len = 0; c != ' ' && c != '\t' && c != '\n' && len < MAXLEN;
                ++len) {
                word[len] = c;
                c = getchar();
            }
            word[len] = '\0';
            if (col + len > MAXLEN) {
                putchar('\n');
                col = 0;
            }
            printf("%s", word);
            col = col + len;
            for (len = 0; (c == ' ' || c == '\t') && col <= MAXLEN; ++len) {
                if (c == ' ')
                    ++col;
                else if (c == '\t')
                    col = col + (TABWIDTH - col % TABWIDTH);
                putchar(c);
                c = getchar();
            }
            if (col > MAXLEN) {
                /* read in remaining whitespace if column limit is reached */
                while (c == ' ' || c == '\t')
                    c = getchar();
                putchar('\n');
                col = 0;
            }
        }
        return 0;
    }

Let us test our program with the examples from earlier on this page. For now, we can change MAXLEN to ten. If we enter hello, world, the output we get is

    hello, 
    orld

with no newline after d. That does not match what we want. Let us try to trace the issue. First, our program reads in hello, (including the comma) and then prints it out. At this point, c is equal to a space. Then, we print the space and 'w' is assigned to c. Now, we see where the first problem is: we run getchar again at the start of the next iteration, which discards the 'w'. To fix this, we need to remove the function call from the condition of the while-loop, and instead call getchar once before the loop to get it started.

    #include <stdio.h>
    
    #define MAXLEN 10   /* column limit for output lines */
    #define TABWIDTH 4  /* indent size */
    
    main()
    {
        int c;
        int col;                /* current column */
        char word[MAXLEN + 1];  /* current word */
        int len;                /* current word length */
    
        col = 0;
        c = getchar();
        while (c != EOF) {
        ...
    }

Now, the output we get is

    hello,
    (program hangs)

Now world does not appear at all, and the program never finishes. Let us continue tracing our program from before. After world is read in, the next character is a newline, and it is what gets carried over to the next iteration. Nothing in the loop consumes it: the word is empty, the whitespace loop does not run, and the last if-statement is skipped, so c stays a newline forever and the program never reaches EOF. (world was in fact passed to printf, but it is most likely still sitting in the output buffer, since no newline follows it.) What we want instead is to print the newline and have the next character (EOF in this case) carry over to the next iteration. We can solve this by also running the code inside the last if-statement when we come across a newline, and then reading in the next character. Using the same if-statement also means that trailing whitespace causes no issues, since the while-loop skips past it.

    #include <stdio.h>
    
    #define MAXLEN 80   /* column limit for output lines */
    #define TABWIDTH 4  /* indent size */
    
    main()
    {
        int c;
        int col;                /* current column */
        char word[MAXLEN + 1];  /* current word */
        int len;                /* current word length */
    
        col = 0;
        c = getchar();
        while (c != EOF) {
            /* also stop at EOF, in case the input lacks a final newline */
            for (len = 0; c != EOF && c != ' ' && c != '\t' && c != '\n' &&
                len < MAXLEN; ++len) {
                word[len] = c;
                c = getchar();
            }
            word[len] = '\0';
            if (col + len > MAXLEN) {
                putchar('\n');
                col = 0;
            }
            printf("%s", word);
            col = col + len;
            for (len = 0; (c == ' ' || c == '\t') && col <= MAXLEN; ++len) {
                if (c == ' ')
                    ++col;
                else if (c == '\t')
                    col = col + (TABWIDTH - col % TABWIDTH);
                putchar(c);
                c = getchar();
            }
            if (col > MAXLEN || c == '\n') {
                /* read in remaining whitespace if column limit is reached */
                while (c == ' ' || c == '\t')
                    c = getchar();
                putchar('\n');
                col = 0;
                if (c == '\n')
                    c = getchar();  /* c carries over to next iteration */
            }
        }
        return 0;
    }

Now, all three of our inputs work as intended. For future exercises, keep in mind that a good strategy to tackle them is to write the program in pseudocode first.
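
One way to check results like these without retyping the input each time is to run the same logic over strings. The sketch below is a string-based variant of the final program, written only for testing. The name fold_string is hypothetical, the column limit is passed as the parameter maxlen (assumed to be at least one) instead of using MAXLEN, and out is assumed to have room for twice the input length plus one.

```c
#define TABWIDTH 4  /* indent size */

/* fold_string: fold the text in s to maxlen columns, writing the result
   to out; mirrors the final program but works on strings */
void fold_string(const char *s, char *out, int maxlen)
{
    int col = 0;    /* current column */
    int len;        /* current word length */

    while (*s != '\0') {
        /* measure the next word, cutting it off at maxlen characters */
        for (len = 0; s[len] != '\0' && s[len] != ' ' && s[len] != '\t' &&
            s[len] != '\n' && len < maxlen; ++len)
            ;
        if (col + len > maxlen) {
            *out++ = '\n';
            col = 0;
        }
        col = col + len;
        while (len-- > 0)
            *out++ = *s++;      /* copy the word */
        /* copy whitespace until the column limit has been passed */
        while ((*s == ' ' || *s == '\t') && col <= maxlen) {
            if (*s == ' ')
                ++col;
            else
                col = col + (TABWIDTH - col % TABWIDTH);
            *out++ = *s++;
        }
        if (col > maxlen || *s == '\n') {
            while (*s == ' ' || *s == '\t')
                ++s;            /* skip remaining whitespace */
            *out++ = '\n';
            col = 0;
            if (*s == '\n')
                ++s;            /* consume the newline */
        }
    }
    *out = '\0';
}
```

Called with the input hello, world followed by a newline and a limit of ten, this writes hello, (with its trailing space), a newline, world, and a final newline into out.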