Our first instinct might be to create a variable that keeps track of which column we are on, and print a newline if it exceeds a certain constant.
#include <stdio.h>

#define MAXLEN 80 /* column limit for output lines */
#define TABWIDTH 4 /* indent size */

main()
{
    int c;
    int col; /* current column */

    col = 0;
    while ((c = getchar()) != EOF) {
        /* line is full: break before printing the next character */
        if (col >= MAXLEN && c != '\n') {
            putchar('\n');
            col = 0;
        }
        putchar(c);
        if (c == '\t')
            col = col + (TABWIDTH - col % TABWIDTH);
        else if (c == '\n')
            col = 0;
        else
            ++col;
    }
    return 0;
}
However, this approach does not work as intended, because it also places line breaks in the middle of words.
In order for our program to function correctly, it needs to be able to recognize words. A technique that comes in handy when developing programs of higher complexity is to write the steps of a program in pseudocode. Let us try doing that with our program.
while (character is not end-of-file indicator)
    read in text until whitespace or until equal to column limit
    if (printing the text passes column limit)
        move to next line
    print text
    while (next character is blank or tab and column limit has not been passed)
        read in and print character
    if (column limit has been passed)
        read in remaining whitespace
        move to next line
It may be a bit unclear as to how we want our program to work, so let us look at a few examples. For now, assume that the column limit is ten.
> hello, world
hello, 
world
We first read in hello, and check if printing it means crossing the column limit. This is not the case, so we just print the text. Next, we print the space. Finally, we read in world. If we were to print it out, we would pass the column limit, so we move to the next line before printing world.
Let us see how we will deal with words longer than the column limit.
> abcdefghijklm nopqrstuvwxyz
abcdefghij
klm 
nopqrstuvw
xyz
To begin, we only read in abcdefghij, because at that point, the length of the text is equal to the column limit. Printing it out does not mean we pass the column limit, so we just print the string. The next character is not whitespace, so the entire bottom half of the loop gets skipped. Then, we read in klm. Printing it out would mean crossing the column limit, so we move to the next line and then print it. We then print out the whitespace. Next, we read in nopqrstuvw. Printing it would cross the column limit, so like before, we move to the next line prior to printing it out. There is no whitespace afterward, so the next part gets skipped. Finally, we read in xyz. Once again, printing it would mean passing the column limit, so we move to the next line first.
What about when we have long sequences of whitespace?
> Lorem \ ipsum dolor
Lorem
ipsum
dolor
First, we read in and print Lorem. Then we keep reading in and printing the spaces until we pass the column limit, at which point we exit the loop. Since the column limit has been passed, we skip past the remaining whitespace and move to the next line. Next, we read in ipsum, which now fits at the start of the new line, so we print it. We then print as much of the whitespace as we can, skip past the rest, and move to the next line again, so dolor is printed at the start of its own line.
Notice how all the examples so far leave trailing whitespace. We could try to avoid this, but that would only complicate things, since we would need to read one word ahead in order to know whether the whitespace will be trailing. Plus, we have already written a program that deals with that situation!
Now that we have the steps down, we need to convert our pseudocode into actual code. We start with the following template.
#include <stdio.h>

#define MAXLEN 80 /* column limit for output lines */
#define TABWIDTH 4 /* indent size */

main()
{
    int c;

    while ((c = getchar()) != EOF)
        /* code */;
    return 0;
}
In order to read in a word, we create the character array word to store all the characters we read in up until a whitespace character, or until the string's length reaches MAXLEN. We can also create the integer variable len to store a word's length, which increments for every character read in. Finally, we must not forget to null-terminate our string once we have finished reading into it!
#include <stdio.h>

#define MAXLEN 80 /* column limit for output lines */
#define TABWIDTH 4 /* indent size */

main()
{
    int c;
    char word[MAXLEN + 1]; /* current word */
    int len; /* current word length */

    while ((c = getchar()) != EOF) {
        /* stop at whitespace, at end of file, or when word is full */
        for (len = 0; c != EOF && c != ' ' && c != '\t' && c != '\n'
                && len < MAXLEN; ++len) {
            word[len] = c;
            c = getchar();
        }
        word[len] = '\0';
    }
    return 0;
}
Note: it is good practice to keep lines of your code shorter than a certain character limit so that they stay readable on smaller displays (eighty characters is generally a good benchmark). Remember, unlike some other languages, C is not particularly strict about whitespace, and many constructs can span multiple lines. After we finish writing this program, you can have it do this for you!
Note: the size of word is MAXLEN + 1 in order to allocate space for the additional null character.
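To make the role of the extra element concrete, here is a minimal sketch that copies one word out of a string instead of reading it from standard input. The helper copy_word and its parameters are assumptions made for illustration only; they are not part of the program being built here.

```c
/* copy_word: copy characters from src into word until whitespace, the end
   of src, or maxlen characters, then null-terminate. Returns the length.
   Hypothetical helper; word must have room for maxlen + 1 characters. */
int copy_word(const char src[], char word[], int maxlen)
{
    int len;

    for (len = 0; src[len] != '\0' && src[len] != ' ' && src[len] != '\t'
            && src[len] != '\n' && len < maxlen; ++len)
        word[len] = src[len];
    word[len] = '\0'; /* lands on index maxlen when the word fills the array */
    return len;
}
```

With a limit of ten and the input abcdefghijklm, the ten letters abcdefghij occupy indices 0 to 9 and the null character goes into index 10, which is why the array needs eleven elements.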
Printing the word is fairly straightforward: we use printf and the %s format specifier. We will also need to create the integer variable col to keep track of our column by incrementing it by len for every word. Before printing a word, if col + len is greater than MAXLEN, we will need to print a newline and reset col to zero.
#include <stdio.h>

#define MAXLEN 80 /* column limit for output lines */
#define TABWIDTH 4 /* indent size */

main()
{
    int c;
    int col; /* current column */
    char word[MAXLEN + 1]; /* current word */
    int len; /* current word length */

    col = 0;
    while ((c = getchar()) != EOF) {
        /* stop at whitespace, at end of file, or when word is full */
        for (len = 0; c != EOF && c != ' ' && c != '\t' && c != '\n'
                && len < MAXLEN; ++len) {
            word[len] = c;
            c = getchar();
        }
        word[len] = '\0';
        if (col + len > MAXLEN) {
            putchar('\n');
            col = 0;
        }
        printf("%s", word);
        col = col + len;
    }
    return 0;
}
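The wrapping rule can also be checked on its own with plain arithmetic. The following is only a sketch: column_after_word is a hypothetical helper, not part of the program, and it takes the column limit as a parameter so that the earlier examples with a limit of ten can be replayed.

```c
/* column_after_word: column reached after printing a word of length len
   that starts at column col, moving to a fresh line first if the word
   would pass maxlen. Hypothetical helper for illustration only. */
int column_after_word(int col, int len, int maxlen)
{
    if (col + len > maxlen)
        col = 0; /* the word goes at the start of the next line */
    return col + len;
}
```

For the first example, hello, leaves us at column 6 and the space at column 7, so column_after_word(7, 5, 10) wraps and returns 5 for world.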
After the word is printed, we need to deal with the whitespace that comes after it. We want our program to be able to deal with sequences of whitespace, so we use a loop that continues to run as long as the next character is a blank or tab and col is less than or equal to the column limit. During every iteration, we print the whitespace character, increment col by the corresponding amount, and read in the next character.
#include <stdio.h>

#define MAXLEN 80 /* column limit for output lines */
#define TABWIDTH 4 /* indent size */

main()
{
    int c;
    int col; /* current column */
    char word[MAXLEN + 1]; /* current word */
    int len; /* current word length */

    col = 0;
    while ((c = getchar()) != EOF) {
        /* stop at whitespace, at end of file, or when word is full */
        for (len = 0; c != EOF && c != ' ' && c != '\t' && c != '\n'
                && len < MAXLEN; ++len) {
            word[len] = c;
            c = getchar();
        }
        word[len] = '\0';
        if (col + len > MAXLEN) {
            putchar('\n');
            col = 0;
        }
        printf("%s", word);
        col = col + len;
        /* print blanks and tabs until the column limit is passed */
        for (len = 0; (c == ' ' || c == '\t') && col <= MAXLEN; ++len) {
            if (c == ' ')
                ++col;
            else if (c == '\t')
                col = col + (TABWIDTH - col % TABWIDTH);
            putchar(c);
            c = getchar();
        }
    }
    return 0;
}
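The tab branch advances col to the next multiple of TABWIDTH rather than by a fixed amount. A minimal sketch of that arithmetic in isolation, assuming zero-based columns and a hypothetical helper named next_tab_stop that does not appear in the program:

```c
#define TABWIDTH 4 /* indent size, same value as in the program */

/* next_tab_stop: column reached after printing a tab at zero-based
   column col. Hypothetical helper for illustration only. */
int next_tab_stop(int col)
{
    return col + (TABWIDTH - col % TABWIDTH);
}
```

A tab at column 0 moves to column 4, a tab at column 3 also moves to column 4, and a tab at column 4 moves to column 8.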
Finally,
if (column limit has been passed)
    read in remaining whitespace
    move to next line
directly translates to
if (col > MAXLEN) {
    while (c == ' ' || c == '\t')
        c = getchar();
    putchar('\n');
    col = 0;
}
so we can add that to our program.
#include <stdio.h>

#define MAXLEN 80 /* column limit for output lines */
#define TABWIDTH 4 /* indent size */

main()
{
    int c;
    int col; /* current column */
    char word[MAXLEN + 1]; /* current word */
    int len; /* current word length */

    col = 0;
    while ((c = getchar()) != EOF) {
        /* stop at whitespace, at end of file, or when word is full */
        for (len = 0; c != EOF && c != ' ' && c != '\t' && c != '\n'
                && len < MAXLEN; ++len) {
            word[len] = c;
            c = getchar();
        }
        word[len] = '\0';
        if (col + len > MAXLEN) {
            putchar('\n');
            col = 0;
        }
        printf("%s", word);
        col = col + len;
        /* print blanks and tabs until the column limit is passed */
        for (len = 0; (c == ' ' || c == '\t') && col <= MAXLEN; ++len) {
            if (c == ' ')
                ++col;
            else if (c == '\t')
                col = col + (TABWIDTH - col % TABWIDTH);
            putchar(c);
            c = getchar();
        }
        if (col > MAXLEN) {
            /* read in remaining whitespace if column limit is passed */
            while (c == ' ' || c == '\t')
                c = getchar();
            putchar('\n');
            col = 0;
        }
    }
    return 0;
}
Let's test our program with the test cases at the start of this page. For now, we can change MAXLEN to ten. If we enter hello, world, the output we get is
hello, 
orld
with no newline after d. That does not match what we want. Let us try to trace the issue. First, our program reads in the word hello, and prints it out. At this point, c is equal to a space. Then, we print the space and 'w' is assigned to c. Now, we see where the first problem is: we run getchar again at the start of the next iteration, which discards the 'w'. To fix this, we need to remove the function call from the condition of the while-loop, and instead call getchar once before the loop to get it running.
#include <stdio.h>

#define MAXLEN 10 /* column limit for output lines */
#define TABWIDTH 4 /* indent size */

main()
{
    int c;
    int col; /* current column */
    char word[MAXLEN + 1]; /* current word */
    int len; /* current word length */

    col = 0;
    c = getchar(); /* read the first character before entering the loop */
    while (c != EOF) {
        ...
}
Now, the output we get is
hello, 
(waiting for input)
Now world does not appear at all. Let us continue tracing our program from before. After world is read in, the next character is a newline, and it is what gets carried over to the next iteration. Nothing in the loop reads past that newline, so every following iteration sees the same character, and world, whose line is never completed, typically stays in the output buffer instead of being displayed. This is not what we want. We want to print the newline, and have the next character (EOF in this case) carry over to the next iteration. Now we see why the program does not terminate: it never receives the EOF. We can solve this by also running the code inside the last if-statement when we come across a newline, in addition to reading in the next character. Using the same if-statement also means that input with trailing whitespace causes no issues, since the while-loop will skip past it.
#include <stdio.h>

#define MAXLEN 80 /* column limit for output lines */
#define TABWIDTH 4 /* indent size */

main()
{
    int c;
    int col; /* current column */
    char word[MAXLEN + 1]; /* current word */
    int len; /* current word length */

    col = 0;
    c = getchar();
    while (c != EOF) {
        /* stop at whitespace, at end of file, or when word is full */
        for (len = 0; c != EOF && c != ' ' && c != '\t' && c != '\n'
                && len < MAXLEN; ++len) {
            word[len] = c;
            c = getchar();
        }
        word[len] = '\0';
        if (col + len > MAXLEN) {
            putchar('\n');
            col = 0;
        }
        printf("%s", word);
        col = col + len;
        /* print blanks and tabs until the column limit is passed */
        for (len = 0; (c == ' ' || c == '\t') && col <= MAXLEN; ++len) {
            if (c == ' ')
                ++col;
            else if (c == '\t')
                col = col + (TABWIDTH - col % TABWIDTH);
            putchar(c);
            c = getchar();
        }
        if (col > MAXLEN || c == '\n') {
            /* read in remaining whitespace if column limit is passed */
            while (c == ' ' || c == '\t')
                c = getchar();
            putchar('\n');
            col = 0;
            if (c == '\n')
                c = getchar(); /* c carries over to next iteration */
        }
    }
    return 0;
}
Now, all three of our inputs work as intended. For future exercises, keep in mind that a good strategy to tackle them is to write the program in pseudocode first.
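One way to re-check the examples without retyping them is to run the same steps over a string instead of standard input. The sketch below is a variant written for illustration, not the program above: the function fold_string, its parameters, and the caller-supplied output array are all hypothetical, and out is assumed to be large enough to hold the result.

```c
#define TABWIDTH 4 /* indent size, same value as in the program */

/* fold_string: apply the folding steps to the string in, writing the
   result to out. Hypothetical helper; out must be large enough. */
void fold_string(const char in[], char out[], int maxlen)
{
    int i, o, col, len, start;

    i = o = col = 0;
    while (in[i] != '\0') {
        /* measure the next word, up to maxlen characters */
        start = i;
        for (len = 0; in[i] != '\0' && in[i] != ' ' && in[i] != '\t'
                && in[i] != '\n' && len < maxlen; ++len)
            ++i;
        if (col + len > maxlen) {
            out[o++] = '\n';
            col = 0;
        }
        while (start < i) /* copy the word itself */
            out[o++] = in[start++];
        col = col + len;
        /* copy blanks and tabs until the column limit is passed */
        while ((in[i] == ' ' || in[i] == '\t') && col <= maxlen) {
            if (in[i] == ' ')
                ++col;
            else
                col = col + (TABWIDTH - col % TABWIDTH);
            out[o++] = in[i++];
        }
        if (col > maxlen || in[i] == '\n') {
            while (in[i] == ' ' || in[i] == '\t') /* skip the rest */
                ++i;
            out[o++] = '\n';
            col = 0;
            if (in[i] == '\n')
                ++i;
        }
    }
    out[o] = '\0';
}
```

Called with a limit of ten, the input "hello, world\n" produces "hello, \nworld\n", and "abcdefghijklm nopqrstuvwxyz\n" produces "abcdefghij\nklm \nnopqrstuvw\nxyz\n", matching the first two examples.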