We start with the following program, which copies its input to the terminal:

#include <stdio.h>

int main(void)
{
    int c;

    /* copy every character from input to output until end of file */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}

The token /* marks the start of a comment, and the token */ marks its
end. We create a variable, state, which holds one of the symbolic
constants IN and OUT to keep track of whether we are "inside" or
"outside" a comment. While we are inside a comment, we skip every
character until we reach its end, at which point we resume copying
characters to the terminal.

To implement this, when we come across a slash, we check whether the
next character is an asterisk; if it is not, we print the slash and
continue copying characters. Otherwise, we set state to IN and enter a
while loop that runs until state changes to OUT, which happens when we
read an asterisk followed by a slash. Once we exit the while loop, we
read the next character.
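
#include <stdio.h>

#define IN  1   /* inside a comment */
#define OUT 0   /* outside a comment */

int main(void)
{
    int c;
    int state;  /* whether inside or outside of a comment */

    state = OUT;
    while ((c = getchar()) != EOF) {
        if (c == '/') {
            c = getchar();
            if (c == '*') {
                state = IN;
                while (state == IN) {
                    c = getchar();
                    /* a run of asterisks may precede the closing slash */
                    while (c == '*') {
                        c = getchar();
                        if (c == '/')
                            state = OUT;
                    }
                    if (c == EOF)   /* unterminated comment */
                        state = OUT;
                }
                c = getchar();  /* first character after the comment */
            } else
                putchar('/');
        }
        if (c != EOF)   /* input may end right after a slash or a comment */
            putchar(c);
    }
    return 0;
}
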
There is one more thing we have not accounted for: string constants.
For example, if our program reads in "/*", it will treat the /* as the
start of a comment even though it is part of a character string. To
avoid this, when we come across a double quote, we print everything up
to the closing quote without examining what is inside the string. We
also have to be wary of escape sequences like \", so when we come across
a backslash, we print it together with the second character of the
escape sequence and then read the following character into c, so the
escaped character is never examined.

Finally, when we read in and print a single quote, we subsequently read
in and print the rest of the literal (which is usually two more
characters, but three more when the literal is an escape sequence)
before moving on to the next iteration of the loop. This avoids issues
with the character literals '"' and '\"'.

#include <stdio.h>

#define IN  1   /* inside a comment */
#define OUT 0   /* outside a comment */

int main(void)
{
    int c;
    int state;  /* whether inside or outside of a comment */

    state = OUT;
    while ((c = getchar()) != EOF) {
        if (c == '/') {
            c = getchar();
            if (c == '*') {
                state = IN;
                while (state == IN) {
                    c = getchar();
                    /* a run of asterisks may precede the closing slash */
                    while (c == '*') {
                        c = getchar();
                        if (c == '/')
                            state = OUT;
                    }
                    if (c == EOF)   /* unterminated comment */
                        state = OUT;
                }
                c = getchar();  /* first character after the comment */
            } else
                putchar('/');
        }
        if (c == '"') {
            putchar(c);
            while ((c = getchar()) != '"' && c != EOF) {
                putchar(c);
                /* copy second character of escape sequence unexamined */
                if (c == '\\' && (c = getchar()) != EOF)
                    putchar(c);
            }
        }
        if (c == EOF)   /* input ended inside a comment or string */
            break;
        putchar(c);
        if (c == '\'') {    /* avoid treating '"' as start of string */
            if ((c = getchar()) == '\\') {
                putchar(c);
                c = getchar();
            }
            if (c != EOF)
                putchar(c);     /* the character itself */
            if ((c = getchar()) != EOF)
                putchar(c);     /* the closing single quote */
        }
    }
    return 0;
}

Note: the statement putchar(getchar()) might look strange at first, but
it is perfectly valid: a call to getchar evaluates to the next character
in the input stream, and we are simply passing that value as the
argument to putchar.

Note: we deliberately do not use an else-if when checking for double
quotes. To understand why, try the input /* comment */"/* string" with
and without the else clause. Do you see why the behavior changes?