Keeping in mind that a valid C program should have a closing sequence
for every opening sequence (e.g. {...}
,
"..."
, /* ... */
, etc), we know that a valid
program must have an equal number of opening and closing sequences.
Therefore, every time we come across a {
, for example, we
increment the braces counter, and when we come across its
closing sequence, }
, we decrement the braces
counter. We do this for each type of bracket. If the program is valid,
all of these counters should be equal to zero in the end, and at no
point should they be less than zero. For the two types of quotes, we
use variables that switch between the constants IN
and
OUT
. When the program stops reading input, these variables
should be equal to OUT
if the program is valid. We do a
similar thing with comments, except for them we need to read two
characters using nested if
-statements. Finally, we also
make sure to skip past escape sequences.
#include <stdio.h>
#define IN 1 /* inside a comment or string */
#define OUT 0 /* outside a comment or string */
main()
{
int c;
int parens, brackets, braces; /* counter for unbalanced brackets */
int character, string, comment; /* boolean values set to IN or OUT */
parens = brackets = braces = 0;
character = string = comment = OUT;
while ((c = getchar()) != EOF) {
if (c == '\\') { /* skip escape sequences */
getchar();
c = getchar();
}
if (c == '/') {
if ((c = getchar()) == '*')
comment = IN;
} else if (c == '*')
if ((c = getchar()) == '/')
comment = OUT;
if (c == '(')
++parens;
else if (c == ')')
--parens;
else if (c == '[')
++brackets;
else if (c == ']')
--brackets;
else if (c == '{')
++braces;
else if (c == '}')
--braces;
else if (c == '\'') {
if (character == IN)
character = OUT;
else
character = IN;
} else if (c == '"') {
if (string == IN)
string = OUT;
else
string = IN;
}
if (parens < 0) {
printf("Error: unbalanced parentheses\n");
parens = 0; /* avoid printing message for every character read */
} else if (brackets < 0) {
printf("Error: unbalanced brackets\n");
brackets = 0;
} else if (braces < 0) {
printf("Error: unbalanced braces\n");
braces = 0;
}
}
if (parens != 0)
printf("Error: unbalanced parentheses\n");
if (brackets != 0)
printf("Error: unbalanced brackets\n");
if (braces != 0)
printf("Error: unbalanced braces\n");
if (character == IN)
printf("Error: single quote expected\n");
if (string == IN)
printf("Error: double quote expected\n");
if (comment == IN)
printf("Error: '*/' expected\n");
return 0;
}
Note: we check for comments first because if the program
contained a slash followed by a character that is not an asterisk, or
vice versa, the second character still needs to go through the series
of if
-statements.
Note: notice how the if
-statement that checks for a
slash has braces after it. They are necessary because otherwise, the
else
-clause afterward would correspond to the nested
if
-statement instead.
However, as you might have realized already, this approach is rather inaccurate. For example, the input {(}) is considered valid, but '}' is not. Let us go back to the drawing board and approach this problem in a different way.
The most intuitive solution to this exercise involves using recursion.
While it has not yet been explicitly covered in the book, the idea is
fairly easy to grasp. Recursion is what happens when you call a
function within its own body. One might think this would cause the
compiler to throw an error, but instead, the program will enter an
infinite loop. In order to avoid this, the function call needs to be
wrapped in an if
-statement, so that when the condition is
no longer true, the loop breaks. To illustrate this, the code
int i;
for (i = 0; i < 3; ++i)
printf("%d\n", i);
would be written as follows when using a recursive approach:
void recursive(int i)
{
if (i < 3) {
++i;
printf("%d\n", i);
recursive(i);
}
}
recursive(0);
Note: the order of the statements in recursion is of utmost
importance. For example, if ++i
came after the
recursive
function call, it would never be reached.
What if we moved the recursive
call before the
printf
call?
void recursive(int i)
{
if (i < 3) {
++i;
recursive(i);
printf("%d\n", i);
}
}
recursive(0);
Then, we get the output
3 2 1
because recursive
calls itself until i is equal
to three, at which point it executes the printf
statement.
Afterward, the program goes back to the second call of
recursive
(when i was equal to two) and prints
i, and then the program goes to the first call of
recursive
(when i was equal to one) and prints
i. In essence, it runs as follows, except whenever
i increments, it only applies to that instance of the
function call:
void recursive(int i)
{
if (i < 3) {
++i;
if (i < 3) {
++i;
if (i < 3) {
++i;
if (i < 3) { /* the condition is not met */
++i;
recursive(i);
printf("%d\n", i);
}
printf("%d\n", i);
}
/* go back to second call of recursive */
printf("%d\n", i);
}
/* go back to first call of recursive */
printf("%d\n", i);
}
}
recursive(0);
Let us get back to the problem at hand. The first thing we do is create
a function called read_block
, which reads in a code block
until its closing character or EOF
is reached and returns
the symbolic constant YES
if the block is valid. If we
disregard strings and comments for now, a block is valid if it meets
three requirements:
- Its closing sequence is found
- There are no extra closing brackets
- All the blocks within it are also valid
The first two requirements are easy to implement:
#define YES 1
#define NO 0
/* read_block: reads a block until its closing sequence or EOF is reached;
returns YES block is valid, NO if block is invalid */
int read_block(int end)
{
int c;
while ((c = getchar()) != EOF) {
if (c == end)
return YES;
/* check for closing braces without a corresponding opening brace */
else if (c == ')' || c == ']' || c == '}')
return NO;
}
return NO;
}
The first block we read in will have the closing sequence
EOF
; in other words, it is the entire program. Then, every
time we come across an opening bracket, we call read_block
recursively and assign its return value to a variable called
valid, which keeps track of whether the current block is
valid. Whenever a block is no longer valid, we exit the
while
-loop and return NO
. This way, if any
instance of read_block
ever returns NO
, all
its parent blocks are rendered invalid too.
#include <stdio.h>
#define YES 1
#define NO 0
int read_block(int end);
main()
{
int valid; /* whether the C program is valid or not */
if ((valid = read_block(EOF)) == NO)
printf("syntax error!\n");
printf("syntax error!\n");
}
/* read_block: reads a block until its closing sequence or EOF is reached;
returns YES block is valid, NO if block is invalid */
int read_block(int end)
{
int c;
int valid; /* whether current block is valid */
valid = YES;
while ((c = getchar()) != EOF && valid == YES) {
if (c == end)
return YES;
/* check for closing braces without a corresponding opening brace */
else if (c == ')' || c == ']' || c == '}')
return NO;
else if (c == '(')
valid = read_block(')');
else if (c == '[')
valid = read_block(']');
else if (c == '{')
valid = read_block('}');
}
if (c != EOF)
return NO;
else
return valid;
}
Note: we normally exit the while
-loop when a block
is not valid. The one exception is the caller in main
,
which will also exit the loop when the entire program is finished being
read. This is the reason for the if
-statement at the
bottom.
Most of the hard work has now been done and adding support for strings
and comments is relatively easy. We create the two functions
read_cmt
and read_str
which when called read
in all the characters up until their closing sequences. In the case of
read_str
, we also pass in the kind of quote as an argument
and make sure to ignore escape sequences. Both functions return
YES
if their closing sequence is found, and
NO
if they read in an EOF
. Similar to the
brackets, the functions get called when their opening sequences are
found in read_block
, and their return value is assigned to
valid.
#include <stdio.h>
#define YES 1
#define NO 0
int read_block(int end);
int read_str(char quote);
int read_cmt(void);
main()
{
int valid; /* whether the C program is valid or not */
if ((valid = read_block(EOF)) == NO)
printf("syntax error!\n");
return 0;
}
/* read_block: reads a block until its closing sequence or EOF is reached;
returns YES block is valid, NO if block is invalid */
int read_block(int end)
{
int c;
int valid; /* whether current block is valid */
valid = YES;
while ((c = getchar()) != EOF && valid == YES) {
if (c == '/') {
c = getchar();
if (c == '*')
if ((valid = read_cmt()) == YES)
c = getchar();
}
if (c == end)
return YES;
/* check for closing braces without a corresponding opening brace */
else if (c == ')' || c == ']' || c == '}')
return NO;
else if (c == '(')
valid = read_block(')');
else if (c == '[')
valid = read_block(']');
else if (c == '{')
valid = read_block('}');
else if (c == '\'')
valid = read_str('\'');
else if (c == '"')
valid = read_str('"');
}
if (c != EOF)
return NO;
else
return valid;
}
/* read_str: reads a quoted string and returns YES if it terminates */
int read_str(char quote)
{
int c;
while ((c = getchar()) != EOF) {
if (c == quote)
return YES;
/* ignore second character in escape sequence */
if (c == '\\')
getchar();
}
return NO; /* end quote missing */
}
/* read_cmt: reads a comment and returns YES if it terminates */
int read_cmt(void)
{
int c;
while ((c = getchar()) != EOF)
if (c == '*')
if ((c = getchar()) == '/')
return YES;
return NO; /* comment never terminates */
}