3.1. Lexical Elements

Programming languages such as C can be regarded in much the same way as natural languages such as English, French, or Spanish. The lexical elements of a language are its words and punctuation; its grammar rules govern how those elements may be combined.

3.1.1. Introduction

syntax

The rules for writing the language: how the words and punctuation of a program are put together in a legal and correct manner. Syntax could also be called the grammar of the language.

semantics

The branch of linguistics and logic concerned with meaning. To learn the semantics of a programming language is to learn how to express an algorithm in that language. The semantics of a programming language are generally harder to learn than its syntax, but they transfer more easily to other languages.

A major goal of this class is to teach semantics, which often takes considerable practice to learn. However, in this specific section, we are concerned primarily with syntax.

compiler

Checks the syntax and translates the source code into object code.

tokens

The units into which the compiler groups the characters of a program. The compiler collects the characters of the program into tokens, or syntactic units. There are six kinds of tokens:

  1. keywords
  2. identifiers
  3. constants
  4. string constants
  5. operators
  6. punctuators

lexical dissection

The process of collecting the characters of a program into syntactic units or tokens.

Note

A comment block is a recognizable unit of the source text, but it is not a token of the C language: the compiler replaces each comment with a single space and otherwise ignores it.

3.1.2. Example

Consider this short section of code.

int main( void )
{
   int a, b, sum;

   a = 6;
}

In this example, the compiler groups the characters into these tokens.

identifiers:

main a b sum

operators:

( ) =

punctuators:

, { } ;

keywords:

int void

constants:

6

3.1.3. Comments

  1. Comments are text placed between the delimiters /* and */.

  2. They are ignored by the compiler.

  3. /*
     * Comment blocks like this explain what the following code attempts
     * to do.  These comments often come directly from the design
     * statement.
     */
    
  4. /* comments like this explain a single line of code */

  5. // This is also a valid comment. It was not part of the original
    // C standard (C89/C90); C++ introduced this style of comment, and
    // C99 added it to C. Everything after the // on the same line is
    // treated as a comment.
    

3.1.4. C’s Tokens

3.1.4.1. Keywords

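These are words that are explicitly reserved and have a strict meaning to the C compiler. They cannot be redefined or redeclared. Most of them name data types (int, char, double), storage classes (auto, static, extern), or flow-of-control constructs (if, while, return). The original ANSI C standard (C89/C90) defines the 32 keywords listed below, a small number compared to some other languages; later standards add a few more, such as inline and restrict in C99.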

  • auto
  • break
  • case
  • char
  • const
  • continue
  • default
  • do
  • double
  • else
  • enum
  • extern
  • float
  • for
  • goto
  • if
  • int
  • long
  • register
  • return
  • short
  • signed
  • sizeof
  • static
  • struct
  • switch
  • typedef
  • union
  • unsigned
  • void
  • volatile
  • while

3.1.4.2. Identifiers

Identifiers are names given to variables and functions.

  1. A sequence of letters, digits or underscore ‘_’ characters.
  2. Cannot begin with a digit.
  3. Case sensitive, so Total, total, and TOTAL are all different.
  4. Keywords cannot be used as identifiers.
  5. Special characters such as #, -, +, and % are not allowed.
  6. Identifiers should improve the readability of the program by having meaningful names.
  7. Avoid identifiers that begin with an underscore: names that begin with an underscore followed by an uppercase letter or a second underscore are reserved for the compiler and standard library.

3.1.4.3. Constants

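Constants are fixed values written directly in the program, either integers (such as 6) or decimal numbers (such as 2.5). A variable of type int can be assigned an integer constant; the example above assigns the constant 6 to the int variable a.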

string constants

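A sequence of characters enclosed in double quotes, such as "sum". The empty string "" is also a valid string constant. A character constant, such as 'a', is a single character enclosed in single quotes. It is different from a string constant that is one character long, such as "a": the string also ends with a terminating null character '\0', so it occupies two bytes.

"/* ---- */"      // a string constant, not a comment
/* " --- " */     // a comment, not a string constant
"\\ \" \t \" "    // a string that prints as: \ " (tab) "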

3.1.4.4. Operators

Binary arithmetic operators are: + - * / %
Unary arithmetic operators are: ++ -- and the unary + and - (as in -b)
The assignment operators are: = += -= *= /= %=

The relational and logical operators discussed in topic 1 are also operators. What does each of the following lines do?

int a = 1, b = 2, c = 3, d = 4;
// Each line below is evaluated on its own, starting from these initial values.
a = (b = 2) + (c = 3);   // a = 5, b = 2, c = 3
a * b / c;               // (a * b) / c              = 0
a * b % c + 1;           // ((a * b) % c) + 1        = 3
++a * b - c--;           // ((++a) * b) - (c--)      = 1
7 - -b * ++d;            // 7 - ((-b) * (++d))       = 17
7 - -b * d++;            // 7 - ((-b) * (d++))       = 15

3.1.4.5. Punctuators

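Punctuators are special characters such as , ; { } and ( ). They separate and group other tokens to satisfy the rules of syntax. Some characters, such as parentheses, act as punctuators in some contexts and as operators in others.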

3.1.5. Preprocessor

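Preprocessor directives such as #include and #define are, strictly speaking, not syntactic units, because the compiler proper never sees them. The preprocessor handles them first: it replaces an #include directive with the contents of the named file, and it removes a #define directive and replaces each later use of the defined name with its replacement text (often a constant).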