3.1. Lexical Elements
Programming languages, such as C, can be regarded in much the same way as natural languages, such as English, French, or Spanish. The lexical elements of a language are its words and punctuation; its grammar rules describe how those elements may be combined.
3.1.1. Introduction
syntax
The rules for writing the language: how the words and punctuation of a program are put together in a legal and correct manner. These rules could also be called the grammar of the language.
semantics
The branch of linguistics and logic concerned with meaning. To learn the semantics of a programming language is to learn how to express an algorithm in that language. The semantics of programming languages are generally harder to learn than syntax, but they transfer more easily to other languages.
A major goal of this class is to teach semantics, which often takes considerable practice to learn. However, in this specific section, we are concerned primarily with syntax.
compiler
Checks the syntax and translates the source code into object code.
tokens
The groups of characters in a program that the compiler treats as single units. The compiler collects the characters of the program into tokens, or syntactic units. There are six kinds of tokens:
- keywords
- identifiers
- constants
- string constants
- operators
- punctuators
lexical dissection
The process of collecting the characters of a program into syntactic units or tokens.
Note
A comment block is considered a syntactic unit, but it is not a token of the C language because the compiler ignores comments.
3.1.2. Example
Consider this short section of code.
int main( void ) { int a, b, sum; a = 6; }
In this example, the compiler groups the characters into these tokens.
identifiers:
main a b sum
operators:
( ) =
punctuators:
, { } ;
keywords:
int void
constants:
6
3.1.3. Comments
Comments are text placed between the delimiters /* and */. They are ignored by the compiler.
/*
 * Comment blocks like this explain what the following code attempts
 * to do. These comments often come directly from the design
 * statement.
 */

/* comments like this explain a single line of code */

// This is also a valid comment on most systems, although it was not
// part of the original C standard (C89/C90). This style came to C
// from C++, C99 made it standard, and current C compilers recognize it.
// The rest of the line after the // is treated as a comment.
3.1.4. C’s Tokens
3.1.4.1. keywords
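These are words that are explicitly reserved and have a strict meaning to the C compiler. They cannot be redefined or redeclared. Most of them name data types (such as int, char, and struct) or control the flow of execution (such as if, while, and return). The original ANSI C standard (C89) has the 32 reserved words listed below, a small number compared to some other languages; later standards added a few more, such as inline and restrict in C99.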
- auto
- break
- case
- char
- const
- continue
- default
- do
- double
- else
- enum
- extern
- float
- for
- goto
- if
- int
- long
- register
- return
- short
- signed
- sizeof
- static
- struct
- switch
- typedef
- union
- unsigned
- void
- volatile
- while
3.1.4.2. Identifiers
Identifiers are names given to variables and functions.
- A sequence of letters, digits or underscore ‘_’ characters.
- Cannot begin with a digit.
- Case sensitive, so Total, total, and TOTAL are three different identifiers.
- Keywords cannot be used as identifiers.
- Special characters such as #, -, +, and % are not allowed.
- Identifiers should improve the readability of the program by having meaningful names.
- Try to avoid identifiers that begin with an underscore character: names beginning with an underscore followed by an uppercase letter or another underscore are reserved for the implementation.
3.1.4.3. Constants
Constants are fixed values written directly in the program, either integer numbers (such as 6) or decimal numbers (such as 2.5). A variable of type int can be assigned an integer constant.
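- string constants
A sequence of zero or more characters enclosed in double quotes; the maximum length is set by the implementation. A character constant, such as 'a', is a single character enclosed in single quotes. It is different from the string constant "a", which is one character long but is stored as two characters: 'a' followed by the terminating null character '\0'.
"/* ---- */"        is a string constant, not a comment.
/* " --- " */       is a comment, not a string constant.
"\\ \" \t \" "      is a string constant whose characters are \ " (tab) " separated by spaces.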
3.1.4.4. Operators
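The binary arithmetic operators are:
+ - / % *
The unary arithmetic operators are unary + and -, together with the increment and decrement operators:
++ --
The assignment operator and the arithmetic compound assignment operators are:
= += -= *= /= %=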
The relational and logical operators discussed in topic 1 are also operators. What does each of the following do? Each expression is evaluated separately, starting from the initial values below, and integer division discards any fraction.

int a = 1, b = 2, c = 3, d = 4;

a = (b = 2) + (c = 3);   --- a = 5; b = 2; c = 3;
a * b / c;               --- (a * b) / c;              = 0
a * b % c + 1;           --- ((a * b) % c) + 1;        = 3
++a * b - c--;           --- ((++a) * b) - (c--);      = 1
7 - -b * ++d;            --- 7 - ((-b) * (++d));       = 17
7 - -b * d++;            --- 7 - ((-b) * (d++));       = 15
3.1.4.5. Punctuators
Special characters such as , ; ( ) { } that separate and group other tokens. They are used to satisfy the rules of syntax.
3.1.5. Preprocessor
Preprocessor directives such as #include and #define are, strictly speaking, not syntactic units, because the compiler never sees them. Before compilation, the preprocessor replaces an #include line with the contents of the named file, and for a #define it replaces each later use of the macro name with its replacement text, typically a constant.