CS334 Lecture 5
Read Chapter 4 in the text carefully, as we will not cover
all aspects in class. Just skim lightly over sections 4.6 & 4.7.
Major elements of programming languages:
Syntax, Semantics, Pragmatics
Syntax: Readable, writable, easy to translate, unambiguous, ...
Formal Grammars: Backus & Naur, Chomsky
- First used in the ALGOL 60 Report as a formal description of the language.
- Generative description of a language.
- A language is a set of strings (e.g., all legal ALGOL 60 programs).
Example
<expression> ::= <term> | <expression> <addop> <term>
<term> ::= <factor> | <term> <multop> <factor>
<factor> ::= <identifier> | <literal> | (<expression>)
<identifier> ::= a | b | c | d
<literal> ::= <digit> | <digit> <literal>
<digit> ::= 0 | 1 | 2 | ... | 9
<addop> ::= + | - | or
<multop> ::= * | / | div | mod | and
Generates: a + b * c + b (draw its parse tree).
The grammar determines operator precedence and which direction operators associate.
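To make that concrete, here is a minimal recursive-descent sketch of this grammar in OCaml (not from the text; the token and tree types and all names are illustrative, and the or/div/mod/and operators are omitted for brevity). The left recursion in <expression> and <term> is replaced by loops, which makes + and - (and * and /) associate to the left, and having parse_expression call parse_term gives * and / higher precedence than + and -.

type token = Id of char | Lit of int | Plus | Minus | Times | Div
           | LParen | RParen

type tree =
  | Leaf of token                        (* identifier or literal *)
  | Node of token * tree * tree          (* operator, left operand, right operand *)

(* parse_expression : token list -> tree * token list *)
let rec parse_expression toks =
  let (first, rest) = parse_term toks in
  let rec loop left rest =
    match rest with
    | ((Plus | Minus) as op) :: rest' ->
        let (right, rest'') = parse_term rest' in
        loop (Node (op, left, right)) rest''     (* left-associative *)
    | _ -> (left, rest)
  in
  loop first rest

and parse_term toks =
  let (first, rest) = parse_factor toks in
  let rec loop left rest =
    match rest with
    | ((Times | Div) as op) :: rest' ->
        let (right, rest'') = parse_factor rest' in
        loop (Node (op, left, right)) rest''
    | _ -> (left, rest)
  in
  loop first rest

and parse_factor toks =
  match toks with
  | (Id _ as t) :: rest -> (Leaf t, rest)
  | (Lit _ as t) :: rest -> (Leaf t, rest)
  | LParen :: rest ->
      (match parse_expression rest with
       | (e, RParen :: rest') -> (e, rest')
       | _ -> failwith "expected )")
  | _ -> failwith "expected a factor"

(* parse_expression [Id 'a'; Plus; Id 'b'; Times; Id 'c'; Plus; Id 'b']
   builds the tree for ((a + (b * c)) + b): * binds tighter than +,
   and + associates to the left, just as the grammar requires. *)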
Extended BNF (EBNF) is handy: it adds shorthand for optional items ([ ]) and for repetition ({ }).
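For example, since braces mean "zero or more repetitions," the first two productions above can be written in EBNF without left recursion:

<expression> ::= <term> { <addop> <term> }
<term> ::= <factor> { <multop> <factor> }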
Syntax diagrams: an alternative to BNF. Syntax diagrams are never
recursive; they use "loops" for repetition instead.
Problems with Ambiguity
Suppose given grammar:
<statement> ::= <unconditional> | <conditional>
<unconditional> ::= <assignment> | <for loop> |
begin {<statement>} end
<conditional> ::= if <expression> then <statement> |
if <expression> then <statement> else <statement>
How do you parse: if exp1 then if exp2 then stat1 else stat2 ?
Could be
- if exp1 then (if exp2 then stat1 else stat2) or
- if exp1 then (if exp2 then stat1) else stat2
I.e. What happens if exp1 is true and exp2 is false?
Ambiguous
Pascal rule: else attached to nearest then
To get second form, write:
if exp1 then
    begin
        if exp2 then stat1
    end
else
    stat2
C has similar ambiguity.
MODULA-2 and ALGOL 68 require "end" to terminate conditional:
- if exp1 then if exp2 then stat1 else stat2 end end
- if exp1 then if exp2 then stat1 end else stat2 end
(Algol 68 actually uses fi instead of end)
Why isn't it a problem in ML?
Determining whether a grammar is ambiguous is, in general, undecidable.
Chomsky developed the mathematical theory of formal languages underlying programming language syntax:
- type 0: recursively enumerable
- type 1: context-sensitive
- type 2: context-free
- type 3: regular
BNF (or syntax diagrams) = context-free;
context-free languages can be recognized by pushdown automata.
Not all aspects of programming language syntax are context-free.
- Examples: requiring declaration before use, and requiring that a goto targets a declared label.
Formal description of syntax allows:
- programmer to generate syntactically correct programs
- parser to recognize syntactically correct programs
Parser generators (and lexical-analyzer generators): LEX, YACC (available for C and many
other languages, e.g., ML), the Cornell Program Synthesizer Generator.
Abstraction
Programming language creates a virtual machine for programmer
Dijkstra: Originally we were obligated to write programs so that a
computer could execute them. Now we write the programs and the computer has
the obligation to understand and execute them.
Progress in programming language design marked by increasing support for
abstraction.
A computer at the lowest level is a set of charged particles racing through wires, w/
memory locations switching on and off - very hard to deal with.
In computer organization we look at a higher level of abstraction:
interpret sequences of on/off bits as data (reals, integers, chars, etc.) and as
instructions.
Computer looks at current instruction and contents of memory, then does
something to another chunk of memory (incl. registers, accumulators, program
counter, etc.)
When we write a Pascal (or other high-level language) program, we work with a different
virtual machine:
- Integers, reals, arrays, records, etc. w/ associated operations
Language creates the illusion of more sophisticated virtual machine.
Pure translators
Assembler: translates assembly language into machine language.
Compiler: translates a high-level language into a lower-level language (e.g., machine language).
Preprocessor: translates an extended language into the base language (e.g., expanding macros).
Execution of program w/ compiler: compile, link, load, then run the resulting object code.
Interpreter: executes the source (or an intermediate form) directly, statement by statement.
We will speak of virtual machine defined by a language implementation.
Machine language of virtual machine is set of instructions supported by
translator for language.
Layers of virtual machines on Mac:
Bare 680x0 chip, OpSys virtual machine, MacPascal (or Lightspeed Pascal)
machine, application program's virtual machine.
We will describe language in terms of virtual machine
Slight problem:
- Different implementors may have different conceptions of virtual machine
- Different computers may provide different facilities and operations
- Implementors may make different choices as to how to simulate elements of the virtual
machine
May lead to different implementations of same language - even on same
machine.
Problem : How can you ensure different implementations result in same
semantics?
Sometimes virtual machines made explicit:
- Pascal P-code and P-machine,
- Modula-2 M-machine and M-code,
- Java virtual machine.
Compilers and Interpreters
While a few special-purpose chips exist that execute high-level languages directly
(e.g., the LISP machine), most programs must be translated into machine language.
Two extreme solutions:
Pure interpreter: simulate the virtual machine (our approach to run-time
semantics); a concrete sketch for the toy expression language follows the Pure Compiler description.
REPEAT
    Get next statement
    Determine action(s) to be executed
    Call routine to perform action
UNTIL done
Pure Compiler:
- Translate all units of program into object code (say, in machine language)
- Link into single relocatable machine code
- Load into memory
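As promised above, here is a sketch of the pure-interpreter loop for the toy expression language of the earlier grammar, reusing the parser sketch. The lexer, evaluator, and fixed environment are illustrative assumptions, not the text's: each "statement" is one line of input, which is fetched, analyzed, and executed immediately with no code generated.

(* Lexical items for the grammar: single-character identifiers a-d and
   single-digit literals, for brevity. *)
let tokenize (s : string) : token list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else match s.[i] with
      | ' ' | '\t' -> go (i + 1) acc                  (* skip white space *)
      | '+' -> go (i + 1) (Plus :: acc)
      | '-' -> go (i + 1) (Minus :: acc)
      | '*' -> go (i + 1) (Times :: acc)
      | '/' -> go (i + 1) (Div :: acc)
      | '(' -> go (i + 1) (LParen :: acc)
      | ')' -> go (i + 1) (RParen :: acc)
      | '0'..'9' as c -> go (i + 1) (Lit (Char.code c - Char.code '0') :: acc)
      | 'a'..'d' as c -> go (i + 1) (Id c :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []

(* Evaluate a parse tree directly, without generating any code. *)
let rec eval_tree env t =
  match t with
  | Leaf (Id c) -> env c
  | Leaf (Lit n) -> n
  | Leaf _ -> failwith "operator used as operand"
  | Node (Plus, l, r) -> eval_tree env l + eval_tree env r
  | Node (Minus, l, r) -> eval_tree env l - eval_tree env r
  | Node (Times, l, r) -> eval_tree env l * eval_tree env r
  | Node (Div, l, r) -> eval_tree env l / eval_tree env r
  | Node _ -> failwith "unknown operator"

(* REPEAT ... UNTIL done, with done = end of input. *)
let interpret () =
  let env = function 'a' -> 1 | 'b' -> 2 | _ -> 3 in   (* fixed toy environment *)
  try
    while true do
      let line = read_line () in                                (* get next statement *)
      let (t, rest) = parse_expression (tokenize line) in       (* determine action *)
      if rest <> [] then failwith "extra input after expression";
      Printf.printf "%d\n" (eval_tree env t)                    (* perform action *)
    done
  with End_of_file -> ()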
Comparison of Compilation vs Interpretation
compiler                                      | interpreter
----------------------------------------------|------------------------------------------------
Only translate each statement once.           | Translate a statement only if it is executed.
Speed of execution.                           | Error messages tied to source.
                                              | More supportive environment.
Only object code in memory when executing;    | Must have the interpreter in memory when
may take more space because of expansion.     | executing (but source may be more compact).
Rarely have pure compiler or interpreter.
- Typically compile source into a form that is easier to interpret.
- E.g., remove white space & comments, build a symbol table, or parse each line
and store it in a more compact form (e.g., a tree).
Can go farther and compile into intermediate code (e.g., P-code) and then
interpret.
In FORTRAN, Format statements (I/O) are always interpreted.
Overview of structure of a compiler
Two primary phases:
- Analysis: break into lexical items, build the parse tree, generate simple intermediate
code (type checking).
- Synthesis: optimization (look at instructions in context), code generation, linking and
loading.
- Lexical analysis: break the source program into lexical items, e.g.
identifiers, operation symbols, keywords, punctuation, comments, etc. Enter
ids into the symbol table. Convert lexical items into an internal form - often a pair
of the kind of item and the actual item (for an id, a symbol table reference).
- Syntactic analysis: use the formal grammar to parse the program and build a
tree (either explicitly or implicitly through a stack).
- Semantic analysis: update the symbol table (e.g., by adding type info).
Insert implicit info (e.g., resolve overloaded ops like "+"), do error
detection - type-checking, jumps into loops, etc. Traverse the tree generating
intermediate code. (An overload-resolution sketch appears after this list.)
- Optimization: catch adjacent store-reload pairs, evaluate common
sub-expressions, move static code out of loops, allocate registers, optimize
array accesses, etc. (A loop-hoisting sketch appears after this list.)
Example:
for i := .. do ...
    for j := 1 to n do
        A[i,j] := ....
- Code generation: generate real assembly or machine code (now sometimes
generate C code). (A stack-code sketch appears after this list.)
- Linking & loading: get object code for all pieces of the program (incl.
separately compiled modules, libraries, etc.). Resolve all external references
- get locations relative to the beginning location of the program. Load the program into
memory at some start location - do all addressing relative to the base address.
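To make the semantic-analysis step concrete, here is a small sketch of overload resolution (an illustrative assumption, not the text's algorithm): it walks the tree built by the parser sketch, takes identifier types from a lookup function standing in for the symbol table, and decides whether each operator is the integer or the real operation.

type ty = TInt | TReal

type annotated =
  | ALeaf of token * ty
  | ANode of token * ty * annotated * annotated   (* operator resolved to a type *)

let rec check (lookup : char -> ty) (t : tree) : annotated =
  match t with
  | Leaf (Id c as tok) -> ALeaf (tok, lookup c)    (* type comes from the symbol table *)
  | Leaf (Lit _ as tok) -> ALeaf (tok, TInt)
  | Leaf _ -> failwith "operator used as operand"
  | Node (op, l, r) ->
      let al = check lookup l and ar = check lookup r in
      let ty_of = function ALeaf (_, t) | ANode (_, t, _, _) -> t in
      (* overload resolution: int op int is the integer operation; otherwise the
         real operation, implicitly converting an integer operand *)
      let result_ty =
        if ty_of al = TInt && ty_of ar = TInt then TInt else TReal in
      ANode (op, result_ty, al, ar)

For the optimization example, the address of A[i,j] in a row-major n-by-n array is base + (i*n + j) * elementsize; the i*n part does not change in the inner loop, so it can be moved out. A sketch of the effect (in OCaml rather than Pascal; the array layout and names are illustrative):

(* Unoptimized: i * n is recomputed for every j. *)
let fill_naive (a : float array) (n : int) =
  for i = 0 to n - 1 do
    for j = 0 to n - 1 do
      a.(i * n + j) <- 0.0
    done
  done

(* After moving the loop-invariant computation out of the inner loop. *)
let fill_hoisted (a : float array) (n : int) =
  for i = 0 to n - 1 do
    let row_base = i * n in            (* invariant with respect to j *)
    for j = 0 to n - 1 do
      a.(row_base + j) <- 0.0
    done
  done

Finally, a code-generation sketch in the spirit of P-code rather than real machine code (again an illustrative assumption): compile the parse tree into instructions for a simple stack machine. The abstract machine sketched under Semantics below can execute these.

type instr =
  | PushVar of char        (* push the value of an identifier *)
  | PushConst of int       (* push a literal *)
  | BinOp of token         (* pop two values, apply the operator, push the result *)

let rec gen (t : tree) : instr list =
  match t with
  | Leaf (Id c) -> [PushVar c]
  | Leaf (Lit n) -> [PushConst n]
  | Leaf _ -> failwith "operator used as operand"
  | Node (op, l, r) -> gen l @ gen r @ [BinOp op]

(* For a + b * c + b this produces
   [PushVar 'a'; PushVar 'b'; PushVar 'c'; BinOp Times; BinOp Plus;
    PushVar 'b'; BinOp Plus] *)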
Symbol table: Contains all identifier names, kind of id (vble, array
name, proc name, formal parameter), type of value, where visible, etc. Used to
check for errors and generate code. Often thrown away at end of compilation,
but may be held for error reporting or if names generated dynamically.
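One possible shape for symbol-table entries, reflecting the fields listed above (the record layout and names are illustrative assumptions, not a prescribed design):

type kind = Variable | ArrayName | ProcName | FormalParam

type sym_info = {
  id_kind : kind;
  value_type : string;     (* type of value, e.g. "integer" or "real" *)
  scope_level : int;       (* where visible: nesting depth of the declaration *)
}

(* name -> info; a real compiler would keep one table per scope *)
let symtab : (string, sym_info) Hashtbl.t = Hashtbl.create 64

let declare name info = Hashtbl.replace symtab name info

let lookup name =
  match Hashtbl.find_opt symtab name with
  | Some info -> info
  | None -> failwith ("undeclared identifier: " ^ name)   (* error detection *)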
We would like easily portable compilers, so split the compiler into a front end and a back end.
The front end generates intermediate code and does some peephole optimization;
the back end generates real code and does more optimization.
Semantics
Meaning of a program (once we know it is syntactically correct).
- Operational semantics for most of course.
- How would an interpreter for the language work on virtual machine?
Work with virtual (or abstract) machine when discuss semantics of programming
language constructs.
- Represent Code and Data portions of memory
- Has an instruction pointer, ip, incremented by one after each command if
not explicitly modified by the instruction.
Run program by loading it into memory and initializing ip to beginning
of program
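A sketch of such an abstract machine, here specialized to the stack code generated above (all names are illustrative assumptions): the Code portion of memory is an array of instructions, the Data portion is a value stack plus an environment for identifiers, and ip is incremented by one after each instruction.

let run (code : instr array) (env : char -> int) : int =
  let stack = ref [] in
  let push v = stack := v :: !stack in
  let pop () =
    match !stack with
    | v :: rest -> stack := rest; v
    | [] -> failwith "stack underflow"
  in
  let ip = ref 0 in                        (* ip starts at the beginning of the program *)
  while !ip < Array.length code do
    (match code.(!ip) with
     | PushVar c -> push (env c)
     | PushConst n -> push n
     | BinOp Plus -> let r = pop () in let l = pop () in push (l + r)
     | BinOp Minus -> let r = pop () in let l = pop () in push (l - r)
     | BinOp Times -> let r = pop () in let l = pop () in push (l * r)
     | BinOp Div -> let r = pop () in let l = pop () in push (l / r)
     | BinOp _ -> failwith "unknown operator");
    ip := !ip + 1                          (* incremented by one after each instruction *)
  done;
  pop ()

(* If t is the tree for a + b * c + b built by the parser sketch, then
   run (Array.of_list (gen t)) (function 'a' -> 1 | 'b' -> 2 | _ -> 3)
   evaluates it as 1 + 2*3 + 2 = 9. *)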
Official language definitions: Standardize syntax and semantics -
promote portability.
- All compilers should accept the same programs (i.e. compile w/o errors)
- All legal programs should give the same answers (modulo round-off errors,
etc)
- Designed for compiler writers and as programmer reference.
Often better to standardize after gaining experience.
- Ada was standardized before a real implementation existed.
- Common Lisp, Scheme, and ML are now standardized; Fortran '9x as well.
Good formal descriptions of syntax exist; a good formal description of semantics is still hard.
Backus, in the ALGOL 60 Report, promised a formal semantics.
- Said it was forthcoming in a few months - still waiting.
- Years after the language's introduction, problems and ambiguities remained.