Distributed in Tuesday class after break. Due on Friday promptly at beginning of class.
We cannot use a stack-based discipline for function calls in a functional language because of difficulties in returning functions as values from other functions.
As a result, activation records must be allocated from a heap. Similar difficulties in passing around closures result in most object-oriented languages relying on heap allocated memory for objects. Because it is often not clear when memory can be safely freed, such languages usually rely on an automatic mechanism for recycling memory.
In this lecture we discuss methods for automatically managing and reclaiming free space. We begin with the simpler task of managing free space.
When a request is made (e.g., via a "new" statement) for a block of memory, some strategy will be undertaken to allocate a block of memory of the desired size. For instance, one might search the list of free space for the first block which is at least as large as the block desired ("first fit"), or one might look for a "best fit" in the sense of finding a block which is as small as possible, yet large enough to satisfy the need.
Whichever technique is chosen, only as much memory as is needed will be allocated, with the remainder of the block returned to the heap of available space.
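The two search strategies and the splitting step can be sketched as follows. This is only an illustration: the representation of the free list as a Python list of (start, size) pairs and the names first_fit, best_fit and allocate are assumptions made for the example, not part of any particular allocator.

```python
# Free list as a list of (start, size) blocks; this representation is an assumption.

def first_fit(free_list, size):
    """Return the index of the first free block with at least `size` units, or None."""
    for i, (_, block_size) in enumerate(free_list):
        if block_size >= size:
            return i
    return None

def best_fit(free_list, size):
    """Return the index of the smallest free block with at least `size` units, or None."""
    best = None
    for i, (_, block_size) in enumerate(free_list):
        if block_size >= size and (best is None or block_size < free_list[best][1]):
            best = i
    return best

def allocate(free_list, size, choose=first_fit):
    """Allocate `size` units; return the start address, or None if no block fits."""
    i = choose(free_list, size)
    if i is None:
        return None
    start, block_size = free_list[i]
    if block_size == size:
        del free_list[i]          # the whole block is used up
    else:
        # Split: hand out only what is needed and keep the remainder free.
        free_list[i] = (start + size, block_size - size)
    return start
```

With free blocks of sizes 50, 20 and 30, a request for 20 units is carved out of the 50-unit block under first fit, while best fit uses the 20-unit block exactly and leaves the larger blocks intact.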
Unless action is taken, the heap will eventually be composed of smaller and smaller blocks of memory. In order to prevent this, the operating system will normally attempt to merge or coalesce adjacent blocks of free memory. Thus whenever a block of memory is ready to be returned to the heap, the adjacent memory locations are examined to determine whether they are already on the heap of available space. If either (or both) are, then they are merged with the new block and put in the heap.
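Coalescing can be sketched as below, again assuming (for illustration only) that the free list is a Python list of (start, size) pairs. For brevity this version re-sorts and rescans the whole list on every release; a real allocator would examine only the two neighbours of the returned block.

```python
def release(free_list, start, size):
    """Return block (start, size) to the free list, merging it with adjacent free blocks."""
    free_list.append((start, size))
    free_list.sort()                       # keep blocks in address order
    merged = [free_list[0]]
    for s, n in free_list[1:]:
        last_start, last_size = merged[-1]
        if last_start + last_size == s:    # block begins where the previous one ends
            merged[-1] = (last_start, last_size + n)
        else:
            merged.append((s, n))
    free_list[:] = merged
```

Releasing the block (10, 20) into a free list holding (0, 10) and (30, 10) merges with both neighbours and leaves the single block (0, 40).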
Even with coalescing, the heap can still become fragmented, with lots of small blocks of memory being used alternating with small blocks in the heap of available space.
This can be fixed by occasionally compacting memory by moving all blocks in use to one end of memory and then coalescing all the remaining space into one large block. This can be very complex since pointers in all data structures in use must be updated. (The Macintosh requires the use of handles in order to accomplish this!)
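The reason handles help can be seen in a small sketch. Here the program refers to blocks only through a handle table, so compaction has to update that one table rather than every pointer in every data structure. The list-of-cells memory model and the names compact, memory and handles are assumptions made for this illustration; this is not a description of the actual Macintosh memory manager.

```python
def compact(memory, handles):
    """memory: list of cells, with None marking free cells.
    handles: dict mapping a handle to the (start, size) of its block.
    Slides every block in use to the low end and updates only the handle table."""
    new_memory = [None] * len(memory)
    next_free = 0
    # Visit blocks in address order so their relative order is preserved.
    for h, (start, size) in sorted(handles.items(), key=lambda kv: kv[1][0]):
        new_memory[next_free:next_free + size] = memory[start:start + size]
        handles[h] = (next_free, size)     # the only "pointer" that must change
        next_free += size
    memory[:] = new_memory
    return next_free                       # start of the single remaining free block
```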
Aside from the manual reclamation of storage using operations like "dispose", there are two principal automatic mechanisms for reclaiming storage: reference counting and garbage collection. The first is an eager mechanism while the second is lazy.
Reference counting keeps, with each block of memory, a count of the pointers that refer to it. Thus when an assignment of pointers of the form p := q is executed, the block of memory that p originally pointed to has its reference count decreased by one, while the block pointed to by q has its count increased by one.
If the count on a block of memory is reduced to zero, it should be returned to the heap of available space. However, if it has pointers to other blocks of memory, those blocks should also have their reference counts reduced accordingly.
One drawback of this system is that each block of memory allocated must have sufficient space available to maintain its reference count. However, a more serious problem is the existence of circular lists. Even if nothing else points to the circular list, each item of the list will have another item of the list pointing to it. Thus even if a circular list is inaccessible from the program, the reference counts of all of its items will still be positive and it will not be reclaimed.
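The bookkeeping and the circular-list problem can be sketched as follows. Python manages its own memory, so the counts here are kept by hand purely for illustration; the class Block and the helpers inc, dec and assign are hypothetical names, and a block with count 1 that is never released stands in for the program's variables.

```python
class Block:
    """A heap block with an explicit reference count (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.fields = {}       # outgoing pointers to other blocks
        self.freed = False

def inc(block):
    if block is not None:
        block.count += 1

def dec(block):
    """Drop one reference; at zero, free the block and release what it points to."""
    if block is None:
        return
    block.count -= 1
    if block.count == 0:
        block.freed = True
        for child in block.fields.values():
            dec(child)
        block.fields.clear()

def assign(holder, field, new):
    """holder.field := new, adjusting both reference counts."""
    old = holder.fields.get(field)
    inc(new)                   # increment first so that p := p is safe
    holder.fields[field] = new
    dec(old)
```

If a root points to a, and a points to b, then clearing the root's pointer frees both blocks. If instead x and y point to each other, clearing the root's pointer to x leaves both counts at 1, so neither block is ever reclaimed.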
The mark and sweep algorithm for garbage collection starts with all objects accessible from the current environment (or symbol table), marks them and then does the same with all objects accessible from those, etc. After this phase the algorithm sweeps through all of memory, collecting those blocks which were not marked in the first phase (and unmarking the rest). Normal processing then resumes.
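A minimal sketch of the two phases, assuming the heap is modelled as a dictionary from object names to the list of names each object points to. The marks are kept in a separate set rather than in a mark bit inside each block, so no explicit unmarking step is needed; both choices are simplifications made for the example.

```python
def mark(obj, heap, marked):
    """Mark obj and everything reachable from it (iterative, to avoid deep recursion)."""
    stack = [obj]
    while stack:
        o = stack.pop()
        if o not in marked:
            marked.add(o)
            stack.extend(heap[o])

def mark_and_sweep(heap, roots):
    """heap maps object name -> list of names it points to.
    Removes unreachable objects from heap and returns the set that was reclaimed."""
    marked = set()
    for r in roots:                  # phase 1: mark from the current environment
        mark(r, heap, marked)
    garbage = set(heap) - marked     # phase 2: sweep through every block in the heap
    for o in garbage:
        del heap[o]
    return garbage
```

Unlike reference counting, this reclaims circular structures: if c and d point only to each other and neither is reachable from a root, both are collected.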
There are two problems with this technique. The first is the space necessary in order to hold the mark (though this can just be one bit). A more serious problem is that this algorithm requires two passes through memory: The first to mark and the second to collect. This can often take a significant amount of time (notice the delays in emacs, for example), making this sort of garbage collection unsuitable for real-time systems. This disadvantage has led to this method being abandoned by most practical systems (though still described in texts).
There have been several recent improvements in garbage collection algorithms. The first is sometimes known as a copying collector.
In this algorithm memory is first divided into two halves, the working half and the free half. When memory is exhausted in the working half, live nodes are copied to the free half of memory, and the roles of the two halves are switched. Notice that the collector only looks at live cells, rather than at all of memory. Copying can be done incrementally, so that very little cost is paid at any one time (less than 50 instructions, probably). This tends to work well with a virtual memory system.
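The copying step can be sketched in the style usually attributed to Cheney, which is one common way to implement a copying collector and is assumed here for illustration. Each cell is modelled as a list of indices into the working half; the names copy_collect, from_space and to_space are hypothetical. The sketch is not incremental: it copies everything in one go.

```python
def copy_collect(from_space, roots):
    """from_space: list of cells, each a list of indices (pointers) into from_space.
    Returns (to_space, new_roots) containing only the live cells."""
    to_space = []
    forward = {}                     # old index -> new index (forwarding addresses)

    def copy(old):
        if old not in forward:
            forward[old] = len(to_space)
            to_space.append(list(from_space[old]))   # copy still holds old pointers
        return forward[old]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):      # cells before `scan` have been fully updated
        to_space[scan] = [copy(p) for p in to_space[scan]]
        scan += 1
    return to_space, new_roots
```

Only cells reachable from the roots are ever touched, so the cost is proportional to the amount of live data rather than to the size of memory.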
Another strategy is to use a generational collector, in which we only bother to garbage collect recently allocated blocks of memory. Older blocks are moved into stable storage and not collected as often. Studies have shown that most reclaimed garbage comes from more recently allocated blocks of memory.
On highly parallel architectures, garbage collection can take place in the background, minimizing or eliminating delays.
State of computer corresponds to contents of memory and any external devices (I/O)
State sometimes called "store"
Note distinction between "state" and "environment". Environment is mapping between identifiers and values (including locations). State includes mapping between locations and values.
Values in store or memory are "storable" versus "denotable" (or "bindable")
Symbol table depends on declarations and scope - static
Environment tells where to find values - dynamic
State depends on previous computation - dynamic
If have compiler, use symbol table when generating code to determine meaning of all identifiers. At run-time, symbol table no longer needed (hard coded into compiled code), but state and environment change dynamically.
In interpreter, may have to keep track of symbol table, environment, and state at run-time. (In fact could avoid using state if there is no "aliasing" in the language.)
Order of evaluation can be important, especially if there are side-effects. Usually left-side evaluated first, then right-side.
A[f(j)] := j * f(j) + j --difficult to predict value if f has side effect of changing j
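To see how much the order matters, the sketch below evaluates the two sides of this assignment in both orders. The particular f (it returns its argument plus one and increments the global j) and the starting value j = 1 are assumptions chosen for the example. Operands within the right-hand side are evaluated left to right, as Python does.

```python
j = 0

def f(x):
    """Return x + 1 and, as a side effect, increment the global j."""
    global j
    j += 1
    return x + 1

def left_side_first():
    """Evaluate the subscript f(j) first, then the right-hand side."""
    global j
    j = 1
    index = f(j)
    value = j * f(j) + j
    return index, value

def right_side_first():
    """Evaluate the right-hand side first, then the subscript f(j)."""
    global j
    j = 1
    value = j * f(j) + j
    index = f(j)
    return index, value
```

Here left_side_first() returns (2, 9), so A[2] would receive 9, while right_side_first() returns (3, 4), so A[3] would receive 4.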
Two kinds of assignments:
1. assignment by copying and
2. assignment by sharing (often handy w/dynamic typing or OOL's)
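The difference shows up as soon as the original value is updated. A small sketch in Python, where plain assignment of a list shares and copy.copy makes a (shallow) copy; the variable names are illustrative:

```python
import copy

a = [1, 2, 3]
shared = a               # assignment by sharing: both names refer to one list
copied = copy.copy(a)    # assignment by copying: a new list with the same elements
a[0] = 99                # visible through `shared`, but not through `copied`
```

After the update, shared[0] is 99 while copied[0] is still 1.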
Most statements are actually control structures for combining other expressions and statements:
FORTRAN started with very primitive control structures:
Very close to machine instructions
Why need repetition - can do it all with goto's?
"The static structure of a program should correspond in a simple way with the dynamic structure of the corresponding computation." Dijkstra letter to editor.
ALGOL 60 more elaborate:
BAROQUE, all expressions re-eval each time through loop:
3, 7, 11, 12, 13, 14, 15, 16, 8, 4, 2, 1, 2, 4, 8, 16, 32.
clear & efficient, construct jump table, optimize depending on size, self-documenting.
Modula 2 improved by adding otherwise clause
    iteration specification loop
        loop body
    end loop

where iteration specification can be:
(Note: loop vble implicitly declared - restricted scope)
Also provide exit when ...., syntactic sugar for if .. then exit
Can also exit from several depths of loops
Interesting theoretical result of Bohm and Jacopini (1966) that every flowchart can be programmed entirely in terms of sequential, if, and while commands.
With commands must keep track of store: locations -> storable values.
If expressions can have side-effects then must update rules to keep track of effect on store. Rewriting rules now have conclusions of form (e, rho, s) >> (v, s') where v is a storable value, rho is an environment (mapping from identifiers to denotable values - including locations), s is initial state (or store), and s' is state after evaluation of e.
    (b, rho, s) >> (true, s')    (e1, rho, s') >> (v, s'')
    ------------------------------------------------------
         (if b then e1 else e2, rho, s) >> (v, s'')

Thus if evaluation of b and e1 have side-effects on memory, then show up in "answer".
Axioms - no hypotheses!
    (id, rho, s) >> (s(loc), s)    where loc = rho(id)

    (id++, rho, s) >> (v, s[loc:=v+1])    where loc = rho(id), v = s(loc)

Note s[loc:=v+1] is state, s', identical to s except s'(loc) = v+1.

    (e1, rho, s) >> (v1, s')    (e2, rho, s') >> (v2, s'')
    ------------------------------------------------------
           (e1 + e2, rho, s) >> (v1 + v2, s'')

When evaluate a command, "result" is a state only.
E.g.,
        (e, rho, s) >> (v, s')
    ------------------------------    where rho(x) = loc
    (x := e, rho, s) >> s'[loc:=v]

    (C1, rho, s) >> s'    (C2, rho, s') >> s''
    ------------------------------------------
             (C1; C2, rho, s) >> s''

    (b, rho, s) >> (true, s')    (C1, rho, s') >> s''
    -------------------------------------------------
          (if b then C1 else C2, rho, s) >> s''

+ similar rule if b false

     (b, rho, s) >> (false, s')
    ----------------------------
    (while b do C, rho, s) >> s'

    (b, rho, s) >> (true, s')    (C, rho, s') >> s''    (while b do C, rho, s'') >> s'''
    -------------------------------------------------------------------------------------
                            (while b do C, rho, s) >> s'''
Notice how similar the definition of the semantics for

    while E do C

is to

    if E then begin C; while E do C end
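The rules for expressions and commands can be read almost directly as a small interpreter. The sketch below is one possible transcription, not a definitive one: syntax trees are represented as tuples, rho and the store s are dictionaries, and the "num" and "<" cases are assumptions added so that a loop can be written. Stores are never updated in place, so s[loc:=v] becomes a new dictionary.

```python
def eval_expr(e, rho, s):
    """(e, rho, s) >> (v, s'): return the value and the possibly updated store."""
    tag = e[0]
    if tag == "num":                       # constants (assumed, not in the rules above)
        return e[1], s
    if tag == "id":                        # (id, rho, s) >> (s(loc), s)
        return s[rho[e[1]]], s
    if tag == "postinc":                   # (id++, rho, s) >> (v, s[loc:=v+1])
        loc = rho[e[1]]
        v = s[loc]
        return v, {**s, loc: v + 1}
    if tag in ("+", "<"):                  # "<" is assumed, by analogy with "+"
        v1, s1 = eval_expr(e[1], rho, s)
        v2, s2 = eval_expr(e[2], rho, s1)  # e2 sees the side effects of e1
        return (v1 + v2 if tag == "+" else v1 < v2), s2
    raise ValueError(f"unknown expression {e!r}")

def exec_cmd(c, rho, s):
    """(C, rho, s) >> s': the result of a command is a store only."""
    tag = c[0]
    if tag == ":=":
        v, s1 = eval_expr(c[2], rho, s)
        return {**s1, rho[c[1]]: v}
    if tag == ";":
        return exec_cmd(c[2], rho, exec_cmd(c[1], rho, s))
    if tag == "if":
        b, s1 = eval_expr(c[1], rho, s)
        return exec_cmd(c[2] if b else c[3], rho, s1)
    if tag == "while":
        b, s1 = eval_expr(c[1], rho, s)
        if not b:
            return s1
        return exec_cmd(c, rho, exec_cmd(c[2], rho, s1))  # run C, then the loop again
    raise ValueError(f"unknown command {c!r}")
```

The "while" case mirrors the two while rules: if the test is false the store after the test is the answer, otherwise the body is run and the whole loop is executed again in the resulting store, just as in the if-then unfolding.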
    for c : char in string_chars(s) do ...

where have defined:

    string_chars = iter (s : string) yields (char);
        index : Int := 1;
        limit : Int := string$size (s);
        while index <= limit do
            yield (string$fetch(s, index));
            index := index + 1;
        end;
    end string_chars;
Behave like restricted type of co-routine.
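For comparison, roughly the same iterator can be written as a Python generator, whose yield likewise suspends the iterator body until the loop asks for the next value. This is a loose translation, not CLU semantics; the indexing is adjusted because Python strings start at 0.

```python
def string_chars(s):
    """Yield the characters of s one at a time, mirroring the CLU iterator."""
    index = 0                # Python strings are 0-indexed, unlike the CLU version
    limit = len(s)
    while index < limit:
        yield s[index]       # suspend here; resume when the loop wants another value
        index += 1

def collect(s):
    """Drive the iterator with a for loop, as in `for c : char in string_chars(s)`."""
    result = []
    for c in string_chars(s):
        result.append(c)
    return result
```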
Now available in Sather, C++, and Java.
Example: Using a stack, and try to pop element off of empty stack.
Clearly corresponds to mistake of some sort, but stack module doesn't know how to respond.
In older languages main way to handle is to print error message and halt or include boolean flag in every procedure telling if succeeded. Then must remember to check!
Another option is to pass in a procedure parameter which handles exceptions.
Call program robust if recovers from exceptional conditions, rather than just halting (or crashing).
Typical exceptions: Arithmetic or I/O faults (e.g., divide by 0, read int and get char, array or subrange bounds, etc.), failure of precondition, unpredictable conditions (read past end of file, end of printer page, etc.), tracing program flow during debugging.
When exception is raised, it must be handled or program will fail!
Attach exception handlers to subprogram body, package body, or block.
Ex:
    begin
        C
    exception
        when excp_name1 => C'
        when excp_name2 => C''
        when others => C'
    end
When raise an exception, where do you look for handler? In most languages, start with current block (or subprogram). If not there, force return from unit and raise same exception to routine which called current one, etc., up the dynamic links until find handler or get to outer level and fail. (Clu starts at calling routine.)
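The search up the chain of callers can be observed in any language with this style of exception handling. A small Python sketch, with hypothetical names throughout: pop has no handler, its caller process has none either, so both are abandoned and the handler in main is the one that runs.

```python
class EmptyStack(Exception):
    """Raised when pop is applied to an empty stack (hypothetical name)."""

def pop(stack):
    if not stack:
        raise EmptyStack()           # no handler here: the exception leaves pop
    return stack.pop()

def process(stack, log):
    log.append("process: start")
    value = pop(stack)               # no handler here either: process is abandoned
    log.append("process: end")       # skipped when the exception propagates
    return value

def main(stack):
    log = []
    try:
        process(stack, log)
    except EmptyStack:
        log.append("main: handled")  # found by following the callers, not the program text
    return log
```

Calling main([]) returns ["process: start", "main: handled"], while main([1]) returns ["process: start", "process: end"].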
Semantics of raising and handling exceptions is dynamic rather than static!
Handler can attempt to handle exception, but give up and raise another exception.
In Ada, return from the procedure (or unit) containing the handler - called termination model.
PL/I has resumption model - go back to re-execute statement where failure occurred (makes sense for read errors, for example) unless GOTO in handler code.
Eiffel (an OOL) uses variant of resumption model.
Exceptions in ML can pass parameter to exception handlers (like datatype defs). Otherwise very similar to Ada.
Example:
    datatype 'a stack = EmptyStack | Push of 'a * ('a stack);

    exception empty;

    fun pop EmptyStack = raise empty
      | pop (Push(n,rest)) = rest;

    fun top EmptyStack = raise empty
      | top (Push(n,rest)) = n;

    fun IsEmpty EmptyStack = true
      | IsEmpty (Push(n,rest)) = false;

    exception nomatch;

    (* Character literals are used because explode returns a char list in SML '97. *)
    fun buildstack nil initstack = initstack
      | buildstack (#"("::rest) initstack = buildstack rest (Push(#"(",initstack))
      | buildstack (#")"::rest) (Push(#"(",bottom)) = buildstack rest bottom
      | buildstack (#")"::rest) initstack = raise nomatch
      | buildstack (fst::rest) initstack = buildstack rest initstack;

    (* Start from an empty stack; the string is balanced if the stack ends up empty. *)
    fun balanced string =
        (buildstack (explode string) EmptyStack = EmptyStack) handle nomatch => false;

Notice awkwardness in syntax. Need to put parentheses around the expression to which the handler is associated!
Some would argue shouldn't use exception nomatch since really not unexpected situation. Just a way of introducing goto's in code!