CS 334
Programming Languages
Spring 2000

Lecture 11


Exam: Chapters 1-7 (except 7.4.2), 10 and extra material covered in class and homework. Sections 4.6, 10.6-7 will be covered only lightly.

Distributed in Tuesday class after break. Due on Friday promptly at beginning of class.

Dynamic Language - Dynamic Scoping & Typing

Implementation of dynamic types:

Keep type descriptor of each variable available at run-time

Since type can change dynamically, so can size and contents of descriptor (e.g. # dim's and bounds).

Activation record contains ptr to descriptor which contains ptr to vble.

All accesses provide for run-time check on type - slow.

Implementation of dynamic scope:

Static link now unnecessary; to resolve a name, search down the stack for the closest activation record containing that name.

Name of vble must be stored in activation record (w/ ptr to descriptor).

Example:

	program A
		var B:integer
		procedure C;
			begin
				..... B ....
			end;
		procedure D(B:integer);
			begin
				C;
			end;
		begin
			D(12);
		end.
Trace stack during execution.
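
Below is a minimal sketch, in Python, of how dynamic-scope lookup of B might work against the stack of activation records during this trace; the record layout and function names are invented for illustration, not part of the notes.

    # Hypothetical sketch: each activation record stores the names it declares
    # along with their current values; lookup walks the stack top-down and
    # takes the closest record that defines the name (dynamic scoping).

    stack = []  # list of (procedure name, {variable name: value}) records

    def lookup(name):
        for _, bindings in reversed(stack):
            if name in bindings:
                return bindings[name]
        raise NameError(name)

    # Trace of the example program:
    stack.append(("A", {"B": 0}))    # program A declares B
    stack.append(("D", {"B": 12}))   # call D(12): parameter B = 12
    stack.append(("C", {}))          # D calls C, which declares nothing

    print(lookup("B"))  # 12: C sees D's parameter B, not A's global B
                        # (under static scoping C would see A's B instead)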

Costs: More space, slower access.

Gains: Flexibility.

Possible to implement by keeping table of loc'ns of active variables.

Overhead when entering and leaving procedures.

DYNAMIC MEMORY MANAGEMENT

(See section 10.8 in text)

We cannot use a stack-based discipline for function calls in a functional language because of difficulties in returning functions as values from other functions.

As a result, activation records must be allocated from a heap. Similar difficulties in passing around closures result in most object-oriented languages relying on heap allocated memory for objects. Because it is often not clear when memory can be safely freed, such languages usually rely on an automatic mechanism for recycling memory.

In this lecture we discuss methods for automatically managing and reclaiming free space. We begin with the simpler task of managing free space.

Memory management in the heap

A heap is usually maintained as a list or stack of blocks of memory. Initially all of the free space is maintained as one large block, but requests (whether explicit or implicit) for storage and the subsequent recycling of blocks of memory will eventually result in the heap being broken down into smaller pieces.

When a request is made (e.g., via a "new" statement) for a block of memory, some strategy will be undertaken to allocate a block of memory of the desired size. For instance, one might search the list of free space for the first block which is at least as large as the block desired ("first fit"), or one might look for a "best fit" in the sense of finding a block which is as small as possible, yet large enough to satisfy the need.

Whichever technique is chosen, only as much memory as is needed will be allocated, with the remainder of the block returned to the stack of available space.

Unless action is taken, the heap will eventually be composed of smaller and smaller blocks of memory. In order to prevent this, the operating system will normally attempt to merge or coalesce adjacent blocks of free memory. Thus whenever a block of memory is ready to be returned to the heap, the adjacent memory locations are examined to determine whether they are already on the heap of available space. If either (or both) are, then they are merged with the new block and put in the heap.
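
Here is a small Python simulation of the ideas in the last few paragraphs: a free list with first-fit allocation, splitting of oversized blocks, and coalescing of adjacent blocks on free. The sizes, names, and representation are invented for illustration.

    # Hypothetical free-list sketch: the heap's free space is a list of
    # (start address, size) blocks kept sorted by address.

    free_list = [(0, 1024)]   # initially one large free block

    def allocate(size):
        # First fit: take the first free block that is large enough,
        # splitting off the unused remainder back onto the free list.
        for i, (start, blk_size) in enumerate(free_list):
            if blk_size >= size:
                if blk_size > size:
                    free_list[i] = (start + size, blk_size - size)
                else:
                    del free_list[i]          # exact fit
                return start
        raise MemoryError("no block large enough")

    def free(start, size):
        # Return a block, then coalesce it with any adjacent free blocks.
        free_list.append((start, size))
        free_list.sort()
        merged = [free_list[0]]
        for s, sz in free_list[1:]:
            last_s, last_sz = merged[-1]
            if last_s + last_sz == s:         # adjacent blocks: merge them
                merged[-1] = (last_s, last_sz + sz)
            else:
                merged.append((s, sz))
        free_list[:] = merged

    p = allocate(100)   # p = 0;   free list is [(100, 924)]
    q = allocate(50)    # q = 100; free list is [(150, 874)]
    free(p, 100)        # free list is [(0, 100), (150, 874)]
    free(q, 50)         # adjacent blocks coalesce: [(0, 1024)]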

Even with coalescing, the heap can still become fragmented, with small blocks of memory in use alternating with small blocks in the heap of available space.

This can be fixed by occasionally compacting memory by moving all blocks in use to one end of memory and then coalescing all the remaining space into one large block. This can be very complex since pointers in all data structures in use must be updated. (The Macintosh requires the use of handles in order to accomplish this!)

Reclamation of free storage

Aside from the manual reclamation of storage using operations like "dispose", there are two principal automatic mechanisms for reclaiming storage: reference counting and garbage collection. The first is an eager mechanism while the second is lazy.

Reference Counting

Reference counting is conceptually simpler than garbage collection, but often turns out to be less efficient overall. The idea behind reference counting is that each block of memory is required to reserve space to count the number of separate pointers to it.

Thus when an assignment of pointers of the form p := q is executed, the block of memory that p originally points to has its reference count decreased by one, while the block pointed to by q has its count increased by one.

If the count on a block of memory is reduced to zero, it should be returned to the heap of available space. However, if it has pointers to other blocks of memory, those blocks should also have their reference counts reduced accordingly.
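
The following Python sketch shows what a pointer assignment p := q might do under reference counting, including the cascading decrements just described. The Block class and helper names are invented for illustration.

    # Hypothetical reference-counting sketch: each block reserves space for a
    # count of the pointers to it.

    class Block:
        def __init__(self, children=()):
            self.count = 0                 # number of pointers to this block
            self.children = list(children) # blocks this block points to

    def inc(block):
        if block is not None:
            block.count += 1

    def dec(block):
        # Decrement; on reaching zero, reclaim the block and recursively
        # decrement the counts of the blocks it points to.
        if block is None:
            return
        block.count -= 1
        if block.count == 0:
            for child in block.children:
                dec(child)
            print("reclaimed block", id(block))

    def assign(env, p, q):
        # Simulate the pointer assignment p := q.
        inc(env.get(q))                    # q's target gains a pointer
        dec(env.get(p))                    # p's old target loses one
        env[p] = env.get(q)

    # Example: b points to a, and p points to b; p := q drops the last
    # pointer to b, reclaiming b and then a.
    a = Block(); b = Block(children=[a]); inc(a); inc(b)
    env = {"p": b, "q": None}
    assign(env, "p", "q")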

One drawback of this system is that each block of memory allocated must have sufficient space available to maintain its reference count. However a more serious problem is the existence of circular lists. Even if nothing else points to the circular list, each item of the list will have another item of the list pointing to it. Thus even if a circular list is inaccessible from the program, the reference counts of all of its items will still be positive and it will not be reclaimed.

Garbage Collection

Garbage collection is a more common way of handling automatic storage reclamation. The basic idea is that computation continues until there is no storage left to allocate. Then the garbage collector marks all of the blocks of memory that are currently in use and gathers the rest (the garbage) into the heap of available space.

The mark and sweep algorithm for garbage collection starts with all objects accessible from the current environment (or symbol table), marks them and then does the same with all objects accessible from those, etc. After this phase the algorithm sweeps through all of memory, collecting those blocks which were not marked in the first phase (and unmarking the rest). Normal processing then resumes.
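
A minimal mark-and-sweep sketch in Python follows; the Cell class, the heap list, and the roots are invented for illustration. Marking starts from everything reachable, and the sweep reclaims unmarked cells while clearing the marks of the survivors.

    # Hypothetical mark-and-sweep sketch over a graph of cells.

    class Cell:
        def __init__(self, children=()):
            self.marked = False            # the mark can be a single bit
            self.children = list(children)

    heap = []                              # every allocated cell

    def mark(cell):
        if not cell.marked:
            cell.marked = True
            for child in cell.children:
                mark(child)

    def sweep():
        # Collect unmarked cells; unmark survivors for the next collection.
        global heap
        live = []
        for cell in heap:
            if cell.marked:
                cell.marked = False
                live.append(cell)
        heap = live

    def collect(roots):
        for r in roots:                    # phase 1: mark from the roots
            mark(r)
        sweep()                            # phase 2: sweep all of memory

    # Example: c is unreachable from the roots, so it gets collected.
    a = Cell(); b = Cell(children=[a]); c = Cell()
    heap.extend([a, b, c])
    collect(roots=[b])
    print(len(heap))                       # 2: only a and b survive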

There are two problems with this technique. The first is the space necessary in order to hold the mark (though this can just be one bit). A more serious problem is that this algorithm requires two passes through memory: The first to mark and the second to collect. This can often take a significant amount of time (notice the delays in emacs, for example), making this sort of garbage collection unsuitable for real-time systems. This disadvantage has led to this method being abandoned by most practical systems (though still described in texts).

There have been several recent improvements in garbage collection algorithms. The first is sometimes known as a copying collector.

In this algorithm the memory is first divided into two halves, the working half and the free half. When memory is exhausted in the working half, live nodes are copied to the free half of memory, and the roles of the two halves are switched. Notice that the collector only looks at live cells, rather than all of memory. This can be done incrementally, so that very little cost is paid at any one time (less than 50 instructions, probably). This tends to work well with a virtual memory system.
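
The sketch below (Python, with an invented Node class and forwarding-pointer scheme) shows the non-incremental core of a copying collector: only nodes reachable from the roots are copied into the free half, and forwarding pointers keep shared nodes shared.

    # Hypothetical two-space (copying) collector sketch. Everything left in
    # the old working half after the copy is garbage.

    class Node:
        def __init__(self, children=()):
            self.children = list(children)
            self.forward = None            # forwarding pointer once copied

    def copy_collect(roots, to_space):
        # Copy every node reachable from roots into to_space; return new roots.
        def copy(node):
            if node.forward is None:
                new = Node()
                node.forward = new
                to_space.append(new)
                new.children = [copy(c) for c in node.children]
            return node.forward
        return [copy(r) for r in roots]

    # Example: the working half holds a, b, and garbage; only a and b move.
    a = Node(); b = Node(children=[a, a]); garbage = Node()
    from_space = [a, b, garbage]
    to_space = []                          # the free half
    new_roots = copy_collect([b], to_space)
    print(len(to_space))                   # 2: copies of b and a (a shared)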

Another strategy is to use a generational collector, in which one only bothers to garbage collect recently allocated blocks of memory. Older blocks are moved into stable storage and not collected as often. Studies have shown that most reclaimed garbage comes from more recently allocated blocks of memory.

In highly parallel architectures, garbage collection can take place in the background, minimizing or eliminating delays.

COMMANDS OR STATEMENTS:

Change "state" of machine.

State of computer corresponds to contents of memory and any external devices (I/O)

State sometimes called "store"

Note distinction between "state" and "environment". Environment is mapping between identifiers and values (including locations). State includes mapping between locations and values.

Values in store or memory are "storable" versus "denotable" (or "bindable")

Symbol table depends on declarations and scope - static

Environment tells where to find values - dynamic

State depends on previous computation - dynamic

If we have a compiler, we use the symbol table when generating code to determine the meaning of all identifiers. At run-time the symbol table is no longer needed (it is hard-coded into the compiled code), but the state and environment change dynamically.

In an interpreter, we may have to keep track of the symbol table, environment, and state at run-time. (In fact we could avoid using the state if there is no "aliasing" in the language.)
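
A tiny Python sketch of the distinction (the identifiers, locations, and values are invented): the environment maps identifiers to locations, the state (store) maps locations to values, and aliasing is exactly two identifiers bound to the same location.

    # Hypothetical environment/store sketch.
    environment = {"x": 0, "y": 1, "z": 0}   # identifier -> location
    store       = {0: 10, 1: 20}             # location   -> storable value

    def value_of(identifier):
        return store[environment[identifier]]

    def assign(identifier, value):
        store[environment[identifier]] = value

    assign("x", 99)
    print(value_of("z"))   # 99: x and z are aliases (bound to location 0),
                           # so the assignment to x is visible through z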

Assignment:

   vble := expression

Order of evaluation can be important, especially if there are side-effects. Usually left-side evaluated first, then right-side.

		A[f(j)] := j * f(j) + j -- 
difficult to predict value if f has side effect of changing j

Two kinds of assignments:

  1. assignment by copying and

  2. assignment by sharing (often handy w/dynamic typing or OOL's)
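
A quick Python illustration of the difference (the variables are invented): copying gives each variable its own storage, while sharing makes both names refer to the same object.

    # 1. Assignment by copying: q gets its own copy, so later changes to p
    #    are invisible through q.
    p = [1, 2, 3]
    q = list(p)          # copy
    p[0] = 99
    print(q[0])          # 1

    # 2. Assignment by sharing: q and p name the same object (handy with
    #    dynamic typing and in object-oriented languages).
    p = [1, 2, 3]
    q = p                # share
    p[0] = 99
    print(q[0])          # 99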

Most statements are actually control structures for combining other expressions and statements:

Sequencing: S; T
Selection: If .. then ... else ...
Repetition: while ... do ...

FORTRAN started with very primitive control structures:

Very close to machine instructions

Why need repetition - can do it all with goto's?

"The static structure of a program should correspond in a simple way with the dynamic structure of the corresponding computation." Dijkstra letter to editor.

ALGOL 60 more elaborate.

Pascal expanded but simplified these. Ada is like Pascal but has a more uniform loop with exit:

	iteration specification loop
		loop body
	end loop;

where the iteration specification can be "while condition" or "for identifier in range". Can also have a vanilla loop (no iteration specification) which is left with an exit statement.

Also provide exit when ...., syntactic sugar for if .. then exit

Can also exit from several depths of loops

Interesting theoretical result of Böhm and Jacopini (1966): every flowchart can be programmed entirely in terms of sequential, if, and while commands.

Natural Semantics for commands

Can write natural semantics for various commands:

With commands must keep track of store: locations -> storable values.

If expressions can have side-effects then must update rules to keep track of effect on store. Rewriting rules now have conclusions of form (e, rho, s) >> (v, s') where v is a storable value, rho is an environment (mapping from identifiers to denotable values - including locations), s is initial state (or store), and s' is state after evaluation of e.

    (b, rho, s) >> (true, s')    (e1, rho, s') >> (v, s'')
    ------------------------------------------------------
          (if b then e1 else e2, rho, s) >> (v, s'')
Thus if the evaluations of b and e1 have side-effects on memory, they show up in the "answer".

Axioms - no hypotheses!

    (id, rho, s) >> (s(loc), s)        where  loc = rho(id)

(id++, rho, s) >> (v, s[loc:=v+1]) where loc = rho(id), v = s(loc)

Note s[loc:=v+1] is state, s', identical to s except s'(loc) = v+1.
    (e1, rho, s) >> (v1, s')    (e2, rho, s') >> (v2, s'')
    ------------------------------------------------------
            (e1 + e2, rho, s) >> (v1 + v2, s'')
When we evaluate a command, the "result" is a state only.

E.g.,

        (e, rho, s) >> (v, s')
    ------------------------------   where rho(x) = loc
    (x := e, rho, s) >> s'[loc:=v]

    (C1, rho, s) >> s'    (C2, rho, s') >> s''
    ------------------------------------------
             (C1; C2, rho, s) >> s''

    (b, rho, s) >> (true, s')   (C1, rho, s') >> s''
    ------------------------------------------------
          (if b then C1 else C2, rho, s) >> s''

+ similar rule if b false

     (b, rho, s) >> (false, s')
    ---------------------------
    (while b do C, rho, s) >> s'

    (b, rho, s) >> (true, s')    (C, rho, s') >> s''   
             (while b do C, rho, s'') >> s'''
    ------------------------------------------------
              (while b do C, rho, s) >> s'''

Notice how similar definition of semantics for

    while E do C
is to
    if E then begin 
        C; 
        while E do C 
    end
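
The rules above can be read off directly as an interpreter. Here is a sketch in Python; the representation of expressions and commands as tagged tuples is invented for illustration. Evaluating an expression returns (value, new store), executing a command returns just a new store, and the "while" case follows the recursive rule exactly.

    # Hypothetical interpreter following the natural-semantics rules.
    # Expressions and commands are tagged tuples, e.g. ("+", e1, e2).

    def eval_exp(e, rho, s):
        # (e, rho, s) >> (v, s')
        tag = e[0]
        if tag == "const":
            return e[1], s
        if tag == "id":                       # (id, rho, s) >> (s(loc), s)
            return s[rho[e[1]]], s
        if tag in ("+", "<"):
            v1, s1 = eval_exp(e[1], rho, s)   # e2 sees e1's side effects
            v2, s2 = eval_exp(e[2], rho, s1)
            return (v1 + v2 if tag == "+" else v1 < v2), s2
        raise ValueError(tag)

    def exec_cmd(c, rho, s):
        # (C, rho, s) >> s'
        tag = c[0]
        if tag == ":=":                       # (x := e, rho, s) >> s'[loc := v]
            v, s1 = eval_exp(c[2], rho, s)
            return {**s1, rho[c[1]]: v}
        if tag == ";":
            return exec_cmd(c[2], rho, exec_cmd(c[1], rho, s))
        if tag == "if":
            b, s1 = eval_exp(c[1], rho, s)
            return exec_cmd(c[2] if b else c[3], rho, s1)
        if tag == "while":
            b, s1 = eval_exp(c[1], rho, s)
            if not b:
                return s1
            s2 = exec_cmd(c[2], rho, s1)
            return exec_cmd(c, rho, s2)       # the recursive while rule
        raise ValueError(tag)

    # Example: while x < 3 do x := x + 1, with x stored at location 0.
    rho = {"x": 0}
    prog = ("while", ("<", ("id", "x"), ("const", 3)),
                     (":=", "x", ("+", ("id", "x"), ("const", 1))))
    print(exec_cmd(prog, rho, {0: 0}))        # {0: 3}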

