Thoughts on programming languages and tools

So that you can see what knowledge my thoughts are based on, here is a list of the languages I have used so far: Assembler (6502, 68000 and 8086), APL, Basic, C, C++, Java, Modula-2, Pascal, Perl, Rexx. "Used" means that I have actually written some (hopefully useful) programs in those languages. There are other languages I have read about, or even used to write little example programs. I did not list them, because I don't think they have had any influence on my thoughts.

I'd like to receive your comments. Tell me what you think of my ideas, and tell me about tools and languages you know that match them. Please write to me at private@claudio.ch.

Modules vs. Java packages

Java has the notion of a package. Classes belonging to the same package may access each other's members unless these are declared private. Much the same can be said of Modula-2 or Oberon modules. Within a module boundary no restrictions exist, while only explicitly exported types, variables and procedures can be accessed from outside the module.

But there is a fundamental difference between modules and Java packages. The module is also the basic compilation unit. Thus when you collect things into a module, you know that they are protected from outside interference. Java chose the class as its basic compilation unit. Thus anybody may add classes to a package, and you lose the protection against outsiders.
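
The package side of this comparison can be sketched in a few lines of Java. This is only an illustration: the class names Vault and Neighbour are made up, and both classes end up in the same (unnamed) package simply because neither declares one.

```java
// Hypothetical example: both classes live in the same (unnamed) package.
class Vault {
    // No access modifier: visible to every class in the same package.
    int balance = 100;
}

// Nothing in Vault's source lists who belongs to the package, so a class
// written later, by someone else, can join it and reach the field.
class Neighbour {
    static int drain(Vault v) {
        int taken = v.balance;
        v.balance = 0;
        return taken;
    }
}
```

A module that is compiled as one unit has no such open door: everything inside it is written, and compiled, together.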

Errors and error removal

The compiler has the task of converting your high-level code into machine language. Thanks to the additional information available in a high-level language, the compiler is also able to find errors you make, and I really like that. That's why I prefer languages like Modula-2, Oberon or Java over C++ and especially C. In my experience, you spend a lot less time in the debugger with the former.

My basic idea is that as many rules as possible should be made explicit in the source, and thus available to the compiler. If the fact that a procedure argument may only take values in the range 1 to 4 is stated in a comment or a separate document, then the compiler cannot help the programmer make sure that this rule is obeyed. I'm human and I make errors, so I want all the help my tools can give me to avoid passing the wrong value to this procedure.

The above constraint could have been formulated in Modula-2 by using a subrange type for the procedure argument. Types are a good way to define some rules in a manner which compilers can handle very efficiently. For procedure interfaces, a more general approach is to allow the definition of pre- and postconditions. This is not only my idea. I was pleased to hear Herbert Klaeren at JMLC'97 presenting an actual implementation of this idea, done in the Gardens-Point Modula-2 compiler. You can find his proposal in the paper "Executable Assertions and Separate Compilation", which he wrote together with John Gough.
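
Java has no subrange types, but the "1 to 4" rule can be approximated with a small wrapper class. The following is a minimal sketch under that assumption. The names Range1To4 and Seasons are invented for illustration, and the check runs when the value is created at run time, not at compile time as a Modula-2 compiler could do for constants.

```java
// Hypothetical stand-in for a Modula-2 subrange type [1..4].
final class Range1To4 {
    private final int value;

    private Range1To4(int value) { this.value = value; }

    // The only way to obtain an instance; rejects anything outside 1..4.
    static Range1To4 of(int value) {
        if (value < 1 || value > 4) {
            throw new IllegalArgumentException("expected 1..4, got " + value);
        }
        return new Range1To4(value);
    }

    int get() { return value; }
}

final class Seasons {
    private static final String[] NAMES = {"spring", "summer", "autumn", "winter"};

    // The parameter type states the rule, so no range check is needed here.
    static String name(Range1To4 quarter) {
        return NAMES[quarter.get() - 1];
    }
}
```

The rule now lives in the signature of name instead of in a comment, which is the point of the argument above.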

Precondition

I would have "done" it a bit differently. Please note that "done" refers only to the idea level; I am not actually implementing anything, as Mr. Klaeren managed to do.

First I'd define an ERROR statement: a very clear way to indicate that the program is erroneous when this statement is reached. Then, instead of having just an expression acting as precondition, I would define an actual PRECODE block, preceding the procedure body. There would be no restriction on what the programmer could write in the PRECODE block. But she or he has to be aware of two differences between the PRECODE block and the normal procedure body:

  1. The code written in the PRECODE block is part of the module interface. Changing it means changing the interface, and thus invalidates all client modules.
  2. The code written in the PRECODE block is compiled during the compilation of the procedure call, and is not included in the procedure code.

The second point sounds like a waste of code space: on first hearing it, one would suspect that the code is replicated with every call to the procedure. But in fact it is a means to achieve the contrary. When compiling the procedure itself, the compiler does not know with which values it will later be called, so it would be forced to compile the check in, just to be sure. When compiling a call to the procedure, on the other hand, the compiler might be able to find out whether executing the precondition code is necessary, and eliminate it if not.

In a similar fashion, a POSTCODE block could be introduced too. A POSTCODE block would also have the first property, i.e. it is part of the interface. But its code would be compiled into the module itself and be optimized together with the procedure code. Being part of the interface, it would be available to the compiler for determining which values the procedure might return.

More has to be written before all implications of these PRECODE and POSTCODE blocks are fully understood.
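
Java has no PRECODE or POSTCODE blocks, so the following is only a rough emulation of where the two checks would run. All names (Roots, preIsqrt, postIsqrt, RootsClient) are hypothetical, and an ordinary exception stands in for the ERROR statement.

```java
final class Roots {
    // Stand-in for a PRECODE block: published with the interface and
    // executed by the caller, before the call.
    static void preIsqrt(int n) {
        if (n < 0) throw new IllegalStateException("ERROR: isqrt needs n >= 0, got " + n);
    }

    // Stand-in for a POSTCODE block: also part of the interface, but
    // executed inside the module, after the body.
    static void postIsqrt(int n, int r) {
        if (r < 0 || (long) r * r > n || (long) (r + 1) * (r + 1) <= n) {
            throw new IllegalStateException("ERROR: " + r + " is not the integer root of " + n);
        }
    }

    // The body relies on the precondition and contains no check of its own.
    static int isqrt(int n) {
        int r = 0;
        while ((long) (r + 1) * (r + 1) <= n) r++;
        postIsqrt(n, r);
        return r;
    }
}

final class RootsClient {
    // The argument is a literal: a compiler that sees the precondition at this
    // call site could prove that it holds and drop the check.
    static int known() {
        Roots.preIsqrt(16);
        return Roots.isqrt(16);
    }

    // The argument is only known at run time: here the check has to stay.
    static int unknown(int n) {
        Roots.preIsqrt(n);
        return Roots.isqrt(n);
    }
}
```

In this emulation the caller has to remember to invoke preIsqrt; in the proposal the compiler would insert, or remove, that call by itself.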

Error

In the precondition section I mentioned the ERROR statement. I see it, like the exception mechanisms in other programming languages, as a means of letting the programmer extend the set of error cases. In the same way that the system or compiler says that 1/0 is an error, I want to be able to say that under certain circumstances i>5 is an error. So I need an error statement which allows me to write

IF i>5 THEN ERROR; END;

The compiler should take ERROR statements as seriously as its own errors. When I declare a 10 element array and access a[i], the compiler will generate error checking code. But when I write a[37], it will abort compilation and tell me that this program is wrong. I expect the same handling for the ERROR statement. Given the above test for i>5, if the compiler can determine that i is always greater than 5, then it should terminate with an error and not compile the program. Conversely, any optimizing compiler would completely remove the statement when it determines that the value will never be greater than 5.

What remains in the code are all those error statements which the compiler cannot decide, either because the analysis would be too complex, or because these cases really depend on the input values of the program. But having such possible error statements in my code just means that it might fail. So, as I imagine it, the right way to approach an error-free program is to make these possible error cases visible to the programmer and to give her or him the means to reduce them. How this could be done is described in the Editor section, as I think this has to be an interactive process.
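
For comparison, this is roughly how the two kinds of error look in Java, where both are detected only at run time. The class Checks and its method error are hypothetical stand-ins for the proposed statement, not an existing facility.

```java
final class Checks {
    // Hypothetical stand-in for the proposed ERROR statement.
    static void error(String what) {
        throw new IllegalStateException("ERROR: " + what);
    }

    // Programmer-defined error case, the counterpart of IF i>5 THEN ERROR; END;
    static int checked(int i) {
        if (i > 5) error("i > 5");
        return i;
    }

    // Built-in error case: Java checks the index when the access is executed.
    static int element(int[] a, int i) {
        return a[i];
    }
}
```

Calling Checks.element(new int[10], 37) compiles without complaint and fails with an ArrayIndexOutOfBoundsException only when executed, which is exactly the behaviour the proposal wants to move forward to compile time.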

Side effect free functions

A side effect free function is a function whose result depends solely on the values of its input parameters. Or, to put it differently, it is a function that you could replace by a constant array or look-up table. I suspect that being able to declare functions as side-effect free, and having a compiler which checks that they indeed are, would be beneficial, and this in two ways.

It would immediately indicate to me as a programmer that the code I'm looking at has fewer surprises to offer.

The compiler would know that it could compute the value at compile time. While Modula-2 allows constant expressions to contain only predeclared functions like ABS(), one could also allow all side-effect free functions to be used in constant expressions. In combination with the PRECODE block I defined earlier, this could be extended to functions imported from another module.
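
The look-up table remark can be made concrete with a small Java sketch. Java has no way to declare or verify that a method is side-effect free, so the distinction below exists only in the comments; the class Pure and its members are invented for illustration.

```java
final class Pure {
    // Side-effect free: the result depends only on n.
    static int triangle(int n) {
        return n * (n + 1) / 2;
    }

    // Because of that, a table filled once can replace the function
    // for the inputs 0..9.
    static final int[] TRIANGLE_TABLE = buildTable(10);

    private static int[] buildTable(int size) {
        int[] table = new int[size];
        for (int n = 0; n < size; n++) table[n] = triangle(n);
        return table;
    }

    // Not side-effect free: the result also depends on how often the
    // method has been called before, so no table could replace it.
    private static int calls = 0;

    static int counted(int n) {
        calls++;
        return n + calls;
    }
}
```

A compiler that knew triangle to be side-effect free could evaluate triangle(4) while compiling, just as it does for ABS() today.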

Memory protection at compile time

This idea builds upon the Modula-2/Oberon SYSTEM pseudo module. This module can be imported like any other module, but does not really exist. It is treated by the compiler in a special way, as it contains types and procedures which are so system-dependent and low-level that they could not be expressed in Modula-2 itself, such as procedures to access CPU registers.

As an extension of this idea, I heard that the Modula-2 standardization committee planned to make importing from SYSTEM mandatory for all features which are not portable, like type casting. So the presence or absence of an import of SYSTEM would distinguish portable modules from system-dependent ones.

Out of this I got the idea that you could move process and memory protection from run time to compile time, thus removing the need for different privilege levels in the CPU and the need for an MMU, unless you want to have virtual memory.

If you write a program in Oberon, you cannot access memory which doesn't belong to you, unless you cast an integer to a pointer or procedure type, or use SYSTEM.PUT() or SYSTEM.PUTREG(). So the only thing to do is to make sure no "normal user" has access to these features. And this is how I would do it.

Instead of using IMPORT SYSTEM to allow or disallow system-dependent stuff, I define three flavors of modules: normal, protected and system, where the last two could be denoted by preceding the word MODULE with PROTECTED or SYSTEM.

module type   may call procedures in system modules   may use system-dependent features
normal        No                                      No
protected     Yes                                     No
system        Yes                                     Yes

Three components of the OS have to behave in a defined way for the mechanism to work: compiler, loader and file system.

The file system must provide the ability to mark a file as regular file, normal module, protected module or system module. Let's call this the "file type flag". Whenever a file is created or modified, the flag is reset to regular file. It has to be changed explicitly to one of the other states.

The loader must load and execute only files marked as modules, and must not load a normal module which attempts to import a system module.

The compiler will make sure that only modules marked as system modules can access features which would break the protection mechanism, like direct addressing, unsafe casting etc.

These are the basic requirements to be fulfilled. How easy it is to fulfill them depends on the kind of OS. It becomes more difficult if one wants to allow "untrusted users" to write compilers or even file systems, as in microkernel architectures.
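
The file system and loader rules can be modelled in a few lines. The following Java sketch is a toy model under simplifying assumptions, not a design for a real OS: the names FileFlag, ModuleFile and Loader are invented, and who is allowed to call mark is left open.

```java
import java.util.List;

// The "file type flag"; all names here are hypothetical.
enum FileFlag {
    REGULAR(false, false, false),
    NORMAL_MODULE(true, false, false),
    PROTECTED_MODULE(true, true, false),
    SYSTEM_MODULE(true, true, true);

    final boolean isModule;
    final boolean mayCallSystemModules;   // checked by the loader
    final boolean mayUseSystemFeatures;   // checked by the compiler

    FileFlag(boolean isModule, boolean mayCallSystemModules, boolean mayUseSystemFeatures) {
        this.isModule = isModule;
        this.mayCallSystemModules = mayCallSystemModules;
        this.mayUseSystemFeatures = mayUseSystemFeatures;
    }
}

final class ModuleFile {
    private FileFlag flag = FileFlag.REGULAR;   // a new file is a regular file
    private String content = "";

    // File system rule: every modification resets the flag.
    void write(String newContent) {
        content = newContent;
        flag = FileFlag.REGULAR;
    }

    // The flag has to be changed explicitly to one of the other states.
    void mark(FileFlag newFlag) { flag = newFlag; }

    FileFlag flag() { return flag; }
    String content() { return content; }
}

final class Loader {
    // Loader rule: only files marked as modules are loaded, and a module
    // that may not call system modules must not import one.
    static boolean mayLoad(ModuleFile module, List<ModuleFile> imports) {
        FileFlag kind = module.flag();
        if (!kind.isModule) return false;
        for (ModuleFile imported : imports) {
            FileFlag importedKind = imported.flag();
            if (!importedKind.isModule) return false;
            if (importedKind == FileFlag.SYSTEM_MODULE && !kind.mayCallSystemModules) return false;
        }
        return true;
    }
}
```

The reset in write is what keeps the scheme honest: patching a system module with an ordinary editor turns it back into a regular file, which the loader refuses.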

Editor

Programs are usually written using a plain text editor. Even though there are editors which call themselves language sensitive, most of those I have seen merely highlight syntactical constructs and offer the possibility to enter some language constructs with just a few keystrokes.

What I'd like to see is a tool which lets me actually treat the source as a program instead of as text. Let me give you two examples of what I mean. The program examples are formatted in a rather compact way, not as I would usually do it.

Example 1: Variable renaming

Given this Java class, where I have added method p

class C {
 int cnt;
 public C() { cnt=0; }
 public void p() { int cnt; for (cnt=0; cnt<9; cnt++){ ... }; }
 public void q() { System.err.println("Counter "+cnt); }
}

I now want to rename the global variable cnt (the field of class C) to counter. I can't do that easily in a text editor, because replacing cnt with counter would change my local variable in p too. What I'd like is an editor where I could just select the global variable, enter the new name, and have it change all references to this variable to the new name.
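
For illustration, this is what such a scope-aware rename would be expected to produce. It is the same class, reformatted slightly; the loop body stays elided as in the original, and the comments are added here.

```java
// Expected result of renaming the field cnt to counter.
class C {
    int counter;                                   // declaration: renamed
    public C() { counter = 0; }                    // refers to the field: renamed
    public void p() {
        int cnt;                                   // local variable: untouched
        for (cnt = 0; cnt < 9; cnt++) { /* body elided in the original */ }
    }
    public void q() { System.err.println("Counter " + counter); }  // renamed
}
```

A plain search and replace cannot tell the two variables named cnt apart; only a tool that resolves each name to its declaration can.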

Example 2: Changing procedure parameters

Let's say I have defined a procedure

void dot(int color,int x,int y) {}

and I have already used it in several places. Later I decide that it would be more in line with some other procedures I wrote if the color parameter came last instead of first. What I'd like to have is an editor where I can just move the parameter in the declaration of dot, and it adjusts all calls accordingly.

Even better if the editor is connected to the revision control system and creates an updated version of all other sources referring to this procedure.
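
The reason a tool is needed here, and not just the compiler, can be seen in a small sketch. The Canvas class, its pixel array and the colour code RED are invented so that the example can run; only the signature of dot comes from the text above.

```java
final class Canvas {
    // Hypothetical pixel store, 4 columns by 3 rows.
    final int[][] pixels = new int[3][4];

    // Declaration after the change: color moved from first to last position.
    void dot(int x, int y, int color) { pixels[y][x] = color; }
}

final class Drawing {
    static final int RED = 7;   // hypothetical colour code

    static void draw(Canvas c) {
        // Before the change this call read: c.dot(RED, 2, 1);
        // Left as it was, it would still compile, because all three
        // parameters are int, but it would now mean x=7, y=2, color=1.
        c.dot(2, 1, RED);
    }
}
```

Since the types of the three parameters are identical, a forgotten call site produces no compiler message at all, only a wrong picture or an index error at run time.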

Interactive error removal

As I mentioned before, as a programmer I would like to know the points where my program could fail, and have a means to reduce those points of failure. This could be done interactively, if the compiler told me where those points are.

Example

VAR a:ARRAY [0..5] OF INTEGER; i:INTEGER;
 ...
 InOut.ReadInt(i);
 a[i]:=0;
 ...

This code extract contains a possible error, as the assignment to the array really corresponds to

 IF (i<0) OR (5<i) THEN ERROR; END;
 a[i]:=0;

Even the most intelligent compiler cannot optimize away the test and the ERROR statement, as the value of i is known only at run time. What would I like to have? A compiler which tells me that the a[i] statement may fail. It would do so through the editor, by highlighting the statement, and possibly even tell me under which conditions it might fail, either with an English sentence or by explicitly showing me the above IF ... ERROR statement.

Knowing about the error, I could then rewrite the source to look like this:

 InOut.ReadInt(i);
 WHILE (i<0) OR (5<i) DO
  InOut.WriteString("Only values in the range 0..5 are allowed."); InOut.WriteLn;
  InOut.ReadInt(i);
 END;
 a[i]:=0;

Then any not so dumb compiler knows that i will be in the admissible range when it is used to index the array, knows that the error will never happen, and eliminates the check.

Of course, in this simple case I would never have made the mistake in the first place, but believe me, as soon as programs get more complex, those errors occur.
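
The same pattern carries over to Java. The sketch below is a loose translation, assuming a java.util.Scanner as the input source; the class name ReadIndex is made up. Whether a particular Java runtime actually drops the bounds check after such a loop is not guaranteed, but the information needed to do so is now in the source.

```java
import java.util.Scanner;

final class ReadIndex {
    // Reads integers until one lies in 0..5, then clears that array element.
    static int[] readAndClear(Scanner in) {
        int[] a = {1, 1, 1, 1, 1, 1};   // six elements, indices 0..5
        int i = in.nextInt();
        while (i < 0 || 5 < i) {
            System.out.println("Only values in the range 0..5 are allowed.");
            i = in.nextInt();
        }
        a[i] = 0;                       // here i is known to lie in 0..5
        return a;
    }
}
```

For example, with the input "9 -1 3" the first two values are rejected and element 3 is set to 0.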


Last updated: 1998-01-10 by Claudio Nieder