Subroutines, Functions, Methods

Most, if not all, programming languages allow you to create subroutines, functions, or methods, all of which follow the same principle. You can think of them as black-boxes that contain source code to perform a specific task. For example, you could create a function make_coffee() which contains source code that makes coffee. Whenever you feel in need of a steaming hot cup of coffee in your program, you just need to invoke the function.

Now, imagine you were a tiny little person locked up inside this black-box and your job was to make that coffee. Imagine you standing at a Starbucks counter, some customer walks up and says: make_coffee(). What would your immediate reaction be? Would you inquire for more specific details on that order, such as “Tall? Grande? Venti?”, “For here or to go?” and so on?

It is easy to see that a simple command such as make_coffee() is not sufficient for you to start brewing coffee (… apart from the fact that your client must have learned manners by speed-reading 🙂 ).

In computer programming, a function usually cannot “go back asking questions” after it has been invoked. So here we need a clever customer who passes all necessary information within the invocation of the function: make_coffee( “for here”, “venti”, “americano” )

The pieces of information you pass to a function at the very moment of its invocation are called parameters. A function may have zero, one, or many parameters of any data type. I have been using three parameters of String (or char* for C-Programmers) in this example.

function1Let us now take a look at how functions can return a result. As stated above, the function make_coffee(…) will eventually deliver a hot cup of coffee. Any invocation of make_coffee(…) is going to directly represent the function’s return value at runtime. Thus, a function can return only one single result value upon each invocation. If you want a function to return more than one value, e.g. multiple cups of coffee, at a time, you’ll have to make the return type a container that may contain than one item.

The way how functions are defined and invoked basically looks similar in most programming languages. The following example shows how we define a function in C.

function1This function is capable of adding two integer values (a and b). It returns the result value directly.

We can now use (invoke) this function within other functions, whenever we need to add two values.

int result = add( 5, 3 );

Note that the function invocation add( 5, 3 ) directly represents the corresponding return value. This allows us to assign it to an arbitrary variable for further use, e.g. output.

You may also invoke functions within larger formulae. Always keep in mind that the invocation of a function represents its return result.

double z = (1/3.0) - (x*( add( 5, 8 ) + add( 1, 2 )*2 ));

If you want to impress your friends, you can even invoke functions where you pass parameters to functions. This example calculates 3 + 5 + 2 + 7.

int result = add( add( add( 3, 5 ), 2 ), 7 );

Be aware, however, that code like this may significantly decrease both readability and maintainability of your source code.

Have fun!

— Andre M. Maier

Posted in Programming Essentials, Uncategorized | Tagged , , , , , , ,

Swapping Variables

Many algorithms, including the world-famous Euclidean, require to swap the values of two variables. So you probably know the following principle.

int x = 8;
int y = 123;
int tmp = x;
x = y;
y = tmp;

However, do you know how to swap the values of two variables without using a temporary variable? If you do, feel free to skip the rest of this article. If you don’t, you’re either not one of my students, or you haven’t been paying attention due to other, more thrilling activities (shame on you!).

Well, there are basically two ways how to swap variables without using temporary memory.

1. PSM (Primary School Math) 😉

x = x + y;
y = x - y;
x = x - y;

or

x = x * y;
y = x / y;
x = x / y;

Homework: Find out why this works!

2. XOR (Exclusive-OR)

An old professor of mine used to say: “There is nothing that you can’t do with XOR”, and he proved to be right.

One characteristic of the XOR function is its ability to undo/redo itself. If you do an XOR operation twice you will get back to the initial situation. If you’re not familiar with this, you may want to read The X-Factor (OR The Power of XOR), 2. OTP (One-Time Pad) Encryption first.

So what we’re gonna do in the first line is to “stir” y into x. Variable x now “contains” both x and y. To retrieve the initial value of x, which is to be written to y, we just have to do the same operation again (see second line). y now contains the initial value of x, whereas x still “contains” both values. In the third line, we retrieve the initial value of y from x by performing the same operation yet again (remember that in this line y contains the initial value of x).

x = x ^ y;
y = x ^ y;
x = x ^ y;

Homework: Try to understand the explanation above. 😉

A warning …

… to all those who are planning to use these cool and nerdy swap methods in practical life!

Please, please. Only use the very first method, involving a temporary variable, in your every day programming projects. There are multiple reasons why you shouldn’t use the “cool” methods:

  • Today’s processors have fast accessible caches (beside other technologies) that make it very unlikely for these methods to have higher runtime efficiency.
  • Saving 4-8 byte of runtime memory isn’t worth any disadvantages.
  • You can give someone a heck of a hard time debugging your program, especially if x and y do not contain the actual values but pointers instead. (Think twice: This someone could be you!)

You cannot believe that what you have learned is totally useless? Well, every discipline of science will come up with useless facts once in a while. And it is exactly this what some people (aka nerds) may find so sexy about science. 🙂

Enjoy!

— Andre M. Maier

Posted in Bit-Twiddling, Information Theory, Uncategorized | Tagged , , , ,

A Beginner’s Guide to Object Oriented Programming (1)

What’s so fancy about Object Oriented Programming?

Computers have been invented to help real-world people solving real-world problems, even though it seems today that computers actually contribute problems to the real world. Since a plain computer doesn’t know anything about the world beyond its steel case, it takes a programmer to describe a model of the real world to the computer. The programmer’s job is to overcome the semantic gap by translating human language into a programming language.

Suppose you want to calculate the average speed of a car that travels 100 miles in 2 hours. In this case, the model of the real world only consists of a simple mathematical equation. The model will get more complex, once you begin to consider fuel consumption, headwind, tire abrasion, earth rotation, temperature, theory of relativity, and whatever crazy stuff is going on in this universe.

The world that we live in is extremely complex and reality does not consist of mathematical equations or algorithms. Instead, there are objects changing their properties by interacting with each other. There are factories in which engineers use blueprints and machines to build new objects. Engineers can copy blueprints, do some modifications to them, and build objects that are similar to the original ones. Large objects can be composed of smaller objects … I think you get the idea. As early as around 1960 some computer scientists invented a programming paradigm that uses real-world thinking to describe the model: Object Oriented Programming (OOP)

Some basic vocabulary

If you want to learn how to write object-oriented programs you first need to get familiar with some terminology.

A class is the blueprint that describes what properties an object may have and what it is able to do. The properties of an object are called attributes or fields.  “What an object is able to do” is defined in methods. Methods can also be imagined as some kind of “levers” that allow other objects to interact with this object.

The act of “building” a new object from a class is called instantiation. An object is also called an instance of a class, although it is correct to use the word object for objects. 😉

Let’s have a look at a practical example: We are going to design a class which describes the model of a door that can be opened and closed.

Firstly, we need to name the class we are going to program which is easy, because it is simply the noun of what we are going to program: Door

Secondly, we need to define how other objects, Hans for example, can interact with the door. Hans shall be able to open and to close the door. The verbs in the sentence will give you an idea of what the methods are: open() and close(). Note that there are always parentheses attached to a method name.

Finally, we’ll have a look at the attributes of the class. If you think of properties that our door may have, you’ll probably come up with a huge list. The door has a weight, a height, a width, a color, a material, a specific temperature, the state (open or closed) and so on. Always keep in mind, however, that there is no model without limitations. Hence we will focus on those properties that are essential for the model to function. You can often identify them by asking yourself: “Which of the properties may be influenced by methods of the class?” In this example we need just one attribute, isOpen, which can be true (if the door is open) or false (if the door is closed).

A common mistake is to introduce redundant attributes: isOpened and isClosed, in this example. Obviously, a door cannot be open and be closed at the same time. Whenever one of the attributes is modified in a method, the redundant information would have to be updated as well. In larger projects this can result in hardly maintainable source code, a high risk of bugs, and last but not least, waste of runtime resources.

How to describe a class

UML (Unified Modeling Language) provides a variety of diagrams that allow to describe any detail of entire software projects. Classes are described by UML class diagrams. The following class diagram shows our model of a door.

A class diagram consists of three rows. The class name (always bold and centered) is in the first row. The second row contains the attributes, and the third row contains all methods. Both boolean and void are data types. It basically means that isOpen may only have two states, either true or false. void after the methods means that nothing is returned from the method. Don’t worry about this, and don’t worry about the + and the – signs, either. I’m going to explain this in one my next postings on OOP.

So now have fun creating object-oriented models of your car, of your pet, or of your boyfriend or girlfriend. 🙂

See you around,

— Andre M. Maier

Posted in Programming Essentials, Uncategorized | Tagged , ,

1.61803398…

In order to declutter my website at school I decided to move all explanatory content over to this blog. I decided to start with an interesting number I have been using in some of my programming assignments. This magic number is the numeric result of a simple problem found by Euclid about 300 BC.

Imagine a line which is described by two line segments a and b.

Euclid’s assignment is to position the point in a way that a/b is equal to (a+b)/a. In plain English. What is the ratio when a over b is the same at total length over a.

Any high school student should be able to solve the problem, because it just involves solving a quadratic equation. (In case you didn’t like math back in high school, here’s how it works 😉

Since negative lengths (probably) do not exist in real life, we may focus on the positive solution which is … tadaa … 1.61803398

So why is this number so damn awesome? Well, the are plenty of reasons.

One reason is that there are a lot of different algorithms that lead exactly to this number. For instance, consider the following sequence of numbers:

1, 1, 2, 3, 5, 8, 13, 21, 34, …

This series is called the “Fibonacci Series”. You might have noticed that each new number is determined by adding the two previous numbers.

So what happens if you choose a random number out of this series and divide it by its predecessor? Let us choose “34”: 34 devided by 21 equals 1.61905 — a number that comes very close to the above solution. If we choose higher numbers from the series we get closer and closer to what is called the “Golden Ratio“.

Ok, I must admit that there are many more other mathematical constants that can  be determined by different seemingly independent algorithms, but here are the really interesting facts about the Golden Ratio:

  • You will find rectangular shaped things (e.g. TV screens, swimming pools) more appealing, if their width/height ratio is about 1.618.
  • The Golden Ratio defines the dimensions of the human profile and even those of our teeth.
  • Many artists (most of them unconsciously) rely on the Golden Ratio to make their pictures look appealing.
  • The shell of certain snails reveal the Golden Ratio (most famous among them is the nautilus pompilius)
  • You will find the Golden Ratio in the growth pattern of certain ferns
  • … and more

Well, so if you meet an ugly person don’t call him or her ugly. Say “Your face seems to have missed the Golden Ratio by a mile.” instead.

Cheers,

— Andre M. Maier

Posted in Math, Programming Assignments, Uncategorized | Tagged , ,

Yet another “trick” that isn’t really a trick …

In my attempt to answer a question during my lecture this afternoon I typed in a quick-and-dirty program that was supposed to illustrate what I was saying. In the program I had to pass a variable number of struct pointers to a function, which I thought was a piece of cake. However, the straight-forward technique I’d been using made my poor students staring at the screen in disbelief. Although they would have had the technical background, they said they did not understand what I was doing.

What had struck them was something like this. (Note: I have changed the data type to int, and I have also simplified the code in function f to merely display the values passed, which hopefully will direct your attention to what is essential.)

#include <stdio.h>

void f( int* args ) {
  while( *args ) {
    printf( "%d ", *args );
    args++;
  }
  printf( "\n" );
}

int main( int argc, char* argv[] ) {
  f( (int[]){ 1, 5, 8, -2, 3, 6, 0 } );
  f( (int[]){ 3, -6, 2, 20, 0 } );
}

Function f is invoked by function main() twice. In both calls of f, a sequence of integer numbers (array literal) is type-casted to an array of int. Function f will now receive a pointer to a sequence (array) of int values. Hence, the argument of f must be an int pointer. Dereferencing this pointer will give us the first value. The number of values (or length of the array) is determined by the terminating zero.

This “trick” (which isn’t really a trick) is based on a principle that you have already used when passing a text strings to functions in C. Strings in C are arrays of characters, and any array is represented by a pointer that holds the address of its first element. So it doesn’t matter which data type you want to pass. Feel free to pass ints, doubles, or even struct pointers. Easy, isn’t it?

If you still do not feel comfortable with pointers, read this (Intermediate) or this (Introduction).

Have fun!

— Andre M. Maier

Posted in Pointers, Programming Essentials, Uncategorized

Privacy on the Internet? (non-technical)

Last week I had a lengthy discussion with a friend about privacy issues pertaining to social networks on the Internet. Privacy and social networks? The longer I think about it, the more I believe that our society must be sick … seriously sick.

There is people out there, who do not want any other human to know their name, date of birth, or even the obvious fact of their existence. There is nothing wrong with this decision, and I fully respect the human’s right to freedom.

What really drives me nuts, however, is the fact that such persons sign in on facebook, Google+, online stores, etc. I’m sorry, but in my personal opinion there is no other word for it than ‘digital media illiteracy’. The reason why I’m using such a harsh expression is that interrelations between privacy and worldwide computer networking should be very easy to understand. You don’t need a grade in computer science to know that if you interconnect all computers in the world by zillions of cables, any data within this network could potentially go everywhere in the world. Furthermore, any computer connected to the network may share its free resources with other computers, which results in a tremendous computing performance. Both the accessibility of data and the enormous computing power may allow “the net” to gain new information from existing data.

In other words, “the net” is capable of calculating missing information about you from information that already exists somewhere on “the net”. Similar to a CSI profiler, “the net” will follow all traces you leave while browsing websites, using search engines, registering software, checking your email and bank accounts, and so on.

Some people may argue that gathering such information implies illegal activities such as hacking. What they forgot to consider is that ‘illegal’ always refers to some legislation in some specific state, country, or part of the world. What they also did not take into account is that it is relatively easy for hackers to disguise their real identity.

The reality is dark, creepy, and terrifying. Anything else is putting lipstick on the pig!

I doubt that it would be possible to stay safe from theft of privacy in today’s world, especially for anyone participating in social life. If you want to have a try, you should never

  • own a computer
  • own a mobile phone
  • receive or send any email
  • participate in activities that may be discussed on the web
  • participate in sweepstakes or lottery
  • meet a person who might send information about you into the web
  • rely on legislation …

Even in the “good old times”, long before the invention of computer networks, people became victims of fraud or theft of information. What has changed since then is not the technical possibility, but only the means and the efficiency of collecting and processing information.

Instead of enacting hundreds of national laws that misleadingly suggest safety to the technically unversed, we should boost media education. If you are a teacher, talk to your students about this issue! If you have children, keep an eye on what they’re doing on the web!

Have a Merry Christmas and a Happy New Year, and …

… see you on “the net” 😀

— Andre M. Maier

Posted in Non-technical, Uncategorized | Tagged , ,

The X-Factor (OR The Power of XOR)

Once upon a time, a girl (non-computer scientist) had bought two pieces of cake for coffee. She asked her nerdy boy-friend: ‘Do you want this one or that one?’ He said ‘Thanks!’ and he quickly moved both pieces onto his plate. If you know why he behaved this way, you know you’re into computer science. 😀

What the girl should have said is something like that: ‘Do you want this one XOR that one?‘ (which could be translated for normal people as ‘You may freely choose either this one or that piece of cake.‘)

Most of our real-life decisions are being based on XOR logic rather than just OR. X stands for eXclusive, which means that one of the conditions (namely the condition involving ‘both‘) is excluded.

Not only in real-life, but also in computer science, the XOR is a fundamental logic function. As we will see later in this article, it is used in numerous important technical applications.

The XOR function is usually described by a logic symbol, a truth table, or a boolean equation (or all three of them 😉 ).

1. Adding numbers

In binary system 0+0=0, 0+1=1, 1+0=1, and 1+1=0 ‘carry 1’. If you compare the bold printed numbers with the x-column in the truth table above, it is easy to find out that the sum is the result of an XOR function between both summands.

Thus, a simple electronic circuit that is capable of adding two bits looks like this.

The second logic gate beside the XOR function is being used to generate the carry bit. The carry bit is only set to 1, if  both summands are 1 (which means logic AND).

When adding larger numbers you need to XOR each digit of the number, including the carry bit that is to be considered in all digits except in the least significant bit.

2. OTP (One-Time Pad) Encryption

The result of A XOR B is a measure of “how different A is from B” (or vice versa). If a bit of a word A is different from the same bit of a word B, the resulting bit is 1.

This fact is being used in cryptography. Let us assume you have a piece of plain text, e.g. FLY, which you would like to encrypt with a secret word (key), e.g. EST. This can be easily accomplished by using an XOR function between FLY and EST. So let’s do FLY xor EST (both texts written in 8bit ASCII Code, of course).

01000110 01001100 01011001 (FLY)
01000101 01010011 01010100 (EST)
-------------------------- xor
00000011 00011111 00001101 (encrypted message)
==========================

The result of FLY xor EST is the encrypted message. It is a bit pattern that doesn’t seem to make any sense. If we tried to ASCII-translate this message back into characters, we would only get a bunch of senseless control characters.

Remember, that the resulting bit pattern describes which bits are different between the original plain text and the secret key. In a binary system there aren’t too many options for a bit to be ‘different’, which allows us to decrypt the message in exactly in the same way.

01000101 01010011 01010100 (EST)
00000011 00011111 00001101 (encrypted message)
-------------------------- xor
01000110 01001100 01011001 (FLY)
==========================

This is totally fly, isn’t it?

Although this method of encryption/decryption is amazingly safe, there are some deficiencies. Firstly, it relies on a secret key which needs to be transported from the sender to the receiver. From history we know that this phase can get you into deep shit …

Secondly, if your key is shorter than your message you would probably repeat the key over and over again. Any repetition inevitably adds redundancy to your encrypted message, which makes it easier to break the code. (I won’t describe here how this works, but some of you, dear readers, will have the honor to encounter a similar assignment in your final exam. 😉 )

3. Generating Random Numbers

I know that some of you have made different experience, but your computer is a deterministic machine that always behaves in a logical and predictable way. So it is virtually impossible to use a computer for generating real random numbers.

There are hundreds of different algorithms that allow to generate pseudo-random numbers based on a so-called seed. The seed is a number or a bit pattern that is fed into an algorithm which calculates a sequence of numbers that looks pretty much random. To make the seed less predictable, it is chosen from system time, recent mouse movement, keystrokes since login, etc.

In many applications, however, a cheap and simple pseudo-random generator will do the job. All you need is a linear shift register and – look and behold – an XOR function.

The result of the XOR function is fed back into the rightmost bit of the linear shift register. The two leftmost bits of the shift register form the input of the XOR gate. Although the sequence of numbers will repeat after (2^n) – 1 states, the sequence itself seems to be random. There are numerous ways how to build a pseudo-random generator from a linear shift register that is fed back by one or more XOR gates. This one is not very sophisticated, but in many cases it will be sufficient enough.

Check this out for more information on linear feedback shift registers (LFSR): http://en.wikipedia.org/wiki/Linear_feedback_shift_register

Hint: Write a C program that simulates this principle (an excellent exercise on masking and bit shifting)!

See you,

— Andre M. Maier

Posted in Bit-Twiddling, Uncategorized | Tagged , , ,

Lost in translation (C vs. Assembly Language)

In my classes on microcontroller programming I often draw comparisons between C and assembly language. This is to show my students that either programming language has both advantages and disadvantages. While the efficiency of C code predominantly depends on how ‘clever’ the compiler is, the efficiency of assembly programs almost entirely depends on the cleverness of the human programmer. The reason is that the content of the machine code instruction has been determined by the hardware manufacturer. This offers a larger number of possible solutions for one specific problem. A C compiler, however, will translate your source code into machine code instructions that ‘seem’ reasonable and efficient in one specific situation.

Here are two examples I use in my classes. They should work for any 8051 compatible CPUs.

Example #1

if( a == 10 ) {
   // do something
}
else {
   // do something else
}

could be translated to the following assembly code.

           cjne a, #10, not_equal
           ; do something
           jmp cont
not_equal: ; do something else
cont:      ; continue program ...

The cjne instruction (Carry Jump Not Equal) compares register a (accumulator) and the decimal constant of 10. If a is not equal to 10, the CPU will jump to the label not_equal. If a is equal to 10, the next command will be executed. In this example the CPU will continue the program after the control structure. You will probably agree that the principle above is pretty much straight forward and easy to understand.

In assembly language there is a different approach to accomplish exactly the same task, though. Consider the following code.

          xrl a, #10
          jz is_equal
          ; do something else
          jmp cont
is_equal: ; do something
cont:     ; continue program ...

In order to fully understand this code, you need to have some basic knowledge on logic functions. The xrl instruction performs an XOR operation between the accumulator and the constant value of 10. An XOR function actually measures the ‘grade of differentness’ between two values. If the values are equal, the result is zero. (If you neither know nor believe this, just have a try with pencil and paper.)

So all we have to after the XOR operation is to check whether the result (that is to be found in the accumulator) is 0. If it is zero, the CPU will jump to the label named is_equal. If the result is any value other than 0, the program continues right after the JZ (jump if accumulator zero) instruction.

It is difficult to say which of the two translations is better. Which one you should prefer will depend on a variety of criteria, e.g. runtime or storage optimization, the context (how the code fits into your program), etc.

Example #2

if( a < 100 ) {
   // do something
}
else {
   // do something else
}

Again, the cjne instruction could provide the basis of a straight forward solution. cjne compares two values, but it also sets or clears the CPUs carry bit to express ‘less than’ or ‘greater than’ respectively. So we can first use cjne on a and the constant value of 100, and then check the state of the carry bit. If the carry bit is 1, a is less than 100. If the carry bit is 0, a is greater than 100. The according assembly code would go like this:

           cjne a, #100, not_equal
           ; do something else (here a==100)
           jmp cont
not_equal: jc less_than
           ; do something else (here a>100)
           jmp cont
less_than: ; do something (here a<100)
cont:      ; continue program ...

This solution includes a lot of jmp commands, which – as you might expect – will slow down your program significantly. In fact, there is a much smarter solution in assembly code.

           subb a, #100
           jc less_than
           ; do something else (a>=100 here)
           jmp cont
less_than: ; do something (a<100 here)
cont:      ; continue program ...

This smarter way relies on the fact that if you subtract any number from a smaller number you will cause an underflow, which in turn causes the CPU to set the carry bit.

There are a lot more examples that show the influence of more or less ‘intelligent’ compilers … or programmers.

In higher programming languages, such as Java, that are being employed in an environment of sophisticated operating systems and/or virtual machines, you will often see no difference in code efficiency. This is for two reasons:
1. The underlying software layers will weed out most of your programming sins
2. In systems of high (and cheap) hardware performance it often is legitimate ‘wasting’ a couple of nanoseconds by running less efficient code. After all, a lot of hardware performance is being spent on reducing the semantic gap between human and machine, anyway.

See you in class,

— Andre M. Maier

Posted in Programming Essentials, Uncategorized | Tagged , , ,

Source Code Readability

Summer holidays ahead, yay! This will be the time when my job is an enviable one. Having spent the last couple of weeks grading never ending piles of final exams and thesis projects I feel like I deserve it. Most of this time has been dedicated to reading and to understanding sets of twenty or more programs, which all were supposed to solve the same assignment, but followed different approaches, contained a huge variety of logical errors, etc. Some of these programs were extremely hard to read. Working through them can be both tedious and annoying for teachers and college professors. For companies, however, bad source code can be really expensive. Poorly readable code is a ticking time bomb, in fact.

If you are an experienced professional programmer you probably know that every programmer has his or her personal “handwriting”. You merely have to look at some lines of code and you instantly know who the author was (or wasn’t). In this posting I’m not going to preach a certain “handwriting”. It only concentrates on the most important rules to improve the readability of your code. Programming is similar to speaking a language. A slight accent can make you sound sexy, whereas a strong accent may negatively impact the intelligibility of what you are saying.

1. Choose names wisely!

Always use appropriate names when naming variables, methods, parameters, attributes, constants, classes, and so on. If a variable is used for, say, counting events, name it numberOfEvents rather than n. You will notice that code like this requires far less comments. (Reading comments is time consuming!)

// bad code
int n = 0; /* counts the number of events */

// better code
int numberOfEvents = 0;

Boolean variables may be true or false. If you put is or has in front of the actual variable name (e.g. isEmpty), the name automatically explains that if the value is true.

// bad code
boolean temp; /* true, if bottle is empty */

// better code
boolean isEmpty;

Variables should express the fact that an action is triggered when they are executed. They contain a set of instructions, so they do something. Java Code Conventions, for example, recommend to let the name begin with a verb.

// bad code
public double mean_value( double[] a ) { ...

// better code
public double calculateMeanValue( double[] a ) { ...

Find out if there are any code conventions for your programming language and/or platform and stick to them as close as possible.

Here is an example of what will happen when conventions are ignored: Let’s have a look at the following two lines from a Java program.

A.m1();
a.m2();

If you asked an experienced Java programmer which of the methods is static, he would most probably assume that the correct answer was m1. He might be right in perhaps more than 90% of all Java programs there are in the world. However, if somewhere in the program A is declared to be …

a A = new a();

… his answer is 100% wrong. According to the Sun Java Code Conventions the line above should be

A a = new A();

instead. Not paying attention to upper or lower case is a tiny little sloppiness which could significantly reduce the maintainability of your program.

2. Indent, align, and break the line!

The general idea of indentation is to kind of “visualize” the structure of a program. Classes may contain methods. Methods (or functions) may contain control structures. Control structures may contain nested control structures, … you get the point.

public void m() {
    if( !x ) {
        while( y ) {
            // some code
        }
    }
}

When closing an indented block make sure to place the closing bracket exactly at the appropriate level. If you ever had to debug a “} expected” compiler error in an unindented program with more than three nested blocks, you know what I’m talking about.

Proper alignment and use of line breaks not only makes your code look clean and tidy, but it also provides for better readability.

// bad code
double tempResult,speedLimit,positionOfTheSun
,current,variableWithReallyLongName,
someOtherVariable;
int sizeInTerBytes=4,lengthOfText=7,numberOfLines=30; 

// better code
double tempResult, speedLimit, positionOfTheSun,
       current, variableWithReallyLongName,
       someOtherVariable;

int sizeInTeraBytes = 4,
    lengthOfText    = 7,
    numberOfLines   = 30

Some companies have very strict code conventions regarding indentation, line breaks, spaces, and so on. Whatever convention you’re following, be consistent!

This particularly applies to the usage of tabs in your program. Do not mix tabs and spaces! This can drive you nuts if you reopen or print your source code file. Be aware, that most IDEs provide tab-based automatic indentation by default. If you accidentally mixed tabs and spaces, check your IDEs preferences and keep an eye out for an option that is called “convert tabs to spaces” or something like that. Some people recommend to generally avoid tabs in source code.

3. Curly Brackets – Not just a matter of taste!

There is one subject among programmers which is like discussing religion. It is the simple question whether this

if( x ) {
    // some code
}

or that

if( x )
{
    // some code
}

placement of curly brackets is better. Some people claim that the second style was easier to understand for beginners. Other people say that the opening bracket belongs to the control statement and therefore should not waste a separate line of code.

Many of my students say it doesn’t matter at all, so they seize the opportunity to try out all kinds of different styles in their final project.

public static void m( boolean x, boolean y ) {
    if( x )
    {
       m1();
    }
    else {
       while( y ) {
           y = m2(); }
    }
}

This of course does not contribute to good source code readability. As to readability it doesn’t matter which style you are using as long as you keep using one style consistently.

If you are not sure which style you should be using for your program, have a look at the source code of the libraries that come with your compiler. Also, you should be aware that if you are using a style that is not commonly used in this particular programming language, you might betray yourself as being unexperienced in that language. A Java programmer who uses the second style from above may be asked: “You don’t do Java very often, right? Looks like you normally do C or something.” So when in Rome, do as the Romans do. 🙂

In case you are interested in more styles that have been invented, check out the following link: http://en.wikipedia.org/wiki/Indent_style

4. Add comments, but add them carefully!

Back in the late 80s I had been taking Pascal programming classes at school. My teacher always said: “There has to be one comment for each line of code!”. I hate to say this about things my teachers said, but this was blatant nonsense.

When you comply with all of the previously discussed rules, you will notice that you won’t need a large number of comments in your program. A comment should always explain why you wrote the code, not what the code does from the computer’s perspective. As this is difficult to explain, I’ll show you an example.

// bad code
int counter = 12; /* assigning 12 to variable
                     counter */

// better code
int counter = 12; /* the counter will count
                     downwards starting from 12 */

Always remember that any other person who needs to understand your source code is someone who knows the syntax and semantics of your programming language. So don’t explain the obvious.

What is not directly visible from the code is both the general context (overview) and why the code has been added. Many programmers prefer to write comments before writing the code.

In some programming languages there are even strict conventions on comments. Here is an example from Java which shows you how pedantic these can be.

// bad code
/**
  * This method displays "Hello World!"
  */
public static void helloWorld() { ...

// better code
/**
  * displays "Hello World!"
  */
public static void helloWorld() { ...

Whether you are pedantic or not, you should always be familiar with the conventions that apply to your programming language. By the time when you want to have your own code added to the API of this language you have to be pedantic.

I’d be happy enough if my students followed the most basic rules, which are easy to comply with.

Well, that’s all for today. Have a great weekend!

— Andre M. Maier

Posted in Programming Essentials, Uncategorized | Tagged , , , , ,

Never waste your memory! (Huffman Encoding)

In Morse code, a simple “dit” (.) represents the letter e, while a more complex sequence “dah-dit-dah-dah” (-.–) represents the letter y. As you probably know, the occurrence of e is significantly higher than the occurrence of y in most languages. So obviously Samuel F. B. Morse tried to make his code as efficient as possible for transmission of large text documents.

Morse followed a principle which is fundamental to data compression. In an uncompressed simple text file all characters may be represented by ASCII characters. The length of each character is exactly 8 bit, regardless of what character it is and how often it occurs in the text.

David A. Huffman (1925-1999) developed a fancy method which allows us to encode symbols, e.g. characters, in a way that symbols of high occurrence will be represented by shorter bit patterns than symbols of lower occurrence. In fact, Huffman encoding is one of the most efficient ways to reduce redundancy (unnecessary data) while maintaining 100% of the information. In computer science this is therefore called lossless data compression.

At this point I’d like to skip the mathematical background, which is not as hard to understand as you might think, though. Instead, I want to show you a concrete example. The German word “ERDBEEREIS” (strawberry ice cream in English) takes 10 * 8 bit = 80 bits when stored in 8-bit ASCII code.

Let’s now try to reduce its memory consumption by applying the Huffman algorithm. What we are going to do is to build a binary tree that helps us to find out the bit pattern for each letter.

Step 1 – For each different letter in the word create a node and write down its occurrence in the word.

Step 2 Take the letters that have the lowest occurrence and combine them into a new node. Add the occurrences of both letters. (In our example we might as well could have chosen ‘D’ and ‘B’ instead of ‘I’ and ‘S’).

Step 3 – Now again choose the two nodes of lowest occurrence, combine them into a new node, and so on.

The Huffman tree is now complete. We can easily find the code for a letter starting from the root and moving down to the letter. Each time we follow the left branch, we add a 0 to the code. As you can guess, each time we follow the right branch, we add a 1 to the code.

So the code for letter ‘B’ is 1011 while ‘E’ is just a single 0.

With this tree, the word “ERDBEEREIS” is coded as
0 100 1010 1011 0 0 100 0 110 111

The compressed word only takes 24 bits of memory instead of 80 bits. Pretty cool, eh? (For those of you who feel comfortable with math: The entropy of “ERDBEEREIS” is about 23.2 bits, which means the redundance is now <1 bit.)

You think we do not need to bother about “wasting” bits in an era where we can get Terabytes of memory for no money and where we have Gigabit-Ethernet? Well, you might be wrong, because the amount of data we need to store and to transport has steadily increased (well, it literally had exploded during the last decade). Just think of television on demand, high definition movies, and so on. Data compression is and will be an issue of growing importance.

Well, now relax and get yourself some ERDBEEREIS. 🙂

Cheers,

— Andre M. Maier

Posted in Bit-Twiddling, Information Theory, Uncategorized | Tagged , , , , ,