Regular Expressions

At the very beginning of this post I’d like to point out that this is neither related to a specific programming language nor related to a specific operating system, although I’ll need to employ some “real-world” examples for better understanding. Regular expressions are supported by a myriad of programming languages, text editors, command line interpreters, software applications, calculators, electric toothbrushes, and so on. What I’m saying is that anyone who even thinks about moving his or her stubby fingers near a computer keyboard ought to know at least about the existence of regular expressions and their basic capabilities. The purpose of this post is to provide you with this vital information.

Let’s consider the following example: You just wrote some source code that prompts a user to enter his or her email address. A variable named email holds the user’s input.

Now, how are you going to check if the address format is valid? If you haven’t yet heard of regular expressions, you’d probably loop through the string to count the occurrence of certain characters: A valid email address must contain exactly one ‘@’ character, a ‘.’ character, and the address must not contain any whitespaces. Now, if you think you’re done with this program, hold on a sec. We haven’t yet made sure there is at least one character preceding the ‘@’, at least one character residing between the ‘@’ and the ‘.’, and at least two (but not more than four) characters following the ‘.’ character, well, oh what about country code second level domains, such as ‘’, … blah … blah … blah …

Writing something like this in C can get pretty awkward. You may object that you could use a programming language that provides highly sophisticated string functions or methods. Still though, the solution would involve many lines of code that are apt to compromise your program’s maintainability and as well as its extensibility.

Here is where regular expressions, often referred to as regex, come in handy. They allow us to describe the format of a string in pretty much the same way like we would describe it verbally.

Let’s simplify the example above for the sake of easier understanding.

“An email address is considered valid, if it starts with a sequence of one or more characters (only lowercase letters or numbers allowed), which is followed by an ‘@’ character, which is then followed by yet another sequence of one or more characters (again lowercase or numbers), which is succeeded by the ‘.’ (dot) character and eventually followed by at least two but not more than four lowercase letters.”

Well, you may now think of Sheldon Cooper from The Big Bang Theory, but in just a second from now you will see that this is almost exactly the way we define a regular expression pattern.

Here’s the according regex:


Let’s have a closer look on the details.

regex[a-z0-9] is called a character class. It matches either a lowercase letter (ranging from a to z) or a number (ranging from 0-9). The +, marked green, indicates “one or more occurrences” of the previously described pattern. If we removed the +  from the expression, we would say that the @ character must be preceded by exactly one single [a-z0-9] character. If we replaced + by a *, the meaning would change to “zero or more occurrences”. This would make the address part left to the @ optional.

Most characters, such as the @ character, can be used directly as a literal in the regular expression. In this example, the @ indicates that we are checking for a mandatory @ character to follow.

What’s then following is something you should be familiar with by now. Yes, correct … good old [a-z0-9]+, yet another sequence of one or more lowercase letters or numbers.

The ‘.’ (dot) representation \\. looks kind of weird. However, a simple . wouldn’t do, because this would have a special meaning, similar to the + character we’ve seen before. Such special symbols are called metacharacters. To use their literals, we need to “escape” them with \\. So if you were to check for a mandatory + character, e.g. in a phone number +49…, you’d have to do a \\+ in the regular expression.

I guess I’ll no longer need to explain [a-z]. The interesting part, however, is the number of occurrences at the end of the expression: {2,4}. This says “at least 2, but not more than 4 occurrences”. If we wrote {42}, the top level domain would have to be a sequence of exactly(!) 42 lowercase letters.

Of course there are much more metacharacters you can use in regular expressions. Various sites on the Internet offer more or less complete lists. Note that the syntax of regular expression can vary slightly between different programming or scripting languages. The principle of how a regular expression is built, most character classes, and most metacharacters are always the same, though. Learn one – know ’em all.

Now that you have seen how a regex pattern is built, you may wonder how you can actually use it to perform a format check on an email address. This step may differ between different programming languages. Plus, some languages, such as Java, offer more than one way to employ regular expressions.

The following Java program shows the simplest approach in Java to regex check an email address.

import java.util.Scanner;

public class Demo {

  public static void main( String[] args ) {

    Scanner s = new Scanner( );

    System.out.print( "Enter your email address: " );
    String input =;
    String pattern = "[a-z0-9]+@[a-z0-9]+\\.[a-z]{2,4}";
    if( input.matches( pattern ) ) {
      System.out.println( "Valid." );
    else {
      System.out.println( "Invalid." );

In Java, any String object provides a method which allows you to verify if this string matches the regex pattern that is passed as its argument. Impressive, isn’t it?

Now think of what you can do with this stuff! You could dynamically generate regex patterns during runtime. Or if you’re a homo nerdus delirum, you could use regular expressions to match regular expressions. 😉

On the Linux command line interface (aka ‘Shell’), regular expressions are used to filter information. Consider this example:

ls | egrep "^[0-9]{1,3}\.jpg$"

This command lists all files with names that start with one to three numbers and end with .jpg. The regex pattern (printed bold) contains two metacharacters we haven’t yet been looking at: The ^ character to represent the beginning of a string, and the $ character to indicate the end of a string. So the expression only matches strings like 004.jpg, but not something23.jpgsomethingelse. I could show you more 😎 stuff like this, but I don’t want to make you faint from awe, hahaha.

The last remaining question is: What are the disadvantages of using regex, if there are any?

Actually, there is only one major disadvantage, which is pretty obvious especially to most of my students ;). You really need to gain some routine to formulate complex regular expressions in little time. Like all cool things in life, mastering regex requires tedious exercise.

For rare situations, you may want to keep in mind that when using regex you rely on predefined matching algorithms. Depending on what specific regex pattern you want to match, and on the programming language you are using, these algorithms can be more or less runtime efficient than hand-coded “if-trees”. For further information on what happens behind the scenes, please refer to Russ Cox’s website on regex matching.

A neat and pretty complete regex tutorial, can be found here:

See you in class,

— Andre M. Maier

Posted in Programming Essentials, Uncategorized | Tagged , , , , , ,

Pointers and References – Part 4 (Hacker Stuff) ;)

Ok, guys … party’s over. Some of you have requested some more in-depth stuff on pointers, so here we go: Function Pointer Arrays.

Depending on your previous programming experience, Function Pointer Arrays (FPA) may seem a bit nerdy to you. Nevertheless, it is basic knowledge to anyone who seriously intends to become a hacker. 😉

FPAs allow you to call functions based on the value of a numeric option. To get a better understanding of what this means, let’s consider this simple and traditional (non FPA) example.

#include <stdio.h>

int add( int a, int b ) {
  return a + b;

int mul( int a, int b ) {
  return a * b;

int main( int argc, char* argv[] ) {

  unsigned int num1, num2, choice;
  printf( "Enter 1st number: " );
  scanf( "%d", &num1 );
  printf( "Enter 2nd number: " );
  scanf( "%d", &num2 );
  printf( "Your choice (0=add, 1=multiply): " );
  scanf( "%d", &choice );

  int result = 0;
  if( choice == 0 ) result = add( num1, num2 );
  if( choice == 1 ) result = mul( num1, num2 );

  printf( "The result is %d.\n", result );

  return 0;


As you can surely see from the code, this program asks the user to enter two numbers (num1 and num2) and a numeric option (choice). Depending on the option’s numeric value, the two numbers will either be added or multiplied. Both operations, addition and multiplication, have been outsourced to two separate functions: add(…) and mul(…). The user’s input is not checked for validity, because I wanted to keep this example to be as simple as possible.

When teaching this subject in class, sometimes I would show this example to my students and ask: “Can anyone of you think of alternative ways to call different functions depending on the value of a numeric option?” The most common answer I’d get from my students is this: “Of course! We could use switch … case statements!”

There is nothing wrong with this answer, but using switchcase is only one alternative way.

What most hobbyists, script kiddies, and first semesters would not come up with is the following solution.

#include <stdio.h>

int add( int a, int b ) {
  return a + b;

int mul( int a, int b ) {
  return a * b;

int main( int argc, char* argv[] ) {

  int (*func_ptr_array[ 2 ]) ( int param1, int param2 );

  func_ptr_array[ 0 ] = add;
  func_ptr_array[ 1 ] = mul;

  unsigned int num1, num2, choice;
  printf( "Enter 1st number: " );
  scanf( "%d", &num1 );
  printf( "Enter 2nd number: " );
  scanf( "%d", &num2 );
  printf( "Your choice (0=add, 1=multiply): " );
  scanf( "%d", &choice );

  int result = (*func_ptr_array[ choice ]) ( num1, num2 );

  printf( "The result is %d.\n", result );

  return 0;


At this point, I always pause my lecture for a moment to allow my students re-hinge their previously dropped jaws. 😉

Now, let’s take a closer look on how this program works.

The lines that cause the option based function call are printed in bold type. In the first of these lines, a function pointer array (FPA) is declared. You can imagine this to be a simple one-dimensional array of pointers, similar to the char* argv[] array in the main() function header.

  int (*func_ptr_array[ 2 ]) ( int param1, int param2 );

The only notable thing about it is this: The pointers we are going to store in this array will point to program code as opposed to data. (In a pure Harvard architecture machine, function pointers refer to what is called instruction memory).

More precisely, the pointers that we are going to store in FPAs are start addresses of functions. The size of the array here in this example is 2, because there are two different functions we want to refer to, add(…) and mul(…).

The data type of the FPA corresponds to the return type of the functions. We also need to tell the compiler that each of the functions has two int parameters.

In the next two lines, we assign the storage locations of the functions to the elements of the previously declared FPA. Note, that similar to array names in C, the function’s names already represent their addresses. So add basically stands for “memory address where the implementation of the add function is located”.

  func_ptr_array[ 0 ] = add;
  func_ptr_array[ 1 ] = mul;

We have now created an array of pointers to both functions. This allows us to “invoke” both elements in pretty much the same way as we invoke functions directly.

int result = (*func_ptr_array[ choice ]) ( num1, num2 );

What needs to be done is to choose a specific element from the FPA, to dereference it, and to use it like a function name. You can pass parameters and retrieve the return value as usual. It is just the function name that has become “dynamic”. The line above can now have two different meanings, depending on the value of choice:

int result = add ( num1, num2 );


int result = mul ( num1, num2 );

Using FPAs allows you to do really fancy stuff. For example, you could use an n-dimensional FPA to invoke functions based on a whole set of options, or you could create a function with more than one implementation (similar to interfaces in OOP). Be aware, however, that function pointers can be as error prone as any other kind of pointers. Especially(!) when working with function pointers, keep your fingers off fiddling with pointer arithmetic … except you’re 200% sure about what you are doing. Jus’ sayin’ …


— Andre M. Maier

Posted in Pointers, Programming Essentials, Uncategorized | Tagged , , ,

A Beginner’s Guide to Object Oriented Programming (2)

Long time no see. I hope you still remember my recent posting on object-oriented programming. Here is part two, which I hope you are interested in. Today, we are going to have a look at inheritance. If you do not feel familiar enough with classes and objects, please refer to my posting “A Beginner’s Guide to Object Oriented Programming (1)”.

What’s inheritance got to do with programming?

In real life scenarios, an object of a specific class may be able to play different roles. Here is an example:

I am a person, and I have all the attributes and methods that are inherent to any person on this planet. I have certain height and weight. I can sleep(), eat(), and speak(). However, I always play a particular role which depends on the situation I’m in. I’m a teacher, husband, paramedic, scuba diver, runner, motorist, air passenger, …

Now, here comes an awesome fact: Each role that I play can be considered a specialized person. I play many roles, but yet I’m still only one single person. (Luckily … because if this wasn’t the case I’d be suffering from some serious mental disorder.) 😉

There is a term computer science which describes this situation: polymorphism (from Greek polymorphia, which means multiple forms).

Note, that whatever role I’m currently playing I still have all the methods and attributes of a person. For example, my teacher alter ego also has a certain height, weight, and all the capabilities (methods) that any other person may have. But what characteristics and abilities are specific to a teacher? You’ll probably agree that a teacher is able to teach, and that a teacher has to teach a certain number of classes per week.

In order to understand, how the situation above is being modelled in object-oriented programming we need to remember this: A class is a blueprint that can be used to build (instantiate) objects. What we are going to do now is to take this blueprint, put it on a xerox machine, and manually scribble in some features to get a new (extended) blueprint. The new blueprint now allows us to build objects that have both the inherited features and the new features. Of course, we can again make a copy of the new blueprint and add even more features.

The following UML class diagram shows a Teacher class that inherits attributes and methods from a class Person.


In computer science terminology, we say that class Teacher has been derived from class Person. Teacher is the superclass of Person, and Person is a subclass of Teacher. Be aware that I just wrote “the superclass” and “a subclass”. Any class should have only one direct superclass, but it may have more than one subclasses. Java and C#, among other programming languages, enforce single inheritance. (C++ does allow multiple inheritance, but if you feel like you need to use it in your code, this is usually the sign of bad OOP design.)

Person may have a superclass, such as Creature. Likewise, we can derive further classes from Teacher. Consider a school’s principal, for example. He is still a teacher, but he has additional skills, abilities, methods, whatever you want to call it. (Yay! At least in OOP class hierarchy, the principal is down below his school’s teachers.) 😉

Benefits of inheritance

One of the benefits of inheritance is pretty obvious: Reusability, which means that you don’t have to write the same (erroneous) code multiple times. Given the fact that a person, a principal, a teacher, and a plumber eat in exactly the same way, it is enough to implement this method only once.

The second benefit I’d like to mention is very similar to what’s going on in the real world. We know how to eat() and we know about the consequences of eating (some people hate ’em). 😉 However, most of us don’t have a clue about what is going on inside the eat() method. We don’t know the “guts”, so to speak. In our daily life, we appreciate objects that we can just use, without worrying about what’s going on inside. This is called Information hiding.

The third major benefit is Extensibility. As I have mentioned above you can easily enhance your model by adding more subclasses, without impacting the functionality of any of the other existing classes.

There are further benefits that you may discover as you gain experience in object oriented programming. Polymorphism, for example, allows you to write methods that accept objects of different types. The following code in Java shows the definition of method which can be invoked by passing a person, a teacher, a principal, or anything else within the class hierarchy below Person.

public static void giveThisPersonSomethingToEat( Person p ) {;

I hope this tiny and incomplete lesson has helped you to understand the very basics of inheritance. If you have further questions, feel free to ask me in my lectures.

— Andre M. Maier

Posted in Programming Essentials, Uncategorized | Tagged , , | Leave a comment

Subroutines, Functions, Methods

Most, if not all, programming languages allow you to create subroutines, functions, or methods, all of which follow the same principle. You can think of them as black-boxes that contain source code to perform a specific task. For example, you could create a function make_coffee() which contains source code that makes coffee. Whenever you feel in need of a steaming hot cup of coffee in your program, you just need to invoke the function.

Now, imagine you were a tiny little person locked up inside this black-box and your job was to make that coffee. Imagine you standing at a Starbucks counter, some customer walks up and says: make_coffee(). What would your immediate reaction be? Would you inquire for more specific details on that order, such as “Tall? Grande? Venti?”, “For here or to go?” and so on?

It is easy to see that a simple command such as make_coffee() is not sufficient for you to start brewing coffee (… apart from the fact that your client must have learned manners by speed-reading 🙂 ).

In computer programming, a function usually cannot “go back asking questions” after it has been invoked. So here we need a clever customer who passes all necessary information within the invocation of the function: make_coffee( “for here”, “venti”, “americano” )

The pieces of information you pass to a function at the very moment of its invocation are called parameters. A function may have zero, one, or many parameters of any data type. I have been using three parameters of String (or char* for C-Programmers) in this example.

function1Let us now take a look at how functions can return a result. As stated above, the function make_coffee(…) will eventually deliver a hot cup of coffee. Any invocation of make_coffee(…) is going to directly represent the function’s return value at runtime. Thus, a function can return only one single result value upon each invocation. If you want a function to return more than one value, e.g. multiple cups of coffee, at a time, you’ll have to make the return type a container that may contain than one item.

The way how functions are defined and invoked basically looks similar in most programming languages. The following example shows how we define a function in C.

function1This function is capable of adding two integer values (a and b). It returns the result value directly.

We can now use (invoke) this function within other functions, whenever we need to add two values.

int result = add( 5, 3 );

Note that the function invocation add( 5, 3 ) directly represents the corresponding return value. This allows us to assign it to an arbitrary variable for further use, e.g. output.

You may also invoke functions within larger formulae. Always keep in mind that the invocation of a function represents its return result.

double z = (1/3.0) - (x*( add( 5, 8 ) + add( 1, 2 )*2 ));

If you want to impress your friends, you can even invoke functions where you pass parameters to functions. This example calculates 3 + 5 + 2 + 7.

int result = add( add( add( 3, 5 ), 2 ), 7 );

Be aware, however, that code like this may significantly decrease both readability and maintainability of your source code.

Have fun!

— Andre M. Maier

Posted in Programming Essentials, Uncategorized | Tagged , , , , , , ,

Swapping Variables

Many algorithms, including the world-famous Euclidean, require to swap the values of two variables. So you probably know the following principle.

int x = 8;
int y = 123;
int tmp = x;
x = y;
y = tmp;

However, do you know how to swap the values of two variables without using a temporary variable? If you do, feel free to skip the rest of this article. If you don’t, you’re either not one of my students, or you haven’t been paying attention due to other, more thrilling activities (shame on you!).

Well, there are basically two ways how to swap variables without using temporary memory.

1. PSM (Primary School Math) 😉

x = x + y;
y = x - y;
x = x - y;


x = x * y;
y = x / y;
x = x / y;

Homework: Find out why this works!

2. XOR (Exclusive-OR)

An old professor of mine used to say: “There is nothing that you can’t do with XOR”, and he proved to be right.

One characteristic of the XOR function is its ability to undo/redo itself. If you do an XOR operation twice you will get back to the initial situation. If you’re not familiar with this, you may want to read The X-Factor (OR The Power of XOR), 2. OTP (One-Time Pad) Encryption first.

So what we’re gonna do in the first line is to “stir” y into x. Variable x now “contains” both x and y. To retrieve the initial value of x, which is to be written to y, we just have to do the same operation again (see second line). y now contains the initial value of x, whereas x still “contains” both values. In the third line, we retrieve the initial value of y from x by performing the same operation yet again (remember that in this line y contains the initial value of x).

x = x ^ y;
y = x ^ y;
x = x ^ y;

Homework: Try to understand the explanation above. 😉

A warning …

… to all those who are planning to use these cool and nerdy swap methods in practical life!

Please, please. Only use the very first method, involving a temporary variable, in your every day programming projects. There are multiple reasons why you shouldn’t use the “cool” methods:

  • Today’s processors have fast accessible caches (beside other technologies) that make it very unlikely for these methods to have higher runtime efficiency.
  • Saving 4-8 byte of runtime memory isn’t worth any disadvantages.
  • You can give someone a heck of a hard time debugging your program, especially if x and y do not contain the actual values but pointers instead. (Think twice: This someone could be you!)

You cannot believe that what you have learned is totally useless? Well, every discipline of science will come up with useless facts once in a while. And it is exactly this what some people (aka nerds) may find so sexy about science. 🙂


— Andre M. Maier

Posted in Bit-Twiddling, Information Theory, Uncategorized | Tagged , , , ,

A Beginner’s Guide to Object Oriented Programming (1)

What’s so fancy about Object Oriented Programming?

Computers have been invented to help real-world people solving real-world problems, even though it seems today that computers actually contribute problems to the real world. Since a plain computer doesn’t know anything about the world beyond its steel case, it takes a programmer to describe a model of the real world to the computer. The programmer’s job is to overcome the semantic gap by translating human language into a programming language.

Suppose you want to calculate the average speed of a car that travels 100 miles in 2 hours. In this case, the model of the real world only consists of a simple mathematical equation. The model will get more complex, once you begin to consider fuel consumption, headwind, tire abrasion, earth rotation, temperature, theory of relativity, and whatever crazy stuff is going on in this universe.

The world that we live in is extremely complex and reality does not consist of mathematical equations or algorithms. Instead, there are objects changing their properties by interacting with each other. There are factories in which engineers use blueprints and machines to build new objects. Engineers can copy blueprints, do some modifications to them, and build objects that are similar to the original ones. Large objects can be composed of smaller objects … I think you get the idea. As early as around 1960 some computer scientists invented a programming paradigm that uses real-world thinking to describe the model: Object Oriented Programming (OOP)

Some basic vocabulary

If you want to learn how to write object-oriented programs you first need to get familiar with some terminology.

A class is the blueprint that describes what properties an object may have and what it is able to do. The properties of an object are called attributes or fields.  “What an object is able to do” is defined in methods. Methods can also be imagined as some kind of “levers” that allow other objects to interact with this object.

The act of “building” a new object from a class is called instantiation. An object is also called an instance of a class, although it is correct to use the word object for objects. 😉

Let’s have a look at a practical example: We are going to design a class which describes the model of a door that can be opened and closed.

Firstly, we need to name the class we are going to program which is easy, because it is simply the noun of what we are going to program: Door

Secondly, we need to define how other objects, Hans for example, can interact with the door. Hans shall be able to open and to close the door. The verbs in the sentence will give you an idea of what the methods are: open() and close(). Note that there are always parentheses attached to a method name.

Finally, we’ll have a look at the attributes of the class. If you think of properties that our door may have, you’ll probably come up with a huge list. The door has a weight, a height, a width, a color, a material, a specific temperature, the state (open or closed) and so on. Always keep in mind, however, that there is no model without limitations. Hence we will focus on those properties that are essential for the model to function. You can often identify them by asking yourself: “Which of the properties may be influenced by methods of the class?” In this example we need just one attribute, isOpen, which can be true (if the door is open) or false (if the door is closed).

A common mistake is to introduce redundant attributes: isOpened and isClosed, in this example. Obviously, a door cannot be open and be closed at the same time. Whenever one of the attributes is modified in a method, the redundant information would have to be updated as well. In larger projects this can result in hardly maintainable source code, a high risk of bugs, and last but not least, waste of runtime resources.

How to describe a class

UML (Unified Modeling Language) provides a variety of diagrams that allow to describe any detail of entire software projects. Classes are described by UML class diagrams. The following class diagram shows our model of a door.

A class diagram consists of three rows. The class name (always bold and centered) is in the first row. The second row contains the attributes, and the third row contains all methods. Both boolean and void are data types. It basically means that isOpen may only have two states, either true or false. void after the methods means that nothing is returned from the method. Don’t worry about this, and don’t worry about the + and the – signs, either. I’m going to explain this in one my next postings on OOP.

So now have fun creating object-oriented models of your car, of your pet, or of your boyfriend or girlfriend. 🙂

See you around,

— Andre M. Maier

Posted in Programming Essentials, Uncategorized | Tagged , ,


In order to declutter my website at school I decided to move all explanatory content over to this blog. I decided to start with an interesting number I have been using in some of my programming assignments. This magic number is the numeric result of a simple problem found by Euclid about 300 BC.

Imagine a line which is described by two line segments a and b.

Euclid’s assignment is to position the point in a way that a/b is equal to (a+b)/a. In plain English. What is the ratio when a over b is the same at total length over a.

Any high school student should be able to solve the problem, because it just involves solving a quadratic equation. (In case you didn’t like math back in high school, here’s how it works 😉

Since negative lengths (probably) do not exist in real life, we may focus on the positive solution which is … tadaa … 1.61803398

So why is this number so damn awesome? Well, the are plenty of reasons.

One reason is that there are a lot of different algorithms that lead exactly to this number. For instance, consider the following sequence of numbers:

1, 1, 2, 3, 5, 8, 13, 21, 34, …

This series is called the “Fibonacci Series”. You might have noticed that each new number is determined by adding the two previous numbers.

So what happens if you choose a random number out of this series and divide it by its predecessor? Let us choose “34”: 34 devided by 21 equals 1.61905 — a number that comes very close to the above solution. If we choose higher numbers from the series we get closer and closer to what is called the “Golden Ratio“.

Ok, I must admit that there are many more other mathematical constants that can  be determined by different seemingly independent algorithms, but here are the really interesting facts about the Golden Ratio:

  • You will find rectangular shaped things (e.g. TV screens, swimming pools) more appealing, if their width/height ratio is about 1.618.
  • The Golden Ratio defines the dimensions of the human profile and even those of our teeth.
  • Many artists (most of them unconsciously) rely on the Golden Ratio to make their pictures look appealing.
  • The shell of certain snails reveal the Golden Ratio (most famous among them is the nautilus pompilius)
  • You will find the Golden Ratio in the growth pattern of certain ferns
  • … and more

Well, so if you meet an ugly person don’t call him or her ugly. Say “Your face seems to have missed the Golden Ratio by a mile.” instead.


— Andre M. Maier

Posted in Math, Programming Assignments, Uncategorized | Tagged , ,