Ciao
On the other hand, what makes the book stand out and redeems it as a source of instruction, is the ingenuous presentation of the algorithms and structures - the book is full of small diagrams, slide-shows and charts which help you to understand what's going on, and there is a good proportion of exercises and tangential information to make it worth reading (the example program fragments are also quite good). In general, I would say it looks like a really good all-round book on its subject, but I'd rather have bought the English version.
Speaking of presentation, here is an interesting site I discovered yesterday. It contains a (small) gallery of visualisations of sorting algorithms as well as short annotations. It's quite interesting, especially as it allows you to compare different algorithms and scenarios side by side, but it's disadvantage is that the maximum problem size (50 elements) isn't large enough to allow the asymptotic differences between the algorithms to really kick in.
http://www.sorting-algorithms.com/
Edit: Here are a few more pages with algorithm animations listed by site/author
D. Thiebaut of Smith College - Very interesting visualization, including option to display the executed code or the number of swaps and comparisons
Thomas Baudel - Very good visualization, allows you to select a variety of starting distributions.
Hat Full Of Hollow - Static presentation of small samples (see for yourself)
Morin - Includes some algorithms I haven't heard of before
Staerk - Somewhat similar to sorting-algorithms.com, only with bigger test samples, lower latency and no explanations.
Sorting Contest - The difference in speed between the different algorithms is very obvious here, as well as just how much Bubblesort sucks. Also features bi-tonic sort
Learn the tools of the trade
The debugger is your friend. Source control is your friend. Virtual machines are your friends. Diff and grep are your friends. Regular expressions are your friends. If you know how to use the appropriate tool for the situation you can save yourself a lot of drudgery and avoid repetitive task. Seeing how programming is one part text manipulation, there is a lot to be gained from a well-placed regex or Perl script. Profilers and unit tests are also important, but I don't understand them well enough to discuss them yet.
Don't make your job more difficult
As time progresses, I can identify the most problematic places in the code as the ones I need to reread most often. Making sure your flow control makes sense, is consistent and consistent can mean a lot in the long run, especially when you're adding new features. Here are some examples:
Don't control flow with exceptions or return primitive error codes if you can avoid it. Instead of throwing a different exception from inside the function, slowing down and polluting the external code, or coming up with an ad-hoc scheme which forces you and the maintainer to remember what error-code "3" means, look for something more elegant. For example:
public enum DataRetrievalResult {
DataNotAvailable,
InvalidData,
OK
public enum DateSelectionResult {
NonExistingDate,
DateOutOfRange,
OK
Enumerations like these mean that the calling code becomes much more cleaner and self-documenting, and makes handling different cases elegant. Enums also have other nifty uses; imagine you have a bunch of functions that rely on a certain state to execute properly, and which a) can call each other and b) must restore the pre-execution state upon finishing (for example, opening and closing a file, or preparing and establishing a database connection). Here is a simple solution:
enum ConnectionState {
ConnectionNotEstablished = 0x010,
Connected = 0x100
public void Func1() {
try {
int k = Func2();
//...
finally {
public int Func2() {
try {
finally {
where MakeConnection establishes a connection and returns the state the program was before it was called (in the form of a bitwise union of ConnectionState's), and RestoreConnectionState takes a parameter ConnectionState and returns the program to that state. Clearly when calling Func2 in the brown code, it will not try to re-establish the connection that Func1 created, and its RestoreConnectionState will not destroy the connection, because it was started somewhere else (in Func1;s MakeConnection()).
Prepare for trouble
Trouble happens in code and it's best to prepare for it. You don't want to spend an exhausting afternoon chasing elusive bugs and you certainly do not want to learn that your customers are having problems and you have no idea what's going on. Implementing a centralized, sensible exception-handling system is one of the first thing you need to do when you're creating an application, and making sure you log all exceptions, stack-trace included, somewhere safe so that your customers can email it to you when problems occur. There is a lot to be said about exceptions, bur I don't think I'm qualified enough on the topic to give many opinions, but I'll just repeat - avoid using exceptions to control your program. Exceptions are meant to be unexpected, and if there is an error condition that you cannot prevent with conventional checks (with the exception of OutOfMemory and the like), it means someone somewhere made a mistake.
Don't make your job more difficult n2:
Assume as little as you can about anything. This principle really revealed its importance to me when I started event-driven programming, and probably holds just as strongly in concurrent programming.
- Never assume an event is going to be fired
- Never assume an event is not going to be fired
- Never leave event handlers do the work for you.
For each event-raiser you write, create a flag which will indicate that the event is not meant to be fired. For example, you have an OnTextChanged trigger, which is supposed to fire the event when the user types something. Chances are you do not want it to fire when you are changing the text programmatically, but even if you do, it's much better to do it explicitly, rather than allow an implementation detail (which could change) do it for you:
bool bf_text_changed;
void OnTextChanged(EventArgs e) {
TextChanged(this, e);
Another idea I've found useful is to implement a catch-all flag which determines whether the event was raised by the user or by my own code.
Project Management
Take lots of notes, always carry around a convenient notebook and utilize personal information managers (I use AM Notebook). Nothing irritates your clients more than forgetting to do something you discussed with them at length, and being able to see your all your requirements on a page makes planning that much easier.
...
Of course, none of these guidelines are absolute and I'm sure someone out there disagrees or has a salient comment regarding them. Please feel free to comment if you have something to add/discuss.
Last time, we devised an implementation of the stack structure which does not allow the user to change the contents of its elements any more than the rules of mathematics allow her to change the value of the number 10 through the variable x. To recap, the stack in question was an enumeration of elements which allows the user to get the information it contains without changing the in-memory data. As an added benefit, forking mechanism used to create new Stacks means that two or more stacks instantiated from a common ancestor will share all nodes from the point of forking down. A particular Stack, seen as a sequence of nodes, will exist in memory as long as any variable that points to a Stack constructed above it exists, thanks to the linked implementation; once all references have been popped, the C# or Java Garbage Collector will take care of all nodes which are not longer part of any stacks.
However, our forking implementation also suffers a drawback which results in suboptimal use of memory. Consider
IStack<int> stack_0 = Stack<int>.Empty.Push(0).Push(-1);
IStack<int> stack_1 = stack_0;
IStack<int> stack_2 = stack_0;
//What these queries do is basically push 10, 20, ..., 1000 on both stacks
//The brown code segments are lambda functions
Enumerable.GetRange(1, 1000).Where((i) => i%10==0).Aggregate(stack_1, (s, j) => s = s.Push(j));
Enumerable.GetRange(1, 100).Select((i)=>i*10).Aggregate(stack_2, (s, j) => s = s.Push(j));
So what happens now? We have two identical 100-element Stacks, which were however created from a common parent. Clearly, both those structures contain the same information, and since they cannot be changed, we have a redundancy of 100 elements. Ideally, we would have used stack_1 to instantiate stack_2, thereby making them share 102 instead of just 2 elements (0 and -1). However, this is not always possible. By far the most likely scenario for me is that two threads with reference to the same memory location decide to perform the same sequence of Push()'es on it, which results in awful redundancy; the possibility that the same stack segment occurs randomly is a lot more improbable than two worker threads performing the same task.
The natural solution of this problem is to memoize the results of a particular Push() and give that address to all subsequent requests we consider identical. In the perfect case, we would store each node-value at most one, initializing it the first time it is requested. However, a way to do this while preserving the structural information of each Stack does not immediately occur to me, so let's do it differently - using a separate lookup table for each IStack, which will hold references to all stacks instantiated from it.
In passing, we will also make the IStack<T> interface covariant - i.e. a function which expects a IStack<T> variable will also accept any assignment-compatible type, such as IStack<U> where T < U.
First, let's remove the instance Push() from both the interface and the Stack<T> and EmptyStack<T> implementations (note that the template parameter for EmptyStack is implied by its containing class, Stack<T>
interface IStack<T> : IEnumerable<T>
{
IStack<T> Pop();
bool IsEmpty
{
static class Stack<T>
{//..
{
throw new InvalidOperationException("Empty stack");
}
public T Peek() {
throw new InvalidOperationException("Empty stack");
}
public bool IsEmpty {
static EmptyStack _empty;
public IStack<T> Empty {
private readonly IStack<T> _tail;
private readonly T _head;
public T Peek() {
_tail = tail;
//...
//...
public static Stack<T> Push(T head, IStack<T> tail) {
What's going on? Well, quite simply, given a Stack of some type, say Middle, we can only do
IStack<Bottom> st1 = Stack<Bottom>.Push(new Bottom(), Stack<Bottom>.Empty);
st1 = Stack<Bottom>.Push(new Bottom(), st1);
IStack<Middle> st2 = Stack<Middle>.Push(new Bottom(), st1);
st2 = Stack<Middle>.Push(new Middle(), st2);
st2 = Stack<Middle>.Push(new Bottom2(), st2);
All the nodes, excuse me, stacks, which used to be Stack<Bottom> are still Stack<Bottom>. The type of the reference we have, however, has expanded to Stack<Middle> and now we can treat st2 as a stack of Middles, which, provided Bottom is indeed substitutible for Middle, is perfectly ok, even though the tail is not of the same type.
On the other hand, were we using the old instance method for Push(), the following would be impossible:
IStack<Middle> st_1 = Stack<Bottom>.Empty.Push(new Bottom());
st_1 = st_1.Push(new Bottom2());
In this case, assuming as before that Bottom and Bottom2 are siblings, this is what would happen in the constructor:
) {
_tail = tail;
Having the construction as a type-method rather than an instance-method allows us to expand the type of the structure.
And now for the memoization
private object lock1;
public IStack<T> Push(T elem) {
This code ensures that each Stack<T> object can only create a single new Stack<T> for a value of T, and since all Stacks of T stem from Stack<T>.Empty, we can never have two identical stacks who do not share their tail. Note the synchronization structures (in brown) which make sure we get predictable behavior in multi-thread scenarios.
Think about it: you don't need to worry whether the reference you have points to null, to an object which is in the middle of being modified or in an unacceptable state; you aren't passing a reference, but a definition, which is timeless and immutable; different threads or processes can compose and derive from your definition, but what you have remains the same. Seeing how concurrent execution constantly grows in importance, these are desirable features to have, as well as some others ( see if you can guess ).
One object-oriented language that has been adopting functional constructs is C#, Microsoft's derivation of C++ and Java. In a series of thoroughly enjoyable posts, their compiler engineer Eric Lippert discusses the FP approach towards data structures and how it can be implemented in contemporary C#. This discussion is going to be based on his treatment of the subject in the beginning, but I may have some more to say about it myself. I'm going to use C Sharp for the code samples, and while there won't be anything too sophisticated, I'm not going to use baby-talk either. A reasonable understanding of Java or C Sharp and some googling will get you through with no problems.
The precise term to refer to structures which are defined, rather than built, is immutable - which cannot be changed once defined; an example of this is the C#'s String type - every operation over a string produces a new instance without changing the original. However, when working with complex and potentially large structures such as stacks, the copying approach is too inefficient. The answer to this lies in the definition of the data structure - instead of instantiating a new copy with the applied, we produce a new definition. Let us describe this process:
1. Define an ADT's structure:
Stack of T = <empty> | ( <instance of T> and (Stack of T) )
2. Define its behavior:
//<T> is equivalent to "of T" for those who aren't used to generics
//Deriving IEnumerable<T> means we can use the stack object inside a foreach statement or similar constructs
interface IStack<T> : IEnumerable<T>
{
IStack<T> Pop();
IStack<T> Push(T elem);
bool IsEmpty
{
As you can see. Pop and Push
3. Implement empty singleton
static class Stack<T>
{
private static class EmptyStack : IStack<T>
{
throw new InvalidOperationException("Empty stack");
}
public T Peek() {
throw new InvalidOperationException("Empty stack");
}
public IStack<T> Push(T value) {
return new Stack<T>(value, this);
//IEnumerable<T> - don't worry if you don't understand the implementation, it does just
//what we said it should - present the stack as a sequence of values ( empty sequence in
//this case )
4. Now for the real stack
{
//...
//The real stack
public bool IsEmpty {
static EmptyStack _empty; //one object for each value of T we use
public IStack<T> Empty {
private readonly IStack<T> _tail; //readonly means they cannot be changed outside the constructor
private readonly T _head;
public T Peek() {
_tail = tail;
public IStack<T> Pop() {
public IStack<T> Push(T elem) {
//IEnumerable<T>
//See if you can follow the logic of the for loop
for (IStack<T> stack = this; !stack.IsEmpty; stack = stack.Pop()) {
yield return stack.Peek();
}
}
IEnumerator IEnumerable.GetEnumerator() { return this.GetEnumerator(); }
As you remember, the key concept here is the ability to define a "stack" using another "stack". In this case, an instance of this class is nothing more than a single value of T that can be inspected using Peek()*, and a reference to another "stacky" object - which might turn out to be the the EmptyStack, or indeed any other object which conforms to IStack<T>, including a normal mutable stack**. The upshot of this is that we cannot modify an instance of a this class - we can make it produce another one or to give a reference to its tail***, but the object behind the reference will always be the same thing. You can ask aStack.IsEmpty and you will know that the reference aStack points to a non-empty stack until you assign it to something else, no matter what context changes take place and what other entities do - nobody can touch it.
(*) As you see, the retrieval and removal operations have been implemented as different methods; since neither of those has any side effects ( and no method in an immutable class should, anyway ) we are not concerned whether the operation is atomic. Since the operations which would otherwise mutate the object now need to return a new one, it feels more natural to define separate methods for them and the rest of the class' behavior. Whether or not this is a benefit is up to each programmer
(**) Which would be bad
(***) Hopefully, also immutable
Here is an the intended usage pattern of this class:
//.....
IStack<int> aStack = Stack<int>.Empty; //The constructor is for private use only so we start from another stack
aStack = aStack.Push(10);
//Even though the reference has been modified, the object it used to address is still the same; all other references //anyone may have to it still mean the same thing.
aStack.Push(10).Push(12).Push(100);
//Both the reference and the object are the same
//The (s,n) => s.Push(n) lambda includes implicit assignment.
Enumerable.GetRange(1,1000).Aggregate(aS
//We now have
IStack<int> bStack = aStack.Push(5).Push(-100).Push(int.Max);
//aStack is unchanged; bStack is defined by 1004 small IStack<int> objects, 1001 of which are shared with //aStack.
//In some scenarios, chaining definitions instead of copying the whole object can yield a substantial memory //benefit.
foreach(int ii in bStack){
Console.Write(" ");
//10 1 2 3 4 5 6 7 8 9 10 11....
The next post will proceed to discuss some of the implications of using immutable data structures and examine more ADT implementations.
---
As an aside, did you notice that all stacks of a particular type have the same final member ?
Word 2007
RTF
Hopefully by the time I finish part 2, I will have found a normal blogging site that supports code formatting.
$regex =~ qr/(?{$i = 0})
(?:
(?{$j = 0})
(?(?{$j < $i})
(?:\w(?{$j = $j+1}) |
[^\w\n]) |
(?!))*
(\w)
[^\n]*\n?
(?{$match .= $1;
$i++})
)+/x;
This Perl regex illustrates how regular expressions correspond to traditional programming constructs. Here's what's going on:
(?{$i = 0}) #Setting a temporary variable $i which hold the row number
(?: #The (?: ... )+ construct is a loop over all rows
#We're inside the row now
(?{$j = 0}) #The temporary variable $j holds the letter number
(?(?{$j < $i}) #Condition: if the letter number is lower than the row...
(?:\w(?{$j = $j+1}) | #Either match a letter and increment $i
[^\w\n] #or match a non-letter, non-EOL character
)| #Else...
(?!) #Fail the match
)* #Do this as long as you can
(\w) #Then store the next letter in $1
[^\n]*\n? #Carve to the end of the line
(?{$match .= $1; #Append the letter to the string $match
$i++}) #And increment the line-variable
)+ #The closing parenthesis of the loop
/x; #Use the free-form flag; i.e. the pattern spaces and newlines are ignored.
Note that in [^\n]*\n? the question mark denotes optional new-line; if it were omitted, the regex would not match the last line of text and would return one less letter. Also, the (?!) (always fail) assertion is not really needed, since the conditional will cause the regex to fail eventually, but I left it for clarity's sake.
Here's an example usage:
open(INPUT, "c:/users/fairweather/desktop/input.txt");
$string = join "", <INPUT>;
$string =~ m/(?{$i = 0})(?:
(?{$j = 0})
(?(?{$j < $i})(?:\w(?{$j = $j+1})|[^\w\n]))*
(\w)
[^\n]*\n?
(?{$match .= $1;
$i++})
)+/x;
print $match;
Supposing input.txt contains Poe's Valentine, the output is:
Francessargentosgood
Which just so happens to be the girl he had written the poem for.