12 август 2008, вторник
Today we are going to dive into a real world example of using algorithms + data structures. The task is not very hard, but I think it is very useful example how important is one to have an in-depth knowledge of algorithms and data structures.
What is the problem?
These days a friend of mine asked me the following question: “How can we retrieve the different files (in the terms of different names) from given two directories, including the subdirectory files?” In other words the question is how we can get the files from one directory(or in some of its subdirectories) which doesn't exist in the other and the opposite - the files which are in the second directory(or in some of its subdirectories) and does not exist in the first one.
Yes… I know that you can solve this. You can do it even in 5 minutes using whatever data structure you want which has a contains method (or equivalent). But do you think that this approach is going to be the fastest one? Or a more scientific question – is your algorithmic complexity good enough for the needs of your enterprise application?
In general there is uncountable number of algorithmic tasks out there and most of them have more than one good algorithmic solution. I am going to show you a problem with a possible solution.
The approach is based on one classical algorithm. This algorithm merges two given sorted sequences in a new sorted sequence, which contains the elements from both sequences in linear time. On the following picture, you will see two sorted input sequences. The algorithm to get the result sequence is the following:
1)While both sequences are not empty - get the minimum of the two top elements and put it at the end of the result sequence. Remove the element from the input.
2)Now if there are any elements left in the input, they should be in only one of the input arrays. If there are some element left – add them at the end of the new sequence and remove them from the input.
In our case we first have 0 and 1. 0 < 1 therefore we put 0 in the new sequence. The first input sequence will be 3, 3, 15, 19, 76, 89, 99. We are doing this till the moment when the second sequence (which is shorter than the first one) is over. Than we just add the element 76, 89 and 99 to the end of the result sequence.
What is the modification?
You just need to get your both sequence of files in an ascending sorting manner. If the files on the top of the two sequences are equal, then just remove them. If the files are different, then take the one which is name is “less” in the terms of lexicographical order and put it at the end of the result sequence (Note: Here you should check whether the result sequence doesn’t already ends in this filename. Think of a situation in which this is required.). If one of the sequences is not empty, we get its entire elements and put them at the end of the result sequence again taking in mind the above note.
Which are the data structures?
As we saw, we access only the top elements in the input sequences. This is the major feature of the Stack data structure. The result sequence – we add elements at its end and need to peek the last added element. So we can use for example Queue or Linked List depending on the implementation which we are using.
In this post I have tried to show you one problem which can be solved optimally with an elegant algorithm. I have tried not to be bounded to any programming language and not to be clear enough, because want you guys to think of the rest! This is the way in which I believe one should learn to program – with logical thinking and researching.
Hope you learned something new!