Thursday, April 28, 2011

Algorithmic Complexity

Hi all! Today I'll tell you something about algorithmic complexity.

Without going into details, the complexity of an algorithm describes how its running time grows with the size of the input, for example
$T(n) = O(n)$
or
$T(n) = \Theta(n)$
or
$T(n) = \Omega(n)$
Here $T(n)$ is the running time, $n$ is the size of the data, and $O$, $\Theta$, $\Omega$ are asymptotic bounds:
$O$ - an upper bound (the running time grows no faster than this; think worst case),
$\Theta$ - a tight bound (both an upper and a lower bound),
$\Omega$ - a lower bound (the running time grows at least this fast; think best case).
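To make these symbols concrete, here is a hypothetical linear search (not from the post itself); its best and worst cases differ, so its $\Omega$ and $O$ bounds differ too:

```cpp
/* Linear search: returns the index of x in A[0..n), or -1 if absent.
   Best case:  x sits at A[0], one comparison  -> T(n) = Omega(1).
   Worst case: x is absent, n comparisons      -> T(n) = O(n). */
int linear_search(const int *A, int n, int x){

     for (int i = 0; i < n; ++i)
         if (A[i] == x) return i;
     return -1;

}
```

Because the best and worst cases disagree, there is no single $\Theta$ bound for this search over all inputs.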

Let me show you an example:


#include <cstdio>
int n;
int main(){

     scanf("%d", &n);
     printf("%d", 2 * n * n - n);

     return 0;

}


The complexity of this algorithm is $T(n) = O(1)$: it does a fixed amount of work regardless of $n$. When working with asymptotic complexity we ignore constant multipliers. So, if we write:



#include <cstdio>
int n;
int main(){

     scanf("%d", &n);
     printf("%d", 2 * n * n - n);
     printf("%d", 3 * n * n * n - n + 2);
     printf("%d", 7 * n * n - 1);

     return 0;

}


our complexity is still $O(1)$.
Let's say we need to read $n$ numbers and calculate their sum; the code would be something like:




#include <cstdio>
#define maxn 10100
int n, A[maxn], sum;
int main(){

     sum = 0;

     scanf("%d", &n);
     for (int i = 0; i < n; ++i) scanf("%d", A + i);

     for (int i = 0; i < n; ++i) sum += A[i];
     printf("%d\n", sum);

     return 0;

}


Here the code performs "scanf" $n$ times and "+" $n$ times, so the complexity is $T(n) = O(n)$.
This code also uses additional memory, which brings in a second measure - memory complexity. For our sample it is $M(n) = O(n)$.
Let's do this task a bit more efficiently:




#include <cstdio>
int n, x, sum;
int main(){

     sum = 0;

     scanf("%d", &n);
     for (int i = 0; i < n; ++i){ 

             scanf("%d", &x);
             sum += x;

     }
     printf("%d\n", sum);

     return 0;

}



So, here the run-time complexity hasn't changed (only the constant multiplier dropped, roughly by half), but we no longer store $n$ elements in memory, so:
$T(n) = O(n)$
$M(n) = O(1)$
Want something a bit harder? Now we need to sort this data. It could be coded with the STL function sort or with quicksort, both averaging $T(n) = \Theta(n \log n)$, but let's use this one:



#include <cstdio>
#include <algorithm>

#define maxn 10100
int n, A[maxn];

void shellsort(int * A, int n){

     int incr, i, j;
     for (incr = n / 2; incr > 0; incr /= 2)
         for (i = incr; i < n; ++i)
             for (j = i - incr; j > -1; j -= incr)
                 if (A[j + incr] < A[j])
                    std::swap(A[j], A[j + incr]);
                 else j = -1; /* element in place - stop the inner loop */

}

int main(){


     scanf("%d", &n);
     for (int i = 0; i < n; ++i) scanf("%d", A + i);

     shellsort(A, n);

     for (int i = 0; i < n; ++i) printf("%d ", A[i]);

     return 0;

}


I use this sort sometimes, though STL sort is the easiest way. Shellsort is interesting: it is rarely seen, but it is short and fast. So, what about complexity? Three nested loops - you might say $O(n^3)$? No, I said it is fast :) This algorithm has no single $\Theta$ bound, because its running time depends strongly on the input data itself and on the chosen gap sequence. With the simple halving sequence above the worst case can degrade to $O(n^2)$, but with better gap sequences (e.g. Hibbard's) one gets:
$T(n) = \Omega(n \log n)$
$T(n) = O(n \sqrt n)$
and in practice it is usually closer to the $\Omega$ bound.
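For comparison, the STL route mentioned above really is the easiest way; a minimal sketch of wrapping it for the same task (std::sort guarantees $O(n \log n)$ comparisons in modern C++):

```cpp
#include <algorithm>

/* The easy route: std::sort (introsort in common implementations)
   sorts in place with O(n log n) comparisons and no extra array. */
void sort_array(int *A, int n){

     std::sort(A, A + n);

}
```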

Let's discuss recursion. If you can expand a recursion into a simple iteration, then you'll have no problem finding the complexity of the algorithm. But sometimes that is impossible. Let's go back to the sorting task; it can also be done with mergesort:



#include <cstdio>

#define maxn 1000100

/* mergesort(array, left-range, right-range); sorts array A in the range [l; r) */
/* returns the number of "chaoses" (inversions) */
int mergesort(int *, int, int);
/* merge(array, left, right); merges two sorted parts */
/* returns the number of "chaoses" (inversions) */
int merge(int *, int, int);

int n, A[maxn], T[maxn];

int main(){
 
 scanf("%d", &n);
 for (int i = 0; i < n; ++i) scanf("%d", A + i);
 
 int chaos = mergesort(A, 0, n);
 
 for (int i = 0; i < n; ++i) printf("%d ", A[i]);
 printf("\n chaos number: %d\n", chaos);
 
 return 0;
 
}

int mergesort(int * A, int l, int r){
 
 /* one element is sorted already */
 if (l >= r - 1)
  return 0; /* one element can't create a chaos */
 
 int h = (l + r) / 2;
 /* sort and calculate chaos number */
 return mergesort(A, l, h) + mergesort(A, h, r) + merge(A, l, r);
 
}

/* T - buffer */
int merge(int * A, int l, int r){
 
 int i, j, k, h = (l + r) / 2, res = 0;
 
 for (i = l, j = h, k = l; i < h && j < r; ){
  
  if (A[i] < A[j])  /* current pair in right order */
   T[k++] = A[i++]; /* copy it, and increase k - end pointer and i - left pointer */
  else {
   
   T[k++] = A[j++]; /* copy it, and increase k - end pointer and j - right pointer */
   res += h - i;  /* A[j] creates chaoses with A[i], A[i + 1], .. , A[h - 1] */
   
  }
  
 }
 
 /* it's possible that one part is exhausted while the other still has elements */
 /* just copy them to the end (they are sorted and they are the largest) */
 while (i < h) T[k++] = A[i++];
 while (j < r) T[k++] = A[j++];
 
 /* copy result from buffer to our array */
 for (i = l; i < r; ++i) A[i] = T[i];
 
 return res; /* return chaos number for this merge */
 
}



OK, that's a bit long. The logic is in merging two already sorted parts: divide the array again and again until each part has only one element (a single element is sorted by definition), then start merging - one with one, then two with two, four with four, etc. This approach is generally called "divide and conquer". Now look at the code of mergesort: it calls itself twice with the data size divided by 2 and calls merge once; when $r - l \le 1$ (the one-element case) it simply returns, so:
$T(1) = O(1)$
$T(n) = 2 T(n / 2) + Tm(n)$
where $Tm(n)$ is the running time of merge; obviously:
$Tm(n) = O(n)$
Solving this recurrence, we find that at each level of the recursion the $O$ terms sum to $O(n)$, and since $n$ is halved at each level until it reaches 1, there are $\log n$ levels, so:
$T(n) = O(n log n)$
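Written out (a sketch, taking $Tm(n) \le cn$ for some constant $c$), the expansion looks like:

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + cn + cn \\
     &= 8\,T(n/8) + cn + cn + cn \\
     &\;\;\vdots \\
     &= 2^k\,T(n/2^k) + k\,cn .
\end{aligned}
```

The expansion stops when $n / 2^k = 1$, i.e. $k = \log_2 n$, giving $T(n) = n\,T(1) + cn \log_2 n = O(n \log n)$.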
In addition, this algorithm lets us count the chaos number (the number of inversions). Another algorithm for that is bubblesort with $T(n) = O(n^2)$, so mergesort is better for this purpose, though it needs twice as much memory.
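To see exactly what the chaos number counts, here is a hypothetical $O(n^2)$ checker (not part of the post's solution) that counts inversions directly - handy for testing the mergesort version on small arrays:

```cpp
/* Counts pairs (i, j) with i < j and A[i] > A[j] - the "chaos number",
   usually called the number of inversions. O(n^2), for testing only. */
int count_inversions(const int *A, int n){

     int res = 0;
     for (int i = 0; i < n; ++i)
         for (int j = i + 1; j < n; ++j)
             if (A[i] > A[j]) ++res;
     return res;

}
```

On small random arrays this should agree with the value returned by mergesort above.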

These algorithms were polynomial, but there are also NP-complete problems. As far as we know, they cannot be solved without checking all possible solutions, or a comparable number of them (with some optimizations). Any NP-complete problem can be reduced to any other NP-complete problem.
Typical complexities of algorithms for NP-complete problems are:
$T(n) = O(n!)$ - brute force
$T(n) = O(2^n n)$ - dynamic programming
$T(n) = O(3^n n^2)$ - another dynamic program
etc.
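As a sketch of the "dynamic programming" row, here is a hypothetical bitmask DP for the travelling salesman problem (the Held-Karp scheme) - one of the classic NP-hard problems; note this variant is $O(2^n n^2)$ in time because of the inner loop over the next city, and $O(2^n n)$ in memory:

```cpp
#include <vector>
#include <algorithm>

const int INF = 1000000000;

/* dp[mask][v] = length of the shortest path that starts at city 0,
   visits exactly the cities in mask, and ends at city v.
   Time O(2^n * n^2), memory O(2^n * n). */
int tsp(const std::vector<std::vector<int> >& dist){

     int n = (int)dist.size();
     std::vector<std::vector<int> > dp(1 << n, std::vector<int>(n, INF));
     dp[1][0] = 0; /* start at city 0, only city 0 visited */

     for (int mask = 1; mask < (1 << n); ++mask)
         for (int v = 0; v < n; ++v){
             if (dp[mask][v] == INF) continue; /* unreachable state */
             for (int u = 0; u < n; ++u)       /* try every unvisited next city */
                 if (!(mask & (1 << u)))
                     dp[mask | (1 << u)][u] = std::min(dp[mask | (1 << u)][u],
                                                       dp[mask][v] + dist[v][u]);
         }

     int best = INF;
     for (int v = 1; v < n; ++v) /* close the tour by returning to city 0 */
         best = std::min(best, dp[(1 << n) - 1][v] + dist[v][0]);
     return best;

}
```

Compare this with the $O(n!)$ brute force over all orderings of the cities: the DP wins because it shares work between orderings that visit the same set of cities.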

Phew, that's all for today.

If you have ideas on how to improve this topic, or you have noticed a mistake, feel free to comment or mail me at waterlink000@gmail.com

Best regards
Good luck :)

Monday, April 25, 2011

Let me introduce myself

Here I am. I'm Fedorov Alexey, a student from Ukraine.

I have enjoyed programming for nine years. I have written a lot of fun programs, participated in school competitions, and defended a project in the Junior Academy of Sciences. I'm a three-time winner of the Republican School Olympiad in Informatics (2007, 2008, 2009), i.e. three third-degree diplomas.


After school I entered the best university in my city. It was easy - I was admitted outside of the regular contest :) There I was immediately taken under the wing of the ACM coach. After two weeks of study I went to the Ukrainian Vinnitsa Regional Contest (2009). Many contests have followed, in which I keep participating to the present day.

 I have a couple of golden rules in programming. They are:
  • do not use a debugger, 
  • write a legible and readable code, 
  • write effectively 
Of course, it is not always possible to adhere to the first rule. The second rule can be broken when a programmer has to write code quickly. The third rule is special, because the effectiveness of code depends on many parameters. But more on that later.


I specialize in various algorithms, special technologies and features. My favorite programming language is C++. I have also programmed in Delphi, C#, Assembler (tasm, nasm, gcc asm), and Java.

For six months now I have preferred not to use a development environment - just a good editor with syntax highlighting, a good console compiler (such as g++), and a console debugger (such as gdb, for when I still have to violate the first rule). You may ask, "how do you deal with large projects?" I have an answer, but more on that later.


I use Linux, but I have nothing against Microsoft and other well-known businesses in this area. I genuinely respect these people for being in the right place at the right time, pushing the right idea, and thus capturing the market.

In the following posts I'll discuss some simple but interesting algorithms, as well as show their implementation.

P.S. 
If you notice a mistake in a post, or you see a way to improve a paragraph, a post, or the entire blog, please give feedback through the comments or mail me at waterlink000@gmail.com

That's all for today. Good luck :)