(reposted here, on Sven's advice)
Hi guys,
  I've just completed my first AltiVec program, a very simple one, and I have to say I'm really impressed!
I wrote an optimized version of strfill(), a common routine (used in this form in MySQL) that fills a given string with a given char. Here are the benchmarks for the scalar and AltiVec versions at different string sizes (elapsed seconds for 100,000,000 calls each, as measured with time()):
Code:
#size   scalar  altivec
13      5       2
16      6       1
27      12      3
64      23      2
90      29      3
128     43      3
185     59      5
256     81      6
347     111     8
512     164     11
831     277     18
2048    686     40
3981    1316    86
(Note: I used both 'random' sizes and powers of 2, but this didn't seem to have any impact.)
And here is the code that produced this output:
Code:
#include <altivec.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
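// This one was shamelessly stolen and adapted from Apple's AltiVec tutorial.
// It returns a vector with all 16 bytes set to splatchar.
static inline vector unsigned char vec_ldsplatchar(unsigned char splatchar) {
        // Byte 0 of splatmap holds the offset of splatchar within its 16-byte block
        vector unsigned char splatmap = vec_lvsl(0, &splatchar);
        // vec_lde loads splatchar into the byte at that same offset
        vector unsigned char result = vec_lde(0, &splatchar);
        // Replicate the offset, then use it to select that byte for every lane
        splatmap = vec_splat(splatmap, 0);
        return vec_perm(result, result, splatmap);
}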
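unsigned char *vec_strfill(unsigned char *s, int len, char p) {
        int i = 0;
        vector unsigned char sm = vec_ldsplatchar((unsigned char)p);
        // Scalar head: vec_st ignores the low 4 address bits, so align first
        while (i < len && ((unsigned long)(s + i) & 15))
                s[i++] = p;
        // Vector body: one aligned 16-byte store per iteration, whole blocks only
        for (; i + 16 <= len; i += 16)
                vec_st(sm, i, s);
        // Scalar tail: a full vector store here would write past s + len
        while (i < len)
                s[i++] = p;
        // Match strfill(): terminate and return a pointer to the NUL
        s[len] = '\0';
        return s + len;
}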
// Scalar reference: fill len bytes, terminate, and return a pointer to the NUL
unsigned char *strfill(unsigned char *s, int len, char fill)
{
        while (len--) *s++ = fill;
        *s = '\0';
        return s;
} /* strfill */
int main( void )
{
        int j, k, max, loops = 100000000;
        time_t dt1, dt2, t1, t0;
        int sizes[] = {13,16,27,64,90,128,185,256,347,512,831,2048,3981,8192,10000};
        int nsizes = sizeof(sizes) / sizeof(sizes[0]);
        printf("#size\tscalar\taltivec\n");
        for (k = 0; k < nsizes; k++) {
                max = sizes[k];
                // Room for the terminating NUL, rounded up to a whole 16-byte block
                unsigned char __attribute__ ((aligned(16))) test[(max + 16) / 16 * 16];
                unsigned char splatchar = rand();
                t0 = time(NULL);
                for (j=0; j < loops; j++) {
                        test[max] = '\0';
                        strfill(test, max, splatchar);
                }
                t1 = time(NULL);
                dt1 = t1 - t0;
                t0 = time(NULL);
                for (j=0; j < loops; j++) {
                        vec_strfill(test, max, splatchar);
                }
                t1 = time(NULL);
                dt2 = t1 - t0;
                printf("%d\t%ld\t%ld\n", max, (long)dt1, (long)dt2);
        }
        return 0;
}
I have to say, this code has bugs and is not very optimized; I'm a newbie when it comes to AltiVec. But it did convince me that even a newbie like me can write fast AltiVec code! I'm posting here for comments and ideas to make it faster, safer, etc., so if I've done something exceedingly stupid in this code, please tell me.
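
On the "safer" side, one way to catch buffer overruns in a fill routine is to surround the target with guard bytes and check that they survive the call. The following is only a minimal sketch in portable C (no AltiVec needed to run it), assuming the contract of strfill() as posted: len fill bytes followed by a terminating NUL. The names check_fill, ref_fill, rounded_fill, GUARD and CANARY are hypothetical helpers made up for this illustration, and the check assumes the fill char is neither 0 nor the CANARY value.

```c
#include <string.h>

#define GUARD  32      /* guard bytes on each side of the tested region */
#define MAXLEN 4096    /* largest offset + len the checker accepts */
#define CANARY 0xA5    /* hypothetical marker for bytes that must stay untouched */

/* Same signature as strfill()/vec_strfill() in the post. */
typedef unsigned char *(*fill_fn)(unsigned char *s, int len, char fill);

/* Portable reference: fill len bytes, then terminate. */
static unsigned char *ref_fill(unsigned char *s, int len, char fill) {
        while (len-- > 0) *s++ = (unsigned char)fill;
        *s = '\0';
        return s;
}

/* Deliberately wrong: always writes whole 16-byte blocks, so it runs past len. */
static unsigned char *rounded_fill(unsigned char *s, int len, char fill) {
        int i, n = (len + 15) / 16 * 16;
        for (i = 0; i < n; i++) s[i] = (unsigned char)fill;
        return s;
}

/* Returns 1 if fn wrote exactly len fill bytes plus a NUL at buf + GUARD + offset
   and left every other byte alone, 0 otherwise. */
static int check_fill(fill_fn fn, int offset, int len, char fill) {
        static unsigned char buf[GUARD + MAXLEN + GUARD];
        unsigned char *s = buf + GUARD + offset;
        int i;

        if (offset < 0 || len < 0 || offset + len > MAXLEN) return 0;
        memset(buf, CANARY, sizeof buf);
        fn(s, len, fill);

        for (i = 0; i < GUARD + offset; i++)          /* bytes before s */
                if (buf[i] != CANARY) return 0;
        for (i = 0; i < len; i++)                     /* the filled region */
                if (s[i] != (unsigned char)fill) return 0;
        if (s[len] != '\0') return 0;                 /* the terminator */
        for (i = GUARD + offset + len + 1; i < (int)sizeof buf; i++)
                if (buf[i] != CANARY) return 0;       /* bytes after the NUL */
        return 1;
}
```

Calling check_fill() with the routine under test for every offset from 0 to 15 and for the sizes used in the benchmark would exercise both the unaligned-start and the partial-last-block cases.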

Also, I used a simple time() function, as I couldn't get pmon to work for me on 2.6.8.
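
Since time() only has one-second resolution, the smaller numbers in the table are quite coarse. A finer-grained wall-clock timer can be had from POSIX gettimeofday(); this is a minimal sketch, assuming a POSIX system, and now_seconds is a hypothetical helper name, not something from the code above.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, with microsecond resolution. */
static double now_seconds(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}
```

In the benchmark, t0, t1, dt1 and dt2 would then become doubles, each time(NULL) call would become now_seconds(), and the results could be printed with a format such as %.3f.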
Thanks to Luca for his excellent comments and pointers! Thanks to Genesi for designing such a nice system as the Pegasos 2; the more I use it, the more I like it.

Regards,
Konstantinos