Ok, I finished the Altivec benchmark code, along with a first test of the strfill() routine (which is essentially memset() with a '\0' at the end).
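For anyone who hasn't seen the routine, here is a minimal scalar sketch of what "memset() with a '\0' at the end" could look like. The name strfill_sketch and its signature are my guess for illustration only, not necessarily what the benchmark uses, and the caller is assumed to provide room for n + 1 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar reference: fill dst with n copies of c, then
 * terminate the string. dst must have room for n + 1 bytes. */
char *strfill_sketch(char *dst, int c, size_t n)
{
    memset(dst, c, n);   /* the bulk fill, exactly what memset() does */
    dst[n] = '\0';       /* the only extra step: the terminator */
    return dst;
}
```

With an 8-byte buffer, strfill_sketch(buf, 'x', 7) leaves the string "xxxxxxx" in buf.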
Here is the output of the program:
Code:
$ ./altivectorize -v -s -g --norandom --loops 1000000
Altivec is supported
Verbose mode on
Will do both scalar and vector tests
Will also do glibc tests
loops: 1000000
output file:
#size   arrays  scalar                  glibc                   altivec (Effective bandwidth)
7       599186  0.030 (222.5 MB/s)      0.060 (111.3 MB/s)      0.140 (47.7 MB/s)
13      325000  0.090 (137.8 MB/s)      0.060 (206.6 MB/s)      0.100 (124.0 MB/s)
16      262144  0.090 (169.5 MB/s)      0.050 (305.2 MB/s)      0.080 (190.7 MB/s)
20      209715  0.100 (190.7 MB/s)      0.040 (476.8 MB/s)      0.080 (238.4 MB/s)
27      155344  0.110 (234.1 MB/s)      0.050 (515.0 MB/s)      0.090 (286.1 MB/s)
35      119837  0.120 (278.2 MB/s)      0.050 (667.6 MB/s)      0.070 (476.8 MB/s)
43      97542   0.130 (315.4 MB/s)      0.060 (683.5 MB/s)      0.070 (585.8 MB/s)
54      77672   0.140 (367.8 MB/s)      0.070 (735.7 MB/s)      0.080 (643.7 MB/s)
64      65536   0.150 (406.9 MB/s)      0.060 (1017.3 MB/s)     0.080 (762.9 MB/s)
90      46603   0.180 (476.8 MB/s)      0.080 (1072.9 MB/s)     0.080 (1072.9 MB/s)
128     32768   0.230 (530.7 MB/s)      0.090 (1356.3 MB/s)     0.080 (1525.9 MB/s)
185     22672   0.320 (551.3 MB/s)      0.100 (1764.3 MB/s)     0.100 (1764.3 MB/s)
256     16384   0.400 (610.4 MB/s)      0.130 (1878.0 MB/s)     0.110 (2219.5 MB/s)
347     12087   0.530 (624.4 MB/s)      0.160 (2068.3 MB/s)     0.120 (2757.7 MB/s)
512     8192    0.880 (554.9 MB/s)      0.260 (1878.0 MB/s)     0.150 (3255.2 MB/s)
831     5047    1.930 (410.6 MB/s)      0.410 (1932.9 MB/s)     0.170 (4661.8 MB/s)
2048    2048    3.410 (572.8 MB/s)      0.800 (2441.4 MB/s)     0.260 (7512.0 MB/s)
3981    1053    5.540 (685.3 MB/s)      1.710 (2220.2 MB/s)     0.460 (8253.4 MB/s)
8192    512     11.240 (695.1 MB/s)     3.110 (2512.1 MB/s)     0.790 (9889.2 MB/s)
13488   311     18.690 (688.2 MB/s)     5.580 (2305.2 MB/s)     1.240 (10373.5 MB/s)
16384   256     22.840 (684.1 MB/s)     6.730 (2321.7 MB/s)     1.430 (10926.6 MB/s)
38893   108     65.790 (563.8 MB/s)     20.240 (1832.6 MB/s)    14.860 (2496.0 MB/s)
65536   64      111.540 (560.3 MB/s)    36.530 (1710.9 MB/s)    25.530 (2448.1 MB/s)
105001  40      179.650 (557.4 MB/s)    55.760 (1795.9 MB/s)    40.730 (2458.6 MB/s)
262144  16      456.450 (547.7 MB/s)    149.500 (1672.2 MB/s)   118.930 (2102.1 MB/s)
600000  7       1824.510 (313.6 MB/s)   1528.040 (374.5 MB/s)   779.820 (733.8 MB/s)
1134355 4       4706.650 (229.8 MB/s)   4936.750 (219.1 MB/s)   2651.260 (408.0 MB/s)
2097152 2       9408.000 (212.6 MB/s)   10181.350 (196.4 MB/s)  6009.540 (332.8 MB/s)
And this is the same test with data picked randomly from a large pool, so that the chance of it already being in the cache is minimised.
Code:
$ ./altivectorize -v -s -g --loops 1000000
Altivec is supported
Verbose mode on
Will do both scalar and vector tests
Will also do glibc tests
loops: 1000000
output file:
#size   arrays  scalar                  glibc                   altivec (Effective bandwidth)
7       599186  0.210 (31.8 MB/s)       0.160 (41.7 MB/s)       0.200 (33.4 MB/s)
13      325000  0.220 (56.4 MB/s)       0.160 (77.5 MB/s)       0.200 (62.0 MB/s)
16      262144  0.600 (25.4 MB/s)       0.690 (22.1 MB/s)       0.560 (27.2 MB/s)
20      209715  0.220 (86.7 MB/s)       0.150 (127.2 MB/s)      0.190 (100.4 MB/s)
27      155344  0.210 (122.6 MB/s)      0.150 (171.7 MB/s)      0.200 (128.7 MB/s)
35      119837  0.390 (85.6 MB/s)       0.170 (196.3 MB/s)      0.170 (196.3 MB/s)
43      97542   0.330 (124.3 MB/s)      0.200 (205.0 MB/s)      0.210 (195.3 MB/s)
54      77672   0.290 (177.6 MB/s)      0.420 (122.6 MB/s)      0.220 (234.1 MB/s)
64      65536   0.940 (64.9 MB/s)       1.150 (53.1 MB/s)       0.950 (64.2 MB/s)
90      46603   0.260 (330.1 MB/s)      0.190 (451.7 MB/s)      0.190 (451.7 MB/s)
128     32768   1.090 (112.0 MB/s)      1.110 (110.0 MB/s)      0.850 (143.6 MB/s)
185     22672   0.370 (476.8 MB/s)      0.220 (802.0 MB/s)      0.190 (928.6 MB/s)
256     16384   1.660 (147.1 MB/s)      2.010 (121.5 MB/s)      1.330 (183.6 MB/s)
347     12087   0.850 (389.3 MB/s)      0.390 (848.5 MB/s)      0.310 (1067.5 MB/s)
512     8192    2.880 (169.5 MB/s)      3.260 (149.8 MB/s)      2.560 (190.7 MB/s)
831     5047    1.680 (471.7 MB/s)      0.660 (1200.8 MB/s)     0.450 (1761.1 MB/s)
2048    2048    9.380 (208.2 MB/s)      9.760 (200.1 MB/s)      5.080 (384.5 MB/s)
3981    1053    6.330 (599.8 MB/s)      1.860 (2041.2 MB/s)     1.070 (3548.2 MB/s)
8192    512     35.400 (220.7 MB/s)     36.160 (216.1 MB/s)     19.630 (398.0 MB/s)
13488   311     28.640 (449.1 MB/s)     15.610 (824.0 MB/s)     7.800 (1649.1 MB/s)
16384   256     70.920 (220.3 MB/s)     72.020 (217.0 MB/s)     38.100 (410.1 MB/s)
38893   108     138.070 (268.6 MB/s)    137.350 (270.0 MB/s)    70.470 (526.3 MB/s)
65536   64      282.470 (221.3 MB/s)    294.320 (212.4 MB/s)    154.810 (403.7 MB/s)
105001  40      405.400 (247.0 MB/s)    397.400 (252.0 MB/s)    204.320 (490.1 MB/s)
262144  16      1105.890 (226.1 MB/s)   1169.290 (213.8 MB/s)   613.710 (407.4 MB/s)
600000  7       2488.380 (230.0 MB/s)   2632.060 (217.4 MB/s)   1361.240 (420.4 MB/s)
1134355 4       4963.660 (217.9 MB/s)   5405.220 (200.1 MB/s)   2860.420 (378.2 MB/s)
2097152 2       9470.490 (211.2 MB/s)   10690.570 (187.1 MB/s)  5541.520 (360.9 MB/s)
It's interesting to see that even in cases where we don't hit the cache, Altivec is still almost 2x faster than the scalar code at the larger sizes (2048 bytes and up). I'll probably create some graphs with this data to post here.
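In case the bandwidth column in the tables is unclear: the figures are consistent with size * loops / time, expressed in units of 1048576 bytes per second. The sketch below assumes the time column is in seconds and that each loop touches size bytes; the function name effective_bandwidth is made up for this example.

```c
#include <stddef.h>

/* Effective bandwidth in MB/s, taking 1 MB = 1048576 bytes.
 * Assumes every loop iteration fills `size` bytes and that
 * `seconds` is the total time for all loops. */
double effective_bandwidth(size_t size, unsigned long loops, double seconds)
{
    return ((double)size * (double)loops) / seconds / (1024.0 * 1024.0);
}
```

For the first row of the first table, effective_bandwidth(7, 1000000, 0.030) comes out at roughly 222.5, which matches the scalar figure printed there.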
The code is part of the pegasos project in alioth (http://alioth.debian.org/projects/pegasos/), and available from anonymous cvs right now:
Code:
cvs -z3 -d:pserver:anonymous@cvs.alioth.debian.org:/cvsroot/pegasos co altivectorize
Today I'll spend some time converting it to svn, though, so from tomorrow you should use:
Code:
svn co svn://svn.d-i.alioth.debian.org/svn/pegasos altivectorize
(yes, I know, lame name, but after a couple of beers it seemed fine at the time :-/)
Apart from the Altivec routines themselves, the benchmark is written to autodetect Altivec and use the appropriate routine if it is available. It also compiles on x86, though of course there is no Altivec there.
 
However, it would be useful to see if/how it works on a G3 for example. 
The Altivec detection works in 3 steps:
a) detect if gcc supports -maltivec and -mabi=altivec (compile time)
b) detect altivec.h (compile time)
c) detect if PPC_FEATURE_HAS_ALTIVEC is enabled in AT_HWCAP (run time).
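Step c) could be sketched roughly as follows. This is not the code from the benchmark: it uses getauxval(), which only exists in later glibc versions (older systems would have to read /proc/self/auxv by hand), and the function name cpu_has_altivec is my own. The fallback #define carries the bit value from the kernel headers. For steps a) and b), gcc defines __ALTIVEC__ when -maltivec is in effect, so vector code can be guarded with #ifdef __ALTIVEC__.

```c
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (assumes a recent glibc) */
#endif

/* Bit in AT_HWCAP that the kernel sets on Altivec-capable CPUs. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000
#endif

/* Returns 1 if the running CPU reports Altivec, 0 otherwise.
 * On anything that is not Linux/PowerPC this always returns 0. */
int cpu_has_altivec(void)
{
#if defined(__linux__) && (defined(__powerpc__) || defined(__PPC__))
    return (getauxval(AT_HWCAP) & PPC_FEATURE_HAS_ALTIVEC) != 0;
#else
    return 0;
#endif
}
```

A caller would check cpu_has_altivec() once at startup and then pick the vector or the scalar routine accordingly; on a G3 or on x86 it should simply fall back to scalar.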
So, comments, suggestions and flames welcome 
 
Konstantinos