Power Developer - Project

i.MX515 Project

Profiling the i.MX515 GL stack

in category Graphics & 3D
proposed by blu on 30th August 2010 (accepted on 1st September 2010)

posted by blu on 5th September 2010

My experience with GLES on the EfikaMX has been very exciting so far, to say the least. From the remarkably feature-complete ES2 and EGL software stacks to the very intriguing GPU, this machine the size of a DVD case has firmly held my curiosity and made me excited about what the future will deliver on the platform, once the few present hurdles are in the past.

Speaking of hurdles, the #1 issue in the current EfikaMX GL pipeline is the abnormally slow path that an already-complete GL frame has to traverse to finally get on screen; in other words, what comes after an eglSwapBuffers() call (sans the drawing part, which on a tiler also takes place at that point). From the OProfile sessions*, it appears that huge amounts of memory are being moved around (think libc's memcpy) for each frame displayed. Moreover, the amount of memcpy workload appears proportional to the active framebuffer size. Now, while there are multiple possible explanations for why such a thing could happen, it really has no place in a well-tuned graphics pipeline.
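
To put the memcpy observation in perspective, here is a back-of-the-envelope sketch of how copy traffic would scale with framebuffer size. The resolution, pixel size and number of copies per frame are hypothetical inputs, not measurements from the EfikaMX; the profiles only suggest that the workload grows with the framebuffer.

```cpp
#include <cstdint>

// Bytes moved by the CPU for one displayed frame, assuming (hypothetically)
// that the whole framebuffer is copied `copies_per_frame` times.
constexpr std::uint64_t bytesCopiedPerFrame(std::uint32_t width,
                                            std::uint32_t height,
                                            std::uint32_t bytes_per_pixel,
                                            std::uint32_t copies_per_frame)
{
    return static_cast<std::uint64_t>(width) * height * bytes_per_pixel * copies_per_frame;
}

// Resulting copy traffic in bytes per second at a given display rate.
constexpr std::uint64_t copyBytesPerSecond(std::uint64_t bytes_per_frame,
                                           std::uint32_t frames_per_second)
{
    return bytes_per_frame * frames_per_second;
}
```

For instance, a 1280x720 surface at 4 bytes per pixel is 3,686,400 bytes per full copy, so even a single copy per frame at 30 frames/s would amount to roughly 110 MB/s of memcpy traffic.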

To get a better idea of what we lose in the current situation, a simple modification can be made to any rendering loop, so that eglSwapBuffers is (largely) replaced by glFinish - the blocking GL(ES) call that causes the GL pipeline to be executed to framebuffer completion, with no further steps taken. Darn useful debug functionality for situations like this one! In a multi-buffering setup (which is what most GL software uses), a hypothetical swapFramebuffers could look like:

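// display and surface are assumed to be the application's current
// EGLDisplay and EGLSurface.
void swapFramebuffers(const bool skip_show)
{
    if (!skip_show)
        eglSwapBuffers(display, surface); // draw and present the frame on screen
    else
        glFinish();                       // draw to completion, but do not present
}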
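
One way to drive skip_show is a frame-skip factor: show one frame out of every N and merely finish the rest. The sketch below is a minimal illustration of that idea; the helper name skipShowForFrame and its parameters are hypothetical, and this is not necessarily how any existing application implements it.

```cpp
#include <cstdint>

// Decide whether frame `frame_index` should skip presentation.
// `show_every` is a hypothetical frame-skip factor: 1 shows every frame,
// 4 shows frames 0, 4, 8, ... and only finishes the ones in between.
// A factor of 0 is treated like 1, which also avoids a division by zero.
bool skipShowForFrame(std::uint64_t frame_index, std::uint32_t show_every)
{
    if (show_every <= 1)
        return false;                        // never skip: every frame is shown
    return (frame_index % show_every) != 0;  // skip all but every N-th frame
}
```

In a rendering loop this could be used as swapFramebuffers(skipShowForFrame(frame++, show_every)), with frame being a running frame counter.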

Depending on how often skip_show is set (that is, for most frames), we observe a very curious result from the EfikaMX GL: for sufficiently 'light-draw' frames, the framerate can double or even triple with this modification, just because we don't take the final step of showing the drawn frame on screen. Even for heavy-draw frames (say, fancy shaders), this little trick can add a few frames/s to the framerate. How come? It's really simple.

In scenarios where frames are light, the time the GPU actually spends doing work for us is small relative to the 'housekeeping' work done by the system (read: mostly the CPU). By saying 'no, thanks' to the final housekeeping step of showing the frame on screen, we speed up the entire pipeline. The speedup is proportional to the time that portion of the housekeeping takes, and since on the current EfikaMX software stack that time is abnormally large, so is the speedup we obtain.
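
The reasoning above can be captured in a tiny timing model. This is only a sketch under a simplifying assumption: every frame pays a draw cost, and only the frames that are actually shown pay an additional presentation cost. The function names and all numbers used with them are hypothetical, not measurements.

```cpp
// Average time per frame (ms) when only one frame in `show_every` is presented.
// Model assumption: every frame pays `draw_ms`, and only the shown frames
// additionally pay `present_ms` (the post-draw part of eglSwapBuffers).
double averageFrameMs(double draw_ms, double present_ms, unsigned show_every)
{
    if (show_every == 0)
        show_every = 1;  // treat 0 as "show every frame"
    return draw_ms + present_ms / show_every;
}

// Frames per second corresponding to an average frame time in milliseconds.
double framesPerSecond(double frame_ms)
{
    return 1000.0 / frame_ms;
}
```

With a made-up light frame of 5 ms draw and 10 ms presentation, showing every frame gives 15 ms per frame (about 67 frames/s), while showing one frame in ten gives 6 ms (about 167 frames/s). With a made-up heavy frame of 50 ms draw and the same 10 ms presentation, the change is only from 60 ms to 51 ms per frame, i.e. from about 16.7 to about 19.6 frames/s.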

So, if you want to see how fast your GL frames actually are on the EfikaMX today, employ the above trick. Hey, you might be surprised how fast you could be on the platform - I was! ; )
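
To actually put a number on it, the average frame time can be measured around the rendering loop. Below is a minimal sketch using a monotonic wall clock; measureAverageFrameMs and the render_frame callable are hypothetical names, and the callable is assumed to draw one frame and then end it with either eglSwapBuffers or glFinish. Since both calls block until the frame is done, wall-clock time is a reasonable proxy here.

```cpp
#include <chrono>
#include <cstddef>

// Run `render_frame(i)` for i in [0, frame_count) and return the average
// wall-clock time per frame in milliseconds. Returns 0 for an empty run.
template <typename RenderFrame>
double measureAverageFrameMs(std::size_t frame_count, RenderFrame render_frame)
{
    if (frame_count == 0)
        return 0.0;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < frame_count; ++i)
        render_frame(i);  // assumed to block until the frame is complete
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(frame_count);
}
```

Running the same scene once with every frame shown and once with most frames only finished gives the two figures to compare.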

ps: ntsh-jass has already been modified to allow for a user-specified frame-skip factor.

* OProfile 0.9.6 w/ kernel support, kernel 2.6.31.14 from the efikamx-10.07.11 branch at gitorious.org.