PostPosted: Tue Feb 03, 2009 1:59 pm 
Offline

Joined: Sat Nov 08, 2008 7:27 pm
Posts: 24
Location: Argentina
Neko,

Any idea which build tool you are planning to use?

Freescale usually builds Linux and packages with LTIB, but I'd consider OpenEmbedded. OE's community is growing fast because it supports most of the ARM machines.


PostPosted: Tue Feb 03, 2009 3:39 pm 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
Neko,

Any idea which build tool you are planning to use?

Freescale usually builds Linux and packages with LTIB, but I'd consider OpenEmbedded. OE's community is growing fast because it supports most of the ARM machines.
Whatever you like. Freescale will undoubtedly provide LTIB. The Toshiba board came with a full ARM toolchain in a VMware image and TFTP/NFS server support for development. I really did not look too hard at the included tools (only enough to make sure the board worked, and that I could compile something unique and make the board run what I wrote).

We had a successful project to port OpenEmbedded to the Efika.. porting it to the i.MX515 would be a worthy project (probably a very easy one).

Basically, different people have different ways they want to build root filesystems and flash images.. and it's up to you to find the one you like.

I'm personally looking into doing something with VirtualBox (since it has decent cross-platform support) rather than VMware, and doing something about packaging cross-compilers for distributions, but I wouldn't know where to start just yet.. installing Linux in a VM is easy; getting a working toolchain etc. is harder.

_________________
Matt Sealey


PostPosted: Wed Feb 04, 2009 4:38 pm 
Offline

Joined: Sat Nov 08, 2008 7:27 pm
Posts: 24
Location: Argentina
Mission accomplished ;)
http://www.linuxdevices.com/news/NS6335188160.html


PostPosted: Thu Feb 05, 2009 7:29 am 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Guidelines updated
  • Added a note about updating profiles. Projects will not be approved if the profile is still using a nickname; we need your full name. Eventually we will also need an address. For those who may be concerned: we have a Privacy Policy. The full name is so that we can do a little background check - for instance, to see your past work if you did not list it explicitly - plus we think you would be more accountable if you weren't hiding your project behind an anonymous handle. The address ONLY goes into the FedEx API for shipping the i.MX board or support materials related to the project once approved.
  • Modified the guidelines to exclude submissions for "maintaining U-Boot" - Freescale provides U-Boot with every development system they have made in the past 5 years, but in this case the development board will be running the Genesi Firmware, which is arguably more advanced.

_________________
Matt Sealey


PostPosted: Thu Feb 05, 2009 7:44 am 
Offline

Joined: Mon Jan 08, 2007 3:40 am
Posts: 195
Location: Pinto, Madrid, Spain
Quote:
the development board will be running the Genesi Firmware
YEAH!


PostPosted: Thu Feb 05, 2009 9:43 am 
Offline

Joined: Thu Feb 05, 2009 8:22 am
Posts: 5
Location: Watford, UK
Is there an i.MX515 datasheet? All I can find is the fact sheet on the Freescale website (http://www.freescale.com/webapp/sps/sit ... H31143ZrDR).

I was thinking of looking into the 2D acceleration H/W, starting off by implementing a libvo driver under MPlayer that uses the colourspace converter, scaler and blender. Later I might look into doing an Xv driver, but I haven't looked into how much work that would be.
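
Just to show what the colourspace converter would actually be taking off the CPU, here's a rough plain-C sketch of the per-pixel YUV420-to-RGB565 maths a pure software path has to grind through for every pixel of every frame (illustrative only, using the usual BT.601 integer coefficients; not from any Freescale or MPlayer code):
Code:
#include <stdint.h>

/* Clamp an int to the 0..255 range. */
static inline uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one BT.601 (studio range) YUV pixel to RGB565.
 * Fixed-point maths, coefficients scaled by 256. */
static inline uint16_t yuv_to_rgb565(uint8_t y, uint8_t u, uint8_t v)
{
    int c = ((int)y - 16) * 298;
    int d = (int)u - 128;
    int e = (int)v - 128;

    int r = clamp_u8((c + 409 * e + 128) >> 8);
    int g = clamp_u8((c - 100 * d - 208 * e + 128) >> 8);
    int b = clamp_u8((c + 516 * d + 128) >> 8);

    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a planar YUV420 image (u/v subsampled 2x2) to packed RGB565. */
void yuv420_to_rgb565(uint16_t *dst, const uint8_t *y,
                      const uint8_t *u, const uint8_t *v,
                      int width, int height)
{
    for (int row = 0; row < height; row++)
        for (int col = 0; col < width; col++)
            dst[row * width + col] =
                yuv_to_rgb565(y[row * width + col],
                              u[(row / 2) * (width / 2) + col / 2],
                              v[(row / 2) * (width / 2) + col / 2]);
}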


PostPosted: Thu Feb 05, 2009 1:15 pm 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
Is there an i.MX515 datasheet? All I can find is the fact sheet on the Freescale website (http://www.freescale.com/webapp/sps/sit ... H31143ZrDR).

I was thinking of looking into the 2D acceleration H/W, starting off by implementing a libvo driver under MPlayer that uses the colourspace converter, scaler and blender. Later I might look into doing an Xv driver, but I haven't looked into how much work that would be.
There isn't a public datasheet yet; the chip details are still under NDA but we will be providing something when the projects are approved.

The ARM core is actually really capable. You can decode 720p video just using NEON on a TI OMAP at ~500MHz. The i.MX515 will ship at 1GHz..
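
To give a flavour of what "just using NEON" looks like in practice, here's a minimal sketch of a typical decoder inner loop - adding a 16-bit residual back onto 8-bit prediction pixels, eight at a time with saturation. It's illustrative only (the function is made up, not taken from any shipping decoder) and assumes GCC with -mfpu=neon:
Code:
#include <arm_neon.h>
#include <stdint.h>

/* dst = saturate(pred + residual), 8 pixels per iteration.
 * n is assumed to be a multiple of 8. */
void add_residual_u8(uint8_t *dst, const uint8_t *pred,
                     const int16_t *residual, int n)
{
    for (int i = 0; i < n; i += 8) {
        uint8x8_t p  = vld1_u8(pred + i);                  /* 8 prediction pixels */
        int16x8_t pw = vreinterpretq_s16_u16(vmovl_u8(p)); /* widen to 16 bit     */
        int16x8_t r  = vld1q_s16(residual + i);            /* 8 residual samples  */
        int16x8_t s  = vaddq_s16(pw, r);                   /* add                 */
        vst1_u8(dst + i, vqmovun_s16(s));                  /* saturate to 0..255  */
    }
}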

NEON docs are hard to get hold of (again only under NDA), but we are working on this too.

_________________
Matt Sealey


PostPosted: Thu Feb 05, 2009 4:57 pm 
Offline

Joined: Thu Feb 05, 2009 8:22 am
Posts: 5
Location: Watford, UK
Quote:
There isn't a public datasheet yet; the chip details are still under NDA but we will be providing something when the projects are approved.
OK, that's good to know; it's always rather difficult trying to write drivers without H/W docs :wink:
Quote:
The ARM core is actually really capable. You can decode 720p video just using NEON on a TI OMAP at ~500MHz. The i.MX515 will ship at 1GHz..
I already have an OMAP3-based BeagleBoard :D , it would be interesting to see how they compare (performance-wise). I assume you mean the omapfbplay program? That uses the display controller to do the yuv2rgb colour space conversion and scaling. The colour space conversion may well be optional with the NEON co-processor - I can't say I've looked at it - but scaling is very CPU-intensive (if you want good results): even my old 2GHz laptop can't upscale from 320x240 to 1024x768 in software alone. Besides, if you can run the main CPU slower, all the better for your batteries :) .
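
To put numbers on why scaling hurts so much in software: 320x240 upscaled to 1024x768 at 25fps is roughly 20 million output pixels per second, each needing four taps and three interpolations before you've done anything else. A rough, purely illustrative plain-C bilinear inner loop (a real scaler would at least precompute coefficients and vectorize; assumes dw, dh > 1):
Code:
#include <stdint.h>

/* Bilinear upscale of an 8-bit plane (e.g. luma), 16.16 fixed point. */
void scale_bilinear_u8(const uint8_t *src, int sw, int sh,
                       uint8_t *dst, int dw, int dh)
{
    uint32_t xstep = ((uint32_t)(sw - 1) << 16) / (dw - 1);
    uint32_t ystep = ((uint32_t)(sh - 1) << 16) / (dh - 1);

    for (int dy = 0; dy < dh; dy++) {
        uint32_t sy = dy * ystep;
        int y0 = sy >> 16;
        int y1 = y0 + 1 < sh ? y0 + 1 : y0;
        uint32_t fy = sy & 0xffff;

        for (int dx = 0; dx < dw; dx++) {
            uint32_t sx = dx * xstep;
            int x0 = sx >> 16;
            int x1 = x0 + 1 < sw ? x0 + 1 : x0;
            uint32_t fx = sx & 0xffff;

            /* Four taps and three lerps per output pixel. */
            uint32_t top = src[y0 * sw + x0] * (0x10000 - fx)
                         + src[y0 * sw + x1] * fx;
            uint32_t bot = src[y1 * sw + x0] * (0x10000 - fx)
                         + src[y1 * sw + x1] * fx;
            dst[dy * dw + dx] =
                (uint8_t)(((top >> 16) * (0x10000 - fy)
                         + (bot >> 16) * fy) >> 16);
        }
    }
}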
Quote:
NEON docs are hard to get hold of (again only under NDA), but we are working on this too.
Good news. Hopefully a lot of the work that's going on with optimising MPlayer for the OMAP3 can be reused.


PostPosted: Thu Feb 05, 2009 10:36 pm 
Offline

Joined: Sat Nov 08, 2008 7:27 pm
Posts: 24
Location: Argentina
Quote:
The ARM core is actually really capable. You can decode 720p video just using NEON on a TI OMAP at ~500MHz. The i.MX515 will ship at 1GHz.
Maybe to people with no embedded experience 1GHz doesn't look like much, but it is a DAMN lot! I have an i.MX31 running at 400MHz; this SoC has an ARM11 with no SIMD.
I made a demo:
http://www.youtube.com/watch?v=qIBE5FNIgWY
That's a Qtopia 4.3 application and I'm not using any graphics acceleration (SW rendering, with the display set to 800x480, 16bpp, 60Hz). Just think how this will run at 2.5x the clock speed with some NEON tuning + the GPU!!!


PostPosted: Fri Feb 06, 2009 5:14 am 
Offline

Joined: Thu Feb 05, 2009 8:22 am
Posts: 5
Location: Watford, UK
Quote:
Quote:
The ARM core is actually really capable. You can decode 720p video just using NEON on a TI OMAP at ~500MHz. The i.MX515 will ship at 1GHz.
Maybe to people with no embedded experience 1GHz doesn't look like much, but it is a DAMN lot! I have an i.MX31 running at 400MHz; this SoC has an ARM11 with no SIMD.
I made a demo:
http://www.youtube.com/watch?v=qIBE5FNIgWY
That's a Qtopia 4.3 application and I'm not using any graphics acceleration (SW rendering, with the display set to 800x480, 16bpp, 60Hz). Just think how this will run at 2.5x the clock speed with some NEON tuning + the GPU!!!
Nice video. I'm by no means saying the i.MX515 is underpowered, far from it. My point was that where possible it's always a good idea to use H/W accelerators. The main CPU may be quite capable of doing all the work, but if you have the H/W then you can free the CPU to do other things or clock it down to save power. With H/W accelerators you get stuff done "for free". FYI, this is a little demo I put together showing a test SoC (ARM11 @ 350MHz with L2 cache and 2D accelerator) doing some cool stuff: http://www.youtube.com/watch?v=SC7PfhpW ... annel_page


PostPosted: Fri Feb 06, 2009 6:32 am 
Offline

Joined: Sat Nov 08, 2008 7:27 pm
Posts: 24
Location: Argentina
Quote:
FYI this is a little demo I put together showing a test SoC (ARM11 @ 350MHz with L2 cache and 2D accelerator) doing some cool stuff http://www.youtube.com/watch?v=SC7PfhpW ... annel_page
Very nice.

btw, Is that a new SoC from NXP?


PostPosted: Fri Feb 06, 2009 8:42 am 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
Nice video. I'm by no means saying the i.MX515 is underpowered, far from it. My point was that where possible it's always a good idea to use H/W accelerators. The main CPU may be quite capable of doing all the work, but if you have the H/W then you can free the CPU to do other things or clock it down to save power.
Ideally the optimization should be to reduce the CPU cycles needed to do it in the first place - using NEON or VFP if it's relevant - and then move those parts that can be further accelerated to the hardware accelerators.

What I'd hate to see is people throwing an MPEG4 decoder up onto the video decoder core, and thinking that is enough, while the rest of the video pipeline is languishing in old scalar code. Once people see it has "hardware accelerated MPEG4" nobody bothers to look at the pipeline.

We got the same effect with Mesa - now that people are spending inordinate amounts of time trying to get it working on graphics cards using every acceleration method they can, they may be missing out on significant optimization opportunities for certain rendering paths which are handled using simple scalar code.

As an example, a lot of binary drivers on Windows do small performance benchmarks on boot, which basically determine whether certain results can be produced faster by the SSE2/SSE3 unit than by passing them to the graphics card. In those cases, a highly optimized software fallback is used rather than offloading to the graphics card, in the interests of performance. This comes into its own when the latest drivers take advantage of CPU multithreading and multicore. You generally don't get to drive a graphics card from two threads and get better performance - the command pipeline is sequential and a lot of sitting around happens during processing.
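
The boot-time check is simple enough to sketch on Linux too - time both paths once and dispatch through a function pointer afterwards. The sw_blit/hw_blit names below are hypothetical placeholders, not any real driver API:
Code:
#include <time.h>

/* Hypothetical placeholders for two implementations of the same operation. */
extern void sw_blit(void *dst, const void *src, int w, int h);  /* CPU/SIMD    */
extern void hw_blit(void *dst, const void *src, int w, int h);  /* accelerator */

typedef void (*blit_fn)(void *, const void *, int, int);

/* Time a handful of iterations of one implementation. */
static double time_one(blit_fn fn, void *dst, const void *src, int w, int h)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 100; i++)
        fn(dst, src, w, h);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

/* Run once at startup: pick whichever path is faster for this workload
 * and dispatch through the returned function pointer from then on. */
blit_fn pick_blit(void *dst, const void *src, int w, int h)
{
    return time_one(sw_blit, dst, src, w, h) <
           time_one(hw_blit, dst, src, w, h) ? sw_blit : hw_blit;
}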

With the multi-core CPU, multi-core GPU shader/CUDA modules, acceleration features all together, it is usually a mistake to just use one alone, just because it comes for free, and handles a single use case extremely well.
Quote:
With H/W accelerators you get stuff done "for free"
Here's another good example; the guys at Tro^H^H^H QtSoftware have been implementing "raster" and "opengl" rendering modes for the Qt backend, on the basis that a full software fallback or full 3D hardware acceleration is better than using the X Render protocol.

The standard "for free" hardware-accelerated compositing engine, which is ironically used by Cairo too (actually X.org and Cairo share a software fallback library, libpixman), is actually slower than a well-designed software pipeline..

The technique for gaining extra battery life that works best these days seems to be the race for idle.. get it done on the CPU as fast as you can, so you can sit idle for longer, later. This includes optimizing all the setup and preparation of data before submitting it to the hardware accelerator :)

_________________
Matt Sealey


PostPosted: Sun Feb 08, 2009 4:30 pm 
Offline

Joined: Thu Feb 05, 2009 8:22 am
Posts: 5
Location: Watford, UK
Quote:
Quote:
FYI this is a little demo I put together showing a test SoC (ARM11 @ 350MHz with L2 cache and 2D accelerator) doing some cool stuff http://www.youtube.com/watch?v=SC7PfhpW ... annel_page
Very nice.

btw, Is that a new SoC from NXP?
No, it was a test SoC to verify new IP.


PostPosted: Sun Feb 08, 2009 5:05 pm 
Offline

Joined: Thu Feb 05, 2009 8:22 am
Posts: 5
Location: Watford, UK
Quote:
Quote:
Nice video. I'm by no means saying the i.MX515 is underpowered, far from it. My point was that where possible it's always a good idea to use H/W accelerators. The main CPU may be quite capable of doing all the work, but if you have the H/W then you can free the CPU to do other things or clock it down to save power.
Ideally the optimization should be to reduce the CPU cycles needed to do it in the first place - using NEON or VFP if it's relevant - and then move those parts that can be further accelerated to the hardware accelerators.
Sure, and I think people with much better knowledge than me are already working in this area.
Quote:
What I'd hate to see is people throwing an MPEG4 decoder up onto the video decoder core, and thinking that is enough, while the rest of the video pipeline is languishing in old scalar code. Once people see it has "hardware accelerated MPEG4" nobody bothers to look at the pipeline.
Again, I totally agree. There are already many chips out there that can do "H/W accelerated <insert short codec list here>". What I want (and I'm sure I'm not the only one) is a chip that can decode whatever codec I throw at it. Unlike traditional broadcast, where the codec is fixed, internet content uses whatever codec is flavour of the month (OK, slight exaggeration ;)
Quote:
We got the same effect with Mesa - now that people are spending inordinate amounts of time trying to get it working on graphics cards using every acceleration method they can, they may be missing out on significant optimization opportunities for certain rendering paths which are handled using simple scalar code.

As an example, a lot of binary drivers on Windows do small performance benchmarks on boot, which basically determine whether certain results can be produced faster by the SSE2/SSE3 unit than by passing them to the graphics card. In those cases, a highly optimized software fallback is used rather than offloading to the graphics card, in the interests of performance. This comes into its own when the latest drivers take advantage of CPU multithreading and multicore. You generally don't get to drive a graphics card from two threads and get better performance - the command pipeline is sequential and a lot of sitting around happens during processing.

With the multi-core CPU, multi-core GPU shader/CUDA modules, acceleration features all together, it is usually a mistake to just use one alone, just because it comes for free, and handles a single use case extremely well.
Right, so if we can use a H/W scaler but keep the CPU busy starting to decode the next frame, then that's all to the good, right?
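
What I have in mind is roughly the overlap below - the hw_scale_* and decode_frame calls are just hypothetical stand-ins for whatever the real interfaces turn out to be:
Code:
#include <stdbool.h>

/* Illustrative only: double-buffered overlap of H/W scaling with S/W decode. */
struct frame {                                  /* decoded picture (sketch)  */
    unsigned char *pixels;
    int width, height;
};

bool decode_frame(struct frame *out);           /* CPU decoder; false at EOF */
void hw_scale_submit(const struct frame *in);   /* kick the scaler, no wait  */
void hw_scale_wait(void);                       /* block until scaler done   */
void display(void);                             /* flip the scaled output    */

void play(void)
{
    struct frame buf[2];
    int cur = 0;

    if (!decode_frame(&buf[cur]))
        return;

    for (;;) {
        hw_scale_submit(&buf[cur]);              /* scaler works on frame N   */
        bool more = decode_frame(&buf[cur ^ 1]); /* CPU decodes frame N+1     */
        hw_scale_wait();
        display();
        if (!more)
            break;
        cur ^= 1;
    }
}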
Quote:
Quote:
With H/W accelerators you get stuff done "for free"
Here's another good example; the guys at Tro^H^H^H QtSoftware have been implementing "raster" and "opengl" rendering modes for the Qt backend, on the basis that a full software fallback or full 3D hardware acceleration is better than using the X Render protocol.

The standard "for free" hardware-accelerated compositing engine, which is ironically used by Cairo too (actually X.org and Cairo share a software fallback library, libpixman), is actually slower than a well-designed software pipeline..

The technique for gaining extra battery life that works best these days seems to be the race for idle.. get it done on the CPU as fast as you can, so you can sit idle for longer, later. This includes optimizing all the setup and preparation of data before submitting it to the hardware accelerator :)
OK, so you've got to take into account setup and teardown when you're looking at H/W vs S/W. Thanks for all your input, Neko - I guess I've been making various assumptions that might not be true. My gut feeling says that for video decode, if you're doing any reasonable amount of scaling (say x1.5 or more), H/W should win over the CPU, although a lot will depend on the architecture of the system; the only way to say for sure is to try both. On the test chip I worked on, the H/W made all the difference: it allowed a QVGA 30fps video to be rescaled full screen (800x480) with no extra CPU overhead while running (small setup and teardown cost), whereas with the CPU you'd get less than 1/2 fps.


PostPosted: Mon Feb 09, 2009 11:01 am 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
OK, so you've got to take into account setup and teardown when you're looking at H/W vs S/W. Thanks for all your input, Neko - I guess I've been making various assumptions that might not be true. My gut feeling says that for video decode, if you're doing any reasonable amount of scaling (say x1.5 or more), H/W should win over the CPU, although a lot will depend on the architecture of the system; the only way to say for sure is to try both.
Ideally you'd optimize the entire pipeline for a particular codec in software first - then move it into hardware.

If there's anything missing in the video decoder for certain cases, you are then already backing it up with the fastest possible fallback.

Rather that than implementing video decoding as a set of special cases for particular video types.

In the event that you have to mess with the input to put it into the right format and spacing, this needs to be done as fast as possible to keep the VPU busy, or you'll drop frames for no acceptable reason.

By all means.. use the video decoder. I'm looking at the docs and it's EXTREMELY capable (I deleted popper's post on the subject because he was talking a lot of horse sh..) but as an example, a lot of XviD/DivX videos I've seen around have some really crazy qpel/gmc settings which are wildly out of spec (I don't understand why people don't just stick to the profiles, but even then some of the built-in ones are a bit odd), but people will still expect them to play.

Sometimes they don't even play when encoded with the open-source XviD codec, if you only have the commercial DivX decoder installed. Now that is a weirdly encoded file.. but again, people will still expect them to play :)

We'd all love to have a world where the only way to encode MPEG4, H.264 or MPEG2 video was through a point-and-click GUI that made everything fit a certain format, but this is just how software encoders go, unfortunately.

In the end an embedded device, movie player, media center may just restrict input formats or tell the user "tough luck, go read the manual for what I can play", but it helps that the software can make the appropriate adjustments on less embedded targets.

_________________
Matt Sealey

