Why Starling (or any other 2D framework on top of Stage3D)?

Let me try to answer a pressing question from the community: Why did Adobe not accelerate the classic display list APIs to support the GPU instead of inventing a new API called Starling?

Well, we have done that (accelerating the classic display list APIs, that is) and we have learned a lot from it. In fact, we did it twice, with two completely different approaches:

Approach #1: Back in Flash Player 10 (early 2008) we introduced ‘wmode=gpu’, which accelerated compositing of display list objects using the GPU. It did so by pre-rendering display list objects on the CPU and uploading them as textures to the GPU, where they were then composited. It worked in some cases, but in the end we discovered that only a handful of sites were using this mode, as no one could figure out how to create faster content with it. Worse, in some cases it looked like the mode had been enabled by accident, as the site ran much faster in non-GPU mode. Designing content for GPUs is non-obvious, as I will outline below. For these reasons, and because GPU code is generally very expensive for Adobe to maintain, we decided to pull that rendering mode from Flash Player 10.3 and let it fall back to ‘wmode=direct’ mode.

Approach #2: On mobile, which includes Android and iOS, we have ‘renderMode=gpu’. Unlike ‘wmode=gpu’ on the desktop, this mode renders vector graphics directly on the GPU using OpenGL ES 2. This mode is still available to you today on Android and iOS, and we see some content using it. Content which uses ‘renderMode=gpu’ successfully sticks to a very small subset of the classic display list APIs, which looks eerily close to the subset Starling provides. And yet the overall cost in the Flash Player is higher than if you simply used Starling, due to the many layers involved in emulating some classic display list features. In short: you are likely better off using Starling going forward for new content.

So what is the problem with using the classic display list APIs? The essence is that the classic display list APIs were designed for a software renderer. They do not easily scale to being pushed to a GPU for faster rendering.

- The classic display list has many legacy features which are tied to the specific way our software rasterizer works. That includes vector masks and scale-9, for instance. You will see that with Starling you will have to find a different way to get the same effects.

- A lot of other classic display list features cannot be easily expressed on a GPU without going through slow and complex code paths and, more importantly, losing fidelity. That includes blend modes, filters, some forms of transformations and device text, among many others. In some of those cases we have to fall back to software. That makes creating well-performing SWF content difficult, to say the least. You need to understand exactly what happens under the hood of the Flash Player to get well-performing content. Documenting the exact behavior of the Flash Player without access to the actual Flash Player code is very difficult, as there are simply too many special cases. That documentation could be nothing more than the actual Flash Player code. And reading a large C++ code base might not be your thing either. ;-)

- GPUs like flat display hierarchies. Deeply nested MovieClips are a big no-no. You might think this could be easily optimized behind the scenes. I can tell you that without hints about the original application data structure layout this is not possible. It’s the classic problem where each additional abstract API layer in an application introduces more entropy, and at the end you are unable to figure out the original intent of the application, which you need in order to apply meaningful optimizations. I see too much content where excessive use of nested MovieClips makes it impossible to figure out what the content is actually doing on the screen.

Let me put this into an analogy you might be able to understand better: Let’s say the Flash Player had no APIs to draw strings or text, only APIs to draw individual characters. Drawing strings would be implemented by some AS3 code. OK, fine, but drawing individual characters is actually 10x slower than drawing complete strings for the internal Flash Player code. That means the Flash Player would have to guess in reverse what the string/text was, which is expensive and sometimes not possible.
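To make the point about lost intent a bit more concrete, here is a minimal toy model in Python. It is not Starling's or the Flash Player's actual code. It assumes a renderer that can merge consecutive quads sharing a texture into one draw call; the names `Quad`, `count_draw_calls`, `atlas_a` and `atlas_b` are made up for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Quad:
    texture: str  # hypothetical texture/atlas name
    x: float
    y: float


def count_draw_calls(quads):
    """Toy cost model: one draw call per run of consecutive quads
    that share the same texture."""
    return sum(1 for _ in groupby(quads, key=lambda q: q.texture))


# The same six quads in two different draw orders.
interleaved = [
    Quad("atlas_a" if i % 2 == 0 else "atlas_b", i * 10.0, 0.0)
    for i in range(6)
]
# sorted() is stable, so quads keep their relative order per texture.
grouped = sorted(interleaved, key=lambda q: q.texture)

print(count_draw_calls(interleaved))  # 6
print(count_draw_calls(grouped))      # 2
```

Reordering the quads is only safe when they do not overlap on screen. The application usually knows that; a runtime looking at a deeply nested display list would have to work it out for every frame, which is the "reverse guessing" described above.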

- GPUs like bitmaps. Rendering vectors either has to be done on the CPU, which means you incur texture upload costs for each frame, or it creates a lot of vertex data, which is a problem on mobile GPUs (and Intel desktop GPUs ;-)). Rendering gradients has its own challenges, as pre-rendering a radial gradient into a bitmap can be faster than using pixel shader code on most GPUs. This seems counter-intuitive but makes sense once you realize that texture fetches are implemented in a dedicated part of the silicon, whereas a pixel shader has to run in the ALU.

- Mouse events are implemented with perfect hit testing in the classic display list API, i.e. hit testing is based on the actual vector graphics shapes. If you have a circular vector shape as a button, a mouse click will not activate that button unless it is within that circle. This makes sense on the desktop, where you have a precise mouse cursor, but is extremely wasteful on mobile, where you really want to deal with simple, large rectangular touch areas. Each additional computation cycle for detecting mouse hits increases the perceived lag of a SWF. What’s worse, if you want to express large touch areas which extend beyond the graphic representation of the button, you would do this by adding another MovieClip with a transparent vector rectangle to the display list, which further impacts overall performance.

- The classic display list API is a giant state machine which needs to be evaluated for every frame. Just x/y translating an object can trigger expensive recalculations and re-rendering without you knowing it. The classic example here is cacheAsBitmap which is probably the most misunderstood and misused feature in the Flash runtime. With Starling the state changes from frame to frame are not hidden but plainly visible in ActionScript which means you have a chance to see what is actually going on.
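The hit-testing point in the list above can be illustrated with a small Python sketch. It is a generic illustration, not the Flash Player's or Starling's implementation: the circle stands in for an arbitrary vector shape, and `Rect`, `hit_circle` and the 10-pixel padding are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        """Cheap test: four comparisons, independent of the shape drawn."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

    def inflate(self, pad: float) -> "Rect":
        """Return a copy grown by `pad` on every side (a larger touch area)."""
        return Rect(self.x - pad, self.y - pad,
                    self.width + 2 * pad, self.height + 2 * pad)


def hit_circle(cx: float, cy: float, radius: float,
               px: float, py: float) -> bool:
    """Shape-exact test for a circular button."""
    dx, dy = px - cx, py - cy
    return dx * dx + dy * dy <= radius * radius


# A circular button of radius 40 centered at (50, 50).
CENTER_X, CENTER_Y, RADIUS = 50.0, 50.0, 40.0
bounds = Rect(10.0, 10.0, 80.0, 80.0)  # bounding box of the circle
touch_area = bounds.inflate(10.0)      # finger-friendly padding

corner_tap = (85.0, 85.0)   # inside the bounding box, outside the circle
outside_tap = (95.0, 95.0)  # just outside the bounding box

print(hit_circle(CENTER_X, CENTER_Y, RADIUS, *corner_tap))  # False
print(bounds.contains(*corner_tap))                         # True
print(bounds.contains(*outside_tap))                        # False
print(touch_area.contains(*outside_tap))                    # True
```

With a rectangular test, enlarging the touch area is a matter of changing four numbers rather than adding another transparent object to the display list.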

I could go on and on, but I hope this answers some questions of why we are offering Starling.

Long term, I hope that most games and multimedia content will move to Stage3D and use the classic display list for what it’s really good at, which is creating high-fidelity vector graphics on the fly, rendering text, pixel processing and many other things. It certainly won’t go away, and we will continue to add features and optimize performance. If you have fixed graphics assets, it is usually better to bring these in externally as bitmaps and stick with Stage3D.

I strongly believe that with Stage3D and Starling we are way ahead of other web technologies, which still have to go through the same learning experience we went through over the last four or so years.
