compositor: simplify shaders

Rework how the shaders and emitted vertices work.  Rather than always
rendering clip-rect sized quads and doing the transformation in tex
coords (which requires corresponding clipping in the frag shader),
emit transformed vertices, clipped wrt. the dirty region, and use
simpler frag shaders.  Also split the rendering, so that blended
surfaces with an opaque region have the opaque region drawn with blend
disabled.  The result is considerably fewer pixels drawn with blend
enabled, and far fewer cycles spent in the frag shader.

This requires some more complex logic to figure out the vertices of
the shape formed by the intersection of the clip rect and the
transformed surface.  That logic probably still has a few bugs or
missing cases (visual glitches in some cases), but at this point it is
more or less starting to work.  I think it is at least far enough along
to get some initial review.
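
For reviewers who want a reference point for the intersection step, here
is a minimal standalone sketch that clips a transformed quad against an
axis-aligned clip rect.  It uses textbook Sutherland-Hodgman clipping,
which is not necessarily the approach taken in this patch, and the names
(struct vec2, struct rect, clip_polygon, MAX_VERTS) are hypothetical and
do not exist in the tree.

```c
#include <assert.h>

/* A quad clipped by four edges gains at most one vertex per edge. */
#define MAX_VERTS 8

struct vec2 { float x, y; };
struct rect { float x1, y1, x2, y2; };	/* x1 <= x2, y1 <= y2 */

enum edge { EDGE_LEFT, EDGE_RIGHT, EDGE_TOP, EDGE_BOTTOM };

/* Is p on the inner side of edge e of r?  Boundary counts as inside. */
static int
inside(struct vec2 p, const struct rect *r, int e)
{
	switch (e) {
	case EDGE_LEFT:		return p.x >= r->x1;
	case EDGE_RIGHT:	return p.x <= r->x2;
	case EDGE_TOP:		return p.y >= r->y1;
	default:		return p.y <= r->y2;
	}
}

/* Intersection of segment a-b with the line through edge e.  Only
 * called when a and b lie on opposite sides, so the divisor is
 * never zero. */
static struct vec2
intersect(struct vec2 a, struct vec2 b, const struct rect *r, int e)
{
	struct vec2 p;
	float t;

	if (e == EDGE_LEFT || e == EDGE_RIGHT) {
		p.x = (e == EDGE_LEFT) ? r->x1 : r->x2;
		t = (p.x - a.x) / (b.x - a.x);
		p.y = a.y + t * (b.y - a.y);
	} else {
		p.y = (e == EDGE_TOP) ? r->y1 : r->y2;
		t = (p.y - a.y) / (b.y - a.y);
		p.x = a.x + t * (b.x - a.x);
	}
	return p;
}

/* Clip a convex polygon of n <= 4 vertices against r.  Writes up to
 * MAX_VERTS vertices to out and returns the count (0 if nothing is
 * left). */
int
clip_polygon(const struct vec2 *in, int n, const struct rect *r,
	     struct vec2 *out)
{
	struct vec2 buf[2][MAX_VERTS];
	int cur = 0, e, i, m;

	assert(n <= 4);
	for (i = 0; i < n; i++)
		buf[0][i] = in[i];

	for (e = EDGE_LEFT; e <= EDGE_BOTTOM && n > 0; e++) {
		const struct vec2 *src = buf[cur];
		struct vec2 *dst = buf[cur ^ 1];
		struct vec2 prev = src[n - 1];

		m = 0;
		for (i = 0; i < n; i++) {
			struct vec2 p = src[i];
			int p_in = inside(p, r, e);

			/* Crossing the edge emits the crossing point. */
			if (p_in != inside(prev, r, e)) {
				assert(m < MAX_VERTS);
				dst[m++] = intersect(prev, p, r, e);
			}
			if (p_in) {
				assert(m < MAX_VERTS);
				dst[m++] = p;
			}
			prev = p;
		}
		n = m;
		cur ^= 1;
	}

	for (i = 0; i < n; i++)
		out[i] = buf[cur][i];
	return n;
}
```

The output vertices keep the winding order of the input, so they could
be drawn as a triangle fan, with a per-polygon vertex count recorded
alongside the vertex data.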

On a small SoC GPU (omap4/pandaboard) driving a 1920x1080 display,
simple operations like moving windows around now run at 60fps
(previously 30fps or less), and pushing YUV buffers for hw decoded
1080p video goes from ~6fps to 30fps, with no drop in framerate for a
transformed/rotated video surface.

v1: original
v2: check that perpendicular intersect vertex falls within bounds of
    transformed surface
v3: update w/ comments and fixes from Pekka Paalanen
v4: fix for full surface alpha from Pekka Paalanen, fix
    compositor-wayland build

Signed-off-by: Rob Clark <rob@ti.com>
diff --git a/src/compositor.h b/src/compositor.h
index 070d1ca..7e6220c 100644
--- a/src/compositor.h
+++ b/src/compositor.h
@@ -248,8 +248,6 @@
 	GLint tex_uniforms[3];
 	GLint alpha_uniform;
 	GLint color_uniform;
-	GLint texwidth_uniform;
-	GLint opaque_uniform;
 };
 
 enum {
@@ -319,7 +317,9 @@
 	int idle_time;			/* effective timeout, s */
 
 	/* Repaint state. */
-	struct wl_array vertices, indices;
+	struct wl_array vertices;
+	struct wl_array indices; /* only used in compositor-wayland */
+	struct wl_array vtxcnt;
 	struct weston_plane primary_plane;
 
 	uint32_t focus;