Remote visualization of interactive 3D applications with high quality and low delay is a long-standing goal. One way to make 3D graphics applications usable everywhere, including on computationally weak end devices, is to execute the application on a server and transmit its audio-visual output to the client as a video stream. Unlike video broadcast, interactive applications such as computer games require extremely low end-to-end delay. In this work, Fraunhofer HHI uses an enhanced H.264 video codec for efficient, low-delay video streaming. Because the computationally demanding video encoding runs in parallel with the application, several optimizations have been developed to reduce the computational load. The main idea is to exploit information from the 3D rendering context that is not available to a generic video encoder:
Modification of the game's viewport in order to generate image sequences that optimally fit the client's capabilities.
Based on depth maps and transformation matrices, the current frame is predicted from the previous one, which speeds up the motion search that is usually the most computationally demanding part of a generic video encoder. Besides the efficient calculation of motion vectors, fast methods for capturing the additional render-context information have been developed.
Color space conversion and chroma subsampling (RGB 4:4:4 to YCbCr 4:2:0) can be executed on the GPU, exploiting the parallel processing capabilities of modern GPUs and reducing the volume of data that must be transferred from GPU to CPU. Calculating the motion vectors directly on the GPU yields a further acceleration by a factor of 10.
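The depth-based motion prediction in the list can be illustrated by reprojecting a single pixel: unproject it with its depth into world space using the current frame's matrices, project it with the previous frame's matrices, and take the pixel difference as the motion vector. The following is a minimal CPU sketch, not the HHI implementation. It assumes an OpenGL-style pipeline (depth values in [0, 1], NDC in [-1, 1], image row 0 at the top), row-major 4x4 matrices applied to column vectors, and a static scene in which only the camera moves; the function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

Mat4 = Sequence[Sequence[float]]  # row-major 4x4, applied as M @ v (column vectors)


def mat_vec(m: Mat4, v: Sequence[float]) -> List[float]:
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def reprojected_motion_vector(
    x: float, y: float, depth: float,
    inv_view_proj_curr: Mat4, view_proj_prev: Mat4,
    width: int, height: int,
) -> Tuple[float, float]:
    """Predict where pixel (x, y) of the current frame lay in the previous frame.

    Assumptions (hypothetical setup): OpenGL-style depth in [0, 1], NDC in
    [-1, 1], row 0 at the top of the image, static scene, moving camera.
    Returns (dx, dy) in pixels, pointing from the current pixel to its
    position in the previous (reference) frame.
    """
    # Pixel centre -> normalized device coordinates of the current frame.
    ndc = [
        2.0 * (x + 0.5) / width - 1.0,
        1.0 - 2.0 * (y + 0.5) / height,
        2.0 * depth - 1.0,
        1.0,
    ]
    # Unproject to world space and undo the homogeneous scale.
    world = mat_vec(inv_view_proj_curr, ndc)
    world = [c / world[3] for c in world]
    # Project into the previous frame and apply the perspective divide.
    clip_prev = mat_vec(view_proj_prev, world)
    px = clip_prev[0] / clip_prev[3]
    py = clip_prev[1] / clip_prev[3]
    # Previous-frame NDC -> pixel coordinates (inverse of the mapping above).
    x_prev = (px + 1.0) * width / 2.0 - 0.5
    y_prev = (1.0 - py) * height / 2.0 - 0.5
    return x_prev - x, y_prev - y
```

For example, if the previous view-projection differs from the current one only by an NDC shift of 0.1 along x, an 800x600 frame yields a motion vector of about (40, 0) pixels for every pixel. An encoder could use such a vector as the starting candidate for each macroblock and refine it within a small search window instead of running a full search.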
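The color space conversion and 4:2:0 subsampling in the list halve the data per pixel: 4:4:4 RGB stores 3 bytes per pixel, while 4:2:0 stores one full-resolution luma plane plus one Cb and one Cr sample per 2x2 block, i.e. 1.5 bytes per pixel. The sketch below shows the per-pixel arithmetic on the CPU; on a GPU the same computation would run in a shader. It assumes full-range BT.601 (JPEG/JFIF) coefficients and plain 2x2 averaging for the chroma; an actual H.264 pipeline may use limited-range coefficients or a different chroma filter, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Plane = List[List[int]]


def _clamp8(v: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(v))))


def rgb444_to_ycbcr420(img: Sequence[Sequence[RGB]]) -> Tuple[Plane, Plane, Plane]:
    """Convert an RGB image (rows of (r, g, b)) to Y, Cb, Cr planes in 4:2:0.

    Assumes full-range BT.601 coefficients and even width and height.
    """
    height, width = len(img), len(img[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch requires even width and height")

    y_plane: Plane = [[0] * width for _ in range(height)]
    cb_full = [[0.0] * width for _ in range(height)]
    cr_full = [[0.0] * width for _ in range(height)]
    for j, row in enumerate(img):
        for i, (r, g, b) in enumerate(row):
            y_plane[j][i] = _clamp8(0.299 * r + 0.587 * g + 0.114 * b)
            cb_full[j][i] = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
            cr_full[j][i] = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def subsample(full: List[List[float]]) -> Plane:
        # One chroma sample per 2x2 block: the average of its four values.
        return [
            [_clamp8((full[j][i] + full[j][i + 1]
                      + full[j + 1][i] + full[j + 1][i + 1]) / 4)
             for i in range(0, width, 2)]
            for j in range(0, height, 2)
        ]

    return y_plane, subsample(cb_full), subsample(cr_full)
```

At the SVGA resolution mentioned below (800x600), this reduces each frame from 1,440,000 bytes (RGB) to 720,000 bytes (YCbCr 4:2:0) before any encoding takes place.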
Because common video players are optimized for reliable playback with considerable buffering, a dedicated video client optimized for low delay with minimal buffering has been developed. The optimizations have led to time savings of up to 30%. At SVGA resolution (800x600), the delay between server-side image generation and client-side image presentation (including capturing, encoding, localhost RTP transmission, decoding and rendering) is less than 40 ms on a 2.4 GHz Pentium PC.
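One way to keep client-side buffering minimal is a single-slot buffer between decoder and renderer: each newly decoded frame replaces any frame that has not been displayed yet, so the renderer always shows the newest image instead of working through a queue. The sketch below is an illustrative assumption, not the HHI client; the class and method names are hypothetical. Note that an H.264 decoder must still decode every frame, because later frames reference earlier ones; only the presentation of stale frames is skipped.

```python
import threading
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class LatestFrameSlot(Generic[T]):
    """Single-slot buffer: a new decoded frame overwrites an undisplayed one."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frame: Optional[T] = None
        self.dropped = 0  # frames decoded but replaced before being displayed

    def put(self, frame: T) -> None:
        """Called by the decoder thread for every decoded frame."""
        with self._cond:
            if self._frame is not None:
                self.dropped += 1  # the older, never-shown frame is discarded
            self._frame = frame
            self._cond.notify()

    def take(self, timeout: Optional[float] = None) -> Optional[T]:
        """Called by the render thread; returns the newest frame or None on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._frame is not None, timeout)
            frame, self._frame = self._frame, None
            return frame
```

Compared with a FIFO jitter buffer, this trades smoothness under network jitter for the lowest possible display latency, which matches the priority of interactive applications.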