The ability to dynamically migrate multimedia streams between multimedia servers can be used to perform load balancing or to recover from failure. This project investigates methods of migrating active media streams between servers without interruption to playback as perceived by the client. (This project was undertaken as part of my undergraduate degree at Trinity College Dublin. The final report is available for viewing here in PDF format and the presentation in ppt format.)
For someone with no background knowledge in streaming media, finding a detailed yet high-level source of technical information online on the standards and protocols involved in media delivery is a daunting task. Standards exist in the form of RFCs, but unless you use these as an initial introduction it is hard to find an overview of the area without getting locked into reading about one vendor's particular platform. 'The Technology of Video and Audio Streaming' by David Austerberry eventually provided an introduction to the area, but it is a somewhat haphazard compilation of material. Other information on the area was taken from the following:
Work on migrating streams has been undertaken by Karrer and Gross, and related work by Sumit Roy, Bo Shen, Vijay Sundaram, and Raj Kumar looks at migrating streams that are undergoing transcoding. Transport layer migration has been addressed by Alex C. Snoeren, David G. Andersen, and Hari Balakrishnan in the Migrate project and by others in the MTCP project. Server-side stream migration was the original goal of this project, but a review of the RTSP RFC indicates that it cannot take place in the current architecture, for a number of reasons. I do not believe that transport-level migration as described by the MTCP project is appropriate (due to its lack of fault tolerance), and the Migrate implementation appears to introduce substantial overhead for high-level services which must store and transmit a lot of state. In addition, the Migrate approach is not suitable for recovery from failed datagram transport.
The three main streaming platforms in use today are being developed by Microsoft (Windows streaming services/Windows Media Player), Apple (QuickTime server/player) and RealNetworks (universal server/RealOne player). Of these, Apple and Real have open source projects making the source of their streaming solutions available. Apple provides the QuickTime server as Darwin Streaming Server, while Real provides a full platform for media delivery including the server, player and encoder. Apple does not provide source code for a QuickTime client; however, a client and appropriate encoding software are available as part of the MPEG4IP project. Thus the choice of platform for use in the project is between Apple's and Real's solutions.
| | Initial ability to compile, install and configure software | Community support for working with the project's code base | Available documentation and source code commenting |
| Helix DNA | Not quite compile-out-of-the-box: compiling the player required downloading the DXSDK, and the custom build system is initially daunting to use. Because of the supported platforms, it took a while to find a combination on which the server and client would both work. Installation and configuration were very straightforward. (Installed on Windows XP with VC++ 6.0) | Excellent mailing lists for all the projects, with many developers within Real participating. All my questions to the list were promptly answered by both community members and developers at Real. An IRC server is also available, but the mailing list is the quickest way to get help. | Initially only documentation for obtaining and compiling the software was available, but this has grown to include detailed overviews of particular aspects of the software and the client and server specifications, which describe the architecture and the use of interfaces in the project. Certain aspects of the code are documented while others are not, but the standard coding style and access to developers make up for this. |
| DarwinSS | Compiled out of the box; however, installation and configuration took quite some time. Eventually I wrote a new install script which allows all of the software to be installed into a single directory tree for ease of access. The admin scripts required further modification to run the GUI configuration utility correctly. (Installed on Red Hat Linux) | A single mailing list for the server project, which appears to be monitored by developers at Apple, but they do not actively take part in development discussion or answer development questions. Although generally friendly, the list was not of much use for solving problems. | Documentation is available for writing plugins for the server, but for the actual server core virtually nothing was available describing the architecture, and there are virtually no comments anywhere in the code. |
The Helix client has support for reconnection to a server, and for failover to an alternate server. These abilities are described in the Helix server administrator's guide [get ref]. Alternate servers are specified in response to a SETUP request from a client. The Helix client can also accept REDIRECT messages, which redirect the client to an alternative host. This was tested by writing a plugin for the server (the code basically shuts down the first stream, resets the player and negotiates setup for the new stream). I am currently investigating the client implementation of failover and reconnection. It appears that the client randomly selects ONE of the listed alternate servers to connect to in case of failure, thus avoiding an instant load on any one server in the event of failure. From a quick trial, the RealOne player will fail over to a second server with a noticeable pause (in one case this was ~60 seconds, while a second setup resulted in a pause of 3-4 seconds while the new connection was established). My project is currently looking at how to modify the client to make migration fully 'transparent' to the end user. The open source client, when compiled, dies on connection failure (I've since fixed this; more details of the problem are here and a re-implementation of the failover code is here).
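The two mechanisms described above, handling a REDIRECT and picking one alternate server at random, can be sketched roughly as follows. This is only an illustration and not the Helix client's code: the message layout follows the REDIRECT example in the RTSP RFC (RFC 2326), while the host names and the `parse_redirect` and `pick_alternate` helpers are hypothetical.

```python
import random


def parse_redirect(message: str) -> dict:
    """Parse an RTSP REDIRECT request into its request URL and headers."""
    lines = message.strip().splitlines()
    method, url, version = lines[0].split()
    if method != "REDIRECT":
        raise ValueError(f"not a REDIRECT request: {method}")
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if "location" not in headers:
        raise ValueError("REDIRECT without a Location header")
    return {"url": url, "version": version, "headers": headers}


def pick_alternate(alternates, rng=random):
    """Pick ONE alternate at random so failed-over clients spread out."""
    if not alternates:
        raise ValueError("no alternate servers listed")
    return rng.choice(alternates)


# Hypothetical message: the hosts and paths are made up for illustration.
REDIRECT_EXAMPLE = (
    "REDIRECT rtsp://server-a.example.com/media/clip.rm RTSP/1.0\r\n"
    "CSeq: 732\r\n"
    "Location: rtsp://server-b.example.com:554/media/clip.rm\r\n"
    "\r\n"
)

if __name__ == "__main__":
    redirect = parse_redirect(REDIRECT_EXAMPLE)
    print("reconnect to", redirect["headers"]["location"])
    print("failover to", pick_alternate(["server-b", "server-c", "server-d"]))
```

After a REDIRECT the client still has to tear down the old session and issue a fresh SETUP and PLAY against the new location, which is where the pause seen in the trial comes from.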
The details of how the client sets up the control stream (RTSP) and the actual data streams (RTP and RTCP) are not quite straightforward. Notes on the setup are here; however, I intend to write up a proper description when I get a chance! Currently I'm trying to determine how and when the renderers are set up, and how to make the RTP transport buffers persistent (basically, decoupling them from the actual underlying transport). Other issues that I need to concern myself with are how long it will take to set up a new stream (I need to do some testing or find tests someone else has done), how best to make use of the persistent buffer in the overall streaming architecture, and how this 'transparent' switching fits in with the notion of application level framing in the broad sense (i.e. letting the application know when there's been a problem).
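One way to picture a buffer that is decoupled from the underlying transport is sketched below. It is a minimal illustration under several assumptions, not the Helix design: packets are keyed by media time rather than by RTP sequence number (a new RTP session will generally start with a different sequence number and timestamp offset), each transport is told the RTP timestamp that corresponds to media time zero, and header extensions are ignored. The class and function names are hypothetical; only the 12-byte fixed RTP header layout comes from the RTP specification.

```python
import heapq
import struct

# Fixed RTP header: V/P/X/CC, M/PT, sequence number, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def make_rtp(seq, timestamp, payload, ssrc=0x1234):
    """Build a version-2 RTP packet (no CSRCs, dynamic payload type 96)."""
    return RTP_HEADER.pack(0x80, 96, seq, timestamp, ssrc) + payload


def parse_rtp(packet):
    """Return (sequence, timestamp, payload); header extensions are ignored."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the 12-byte RTP header")
    first, _m_pt, seq, timestamp, _ssrc = RTP_HEADER.unpack_from(packet)
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = first & 0x0F
    return seq, timestamp, packet[RTP_HEADER.size + 4 * csrc_count:]


class PersistentBuffer:
    """Ordered by media time, so it can outlive any single transport."""

    def __init__(self):
        self._heap = []
        self._queued = set()
        self._played_up_to = -1

    def push(self, media_ms, payload):
        # Drop data already played or already queued (e.g. overlap after a switch).
        if media_ms <= self._played_up_to or media_ms in self._queued:
            return False
        self._queued.add(media_ms)
        heapq.heappush(self._heap, (media_ms, payload))
        return True

    def pop(self):
        media_ms, payload = heapq.heappop(self._heap)
        self._queued.discard(media_ms)
        self._played_up_to = media_ms
        return media_ms, payload


class Transport:
    """One RTP session; rtp_base is the RTP timestamp of media time zero."""

    def __init__(self, buffer, rtp_base, clock_rate=90000):
        self.buffer = buffer
        self.rtp_base = rtp_base
        self.clock_rate = clock_rate

    def receive(self, packet):
        _seq, timestamp, payload = parse_rtp(packet)
        ticks = (timestamp - self.rtp_base) & 0xFFFFFFFF  # 32-bit wraparound
        return self.buffer.push(ticks * 1000 // self.clock_rate, payload)
```

Because the renderer only ever reads from `PersistentBuffer`, a second `Transport` can be attached to a new server while the first is still draining, and overlapping packets are dropped rather than played twice.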
Having finally determined what I want to do, that it's all feasible, and where to modify code in the various parts of the Helix client and server, I'll be implementing for the next few weeks: fixing the reconnect in Helix (rewriting it so that it's simplified and easier to understand) [DONE], exposing it through a new RTSP method (the current and future semantics of REDIRECT are inappropriate for my application) [DONE], and eventually either looking at how to do something funky with the handoff, or an implementation demo of the new functionality (like dynamic load balancing). An alternative suggestion was to implement a speculative connection based on client buffering activity, to reduce preroll time for streams. I also need to do some work on buffering in the client and possibly some user tests (after all, we're looking for migration that's transparent to the end user!). Another area that needs to be addressed in relation to this is resource naming, and mapping between different names (i.e. different filenames on different servers) that represent the same actual media. Apparently there is some general information about this in the Tanenbaum distributed systems book.
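The resource naming problem mentioned above amounts to keeping a mapping from one logical media item to the different URLs it has on each server. A minimal sketch of such a mapping is given here; the `MediaCatalogue` class, the media identifiers and the URLs are all hypothetical and are not part of Helix or of RTSP.

```python
from urllib.parse import urlsplit


class MediaCatalogue:
    """Maps server-specific URLs to a logical media id and back again."""

    def __init__(self):
        self._by_url = {}    # url -> media id
        self._by_media = {}  # media id -> {host: url}

    def register(self, media_id, url):
        """Record that `url` is one server's name for `media_id`."""
        host = urlsplit(url).hostname
        self._by_url[url] = media_id
        self._by_media.setdefault(media_id, {})[host] = url

    def equivalent(self, current_url, target_host):
        """Return the URL of the same media on target_host, or None."""
        media_id = self._by_url[current_url]
        return self._by_media[media_id].get(target_host)

    def alternates(self, current_url):
        """Return every other URL that names the same media."""
        media_id = self._by_url[current_url]
        return sorted(u for u in self._by_media[media_id].values() if u != current_url)


if __name__ == "__main__":
    catalogue = MediaCatalogue()
    catalogue.register("intro", "rtsp://server-a.example.com/films/intro.rm")
    catalogue.register("intro", "rtsp://server-b.example.com/archive/0001.rm")
    print(catalogue.equivalent(
        "rtsp://server-a.example.com/films/intro.rm", "server-b.example.com"))
```

A migration request could then carry the logical id, or the already translated URL, so the client never needs to know that the filenames differ between servers.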
An important aspect of this project will involve testing in a 'real' situation. However, standard lab conditions and those I've got set up at home (a local LAN) don't introduce enough problems for real testing. Here is a list of some options for simulation:
Two areas of application for stream migration are server load balancing and failure recovery. Failure recovery is dealt with within the Helix client, and while a speculative migration technique may avoid problems for end users, it looks like I'll be investigating adding load balancing support. Contrary to what I initially expected, I'll be able to avoid many of the underlying problems of a distributed messaging service by using a library.
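To make the load balancing idea concrete, here is a small sketch of how a controller might decide which streams to migrate. It is an assumption-laden illustration rather than the project's design: load is measured simply as active streams divided by a per-server capacity, the 0.8 high-water mark is arbitrary, and the `Server` and `plan_migrations` names are hypothetical. The messaging between servers is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity: int                      # maximum streams this server should carry
    streams: list = field(default_factory=list)

    @property
    def load(self):
        return len(self.streams) / self.capacity


def plan_migrations(servers, high_water=0.8):
    """Return (stream, source, target) moves for servers above high_water."""
    moves = []
    for source in servers:
        while source.load > high_water:
            target = min(
                (s for s in servers if s is not source),
                key=lambda s: s.load,
                default=None,
            )
            # Stop if no other server would end up less loaded than the source.
            if target is None or (len(target.streams) + 1) / target.capacity >= source.load:
                break
            stream = source.streams.pop()
            target.streams.append(stream)
            moves.append((stream, source.name, target.name))
    return moves


if __name__ == "__main__":
    busy = Server("a", 10, [f"s{i}" for i in range(10)])
    idle = Server("b", 10, ["t0"])
    for stream, src, dst in plan_migrations([busy, idle]):
        print(f"migrate {stream}: {src} -> {dst}")
```

Each planned move would then be carried out by telling the client holding that stream to switch servers, using the same reconnect path that failure recovery already exercises.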