RTP MIDI: An RTP Payload Format for MIDI

By John Lazzaro and John Wawrzynek, CS Division, UC Berkeley.

RTP MIDI

Internet telephony and video-conferencing programs send audio and video over the net using the Real-time Transport Protocol (RTP). RTP is an Internet Engineering Task Force (IETF) standard developed by the Audio-Video Transport working group (AVT).

We have worked within AVT to standardize RTP MIDI, a payload format for sending MIDI over networks using RTP. MIDI is a standard for coding the gestures of musical performance (pressing piano keys, striking drum pads, moving faders, etc.).

RFC 4695 normatively defines the RTP MIDI payload format, and RFC 4696 is an implementation guide for RTP MIDI. The RFCs were developed in cooperation with the MIDI Manufacturers Association (MMA) and the Moving Picture Experts Group (MPEG).

RTP MIDI is able to send MIDI over a "lossy" network (a network that loses packets). To prevent "stuck notes" and other artifacts, RTP MIDI uses a feed-forward resiliency system (the recovery journal) to recover from packet loss.
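The idea behind feed-forward recovery can be illustrated with a small Python sketch. This is a toy model under loudly stated assumptions, not the RFC 4695 wire format: the names `ToyPacket`, `ToySender`, and `ToyReceiver` are hypothetical, the "journal" is reduced to the set of notes sounding before the packet's own commands, and sequence-number wraparound is ignored. The real recovery journal is a structured summary defined in RFC 4695. The sketch shows only the principle: every packet carries enough history for the receiver to repair its state after a loss, without asking for a retransmission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPacket:
    """Hypothetical toy packet; NOT the RFC 4695 payload layout."""
    seq: int            # RTP-style sequence number (no wraparound here)
    journal: frozenset  # toy journal: notes sounding BEFORE `commands`
    commands: tuple     # new commands: ("on", note) or ("off", note)


class ToySender:
    def __init__(self):
        self.next_seq = 0
        self.sounding = set()

    def send(self, *commands):
        # Snapshot the state before applying this packet's commands.
        journal = frozenset(self.sounding)
        for kind, note in commands:
            if kind == "on":
                self.sounding.add(note)
            else:
                self.sounding.discard(note)
        packet = ToyPacket(self.next_seq, journal, tuple(commands))
        self.next_seq += 1
        return packet


class ToyReceiver:
    def __init__(self):
        self.expected_seq = 0
        self.sounding = set()
        self.repairs = []  # corrective commands issued after a loss

    def receive(self, packet):
        if packet.seq < self.expected_seq:
            return  # late or duplicate packet: ignored in this sketch
        if packet.seq > self.expected_seq:
            # Gap detected: at least one packet was lost. Any note we
            # still hear that the journal says is silent is "stuck".
            for note in sorted(self.sounding - packet.journal):
                self.repairs.append(("off", note))
                self.sounding.discard(note)
            # Simplification: missed NoteOns are not replayed.
        for kind, note in packet.commands:
            if kind == "on":
                self.sounding.add(note)
            else:
                self.sounding.discard(note)
        self.expected_seq = packet.seq + 1


# Usage: the packet carrying the NoteOff for note 60 is lost.
sender, receiver = ToySender(), ToyReceiver()
p0 = sender.send(("on", 60))
p1 = sender.send(("off", 60))  # never delivered
p2 = sender.send(("on", 64))
receiver.receive(p0)
receiver.receive(p2)
print(receiver.repairs)          # [('off', 60)]
print(sorted(receiver.sounding)) # [64]
```

In this run the receiver never sees the NoteOff for note 60, yet the journal carried by the next packet lets it silence the stuck note before applying the new NoteOn.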

We anticipate three major application areas for RTP MIDI:

  • MIDI over wired and wireless LANs. RTP MIDI may be used to send real-time MIDI streams over wired and wireless Local Area Networks (LANs). For example, Apple Computer uses RTP MIDI as the transport layer for the MIDI Network Driver that is included in the Mac OS X operating system.
  • Network Musical Performance. VoIP and videoconferencing applications may add support for network musical performance via RTP MIDI. In a network performance, musicians in different physical locations interact over a network to perform as they would if they were in the same room.
  • Content Streaming. Content streams may begin to use MIDI for low-bitrate music coding, perhaps in conjunction with normative sound synthesis methods such as Structured Audio. Applications include Internet broadcasting, multimedia presentations, and telephony audicons and ring tones.

To Learn More

This paper, presented at the 117th AES convention, is a good introduction to how RTP MIDI works, and how it fits into the IETF media protocol stack. The AES paper discusses a protocol that is a snapshot of RTP MIDI as it existed in October 2004.

Implementors should refer to RFC 4695 and RFC 4696 for the final version of RTP MIDI.

A few errors have been found in RFC 4695. See this list of errors that may affect implementors. A draft of the intended replacement document for RFC 4695, which fixes all of these errors, is available here.

In network musical performance applications, one cause of concern is the latency between performers. This paper, presented at the NOSSDAV 2001 conference, discusses latency (and other issues) in network musical performances, in the context of an application that uses a proto-version of RTP MIDI as the network transport.

The (unofficial) reference implementation for RTP MIDI is the network stack in sfront, an MPEG-4 Structured Audio decoder. Sfront is BSD-licensed. Download sfront here. If you want to examine the sfront network code (but not run it), we offer a smaller distribution that contains only the network source code (click here to download).

Apple uses RTP MIDI as the transport layer for the MIDI Network Driver that ships in Mac OS X. See this Sound on Sound magazine article for a comprehensive guide to using Apple's MIDI Network Driver. A shorter introduction to the topic is presented in this article.

MidiShare, a realtime operating system for musical applications, includes an RTP MIDI library in its development branch.

References

John Lazzaro and John Wawrzynek (2006). RTP Payload Format for MIDI. RFC 4695, IETF Proposed Standard Protocol [download]. Also see errata.

John Lazzaro and John Wawrzynek (2006). An Implementation Guide for RTP MIDI. RFC 4696, IETF Standards-Track (Informative) [download].

J. Lazzaro (2006). Framing RTP and RTCP Packets over Connection-Oriented Transport. RFC 4571, IETF Proposed Standard Protocol [download].

John Lazzaro and John Wawrzynek (2004). An RTP Payload Format for MIDI. The 117th Convention of the Audio Engineering Society, October 28-31, 2004, San Francisco, CA. [PDF].

John Lazzaro and John Wawrzynek (2001). A Case for Network Musical Performance. The 11th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2001), June 25-26, 2001, Port Jefferson, New York. [PDF].