Real-Time Encoding and Playback CPU Utilization Study:
Video and Audio Codecs
Contents

Study Summary
Hardware Configuration
Video Encoding Details
Video Playback Details
Video Test Methodology
Audio Encoding Details
Audio Playback Details
Audio Test Methodology
Abstract:

If you produce NetShow content, how often have you asked these questions?

"What machine do I need to create video content for our intranet or for the Internet?" or "I have a P5-166 machine; what type of audio or video content can I encode with this machine, and at what bit rate?" or "How powerful a machine is necessary to play back different types of video and audio content?"

Most people who create streaming video and audio content for the Web ask these questions or variations of them. The answers are not always simple. Factors such as the type of media content, the encoding procedure, the codecs used, the configuration of those codecs, the bit rate, and the specific processor in the encoding machine all make a large difference in the answer.

To help answer these questions, a study was undertaken to measure CPU utilization during audio and video encoding with the NetShow v2.0 Real-Time Encoder and during playback with the NetShow v2.0 player. Three major variables were tested: CPU processing power, bit rate, and media type. The objectives of this study were:

  • Determine the CPU utilization across a range of computer processor speeds when encoding different audio and video media types; for example, low-motion versus high-motion video and voice-only versus mixed (music and voice) audio.
  • Determine the CPU utilization across a range of bit rates when encoding different audio and video content on computers of varying processor speeds.
  • Determine the CPU utilization when playing back the same content sample across a range of computers with different CPUs.
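The three variables multiply quickly. As a rough sketch, assuming every combination of the machines, bandwidths, motion types, and video codecs described later in this study was attempted (and ignoring per-codec frame sizes and audio-codec variations), the video encoding grid can be enumerated in Python. The constant and function names are illustrative only; they are not part of NetShow.

```python
from itertools import product

# Values taken from the hardware table and the .asd settings in this study.
PROCESSORS = ["PII-266", "P6-200 (Pro)", "P5-200 (MMX)", "P5-166", "P5-133", "P5-90", "P5-66"]
BIT_RATES_KBPS = [28.8, 56, 110, 1000]  # 1 Mbps written as 1000 Kbps
MOTION_TYPES = ["low motion", "high motion"]
VIDEO_CODECS = ["MPEG-4", "H.263", "TrueMotion by Duck"]


def video_encoding_grid():
    """Return every (processor, bit rate, motion type, codec) combination."""
    return list(product(PROCESSORS, BIT_RATES_KBPS, MOTION_TYPES, VIDEO_CODECS))


grid = video_encoding_grid()
print(len(grid))  # 7 * 4 * 2 * 3 = 168
print(grid[0])
```

Adding the frame sizes tested for each codec would multiply the count further.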

Note that this study could have been undertaken in many different ways: different hardware configurations for the encoding and playback computers, different media types, different bit rates, and so on. The findings are valid only in light of the test methodology, the hardware configurations, the specific codecs used, and the specific media used as source material. This study does not recommend the hardware or software used as optimal configurations for audio or video encoding. Rather, the hope is that this data will help you answer these questions for your own content production needs.

Depending on how much detail you need, you can read the summaries for the overall study and for the video and audio sections, or you can study the details of each section, including test methods and hardware configurations. Enough said. Let's take a look at what we found.


CPU Utilization Summary:

All data presented in this study should be evaluated only in the context of the study as it was completed. Variations in CPU utilization values are expected if different hardware configurations or media sources are used. Likewise, the findings are relevant only to the specific versions of NetShow and video capture software used in this study. For specific hardware and software configurations, please see "Hardware Configuration."

Also, the procedures used in this testing are not the only way to create NetShow video and audio content. The test methodology aimed to use effective and commonly available procedures; however, individual producers may use variations that work well for them but would change the outcome of this study.

The following general conclusions can be drawn from this study. For conclusions specific to video or audio content and codecs, see CPU Utilization Study: Video or CPU Utilization Study: Audio.

  1. On single-processor machines, the encoding CPU utilization value was not significantly different when running on Windows 95 or Windows NT 4.0 Workstation.
  2. The video capture card and associated drivers play a significant part in the findings of this type of study.

Video capture cards and drivers other than those used in this study could significantly change the CPU findings and, in some cases, the conclusions about specific codecs versus machines or bit rates. As the hardware configuration criteria show, all machines used the same video capture card and drivers. The Winnov Videum capture card was chosen as a base-level, consumer capture card that is affordable and readily available to all content producers, not as a high-performance option. Therefore, some of the CPU utilization values observed in this study might be lower with other NetShow Encoder-compatible video capture cards, such as some of the available PCI-based cards.

  3. Directly comparing two or more encoding CPU values does not necessarily indicate which content sample has the best image or audio quality or the smoothest video motion. For example, encoding CPU values of 40%, 75%, and 100% do not mean that the content encoded at 40% is the best quality. Final quality depends more on the balance among the codec selected, the codec configuration, and the media type than on the raw CPU value.
  4. A raw CPU value of 100% doesn't necessarily mean the computer could not encode the content or that the content is of unacceptable quality. In several instances, CPU values of 100% resulted in content of acceptable quality. However, once the encoding machine was sufficiently less powerful, a CPU value of 100% would produce content of unacceptable quality.
  5. With some combinations of computer, codec format, codec configuration, and media type, an error message is displayed indicating that the CPU processing power has been consumed and the encoding has been terminated. This is the final stage of an encoding process that consumes 100% of the computer's processing power.
  6. "No significant difference," as used in this study, does not imply a statistical evaluation of significance. It merely means that two tests did not yield differences in content quality, or that one piece of content could be encoded while another could not be encoded.

Hardware Configuration Criteria:

The following computers and software versions were used for real-time encoding and playback of video and audio content. 

Processor        Brand
PII-266          Gateway
P6-200 (Pro)     Gateway
P5-200 (MMX)     Gateway
P5-166           Gateway
P5-133           Gateway
P5-90            Gateway
P5-66            Gateway

All machines shared the same configuration, operating system, and NetShow version:

  • Configuration: 64 MB of RAM, 2 MB of video RAM; Winnov Videum AV capture card (capture drivers: Win 95 2.0.318 Beta 5); Seagate SCSI 2 MB hard drive; Sound Blaster 16 audio card
  • Operating system: Windows 95
  • NetShow: NetShow v2.0 (914) player and real-time encoder

CPU Utilization Study: Video

Summary and Conclusions:

Measured against the objectives this study set out to meet, the following conclusions and recommendations can be drawn.

Video Encoding:

Video Encoding Details

  1. The MPEG-4 video codec was the most effective codec for compressing content across the full range of motion, from low motion to high motion.
  2. The MPEG-4 video codec was the most effective codec for compressing content across target bandwidths from 28.8 Kbps to 1 Mbps.
  3. The H.263 video codec was the most effective codec for compressing low-motion content across the widest range of computer processor speeds.
  4. The TrueMotion by Duck video codec was the most CPU-efficient codec when the target bandwidth was 1 Mbps or greater.
  5. TrueMotion by Duck was the only codec tested that effectively compressed video to a bandwidth of 1 Mbps on a computer with a P5-166 or P5-133 processor.
  6. To compress low- to high-motion video content at bandwidths from 28.8 Kbps to 1 Mbps with the MPEG-4 video codec, you need at least a PII-266, a P6-200, or a P5-200 with MMX.
  7. The MPEG-4 video codec was much more CPU-intensive during encoding than the H.263 or Duck video codecs; however, this did not prevent it from compressing video content the most effectively.
  8. To encode high-motion video at 15 fps and a CIF frame size, a dual-processor computer (faster than 266 MHz) is recommended.
  9. VoxWare MetaSound produced the best audio quality, but it raised CPU utilization so much that only a PII-266 or P6-200 could encode low- and high-motion video content.

 

These graphs summarize how well each codec compressed the different video motion types across the tested bandwidths on each processor. Image and motion quality are treated as equivalent for all the clips shown: to appear in a chart, a codec had to compress the video source to the required settings, and its output had to be of acceptable quality.
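The inclusion rule for these charts has two parts, and a result must pass both. A minimal Python sketch of that filter, assuming each encoding run is recorded with a completed flag and an acceptable-quality judgment (the record type, field names, and sample records are hypothetical and are not study data):

```python
from dataclasses import dataclass


@dataclass
class EncodingResult:
    processor: str
    bit_rate_kbps: float
    codec: str
    completed: bool           # the encoder compressed the source to the required settings
    acceptable_quality: bool  # the output was judged acceptable on review


def chart_entries(results):
    """Keep only results that satisfy both inclusion rules for the summary charts."""
    return [r for r in results if r.completed and r.acceptable_quality]


# Hypothetical records for illustration only; these are not study measurements.
results = [
    EncodingResult("machine A", 56, "codec X", completed=True, acceptable_quality=True),
    EncodingResult("machine A", 56, "codec Y", completed=True, acceptable_quality=False),
    EncodingResult("machine B", 56, "codec X", completed=False, acceptable_quality=False),
]
print([(r.processor, r.codec) for r in chart_entries(results)])  # [('machine A', 'codec X')]
```

Note that a run that completed but looked poor is excluded just like a run that failed outright, which is why the charts show capability at acceptable quality rather than raw encoding success.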

 

Video Playback:

Video Playback Chart

Video Playback Details

  1. Video clips produced on a high-end encoding machine (PII-266) played effectively on the entire range of computers, from a PII-266 to a P5-66.
  2. There was no significant difference in CPU utilization between the MPEG-4 and Duck codecs when playing content compressed to a target bandwidth of 1 Mbps.
  3. There was no significant difference in CPU utilization between high-motion content encoded with the MPEG-4 video codec plus MPEG Layer-3 (FHG) 8 Kbps, 11 kHz audio and the same content encoded with the MPEG-4 video codec plus the VoxWare MetaSound AC10 audio codec.

Test Procedure: Video

The NetShow Encoder was used on computers of decreasing processing power to encode a live video and audio source into an .asf file. To provide consistency between tests on different computers, .asd configuration files were used. A complete description of each file configuration is provided later in this Test Procedure section. This is not meant to be a detailed description of how to encode using NetShow Encoder; rather, it is an overview of how the testing was set up to eliminate, or at least control, the many variables that were not being evaluated in this study.

  1. The video source was an S-VHS tape played through an S-VHS Hi-Fi stereo videocassette deck. An S-video cable connected the S-video output on the VCR to the S-video input on the video capture card.
  2. The CPU utilization value was read from the graph displayed by System Monitor, a program included with Windows 95. This CPU number is the average value (%) over the entire encoding or playback period.
  3. For this test, all encoding was done with the computer disconnected from the network. Although you may not encode this way, this setup gives a truer measure of the processing power required to encode video under these test conditions.
  4. The definition of low-motion versus high-motion video is arbitrary, but it was kept constant for this study. What each producer defines as low motion might vary, but you should be able to evaluate your own video clips relative to the samples used for this study.
  5. Low motion (talking head): consistent background and foreground texture and color, minimal head motion, head filling about 50% of the frame, highly visible facial features, and a voice-only soundtrack.
     High motion: rapid scene changes, visual transitions, a combination of talking heads and high-motion race-car scenes, high contrast, and a soundtrack of both voice and music.

  6. The audio track for the video portion of this study was kept constant: MPEG Layer-3 (FHG) 8 Kbps, 11 kHz mono was used for both low- and high-motion video clips unless stated otherwise.
  7. Playback was done on each machine using the video clips encoded on the PII-266. That way, quality and CPU utilization depended on the playback machine rather than on variations among clips encoded on different machines.
  8. The Winnov Videum card supplied both the video and audio input for encoding.
  9. The frame rate, frame size, and other configuration settings were defined in .asd files, which were used for live encoding on all machines. The settings were as follows:

28.8 Kbps:


Error correction = on, span = 10 packets, Audio Concealment = Music (Voice only for Low motion)
MPEG Layer-3 8 kBits, 11 kHz audio codec
MPEG-4, H.263, and TrueMotion by Duck video codecs
Frame size = QCIF, 288 x 208, and CIF for MPEG-4 and Duck
Frame size = QCIF and CIF for H.263
Frame rate = 5 fps
IFrame setting = 5 s/IF
Pixel Format = RGBH
Delay Buffer = 5 s
Quality = 50%

56 Kbps:


Error correction = on, span = 10 packets, Audio Concealment = Music (Voice only for Low motion)
MPEG Layer-3 8 kBits, 11 kHz audio codec
MPEG-4, H.263, and TrueMotion by Duck video codecs
Frame size = QCIF, 288 x 208, and CIF for MPEG-4 and Duck
Frame size = QCIF and CIF for H.263
Frame rate = 10 fps
IFrame setting = 5 s/IF
Pixel Format = RGBH
Delay Buffer = 5 s
Quality = 60%

110 Kbps and 1 Mbps:


Error correction = on, span = 10 packets, Audio Concealment = Music (Voice only for Low motion)
MPEG Layer-3 8 kBits, 11 kHz audio codec
MPEG-4, H.263, and TrueMotion by Duck video codecs
Frame size = QCIF, 288 x 208, and CIF for MPEG-4 and Duck
Frame size = QCIF and CIF for H.263
Frame rate = 15 fps
IFrame setting = 5 s/IF
Pixel Format = RGBH
Delay Buffer = 5 s
Quality = 75%
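The three settings blocks above differ only in frame rate and quality; everything else is shared. One way to make that explicit is sketched below as a Python data model. It is not the .asd file format itself (which this study does not show), the field names are hypothetical, and Audio Concealment is omitted because it depended on the motion type rather than the bandwidth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoTestSettings:
    """Per-bandwidth values from the study's .asd files; field names are illustrative."""
    frame_rate_fps: int
    quality_percent: int
    # Settings that were the same at every bandwidth.
    error_correction_span_packets: int = 10
    iframe_interval_s: int = 5
    pixel_format: str = "RGBH"
    delay_buffer_s: int = 5
    audio_codec: str = "MPEG Layer-3, 8 kBits, 11 kHz"


SETTINGS_BY_BANDWIDTH = {
    "28.8 Kbps": VideoTestSettings(frame_rate_fps=5, quality_percent=50),
    "56 Kbps": VideoTestSettings(frame_rate_fps=10, quality_percent=60),
    "110 Kbps": VideoTestSettings(frame_rate_fps=15, quality_percent=75),
    "1 Mbps": VideoTestSettings(frame_rate_fps=15, quality_percent=75),
}

# Frame sizes tested for each video codec.
FRAME_SIZES = {
    "MPEG-4": ["QCIF", "288 x 208", "CIF"],
    "TrueMotion by Duck": ["QCIF", "288 x 208", "CIF"],
    "H.263": ["QCIF", "CIF"],
}

for bandwidth, settings in SETTINGS_BY_BANDWIDTH.items():
    print(bandwidth, settings.frame_rate_fps, "fps,", settings.quality_percent, "% quality")
```

Holding the shared values as defaults makes it obvious that the 110 Kbps and 1 Mbps profiles were identical in these settings.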


CPU Utilization Study: Audio

Summary and Conclusions:

The following conclusions and recommendations are drawn from the charts below or from a more detailed examination of the audio encoding and playback data.

Audio Encoding Details

VoxWare MetaSound encoding charts

MPEG Layer-3 (FHG) encoding charts

Audio Encoding:

  1. No significant difference in CPU utilization was found on a per-machine basis between voice-only and mixed (music + voice) audio sources. Codec selection appeared to matter more than the type of audio.
  2. For any given audio codec, CPU utilization did not increase as the target bandwidth increased.
  3. The VoxWare MetaSound audio codec consumed the most CPU power; however, its overall audio quality was better than that of MPEG Layer-3 (FHG).
  4. When encoding audio with VoxWare MetaSound on a machine with insufficient CPU power (CPU utilization = 100%), the first noticeable effect is that the end of the audio track is cut off. The clipped portion can be as short as 0.1 or 0.2 seconds or as long as several seconds. This is most commonly associated with the high-quality and stereo formats such as AC48 and ACS48. MPEG Layer-3 (FHG) showed no equivalent clipping.
  5. The next noticeable effect, when encoding with VoxWare MetaSound codecs while CPU power is extremely stressed (CPU utilization = 100%+), is that portions of the audio track are dropped during playback, not just at the end of the track.
  6. A P5-90 processor can encode audio with the AC8 and AC10 VoxWare MetaSound formats from 28.8 Kbps to 110 Kbps.
  7. To encode audio using all available VoxWare MetaSound formats, a P5-200 or faster machine is needed.
  8. Audio can be encoded effectively with the MPEG Layer-3 (FHG) audio codec from 28.8 Kbps to 110 Kbps on a P5-66.
  9. The G.723 audio codec showed a clear preference for voice-only content, much more so than either MPEG Layer-3 (FHG) or VoxWare MetaSound.
  10. When a machine is not powerful enough to encode an audio sample at the selected target bandwidth and codec, quality is not the first characteristic affected; instead, the end of the audio clip is cut off. When testing for quality, always listen to the end of the clip to make sure the entire audio track is present.
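Listening remains the real test, but if you can obtain the durations of the source and encoded clips (how you measure them, for example from your editing tool or the player's position display, is an assumption here), a length comparison can flag the end-of-clip truncation described in items 4 and 10 before you listen. A minimal Python sketch; the function name and default tolerance are hypothetical:

```python
def is_truncated(source_seconds: float, encoded_seconds: float,
                 tolerance_seconds: float = 0.05) -> bool:
    """Return True if the encoded clip is shorter than its source by more than the tolerance.

    The default tolerance is an assumption, chosen so that a 0.1-second loss is flagged.
    """
    return source_seconds - encoded_seconds > tolerance_seconds


print(is_truncated(60.0, 59.8))  # True: 0.2 s missing from the end
print(is_truncated(60.0, 60.0))  # False: full length
```

This only catches loss at the end of the clip; the mid-track dropouts described in item 5 leave the overall length unchanged, so they still have to be found by listening.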

 

This graph shows the CPU processing power required to encode different MPEG Layer-3 (FHG) audio codec formats at a given bandwidth. For example, at 56 Kbps, a P5-166 is the minimum machine to encode all available codec formats.

 

This graph indicates the CPU processing power required to encode the VoxWare MetaSound audio codec formats at a given bandwidth. For example, at 56 Kbps, a P5-200 or faster machine is required to encode all available codec formats.

 

This graph shows the bandwidth of audio clips that can be successfully encoded using the MPEG Layer-3 (FHG) audio codec on a per-CPU basis. For example, a P5-133, P5-90, and P5-66 are equally effective in encoding audio clips from 28.8 Kbps to 110 Kbps. However, the highest-quality formats can't be encoded until a P5-166 is used. The MPEG Layer-3 codec appears to depend more on the target bandwidth than on CPU power.

This graph shows the bandwidth of audio clips that can be successfully encoded using the VoxWare MetaSound audio codec on a per-CPU basis. For example, a P5-133 is required to encode audio clips from 28.8 Kbps to 110 Kbps using the AC8 and AC10 formats. However, a P5-200 or faster computer is required to encode audio using all the available VoxWare formats. VoxWare appears to depend much more on CPU power than on bandwidth, and much more so than MPEG Layer-3 does.
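The graph descriptions above each name a minimum machine for "all available formats" at a bandwidth. Reading that off per-format results is a maximum over the per-format minimums. A minimal Python sketch, assuming processors rank in the order of the hardware table and that any faster processor can encode a format a slower one handled; the per-format values below are hypothetical, not study data:

```python
# Processors ordered from least to most powerful, following the hardware table.
PROCESSOR_RANK = ["P5-66", "P5-90", "P5-133", "P5-166", "P5-200 (MMX)", "P6-200 (Pro)", "PII-266"]


def minimum_machine_for_all(min_processor_by_format):
    """Return the least powerful processor that can encode every format listed.

    Each value is the slowest processor that encoded that format successfully.
    Assuming any faster processor can also encode it, the answer is the most
    demanding of these per-format minimums.
    """
    return max(min_processor_by_format.values(), key=PROCESSOR_RANK.index)


# Hypothetical per-format minimums at one bandwidth, not study data.
example = {"format 1": "P5-66", "format 2": "P5-133", "format 3": "P5-166"}
print(minimum_machine_for_all(example))  # P5-166
```

The monotonic assumption matters: if a faster machine ever failed where a slower one succeeded, a single "minimum machine" would no longer describe the results.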

Audio Playback:

Audio Playback Details

  1. Audio clips produced on a high-end encoding machine (PII-266) played back effectively on all computers down to a P5-90.
  2. A P5-66 played all clips except the VoxWare MetaSound AC24 clip at 110 Kbps; although that clip's audio quality was good, its end was cut off.
  3. For any given audio codec, CPU utilization does not increase as bandwidth increases.

Test Procedure: Audio

Just as with the video encoding portion of this study, the audio was encoded into an .asf file using NetShow Encoder and a live audio source. The same computers were used for the audio portion as for the video encoding. To provide consistency between tests on different computers, .asd configuration files were used. A complete description of each file configuration is provided later in this Test Procedure section. This is not meant to be a detailed description of how to encode audio using NetShow Encoder; rather, it is an overview of how the testing was set up to eliminate, or at least control, the many variables that were not being evaluated in this study.

  1. The audio source for both voice-only and mixed (voice and music) content was an S-VHS tape played through an S-VHS Hi-Fi stereo videocassette deck. An S-video cable connected the S-video output on the VCR to the S-video input on the video capture card.
  2. The CPU utilization value was read from the graph displayed by System Monitor, a program included with Windows 95. This number is the average value (%) over the entire encoding or playback period.
  3. For this test, all encoding was done with the computer disconnected from the network. Although you may not encode this way, this setup gives a truer measure of the processing power required to encode content under the conditions of this test.
  4. Playback was done on each machine using the audio clips encoded on the PII-266. That way, quality and CPU utilization depended on the playback machine rather than on variations among clips caused by encoding limitations on different machines.
  5. The definitions of voice only and mixed (voice plus music) are variable, but they were kept constant for this study. What each producer defines as voice only might vary in the speaker's gender, the number of voices, voice pitch and tone, and so on, but you should be able to evaluate your own audio clips relative to the samples used for this study.
     Voice only: male voice only, with no background audio.
     Mixed (music + voice): two male voices with background music.
  6. The audio track for the video portion of this study was kept constant: MPEG Layer-3 (FHG) 8 Kbps, 11 kHz mono was used for both low- and high-motion video clips unless stated otherwise.
  7. The Winnov Videum card supplied both the video and audio input for encoding.
  8. The specific configuration settings were defined in .asd files, which were used for live encoding on all machines. The settings were as follows:

28.8 Kbps:


Error correction = on, span = 10 packets, Audio Concealment = Music or Voice as appropriate for audio source.
MPEG Layer-3 audio codec

    8 kBits, 8 kHz
    8 kBits, 11 kHz
    16 kBits, 16 kHz
    20 kBits, 16 kHz

VoxWare MetaSound audio codec

    AC8, 8 kHz
    AC10, 11 kHz
    AC16, 16 kHz

G.723 audio codec

    5,333 bits/s, 8 kHz
    6,400 bits/s, 8 kHz 

56 Kbps:


Error correction = on, span = 10 packets, Audio Concealment = Music or Voice as appropriate for audio source
MPEG Layer-3 audio codec

    8 kBits, 8 kHz
    8 kBits, 11 kHz
    16 kBits, 16 kHz
    20 kBits, 16 kHz
    24 kBits, 11 kHz and 22 kHz
    48 kBits, 22 kHz

VoxWare MetaSound audio codec

    AC8, 8 kHz
    AC10, 11 kHz
    AC16, 16 kHz
    AC24, 22 kHz

110 Kbps and 1 Mbps:


Error correction = on, span = 10 packets, Audio Concealment = Music or Voice as appropriate for audio source
MPEG Layer-3 audio codec

    8 kBits, 8 kHz
    8 kBits, 11 kHz
    16 kBits, 16 kHz
    20 kBits, 16 kHz
    24 kBits, 11 kHz and 22 kHz
    56 kBits, 22 kHz

VoxWare MetaSound audio codec

    AC8, 8 kHz
    AC10, 11 kHz
    AC16, 16 kHz
    AC24, 22 kHz

Additional high-quality mono and stereo VoxWare MetaSound codecs were also tested; however, these don't currently ship with NetShow v2.0. They were tested at each bandwidth where available.


Mono

    AC32, 44 kHz
    AC40, 44 kHz
    AC48, 44 kHz

Stereo

    ACS16, 8 kHz
    ACS20, 11 kHz
    ACS32, 16 kHz
    ACS48, 22 kHz
    ACS64, 44 kHz
    ACS80, 44 kHz
    ACS96, 44 kHz
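Both test procedures report CPU utilization as the average over the whole encoding or playback run, read from System Monitor's graph. If you record readings yourself as a list of percentages sampled at a fixed interval (an assumption; how you capture them is up to your tools), the average is a simple mean. Because an average can hide how long the CPU sat at 100%, which the summary notes may or may not yield acceptable content, this Python sketch also reports the saturated fraction of the run. The function names and readings are hypothetical.

```python
def average_utilization(samples):
    """Mean CPU utilization (%) over one encoding or playback run.

    Assumes the samples were taken at equal time intervals.
    """
    if not samples:
        raise ValueError("no CPU samples recorded")
    return sum(samples) / len(samples)


def saturated_fraction(samples, ceiling=100.0):
    """Fraction of samples at or above the ceiling, i.e. with the CPU fully consumed."""
    if not samples:
        raise ValueError("no CPU samples recorded")
    return sum(1 for s in samples if s >= ceiling) / len(samples)


# Hypothetical readings, not study data.
readings = [40, 75, 100, 100]
print(average_utilization(readings))  # 78.75
print(saturated_fraction(readings))   # 0.5
```

Two runs with the same average can differ sharply in saturated fraction, so reporting both gives a clearer picture than the average alone.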

