(ELEC360)E360-03F-Q01 Solution.pdf


Hong Kong University of Science and Technology
Department of Electrical & Electronic Engineering

ELEC 360 - Digital Media and Multimedia Applications, by Dr. Sam Chiu
Fall 2003 Quiz #1
Name: Solution
ID:
Email:
Answer all questions. Time allowed: 45 minutes
1. Audio Coding and Processing
The GSM (European cellular) standard encodes speech signals in 20 ms frames. Each frame contains 160 samples and is encoded into 260 bits.
(a) What is the sampling rate of the signal before compression? [1 mark]
160 samples / 0.020 s = 8000 samples/s = 8 kHz
(b) The original signal is low-pass filtered before sampling. What is the maximum allowable cut-off frequency? [1 mark]
8 kHz / 2 = 4 kHz (the Nyquist frequency, i.e. half the sampling rate)
(c) What is the data rate of the encoded signal? [1 mark]
260 bits / 0.020 s = 13,000 bits/s = 13 kbps
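The arithmetic in parts (a) to (c) can be checked with a short script. This is only a sketch using the figures given in the question; the variable names are illustrative, and integer arithmetic in milliseconds is used to avoid floating-point rounding.

```python
# Figures taken from the question statement.
FRAME_DURATION_MS = 20      # one GSM speech frame lasts 20 ms
SAMPLES_PER_FRAME = 160     # samples per frame before compression
BITS_PER_FRAME = 260        # bits per frame after encoding

# (a) Sampling rate: samples per frame divided by frame duration.
sampling_rate_hz = SAMPLES_PER_FRAME * 1000 // FRAME_DURATION_MS   # 8000

# (b) Maximum cut-off frequency: half the sampling rate (Nyquist limit).
max_cutoff_hz = sampling_rate_hz // 2                              # 4000

# (c) Encoded data rate: bits per frame divided by frame duration.
bit_rate_bps = BITS_PER_FRAME * 1000 // FRAME_DURATION_MS          # 13000

print(sampling_rate_hz, max_cutoff_hz, bit_rate_bps)
```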
(d) The speech signal is usually non-linearly transformed at the transmitter before quantization, such that small values are enhanced and large values are compressed. In the following figure, the slanted dashed line represents the case of Vout = Vin, i.e. no transformation. Sketch the effect of the non-linear transformation as described. [1 mark]
[Figure: Vout (vertical axis) against Vin (horizontal axis), with a dashed line Vout = Vin]
(e) Suggest two reasons why the operation described in (d) above is performed. Explain your answer. [2 marks]
Increase dynamic range: large-amplitude signals are effectively encoded with larger quantization steps.
Improve SNR: small-amplitude signals are effectively encoded with smaller quantization steps.
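The transformation in (d) and the reasons in (e) can be illustrated with a companding curve. The quiz does not name a specific curve, so the sketch below assumes the mu-law formula with mu = 255 purely for illustration; the function names `compress` and `expand` are hypothetical, and inputs are assumed to be normalized to the range -1 to 1.

```python
import math

MU = 255.0  # assumed companding parameter; not specified in the quiz


def compress(x, mu=MU):
    """Mu-law compression of a sample x in [-1, 1] (transmitter side)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def expand(y, mu=MU):
    """Inverse of compress (receiver side)."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


# Small inputs are boosted well above the Vout = Vin line,
# while large inputs are squeezed together near full scale.
for vin in (0.01, 0.1, 0.5, 0.9, 1.0):
    print(f"Vin = {vin:4.2f}  Vout = {compress(vin):.3f}")
```

Under these assumptions an input of 0.01 maps to roughly 0.23, so a uniform quantizer applied after compression spends many more levels on quiet samples, while the interval from 0.9 to 1.0 is squeezed into about 0.02 of the output range.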
(f) Additionally, a technique known as noise gating is used to eliminate background noise in the quiet gaps between spoken words. In the following waveform, indicate an appropriate threshold level for effective noise gating. [1 mark]
[Figure: waveform of signal amplitude against time]
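The idea behind noise gating in (f) can be sketched in a few lines. This is a deliberately simplified per-sample gate with a hypothetical function name and made-up sample values; practical gates usually track a smoothed envelope with attack and release times rather than gating each sample independently.

```python
def noise_gate(samples, threshold):
    """Zero every sample whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Illustrative values only: low-level background noise around two louder samples.
signal = [0.01, -0.02, 0.5, -0.6, 0.015]
gated = noise_gate(signal, threshold=0.05)
print(gated)  # [0.0, 0.0, 0.5, -0.6, 0.0]
```

An appropriate threshold sits just above the peak level of the background noise in the quiet gaps and well below the level of the speech.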

(g) The GSM coder has a Mean Opinion Score (MOS) of 3.6. Explain what is meant by MOS. [1 mark]
MOS is a speech coder performance rating. Listeners are asked to classify the output of a speech coder as excellent (5), good (4), fair (3), poor (2) or bad (1). The MOS is the average of these ratings over all listeners.
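As a small worked example, a MOS is conventionally the arithmetic mean of the individual listener ratings. The ratings below are invented for illustration and are not data from any real GSM listening test.

```python
from statistics import mean

# Rating scale from the answer above.
SCALE = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}

# Hypothetical ratings from five listeners (made up for this example).
ratings = [4, 4, 3, 4, 3]

mos = mean(ratings)
print(f"MOS = {mos:.1f}")  # MOS = 3.6
```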
2. MPEG-1 Audio
MPEG-1 Audio Layer III (more popularly known as MP3) uses both subband coding and perceptual coding. The signal is analyzed by a filter bank of 32 equally spaced subbands that are 750 Hz wide at a sampling rate of 48 kHz.
(a) Explain the relationship between these numbers: 32 subbands, 750 Hz bandwidth, and 48 kHz sampling rate. [1 mark]
Subband sampling rate = 48 kHz / 32 = 1.5 kHz
Subband bandwidth = 1.5 kHz / 2 = 750 Hz

(b) Encoding is performed on a window of 3 frames of 12 samples each. What is the window length in milliseconds? [1 mark]
(1 / 1.5 kHz) × 3 × 12 = 24 ms
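The figures in parts (a) and (b) can be verified with the following sketch, which uses only the numbers given in the question; the variable names are illustrative.

```python
# Figures taken from the question statement.
SAMPLING_RATE_HZ = 48_000
NUM_SUBBANDS = 32
FRAMES_PER_WINDOW = 3
SAMPLES_PER_FRAME = 12

# (a) Each subband is downsampled by the number of subbands,
# and its bandwidth is half of its own sampling rate.
subband_rate_hz = SAMPLING_RATE_HZ / NUM_SUBBANDS        # 1500.0
subband_bandwidth_hz = subband_rate_hz / 2               # 750.0

# (b) Window length: 3 frames of 12 subband samples at 1.5 kHz.
window_samples = FRAMES_PER_WINDOW * SAMPLES_PER_FRAME   # 36
window_ms = 1000 * window_samples / subband_rate_hz      # 24.0

print(subband_rate_hz, subband_bandwidth_hz, window_ms)
```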
(c) MP3 encoding uses both frequency masking and temporal masking in its psychoacoustic model. Briefly explain the two phenomena with simple sketches. [2 marks]
Frequency masking: A large-amplitude signal at a certain frequency renders the smaller-amplitude signals at nearby frequencies inaudible.
Temporal masking: A large amplitude signal renders the sma