Audio, Image and Video Processing Blog: Week 6: 02/11/2012

Lecture:

Today there was no lecture. Instead, lecture time was dedicated to sitting the class test.

Lab:

The sound now sounds slightly quieter (I think the amplitude may have changed) and the wave now looks compressed in height. The wave still looks roughly the same length.

Compression in regards to sound usually means an increase in density when the sound wave is travelling through a medium such as air

With the use of Google and research sites such as Wikipedia, I have researched the use of audio compressors and what they are meant to do and why.
Compression is usually defined under two categories (from knowledge of SQA Higher Computing)

Where the data that is transmitted is reduced for transmission with loss of some data

The data lost is not usually recognised by the user (sound perceiver). This is why the system gets rid of the data. It is technically useless and takes up needless space/memory in a storafe system

You would more than likely want to use an audio compressor to reduce (compress) the dynamic range by reducing the loudest sounds
The spectral display of the sound Sopran ascendescend is consistent with the waveform. It shows me intensity by the colour or brightness on a frequency versus time axis (See figure 1)
I opened up the english words wav
The spectral display for this file (See figure 2) instantly stands out as completely different from the spectral display on the first file

First of, as there are now pauses in the speech of the recording, there is spaces in-between each part of the spectral display's...fragments if you will.

The colour intensity at the bottom of the spectral display completely differs from the top of the display. The higher up the display you look, the less intense the colour/brightness

I do not think speech is a musical sound. Speech on its own is just a fundamental frequency, whereas musical tones/sounds are multiple harmonics of the fundamental frequency.

On applying the convulsion reverb preset: clean-room-aggressive effect, the sound became slightly distorted in my opinion.
On applying the convulsion reverb preset: roller disco aggressive effect, it sounded to me as if a man was speaking to me under a bridge tunnel made of metal. It was like a reverb that would cause you to want to stop talking in my opinion.
Reverb is created in a room by sound bouncing off of walls over and over again and then going to the perceiver of the sound, where the perceiver will hear the reverb
Reverb is created by a computer on a wav file by creating a copy of the sound a slightly changing the frequency of the sound and then placing it on top of the original sound wave, creating the illusion of a reverb (possibly also through the use of distortion)
Upon use of SpeechTranscript Transcribe, the system got most of the words said correct, but it did however add some words in the text that weren't actually spoken. It basically tried to fit in gaps and think the words it created were what was actually said. It basically used grammatical and basic English logic.

This is not a simple computation. Some system that can perform this task are not "trained" to understand words being said, but rather recognise an individual's voice. This is normally used for security processes.

Systems that recognise multiple voices and are "trained" to recognise words and change them to text usually are pre-programmed with several words and sentences etc.

Microsoft® speech-to-text recognition on most systems was dependent on a single voice and the user to "train" it. The user would be asked to read sentences and give commands to allow the system to understand how the user said certain words and phrases, as well as to give the system to recognise its users voice.

The spectrogram for the original file is not that much different from the effect changed file shown in figure 2. Most of the energy is shown at the bottom on the far right.

Figure 1 - Spectral Display

Figure 2 - Spectral Display 2

Audio, Image and Video Processing Blog