Friday, 26 October 2012

Week 5: 26/10/2012

Lecture:  

Today's lecture saw us analysing last week's mock test.  We went over every question with our lecturer, who explained why each answer was correct or incorrect.  We have been made aware that next week is the proper test.

Lab:  

During the lab session we were to take the highest and lowest points of the waves in the sound files from last week and use them to find the decibel ratios in each.  I was trying to use the formulas for voltage and the like, thinking these were the ones we were meant to use, and I spent ages trying to find the appropriate values in the files.  A lab lecturer tried showing me a wiki page to explain these, although this was not what I needed: I knew what the values meant, I just needed help finding where these values/properties were located.  I am grateful for his help all the same.
My lecturer then came over and said I was using the wrong values and looking at something I really didn't need to be looking at.  I just needed to do a basic calculation using the log and inverse log functions along with the minimums and maximums found from the sound waves (see figure 1).  This made things a lot easier, and I now understand what was being asked of me and how to do it.  Hopefully this will make future classes much easier.

Figure 1 - Sound Level and the Decibel  
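Out of interest, here is a small sketch (in Python, not part of the lab itself) of the kind of basic calculation the lecturer meant, assuming the maximum and minimum amplitude values have already been read off the waveforms; the numbers below are placeholders rather than my actual readings.

```python
import math

# Maximum and minimum (peak) amplitude values read off the waveforms
# in the sound files -- placeholder numbers, not my actual readings.
amplitude_max = 0.9
amplitude_min = 0.05

# Decibel ratio between the two amplitudes: dB = 20 * log10(A1 / A2)
ratio_db = 20 * math.log10(amplitude_max / amplitude_min)
print(f"Ratio: {ratio_db:.1f} dB")

# Inverse log: going back from a decibel value to a plain amplitude ratio
ratio_linear = 10 ** (ratio_db / 20)
print(f"Linear ratio: {ratio_linear:.1f}")
```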


Next week:  

Class test (proper).


Friday, 19 October 2012

Week 4: 19/10/2012

Lecture:  

In today's lecture the class and I learned a lot about the human ear and how it recognises and processes sound.

The ear is a magnificent 'tool', if you will, that allows us to hear sounds.  While many people take hearing for granted, others have looked into how we are able to hear them.

The ear acts as a receiver for sound waves and then begins the complicated process of changing the waves into a form our brains can understand.  To do this, it needs to make use of many parts.  (See figure 1)  


Figure 1 - Sectional View of the Ear  


Our lecturer went on to discuss what operation each part of the ear carried out for us to be able to hear sound.

Sound waves travel through the ear canal to the ear drum.  The ear drum then vibrates, and the ossicles act like amplifiers in the sense that they amplify the vibrations from the ear drum; the frequencies from these vibrations then travel through the cochlea.  As higher frequencies die off more quickly than low frequencies, the high frequencies are picked up at the beginning of the cochlea and the lower frequencies further along.  Once the cochlea picks up these frequencies, small hairs on the cochlea cause neurons connected to the auditory nerve to fire.  These then transmit the necessary factors of a sound wave (timing, amplitude and frequency) to the brain stem, where a hierarchy of neural processing begins.  (See figures 2, 3 and 4)  


Figure 2 - The Middle and Inner Ear  


Figure 3 - Cochlear Structures  

Figure 4 - Frequency Response of the Cochlea  
A good website that covers a lot of the information discussed up to this point is:
http://www.deafnessresearch.org.uk/content/your-hearing/how-you-hear/how-ears-works/

A picture from the above mentioned website is below.  (See figure 5)  


Figure 5 - Outer, Middle and Inner Ear  

On the website are a couple of other useful images and a lot of useful text.  It can be used as a good reference for study.

The outer ear may seem like just a basic piece of flesh on either side of one's head, but really it has a job to do just like the rest of the ear:  when sound enters the outer ear, the outer ear begins to filter the sound wave and then performs the processes of transduction (here meaning the conversion of the sound energy from one form to another) and amplification.  

The middle ear begins the process of non-linear compression and then impedance matching.  Impedance matching is one of the important functions of the middle ear: it transfers the incoming vibration from the comparatively large, low-impedance tympanic membrane to the much smaller, high-impedance oval window.  
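As a side note, the impedance-matching idea can be put into rough numbers.  The values below are typical textbook figures rather than anything from the lecture: because the eardrum is much larger than the oval window, the same force ends up concentrated onto a smaller area, so the pressure is boosted.

```python
import math

# Typical textbook values (assumptions, not from the lecture slides):
eardrum_area = 55.0       # effective area of the tympanic membrane, mm^2
oval_window_area = 3.2    # area of the stapes footplate / oval window, mm^2
lever_ratio = 1.3         # mechanical advantage of the ossicular chain

# Concentrating the same force onto a smaller area raises the pressure,
# which is how the middle ear matches low-impedance air to the
# high-impedance fluid of the cochlea.
pressure_gain = (eardrum_area / oval_window_area) * lever_ratio
gain_db = 20 * math.log10(pressure_gain)
print(f"Pressure gain: ~{pressure_gain:.0f}x (~{gain_db:.0f} dB)")
```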

At the point of the inner ear, spectral analysis is carried out and the sound is then transferred to the auditory nerve.  (See figure 6)  


Figure 6 - Schematic of Auditory Periphery  
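A loose digital analogy (my own sketch, not from the lecture): the spectral analysis the inner ear performs is a bit like taking the spectrum of a digitised signal, for example with an FFT in Python/NumPy.  The test tone frequencies below are just assumptions for the example.

```python
import numpy as np

# Build a test signal: a 440 Hz and a 2 kHz tone mixed together,
# sampled at 44.1 kHz for one second.
fs = 44100
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

# Digital "spectral analysis": the FFT splits the signal into its
# frequency components, a bit like different places along the cochlea
# responding to different frequencies.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two strongest components should come out at ~440 Hz and ~2000 Hz.
strongest = freqs[np.argsort(spectrum)[-2:]]
print(sorted(strongest.round()))
```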

A video I looked at on YouTube explains quite a bit about the human ear which I found quite useful.

http://www.youtube.com/watch?v=0jyxhozq89g




Below is a picture of the processes the auditory brainstem goes through.  (See figure 7)  

Figure 7 - Auditory Brainstem (Afferent Processes)  

The features of auditory processing are a two-channel set of time-domain signals in contiguous and non-linearly spaced frequency bands.  The auditory system can distinguish between the signals that pass through the left ear and the right ear, between high and low frequencies, and between timing and intensity information.
At various specialised processing centres in the hierarchy, this information can be re-integrated and re-distributed.  (See figure 8)  

Figure 8 - Features of Auditory Processing  

Below is a picture showing the audible frequency ranges of certain animals.  (See figure 9)  

Figure 9 - Audible Frequency Range  

Below are some details on the normal hearing factors for humans.


  • Hearing threshold - 0dB SPL = 20µPa @ 1kHz  
  • Dynamic range - 140dB (Up to pain level)  
  • Frequency range (in air) - 20Hz to 20kHz  
  • Most sensitive frequency range - 2kHz to 4kHz  
  • Frequency discrimination - 0.3% @ 1kHz  
  • Minimum audible range - 1  
  • Minimum binaural time difference - 11µs  
Where the frequency discrimination of 0.3% is concerned, this means that two sounds whose frequencies differ by less than that amount will still sound the same to the human ear, even though they are technically out of tune.  
Only a larger frequency difference would be recognised by the human ear.  
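To make the threshold and discrimination figures a bit more concrete, here is a small sketch of my own (not part of the lecture) using the values from the list above; the example pressures and tone frequencies are just assumptions.

```python
import math

P_REF = 20e-6  # 20 micropascals: the 0 dB SPL reference pressure

def spl_db(pressure_pa):
    """Sound pressure level in dB SPL relative to 20 µPa."""
    return 20 * math.log10(pressure_pa / P_REF)

print(spl_db(20e-6))   # 0.0 dB SPL -> the threshold of hearing
print(spl_db(2.0))     # ~100 dB SPL, well inside the 140 dB dynamic range

# Frequency discrimination of 0.3% at 1 kHz: a 1000 Hz tone and a
# 1002 Hz tone differ by only 0.2%, so most listeners would not be
# able to tell them apart.
f1, f2 = 1000.0, 1002.0
print(abs(f2 - f1) / f1 * 100)  # 0.2 (%)
```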

Human hearing covers a range of 20Hz to 20kHz.  A lot of this range is covered by human speech.  (See figure 10)  

Figure 10 - Human Hearing and Speech Data  

Below is an image that does not really need to be known for the module, but is still useful in its own right.  (See figure 11)  

Figure 11 - Threshold of Hearing  

The MPEG/MP3 audio coding process uses lossy compression (where data that a human listener would not perceive anyway is discarded by the computer to save space and get rid of useless information) as well as a psychoacoustic model (a model of human hearing).  

A quote from the lecture - "The use in MP3 of a lossy compression algorithm is designed to greatly reduce the amount of data required to represent the audio recording and still sound like a faithful reproduction of the original uncompressed audio for most listeners. An MP3 file that is created using the setting of 128 kbit/s will result in a file that is about 11 times smaller than the CD file created from the original audio source. An MP3 file can also be constructed at higher or lower bit rates, with higher or lower resulting quality.
The compression works by reducing accuracy of certain parts of sound that are considered to be beyond the auditory resolution ability of most people. This method is commonly referred to as perceptual coding. It uses psychoacoustic models to discard or reduce precision of components less audible to human hearing, and then records the remaining information in an efficient manner."  (See figure 12)  

Figure 12 - MPEG/MP3 Audio Coding  
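The "about 11 times smaller" figure in the quote can be checked with a quick bit of arithmetic, assuming standard CD audio (44.1 kHz sampling, 16 bits per sample, 2 channels); this is my own check, not something from the lecture.

```python
# CD audio bit rate: 44,100 samples/s * 16 bits * 2 channels
cd_bitrate = 44100 * 16 * 2          # = 1,411,200 bit/s (~1411 kbit/s)
mp3_bitrate = 128_000                # the 128 kbit/s MP3 from the quote

compression_ratio = cd_bitrate / mp3_bitrate
print(f"{compression_ratio:.1f}x smaller")   # ~11.0x, matching the quote
```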


Lab Session:  

In today's lab we opened up a sound file of a soprano voice and noted that it had a duration of 7 seconds.  

We then opened up another sound file called "english words" in a package called Adobe Soundbooth CS4.  

In Adobe Soundbooth CS4 we messed around with the available functions and edits that we could apply to the sound.  We then saved our own copy that we cropped so we could edit it fully.  

I edited mine to the point where I had four different words being spoken.  The first and last words sounded different to each other and to the middle words, but the two middle words sounded the same even though they looked different in the displayed waveform and the Spectral Frequency Display.  This illustrates the point about frequency discrimination - 0.3% @ 1kHz.  (See figure 13)  

Figure 13 - Snapshot from Adobe Soundbooth CS4  


Friday, 12 October 2012

Week 3: 12/10/2012

Lecture  

Note for blog readers: I would type out a lot more in my own words, but when there is a picture on a PowerPoint lecture slide, I just put the whole slide in the blog, because the whole slide is a picture in its own right rather than a slide with words and a picture.  

Today we were taught about Digital Signal Processing.  This involved learning about digital and analogue signals in respect to sound waves.

The PowerPoint covered what happens when sound is received by a device (such as a microphone) and then processed by the main device that the receiver is connected to.  An image was used to represent this by showing what happens on the analogue and digital sides of a typical digital signal processing system.  (See figure 1)


Figure 1 - A Typical Signal Processing System  

We also learned that the basic things needed in the system are input and output filtering, analogue-to-digital and digital-to-analogue conversion, and a digital processing unit.
Respectively, these are needed because: without input and output filtering, you would not be able to pass sound in or get it back out cleanly; without the converters, the system could not process an analogue signal (digital is 1s and 0s, i.e. the binary system is used to represent voltages and other properties of the wave) or produce an analogue output that we can actually perceive; and the digital processing unit is what processes the digital signal to apply operations to the sound such as filtering, pitch warp, echo etc.
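As a rough illustration of what the digital processing unit does, here is a minimal echo effect applied to an already-digitised signal in Python/NumPy.  This is my own sketch, not from the lecture; the delay time, decay factor and test tone are all just assumptions.

```python
import numpy as np

def add_echo(samples, fs, delay_s=0.25, decay=0.5):
    """Mix a delayed, attenuated copy of the signal back into itself."""
    delay_samples = int(delay_s * fs)
    out = np.copy(samples)
    out[delay_samples:] += decay * samples[:-delay_samples]
    return out

# A short 440 Hz test tone standing in for the digitised input signal.
fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)

echoed = add_echo(tone, fs)
```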

The next part of the PowerPoint presentation went on to explain why we would use digital processing.  The three main reasons are: precision, robustness and flexibility.
Precision isn't really explained in terms of why it is needed; what is explained are the factors that can affect the precision of the sound wave and of the digital signal processing carried out on it.  (See figure 2)
The robustness of digital systems is shown mostly by the advantages they have over analogue systems: digital systems are inherently less susceptible than analogue systems to electrical noise (pick-up) and component tolerance variations.  (See figure 3)
The flexibility of digital systems allows easy programmability, which in turn allows upgrading and expansion of the processing operations.  This can be done without necessarily incurring large-scale hardware changes.
Practical systems with desired time varying and/or adaptive characteristics can be constructed.
(See figure 4)

Figure 2 - Precision  

Figure 3 - Robustness  

Figure 4 - Flexibility  
To accomplish all this, a functional sound card must be present and in use.  (See figure 5)  

Figure 5 - Simple Sound Card Architecture  
When sampling a sound, the system takes the value of the signal at times t = nT seconds, where T is the sampling period, so one sample is taken every T seconds.  The sampling rate is usually about double the highest frequency of the human hearing range.  (See figures 6 and 7)  


Figure 6 - Sampling a Signal  
Figure 7 - Sound Card Sampling Rates  
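Here is a minimal sketch of the sampling idea (my own, not from the slides), assuming a 1 kHz test tone and the common 44.1 kHz sampling rate, which is a little more than double the 20 kHz upper limit of human hearing.

```python
import numpy as np

fs = 44100          # sampling rate: a bit more than double the 20 kHz
T = 1 / fs          # upper limit of human hearing (Nyquist requirement)

# Sample a 1 kHz sine wave at times t = nT for n = 0, 1, 2, ...
n = np.arange(0, 1000)           # first 1000 samples
samples = np.sin(2 * np.pi * 1000 * n * T)

# Each sample is the value of the continuous signal at t = nT seconds.
print(samples[:5])
```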
Most modern sound cards support a 16 bit word length coding of the quantised sample values.  This allows a representation of 2^16 (65536) different signal levels within the input voltage range of the card.
(See figures 8, 9, 10, 11 and 12)  


Figure 8 - Quantising the Signal Amplitude  

Figure 9 - Coding the Quantised Amplitudes  

Figure 10 - Sound Card Word Length  

Figure 11 - Comparison of Audio Recording Specifications  

Figure 12 - Dynamic Range  
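To tie the word length and dynamic range figures together, here is a small sketch of my own (not from the slides): a 16-bit word gives 2^16 = 65,536 levels, which works out at roughly 96 dB of dynamic range.  The test signal below is just an assumption for the example.

```python
import math
import numpy as np

bits = 16
levels = 2 ** bits            # 65,536 distinct signal levels
print(levels)

# Quantise a signal in the range -1.0..1.0 to 16-bit integer codes,
# the way a sound card codes each quantised sample value.
signal = np.sin(2 * np.pi * np.linspace(0, 1, 100))
codes = np.round(signal * (levels // 2 - 1)).astype(np.int16)

# Approximate dynamic range of a 16-bit word: 20 * log10(2^16) ~= 96 dB
print(20 * math.log10(levels))
```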

Lab  

In today's lab session we undertook a mock test of the information we learnt from the first PowerPoint.  Quite a few of the questions caught me off-guard.  I am just hoping that I have not failed and I am only doubting myself because it was a test and I do not yet know the outcome.  I find that I doubt myself quite a lot.  But then, if I don't get my hopes up, then I can't be disappointed.

I then did this blog as well as slightly editing another.  I shall edit the others more thoroughly in my free time, however.

Friday, 5 October 2012

Week 2: 05/10/2012


  • Attended lecture  
    • Learned about the relationships between amplitude, time period, frequency, speed (through different mediums) and wavelength, as well as how waves behave once they are formed (e.g. when someone speaks, the waves spread out over a larger area and then reach a destination like an ear or a mic, possibly reverberating off a wall first).  
    • Learned about the way waves travel
  • Worked on task sheet  
    • Answered all questions  
      • For one of the questions, I took a complete quote from Wikipedia and showed that it was a quote.  
    • Through the task sheet, I learned more about sound waves over and above what was taught in the lecture, such as decibels.
  • Below is a link to a page on the Britannica encyclopedia website that explains sound in great detail and refers to a lot from the PowerPoint.  It can be a useful resource for information on the topic when needed.  It covers the general meaning of sound itself, as well as things such as wavelength, amplitude, frequency and periods of waves, and other concepts relating to sound and waves.