Revision as of 12:18, 23 November 2019

Formant Analysis Using Wideband Spectrographic Representation

By user:BJH (Brandon Henman)


Topic Background

Like most systems in the real world, the human vocal tract can be broken down and modeled as a system that takes real inputs (vibration of vocal cords) and transforms them (via the vocal tract) into real outputs (human speech).

The vocal tract itself resonates at certain frequencies, called formant frequencies. These depend on how a person shapes their vocal tract to pronounce particular phonemes, the "building blocks" of sounds.

This makes sense logically: when processing speech signals, our brains tell phonemes apart by their different frequency content. Otherwise, all sounds would mean the same thing!

For phonemes to be universally recognized, it follows that a given phoneme must have roughly the same frequency content for everyone, e.g. "you" sounds like "yoo" for everyone (at least within the same language; comparing languages is an entirely different topic).

Interestingly enough, this implies that variations in individual pitch caused by differences in vocal cord size (which set the frequency at which the vocal folds vibrate during voiced sounds) do NOT affect the resonant frequencies of the vocal tract.
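To make the pitch-independence claim concrete, here is a standard textbook idealization (an assumption for illustration, not a model used elsewhere on this page): treat a neutral vocal tract as a uniform tube of length L, closed at the vocal folds and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength:

```latex
F_n = \frac{(2n - 1)\,c}{4L}, \qquad n = 1, 2, 3, \ldots
```

With assumed values L ≈ 17.5 cm and c ≈ 350 m/s (the speed of sound in warm air), this gives F1 ≈ 500 Hz, F2 ≈ 1500 Hz and F3 ≈ 2500 Hz. Only the tract's geometry and the speed of sound appear in the formula; the vibration frequency of the vocal folds does not. Reshaping the tract for different phonemes moves these resonances away from the uniform-tube values.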

Think of the vocal tract as a garden hose with a variable-spray nozzle. The water pressure/flow is the input your vocal cords apply to the nozzle, i.e. the vocal tract. The nozzle (vocal tract) then shapes the flow according to whichever spray setting is selected before the water spreads out from the end of the hose. The resulting spray looks the same to you and to the outside world, just as your speech does.
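The source-filter idea can be sketched numerically. This is a minimal sketch, not the author's method: it assumes an idealized source whose harmonics at multiples of the pitch f0 all have equal amplitude, and it models the vocal tract as a cascade of two-pole digital resonators (the structure used in classic formant synthesizers). The formant frequencies and bandwidths in `FORMANTS` and `BANDWIDTHS` are hypothetical values chosen for illustration, not measurements from this page's recordings.

```python
import cmath
import math

FS = 16000  # sampling rate in Hz, matching the recordings on this page

# Hypothetical formant frequencies and bandwidths (Hz), for illustration only.
FORMANTS = [500, 1500, 2500]
BANDWIDTHS = [80, 100, 120]


def tract_gain(f, formants=FORMANTS, bandwidths=BANDWIDTHS, fs=FS):
    """|H(f)| of a cascade of two-pole resonators standing in for the vocal tract.

    The pitch f0 is deliberately not a parameter: the filter does not depend on it.
    """
    z_inv = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    gain = 1.0
    for formant, bw in zip(formants, bandwidths):
        r = math.exp(-math.pi * bw / fs)                # pole radius from bandwidth (standard approximation)
        p = r * cmath.exp(2j * math.pi * formant / fs)  # pole angle places the resonance
        gain /= abs((1 - p * z_inv) * (1 - p.conjugate() * z_inv))
    return gain


def voiced_spectrum(f0, fs=FS):
    """(frequency, amplitude) of each harmonic k*f0 below Nyquist after filtering a flat source."""
    return [(h, tract_gain(h)) for h in range(f0, fs // 2, f0)]


def envelope_peaks(spectrum):
    """Harmonics louder than both neighbors: a crude formant estimate."""
    return [
        spectrum[i][0]
        for i in range(1, len(spectrum) - 1)
        if spectrum[i - 1][1] < spectrum[i][1] > spectrum[i + 1][1]
    ]


for f0 in (100, 125):  # a lower and a higher speaking pitch
    print(f0, envelope_peaks(voiced_spectrum(f0)))
```

Both pitches report peaks at 500, 1500 and 2500 Hz: changing f0 only changes which frequencies the harmonics sample, not where the tract's resonances sit. The two pitches were picked so their harmonics land exactly on the assumed formants; for other pitches the peaks fall on the nearest harmonic, which is one reason formants are read from a smoothed (wideband) spectrum rather than from individual harmonics.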


Audio Files

Here are some examples of voiced phonemes that will be analyzed for their formant frequencies (all custom-recorded at a 16 kHz sampling rate):

Me saying "a" ("ay"): Media:Vowels_voiced_a.wav

Me saying "e" ("ee"): Media:Vowels_voiced_e.wav

Me saying "i" ("eye"): Media:Vowels_voiced_i.wav

Me saying "o" ("oh"): Media:Vowels_voiced_o.wav

Me saying "u" ("yoo"): Media:Vowels_voiced_u.wav

Time Domain Representation of Recorded Signals:

SpeechSignals.png
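The "wideband" in the title refers to the analysis window. Roughly speaking (ignoring the window's shape), a short-time Fourier transform window of duration T resolves frequencies about 1/T apart. A short window therefore blurs neighboring pitch harmonics together so the broad formant bands stand out, while a long (narrowband) window resolves the individual harmonics. The sketch below does this arithmetic for 16 kHz recordings; the 4 ms and 32 ms durations and the 120 Hz pitch are illustrative assumptions, not settings taken from this project.

```python
FS = 16000  # Hz, sampling rate of the recordings


def window_stats(duration_ms, fs=FS):
    """Return (samples per window, rough frequency resolution in Hz, about 1 / duration)."""
    n = round(fs * duration_ms / 1000)
    return n, fs / n


def resolves_harmonics(duration_ms, f0, fs=FS):
    """True if the window's rough resolution is finer than the harmonic spacing f0."""
    _, resolution = window_stats(duration_ms, fs)
    return resolution < f0


for label, ms in (("wideband", 4), ("narrowband", 32)):
    n, res = window_stats(ms)
    print(f"{label}: {n} samples, ~{res:g} Hz, resolves 120 Hz harmonics: {resolves_harmonics(ms, 120)}")
```

At 16 kHz a 4 ms window is only 64 samples with roughly 250 Hz resolution, coarser than typical harmonic spacing, which is why a wideband spectrogram shows formants as dark bands instead of individual harmonic lines.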


Methodology


Matlab Code


Spectrogram Results & Formant Analysis

Wideband Spectrogram of A:

WbSpec a.png

Wideband Spectrogram of E:

WbSpec e.png

Wideband Spectrogram of I:

WbSpec i.png

Wideband Spectrogram of O:

WbSpec o.png

Wideband Spectrogram of U:

WbSpec u.png
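Spectrograms like these can be sketched as a short-time Fourier transform with a short window. This is not the author's MATLAB code; it is a dependency-free Python sketch that assumes a 4 ms Hamming window, a 1 ms hop and 256-point zero padding, and it uses a naive DFT for readability instead of an FFT. (In MATLAB, the Signal Processing Toolbox function `spectrogram(x, window, noverlap, nfft, fs)` performs the same kind of computation.) The check uses a pure 1000 Hz tone rather than a vowel, because its expected peak is known exactly.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def wideband_spectrogram(x, fs, win_ms=4.0, hop_ms=1.0, nfft=256):
    """Magnitude STFT using a short (wideband) Hamming window.

    Returns (times, freqs, frames): frames[t][k] is the magnitude at time
    times[t] (seconds, frame center) and frequency freqs[k] (Hz).
    Each frame is implicitly zero-padded to nfft points.
    """
    n_win = round(fs * win_ms / 1000)  # 64 samples at 16 kHz for a 4 ms window
    hop = round(fs * hop_ms / 1000)    # 16 samples at 16 kHz for a 1 ms hop
    w = hamming(n_win)
    freqs = [k * fs / nfft for k in range(nfft // 2 + 1)]
    # DFT kernel e^{-j 2 pi k i / nfft}, needed only for the n_win non-zero samples.
    kernel = [[cmath.exp(-2j * math.pi * k * i / nfft) for i in range(n_win)]
              for k in range(nfft // 2 + 1)]
    times, frames = [], []
    for start in range(0, len(x) - n_win + 1, hop):
        seg = [x[start + i] * w[i] for i in range(n_win)]  # windowed frame
        frames.append([abs(sum(s * e for s, e in zip(seg, row))) for row in kernel])
        times.append((start + n_win / 2) / fs)
    return times, freqs, frames


def strongest_frequency(freqs, frame):
    """Frequency of the largest bin in one frame (a crude peak picker)."""
    return freqs[max(range(len(frame)), key=frame.__getitem__)]


fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]  # 50 ms test tone
times, freqs, frames = wideband_spectrogram(tone, fs)
print(len(frames), strongest_frequency(freqs, frames[0]))
```

Here 47 frames of 64 samples at a 16-sample hop fit in the 800-sample tone, and each frame peaks at the 1000 Hz bin. On a real vowel the formants appear as broad bands rather than a single line, so picking only the strongest bin per frame would at best follow the most prominent formant; a proper formant tracker would look for several local maxima per frame.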


Summary

