Revision as of 16:54, 14 November 2009

SupplementalSpeech_prelecture

Due to a kiwi server failure, my pre-lecture notes are not as substantial as I would have liked. See my post-lecture notes for a more detailed description.

   * The server failed??? When?? Zach do you know anything about this? --Mboutin 19:45, 3 November 2009 (UTC)
   * It was from around 2pm till about 5:30pm Tuesday. When I tried to preview my page that I had started writing, it said something like "Server not available." --Pclay
   * We will look into this. Thanks for the detailed info Peter! --Mboutin 13:33, 4 November 2009 (UTC) 


Notes for speech lecture:

Structure: -> Basic speech stuff (pipes, fricatives) -> Voiced vs. Unvoiced

1) avg power
2) zero crossing
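
As a rough illustration of the two features listed above, here is a minimal sketch that computes average power and zero-crossing rate for one frame. The sampling rate, frame length, and both test signals (a 100 Hz sine standing in for a voiced frame, low-level noise for an unvoiced one) are assumptions made for the example, not values from the lecture.

```python
import math
import random

def average_power(frame):
    """Mean of the squared samples in the frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

FS = 8000  # assumed sampling rate in Hz
N = 240    # assumed frame length (30 ms at 8 kHz)

# Hypothetical stand-ins: a sine for a voiced frame, weak noise for an unvoiced one.
rng = random.Random(0)
voiced = [math.sin(2 * math.pi * 100 * n / FS) for n in range(N)]
unvoiced = [0.1 * rng.uniform(-1, 1) for _ in range(N)]

for name, frame in (("voiced", voiced), ("unvoiced", unvoiced)):
    print(name, average_power(frame), zero_crossing_rate(frame))
```

With these stand-in signals the voiced frame shows the higher average power and the lower zero-crossing rate, which is the pattern the two features are meant to pick up.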

-> x(t) -> v(t) => s(t) = conv( x(t), v(t) )

   x(t): periodic pulse train
   v(t): filter
   s(t): phoneme
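
The source-filter relation s(t) = conv( x(t), v(t) ) can be sketched in discrete time as follows. The sampling rate, the pitch period, and the impulse response (a single damped 500 Hz resonance) are assumptions chosen for illustration only.

```python
import math

def convolve(x, v):
    """Direct-form linear convolution; output length is len(x) + len(v) - 1."""
    s = [0.0] * (len(x) + len(v) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip zero samples of the pulse train
        for j, vj in enumerate(v):
            s[i + j] += xi * vj
    return s

FS = 8000          # assumed sampling rate in Hz
PITCH_PERIOD = 80  # assumed samples between pulses (100 Hz pitch)

# x: periodic pulse train (the source)
x = [1.0 if n % PITCH_PERIOD == 0 else 0.0 for n in range(400)]

# v: hypothetical filter impulse response, one damped 500 Hz resonance
v = [math.exp(-n / 20.0) * math.cos(2 * math.pi * 500 * n / FS) for n in range(60)]

# s: the output, one copy of v placed at every pulse
s = convolve(x, v)
print(len(s), s[:5])
```

Because this v is shorter than the pitch period, the copies of v do not overlap, so each pitch period of s is simply the impulse response of the filter.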

-> Model vocal tract as a series of tubes

- Going through tube delays the signal (show function)
- between tubes (show function)

+ This model leads to a transfer function V(d)

Since the vocal tract is a cavity that resonates, it amplifies certain frequencies.

X(f) is sum over k of ( a_k * delta(f - k*f_a) ), and since s(t) = conv( x(t), v(t) ), S(f) = X(f)V(f).

These frequencies, which are the local maxima of |S(f)|, are called formants.

- Generally, the vocal tract transfer function is an all-pole filter,
  where each real pole or complex pole pair corresponds to a resonance.
- Also, if you are given a z-model, F = theta / (2*pi*T), where theta is the
  angle of the pole and T is the sampling period (same thing as wT = theta).
- Zeros (anti-resonances) of the transfer function occur when there is no
  measurable output (e.g. nasals and fricatives).
- Nasals => output from the mouth is zero
- Fricatives/stop consonants => blockage behind source is infinite (forcing air
  through constriction)
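
The relation F = theta / (2*pi*T) can be checked numerically with a small sketch. The sampling period and the pole location below are hypothetical, and the two-pole filter is only a stand-in for one resonance of an all-pole model, not the vocal tract model from the lecture.

```python
import cmath
import math

def pole_to_frequency(pole, T):
    """Resonance frequency in Hz of a z-plane pole: F = theta / (2*pi*T)."""
    theta = abs(cmath.phase(pole))
    return theta / (2 * math.pi * T)

def all_pole_magnitude(poles, freq_hz, T):
    """|V(e^{jw})| for V(z) = 1 / prod(1 - p * z^-1), with w = 2*pi*freq_hz*T."""
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz * T)
    denom = 1 + 0j
    for p in poles:
        denom *= 1 - p * z_inv
    return 1 / abs(denom)

T = 1 / 8000                          # assumed sampling period (8 kHz sampling)
pole = cmath.rect(0.97, math.pi / 8)  # hypothetical pole: radius 0.97, angle pi/8
poles = [pole, pole.conjugate()]      # complex pole pair

formant_hz = pole_to_frequency(pole, T)

# Search a 10 Hz grid for the peak of the magnitude response.
grid = range(0, 4001, 10)
peak_hz = max(grid, key=lambda f: all_pole_magnitude(poles, f, T))
print(formant_hz, peak_hz)
```

For this pole pair the formula gives about 500 Hz, and the peak of the magnitude response found on the grid lands next to it. The closer the pole radius is to 1, the sharper the resonance.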
  

-> Spectrograms

 - Shows frequency content vs. time
 - Use a short-time DTFT to obtain useful info about an utterance
   X_m(e^jw) = sum over n of ( x(n) w(n-m) e^(-jwn) )
 - wideband uses window length = one period
   - high time resolution, low frequency resolution
   - striations due to energy variation
 - narrowband captures several periods
   - high frequency resolution, low time resolution
   - striations correspond to peaks in frequency spectrum.
 - The formants correspond to the dark bands.
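
To make the wideband/narrowband trade-off concrete, here is a minimal sketch that evaluates the short-time DTFT sum directly for one window position. The Hamming window, the 8 kHz sampling rate, the pulse-train test signal, and the function names are all assumptions for the example.

```python
import cmath
import math

def hamming(L):
    """Hamming window of length L."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (L - 1)) for k in range(L)]

def stdtft_mag(x, m, window, freq_hz, fs):
    """|X_m(e^{jw})| = |sum_n x(n) w(n-m) e^{-jwn}| with w = 2*pi*freq_hz/fs."""
    omega = 2 * math.pi * freq_hz / fs
    total = 0j
    for k, wk in enumerate(window):  # k = n - m
        n = m + k
        if 0 <= n < len(x):
            total += x[n] * wk * cmath.exp(-1j * omega * n)
    return abs(total)

FS = 8000    # assumed sampling rate in Hz
PERIOD = 80  # assumed pitch period in samples (100 Hz pitch)
x = [1.0 if n % PERIOD == 0 else 0.0 for n in range(400)]

wide = hamming(PERIOD)        # wideband: window spans one period
narrow = hamming(4 * PERIOD)  # narrowband: window spans several periods
m = 40                        # window start position in samples

# Compare a pitch harmonic (200 Hz) with a point between harmonics (250 Hz).
wide_harm = stdtft_mag(x, m, wide, 200, FS)
wide_between = stdtft_mag(x, m, wide, 250, FS)
narrow_harm = stdtft_mag(x, m, narrow, 200, FS)
narrow_between = stdtft_mag(x, m, narrow, 250, FS)
print(wide_harm, wide_between, narrow_harm, narrow_between)
```

With the one-period window only a single pulse falls inside the window, so the magnitude is the same at both frequencies and the individual harmonics are not resolved. With the four-period window the harmonic stands out clearly against the point between harmonics, which is where the striations at peaks of the frequency spectrum come from.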

-> How to read a spectrogram by Rob Hagiwara

 http://home.cc.umanitoba.ca/~robh/howto.html

