Revision as of 16:45, 14 November 2009 by Pclay


Due to a Kiwi server failure, my pre-lecture notes are not as substantial as I would have liked. See my post-lecture notes for a more detailed description.

  • The server failed??? When?? Zach do you know anything about this? --Mboutin 19:45, 3 November 2009 (UTC)
  • It was from around 2pm till about 5:30pm Tuesday. When I tried to preview my page that I had started writing, it said something like "Server not available." --Pclay
  • We will look into this. Thanks for the detailed info Peter! --Mboutin 13:33, 4 November 2009 (UTC)


Notes for speech lecture:

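Structure: -> Basic speech stuff (pipes, fricatives) -> Voiced vs. unvoiced, distinguished by:

1) average power (short-time energy): higher for voiced sounds
2) zero-crossing rate: higher for unvoiced sounds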
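The two features above can be sketched as frame-level measurements. This is a minimal sketch, assuming frames are lists of samples at fs = 8000 Hz; the function names and the thresholds in the hypothetical `classify_frame` rule are illustrative only and would depend on how the recording is scaled.

```python
import math
import random

def short_time_energy(frame):
    """Average power of one frame: mean of the squared samples."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_thresh=0.01, zcr_thresh=0.3):
    """Hypothetical rule: voiced if the frame is loud AND rarely crosses zero."""
    if short_time_energy(frame) > energy_thresh and zero_crossing_rate(frame) < zcr_thresh:
        return "voiced"
    return "unvoiced"

# Toy frames at an assumed fs = 8000 Hz, 240 samples (30 ms) each.
fs = 8000
voiced = [math.sin(2 * math.pi * 100 * n / fs) for n in range(240)]  # 100 Hz tone
rng = random.Random(0)
unvoiced = [rng.uniform(-0.05, 0.05) for _ in range(240)]             # weak noise
```

The tone stands in for a voiced frame (high power, slow oscillation) and the weak noise for a fricative-like frame (low power, rapid sign changes).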

-> x(t) -> v(t) => s(t) = conv( x(t), v(t) )
   where x(t) = periodic pulse train (source), v(t) = vocal tract filter, s(t) = phoneme (output)
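A discrete-time version of this source-filter model can be sketched as follows. It assumes fs = 8000 Hz, a 100 Hz pitch (period of 80 samples), and a single damped 500 Hz sinusoid as a stand-in for v(n); a real vocal tract response would contain several resonances.

```python
import math

def pulse_train(period, length):
    """Periodic impulse train x[n]: 1 every `period` samples, else 0."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def convolve(x, v):
    """Full linear convolution s[n] = sum_k x[k] v[n - k]."""
    s = [0.0] * (len(x) + len(v) - 1)
    for k, xk in enumerate(x):
        if xk == 0.0:
            continue  # only the pulses contribute
        for m, vm in enumerate(v):
            s[k + m] += xk * vm
    return s

# Hypothetical vocal-tract impulse response: one damped resonance at 500 Hz,
# truncated to 200 samples.
fs = 8000
v = [math.exp(-n / 40) * math.sin(2 * math.pi * 500 * n / fs) for n in range(200)]
x = pulse_train(period=80, length=800)  # 100 Hz pitch
s = convolve(x, v)
```

Each pulse launches a copy of v(n), so once the start-up transient has passed, s(n) repeats every pitch period.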

-> Model vocal tract as a series of tubes

- Going through tube delays the signal (show function)
- between tubes (show function)

+ This model leads to a transfer function V(d) for the vocal tract
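One common way to make the "between tubes" step concrete is the lossless concatenated-tube model: at the junction between tubes k and k+1 a reflection coefficient r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) is formed from the cross-sectional areas, and the all-pole denominator is built up by the recursion D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(1/z). This is a simplified sketch under those assumptions: sign conventions for r_k differ between texts, losses at the glottis and lips are ignored, and the areas are hypothetical.

```python
def reflection_coefficients(areas):
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each junction (assumed convention)."""
    return [(a2 - a1) / (a2 + a1) for a1, a2 in zip(areas, areas[1:])]

def all_pole_denominator(reflection):
    """Coefficients [1, d1, ..., dN] of D(z) = 1 + d1 z^-1 + ... + dN z^-N.

    Applies D_k(z) = D_{k-1}(z) + r_k z^-k D_{k-1}(1/z), which on the
    coefficient list means d'_i = d_i + r_k * d_{k-i}.
    """
    d = [1.0]
    for k, r in enumerate(reflection, start=1):
        padded = d + [0.0]
        d = [padded[i] + r * padded[k - i] for i in range(k + 1)]
    return d

areas = [1.0, 2.0, 4.0]        # hypothetical areas, glottis -> lips
r = reflection_coefficients(areas)
D = all_pole_denominator(r)    # V(z) = G / D(z): an all-pole filter
```

Because every |r_k| < 1 for positive areas, the resulting all-pole filter is stable, which fits the all-pole description of the vocal tract below.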

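Since the vocal tract is a cavity that resonates, it amplifies certain frequencies.
These resonant frequencies, which are the local maxima of the envelope of |S(f)|, are called formants.
The pulse-train excitation has a line spectrum at the harmonics of the fundamental f_a:
X(f) = sum_k( a_k * delta(f - k*f_a) ), so S(f) = X(f)V(f) samples V(f) at those harmonics.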
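The line-spectrum claim for X(f) can be checked numerically with a DFT of an ideal impulse train. This sketch assumes fs = 8000 Hz, a pitch period of 40 samples (f_a = 200 Hz), and an analysis length of exactly 10 periods so every harmonic k*f_a falls on a DFT bin; for an ideal impulse train all a_k come out equal.

```python
import cmath
import math

def dft_magnitude(x):
    """Naive N-point DFT magnitude |X[k]|, k = 0..N-1."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N) if x[n] != 0.0))
            for k in range(N)]

fs = 8000        # assumed sampling rate (Hz)
period = 40      # pitch period in samples -> f_a = fs / period = 200 Hz
N = 400          # analysis length: exactly 10 pitch periods
x = [1.0 if n % period == 0 else 0.0 for n in range(N)]
X = dft_magnitude(x)
bin_hz = fs / N  # 20 Hz per bin, so the harmonics land on every 10th bin
harmonic_bins = [k for k, mag in enumerate(X) if mag > 1e-6]
```

Only bins at multiples of f_a / bin_hz = 10 are nonzero, which is the discrete counterpart of the delta train in X(f).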
 - Generally, the vocal tract transfer function is an all-pole filter,
   where a real pole or a complex-conjugate pole pair corresponds to a resonance.
 - Also, if you are given a z-domain model, a pole at angle theta gives a formant at
   F = theta / (2*pi*T), where T is the sampling period (same thing as wT = theta, with w = 2*pi*F).
 - Zeros (anti-resonances) of the transfer function occur at frequencies where there is
   no measurable output (e.g. nasals and fricatives):
 - Nasals => output from the mouth is zero
   Fricatives/stop consonants => blockage behind the source is infinite (forcing air
   through the constriction)
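The pole-angle relation F = theta / (2*pi*T) can be sketched with one second-order resonator. This assumes T = 1/8000 s and a hypothetical formant at 500 Hz with 60 Hz bandwidth; the radius formula r = exp(-pi*B*T) is a standard approximation for relating pole radius to bandwidth, not something taken from the lecture. Cascading several such sections gives an all-pole model with several formants.

```python
import cmath
import math

def resonator_denominator(F, B, T):
    """Second-order all-pole section 1 - 2 r cos(theta) z^-1 + r^2 z^-2.

    theta = 2*pi*F*T places the resonance; r = exp(-pi*B*T) sets bandwidth B (Hz).
    """
    r = math.exp(-math.pi * B * T)
    theta = 2 * math.pi * F * T
    return [1.0, -2 * r * math.cos(theta), r * r]

def poles(denominator):
    """Roots of z^2 + a1 z + a2, i.e. the poles of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    _, a1, a2 = denominator
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + disc) / 2, (-a1 - disc) / 2

def formant_from_pole(p, T):
    """F = theta / (2*pi*T), where theta is the pole angle."""
    return abs(cmath.phase(p)) / (2 * math.pi * T)

T = 1 / 8000                                     # assumed sampling period
D = resonator_denominator(F=500.0, B=60.0, T=T)  # hypothetical formant
p_upper, p_lower = poles(D)                      # complex-conjugate pair
```

Both poles of the conjugate pair map back to the same 500 Hz formant, which is why a complex pole pair counts as a single resonance.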
   

-> Spectrograms

  - Display frequency content vs. time
  - Use a short-time DTFT to obtain useful info about an utterance:
    X_m(e^jω) = sum_n( x(n) w(n-m) e^(-jωn) ), where w(n) is the analysis window
  - Wideband: window length of about one pitch period
    - high time resolution, low frequency resolution
    - vertical striations due to the energy variation within each pitch period
  - Narrowband: window spans several pitch periods
    - high frequency resolution, low time resolution
    - horizontal striations correspond to the pitch harmonics (peaks in the frequency spectrum)
  - In both, the formants correspond to the dark bands.
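The wideband/narrowband trade-off can be sketched with a short-time DFT of a 100 Hz pulse train. Assumptions: fs = 8000 Hz, a Hamming window, window lengths of 80 samples (one period) and 320 samples (four periods), and hop sizes of 20 and 40 samples. The code indexes each frame from its own start, which changes only the phase of X_m(e^jω), not its magnitude.

```python
import cmath
import math

def hamming(N):
    """Hamming window w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def stft_magnitude(x, win_len, hop):
    """|X_m| on an N-point grid (N = win_len), bins 0..N/2, one list per frame."""
    w = hamming(win_len)
    frames = []
    for m in range(0, len(x) - win_len + 1, hop):
        seg = [x[m + n] * w[n] for n in range(win_len)]
        spectrum = []
        for k in range(win_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                      for n in range(win_len) if seg[n] != 0.0)
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

fs = 8000
x = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]  # 100 Hz pulse train
wide = stft_magnitude(x, win_len=80, hop=20)     # ~1 period (10 ms)
narrow = stft_magnitude(x, win_len=320, hop=40)  # ~4 periods (40 ms)
```

In `narrow` (25 Hz bins) the energy concentrates on every 4th bin, the 100 Hz harmonics, which appear as horizontal striations. In `wide` each frame holds a single pulse, so the spectrum is flat but its level rises and falls with the pulse position, which appears as vertical striations.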

-> How to read a spectrogram by Rob Hagiwara

  http://home.cc.umanitoba.ca/~robh/howto.html
