Revision as of 23:33, 23 April 2017

Hello, World!

by Alden Fisher

Introduction

My original intent was to load an audio file into Matlab and have it print out what I was saying as plain text. This proved challenging for the reasons described under Challenges. What I ended up doing instead was finding the first and second formants in the famous sentence "Hello, world."

Challenges

The toughest part of the original plan was finding a table that mapped out the formants of the 42 phonemes [1] in the English language. I could not find one, which left me with nothing to compare my results against, so the computer had no reference data to use. For this reason, I changed the direction of the project to the current goal.

Approach

I recorded myself in a quiet room saying the phrase "Hello, World." The recording was made on my iPhone, which has a sampling rate of 44.1 kHz. From there, I converted the file [2] to '.wav' format so that it would be compatible with all computing platforms.

Once I had the audio file, I manually trimmed the data to remove any dead time. One assumption I made was that each of the 10 letters lasts the same amount of time. For this reason, I split the recording into 10 equal segments and took one DFT of each, using the 'DFTwin' function we created in lab 9a [1]. From each DFT, I extracted the two largest peaks, which I took to be the first and second formants. Once I had these, I was able to plot them in 3-space with respect to the letters.
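The segment-and-peak procedure described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Matlab code used for the project. The 'DFTwin' function from lab 9a is not shown here, so `dft_magnitudes` is a hypothetical stand-in that computes a direct DFT with a plain rectangular window. The names `two_largest_peaks` and `peaks_per_segment` are also assumptions made for this example.

```python
import cmath
import math


def dft_magnitudes(window):
    """Return |X[k]| for k = 0 .. N//2 using a direct O(N^2) DFT.

    Hypothetical stand-in for the lab's 'DFTwin'; no tapering window is applied.
    """
    n = len(window)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        mags.append(abs(acc))
    return mags


def two_largest_peaks(mags, fs, n):
    """Return the frequencies in Hz of the two largest local maxima, lowest first."""
    # A bin counts as a peak if it rises above its left neighbour
    # and is not below its right neighbour.
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    strongest = sorted(peaks, key=lambda k: mags[k], reverse=True)[:2]
    # Bin k of an n-point DFT corresponds to k * fs / n Hz.
    return sorted(k * fs / n for k in strongest)


def peaks_per_segment(signal, fs, num_segments=10):
    """Split the signal into equal-length segments and find two peaks in each."""
    seg_len = len(signal) // num_segments
    results = []
    for s in range(num_segments):
        window = signal[s * seg_len:(s + 1) * seg_len]
        results.append(two_largest_peaks(dft_magnitudes(window), fs, seg_len))
    return results


# Usage with a synthetic two-tone signal (not the actual recording).
fs = 8000
demo = [math.sin(2 * math.pi * 400 * i / fs)
        + 0.5 * math.sin(2 * math.pi * 1200 * i / fs)
        for i in range(2000)]
for letter, freqs in zip("HELLOWORLD", peaks_per_segment(demo, fs)):
    print(letter, freqs)
```

The direct DFT is kept only so the sketch has no dependencies. For a real 44.1 kHz recording the segments are thousands of samples long, and an FFT routine would be the practical choice. Note also that the two largest spectral peaks only approximate formants, since a strong pitch harmonic can outrank a formant peak.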

http://www.zamzar.com/

Matlab Code

https://en.wikipedia.org/wiki/International_Phonetic_Alphabet

http://phonemicchart.com/transcribe/1000_basic_words.html

 [[File:3D plot.PNG|thumbnail]]
