Automatically Syllabifying English Speech

Adam Thomas, Drew McDaniel, Michael Albanese, Salvatore Bottiglieri, Trent Cooper

Northern Arizona University, College of Engineering, Forestry, and Natural Sciences

Project Motivations

The goals of this project are to research automatic syllabification techniques and identify one that can improve the performance of the current Applied Linguistics Speech Lab tools.

After breaking down speech into syllables, the pitch, rhythm, and stress of each individual syllable can be analyzed to detect features of the speaker.

These features include emotion, sarcasm, country of origin, and whether the speaker is asking a question, issuing a command, or simply making a statement.

Software Architecture

  • Sound files containing English speech are recorded
  • The raw sound files are then processed to attach start and end times to each individual phone
  • After attaching timing data, the phones are structured into objects
  • The structured phones are then run through various systems to calculate the syllabification of the phone data
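The pipeline's "structured into objects" step can be sketched as follows. This is a minimal, hypothetical data model; the class name, fields, and vowel heuristic are illustrative assumptions, not the lab's actual code.

```python
from dataclasses import dataclass

@dataclass
class Phone:
    label: str      # ARPAbet-style phone symbol, e.g. "AE" or "T" (assumed notation)
    start: float    # start time in seconds, from the aligner
    end: float      # end time in seconds

    def is_vowel(self) -> bool:
        # Simple heuristic: ARPAbet vowel symbols begin with a vowel letter
        return self.label[0] in "AEIOU"

def load_phones(timed_rows):
    """Turn (label, start, end) rows with timing data into Phone objects."""
    return [Phone(label, start, end) for label, start, end in timed_rows]

rows = [("K", 0.00, 0.08), ("AE", 0.08, 0.21), ("T", 0.21, 0.30)]
phones = load_phones(rows)
print([p.label for p in phones if p.is_vowel()])  # ['AE']
```

The syllabification systems described below would then consume lists of these objects rather than raw sound files.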

Challenges

  • Very few public syllabification solutions exist
    • There is very little existing foundation to build on
  • Iterative Process
    • Each syllabification solution required many iterations to reach the desired accuracy
  • Memory Management
    • The software had to handle large amounts of data at once, so special care was needed to stay within reasonable memory limits
[Accuracy chart, based on 1680 utterances]

Genetic Algorithm

  • Iterative technique that takes cues from evolution in biology

Genes

  • Genes are candidate syllabification solutions
  • Each gene starts and ends with a vowel, with a variable number of consonants and exactly one syllable split in between

Evolution

  • Genes that produce more accurate syllabification results are kept, while inaccurate ones are culled
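The keep-and-cull loop can be sketched with a toy example. This is a simplified illustration, not the project's implementation: the real gene encoding (vowel-to-vowel spans with one split point) is reduced here to a bit string marking syllable boundaries between phones, and the word, gold split, and parameters are all made up.

```python
import random

random.seed(0)

PHONES = ["K", "AE", "B", "IH", "N"]   # toy word: "cabin"
GOLD   = [0, 0, 1, 0, 0]               # boundary before "B": KAE.BIHN (assumed gold split)

def fitness(gene):
    # Fraction of boundary positions that match the gold syllabification
    return sum(g == t for g, t in zip(gene, GOLD)) / len(GOLD)

def mutate(gene, rate=0.2):
    # Flip each boundary bit with a small probability
    return [1 - g if random.random() < rate else g for g in gene]

# Start from random candidate syllabifications
population = [[random.randint(0, 1) for _ in PHONES] for _ in range(20)]
for _ in range(50):
    # Keep the most accurate genes, cull the rest, refill by mutation
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

best = max(population, key=fitness)
```

Because survivors are carried over unchanged each generation, the best fitness never decreases, which is the "theoretically gets more accurate over time" behavior noted in the results section.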
[Accuracy chart, based on 1680 utterances]

Hidden Markov Model

  • Relies on statistical calculations to make predictions based on a series of sonority values

Markov Chain

  • Works with a chain of states, where the current state is statistically dependent on the previous one

Syllabification with States

  • With an input list of sonority values, the HMM outputs states which represent the beginning of each syllable
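The state-decoding idea can be illustrated with a toy HMM and Viterbi decoding. Everything here is an assumption for illustration: the two states "B" (syllable begin) and "I" (inside), the coarse low/mid/high sonority levels, and all probabilities are hand-picked, not the lab's trained parameters.

```python
import math

STATES = ["B", "I"]
START  = {"B": 0.9, "I": 0.1}
TRANS  = {"B": {"B": 0.3, "I": 0.7}, "I": {"B": 0.4, "I": 0.6}}
# Emission: syllable-initial phones tend to have low sonority (obstruents),
# syllable-internal peaks (vowels) tend to have high sonority.
EMIT   = {"B": {"low": 0.7, "mid": 0.2, "high": 0.1},
          "I": {"low": 0.1, "mid": 0.3, "high": 0.6}}

def viterbi(obs):
    """Most likely state sequence for a list of sonority levels."""
    v = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            col[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# A low-high-low-high sonority contour decodes as two consonant-vowel syllables
print(viterbi(["low", "high", "low", "high"]))  # ['B', 'I', 'B', 'I']
```

The "B" states mark the beginning of each syllable, matching the description above: sonority values in, syllable-start states out.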
[Accuracy chart, based on 1680 utterances]

K-Means Clustering

  • Allows for organizing utterances into a specified number of groups

Data Grouping

  • Groups utterances by similar features, e.g. average (mean) and median sonority values

Syllabification with Grouping

  • Each group of utterances receives an individual Hidden Markov Model, allowing for more precise syllabification results
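The grouping step can be sketched as follows, assuming each utterance is reduced to a small feature vector of its mean and median sonority. This is a bare-bones k-means in pure Python for illustration; the feature choice, data, and parameters are assumptions, and only the clustering is shown (each resulting group would then train its own HMM).

```python
import random
from statistics import mean, median

random.seed(1)

def features(sonority_values):
    # Reduce an utterance to (mean sonority, median sonority)
    return (mean(sonority_values), median(sonority_values))

def kmeans(points, k, iters=20):
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [tuple(mean(dim) for dim in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

utterances = [[1, 4, 2], [1, 5, 2], [6, 9, 7], [7, 9, 8]]  # made-up sonority sequences
points = [features(u) for u in utterances]
centers, groups = kmeans(points, k=2)
# Each group of utterances would now get its own Hidden Markov Model.
```

Specializing one HMM per cluster is what allows the more precise results mentioned above: each model only has to fit utterances with similar sonority profiles.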

Testing

  • Integration Testing
    • Focused on utility functions that all algorithms use
  • Evaluated code coverage of the systems
    • Helps eliminate code that is not being used
  • Basic Unit Testing
    • Check robustness of systems by ensuring inputs are in the expected format and stay within boundaries
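A basic unit test of the kind described above might look like this. The validate_sonority helper and its range limits are hypothetical, invented here to illustrate input-format and boundary checking; they are not the project's actual utilities.

```python
def validate_sonority(values, low=0, high=10):
    """Reject inputs that are empty, non-numeric, or outside [low, high]."""
    if not values:
        raise ValueError("empty utterance")
    for v in values:
        if not isinstance(v, (int, float)) or not low <= v <= high:
            raise ValueError(f"sonority out of range: {v!r}")
    return list(values)

def test_accepts_in_range():
    assert validate_sonority([0, 5, 10]) == [0, 5, 10]

def test_rejects_out_of_range():
    try:
        validate_sonority([3, 42])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

test_accepts_in_range()
test_rejects_out_of_range()
```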

Result Highlights

Genetic Algorithm: 72.60%
Hidden Markov Model: 77.64%
K-Means Clustering: 78.10%

Room for Improvement

These are the accuracies achieved by the three solutions, but because each requires continual reworking, higher accuracies may be attainable.

  • The genetic algorithm theoretically gets more accurate over time; however, it relies on random chance
  • K-Means and the HMM both work from a set of numerical parameters, and a more optimal set of driving parameters may exist

Sponsors

Dr. David Johnson, PhD, Computer Science
Dr. Okim Kang, PhD, Linguistics

Mentor

Dr. John Georgas

Associate Professor with the Electrical Engineering and Computer Science Department

The Programming Team



From left to right:
Drew McDaniel: Genetic Algorithm
Salvatore Bottiglieri: Genetic Algorithm
Michael Albanese: Hidden Markov Model and K-Means Clustering
Adam Thomas: Website, Documents, and Testing
Trent Cooper: Documents and Testing