The goals of this project were to classify letters of the American Sign Language (ASL) alphabet from existing images and from new images taken from live video.
Nearly five years ago, a viral video of two Deaf women ordering in sign language at a Starbucks drive-thru via two-way video made headlines. Since then, there have been few technological advancements in accessibility for those who use sign language as their main means of communication. The ability to transcribe sign language for non-signers could improve accessibility for members of the Deaf, hard-of-hearing, and non-verbal communities. I hope the goals of this project, which include interpreting fingerspelling in real time, can be a stepping stone towards transcribing ASL words and expressions.
Multiple convolutional neural networks (CNNs) were applied to this image classification problem. CNNs initialized and trained from scratch, as well as transfer-learning models (e.g., VGG16, VGG19, Xception, SqueezeNet), were trained on an ASL dataset sourced from Kaggle.
The dataset consisted of 29 classes with 3,000 training images for each class. The 29 classes included the 26 letters of the alphabet, as well as “nothing”, “delete” and “space” classes. All images were colour (3 channels), 200 pixels in height and width, and were in JPG format. It should be noted that the dataset included a test set of 29 images (one image for each class), which were not used.
Instead, a new test set was created by taking 20% of the training images using the shell script `data_augmentation/split_only_asl.sh`. The shell script was responsible for:

- Creating class-labelled sub-directories within a `test_set` directory,
- Randomly allocating 20% of the training images for each class to their respective test sub-directories, and
- Executing the Python script `data_augmentation/bright_images.py`.
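For illustration, the same 80/20 split could be sketched in Python (the function name and JPG glob pattern are assumptions; the actual project uses the shell script above):

```python
import random
import shutil
from pathlib import Path

def split_test_set(train_dir, test_dir, fraction=0.2, seed=42):
    """Move a random `fraction` of each class's images into a
    class-labelled sub-directory under `test_dir`."""
    random.seed(seed)
    for class_dir in sorted(Path(train_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        images = sorted(class_dir.glob("*.jpg"))
        n_test = int(len(images) * fraction)
        # Create the class-labelled test sub-directory.
        dest = Path(test_dir) / class_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        # Randomly allocate images to the test set.
        for img in random.sample(images, n_test):
            shutil.move(str(img), str(dest / img.name))
```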
`bright_images.py` was responsible for creating brightened colour and grayscale copies of the original ASL alphabet images, with the option to:

- Normalize, resize (64x64), and flatten each image channel into `(1, 4096)` arrays, and
- Save those arrays to the CSV files `asl_grey.csv` or `asl_colour.csv` (for anyone interested in fitting scikit-learn machine learning classifiers).
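A minimal numpy-only sketch of that preprocessing, assuming illustrative brightness values, standard luminance weights for grayscale, and a nearest-neighbour resize (the actual script may differ):

```python
import numpy as np

def brighten_and_flatten(img, size=64, gain=1.3, bias=30):
    """img: uint8 array of shape (200, 200, 3).
    Returns a brightened copy and a normalized, flattened grayscale array."""
    # Brighten: scale and shift pixel values, clipping to the valid range.
    bright = np.clip(img.astype("float32") * gain + bias, 0, 255).round().astype("uint8")
    # Convert to grayscale with standard luminance weights.
    grey = bright @ np.array([0.299, 0.587, 0.114], dtype="float32")
    # Nearest-neighbour resize to size x size by index sampling.
    idx = np.linspace(0, img.shape[0] - 1, size).astype(int)
    small = grey[np.ix_(idx, idx)]
    # Normalize to [0, 1] and flatten to a (1, 4096) row vector.
    return bright, (small / 255.0).reshape(1, size * size)
```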
Figure 1. Samples of ASL letters "A", "B", and "C" in their original format, following brightening, and following augmentation with ImageDataGenerator.
The images were brightened to improve the visibility of individual digits and their positioning. Additional preprocessing (random horizontal flips, horizontal shifts, zooms, and shears) was applied with the Keras ImageDataGenerator. Randomly augmenting the images in this way was intended to improve model performance when introducing new images.
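The augmentation described above might be configured along these lines (the specific ranges are illustrative assumptions, not the project's actual values):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random horizontal flips, horizontal shifts, zooms, and shears,
# with pixel values rescaled to [0, 1]. All ranges are assumptions.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    width_shift_range=0.1,
    zoom_range=0.1,
    shear_range=0.1,
)
```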
Transfer learning was then leveraged by taking the Keras VGG16, VGG19, and Xception models (with and without additional fully-connected layers) and training them on the preprocessed ASL images.
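As a sketch of such a transfer-learning setup, a frozen VGG16 base can be topped with a small fully-connected head for the 29 classes (the head's layer sizes, dropout rate, and optimizer here are assumptions; the notebooks may use different choices):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_vgg16_asl(n_classes=29, input_shape=(64, 64, 3), weights="imagenet"):
    # Load the convolutional base without its ImageNet classifier head.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze pretrained convolutional layers
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```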
After learning from the training images, each CNN model's performance was evaluated on the test images (first goal) and on new images acquired from a webcam in real time (second goal). Performance was evaluated with confusion matrices and classification reports comparing the true vs. model-predicted classes. How each model performed on the image classification problem is summarized via accuracy scores in Figure 2.
Figure 2. Each model's number of parameters and performance on the training, validation and test image sets, and new webcam-sourced images as quantified by the model accuracy scores.
All models achieved comparable accuracies of 98-100% when predicting the letters of the test set of existing images (first goal). However, differences in model performance became evident when the models were shown new images taken in real time.

The live test was completed in `real-time_test.ipynb` and entailed:
- Loading the six model architectures with their best weights from training,
- Capturing 20 frames for each class using Open Computer Vision (OpenCV),
- Having all models predict the class of each frame,
- Saving the true classes and all model-predicted classes to a DataFrame, and
- Outputting the classification report and confusion matrix for each model.
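The last two steps, scoring each model's predictions against the true classes, can be sketched with scikit-learn (the DataFrame column names are hypothetical):

```python
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

def evaluate_live_test(df, true_col="true"):
    """df holds one row per captured frame: the true class
    plus one column of predicted classes per model."""
    results = {}
    for model in df.columns.drop(true_col):
        results[model] = {
            "report": classification_report(df[true_col], df[model],
                                            zero_division=0),
            "confusion": confusion_matrix(df[true_col], df[model]),
        }
    return results
```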
In the live test, VGG19 had the best performance, followed by VGG16 and SqueezeNet. The poor performance of Xception could be attributed to differences in the number of weight layers and parameters: even without extra fully-connected layers, Xception may have been too powerful for the training set, quickly overfitting the data, while the weights of its earlier layers failed to update due to vanishing gradients.
While we were able to classify ASL letters from existing images, a lot of work still needs to be done to improve ASL letter prediction in real time. However, this project marks a first attempt towards transcribing ASL from live video as a means of improving accessibility for those who use sign language as a primary mode of communication.
Next steps include:

- Training a CNN on ASL alphabet images with more variety in terms of who is signing and in front of what background they are signing
- Incorporating a hand-detector
Setting up Keras GPU
- Set up GPU Accelerated Tensorflow & Keras on Windows 10 with Anaconda
- Installing a Python Based Machine Learning Environment in Windows 10
Transfer Learning and Image Preprocessing
- How to Configure Image Data Augmentation in Keras
- Transfer Learning using Keras
- The 4 Convolutional Neural Network Models That Can Classify Your Fashion Images
Configuring OpenCV for Real-Time Predictions
- From raw images to real-time predictions with Deep Learning
- Training a Neural Network to Detect Gestures with OpenCV in Python
- Capture webcam with Python (OpenCV): step by step
- `README.md`
- `capstone_demo.py` acquires webcam images and provides SqueezeNet predictions for demonstrative and interactive purposes (used to make the opening GIF)
requirements
Option to set up Keras CPU and Keras GPU environments via conda or pip.
Note: even if models are trained with GPU acceleration, completing `real-time_test.ipynb` requires the Keras CPU environment to access the webcam.
data_augmentation
- `split_only_asl.sh` creates the test sets and their sub-directories, and executes `bright_images.py`
- `bright_images.py` creates brightened colour and grayscale copies of the training and test sets, and CSV files for scikit-learn models
VGG16

- `VGG16_ASL.ipynb` training and evaluating a VGG16 model

VGG19

- `VGG19_ASL.ipynb` training and evaluating a VGG19 model

xception

- `xception_asl.ipynb` training and evaluating an Xception model with additional fully-connected layers

xception_noFC

- `xception_asl_nofc.ipynb` training and evaluating an Xception model without additional layers
SqueezeNet

- `squeezenet_asl.ipynb` training and evaluating a SqueezeNet model
- `squeeze_asl.JSON` model architecture
- `best_weights_squeeze.h5` model weights
demo

- `demo.gif` demonstration of `capstone_demo.py`
- `figure_1.png` Figure 1. Samples of ASL letters "A", "B", and "C" in their original format, following brightening, and following augmentation with ImageDataGenerator.
- `figure_2.png` Figure 2. Each model's number of parameters and performance on the training, validation and test image sets, and new webcam-sourced images as quantified by the model accuracy scores.
