[Week Two] Progress of Epoch Proportions

I have a research project! In the long run, I hope to implement an algorithm that uses neural networks to estimate how far away objects are and produce the corresponding depth maps. The ultimate goal is a device that helps blind people “visualize” the space in front of them based on the shape and distance of the objects in it. The device would emit a sound and listen to the echo to judge distance, an idea inspired by bats and their echolocation.
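The echo idea rests on time of flight: the sound travels to the object and back, so the one-way distance is half the round trip. Here is a minimal sketch of that arithmetic. The speed of sound (343 m/s, roughly dry air at 20 °C) and the function name are assumptions for illustration, not values from the project.

```python
# Assumption: speed of sound in dry air at roughly 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def echo_delay_to_distance(delay_s, speed_m_s=SPEED_OF_SOUND_M_S):
    """Convert a round-trip echo delay (seconds) to a one-way distance (metres)."""
    if delay_s < 0:
        raise ValueError("echo delay must be non-negative")
    # The sound covers the distance twice (out and back), hence the division by 2.
    return speed_m_s * delay_s / 2.0


if __name__ == "__main__":
    # A 10 ms round trip corresponds to a wall a bit under two metres away.
    print(round(echo_delay_to_distance(0.010), 3))
```

The neural network in this project learns the mapping from data rather than computing it directly, but this is the physical relationship it is implicitly picking up on.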

This week was all about taking baby steps towards the bigger picture. In addition to reading up on bats and the science behind sound, I built a neural network that does a good job of detecting how far a speaker/mic apparatus is from a specific wall in the JMU CS hallway. The sound the device emits was chosen with bat sounds in mind: it is a brief, high-frequency sweeping chirp (not bad until you’ve been listening to it for an hour).
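For readers curious what a sweeping chirp looks like in code, here is a rough sketch of a linear frequency sweep using only the standard library. The start and end frequencies, the duration, the sample rate, and the sweep direction are all placeholder assumptions; the actual chirp used in the project may differ.

```python
import math


def linear_chirp(f_start_hz, f_end_hz, duration_s, sample_rate_hz):
    """Return samples of a sine wave whose frequency sweeps linearly over time."""
    n_samples = int(round(duration_s * sample_rate_hz))
    # Sweep rate in Hz per second (negative for a downward sweep).
    sweep_rate = (f_end_hz - f_start_hz) / duration_s
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Phase is the integral of the instantaneous frequency f_start + sweep_rate * t.
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


# Hypothetical parameters: a 10 ms downward sweep from 16 kHz to 8 kHz at 44.1 kHz.
chirp = linear_chirp(16000.0, 8000.0, 0.010, 44100)
```

A short sweep like this is convenient for echo work because the emitted sound and its echo each leave a compact, recognizable trace in the recording.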

I turned the raw audio data into spectrograms and added a convolutional layer to the neural network; in a sense, the network treats the spectrograms as images and learns from visual cues (such as the distance between the representation of the emitted sound and the representation of the echo). I am also working on a neural network that takes the raw audio samples as input, to see whether it performs better than the spectrogram version. The figures below show a sample of the data inputs with the true depth and predicted depth from the neural network.
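To make the spectrogram step concrete, here is a small sketch that slices a signal into overlapping frames and takes the magnitude of a discrete Fourier transform of each one. It covers only the audio-to-image step, not the network. The frame length, hop size, and Hann window are illustrative assumptions, and a real pipeline would most likely use an FFT library rather than this slow direct transform.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len // 2."""
    # A Hann window tapers each frame to reduce leakage at its edges.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(n_bins):
            # Direct DFT for bin k (O(N^2) overall; fine for a small demo).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Demo signal (hypothetical): a 1 kHz tone sampled at 8 kHz, so bins are 125 Hz wide.
sample_rate_hz = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate_hz) for n in range(256)]
spec = spectrogram(tone)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Stacking the frames side by side gives a time-by-frequency grid, which is the "image" a convolutional layer can scan for features such as the gap between the chirp and its echo.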

The training set contains about 1000 samples, all taken using one hallway wall. Because the set is small and lacks location diversity, the neural network does not fare well against data collected from other locations (for instance, the walls of a small, closed room). That makes sense: the network can't be expected to guess the depth of many walls if it has only ever known one. We need to keep this in mind for the next step in the project, which calls for a LOT more data points, collected from different locations.

As the neural network’s world expands, we hypothesize that stereo speakers will provide more informative data. Next week, I will be adjusting the neural network to accommodate this type of speaker… makeshift bat ears may or may not be involved this time.



