I have a research project! In the long run, I hope to implement an algorithm that uses neural networks to estimate how far away objects are and produce the corresponding depth maps. The ultimate goal is a device that helps blind people “visualize” the space in front of them based on the shape and distance of objects. The device would emit a sound and listen to the echo to judge distance. The idea is inspired by bats and their echolocation abilities.
This week was all about taking baby steps toward the bigger picture. In addition to reading up on bats and the science behind sound, I built a neural network that does a good job of estimating how far a speaker/mic apparatus is from a specific wall in the JMU CS hallway. The sound the device emits was chosen with bat calls in mind: it is a brief, high-frequency sweeping chirp (not bad until you’ve been listening to it for an hour).
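For the curious, a sweeping chirp like this can be sketched in a few lines of NumPy. This is only an illustration, not the project's code: the sample rate, the 8 kHz to 16 kHz sweep range, the 10 ms duration, and the `make_chirp` name are all assumptions chosen for the example.

```python
import numpy as np

def make_chirp(f_start=8000.0, f_end=16000.0, duration=0.01, sample_rate=44100):
    """Return a linear frequency sweep from f_start to f_end (in Hz).

    All default values are illustrative assumptions, not the project's settings.
    """
    n = int(round(duration * sample_rate))
    t = np.arange(n) / sample_rate
    # Phase of a linear sweep: 2*pi * (f0*t + (f1 - f0) * t^2 / (2*T))
    phase = 2.0 * np.pi * (f_start * t + (f_end - f_start) * t**2 / (2.0 * duration))
    # A Hann window fades the chirp in and out to avoid clicks at the edges
    return np.sin(phase) * np.hanning(n)

chirp = make_chirp()
```

The resulting array could be written to a WAV file or sent straight to a sound card. The window is a design choice: without it, the abrupt start and stop of the sweep would add broadband clicks to the recording.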
I turned the raw audio into spectrograms and added a convolutional layer to the neural network. In a sense, the network treats the spectrograms as images and learns from visual cues, such as the gap between the emitted sound and its echo. I am also working on a network that takes the raw audio samples as input, to see whether that performs better than the spectrogram images. The figures below show a sample of the input data with the true depth and the depth predicted by the network.
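To make the "gap between sound and echo" idea concrete, here is a rough sketch that simulates a recording and turns it into a spectrogram. Everything in it is an assumption for illustration: the 3 m wall distance, the 0.3 echo strength, the frame and hop sizes, and the helper names `round_trip_samples` and `spectrogram`. Real hallway recordings are far messier than one clean echo.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def round_trip_samples(distance_m, sample_rate):
    """Number of samples sound needs to reach a wall and come back."""
    return int(round(2.0 * distance_m / SPEED_OF_SOUND * sample_rate))

def spectrogram(signal, frame_len=256, hop=64):
    """Magnitude spectrogram from a short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Transpose so rows are frequency bins and columns are time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 44100
duration = 0.01
t = np.arange(int(round(duration * sample_rate))) / sample_rate
# Assumed 8 kHz to 16 kHz linear sweep, faded in and out with a Hann window
pulse = np.sin(2 * np.pi * (8000 * t + 8000 * t**2 / (2 * duration))) * np.hanning(len(t))

# Simulated recording: the emitted pulse plus one quieter, delayed echo
delay = round_trip_samples(3.0, sample_rate)
recording = np.zeros(4096)
recording[: len(pulse)] += pulse
recording[delay : delay + len(pulse)] += 0.3 * pulse

spec = spectrogram(recording)
```

In `spec`, the pulse and its echo show up as two slanted streaks, and the horizontal offset between them grows with distance. That offset is the kind of visual cue a convolutional layer can pick up on.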
The training set contains about 1000 samples, all recorded facing a single hallway wall. Because the set is small and comes from one location, the network does not fare well on data collected elsewhere (for instance, facing the walls of a small, closed room). That makes sense: the network can’t be expected to guess the depth of many walls if it has only ever known one. So the next step in the project needs a LOT more data points, and from different locations.
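Once recordings from several places exist, one way to check whether the network generalizes is to hold out an entire location for testing. The sketch below assumes each sample carries a location label. The labels and the `split_by_location` helper are made up for the example and are not part of the current data set.

```python
import numpy as np

def split_by_location(locations, held_out):
    """Return (train_idx, test_idx) with every `held_out` sample in the test set."""
    locations = np.asarray(locations)
    test_mask = locations == held_out
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical labels saying where each recording was taken
locations = ["hallway", "hallway", "small_room", "hallway", "stairwell", "small_room"]
train_idx, test_idx = split_by_location(locations, held_out="small_room")
```

Training on `train_idx` and scoring on `test_idx` measures performance on a place the network has never heard, which is closer to how the finished device would be used than a random split.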
As the neural network’s world expands, we hypothesize that stereo speakers will provide more informative data. Next week, I will be adjusting the neural network to accommodate this type of speaker… makeshift bat ears may or may not be involved this time.