It’s the first week of JMU’s CS REU! Lots of acronyms!
I am a rising junior from Swarthmore College, and I am here this summer to absorb as much information as I can about what it means to conduct research in computer science. The questions with yet-to-be-discovered answers, the trials and errors, the foundational theory behind the technology we’ve created… I am excited to explore computer science questions this summer and work with my team to contribute something novel and interesting to the field.
For the next eight weeks, I will be working with Dr. Nathan Sprague on a machine learning project. There are a few candidate projects, but we haven’t decided yet which one I will tackle. I spent this week learning about the ongoing research related to each project’s ideas. As Dr. Sprague puts it, I am looking for the project that “tugs at my heartstrings.” So far, we’ve talked about three ideas, and I’ve read journal papers that offer different approaches to each research question.
How can we estimate depth data for an environment given one image?
Stereo vision is a popular method for reconstructing the 3D properties of a scene: multiple images of it, taken from different perspectives, are “pieced” together. With our two eyes, we implement this method every day. People with visual impairments, however, cannot rely on it. An answer to this question could therefore power an instrument that takes in the scene in front of a person as a single image, estimates the depth of everything in that image, and tells the person how far away everything in front of them is. This is a crude sketch of the hardware vision for the project, since I will be focusing on the software side of things. I read two papers that offer answers to how we can conduct monocular depth estimation.
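To see why a *single* image makes this hard, it helps to look at the textbook stereo geometry the monocular methods are trying to replace. This is my own sketch of classic two-camera triangulation (not from either paper), with made-up focal length and baseline values:

```python
# Classic stereo triangulation: with two horizontally offset cameras, a
# point's depth Z follows from its disparity d (the horizontal pixel
# shift between the two images):
#     Z = f * B / d
# where f is the focal length in pixels and B is the camera baseline.
# With one image there is no disparity, which is why monocular methods
# have to learn depth cues instead.

def depth_from_disparity(disparity_px, focal_px=600.0, baseline_m=0.25):
    """Return depth in meters for a given pixel disparity.

    focal_px and baseline_m are made-up example values.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Larger disparity -> closer object.
print(depth_from_disparity(150))  # 1.0 m
print(depth_from_disparity(15))   # 10.0 m
```

Nearby objects shift a lot between the two views; faraway objects barely shift at all, which is exactly the cue our two eyes exploit.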
The first paper I read is “Depth Map Prediction from a Single Image using a Multi-Scale Deep Network” by Eigen et al. Their approach uses two deep convolutional neural networks. First, one network predicts a coarse depth map for a given image from a global perspective; that is, it takes into account the overall structure of the image: where objects are placed in relation to each other, whether there are vanishing points, and so on. The resulting depth map is then given to a second network whose task is to refine it using the finer, local details of the image (where the edges are, what the object shapes look like, etc.). The idea is that by combining both global and local perspectives of an image, we can get a more accurate depth map.
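One detail from the paper that I found interesting is how they *measure* error. Since a single image can't pin down absolute scale, they use a scale-invariant error on log depths. Below is my reading of that error in NumPy; treat the exact form and the lam weighting as my interpretation of the paper, not a reference implementation:

```python
import numpy as np

def scale_invariant_error(pred, target, lam=0.5):
    """Scale-invariant log-depth error, per my reading of Eigen et al.

    d_i = log(pred_i) - log(target_i)
    error = mean(d^2) - lam * (sum(d))^2 / n^2
    """
    d = np.log(pred) - np.log(target)
    n = d.size
    return (d ** 2).mean() - lam * d.sum() ** 2 / n ** 2

# With lam=1, a prediction that is off by a constant *scale factor*
# incurs (essentially) zero error, because a uniform scaling becomes a
# uniform additive shift in log space, and the second term cancels it.
target = np.array([1.0, 2.0, 4.0])
print(scale_invariant_error(2 * target, target, lam=1.0))  # ~0.0
```

An ordinary mean-squared error would heavily punish that doubled prediction even though its relative structure is perfect, which is why the scale-invariant version is a fairer score for monocular depth.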
The second paper on this topic is “Towards Domain Independence for Learning-Based Monocular Depth Estimation” by Mancini et al. They propose using recurrent neural networks, specifically ones with Long Short-Term Memory (LSTM) layers, which excel at generalizing and responding to previously unseen images. Their network places an LSTM layer between convolutional encoding and decoding layers. Additionally, the paper explains their process of creating synthetic images to augment their data set and expose the network to novel scenes.
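Since the LSTM layer is the distinctive piece here, it's worth spelling out what one actually computes. This is a generic, single-cell NumPy sketch of the standard LSTM equations (a toy of my own, not Mancini et al.'s architecture; all sizes and weights are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4*hidden, hidden + inputs); its four row-blocks are the
    input gate, forget gate, candidate state, and output gate.
    """
    hidden = h_prev.size
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0 * hidden:1 * hidden])  # input gate: admit new info
    f = sigmoid(z[1 * hidden:2 * hidden])  # forget gate: keep old memory?
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell state
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate
    c = f * c_prev + i * g                 # cell state: the "long-term memory"
    h = o * np.tanh(c)                     # hidden state: what the next layer sees
    return h, c

# Feed a short random sequence through the cell.
rng = np.random.default_rng(0)
hidden, n_in = 3, 2
W = rng.standard_normal((4 * hidden, hidden + n_in)) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)
print(h.shape)  # (3,)
```

The gating is the point: the cell state `c` can carry information across many steps, which is what lets the network accumulate context from earlier inputs rather than treating each one in isolation.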
Both papers produced hypothesis-supporting results and hint that neural networks are the way to go for this research question. Another application for this research, as mentioned in the second paper, is autonomous vehicles.
How can we estimate depth data for an environment given audio data?
This question is inspired by the echolocation abilities of bats! The application possibilities of this research question align with those of the first question. I am still in the process of reading papers for this idea. Right now, I am learning about how bat ears work, specifically how they gather audio information and use it to locate both stationary and moving objects. Other papers I have with me discuss the logistics and implications of creating bio-sonar (echolocation) systems.
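The core physics of echolocation is simple enough to write down, and it's what any bio-sonar system builds on: a sound pulse goes out, reflects, and comes back, so distance falls out of the round-trip time. A minimal sketch (my own illustration, with textbook constants):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def echo_distance(round_trip_s, speed=SPEED_OF_SOUND):
    """Distance to a reflector given the echo's round-trip time.

    The pulse travels out AND back, hence the division by 2.
    """
    return speed * round_trip_s / 2.0

# An echo returning after 10 ms puts the object roughly 1.7 m away.
print(echo_distance(0.010))
```

Of course, what makes bats (and the learning-based approaches) interesting is everything this formula ignores: multiple overlapping echoes, Doppler shifts from moving targets, and the shape cues encoded in how the echo's spectrum changes.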
How can we train a machine to perform a series of unique tasks via reinforcement learning?
A big question in the area of reinforcement learning is how we can get a machine to perform a task well without forgetting everything it has learned about prior tasks. For example, once a machine has learned how to tend a garden and then how to make spaghetti, how do we get it to go back to tending the garden without retraining it from scratch (since by then it only knows how to make spaghetti)? This problem is often called catastrophic forgetting. A popular testbed, at least within the papers I read, is the Atari suite of games (arcade games from “back in the day”).
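Catastrophic forgetting is easy to reproduce even in a ridiculously small model. Here's a toy of my own (not from any of the papers): a one-parameter model trained by gradient descent on task A, then on task B, ends up with a large error on task A again.

```python
# Catastrophic forgetting in miniature: a single-weight model y = w*x.

def train(w, xs, ys, lr=0.1, steps=200):
    """Gradient descent on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
task_a = [2 * x for x in xs]    # task A: the rule y = 2x
task_b = [-2 * x for x in xs]   # task B: the conflicting rule y = -2x

w = train(0.0, xs, task_a)
print(mse(w, xs, task_a))       # near 0: task A learned
w = train(w, xs, task_b)
print(mse(w, xs, task_b))       # near 0: task B learned...
print(mse(w, xs, task_a))       # large: task A has been overwritten
```

With only one weight, the model literally cannot hold both rules at once; real networks have far more capacity, but plain gradient descent still happily overwrites whatever the old tasks needed, which is what the three papers below try to prevent.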
The first paper I read on this: “Human-level control through deep reinforcement learning” by Mnih et al. They developed a deep Q-network (DQN) agent that was able to play most of the Atari games very well. In other words, they combined Q-learning with a convolutional neural network, using the game pixels and score as the only inputs. The agent generalizes across games in the sense that the same architecture and settings work for every game.
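The "Q-learning" half of DQN is a classic tabular algorithm, and it's worth seeing in its simplest form. This is my own toy example on a four-state corridor, nothing like Atari, but it uses the same update rule that DQN approximates with a convolutional network:

```python
import random

# Tabular Q-learning on a 1-D corridor: states 0..3, reward 1 at the
# goal state 3. The update rule is
#     Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))

N_STATES, GOAL = 4, 3
ACTIONS = (-1, +1)        # step left, step right
alpha, gamma = 0.5, 0.9
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for _ in range(500):
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS)  # off-policy: behave randomly, learn greedily
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy moves right from every non-goal state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)])  # [1, 1, 1]
```

DQN's leap is that for Atari the "table" would need an entry per possible screen image, which is hopeless, so a CNN stands in for the table and estimates Q(s, a) directly from pixels.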
The second paper: “Progressive Neural Networks” by Rusu et al. Their approach connects individual neural networks that have each learned a different task into one generalized network. The networks are arranged in chronological order, and later networks can extract the features they deem important from earlier networks to strengthen their own layers.
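The mechanism for "extracting features from earlier networks" is a lateral connection between columns. Here is a minimal NumPy sketch of my reading of that idea; the shapes, names, and the adapter matrix `U` are invented for illustration:

```python
import numpy as np

# Progressive-network sketch: column 1 learned task 1 and is then
# frozen; column 2 learns task 2, but each of its layers also receives
# column 1's hidden features through a trainable lateral adapter U.
# Freezing column 1 is what rules out catastrophic forgetting.

rng = np.random.default_rng(1)
n_in, n_hidden = 4, 3

def relu(z):
    return np.maximum(z, 0.0)

W1 = rng.standard_normal((n_hidden, n_in))      # column 1 weights (frozen)
W2 = rng.standard_normal((n_hidden, n_in))      # column 2 weights (trainable)
U = rng.standard_normal((n_hidden, n_hidden))   # lateral adapter (trainable)

x = rng.standard_normal(n_in)
h1 = relu(W1 @ x)             # task-1 features, never modified again
h2 = relu(W2 @ x + U @ h1)    # task-2 features can reuse task-1 features
print(h2.shape)  # (3,)
```

The trade-off is visible even in this sketch: every new task adds a whole new column plus adapters, so the model keeps growing as tasks accumulate.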
Third paper on catastrophic forgetting: “Overcoming catastrophic forgetting in neural networks” by Kirkpatrick et al. In this paper, a single neural network is used. The key is that the weights the network learned in order to succeed at task A are protected while it learns weights for task B; that is, the weights are constrained to stay in a region of low error for all previous tasks. What ends up happening, according to their experiments, is that in the early layers the network’s weights are divided up among the tasks (some weights matter much more to one task than to the others), while the weights in the layers closest to the output are used with roughly equal importance by all tasks. This method is able to incorporate more tasks into a network than the progressive-network approach.
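The "protection" is implemented as an extra penalty term added to task B's loss, which the paper calls elastic weight consolidation (EWC): each weight is anchored to its task-A value, and the anchor's stiffness is that weight's estimated importance to task A (its diagonal Fisher information). A small sketch of the penalty, with all numbers made up for illustration:

```python
import numpy as np

def ewc_penalty(theta, theta_a, fisher, lam=1.0):
    """EWC penalty: (lam/2) * sum_i F_i * (theta_i - theta_a_i)^2.

    theta   : current weights while training on task B
    theta_a : weights at the end of task A (the anchor)
    fisher  : per-weight importance to task A (diagonal Fisher estimate)
    lam     : how strongly old tasks are protected
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_a) ** 2)

theta_a = np.array([1.0, -2.0, 0.5])   # weights after task A
fisher = np.array([10.0, 0.01, 1.0])   # weight importances for task A
theta = np.array([1.1, 0.0, 0.5])      # current weights mid-task-B

# Nudging the important first weight (F=10) costs far more than the
# large move in the unimportant second weight (F=0.01):
# 0.5 * (10*0.1^2 + 0.01*2^2 + 1*0) ~= 0.07
print(ewc_penalty(theta, theta_a, fisher))
```

So rather than freezing anything outright, unimportant weights stay free to move for task B while important ones are elastically pulled back, which is why a single network can keep absorbing tasks until its shared capacity runs out.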
In addition to these readings, I’ve worked through tutorials for TensorFlow, the software library I will be using to build my own implementations of machine learning tools. I have also started constructing a basic neural network that takes in auditory data and attempts to predict depth maps, which would be relevant to the bat project. More on this next week…
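To give a sense of the shape of that network, here is a bare-bones forward pass for the audio-in, depth-map-out idea. Everything here is a placeholder of my own (random weights, arbitrary sizes, plain NumPy); the real network will be built and trained in TensorFlow:

```python
import numpy as np

# Skeleton of the audio -> depth idea: an audio feature vector goes in,
# a small flattened depth map comes out. All sizes and weights are
# made-up placeholders, just to fix the input/output shapes.

rng = np.random.default_rng(42)
n_audio, n_hidden, map_h, map_w = 64, 32, 8, 8

W1 = rng.standard_normal((n_hidden, n_audio)) * 0.1   # input -> hidden
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((map_h * map_w, n_hidden)) * 0.1  # hidden -> depth map
b2 = np.zeros(map_h * map_w)

audio_features = rng.standard_normal(n_audio)          # e.g. a spectrogram slice
hidden = np.maximum(W1 @ audio_features + b1, 0.0)     # ReLU hidden layer
depth_map = (W2 @ hidden + b2).reshape(map_h, map_w)   # one depth value per cell
print(depth_map.shape)  # (8, 8)
```

Even this skeleton raises the real design questions: how to turn raw audio into the input features, how coarse the output map should be, and what loss to train against.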
Overall, I’ve picked up a lot of cool information about ongoing machine learning research that I would not have encountered in a standard class. It’s nice to see how the concepts I’ve learned in my courses get applied to real-world problems. So far, I am liking the research process, and I look forward to continuing with one of these projects… just have to figure out which one tugs at my heartstrings the hardest.