This week I finished looking into deep learning and started thinking about possible ideas for my research project.
3D Depth Representation through Deep Learning Stereo Matching
Dr. Sprague came up with the idea of trying to incorporate the output generated by OpenCV into a CNN together with the original stereo images in order to get a better 3D depth representation.
We looked into the current literature and found that an enormous amount of research is being done on this subject, but no approach has achieved a perfect result yet.
On Thursday I started reading the latest research, including one of the best results achieved so far in 3D depth representation through stereo matching. Dr. Sprague suggested that I read the following paper: Efficient Deep Learning for Stereo Matching.
Previous research used Siamese networks, concatenating the outputs of the two branches and then following that with more processing layers.
The paper presents a CNN architecture built on a Siamese network whose two branches are joined by a simple product layer (the inner product of the two patch representations) on top of the convolutions. According to the paper, this method provides more accurate matches and lower error than previous methods.
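To make the product-layer idea concrete, here is a minimal NumPy sketch. It is not the paper's code: the feature vectors are random stand-ins for what the convolutional branches would output, and the names `match_scores`, `softmax`, `right_feats`, and `true_disp` are made up for illustration. It only shows how an inner product between one left-patch representation and several candidate right-patch representations produces one score per candidate disparity.

```python
import numpy as np

def match_scores(left_feat, right_feats):
    """Inner product of one left-patch feature (D,) with N right-patch
    features (N, D). Returns one matching score per candidate, shape (N,)."""
    return right_feats @ left_feat

def softmax(x):
    """Turn raw scores into a probability distribution over candidates."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical stand-in features: in a real network these would come from
# the two weight-sharing convolutional branches.
rng = np.random.default_rng(0)
num_candidates, feat_dim = 8, 64
right_feats = rng.normal(size=(num_candidates, feat_dim))
right_feats /= np.linalg.norm(right_feats, axis=1, keepdims=True)

# Pretend the left patch matches the right patch at candidate index 5.
true_disp = 5
left_feat = right_feats[true_disp].copy()

scores = match_scores(left_feat, right_feats)
probs = softmax(scores)
best = int(np.argmax(scores))
print("best candidate:", best)
```

Because the product layer is just a dot product, scoring all candidates is a single matrix-vector multiplication, which is much cheaper than running extra fully connected layers on concatenated features.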


However, even though their results are better than many others, the method is still not perfect: it has trouble correctly representing repetitive patterns and large saturated areas.
What we are thinking of doing is using OpenCV to output a depth image, then using that as an input to our CNN, together with the raw images coming from the camera.
We’re hoping that this combination will let us create a better 3D representation than the current research and reduce the problems those methods still have.
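A rough sketch of how the combined input could be assembled is below. This is only one possible layout, not a settled design: the function name `build_network_input` and the four-channel ordering are assumptions. It also assumes grayscale images and a disparity map in the fixed-point format that OpenCV's block matcher returns (int16 values scaled by 16, with negative values marking pixels where no match was found when the minimum disparity is 0). The OpenCV call itself is left out so that the example runs with NumPy alone.

```python
import numpy as np

def build_network_input(left, right, disparity_fixed):
    """Stack the raw stereo pair and a precomputed disparity map into one
    (channels, H, W) float32 array that a CNN could take as input.

    left, right:      (H, W) uint8 grayscale images
    disparity_fixed:  (H, W) int16 disparity scaled by 16 (assumed format)
    """
    if not (left.shape == right.shape == disparity_fixed.shape):
        raise ValueError("all three inputs must have the same (H, W) shape")

    # Convert fixed-point disparity to pixels.
    disp = disparity_fixed.astype(np.float32) / 16.0

    # Assumption: negative disparity means "no match found".
    valid = disp >= 0
    disp = np.where(valid, disp, 0.0).astype(np.float32)

    # Scale disparity to [0, 1] so it is comparable to the image channels.
    max_disp = float(disp.max())
    if max_disp > 0:
        disp = disp / max_disp

    left_f = left.astype(np.float32) / 255.0
    right_f = right.astype(np.float32) / 255.0

    # Channels: left image, right image, disparity, validity mask.
    stacked = np.stack([left_f, right_f, disp, valid.astype(np.float32)], axis=0)
    return stacked.astype(np.float32)

# Tiny synthetic example (hypothetical data, not real camera output).
h, w = 4, 6
left_img = np.full((h, w), 255, dtype=np.uint8)
right_img = np.zeros((h, w), dtype=np.uint8)
disp_fixed = np.full((h, w), 32, dtype=np.int16)  # 2.0 pixels everywhere
disp_fixed[0, 0] = -16                            # one invalid pixel
net_input = build_network_input(left_img, right_img, disp_fixed)
print(net_input.shape)
```

Keeping the validity mask as its own channel lets the network tell "zero disparity" apart from "OpenCV found no match", which may matter in exactly the repetitive and saturated regions where block matching tends to fail.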