As I had desperately hoped, the setbacks that constituted last week’s major obstacles shed some light on how to move forward with the project after all. Once I understood that those problems were largely the result of the way TensorFlow modifies tensors by returning new ones rather than altering existing ones in place, I combed through the code to make sure that whenever I called a method that returns a modified version of its input tensor, I captured the returned tensor in a variable. Having done so, I went back to comment the code in an attempt to clean it up and make it more readily comprehensible, and in the process I discovered another pitfall. It was likely responsible for many of my problems, which also meant that fixing it could resolve them.
In essence, the critical piece of information I was missing was that methods like tf.assign(), or the equivalent called on a variable itself (e.g. variable.assign()), actually don’t do… well… anything on their own. They return what TensorFlow refers to as an “op” (an operation), and an op only takes effect when it is run in a session. If our tf.Session is named sess, the op has to be passed to sess.run() either directly or via a variable (e.g. assignment = tf.assign(...) followed by sess.run(assignment)). Otherwise, the variables to which one is interested in assigning values will not be modified. As such, I went through the code and ensured that all of the variable modifications were actually taking place by passing the default session to the relevant methods and invoking sess.run() where necessary. This appears (though I am of course hesitant to make any concrete claims just yet) to have resulted in output much more in line with what we have been looking for.
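To make the pitfall concrete, here is a minimal sketch of the behaviour described above. It is not taken from the project code: the variable v and its values are made up for illustration, and it assumes TensorFlow is installed and uses the tf.compat.v1 module so that the TF1-style graph and session workflow also runs on TensorFlow 2.x.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF1-style graphs and sessions

# Hypothetical variable, used only for this illustration.
v = tf.Variable([1.0, 2.0], name="v")

# Building the op only adds a node to the graph; nothing is assigned yet.
assign_op = tf.assign(v, [3.0, 4.0])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    before = sess.run(v)   # the assign op exists but has not been run
    sess.run(assign_op)    # only now is the variable actually modified
    after = sess.run(v)

print("before running the op:", before)
print("after running the op: ", after)
```

The first fetch still returns the initial values even though assign_op has already been created, which is exactly the kind of silent no-op that is easy to miss.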
Also, when delving into the online forums, I discovered that someone had recently posted about an issue they had found with a TensorFlow method that does not modify variables as expected unless a parameter is set upon that variable’s initial creation. Given that this method is invoked in our code, this was a critical piece of information. I tested their claim and they were, in fact, correct. Ironically, seeing this made me feel very fulfilled. If we are working on the bleeding edge to the point at which the software isn’t even necessarily engineered to reliably do what we seek to do, then perhaps our work truly is novel and progressive. At least that’s how I felt when I read the user’s post and the somewhat confused response from a moderator to the tune of “Well maybe that’s the case, but why would you want to do that anyway?”
A key part of discovering the underlying issues with the variable modifications in the code was adding print() statements that took all of a few seconds to write. I used them to monitor the shapes of my variables after altering them, to ensure both that they had been altered in the first place and that the changes were what I had intended. I cannot stress enough how important it is (now that I have learned the hard way) to perform regular sanity checks and periodically confirm that what you in your “infinite wisdom” are “100% positive” is going on under the hood is actually the case. I didn’t even have to write any unit testing code or anything of the sort to troubleshoot a complex problem, and the time I saved by doing this is quite literally incalculable.
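For what it’s worth, a shape check of this kind can be as small as the sketch below. The helper name check_shape and the example arrays are hypothetical, not part of the project code. It uses NumPy arrays as stand-ins, on the assumption that the values being checked were fetched with sess.run(), which returns NumPy arrays.

```python
import numpy as np

def check_shape(name, value, expected_shape):
    """Print the actual shape of `value` and report whether it matches."""
    actual = np.shape(value)
    expected = tuple(expected_shape)
    ok = actual == expected
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: shape={actual} expected={expected} {status}")
    return ok

# Stand-in for a weight matrix fetched from the graph.
weights = np.zeros((4, 3))
# Stand-in for the same matrix after the layer gained two units.
grown = np.concatenate([weights, np.zeros((4, 2))], axis=1)

ok_before = check_shape("weights", weights, (4, 3))
ok_after = check_shape("weights (expanded)", grown, (4, 5))
# A modification that never actually ran shows up as a mismatch.
stale = check_shape("weights (never updated)", weights, (4, 5))
```

The last call is the interesting one: if an expansion was supposed to happen but the op was never run, the old shape is still there and the check says so immediately.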
Next up in the process is finishing up some output accuracy calculations. Although the output at this point is very promising and closely resembles what I saw when I was convinced a couple of weeks ago that our project had been a complete success already (so young… so naive…), I do need to find out how to account for changes in the shape of ewc_loss (the variable that holds the Elastic Weight Consolidation loss) as the network expands. Cross entropy merely had to be expanded as part of a new layer relationship formula in the network, since Stochastic Gradient Descent applies only to the most recent task on which the network is being trained, but for Elastic Weight Consolidation the loss calculations are more complicated. I am so hopeful, however, that our goal of a self-expanding network capable of sequential, continued learning without catastrophic forgetting is not only possible but close to being a reality.
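To show why the shape matters here, below is a rough sketch of the quadratic penalty term as it is usually written for Elastic Weight Consolidation (Kirkpatrick et al., 2017): half of lambda times the sum of the Fisher information multiplied by the squared distance from the old task’s parameters. This is an assumption about the general form, not a copy of how ewc_loss is computed in our code, and the function name ewc_penalty and its arguments are made up for illustration.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam):
    """Quadratic EWC penalty: (lam / 2) * sum(F * (theta - theta_star)^2).

    theta      : current parameters
    theta_star : parameters saved after the previous task
    fisher     : diagonal Fisher information estimated on the previous task
    lam        : weight of the penalty relative to the new task's loss
    """
    theta = np.asarray(theta, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    fisher = np.asarray(fisher, dtype=float)
    # All three must line up element by element. After the network
    # expands, theta is larger than the saved arrays, so fail loudly
    # instead of letting broadcasting hide the mismatch.
    if not (theta.shape == theta_star.shape == fisher.shape):
        raise ValueError(
            f"shape mismatch: theta={theta.shape}, "
            f"theta_star={theta_star.shape}, fisher={fisher.shape}"
        )
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))

penalty = ewc_penalty(theta=[1.0, 1.0], theta_star=[0.0, 3.0],
                      fisher=[1.0, 2.0], lam=2.0)
print("penalty:", penalty)
```

Because the penalty pairs every current parameter with a saved value and a Fisher entry from the earlier task, adding units to the network leaves the new parameters with nothing to be paired against. How to handle those entries is the open question; the sketch only makes the mismatch visible.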