It works, but how does this change affect the model architecture and the results? It would be great if someone could explain the intuition behind it.