I have implemented a classic feedforward NN myself, and it works fine. However, after I added conv layers, the learning behavior became very strange.
On a simple task (classifying zeros and Xs on a 28x28 grid), the error barely changes for about 20 epochs (it decreases in very small steps, about 10^(-4) per epoch), and then within 2 epochs it drops from about 1.3 to 0.01.
On the MNIST dataset, the error starts at about 3.2 for each digit. After some training, the CNN reaches an error of about 0.2 on some digits, while on the others the error increases to about 6. It looks like I coded something wrong, yet the same code works on the first example.
The architecture is 1 ReLU conv layer, 1 ReLU fully connected layer, and a softmax output. The loss function is cross-entropy.
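For clarity, here is a rough PyTorch equivalent of that architecture (not my actual code, which is written from scratch; the kernel size, channel count, and hidden width are placeholders I picked for illustration):

```python
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Placeholder sizes: 8 output channels, 5x5 kernel, 64 hidden units
        self.conv = nn.Conv2d(1, 8, kernel_size=5)   # ReLU conv layer
        self.fc = nn.Linear(8 * 24 * 24, 64)         # ReLU fully connected layer
        self.out = nn.Linear(64, num_classes)        # softmax output layer

    def forward(self, x):
        x = F.relu(self.conv(x))                     # 1x28x28 -> 8x24x24
        x = x.flatten(1)                             # flatten for the FC layer
        x = F.relu(self.fc(x))
        return self.out(x)                           # raw logits

# nn.CrossEntropyLoss applies log-softmax internally,
# so the model returns logits rather than probabilities.
criterion = nn.CrossEntropyLoss()
```

Note that in this framework version the softmax is folded into the loss; in my implementation the softmax output and cross-entropy loss are separate pieces, as described above.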