During training, the neural net settles into a state where it always predicts one of the 5 classes.

My train and test sets are distributed as follows:
Train set: 269,501 samples, 157 features
Distribution: 16.24% 'a', 39.93% 'b', 9.31% 'c', 20.86% 'd', 13.67% 'e'

Test set: 33,967 samples, 157 features
Distribution: 10.83% 'a', 35.39% 'b', 19.86% 'c', 16.25% 'd', 17.66% 'e'

Note the percentages of class 'b'!
I am training an MLP with dropout, and both training and test (i.e. validation) accuracies plateau at values that exactly match the frequency of one of my 5 classes in each set (0.3993 on train, 0.3539 on test). In other words, the network is learning to always predict a single class out of 5. I've verified the classifier is always predicting 'b'.
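For reference, here is roughly how I verified that -- a minimal sketch, assuming model, x_test, and n_classes as defined in the code further down:

import numpy as np

# distribution of predicted classes over the test set;
# if the net has collapsed onto one class, a single entry
# will be ~1.0 and the rest ~0.0
probs = model.predict(x_test)            # shape (n_samples, n_classes)
preds = np.argmax(probs, axis=1)         # hardened class predictions
print(np.bincount(preds, minlength=n_classes) / float(len(preds)))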
I've tried batch_size values of 0.25 and 1.0 and made doubly sure the data was shuffled. I tried both the SGD and Adam optimizers, with and without decay and at different learning rates, and still got the same result. I tried dropout of 0.2 and 0.5, and EarlyStopping with a patience of 300 epochs.
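To clarify the fractional batch sizes -- a sketch under the assumption that the fraction is converted to an absolute batch size before fitting (the actual option parsing is elided; batch_fraction is a hypothetical name):

# hypothetical conversion of a fractional batch size (0.25 or 1.0)
# into the integer train_batch_size used in model.fit further down;
# 1.0 means full-batch gradient descent
batch_fraction = 0.25
train_batch_size = max(1, int(batch_fraction * len(x_train)))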
Every so often during training it will pop out of the plateau in both training and validation accuracy, but then validation accuracy always goes down while training accuracy goes up -- in other words, overfitting.
Output, cut off after 6 epochs. It doesn't always converge this quickly; it just does with this particular SGD optimizer:
Epoch 1/2000
Epoch 00000: val_acc improved from -inf to 0.35387, saving model to /home/user/src/thing/models/weights.hdf
269501/269501 [==============================] - 0s - loss: 1.6094 - acc: 0.1792 - val_loss: 1.6073 - val_acc: 0.3539
Epoch 2/2000
Epoch 00001: val_acc did not improve
269501/269501 [==============================] - 0s - loss: 1.6060 - acc: 0.3993 - val_loss: 1.6042 - val_acc: 0.3539
Epoch 3/2000
Epoch 00002: val_acc did not improve
269501/269501 [==============================] - 0s - loss: 1.6002 - acc: 0.3993 - val_loss: 1.6005 - val_acc: 0.3539
Epoch 4/2000
Epoch 00003: val_acc did not improve
269501/269501 [==============================] - 0s - loss: 1.5930 - acc: 0.3993 - val_loss: 1.5967 - val_acc: 0.3539
Epoch 5/2000
Epoch 00004: val_acc did not improve
269501/269501 [==============================] - 0s - loss: 1.5851 - acc: 0.3993 - val_loss: 1.5930 - val_acc: 0.3539
Epoch 6/2000
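As a sanity check on that log: the epoch-1 loss of 1.6094 is essentially the categorical cross-entropy of a uniform prediction over 5 classes, which you can confirm in two lines:

import numpy as np

# cross-entropy of predicting probability 1/5 for every class:
# -ln(1/5) = ln(5) ~= 1.6094, matching the loss at epoch 1 above
print(np.log(5))  # 1.6094379124341003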
Code: Model creation:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.constraints import maxnorm
from keras.optimizers import SGD, Adam

def create_mlp(input_dim, output_dim, dropout=0.5, arch=None):
    """Setup neural network model (keras.models.Sequential)"""
    # default mlp architecture
    arch = arch if arch else [64, 32, 32, 16]

    # setup densely connected NN architecture (MLP)
    model = Sequential()
    model.add(Dropout(dropout, input_shape=(input_dim,)))
    for output in arch:
        model.add(Dense(output, activation='relu', W_constraint=maxnorm(3)))
        model.add(Dropout(dropout))
    model.add(Dense(output_dim, activation='sigmoid'))

    # compile model (the architecture is saved to disk in main)
    sgd = SGD(lr=0.01, momentum=0.9, decay=0.0001, nesterov=True)
    # adam = Adam(lr=0.001, decay=0.0001)
    model.compile(loss='categorical_crossentropy', optimizer=sgd,
                  metrics=['accuracy'])
    return model

And inside main, after some preprocessing:
from keras.utils.np_utils import to_categorical
from keras.callbacks import TensorBoard, ModelCheckpoint, EarlyStopping
from sklearn.utils import shuffle  # shuffles both arrays in unison
import numpy as np

# labels must be one-hot encoded for loss='categorical_crossentropy'
# meaning, of possible labels 0,1,2: 0->[1,0,0]; 1->[0,1,0]; 2->[0,0,1]
y_train_onehot = to_categorical(y_train, n_classes)
y_test_onehot = to_categorical(y_test, n_classes)

# get neural network architecture and save to disk
model = create_mlp(input_dim=train_dim, output_dim=n_classes)
with open(clf_file(typ='arch'), 'w') as f:
    f.write(model.to_yaml())

# output logs to tensorflow TensorBoard
# NOTE: don't use param histogram_freqs until keras issue fixed
# https://github.com/fchollet/keras/pull/5175
tensorboard = TensorBoard(log_dir=opts.tf_dir)

# only save model weights for best performing model
checkpoint = ModelCheckpoint(clf_file(typ='weights'), monitor='val_acc',
                             verbose=1, save_best_only=True)

# stop training early if validation accuracy doesn't improve for long enough
early_stopping = EarlyStopping(monitor='val_acc', patience=300)

# shuffle data for good measure before fitting
x_train, y_train_onehot = shuffle(x_train, y_train_onehot)

np.random.seed(seed)
model.fit(x_train, y_train_onehot,
          nb_epoch=opts.epochs,
          batch_size=train_batch_size,
          shuffle=True,
          callbacks=[tensorboard, checkpoint, early_stopping],
          validation_data=(x_test, y_test_onehot))
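For completeness, to_categorical produces exactly the encoding described in the comment above (quick interpreter check):

>>> from keras.utils.np_utils import to_categorical
>>> to_categorical([0, 1, 2], 3)
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])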