Problem
My goal is to perform multi-label image classification with EfficientNet. It should take a picture as input and, for example, tell the user that it sees a person AND a dog in the picture. This means the probabilities won't sum up to 1 - every class gets its own probability from 0 to 1. In short, I would like to convert a multi-class solution into a multi-label solution.
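To make the desired output concrete, here is a minimal sketch of how I imagine a prediction would be interpreted. The class names, probabilities and the 0.5 threshold are purely illustrative assumptions, not actual model output:

import numpy as np

# hypothetical sigmoid outputs for one image over three of the 80 classes
probs = np.array([0.91, 0.85, 0.07])         # e.g. person, dog, car
labels = np.array(["person", "dog", "car"])

# each class is decided independently, so the probabilities need not sum to 1
predicted = labels[probs > 0.5]
print(predicted)                              # ['person' 'dog']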
Data
I'm using a small subset of the COCO dataset from Kaggle, which you can find here. It contains about 100k images covering exactly 80 classes. The labels.csv contains one column with the filename and 80 one-hot-encoded columns for the target output.
I added headings to the subset's labels.csv so I know which columns refer to which label. I also copied all image files into one directory (datasets/coco_subset/train), since the label information sits in a single .csv file and I couldn't get the DataGenerators to retrieve the images from separate directories. But this is not my main problem! (I would still be thankful for some advice on how to handle this the correct way though.)
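Regarding the directory issue: if I read the Keras docs correctly, flow_from_dataframe accepts absolute paths in x_col when directory=None, so something like the following sketch might avoid copying everything into one folder. This is untested and the image_root path is an assumption:

import os
import pandas as pd

df = pd.read_csv("datasets/coco_subset/labels/labels_train.csv")

# build absolute paths so the images could stay in their original subdirectories
image_root = "datasets/coco_subset/images"   # assumed root containing the subfolders
df["filepath"] = df.iloc[:, 0].apply(
    lambda f: os.path.abspath(os.path.join(image_root, f)))

# then something like:
# datagen.flow_from_dataframe(df, directory=None, x_col="filepath", ...)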
Accuracy and Loss
I got the model to compile and started training. First I tried 20 epochs, then 50 and finally 100. After 100 epochs, which took a significant amount of time (roughly 12 hours with 6 GB of VRAM), I still couldn't achieve an acceptable accuracy. In fact, the accuracy stagnated at around 20% for the whole 100 epochs. The same happened with the loss, which was stuck at around 4-5, as you can see in the graphs below:
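The graphs were produced roughly like the sketch below from the hist dictionary returned by model.fit (see the training code further down); the key names assume the default "accuracy" metric:

import matplotlib.pyplot as plt

# hist is the dictionary returned by model.fit(...).history in the code below
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(hist["accuracy"], label="train")
ax1.plot(hist["val_accuracy"], label="validation")
ax1.set_title("accuracy")
ax1.legend()
ax2.plot(hist["loss"], label="train")
ax2.plot(hist["val_loss"], label="validation")
ax2.set_title("loss")
ax2.legend()
plt.show()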
Code
In the following section you can find the associated code. EfficientNet is used as the base model for the new multi-label classification CNN. For EfficientNet's pretrained weights I chose the ImageNet weights. I replaced the original top layers with a Flatten, a Dropout and a Dense layer whose number of units equals the number of possible classes.
I didn't want the model base to be trainable, since I read that this way it uses the pretrained ImageNet weights purely to extract features. Setting this property to True would also mean considerably more training time.
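In case unfreezing turns out to be necessary, my understanding is that the usual fine-tuning recipe looks roughly like the sketch below: train the new head with the frozen base first, then unfreeze and recompile with a much smaller learning rate. It reuses the variables from my code further down, and the epoch count and learning rate are assumptions:

# after the head has been trained with the frozen base (see code below):
model_base.trainable = True                       # unfreeze the EfficientNet base

# recompiling is required for the trainable change to take effect
model.compile(optimizer=Adam(learning_rate=1e-5), # much smaller LR for fine-tuning
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.fit(train_generator, epochs=10, validation_data=test_generator)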
I replaced the final activation function with sigmoid and used binary cross-entropy as the loss function. I chose EfficientNetB0 for performance reasons; I wanted to switch to B4 or B5 once I got good results from training variant B0.
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Dense, Flatten
from tensorflow.keras.optimizers import Adam
import pandas as pd

# define input shape and batch size
input_shape = (224, 224, 3)
batch_size = 256

# paths
train_dir = "datasets/coco_subset/images/train"
# test_dir = "datasets/coco_subset/images/test"  # didn't work with ImageDataGenerator.flow_from_dataframe
csv_dir = "datasets/coco_subset/labels/labels_train.csv"
label_names_dir = "datasets/coco_subset/labels/categories.csv"

# read csv data for loading image label information
df = pd.read_csv(csv_dir)
df_labels = pd.read_csv(label_names_dir)
label_names = list(df_labels["Labels"])
x_col = df.columns[0]                              # filename column
y_cols = list(df.columns[1:len(label_names) + 1])  # 80 one-hot label columns

# load input images and split into training and validation sets
datagen = ImageDataGenerator(rescale=1./255, validation_split=.25)

train_generator = datagen.flow_from_dataframe(
    df,
    directory=train_dir,
    x_col=x_col,
    y_col=y_cols,
    subset="training",
    target_size=input_shape[0:2],
    color_mode="rgb",
    class_mode="raw",  # for multi-label output
    batch_size=batch_size,
    shuffle=True,
    seed=42,
    interpolation="bilinear",
    validate_filenames=False
)

test_generator = datagen.flow_from_dataframe(
    df,
    directory=train_dir,
    x_col=x_col,
    y_col=y_cols,
    subset="validation",
    target_size=input_shape[0:2],
    color_mode="rgb",
    class_mode="raw",
    batch_size=batch_size,
    shuffle=True,
    seed=42,
    interpolation="bilinear",
    validate_filenames=False
)

# build model: frozen EfficientNetB0 base plus a new multi-label head
n_outputs = len(label_names)

model_base = EfficientNetB0(weights='imagenet', include_top=False, input_shape=input_shape)
model_base.trainable = False

model = Sequential([
    model_base,
    Dropout(0.25),
    Flatten(),
    Dense(n_outputs, activation="sigmoid")
])

model.summary()

# compile model
opt = Adam(learning_rate=0.01)
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])

# define training and validation steps
steps_per_epoch = train_generator.samples // train_generator.batch_size
validation_steps = test_generator.samples // test_generator.batch_size

# train model
hist = model.fit(
    train_generator,
    epochs=100,
    steps_per_epoch=steps_per_epoch,
    validation_data=test_generator,
    validation_steps=validation_steps).history

Summary
To sum up my problem:
- Are the new top layers even able to solve a multi-label problem?
- Should I set the trainable property of the base model to True or keep it False?
- Is the model suffering from over- or underfitting?
