Problem
My goal is to perform multi-label image classification with EfficientNet. It should take a picture as input and, for example, tell the user that it sees a person AND a dog in the picture. This means the probabilities won't sum up to 1 - every class gets its own probability from 0 to 1. In short, I would like to convert a multi-class solution into a multi-label solution.
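To make the desired output concrete, here is a minimal sketch of how I imagine a prediction would be interpreted. The class names, probabilities and the 0.5 threshold are purely illustrative assumptions, not actual model output:

import numpy as np

# hypothetical sigmoid outputs for one image over three of the 80 classes
probs = np.array([0.91, 0.85, 0.07])         # e.g. person, dog, car
labels = np.array(["person", "dog", "car"])

# each class is decided independently, so the probabilities need not sum to 1
predicted = labels[probs > 0.5]
print(predicted)                              # ['person' 'dog']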
Data
I'm using a small subset of the COCO dataset from Kaggle, which you can find here. It contains about 100k images covering exactly 80 classes. The labels.csv contains one column with the filename and 80 one-hot-encoded columns for the target output.
I added headings to the subset's labels.csv so I know which columns refer to which label. I also copied all image files into one directory (datasets/coco_subset/train), since the label information sits in a single .csv file and I couldn't get the DataGenerators to retrieve the images from separate directories. But this is not my main problem! (I would still be thankful for some advice on how to handle this the correct way though.)
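Regarding the directory issue: if I read the Keras docs correctly, flow_from_dataframe accepts absolute paths in x_col when directory=None, so something like the following sketch might avoid copying everything into one folder. This is untested and the image_root path is an assumption:

import os
import pandas as pd

df = pd.read_csv("datasets/coco_subset/labels/labels_train.csv")

# build absolute paths so the images could stay in their original subdirectories
image_root = "datasets/coco_subset/images"   # assumed root containing the subfolders
df["filepath"] = df.iloc[:, 0].apply(
    lambda f: os.path.abspath(os.path.join(image_root, f)))

# then something like:
# datagen.flow_from_dataframe(df, directory=None, x_col="filepath", ...)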
Accuracy and Loss
I got the model to compile and started training. First I tried 20 epochs, then 50 and finally 100. After 100 epochs, which took a significant amount of time (roughly 12 hours with 6 GB of VRAM), I still couldn't achieve an acceptable accuracy. In fact, the accuracy stagnated at around 20% for the whole 100 epochs. The same happened with the loss, which was stuck at around 4-5, as you can see in the graphs below:
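The graphs were produced roughly like the sketch below from the hist dictionary returned by model.fit (see the training code further down); the key names assume the default "accuracy" metric:

import matplotlib.pyplot as plt

# hist is the dictionary returned by model.fit(...).history in the code below
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(hist["accuracy"], label="train")
ax1.plot(hist["val_accuracy"], label="validation")
ax1.set_title("accuracy")
ax1.legend()
ax2.plot(hist["loss"], label="train")
ax2.plot(hist["val_loss"], label="validation")
ax2.set_title("loss")
ax2.legend()
plt.show()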
Code
In the following section you can find the associated code. EfficientNet is used as the base model for the new multi-label classification CNN. For EfficientNet's pretrained weights I chose the ImageNet weights. I replaced the original top layers with a Flatten, a Dropout and a Dense layer whose number of units equals the number of possible classes.
I didn't want the model base to be trainable, since I read that this way it uses the pretrained ImageNet weights purely to extract features. Setting this property to True would also mean considerably more training time.
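In case unfreezing turns out to be necessary, my understanding is that the usual fine-tuning recipe looks roughly like the sketch below: train the new head with the frozen base first, then unfreeze and recompile with a much smaller learning rate. It reuses the variables from my code further down, and the epoch count and learning rate are assumptions:

# after the head has been trained with the frozen base (see code below):
model_base.trainable = True                       # unfreeze the EfficientNet base

# recompiling is required for the trainable change to take effect
model.compile(optimizer=Adam(learning_rate=1e-5), # much smaller LR for fine-tuning
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.fit(train_generator, epochs=10, validation_data=test_generator)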
I replaced the final activation function with sigmoid and used binary cross-entropy as the loss function. I chose EfficientNetB0 for performance reasons; I wanted to switch to B4 or B5 once I got good results from training variant B0.
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Dense, Flatten
from tensorflow.keras.optimizers import Adam
import pandas as pd

# define input shape and batch size
input_shape = (224, 224, 3)
batch_size = 256

# paths
train_dir = "datasets/coco_subset/images/train"
# test_dir = "datasets/coco_subset/images/test"  # didn't work with ImageDataGenerator.flow_from_dataframe
csv_dir = "datasets/coco_subset/labels/labels_train.csv"
label_names_dir = "datasets/coco_subset/labels/categories.csv"

# read csv data for loading image label information
df = pd.read_csv(csv_dir)
df_labels = pd.read_csv(label_names_dir)
label_names = list(df_labels["Labels"])
x_col = df.columns[0]                              # filename column
y_cols = list(df.columns[1:len(label_names) + 1])  # 80 one-hot label columns

# load input images and split into training and validation sets
datagen = ImageDataGenerator(rescale=1./255, validation_split=.25)

train_generator = datagen.flow_from_dataframe(
    df,
    directory=train_dir,
    x_col=x_col,
    y_col=y_cols,
    subset="training",
    target_size=input_shape[0:2],
    color_mode="rgb",
    class_mode="raw",  # for multi-label output
    batch_size=batch_size,
    shuffle=True,
    seed=42,
    interpolation="bilinear",
    validate_filenames=False
)

test_generator = datagen.flow_from_dataframe(
    df,
    directory=train_dir,
    x_col=x_col,
    y_col=y_cols,
    subset="validation",
    target_size=input_shape[0:2],
    color_mode="rgb",
    class_mode="raw",
    batch_size=batch_size,
    shuffle=True,
    seed=42,
    interpolation="bilinear",
    validate_filenames=False
)

# build model: frozen EfficientNetB0 base plus a new multi-label head
n_outputs = len(label_names)

model_base = EfficientNetB0(weights='imagenet', include_top=False, input_shape=input_shape)
model_base.trainable = False

model = Sequential([
    model_base,
    Dropout(0.25),
    Flatten(),
    Dense(n_outputs, activation="sigmoid")
])

model.summary()

# compile model
opt = Adam(learning_rate=0.01)
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])

# define training and validation steps
steps_per_epoch = train_generator.samples // train_generator.batch_size
validation_steps = test_generator.samples // test_generator.batch_size

# train model
hist = model.fit(
    train_generator,
    epochs=100,
    steps_per_epoch=steps_per_epoch,
    validation_data=test_generator,
    validation_steps=validation_steps).history

Summary
To sum up my problem:
- Are the new top layers even able to solve a multi-label problem?
- Should I set the trainable property of the base model to True or keep it False?
- Is the model suffering from over- or underfitting?
