
I have 20,000+ images of art pieces (paintings, sculptures, jars, etc.) stored in a database. The physical pieces are distributed across multiple warehouses. Ideally, each piece SHOULD carry a sticker (with its ID, a QR code, etc.), but those stickers are made of paper, so they can be damaged, poorly printed, unreadable, completely missing, or even misplaced. My goal is to build a model that receives an input (an image sent by someone from any warehouse), identifies the exact same piece of art among the available data, and returns its ID, details, etc.

In my case, the sample is static and fixed (there will be no "new" art pieces unless the client purchases more), so the model will never "see" new images. This makes me think that overfitting might actually be the most desirable outcome here (which translates into heavy data augmentation and a high number of epochs).

Notice that there's ONLY ONE image available per class (art piece). That's the situation, which cannot change.

The selected programming language is R, mainly the tensorflow and keras3 libraries.

Just for testing purposes, I took a sample of 10 pieces and generated 9 additional images from each one (data augmentation: rotation, vertical/horizontal flipping, random saturation factors, random brightness factors, etc.). Then I created 5 positive and 5 negative pairs per class. Finally, I trained a siamese network, but the accuracy seems to be stuck at 49%.
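Roughly, the pair-scoring setup I have in mind looks like the sketch below (keras3 syntax; the 224x224 input size and the layer sizes are illustrative placeholders, not my exact code):

```r
library(keras3)

# Shared embedding tower (sizes are illustrative).
embedder <- keras_model_sequential(input_shape = c(224, 224, 3)) |>
  layer_rescaling(scale = 1 / 255) |>
  layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu") |>
  layer_max_pooling_2d() |>
  layer_conv_2d(filters = 64, kernel_size = 3, activation = "relu") |>
  layer_max_pooling_2d() |>
  layer_global_average_pooling_2d() |>
  layer_dense(units = 128)

# Both images of a pair go through the *same* tower (the "siamese" part).
input_a <- keras_input(shape = c(224, 224, 3))
input_b <- keras_input(shape = c(224, 224, 3))
emb_a <- embedder(input_a)
emb_b <- embedder(input_b)

# Absolute difference of the two embeddings, then a "same piece?" score.
diff <- op_abs(op_subtract(emb_a, emb_b))
same <- layer_dense(diff, units = 1, activation = "sigmoid")

siamese <- keras_model(list(input_a, input_b), same)
siamese |> compile(
  optimizer = "adam",
  loss = "binary_crossentropy",
  metrics = "accuracy"
)
```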

That being said, it's hard for me to find solutions, since every piece of documentation relies on either the same cats-vs-dogs or MNIST dataset. My questions are:

  1. Is a siamese network the right algorithm for this purpose?
  2. What can I do to improve the accuracy?
  • What do you mean by "the model will not ever see new images"? What's the point then? How will it be used in practice? Commented Jul 11, 2024 at 17:35
  • @picky_porpoise It's meant for work purposes: every time we are asked to carry out an inventory, we want to identify an art piece as easily and quickly as possible (e.g. when the tag is damaged or not legible) instead of scrolling through a 20,000+ image gallery until we find the right one. Commented Jul 12, 2024 at 19:09

1 Answer


Your problem is an image classification problem with 20000+ classes: given an image, return the class, i.e. the exact piece of art.

Before using the 2-step workflow, why not try just an image classification model that takes the image and outputs the class? I don't think adding a simple feature like the color cluster would help much with the classification.
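As a rough sketch of what that single-model baseline could look like in your keras3/R setup (the MobileNetV2 backbone, the 224x224 input and the frozen weights are illustrative choices, not requirements):

```r
library(keras3)

n_classes <- 20000  # one class per catalogued art piece

# Frozen pretrained backbone used purely as a feature extractor (illustrative).
backbone <- application_mobilenet_v2(
  include_top = FALSE, weights = "imagenet",
  input_shape = c(224, 224, 3), pooling = "avg"
)
backbone$trainable <- FALSE

inputs   <- keras_input(shape = c(224, 224, 3))
x        <- layer_rescaling(inputs, scale = 1 / 127.5, offset = -1)  # MobileNetV2 expects [-1, 1]
features <- backbone(x)
outputs  <- layer_dense(features, units = n_classes, activation = "softmax")

classifier <- keras_model(inputs, outputs)
classifier |> compile(
  optimizer = "adam",
  loss = "sparse_categorical_crossentropy",  # integer piece IDs as labels
  metrics = "accuracy"
)
```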

The approach of trying first with the 2000 square pictures seems reasonable, but it means you can train your model on at most 2000 classes (exactly 2000 if you have one example per class, fewer if some classes have more than one example), so be careful when you evaluate this model.

With so few examples per class, data augmentation is probably important if your model needs to generalize across different lighting conditions, camera angles, backgrounds, etc. in the photos. Overfitting on random features of the photos could be an issue when you have very few examples.
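In keras3 this kind of augmentation can be done on the fly with preprocessing layers, for example (a sketch only; the factors are illustrative and pixel values are assumed to be in 0-255):

```r
library(keras3)

# Simulates different lighting, orientation and framing of warehouse photos
# of the same piece.
augmenter <- keras_model_sequential(input_shape = c(224, 224, 3)) |>
  layer_random_flip(mode = "horizontal_and_vertical") |>
  layer_random_rotation(factor = 0.1) |>
  layer_random_zoom(height_factor = 0.2) |>
  layer_random_brightness(factor = 0.2) |>
  layer_random_contrast(factor = 0.2)

# These layers are only active in training mode; to pre-generate variants,
# call e.g. augmenter(images, training = TRUE).
```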

