[...] images are resized to 224x224, which seems to be stretching or compressing the object in the image. Does this have an effect on the model's precision and accuracy?
Resizing to 224x224 distorts the aspect ratio of the phytoplankton. That may not matter if aspect ratio is unimportant for your classes, but if aspect ratio helps discriminate between species, I'd expect the distortion to hurt model performance. That said, the model might also find a way to compensate.
I would prefer to avoid strong distortion too early on in the pipeline, especially if it is inconsistent with domain knowledge, since it likely handicaps the model and results in an unrealistically weak baseline.
After resizing, some species become much more similar in shape.
The resizing operation suppresses aspect-ratio cues and forces the model to rely on other features. I'd expect this to make some species harder to tease apart, although, again, I don't know how much aspect ratio matters for your particular classes.
Resizing also introduces a degree of blurring, which may not be an issue unless species are distinguished by fine-grained texture or intricate patterns.
I think you have a good starting point for debugging the learning pipeline and ensuring everything works as expected. Re-thinking the preprocessing stage such that it doesn't distort aspect ratios sounds like a plausible way of improving the classification rate.
Suggestions:
- Don't resize unless necessary; if you must resize, make sure that important detail is not strongly attenuated.
- If you resize, maintain the aspect ratio (see the pad-and-resize sketch after this list).
- You don't need to stick to 224x224; many modern architectures apply global (adaptive) pooling before the classifier, so they can handle a range of input sizes.
These factors often need to be balanced against memory and other constraints.
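For the aspect-ratio point, here is a minimal pad-and-resize ("letterboxing") sketch assuming Pillow is available; the function name, the black padding colour, and the 224 target are illustrative choices, not your pipeline's actual settings:

```python
from PIL import Image

def resize_keep_aspect(img: Image.Image, target: int = 224, fill=0) -> Image.Image:
    """Scale the longer side to `target` and pad the remainder,
    so the organism's aspect ratio is preserved instead of stretched."""
    w, h = img.size
    scale = target / max(w, h)
    resized = img.resize((max(1, round(w * scale)), max(1, round(h * scale))),
                         Image.BILINEAR)
    # Centre the scaled image on a square canvas of constant background
    canvas = Image.new(img.mode, (target, target), fill)
    canvas.paste(resized, ((target - resized.width) // 2,
                           (target - resized.height) // 2))
    return canvas
```

If a constant padding colour worries you (instrument backgrounds aren't always uniform), padding with the median border colour of each image is a common alternative.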
[...] certain classes are completely wrong, which is why I'm trying to figure out how I could improve my training set.
Augmentation could help improve classification rates, but if the net is failing completely on some classes, the issue might be more fundamental (e.g. the data preprocessing, limitations inherent in the data, or the model architecture).
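If you do try augmentation, a mild, label-preserving pipeline is a reasonable starting point. Here is a sketch using torchvision (assuming a PyTorch setup; the specific parameter values are guesses you'd want to tune):

```python
import torchvision.transforms as T

# Flips and small rotations are usually safe for plankton imagery;
# avoid RandomResizedCrop-style transforms that would re-introduce
# the aspect-ratio distortion discussed above.
train_transforms = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(degrees=15, fill=0),
    T.ColorJitter(brightness=0.1, contrast=0.1),
    T.ToTensor(),
])
```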
[...] when I try the trained model on other datasets (which seem to have a different image resolution), certain classes are completely wrong.
Perhaps they are lower resolution, making it harder to tell species apart from fine detail? Performance will also depend on how well represented the various species are in those datasets.
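Before blaming resolution, it may be worth measuring it. A small diagnostic sketch (the folder names and file extension are assumptions) that prints the raw image-size distribution of a dataset, so you can compare your training set against the other datasets:

```python
from collections import Counter
from pathlib import Path
from PIL import Image

def size_histogram(folder: str, pattern: str = "*.png", top: int = 10):
    """Count the most common (width, height) pairs under `folder`."""
    sizes = Counter(Image.open(p).size for p in Path(folder).rglob(pattern))
    return sizes.most_common(top)

print(size_histogram("train_images"))    # hypothetical paths
print(size_histogram("other_dataset"))
```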
For reference, I currently have around 50 classes with at least 100 images each. Some classes contain up to 5000 images.
The larger classes may be dominating the learning. I would consider a balancing strategy (e.g. oversampling the small classes or weighting the loss) if you want performance to be equally good across classes.
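As one possible balancing strategy, here is a sketch of class-balanced sampling with PyTorch's WeightedRandomSampler, assuming your dataset exposes a per-image label list (e.g. the `targets` attribute of a torchvision `ImageFolder`); `train_dataset` is a placeholder for your own dataset object:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

labels = np.asarray(train_dataset.targets)   # hypothetical dataset object
class_counts = np.bincount(labels)
# Draw each image with probability inversely proportional to its class size,
# so 5000-image classes stop dominating the 100-image ones within an epoch.
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(labels),
    replacement=True,
)
train_loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```

Weighting the loss instead (e.g. via the `weight` argument of `CrossEntropyLoss`) has a similar effect and avoids repeating rare images within an epoch.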