Questions tagged [pytorch]
PyTorch is an open-source library for tensors and dynamic neural networks in Python with strong GPU acceleration. For details, see https://pytorch.org.
707 questions
0 votes
0 answers
11 views
Sequence generation model produces incorrect, but coherent outputs
My model takes in an image of a handwritten equation and converts it into its LaTeX representation. In order to do this, it uses a ResNet50 pre-trained model for feature extraction and a transformer ...
8 votes
1 answer
160 views
How to correctly implement the loss function for my distillation of Mask2Former?
I have a Mask2Former model fine-tuned on my own custom dataset and it is working nicely. I want to play around with knowledge distillation and use my pretrained ...
7 votes
1 answer
125 views
LSTM feature scaling with windowing?
Beginner ML practitioner here. I'm trying to do some time series forecasting on a fairly high resolution dataset that stretches over a long period of time. The values vary pretty widely over time: to ...
7 votes
1 answer
71 views
How do I combine multiple texts with mathematical accuracy using specific weights?
In the work I am doing right now, I have multiple (say 5, for purposes of illustration) pieces of text (which are somewhat close in meaning, let's say for clarity). My objective is to combine these 5 ...
2 votes
0 answers
44 views
Fine-tuning YOLO: Directly cloning and modifying the GitHub repo vs. using Transformers library and Hugging Face — pros and cons?
I’m planning to fine-tune a YOLO model for a custom object detection task. There seem to be two main approaches: Clone the official YOLO GitHub repository (e.g., YOLOv5 or YOLOv8), adjust the codebase ...
10 votes
1 answer
5k views
Is CUDA 13 a thing (or am I misinterpreting something)?
A few days ago I installed my new NVIDIA GeForce RTX 5090 and I can't get PyTorch to work on my Win11 desktop (just background info, the question is not directly ...
3 votes
1 answer
50 views
Single nn.Embedding instance vs multiple nn.Embedding instances
I am trying to determine if using multiple instances of nn.Embedding() has any value over using a single instance in training a model. As an example, let's say I ...
9 votes
1 answer
1k views
What should a typical reward curve look like while training an RL model?
I have set up a DQN with TorchRL to solve a problem where the agent can move in a square grid and pick some rewards scattered randomly on it. Right now, I am using a 5x5 grid and have 3 rewards on it. ...
0 votes
0 answers
16 views
Terrible performance on CIFAR-10 using Swin model
I am trying to apply the idea from Embedding Deep Networks into Visual Explanations and see if it works on Transformers. The performance is terrible because the accuracy hasn't passed 10%. Can someone ...
0 votes
1 answer
144 views
YOLO knowledge distillation (11x to 11n) yields poorer performance than native training
I'm trying to distill a YOLO11x detection model into a YOLO11n for inference speed improvements without sacrificing too much detection performance. For this, I just overloaded some functions in the ...
3 votes
1 answer
54 views
Model seems to peek into target sequence and cheat during training despite using masking
I am using a CNN-transformer hybrid architecture to detect handwritten equations and convert them to LaTeX strings. All target sequences (the actual LaTeX representation of a handwritten equation) are ...
0 votes
0 answers
22 views
What are the correct steps to successfully train a simple BERT seq2seq model on scraped data?
I am trying to train a bert-base model using LoRA with HF Transformers to experiment with how different datasets could influence the model's output. This is just a simple project, and I am not trying to ...
0 votes
1 answer
58 views
When testing with shuffled data, accuracy is high, but when testing with unshuffled data, accuracy is low
To be clear, I shuffled my data when I trained the model. It is only the testing data that I modified to be unshuffled, and I found that accuracy tanks. (I also used the same data for training and for testing.)
2 votes
1 answer
43 views
N-BEATS, PyTorch Forecasting: predictions are slightly shifted
I am applying the N-BEATS model of the pytorch-forecasting package on a traffic dataset. I am doing single step prediction with a context length of 5. Now the prediction is unfortunately slightly ...
0 votes
1 answer
37 views
Why is my upscaling GAN not working?
I have been trying to code an upscaling GAN, but while the code runs, I pretty much always end up with terrible results when the GAN doesn't collapse, and collapse happens often. I previously tried to ...