
I have loaded my data into a Pandas DataFrame and performed some pre-processing, and now I need to convert it into a PyTorch tensor to use as my feature data for training.

Obviously, this new tensor does NOT need autograd enabled, because it is only source data.

I convert the DataFrame into a tensor as follows:

features = torch.tensor(data=df.iloc[:, 1:cols].values, requires_grad=False)

I dare NOT use torch.from_numpy(), because the resulting tensor will share storage with the source numpy.ndarray, according to the PyTorch docs.
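
For example, as I understand the docs, something like the following minimal sketch shows the difference (illustrative names, not my real code):

import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)
shared = torch.from_numpy(arr)    # shares memory with arr
copied = torch.tensor(arr)        # copies the data into its own storage

arr[0] = 1.0
print(shared[0])                  # tensor(1.) -- reflects the change to arr
print(copied[0])                  # tensor(0.) -- unaffected, owns its own storage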

Not only is the source ndarray a temporary object, but the original DataFrame will also be released before training, because it is huge.

Furthermore, I'm worried about training performance, so I want my feature data to really be stored in a tensor of its own, not in some kind of 'view' or sharing storage with the ndarray/DataFrame.

I'm confused by the PyTorch docs: they say that from_numpy() shares storage, that torch.Tensor.clone() carries the gradient along, and that if detach() is used, one more copy will occur.
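
Here is a small sketch of the behaviour I am trying to reason about (based on my reading of the docs; the names are only illustrative):

import numpy as np
import torch

src = torch.tensor(np.ones(3, dtype=np.float32))   # torch.tensor() copies; requires_grad defaults to False
print(src.requires_grad)                            # False

tracked = src.clone().requires_grad_(True)          # clone() makes a copy of the data
detached = tracked.detach()                         # detach() drops grad tracking but shares storage
print(detached.requires_grad)                       # False
print(detached.data_ptr() == tracked.data_ptr())    # True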

I just need to create a clean tensor that owns its data, without gradient tracking, and ideally with as few data-copying operations as possible.

Is my method correct, or is there a better way?
