I have loaded my data into a Pandas DataFrame, performed some pre-processing, and now I need to convert it into a PyTorch tensor to serve as my feature data for training.
Obviously, this new tensor does NOT need autograd enabled, because it is only source data.
I convert the df into a tensor as follows:

features = torch.tensor(data=df.iloc[:, 1:cols].values, requires_grad=False)

I dare NOT use torch.from_numpy(), because according to PyTorch's docs the resulting tensor shares its storage with the source numpy.ndarray.
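To be concrete, here is a self-contained version of what I'm doing, with a small toy DataFrame standing in for my real (huge) data; the column names and `cols` are just placeholders:

```python
import numpy as np
import pandas as pd
import torch

# Toy stand-in for my real DataFrame: one leading id column, then features.
df = pd.DataFrame(np.random.rand(5, 4), columns=["id", "f1", "f2", "f3"])
cols = df.shape[1]

arr = df.iloc[:, 1:cols].values
features = torch.tensor(data=arr, requires_grad=False)

# torch.tensor() copies the data, so mutating the source array
# does not affect the tensor.
arr[0, 0] = 999.0
print(features.requires_grad)          # False
print(features[0, 0].item() == 999.0)  # False: storage is not shared
```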
Not only is the source ndarray a temporary object, but the original DataFrame will also be released before training, because it is huge.
Furthermore, I'm worried about training performance, so I want my feature data to really be stored in its own tensor, not in some kind of 'view' or in storage shared with the ndarray/df.
I'm confused by PyTorch's docs: they say that from_numpy() shares memory, that torch.Tensor.clone() keeps gradient tracking, and that stripping the gradient with detach() would then mean one more operation.
I just need to create a clean tensor that owns its data, without gradients, and ideally with as few copy operations as possible.
Is my method correct, or is there a better way?