Questions tagged [linear-algebra]
A field of mathematics concerned with the study of finite-dimensional vector spaces, including matrices and their manipulation, which are important in statistics.
88 questions
2 votes
1 answer
57 views
Why does the perceptron decision boundary "pass through the origin" after applying the bias trick?
I'm watching Lecture 4 ("Curse of Dimensionality / Perceptron") from Cornell’s CS4780 (Spring 2017) by Prof. Kilian Weinberger on YouTube. In this lecture, he applies the bias trick to ...
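As a side note, a minimal sketch of the bias trick itself (an illustration, not the lecture's code): appending a constant 1 to every input absorbs the bias into the weight vector, so the boundary passes through the origin of the augmented space.

```python
import numpy as np

# Bias trick: x' = [x, 1] and w' = [w, b] give w'.x' = w.x + b,
# so the hyperplane w'.x' = 0 passes through the origin in d+1 dimensions.
x = np.array([2.0, -1.0])           # original d-dimensional input
w, b = np.array([0.5, 1.5]), -0.3   # weights and bias

x_aug = np.append(x, 1.0)
w_aug = np.append(w, b)

assert np.isclose(w @ x + b, w_aug @ x_aug)  # same decision value
```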
1 vote
0 answers
53 views
Linear Discriminant Analysis (LDA) determining the class for test data after transforming with eigenvectors
Suppose we are given two classes, class-1 and class-2, whose means before projection are $μ_1$ and $μ_2$ respectively. Both of their variance-covariance matrices are also provided ...
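A hypothetical sketch of the classification step being asked about, assuming the usual nearest-projected-mean rule (names here are illustrative):

```python
import numpy as np

def classify(x, W, mu1, mu2):
    """Assign x to the class whose projected mean is nearest.
    W: matrix whose columns are the LDA eigenvectors."""
    z = W.T @ x                          # project the test point
    d1 = np.linalg.norm(z - W.T @ mu1)   # distance to projected mean 1
    d2 = np.linalg.norm(z - W.T @ mu2)   # distance to projected mean 2
    return 1 if d1 < d2 else 2
```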
1 vote
1 answer
67 views
How to identify the basis? [closed]
I got this slide from the class lecture. My questions are: Q1. Why is "there are enough vectors" required for the linear span of vectors to satisfy the 1st condition of a basis? And why "...
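For context, a standard textbook example (not taken from the slide) of what "enough vectors" means:

```latex
% In R^2, one vector is not enough to span the plane,
% while two independent vectors are:
\[
\operatorname{span}\{(1,0)\} = \{(t,0) : t \in \mathbb{R}\} \subsetneq \mathbb{R}^2,
\qquad
\operatorname{span}\{(1,0),(0,1)\} = \mathbb{R}^2 .
\]
```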
6 votes
2 answers
199 views
What type of technique can be used to solve this question?
Apologies for the ambiguous title; I do not know the term. I have data on some products with a few variables: origin, weight, brand. For example: Product A = "China, 100g, Brand X" Product B ...
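One common starting point for such mixed records (an assumption about what the question is after, not a definitive answer) is to one-hot encode the categorical variables so each product becomes a numeric vector:

```python
import pandas as pd

# Toy records mirroring the question's example
df = pd.DataFrame({
    "origin": ["China", "Germany"],
    "weight": [100, 250],
    "brand":  ["X", "Y"],
})
# One-hot encode the categorical columns; weight stays numeric
features = pd.get_dummies(df, columns=["origin", "brand"])
print(features)
```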
1 vote
0 answers
48 views
Outlier detection with elliptic envelope - unexpected error
I am trying to detect outliers with sklearn.covariance.EllipticEnvelope for a single variable, but it throws an unexpected error. Here is an example that reproduces ...
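Without seeing the traceback, a frequent cause of such errors (a guess, not a diagnosis) is passing a 1D array where scikit-learn expects a 2D (n_samples, n_features) matrix; a sketch:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
x = rng.normal(size=200)     # a single variable (1D array)
x[:5] += 8                   # inject a few outliers

env = EllipticEnvelope(contamination=0.05)
labels = env.fit_predict(x.reshape(-1, 1))  # reshape to a column: -1 = outlier
print((labels == -1).sum(), "points flagged")
```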
2 votes
2 answers
167 views
Affine 2D mapping in python
I have two sets of 2D data $A$ and $B$ (representing 2D positions on a 2D plane $x,y$) which are related (the first pair of $x,y$ of $A$ is related to the first pair of $x,y$ of $B$ for instance). I ...
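A minimal least-squares sketch of one standard approach to fitting such an affine map (illustrative names, assuming paired points):

```python
import numpy as np

def fit_affine(A, B):
    """A, B: (n, 2) arrays of corresponding points. Fit B ~ A @ M + t."""
    A1 = np.hstack([A, np.ones((len(A), 1))])    # augment with [x, y, 1]
    X, *_ = np.linalg.lstsq(A1, B, rcond=None)   # (3, 2) solution
    return X[:2], X[2]                           # 2x2 matrix M, translation t

# usage: M, t = fit_affine(A, B); B_pred = A @ M + t
```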
1 vote
2 answers
264 views
About the last decoder layer in transformer architecture
So, in the decoder layer of a transformer, suppose I have predicted 3 words so far, including the start token. The last decoder layer will then produce 3 vectors of size d_model, and only the last ...
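A shapes-only toy illustration of the point in question (not a real transformer): the decoder emits one d_model vector per position, and at generation time only the final position's logits are used.

```python
import numpy as np

d_model, vocab, seq_len = 8, 50, 3
H = np.random.randn(seq_len, d_model)     # decoder output: one row per token
W_out = np.random.randn(d_model, vocab)   # final linear projection

logits = H @ W_out                        # (seq_len, vocab)
next_token = logits[-1].argmax()          # only the last row picks the next token
```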
1 vote
1 answer
3k views
How is weight matrix calculated in a neural network?
Context: I am a pure mathematician trying to understand machine learning. I am studying it from various sources, now focusing on NLP and word embeddings. My question: What is the weight matrix for a ...
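A minimal sketch of the usual answer (assuming the question is about training in general): the weight matrix is not computed in closed form but learned by gradient descent on a loss, here squared error for a linear model:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=(100, 1))

W = rng.normal(size=(3, 1)) * 0.01   # random initialization
lr = 0.1
for _ in range(200):
    err = X @ W - y                  # prediction error of the linear model
    W -= lr * X.T @ err / len(X)     # gradient step on squared loss
```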
0 votes
1 answer
57 views
What (in the world) is well-conditioned vs. low rank fat-tail singular profile?
Scikit-learn has a make_regression data generator. Can someone explain to me, like I'm 5, what is meant in the help docs by "The input set can either be well ...
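As I read the docs, the contrast is between a full-rank input matrix and an approximately low-rank one whose remaining singular values decay in a fat tail; both regimes can be generated directly (a sketch, not scikit-learn's wording):

```python
from sklearn.datasets import make_regression

# effective_rank=None: well-conditioned, full-rank input matrix
X_well, _ = make_regression(n_samples=100, n_features=20,
                            effective_rank=None, random_state=0)

# small effective_rank + tail_strength: approximately low-rank X with a
# fat-tailed singular value profile
X_low, _ = make_regression(n_samples=100, n_features=20,
                           effective_rank=5, tail_strength=0.5,
                           random_state=0)
```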
0 votes
1 answer
132 views
Proof of perpendicular distance of an observation from the Maximal Margin Hyperplane
I was reading about Maximal Margin Classifiers in "Introduction to Statistical Learning" and could not understand how the perpendicular distance of an observation (which is a vector) from ...
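For reference, the standard identity such derivations rest on (stated generically, not quoted from the book):

```latex
% Signed perpendicular distance of a point x_i from the hyperplane
% \beta_0 + \beta^T x = 0:
\[
  d(x_i) = \frac{\beta_0 + \beta^\top x_i}{\lVert \beta \rVert},
\]
% which reduces to \beta_0 + \beta^\top x_i when \lVert \beta \rVert = 1,
% the normalization used for the maximal margin hyperplane.
```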
1 vote
1 answer
86 views
Plot a matrix as a single point in space
I have a dataset of drugs represented as graphs, each of which is described by three non-square matrices: an edge index (A), a 2×e matrix, where e is the number of bonds in the molecule; the first line ...
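One plausible reading of "plot as a single point" (an assumption, not the only option): flatten each drug's matrices into one long vector, then reduce all vectors to 2D with PCA and scatter-plot them.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-ins: 50 drugs, each already flattened to an equal-length vector.
# Real graphs differ in size, so vectors would need padding or summary stats.
rng = np.random.default_rng(0)
drugs = [rng.normal(size=60) for _ in range(50)]

points = PCA(n_components=2).fit_transform(np.vstack(drugs))  # (50, 2)
# each row of `points` is one drug, plottable as a single 2D point
```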
0 votes
1 answer
109 views
Beginner Question on Understanding Linear Classifier
I have been trying to understand the math behind a linear classifier for images and I'm hitting a roadblock in understanding the image below: I can to some extent agree that we stretch the pixels into ...
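The computation such figures typically depict (a generic sketch, not the exact image): stretch the image into a vector, then each class score is one row of W dotted with the pixels, plus a bias.

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=4)   # tiny 2x2 "image", flattened
W = rng.normal(size=(3, 4))             # 3 classes, 4 pixels
b = rng.normal(size=3)

scores = W @ pixels + b                 # one score per class
print(scores.argmax())                  # predicted class index
```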
3 votes
0 answers
309 views
Multi-dimensional Euclidean $R^2$ - reasonable?
I have a high-dimensional space, say $\mathbb{R}^{1000}$, and I have samples $y_1, \ldots , y_n \in \mathbb{R}^{1000}$ and $\hat{y}_1, \ldots , \hat{y}_n \in \mathbb{R}^{1000}$. Would $$ R^2 = 1 - \...
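Presumably the quantity intended is the usual coefficient of determination extended to vector-valued samples (an assumption, since the formula is cut off):

```latex
\[
  R^2 = 1 - \frac{\sum_{i=1}^{n} \lVert y_i - \hat{y}_i \rVert^2}
                 {\sum_{i=1}^{n} \lVert y_i - \bar{y} \rVert^2},
  \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i .
\]
```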
0 votes
1 answer
47 views
What will happen if all the layers of an MLP or any DL architecture are initialized identically?
Setting the initial weights to all zeros makes the output depend only on the bias, and setting the weights of all the neurons of a layer to the same value updates their gradients in the same way, thus removing ...
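A tiny numeric demonstration of the symmetry problem described above (toy values; identical upstream gradients are assumed because every layer starts identical):

```python
import numpy as np

x = np.array([1.0, 2.0])
W1 = np.full((2, 2), 0.5)       # both hidden units initialized identically
h = np.tanh(W1 @ x)             # identical activations
print(h)

grad_h = np.array([0.3, 0.3])   # identical upstream gradients (by symmetry)
grad_W1 = np.outer(grad_h * (1 - h**2), x)
print(grad_W1)                  # identical rows: the units stay clones
```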
1 vote
0 answers
32 views
Nearest neighbor face recognition in eigenspace when using dot product of test set with eigenvectors does not match the performance when using sklearn
I am trying to perform face recognition using PCA (eigenfaces). I have a set of N training images (of dimension M = w×h each), which I have pre-processed into a vertical ...
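A common cause of such a mismatch (a guess, not a diagnosis of this exact setup): sklearn's PCA centers data with the training mean before projecting, so a manual dot product with the eigenvectors must subtract the same mean.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 64))   # toy stand-in for flattened faces
X_test = rng.normal(size=(5, 64))

pca = PCA(n_components=10).fit(X_train)
proj_sklearn = pca.transform(X_test)
proj_manual = (X_test - pca.mean_) @ pca.components_.T  # center first!

assert np.allclose(proj_sklearn, proj_manual)
```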