Questions tagged [randomized-algorithms]
23 questions
3 votes
0 answers
64 views
How does gradient descent perform, compared to informed random walk?
I have a complex problem, and I am not sure whether I can solve it with gradient descent. Most importantly, I do not know the gradient, the objective is strongly discontinuous on small steps, and I have no easy ...
0 votes
1 answer
827 views
Is it mandatory to set a random_state when using RandomizedSearchCV?
When I set random_state in RandomizedSearchCV, I always obtain the same results with the same hyperparameters. So, is it mandatory to set it? Because in my opinion it is better to always ...
1 vote
1 answer
192 views
Clustering by using Locality sensitive hashing *after* Random projection
It is well known that Random Projection (RP) is tightly linked to Locality Sensitive Hashing (LSH). My goal is to cluster a large number of points lying in a d-dimensional Euclidean space, where $d$ ...
1 vote
1 answer
385 views
Grid Searching seed in randomized machine learning
I was wondering if tuning a seed with cross-validation in order to maximize the performance of an algorithm heavily based on a randomness factor is a good idea or not. I have created an Extra Tree ...
1 vote
0 answers
20 views
Create a random chi-Square independence distribution with a given p-Value
I want to randomly create a table of data that has a predefined p-Value and chi-Value of a chi-square distribution. For example this would have a p-Value of 1 on a chi-square independence test: ...
0 votes
1 answer
534 views
What is the objective that is optimized with Random Search?
I have recently learned about Random Search (or sklearn.model_selection.RandomizedSearchCV in Python) and was thinking about the theory behind the optimization process. In particular my question is, ...
-2 votes
1 answer
966 views
Is shuffling data really necessary for training? [duplicate]
I don't mean the case of a dataset where, if sampled sequentially, the labels would be [1111122223333]. In this case, the network learns to predict everything as 1, then 2, and so on and it's impossible ...
1 vote
1 answer
2k views
How to compute modulo of a hash?
Let's say that I have a set of users in my database, that have GUIDs as their IDs. I use xxhash to generate fixed-length hashes for each value, so that I can then ...
3 votes
3 answers
4k views
Cannot clone object <keras.wrappers.scikit_learn.KerasRegressor object at 0x7fdc9c3ba550>
I am trying to tune the hyperparameters of an ANN but get an error when calling fit (grid1.fit(X_train, y_train)). Below is the code ...
0 votes
1 answer
313 views
RL Sutton book, initial estimate of q*(a) for 10 arm testbed
The Sutton book does not mention what the initial estimate is for q*(a) before the first reward is received. In the code repo that seems to go along with the book (Sutton code repo), they have ...
0 votes
1 answer
976 views
How to generate 12 independent random weights which all add up to one
I'm using Palisade's @Risk software with a triangular distribution to generate 12 random weights which must add up to one, but I get a lot of negative numbers. Is there a straightforward way to set ...
4 votes
1 answer
315 views
Why would one crossvalidate the random state number?
While still learning about machine learning, I've stumbled across a Kaggle (link), which I cannot understand. Here are lines 72 and 73: ...
10 votes
3 answers
9k views
Splitting train/test sets by an identifier?
I know sklearn has train_test_split() to split a train and test set. But I read that, even with setting a random seed, if your actual dataset is updated regularly, ...
12 votes
2 answers
2k views
What is the most efficient method for hyperparameter optimization in scikit-learn?
An overview of the hyperparameter optimization process in scikit-learn is here. Exhaustive grid search will find the optimal set of hyperparameters for a model. The downside is that exhaustive grid ...
1 vote
0 answers
46 views
How to label train_data? [closed]
I have an assignment with four files: 1) train_data.csv: The training file contains two fields (text, id). 2) train_label.csv: The label file contains two fields (id, label). 3) test_data.csv: ...