Questions tagged [hierarchical-data-format]
The hierarchical-data-format tag has no summary.
30 questions
2 votes
0 answers
34 views
Comparing demographics for hierarchical data
A common request I get is to compare demographics for two businesses. However, the data is nested (hierarchical). Each business has a unique set of locations, and the customer data comes from each location. ...
2 votes
1 answer
239 views
Scaling before hierarchical clustering by single and complete linkage
I know that for hierarchical clustering, it is best practice to scale the data first so that each variable gets the same weight. Otherwise, for the complete linkage, the variable with a wider range ...
0 votes
1 answer
82 views
Sequence prediction in Parent - Child dataset
We have a large collection of documents (D), each accompanied by a set of metadata (M). Within this collection, some documents act as parent documents and have multiple child documents. Both parent ...
0 votes
1 answer
68 views
Which modeling technique is appropriate when I have nested/hierarchical data (individual and group) but user inputs will only be at the group level?
I am trying to create a predictive model that will be built on individual data, but user input will only exist at the group level. The reasoning is that I have 5 million rows of data at the individual ...
2 votes
2 answers
5k views
Where can I view the ImageNet classes as a hierarchy on WordNet?
I always find a list of classes on GitHub that represent the synset ID and name of each ImageNet class label. I need to view the WordNet hierarchy of ImageNet as a tree so I can prune some classes ...
1 vote
0 answers
109 views
Proof related to Ward's Method
According to Ward's method, which says:
0 votes
1 answer
26 views
How to cluster/group these data points (using K-means or hierarchical clustering)
I have genes from different species: Gene A, Gene B, Gene C, ... Gene Z. Some genes are similar to each other ...
0 votes
0 answers
114 views
Input Features of a Hierarchical Structure
I have input features of a hierarchical structure. Each feature consists of a header element and 0 to n subfeatures of the same structure. Also, there is no upper limit for n and n can be different ...
1 vote
1 answer
142 views
Use dummy variables to create a rank variable in R
I have a series of multiple-response (dummy) variables describing causes for canceled visits. A visit can have multiple reasons for the cancellation. My goal is to create a single mutually exclusive ...
1 vote
1 answer
2k views
Should I scale or normalise my dataset before clustering? [closed]
I have a dataset whose variables are measured in milligrams, kilograms, and quintals. Should I use StandardScaler or MinMaxScaler to scale the dataset?
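To make the two options in this question concrete, here is a minimal sketch of the formulas behind standardization (z-scores, as scikit-learn's StandardScaler computes them) and min-max scaling (as MinMaxScaler does with its default 0 to 1 range). It is written in plain Python rather than with scikit-learn, and the weight values are hypothetical, not taken from the question.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column: the same weights expressed in kilograms and in milligrams.
weights_kg = [2.0, 4.0, 6.0, 8.0]
weights_mg = [w * 1_000_000 for w in weights_kg]

print(min_max(weights_kg))       # runs from 0.0 up to 1.0
print(standardize(weights_kg))
print(standardize(weights_mg))   # same z-scores, up to floating-point rounding
```

Both transforms remove the unit of measurement, so a column in milligrams and the same column in kilograms end up with the same scaled values. Which of the two suits a given clustering task still depends on the data, for example on how sensitive the result should be to outliers.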
1 vote
1 answer
36 views
Can we define a data partitioning in K clusters, by cutting the branches of the tree at some levels in the tree below the root node?
Assume we have a dendrogram (hierarchical clustering tree). Can we define a partition of the data into K clusters by cutting the branches of the tree at some levels below the root node?
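One way to see what such a cut does: the sketch below builds a single-linkage merge history for a few hypothetical 1-D points, then "cuts" the tree by replaying only the first n - K merges. The function names and the data are illustrative assumptions, and single linkage is just one possible choice of linkage. This corresponds to one horizontal cut; cutting different branches at different heights also yields a valid partition, but is not covered here.

```python
from itertools import combinations


def single_linkage_merges(points):
    """Return the merge history [(id_a, id_b, distance), ...] for 1-D points."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)

    def linkage(a, b):
        # Single linkage: smallest distance between any two members.
        return min(abs(points[i] - points[j]) for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


def cut_into_k(n_points, merges, k):
    """Replay the first n_points - k merges; the surviving groups are the K clusters."""
    clusters = {i: [i] for i in range(n_points)}
    next_id = n_points
    for a, b, _ in merges[: n_points - k]:
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


points = [1.0, 1.2, 5.0, 5.1, 9.0]  # hypothetical data
history = single_linkage_merges(points)
print(cut_into_k(len(points), history, 3))  # [[0, 1], [2, 3], [4]]
print(cut_into_k(len(points), history, 2))  # [[0, 1, 2, 3], [4]]
```

Each merge reduces the number of clusters by one, so stopping after n - K merges always leaves exactly K groups of point indices.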
0 votes
1 answer
50 views
Different representations of dendrograms
I have a dendrogram represented in a format I don't understand: (K_5:1.000030e+00,((K_1:2.000000e-05,(K_2:1.000000e-05,K_3:1.000000e-05):1.000000e-05):1.000000e-05,K_4:3.000000e-05)0.806:1.000000e+00):...
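The string quoted in this question looks like Newick notation: nested parentheses group sibling nodes, `name:number` gives a branch length, and a label right after a closing parenthesis belongs to the internal node (it is often used for a support value). Below is a minimal parsing sketch under that assumption. It handles only this simple subset (no quoted names or comments), and the example string is a hypothetical one, not the truncated string from the question.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts: name, length, children."""
    pos = 0

    def read_until(stop_chars):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # skip ")"
                break
        # Optional label: a leaf name, or a label on an internal node.
        name = read_until(",():;")
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaf_names(node):
    """Collect leaf names from left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick("((A:1.0,B:2.0)0.9:0.5,C:3.0e+00);")  # hypothetical example
print(leaf_names(tree))              # ['A', 'B', 'C']
print(tree["children"][0]["name"])   # 0.9
```

Because branch lengths are read with `float`, scientific notation such as `1.000030e+00` parses without extra handling.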
0 votes
2 answers
179 views
Question About Coming Up With Own Function for Distance Matrix (For Clustering)
I am currently working on implementing a clustering algorithm on millions of data entries about the users of a mobile game. A lot of the features I plan on using are unique to ...
1 vote
1 answer
32 views
Handling hierarchical category independent variables
I have data with huge categorical attributes. For example, main_column, sub_column1, sub_column2 are 3 hierarchical attributes. If I create dummy variables for these columns, the column count is ...
4 votes
1 answer
2k views
Advice on dealing with very large datasets - HDF5, Python
Recently, I've started working on an application for the visualization of really big datasets. While reading online it became apparent that most people use HDF5 for storing big, multi-dimensional ...