Estimating Final Vehicle Counts from Pairwise Marginals Using Python

Question

I am working with vehicle registration data from a website. The website provides counts for various combinations of vehicle attributes such as Maker, RTO, Fuel, Category, SubCategory, and Emission.

Scraping every combination (the full Cartesian product) would mean ~200K combinations per state and might get my IP blocked, so I instead downloaded the aggregated pairwise counts manually for each state. I stored them in 15 Excel sheets, one for each pair of the six attributes.

I imported all 15 sheets into SQL Server as separate tables and merged them step by step.

For each combination of Maker, RTO, Fuel, Category, SubCategory, Emission, I mapped all relevant counts from the 15 tables. This gave me a dataset with columns like:

Maker, RTO, Fuel, Category, SubCategory, Emission, MR_Count, MF_Count, ME_Count, MC_Count, MS_Count, RE_Count, RF_Count, RC_Count, RS_Count, FE_Count, FC_Count, FS_Count, CE_Count, CS_Count, SE_Count

I now want to calculate a single Final_Count per combination that is consistent with all 15 pairwise counts. Accuracy is very important (≥95%), since the result should match the website closely, and the method also needs to be memory-efficient.
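To make "consistent with all 15 pairwise counts" concrete, here is a minimal sketch of the check I have in mind: summing Final_Count over every other attribute should reproduce each pairwise count. The helper name `marginal_errors`, the `PAIR_COLUMNS` mapping, the toy rows, and the use of worst relative error as the metric are all assumptions for illustration; only three of the 15 count columns are shown.

```python
from collections import defaultdict

# Hypothetical mapping: count column -> the two attributes it aggregates over.
PAIR_COLUMNS = {
    "MR_Count": ("Maker", "RTO"),
    "MF_Count": ("Maker", "Fuel"),
    "RF_Count": ("RTO", "Fuel"),
}

def marginal_errors(rows, pair_columns=PAIR_COLUMNS, value_col="Final_Count"):
    """Return the worst relative error per pairwise count column."""
    worst = {}
    for col, (a, b) in pair_columns.items():
        fitted = defaultdict(float)  # sum of Final_Count per (a, b) value pair
        target = {}                  # published pairwise count per (a, b) value pair
        for row in rows:
            key = (row[a], row[b])
            fitted[key] += row[value_col]
            target[key] = row[col]
        worst[col] = max(
            abs(fitted[k] - t) / t if t else abs(fitted[k])
            for k, t in target.items()
        )
    return worst

# Toy data (made up): two makers, one RTO, two fuels.
rows = [
    {"Maker": "A", "RTO": "X", "Fuel": "P", "MR_Count": 5, "MF_Count": 3, "RF_Count": 8, "Final_Count": 3},
    {"Maker": "A", "RTO": "X", "Fuel": "D", "MR_Count": 5, "MF_Count": 2, "RF_Count": 2, "Final_Count": 2},
    {"Maker": "B", "RTO": "X", "Fuel": "P", "MR_Count": 5, "MF_Count": 5, "RF_Count": 8, "Final_Count": 5},
    {"Maker": "B", "RTO": "X", "Fuel": "D", "MR_Count": 5, "MF_Count": 0, "RF_Count": 2, "Final_Count": 0},
]
print(marginal_errors(rows))  # every error is 0.0 for this consistent toy table
```

A check like this only measures agreement with the pairwise totals. It says nothing about whether the individual Final_Count values match the website.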

Which algorithm is best suited to computing Final_Count from these 15 pairwise counts?

I’ve heard Iterative Proportional Fitting (IPF) can estimate multi-dimensional contingency tables. Would this be appropriate? Any advice on optimizing for accuracy and convergence?
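For reference, a minimal sketch of IPF against pairwise marginals, using NumPy on a dense array. The function names `pairwise_marginals` and `ipf_pairwise`, the uniform starting table, the tolerance, and the random three-attribute toy table are assumptions for illustration, not a tested solution for the real data. The same loop would apply to six attributes with 15 marginals.

```python
import itertools
import numpy as np

def pairwise_marginals(table):
    """Return {(i, j): 2-D marginal} for every pair of axes i < j."""
    axes = range(table.ndim)
    out = {}
    for i, j in itertools.combinations(axes, 2):
        other = tuple(a for a in axes if a not in (i, j))
        out[(i, j)] = table.sum(axis=other)
    return out

def ipf_pairwise(marginals, shape, max_iter=500, tol=1e-9):
    """Rescale a uniform table until its pairwise sums match `marginals`."""
    est = np.ones(shape, dtype=float)
    ndim = len(shape)
    for _ in range(max_iter):
        max_err = 0.0
        for (i, j), target in marginals.items():
            other = tuple(a for a in range(ndim) if a not in (i, j))
            current = est.sum(axis=other)
            max_err = max(max_err, np.abs(current - target).max())
            # Scale factor target / current; cells whose current sum is 0 stay 0.
            ratio = np.divide(target, current,
                              out=np.zeros_like(current), where=current > 0)
            # Broadcast the 2-D ratio back over the remaining axes.
            bshape = [1] * ndim
            bshape[i], bshape[j] = shape[i], shape[j]
            est *= ratio.reshape(bshape)
        if max_err < tol:
            break
    return est

# Toy example: a made-up 3 x 4 x 2 table standing in for Maker x RTO x Fuel.
rng = np.random.default_rng(0)
truth = rng.integers(1, 20, size=(3, 4, 2))
marginals = pairwise_marginals(truth)
est = ipf_pairwise(marginals, truth.shape)
print(np.abs(est - truth).max())  # cell-level gap between the estimate and the true table
```

One caveat: pairwise marginals do not in general determine the full table uniquely. Starting from a uniform table, IPF returns the maximum-entropy table consistent with them, so `est` can reproduce every pairwise count and still differ from `truth` cell by cell. On memory, a dense float64 array with ~200K cells takes about 1.6 MB, so a dense array per state should be manageable if the full cross-product is of that size.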

I am attaching sample input for reference.

Could anyone please help with this requirement? It is an urgent requirement from the client side. Please suggest any suitable data science models.
– Guru Moorthy, Commented Oct 11 at 13:21
A few clarifications would help provide an answer: (1) Should the Final_Count satisfy all marginal constraints exactly when summed? (2) Are all pairwise tables complete, or might some pairs be missing? (3) Is there a hierarchical relationship between Category and SubCategory? (4) Roughly how many unique values exist for each dimension (Maker, RTO, Fuel, etc.)? (5) Do you have any ground truth full combinations to validate accuracy? Understanding these will help recommend the optimal reconstruction algorithm.
– Robert Long, Commented Oct 12 at 19:25
