I am working with vehicle registration data from a website. The website provides counts for various combinations of vehicle attributes such as Maker, RTO, Fuel, Category, SubCategory, and Emission.
Since directly scraping the full Cartesian product of attribute values would mean ~200K combinations per state and might get my IP blocked, I instead manually downloaded aggregated pairwise counts for each state. I stored them in 15 Excel sheets, one per attribute pair.
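For clarity, the 15 sheets correspond to the 15 possible pairs of the six attributes:

```python
from itertools import combinations

attributes = ["Maker", "RTO", "Fuel", "Category", "SubCategory", "Emission"]
pairs = list(combinations(attributes, 2))
print(len(pairs), pairs)   # 15 pairs -> 15 Excel sheets, one per attribute pair
```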
I imported all 15 sheets into SQL Server as separate tables and merged them step by step.
For each combination of Maker, RTO, Fuel, Category, SubCategory, and Emission, I attached all relevant pairwise counts from the 15 tables. This gave me a wide dataset, one row per combination, with columns like the following (each XY_Count is the pairwise count for the corresponding attribute pair, e.g. MR_Count for Maker × RTO):
Maker, RTO, Fuel, Category, SubCategory, Emission, MR_Count, MF_Count, ME_Count, MC_Count, MS_Count, RE_Count, RF_Count, RC_Count, RS_Count, FE_Count, FC_Count, FS_Count, CE_Count, CS_Count, SE_Count
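For concreteness, this is roughly how I picture turning one of these columns back into a 2-D margin matrix for fitting (the tiny DataFrame below is illustrative only, not my real data, which comes from the SQL Server merge):

```python
import pandas as pd

# Illustrative slice of the merged wide table.
df = pd.DataFrame({
    "Maker":    ["A", "A", "B", "B"],
    "RTO":      ["R1", "R2", "R1", "R2"],
    "MR_Count": [120, 30, 45, 10],
    # ... the other attributes and pairwise count columns ...
})

# MR_Count repeats on every row that shares the same (Maker, RTO),
# so de-duplicate before pivoting it into a Maker x RTO margin matrix.
mr_margin = (
    df[["Maker", "RTO", "MR_Count"]]
      .drop_duplicates()
      .pivot(index="Maker", columns="RTO", values="MR_Count")
      .fillna(0)
)
print(mr_margin)
```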
I now want to estimate a single Final_Count per combination that is consistent with all 15 pairwise counts. Accuracy matters most, since I want the result to match the website closely (≥95%), but the approach also needs to be memory-efficient.
What is the best data science algorithm to compute Final_Count from these 15 pairwise counts?
I’ve heard Iterative Proportional Fitting (IPF) can estimate multi-dimensional contingency tables. Would this be appropriate? Any advice on optimizing for accuracy and convergence?
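To show what I have in mind, here is a minimal IPF sketch in NumPy. The dimension sizes and margins are made-up placeholders standing in for my real data (the real margins would come from pivoting the pairwise counts as above); please tell me if this is the wrong way to apply it.

```python
import numpy as np

# Hypothetical numbers of distinct Maker, RTO, Fuel, Category, SubCategory, Emission values.
dims = [50, 40, 8, 4, 20, 6]
axes = list(range(len(dims)))

# Seed table: uniform start (all ones) unless a better prior is available.
table = np.ones(dims)

# target_margins[(i, j)] is the observed 2-D count matrix for attribute pair (i, j),
# e.g. (0, 1) -> the Maker x RTO matrix built from MR_Count.
# Dummy random data here, only so the sketch runs end to end; in practice all 15
# margins should sum to the same grand total, or IPF will not match them exactly.
rng = np.random.default_rng(0)
target_margins = {
    (i, j): rng.integers(1, 100, size=(dims[i], dims[j])).astype(float)
    for i in axes for j in axes[i + 1:]
}

for _ in range(200):                       # IPF sweeps
    max_rel_err = 0.0
    for (i, j), target in target_margins.items():
        other = tuple(a for a in axes if a not in (i, j))
        current = table.sum(axis=other)    # current (i, j) margin of the table
        ratio = np.divide(target, current, out=np.ones_like(target), where=current > 0)
        shape = [dims[a] if a in (i, j) else 1 for a in axes]
        table *= ratio.reshape(shape)      # rescale so the (i, j) margin matches its target
        max_rel_err = max(max_rel_err, np.abs(current - target).max() / target.max())
    if max_rel_err < 1e-6:                 # stop once all margins (nearly) match
        break

# table[m, r, f, c, s, e] is then the estimated Final_Count for that combination.
```

My main worries with this approach are the memory footprint of the dense 6-D array and whether IPF converges when the 15 pairwise totals do not agree exactly.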
I am attaching a sample input for reference.


