I hope someone can help me with a work problem I am facing. My data has machineID, timestamp (UTC), and batterypotential for multiple machines, sampled every 2 minutes over 14 days.
I need to look at each machine's time series and classify whether or not it is showing battery disconnect symptoms.
Battery Disconnect definition -
1. Scenario 1: The machine’s main switch is on the negative side of the battery causing voltage to drop suddenly or register a floating voltage due to the location the power wire for the device is terminated.
2. Scenario 2: The machine’s main switch is on the positive side of the battery. Due to the location of the device’s power wire when the switch is used, the voltage drops suddenly to 0V and randomly back up to normal equipment voltage when in use.
Here is my current approach -
I’ve been analyzing time-series battery potential data to identify significant changes or “disconnect events.” The process involves extracting and grouping data by hardware serial number (the machineID) and timestamp, calculating open and close values for consecutive time points, and taking the absolute difference between them. From these differences, a machine-specific dynamic threshold is computed based on the mean and standard deviation of that machine's differences. Any difference exceeding the threshold is flagged, and flagged events are aggregated to report the total number of disconnect events and the first and last occurrences for each device. We are considering requiring a minimum of 15 such events before a machine counts as showing disconnect symptoms. Let me know your thoughts.
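To make the approach concrete, here is a minimal sketch of the steps above in pandas. It is not my exact production code: the column names follow the fields listed at the top, and the function name `flag_disconnects` and the multiplier `k` on the standard deviation are placeholders I picked for illustration.

```python
import pandas as pd

def flag_disconnects(df, k=3.0, min_events=15):
    """Sketch only. Assumes columns machineID, timestamp, batterypotential.

    k (multiplier on the standard deviation) is an assumed value.
    """
    df = df.sort_values(["machineID", "timestamp"]).copy()
    # "open" is the previous reading for the same machine, "close" the current one
    df["open"] = df.groupby("machineID")["batterypotential"].shift(1)
    df["close"] = df["batterypotential"]
    df["abs_diff"] = (df["close"] - df["open"]).abs()
    # Per-machine dynamic threshold: mean + k * std of the absolute differences
    grp = df.groupby("machineID")["abs_diff"]
    df["threshold"] = grp.transform("mean") + k * grp.transform("std")
    flagged = df[df["abs_diff"] > df["threshold"]]
    # Aggregate flagged events per machine
    summary = flagged.groupby("machineID").agg(
        n_events=("abs_diff", "size"),
        first_event=("timestamp", "min"),
        last_event=("timestamp", "max"),
    )
    summary["disconnect_suspected"] = summary["n_events"] >= min_events
    return summary
```

Machines with no flagged events do not appear in the returned summary at all.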
Here is the issue -
The distribution is quite interesting, being so heavily skewed. It's been some time since I last worked with parametric statistics on non-normally distributed datasets, but something tells me that a threshold cannot be determined in this way when the distribution does not approximate a normal distribution. I'm struggling to figure out the right approach in this case. Perhaps it would make sense to remove outliers when calculating the thresholds?
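One alternative to trimming outliers that I am wondering about is replacing mean and standard deviation with median and median absolute deviation (MAD), which are far less sensitive to a skewed tail. Below is a rough sketch of that idea, not something I have validated. The function name `robust_threshold`, the multiplier `k`, and the `floor` value in volts are all assumptions of mine; 1.4826 is the usual factor that makes MAD comparable to a standard deviation under normality. The floor is there because most consecutive differences may be close to zero, in which case the MAD collapses to zero.

```python
import numpy as np
import pandas as pd

def robust_threshold(abs_diff, k=5.0, floor=0.5):
    """Median + k * scaled MAD, never below `floor` (all values assumed)."""
    x = pd.Series(abs_diff).dropna().to_numpy(dtype=float)
    if x.size == 0:
        return float(floor)
    med = np.median(x)
    # Median absolute deviation around the median
    mad = np.median(np.abs(x - med))
    return float(max(med + k * 1.4826 * mad, floor))
```

It would be applied per machine, for example with `df.groupby("machineID")["abs_diff"].apply(robust_threshold)`, where `abs_diff` is a hypothetical column holding the absolute differences.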
Please help!