
I hope someone can help me with a work problem I am facing. My data has machineID, timestamp (UTC), and batterypotential for multiple machines, sampled every 2 minutes over 14 days.

I need to look at each machine's time series and categorize whether it is showing battery disconnect symptoms or not.

Battery Disconnect definition -

1. Scenario 1: the machine's main switch is on the negative side of the battery. Because of where the device's power wire is terminated, using the switch causes the voltage to drop suddenly or register a floating voltage.
2. Scenario 2: the machine's main switch is on the positive side of the battery. Because of where the device's power wire is terminated, using the switch causes the voltage to drop suddenly to 0 V and return randomly to normal equipment voltage when in use.

Here is my current approach -

I've been analyzing the time-series battery potential data to identify significant changes, or "disconnect events." The process extracts and groups the data by hardware serial number and timestamp, calculates open and close values for consecutive time points, and takes the absolute difference between them. From these differences, a machine-specific dynamic threshold is computed from the mean and standard deviation of the differences. Any difference exceeding the threshold is flagged, and the flagged events are aggregated to report the total number of disconnect events and the first and last occurrence for each device. We are considering a minimum of 15 such events. Let me know your thoughts.
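A minimal sketch of the pipeline I described, in pandas. The data here is synthetic, the column names (machineID, timestamp, batterypotential) match my schema, and the multiplier k = 3 is just an illustrative choice:

```python
import numpy as np
import pandas as pd

# Synthetic example: one machine, a reading every 2 minutes, with a
# simulated scenario-2 disconnect (voltage drops to 0 V, then recovers).
rng = np.random.default_rng(0)
ts = pd.date_range("2024-01-01", periods=500, freq="2min")
voltage = 12.6 + rng.normal(0, 0.05, size=500)
voltage[200:205] = 0.0
df = pd.DataFrame({"machineID": "M1", "timestamp": ts, "batterypotential": voltage})

# Per-machine absolute difference between consecutive readings.
df = df.sort_values(["machineID", "timestamp"])
df["diff"] = df.groupby("machineID")["batterypotential"].diff().abs()

# Machine-specific dynamic threshold: mean + k * std of the differences.
k = 3
stats = df.groupby("machineID")["diff"].agg(["mean", "std"])
df = df.join(stats, on="machineID")
df["flagged"] = df["diff"] > df["mean"] + k * df["std"]

# Aggregate flagged events: count plus first and last occurrence per device.
events = (df[df["flagged"]]
          .groupby("machineID")["timestamp"]
          .agg(n_events="count", first="min", last="max"))
print(events)
```

On this toy series the two flagged differences are the drop to 0 V and the jump back, which matches the scenario-2 pattern.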

Here is the issue -

The distribution of the differences is heavily skewed. It's been some time since I last worked with parametric statistics on non-normally distributed data, but something tells me a threshold cannot be determined this way when the distribution does not approximate a normal distribution. I'm struggling to figure out the right approach in this case. Perhaps it would make sense to remove outliers when calculating the thresholds?

Please help!


1 Answer

The mean and SD are susceptible to distortion by the voltage drops themselves, pulled in the direction of the drop. This can de-sensitise the threshold to exactly the symptoms you want to detect.

A more robust measure of 'normal behaviour' in this case would be the median and quantiles. Those thresholds stay more stable when the voltage suddenly drops, allowing you to detect a disconnect more readily. How well this works will depend on the data, how many steps you average over, and the thresholds used.
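As a rough illustration (synthetic data; the 0.99 quantile level and the multiplier 5 are arbitrary assumptions to be tuned), compare a mean/SD threshold with a median/quantile threshold on the same skewed differences:

```python
import numpy as np

# Mostly small absolute differences, plus a few disconnect-sized jumps.
rng = np.random.default_rng(1)
diffs = np.abs(rng.normal(0, 0.05, size=1000))
diffs[::250] = 12.0  # four large disconnect-like jumps

# The mean/SD threshold is dragged far upward by the jumps themselves...
mean_thr = diffs.mean() + 3 * diffs.std()

# ...while the median and a high quantile barely move.
med = np.median(diffs)
q99 = np.quantile(diffs, 0.99)
robust_thr = med + 5 * (q99 - med)

print(mean_thr, robust_thr)
```

Here the robust threshold sits just above the noise floor, while the mean/SD threshold is inflated by the very events you are trying to flag.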

I have found that for time-series data, detecting things like spikes, edges, or reasonably well-defined landmarks can be done well using combinations of smoothing, gradients, and histogramming (1, 2). In particular, try calculating the gradient and seeing if you can threshold that to detect sudden changes.
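A hedged sketch of the smoothing-plus-gradient idea on synthetic data (the window length and the gradient threshold of 1.0 are assumptions you would tune, not recommendations):

```python
import numpy as np

# Synthetic trace with a sudden drop to 0 V and a recovery.
rng = np.random.default_rng(2)
v = 12.6 + rng.normal(0, 0.05, size=300)
v[120:140] = 0.0

# Smooth with a short moving average to suppress sensor noise;
# edge-pad first so the ends of the array don't produce fake steps.
win = 5
padded = np.pad(v, win // 2, mode="edge")
smooth = np.convolve(padded, np.ones(win) / win, mode="valid")

# Large gradient magnitude marks the falling and rising edges.
grad = np.gradient(smooth)
edges = np.flatnonzero(np.abs(grad) > 1.0)
print(edges)
```

The flagged indices cluster around the two edges of the drop; consecutive indices would still need to be merged into single events.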

If the data is too variable/complex for reliable thresholding, then ML approaches will likely both perform better and require less tuning (sklearn examples).
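For instance, a minimal sketch using scikit-learn's IsolationForest on the per-step absolute differences (the contamination rate is an assumed guess, and a real feature set would likely include more than one column):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic trace with a single disconnect-like sample at index 100.
rng = np.random.default_rng(3)
v = 12.6 + rng.normal(0, 0.05, size=400)
v[100] = 0.0

# One feature per reading: the absolute step from the previous reading.
steps = np.abs(np.diff(v)).reshape(-1, 1)

# IsolationForest labels rare, extreme steps as anomalies (-1).
clf = IsolationForest(contamination=0.01, random_state=0)
labels = clf.fit_predict(steps)
anomalies = np.flatnonzero(labels == -1)
print(anomalies)
```

The drop into and out of the 0 V sample both show up as anomalies, without any hand-set voltage threshold.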

"We are considering a minimum of 15 such events."

That would be a 'hyperparameter' to tune - you'd need to try different values and see whether increasing or decreasing the number improves detection.

"Perhaps it would make sense to remove outliers when calculating the thresholds?"

That could help, but pruning too aggressively would make the threshold unrealistic, leading to false triggers or missed detections.
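One possible sketch of trimming before computing the threshold (synthetic data; the 1%/99% trim quantiles are assumptions to experiment with):

```python
import numpy as np

# Skewed differences with a handful of disconnect-sized outliers.
rng = np.random.default_rng(4)
diffs = np.abs(rng.normal(0, 0.05, size=1000))
diffs[:5] = 12.0

# Trim the extremes before computing the threshold statistics.
lo, hi = np.quantile(diffs, [0.01, 0.99])
trimmed = diffs[(diffs >= lo) & (diffs <= hi)]
thr = trimmed.mean() + 3 * trimmed.std()

# The untrimmed threshold is inflated by the outliers themselves.
naive = diffs.mean() + 3 * diffs.std()
print(thr, naive)
```

The trimmed threshold still catches the large jumps, while the untrimmed one is pushed toward them - which is the false-trigger/missed-detection trade-off mentioned above.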

Consider uploading some sample data if possible.

