$\begingroup$

TLDR

I have a background in MLOps and machine learning engineering. I started at a new employer as their first AI engineer and failed at a time series forecasting project. My approach is detailed below. Any ideas on what I could have done better?

Original Task

As described by my non-technical boss (no background in machine learning), the goal was to find anomalies in a cost database in BigQuery. He gave no other details, but said that as a senior engineer, I should figure out the specifics.

Fair enough, but the data has a dozen different cost attributes: department level, individual customer level, account manager level, pre-onboarding cost, post-trading cost, resource cost, etc. The domain is fairly new to me, so initially I was a bit flustered about what to model or where to look for anomalies.

Anyway, after about a week, I delivered an anomaly detection model and basic results in the form of Python scripts, notebooks, graphs and PowerPoint decks, covering

  • my judgement and assumptions about which costs are relevant
  • which features to look at to identify an anomaly
  • future steps for pushing it to a production application and making it accessible to the users (internal company users from other departments)
  • a request for feedback on my assumptions

The AI modelling part was trivially simple in itself. I also insisted on surfacing the basic ideas and results to the stakeholders in different departments (who would be the consumers/users) to get domain feedback. But my boss kept giving feedback that was, in my eyes, relatively inconsequential and purely visual, like

  • show a pie chart here instead of a bar chart
  • show the cost on a per-department basis instead of a per-account-manager basis
  • show the median of the past three quarters here, etc.
  • incorporate a user-specified threshold on some cost outlier data (it was all running as a Python script, so there was no user as such; I mocked it by setting a variable to a threshold)

and many others like this. The data is available in BigQuery, and anyone can create a view with group-by, filtering, etc. (and I did), but these requests had nothing to do with anomaly detection. They were just different ways to slice, dice and present the data, and we went back and forth on them a few times.

I said several times something along the lines of

If you have specific requirements on the business logic, what kind of chart you want to see, which costs you want to model, or what you think counts as an anomaly, can you tell me?

The response was usually something like

You are an expert in ML, you should figure it out.

My General Workflow (after presenting the basic results and exploratory analysis)

I incorporated actionable feedback as soon as it came (within two working days), and documented the discussions, progress and updates in a shared file and on a Jira board to keep a record. But my requests to actually talk to the users about what they would find useful were ignored on several occasions, with reasons like Jack is on vacation, Bob is on a business trip, Joye is very busy, etc.

Needless to say, my boss somehow got impatient with it, and I got the axe.

So, the goal of this post is not to seek sympathy, but to ask how you would approach the whole project (the expectation management, the data and the ambiguity). As I said, the raw technical task seemed simple enough, as is generating a few views in BigQuery to see, e.g., which department spent the most in a given quarter.

So the concrete questions are

  • Do you think this is an AI/ML project at all?
  • How would you gather requirements concretely enough to deliver against them?

P.S. They do not even have a definition of an anomaly in mind. Initially, I used the spectral residual model (from Microsoft, there is a paper on it) to define anomalies, but they could not understand it. So I switched to simple Z-score-based anomaly detection (based on the mean and standard deviation).
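For concreteness, the Z-score approach amounts to something like the sketch below. This is a minimal illustration, not the actual project code: the function name `zscore_anomalies`, the sample cost values and the threshold are all made up for this post. A point is flagged when it lies more than `threshold` standard deviations from the mean.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return (index, value, z) for points whose |z-score| exceeds threshold.

    Hypothetical helper for illustration; uses the sample standard deviation.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values identical: nothing can be flagged.
        return []
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Made-up monthly costs for one department (not real data).
monthly_costs = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101, 99, 100]

# The threshold plays the role of the "user-specified" setting,
# here simply mocked as a variable.
flagged = zscore_anomalies(monthly_costs, threshold=2.5)
for index, value, z in flagged:
    print(f"index={index} value={value} z={z:.2f}")
```

One known limitation of this approach is that a large outlier inflates both the mean and the standard deviation, so on short series the threshold has to be chosen with care.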

$\endgroup$
  • 2
    $\begingroup$ Interesting, but your description is not technical enough. Is the goal of your question to get help finding anomalies in your dataset? If so, I suggest that you choose one "column" to make things concrete and give information on the data and on why you or your boss think there are anomalies (outliers are not always errors). A definition of an anomaly would be useful: is it something like a negative price or an employee height of 6 meters, or is it more about finding costs that are exceptionally high/low but possible? $\endgroup$ Commented Feb 15 at 8:38
  • $\begingroup$ No, the goal of the question is to understand how I could have managed the project better. I posted on this Stack Exchange site because it is a deeply data science project, and I thought people here would understand and appreciate the background. $\endgroup$ Commented Feb 16 at 6:02
  • $\begingroup$ Your task looks more like data analysis than data science, and even less like MLE. Moreover, a necessary part of the DS job is to get information from the final users/experts. You could consider another hypothesis: you managed the project just fine with the tools/information you were provided with. $\endgroup$ Commented Feb 23 at 10:28
