Einstein Prediction Builder: Improving Predictions by Improving Data Quality

In my last blog on Einstein Prediction Builder, we improved our prediction score by choosing the right fields for the prediction. Link to last blog – Einstein Prediction Builder: Improving Predictions through choice of fields

In this blog, we will understand why data quality is important for improving prediction results.

Imagine an astrologer predicting your future without knowing your date of birth. Or imagine a store forecasting sales for the months to come, but counting both the store's copy and the customer's copy of each receipt (i.e. double counting the sale proceeds).

We can encounter similar situations in our data. Let us walk through the steps to improve data quality and, in turn, improve our prediction.

As a quick recap, we achieved a score of 47 in the previous blog.

STEP 1: Remove the Duplicates

When we prepare the example set, it is important to cleanse the data to remove duplicates. Salesforce Duplicate Rules or products from AppExchange can help in cleaning the data and removing the duplicates. After removing the duplicates, it is important to re-evaluate the score. In our case, the score increased to 53.
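Einstein Prediction Builder works directly on Salesforce records, so deduplication normally happens through Duplicate Rules inside the org. Purely as an illustration of the idea, here is a minimal Python sketch (with hypothetical field names and data, not the Salesforce API) of de-duplicating an exported example set by matching on a few key fields:

```python
# Minimal sketch: de-duplicate an exported example set by matching
# on key fields, ignoring case and surrounding whitespace.
# Field names and records below are hypothetical.
def dedupe(records, key_fields):
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec[f].strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

cases = [
    {"Subject": "Login issue", "Email": "a@example.com"},
    {"Subject": "login issue ", "Email": "A@example.com"},  # duplicate
    {"Subject": "Billing query", "Email": "b@example.com"},
]
print(len(dedupe(cases, ["Subject", "Email"])))  # 2
```

In practice the matching logic lives in your Duplicate Rule's matching criteria; the sketch only shows why normalizing values before comparison matters.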

STEP 2: Validate Data in the Example Set

It is important to ensure that our example set has valid data. Try to give meaning to the data by ensuring that fields are populated with proper, meaningful values.

Avoid catch-all values like “etc.”, “any other reason”, and “none of the above”. Such values can result in low scores, especially if they appear on a large number of records.

Check whether the data is correct and fix it where there is an issue. This, however, requires involvement from business users to help fix the data and convert it into meaningful information.
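To make the idea concrete, here is a small, hypothetical Python sketch that flags records whose field holds one of those catch-all values, so they can be routed to business users for correction (field names and values are illustrative, not from Salesforce):

```python
# Minimal sketch: flag records whose field contains a catch-all value
# that carries no predictive meaning. All names/data are hypothetical.
PLACEHOLDERS = {"etc.", "any other reason", "none of the above", ""}

def flag_low_info(records, field):
    # Compare case-insensitively so "Any Other Reason" is also caught.
    return [r for r in records
            if (r.get(field) or "").strip().lower() in PLACEHOLDERS]

records = [
    {"Reason": "Password reset"},
    {"Reason": "Any Other Reason"},
    {"Reason": "none of the above"},
]
flagged = flag_low_info(records, "Reason")
print(len(flagged))  # 2
```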

After completing the above, the score of my prediction became 85.

So with some basic checks, we moved our score from Good to Great.

It is important to note that it was not just improving the data quality that moved the score; it was a series of steps that helped us reach this mark. To reiterate, they were:

  • Removing hindsight bias.
  • Removing fields which are mostly blank (and will not have much data in future).
  • Removing fields which have no impact.
  • Removing duplicate data.
  • Validating the data.

In my next blog, we will talk about the Einstein Prediction Builder component (provided by Salesforce).

Einstein Prediction Builder: Improving Predictions through choice of fields

In my last blog on Einstein Prediction Builder, we created a new numeric prediction. Link to last blog – Build Numeric Predictions with Einstein Prediction Builder

We saw that we got a “Good” prediction score of 44. But is our score good enough to get quality predictions?

It is really important to understand that even with a good score, there is a chance that our predictions will not predict what we actually want to see.

To ensure quality, we need to perform two levels of checks:

  1. The fields we are using.
  2. The data in our example set.

In this blog, we will cover the improvement we gain from changing the fields.

STEP 1: When we choose the fields, we first have to eliminate “Hindsight Bias”. These are fields whose value is only set after the outcome we are trying to predict has occurred. For example, in our scenario we are predicting the probability of case escalation, but we have included the “Escalated” field in our prediction set. When that field is true, the case has already been escalated. If we are predicting the chance of escalation, we cannot use a field that tells us the escalation has already happened, so it is better to remove this field.

Sometimes such fields are easy to locate, as they show a high impact on predictions. Removing them can drop the score drastically; in my use case, the score dropped to 17.

Now we see the Case Origin field having an impact, with “LiveAgent” having the most impact. However, a quick check suggests that this field does not fall into the hindsight-bias category. Thus we can eliminate hindsight bias from the prediction by carefully checking the report and the fields being used.
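In Einstein Prediction Builder this is done by unchecking fields in the setup wizard, but conceptually it is simply dropping outcome-leaking columns before training. A hypothetical Python sketch (field names invented for illustration):

```python
# Minimal sketch: drop fields that leak the outcome ("hindsight bias")
# before the model ever sees them. Field names are hypothetical.
LEAKY_FIELDS = {"Escalated"}  # only set after the case has escalated

def drop_hindsight_fields(records, leaky=LEAKY_FIELDS):
    return [{k: v for k, v in rec.items() if k not in leaky}
            for rec in records]

cases = [{"Origin": "LiveAgent", "Priority": "High", "Escalated": True}]
cleaned = drop_hindsight_fields(cases)
print("Escalated" in cleaned[0])  # False
```

The hard part is not the code but the judgment call: deciding, field by field, whether a value could have existed before the outcome occurred.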

A low score at this point should not be a concern, as we have taken a step towards quality predictions.

STEP 2: We should now review the fields that have no or minimal data in the example set. Such fields cannot help much with quality predictions and should be removed from the field set. In my use case, I chose to eliminate the fields with no or minimal data.
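A simple way to spot such fields, sketched here in Python on hypothetical exported data, is to compute the fraction of blank values per field and flag anything above a threshold (the 50% cutoff below is an arbitrary choice for illustration):

```python
# Minimal sketch: find fields that are blank on most records and are
# therefore poor candidates for prediction. Data is hypothetical.
def blank_ratio(records, field):
    blanks = sum(1 for r in records if not (r.get(field) or "").strip())
    return blanks / len(records)

records = [
    {"Origin": "Web",   "Notes": ""},
    {"Origin": "Phone", "Notes": ""},
    {"Origin": "Email", "Notes": "VIP customer"},
]
mostly_blank = [f for f in ("Origin", "Notes")
                if blank_ratio(records, f) > 0.5]
print(mostly_blank)  # ['Notes']
```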

Re-checking the score after this change showed an increase to 24.

STEP 3: Remove the fields which have low impact or variance. These can be found by opening the Top Predictors report (linked on the Scorecard).

As we can see, Account Name has no impact, so removing the field can help improve our results.
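Low impact often shows up as low variance: a field that holds (nearly) the same value on every record cannot distinguish outcomes. As a rough, hypothetical check outside Salesforce:

```python
# Minimal sketch: a field with only one distinct value across the
# example set has no variance, hence no predictive power.
# Field names and data are hypothetical.
def distinct_values(records, field):
    return {(r.get(field) or "").strip().lower() for r in records}

records = [
    {"Account Name": "Acme", "Priority": "High"},
    {"Account Name": "Acme", "Priority": "Low"},
    {"Account Name": "Acme", "Priority": "Medium"},
]
low_variance = [f for f in ("Account Name", "Priority")
                if len(distinct_values(records, f)) <= 1]
print(low_variance)  # ['Account Name']
```

Inside Einstein Prediction Builder itself, the Top Predictors report already does this analysis for you; the sketch only shows the intuition behind it.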

We have now reached the Good range again with a score of 47.

So we started with a score of 44 and low-quality predictions, and ended with 47 but a higher quality of prediction. These improvements are a step in the right direction toward quality predictions, and will further help move the score from Good to Great (which I will cover in the next blog).