Benford’s Law compares the frequency distribution of leading digits to its (empirically proven) natural counterpart. This can then be used to detect fraud and errors.
The green columns show how often each number should be the first digit in numbers that should follow the Benford-distribution. In black you’ll see the actual distribution of first digits within my table. Lastly, the red line shows the percentual absolute deviations between actual and Benford values.
In this example, there is a relatively high occurrence of numbers starting with 4 and 5. So this could be a sign for fraudulent manipulations.
The Benford Distribution
First you need a table with the Benford-distribution. Just load it as a disconnected table to your data model and name it “BenfordTable”. The “Value”-field from this table will be taken as x-axis for the visualisations. As the Benford-distribution is logarithmic, it can quickly be created with the following DAX-code:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
BenfordTable = | |
ADDCOLUMNS ( | |
GENERATESERIES ( 1, 9 ), | |
"BenfordDistribution", LOG10 ( 1 + ( 1 / [Value] ) ) | |
) |
Benford’s law Measure
The measure calculates how often a number starts with one of the BenfordNumbers (1..9) compared to the total number of rows in the FactTable.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
BenfordMeasure = | |
VAR BenfordNumber = | |
MAX ( BenfordTable[Value] ) | |
VAR CountBenfordNumber = | |
CALCULATE ( | |
COUNTROWS ( FactTable ), | |
LEFT ( FactTable[Value], 1 ) * 1 = BenfordNumber | |
) | |
VAR CountTotal = | |
COUNTROWS ( FactTable ) | |
RETURN | |
DIVIDE ( CountBenfordNumber, CountTotal ) |
If there are blank in the Value-column of the table to be analyzed, the measure has to be adjusted by filtering them out in the VAR CountTotal: ( CALCULATE(COUNTROWS(FactTable), FactTable[Value] <> BLANK()) )
Please note that you can create as many measures as you need in one model. So if you have multiple columns to investigate, just write a measure for each.
Benford’s law Variance Measure
To calculate the difference between target and actual, I use a MAXX-aggregation. This returns the maximum difference there is for a number. I also use this in a card visual if I want to add a data driven alert. So I don’t have to check and eyeball the chart regularly, but can just use this in a card visual in a dashboard. Then I’ll set a threshold value for the alert and will not miss any alarming developments.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Deviation % = | |
MAXX ( | |
ADDCOLUMNS ( | |
BenfordTable, | |
"Diff", ABS ( [BenfordMeasure] – BenfordTable[BenfordDistribution] ) / BenfordTable[BenfordDistribution] | |
), | |
[Diff] | |
) |
Why DAX?
I’m using measures here instead of a calculated column (in the Benford-Table) because this allows me to filter and slice my table. This allows for making advanced and flexible analysis like comparing different values against each other or over time against the Benford distribution:
Enjoy & stay queryious 😉
What happens with 0.01? I think it should be Benfords Law number 1, because the first non zero digit is 1. Therefore I multiply the absolute value by 100 for transactions and then take the first digit. Also I use ABS to eliminate “-” signs.
Yes, you should prepare your figures accordingly before you apply the measures.
Thanks for the hint!
/Imke
Thanks Imke
Very helpful.
Peta – Staying queryious
Thanks 🙂
Hey can you help me understand what i should change if i want to addapt the measure to reach the first 2 digits?
Hi John,
you need to adjust the Benford table like so:
Benford_Table_1st2digits =
ADDCOLUMNS(
GENERATESERIES(10,99)
,"BenfordDistribution", LOG10(1+(1/[Value]))
,"FirstDigit", LEFT([Value],1)
,"SecondDigit", RIGHT([Value],1)
)
and in row 7 of the measure you have to take the 2 digits from the left: LEFT ( FactTable[Value], 2 ) * 1 = BenfordNumber
/Imke
How would you get this logic to work on journal entries? Negative numbers as -234 and numbers leading with zero as 0.001 will be able to occur.
Hi Casper,
if negative numbers have to be considered as well, you have to take the absolute values instead (ABS-function).
If you’re one scale with most below 1, then you should multiply them by a factor of 10 (10,100, 1000…) so that the first non-zero digits will sit before the decimal separator.
/Imke
Hi. Thanks for this, it is very helpful. I have a question, when I click in the chart (actual distribution), it does not show the list of details (say list of all amounts start with number 9 when I click value 9 on X axis for example). How to make this link? Thanks.
Hi How can I modify this to Last 2 Digits,2nd Order and Summation when my Columns has decimal Number ranging from 6.42 to 7937938.04
Your website is a gem for acct & fin professionals. Huge thank you!
For some reason, I cannot make the DAX code work for the line variance. Could you, please, elaborate why you are using MAXX vs Average?