Having been deep dived into data cleaning these days, I found that some cells were filled with zeros when it should be blank. The following explain why there are differences between filling a cell with 0 and leaving it blank to describe your observation in Excel. More importantly, why this means two utterly different things when presenting your data.
To those that that are impatient, here is my short answer:
Put zero when you are certain that you observed nothing when describing your data, leave the cell blank otherwise.
For my long answer, lets start with an example (told you this will be long)…
What kind of data you are collecting?
Say you were given a task to do stocktake in a fruit stall. Particularly, your boss wanted to know how many fruits were left in the following items:
You counted the number of fruits diligently and came back with the following numbers:
- Apple: 20
- Banana: 60
- Orange: 10
- Guava: 100
- Watermelon: 0
With these numbers, it was clear that watermelon needed to be restocked. Before your boss placed order, s/he wanted to know if the neighbouring stall was selling watermelon too and how many. You went over and found that the neighbouring stall was not selling any watermelon. This begs the question:
Should you tell your boss the neighbouring stall was not selling any watermelon or just that they ran out of stock?
What’s the different between 0 and blank in describing your data?
From the above example, it was clear that when conducting stocktake within the fruit stall, watermelon should be counted as zero. This is because watermelon was indeed one of the selling items but ran out of stock when stocktake was conducted. In Excel, you put “0” in the cell next to watermelon to treat it like a number. Just like 1, 2, 3, 4, 5….
When you went over to the neighbouring stall to find out if they were selling watermelon but observed none, situation became different. Without prior knowledge if watermelon was one of their selling items, it would be logically wrong to put “0” in Excel to describe the situation in the neighbouring stall. Hence, you will need to leave the cell blank to reflect this situation.
How Excel handle 0 and blank?
Continue with our example above, your very competitive boss now wanted to compare all the fruit items between his stall and the neighbouring stall. You went over again, took a snap, counted the number of fruits and consolidated your results in Excel.
Again, without prior knowledge, it would be difficult to say whether orange or watermelon were simply ran out of stock or they don’t sell these items in the first place, so you leave both of these cells blank.
When you tried to count number of items in Column C (i.e. Neighbouring Stall), using function =COUNT(C2:C6), it would return 3. However, if you count the number of items in Column B (i.e. your Boss’s Stall), using the same function =COUNT(B2:B6), it would return 5. This is because the function COUNT treated 0 as a number just as any other numbers and blank cells remained to be blank. It goes something like this:
- [20, 60,10, 100, 0] – 5 items
- [60, 30, 100] – 3 items
While there could be other times when Excel function could return 0 and blank in the same way (e.g. INDEX) but that is beyond our discussion here.
The focus here is to determine when to put 0 or blank in raw data appropriately. If done wrongly, Excel function would still run the numbers without hiccups, but it would produce a false result. This falsehood of data description would lead to wrong interpretation during analysis.
Why choosing between 0 and blank is important in Excel?
In data collection, there are differences between counting items that are being known or unknown. This affects the way how you input your data and ultimately how you interpret your results. If you described your data wrongly in the first place, analysis will become wrong.
As in the example above, if you put 0 in items that you have no prior knowledge of its existence, it would give a false impression that the neighbouring store sells watermelon in the first place but ran out of stock, rather than they don’t sell this item at all. Further calculation using function, such as COUNT, would produce a wrong result that could only be far from truth. This is because it was based on wrong input.
While it would be easy to spot any mistakes with only a few rows, it would be much difficult to spot just even one if you have thousand of rows. As if it is not troublesome enough, a mix of both 0 and blanks in a column wrongly would become difficult to tell apart whether it truly means numerically none or a blank cell means a certain item does not exist in the first place. Therefore, a clear concept between when to use 0 and when to leave a cell blank is fundamentally important to keep your data right.
Hope this clear out some of the confusion on when to use 0 and when to leave the cell blank in describing observation. Check out my other article about Excel errors and how to fix them…