Business Intelligence for Sales Directors
Normality tests assess whether a dataset is consistent with a normal distribution. This matters because many parametric methods, such as t-tests and ANOVA, assume approximately normally distributed data. Understanding the distribution of your data helps you select appropriate statistical techniques and interpret results correctly.
We tested the normality of product dimensions (weight, length, height, width) to understand their distribution patterns.
Figure 1: Histograms of product dimensions with normal distribution curves
Figure 2: Q-Q plots of product dimensions
| Variable | Shapiro-Wilk Test p-value | D'Agostino's K² Test p-value | Skewness | Kurtosis | Normally Distributed? |
|---|---|---|---|---|---|
| Weight (g) | <0.0001 | <0.0001 | 3.87 | 21.34 | No |
| Length (cm) | <0.0001 | <0.0001 | 2.14 | 7.65 | No |
| Height (cm) | <0.0001 | <0.0001 | 1.98 | 6.32 | No |
| Width (cm) | <0.0001 | <0.0001 | 2.32 | 8.76 | No |
Business Insight: All product dimensions deviate significantly from normality, with p-values below 0.0001 for both the Shapiro-Wilk and D'Agostino's K² tests. The positive skewness values indicate right-skewed distributions (many small products, fewer large ones), and the high kurtosis values indicate heavy tails (more extreme values than a normal distribution would produce). This non-normality has implications for inventory management, packaging strategies, and shipping cost models. Statistical analyses involving these variables should use non-parametric methods or apply appropriate transformations.
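The columns of the table above could be computed along the following lines. This is a minimal sketch assuming SciPy is available. The function name `normality_summary`, the 0.05 threshold, and the synthetic lognormal "weights" are illustrative assumptions, not the data or code behind this report. The report does not say whether its kurtosis figures are excess kurtosis; the sketch uses SciPy's default, which is excess kurtosis (0 for a normal distribution).

```python
import numpy as np
from scipy import stats


def normality_summary(values, alpha=0.05):
    """Return normality-test p-values and shape statistics for a 1-D sample.

    Hypothetical helper for illustration; alpha is an assumed threshold.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values before testing

    # Both tests return (statistic, p-value); a small p-value rejects normality.
    # Note: SciPy warns that Shapiro-Wilk p-values may be inaccurate for
    # samples larger than 5000 observations.
    _, shapiro_p = stats.shapiro(x)
    _, dagostino_p = stats.normaltest(x)  # D'Agostino's K² test

    return {
        "shapiro_p": shapiro_p,
        "dagostino_p": dagostino_p,
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x),  # excess kurtosis: normal = 0
        "normal": bool(shapiro_p > alpha and dagostino_p > alpha),
    }


# Synthetic, right-skewed stand-in for product weights (assumption).
rng = np.random.default_rng(0)
weights = rng.lognormal(mean=6.0, sigma=1.0, size=2000)

summary = normality_summary(weights)
print(summary)
```

With very large samples, both tests tend to reject normality even for small departures, so the skewness and kurtosis columns and the Q-Q plots are worth reading alongside the p-values.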
We tested the normality of order values to understand their distribution patterns.
Figure 3: Histogram and Q-Q plot of order values
| Variable | Shapiro-Wilk Test p-value | D'Agostino's K² Test p-value | Skewness | Kurtosis | Normally Distributed? |
|---|---|---|---|---|---|
| Order Value (BRL) | <0.0001 | <0.0001 | 4.32 | 28.76 | No |
Business Insight: Order values deviate significantly from normality: the distribution is strongly right-skewed (skewness = 4.32) with heavy tails (kurtosis = 28.76). Most orders are of relatively low value, and a small number of high-value orders stretch the right tail. This pattern is common in e-commerce and has implications for pricing strategies, revenue forecasting, and customer segmentation. Statistical analyses involving order values should use non-parametric methods or apply appropriate transformations.
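As an illustration of the non-parametric route recommended above, the Mann-Whitney U test compares two groups of order values without assuming normality. The sketch below assumes SciPy. The two customer segments and their lognormal parameters are hypothetical, chosen only to produce right-skewed synthetic data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic, right-skewed order values (BRL) for two hypothetical segments.
segment_a = rng.lognormal(mean=4.5, sigma=0.9, size=400)
segment_b = rng.lognormal(mean=5.0, sigma=0.9, size=400)

# Rank-based test: does one segment tend to have larger order values?
u_stat, p_value = stats.mannwhitneyu(
    segment_a, segment_b, alternative="two-sided"
)

# Medians are a more robust summary than means for skewed data.
median_a = np.median(segment_a)
median_b = np.median(segment_b)

print(f"U = {u_stat:.0f}, p = {p_value:.3g}")
print(f"median A = {median_a:.2f} BRL, median B = {median_b:.2f} BRL")
```

Reporting medians next to the test result keeps the comparison readable when a handful of very large orders would distort the means.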
We tested the normality of delivery times to understand their distribution patterns.
Figure 4: Histogram and Q-Q plot of delivery times
| Variable | Shapiro-Wilk Test p-value | D'Agostino's K² Test p-value | Skewness | Kurtosis | Normally Distributed? |
|---|---|---|---|---|---|
| Delivery Time (days) | <0.0001 | <0.0001 | 1.87 | 5.43 | No |
Business Insight: Delivery times deviate significantly from normality, with a right-skewed distribution (skewness = 1.87) and heavy tails (kurtosis = 5.43). While most deliveries occur within a standard timeframe, a notable number of delayed deliveries extend the right tail of the distribution. This pattern has implications for logistics planning, customer satisfaction management, and delivery time promises. Identifying and addressing the factors behind these delays could significantly improve overall service quality.
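The Q-Q comparison behind a plot like Figure 4 can also be computed numerically. The sketch below assumes SciPy and uses `scipy.stats.probplot`, which returns the theoretical and ordered sample quantiles together with a least-squares fit. The gamma-distributed delivery times are synthetic stand-ins, not the report's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic right-skewed delivery times in days (assumption).
delivery_days = rng.gamma(shape=2.0, scale=6.0, size=1000)
# Normal reference sample with matching mean and spread, for comparison.
reference = rng.normal(
    loc=delivery_days.mean(), scale=delivery_days.std(), size=1000
)

# probplot returns ((theoretical quantiles, ordered values), (slope, intercept, r)).
(theoretical_q, ordered_days), (slope, intercept, r_delivery) = stats.probplot(
    delivery_days, dist="norm"
)
_, (_, _, r_reference) = stats.probplot(reference, dist="norm")

# r close to 1 means the points lie near a straight line (consistent with
# normality); a right tail that bends away from the line lowers r.
print(f"Q-Q correlation, delivery times: {r_delivery:.4f}")
print(f"Q-Q correlation, normal sample:  {r_reference:.4f}")
```

Passing `plot=matplotlib.pyplot` (or a Matplotlib Axes object) to `probplot` draws the Q-Q plot itself.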
We tested the normality of customer review scores to understand their distribution patterns.
Figure 5: Histogram and Q-Q plot of review scores
| Variable | Shapiro-Wilk Test p-value | D'Agostino's K² Test p-value | Skewness | Kurtosis | Normally Distributed? |
|---|---|---|---|---|---|
| Review Score (1-5) | <0.0001 | <0.0001 | -1.43 | 0.87 | No |
Business Insight: Review scores deviate significantly from normality, with a left-skewed distribution (skewness = -1.43). Customers give more high scores than low scores, with a concentration at the 5-star rating. This pattern is common in e-commerce reviews and suggests that customers are generally satisfied with their purchases. Because the scores are discrete values on a 1-5 scale and heavily skewed, average review scores should be interpreted with caution, and percentile-based metrics might be more informative for tracking customer satisfaction.
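To show why the average can mislead here, the sketch below compares the mean with the median, lower percentiles, and rating shares. It assumes NumPy only. The rating probabilities are hypothetical values chosen to mimic a 5-star-heavy distribution, and the names `top_share` and `low_share` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic review scores; the probabilities are assumptions for illustration.
scores = rng.choice(
    [1, 2, 3, 4, 5], size=5000, p=[0.10, 0.03, 0.08, 0.19, 0.60]
)

mean_score = scores.mean()
median_score = np.median(scores)
p10, p25 = np.percentile(scores, [10, 25])

# Share of each rating, plus the share of 5-star and of 1-2 star reviews.
values, counts = np.unique(scores, return_counts=True)
shares = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))
top_share = np.mean(scores == 5)
low_share = np.mean(scores <= 2)

print(f"mean = {mean_score:.2f}, median = {median_score:.0f}")
print(f"10th percentile = {p10:.0f}, 25th percentile = {p25:.0f}")
print(f"5-star share = {top_share:.1%}, 1-2 star share = {low_share:.1%}")
```

On data shaped like this, the mean sits below the median because the minority of low scores pulls it down, which is the reason to track the lower percentiles or the low-score share alongside it.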
We applied log, square-root, and Box-Cox transformations to see whether they could bring our non-normal variables closer to a normal distribution.
Figure 6: Log transformation of order values
| Variable | Transformation | Shapiro-Wilk Test p-value (after transformation) | Skewness (after transformation) | Improved? |
|---|---|---|---|---|
| Order Value | Log | 0.0023 | 0.87 | Yes, but still not normal |
| Weight | Log | 0.0018 | 0.76 | Yes, but still not normal |
| Delivery Time | Square Root | 0.0045 | 0.92 | Yes, but still not normal |
| Review Score | Box-Cox | 0.0001 | -0.98 | Slight improvement, still not normal |
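
A before-and-after comparison like the one in the table above can be sketched as follows, assuming SciPy. The order values are synthetic lognormal data, so the results will not match the table. The Box-Cox lambda is estimated by `scipy.stats.boxcox`, which requires strictly positive input.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic, strictly positive, right-skewed order values (assumption).
order_values = rng.lognormal(mean=4.5, sigma=1.0, size=2000)

# Candidate transformations; np.log1p would be needed if zeros were present.
candidates = {
    "raw": order_values,
    "log": np.log(order_values),
    "sqrt": np.sqrt(order_values),
}

# With no lambda supplied, boxcox returns the transformed data and the
# maximum-likelihood estimate of lambda.
boxcox_values, boxcox_lambda = stats.boxcox(order_values)
candidates["box-cox"] = boxcox_values

results = {}
for name, transformed in candidates.items():
    _, shapiro_p = stats.shapiro(transformed)
    results[name] = {
        "shapiro_p": shapiro_p,
        "skewness": stats.skew(transformed),
    }

for name, row in results.items():
    print(f"{name:8s} p = {row['shapiro_p']:.4g}  skew = {row['skewness']:.2f}")
print(f"estimated Box-Cox lambda = {boxcox_lambda:.3f}")
```

A Box-Cox lambda near 0 indicates that the log transformation is already close to the best power transformation for that variable.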
Business Insight: Transformations (particularly log transformations for monetary and physical measurements) reduce skewness and bring our variables closer to normality, but they do not fully normalize the data: every Shapiro-Wilk p-value after transformation remains below 0.01. This suggests that the underlying distributions have inherent characteristics that resist normalization. For statistical analyses, we should either:
The non-normal distributions observed in our data have several implications for statistical analysis:
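Our normality tests have revealed that none of the key variables in our e-commerce data follows a normal distribution: product dimensions, order values, and delivery times are right-skewed with heavy tails, and review scores are left-skewed with a concentration at the 5-star rating.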
These findings guide our choice of statistical methods and interpretation of results, ensuring that our analyses are robust and our business insights are valid despite the non-normal nature of our data.