Log-Log Regression

April 16, 2024

Overview

Log-log regression is a statistical technique used to model the relationship between a dependent variable and one or more independent variables, where both the dependent and the independent variables undergo logarithmic transformations. This method is particularly effective for examining multiplicative relationships and elasticities between variables that exhibit exponential growth or decline.

Purpose

  • Elasticity Analysis: Log-log regression is commonly employed to analyze the elasticity of a variable, which is useful for understanding how percentage changes in one variable are associated with percentage changes in another.
  • Scale Invariance: The logarithmic transformation of both dependent and independent variables helps in handling data across different scales and makes the model scale-invariant.

Business Applications of Log-Log Regression

Marketing

The following are some marketing applications of log-log regression:

Price Elasticity: Log-log regression can be used to estimate the price elasticity of a product’s demand. A linear relationship can be depicted between the log of the quantity demanded and the log of the price by taking the natural logarithm of both the dependent and independent variables. The estimated price elasticity of demand can be used to inform pricing strategies and maximize revenue for businesses.

Advertising Effectiveness: The effect of advertising on sales can be estimated using log-log regression. A linear relationship can be modeled between the log of sales and the log of advertising expenditure by taking the natural logarithm of both the dependent and independent variables. The estimated coefficient can be used to guide advertising expenditure decisions and assist businesses in optimizing their marketing campaigns.

Brand Loyalty: Log-log regression can be utilized to estimate the impact of brand loyalty on sales. A linear relationship can be modeled between the log of sales and the log of brand loyalty by taking the natural logarithm of both the dependent and independent variables. The estimated coefficient can be used to inform brand strategy decisions and assist businesses in identifying market share expansion opportunities.

Market Segmentation Analysis: Log-log regression can be used to identify and analyze market segments based on product attributes. A linear relationship can be modeled between the log of the product attribute and the log of market share by taking the natural logarithm of both the dependent and independent variables. Estimates of the resulting coefficients can be used to determine which product attributes are most essential to each market segment and to inform decisions regarding product development. [2]

Finance

The following are some finance applications of log-log regression:

Asset Pricing Models: Log-log regression can be used to estimate asset pricing models, such as the capital asset pricing model (CAPM). By taking the natural logarithm of both the dependent and independent variables, a linear relationship can be modeled between the log of the expected return and the log of the risk premium. The resulting coefficient estimate can be used to inform investment decisions and help investors evaluate the risk and return of a portfolio.

Risk Management: Log-log regression can be used to estimate risk models, such as value at risk (VaR). By taking the natural logarithm of both the dependent and independent variables, a linear relationship can be modeled between the log of the portfolio value and the log of the portfolio risk. The resulting coefficient estimate can be used to estimate the level of risk that the portfolio is exposed to and inform risk management decisions.

Option Pricing: Log-log regression can be used to estimate option pricing models, such as the Black-Scholes model. By taking the natural logarithm of both the dependent and independent variables, a linear relationship can be modeled between the log of the stock price and the log of the option price. The resulting coefficient estimate can be used to inform option pricing decisions and help investors evaluate the fair value of an option.

Credit Risk Analysis: Log-log regression can be used to estimate credit risk models, such as the credit default swap (CDS) pricing model. By taking the natural logarithm of both the dependent and independent variables, a linear relationship can be modeled between the log of the CDS spread and the log of the credit risk. The resulting coefficient estimate can be used to inform credit risk analysis and help investors evaluate the creditworthiness of a company. [3]

Organizational Behavior

The following are some applications of log-log regression in Organizational Behavior:

Analysis of Employee attrition: Log-log regression can be used to predict employee attrition rates based on a variety of variables, including compensation, job satisfaction, and work environment. A linear relationship can be modeled between the log of the employee turnover rate and the log of the various factors by taking the natural logarithm of both the dependent and independent variables. Estimates of the resulting coefficients can be used to identify the most influential factors influencing employee attrition and inform strategies for employee retention.

Analysis of Employee Performance: Log-log regression can be used to predict employee performance based on a number of variables, including job training, work experience, and job satisfaction. A linear relationship can be modeled between the log of employee performance and the log of the various factors by taking the natural logarithm of both the dependent and independent variables. The estimated coefficients can be used to determine the most influential employee performance factors and to inform training and development strategies.

Organizational Culture Analysis: Analyzing the influence of organizational culture on employee behavior and attitudes can be accomplished through the use of log-log regression. By taking the natural logarithm of both the dependent and independent variables, it is possible to construct a linear relationship between the log of employee behavior and attitudes and the log of the different aspects of organizational culture. The estimated coefficients can be used to determine the most influential aspects of organizational culture on employee conduct and attitudes.

Leadership Effectiveness Analysis: Analyzing the influence of leadership on employee behavior and performance can be accomplished using log-log regression. A linear relationship can be modeled between the log of employee behavior and performance and the log of the various leadership factors by taking the natural logarithm of both the dependent and independent variables. The estimated coefficients can be used to identify the most influential leadership factors on employee behavior and performance, as well as to inform leadership development strategies. [4]

Model

Model Form

The general form of a log-log regression model is represented by the following equation:

\[\begin{equation} \log(Y) = \beta_0 + \beta_1\log(X_1) + \beta_2\log(X_2) + ... + \beta_n\log(X_n) + \epsilon \end{equation}\]

where:

  • \(Y\) is the dependent variable (after applying the logarithm).

  • \(X_1, X_2, ..., X_n\) are the independent variables (after applying logarithms to each).

  • \(β_0, β_1, ..., β_n\) are the coefficients that the regression model aims to estimate.

  • \(ε\) is the error term.

Assumptions

Similar to other regression analyses, log-log regression relies on several assumptions:

  • Linearity: The relationship between the log-transformed dependent and independent variables is linear.
  • Independence: Observations must be independent of each other.
  • Homoscedasticity: The variance of error terms should be consistent across different values of independent variables.
  • Normal Distribution of Errors: Errors should follow a normal distribution for valid hypothesis testing.

Advantages

  • Interpretability of Elasticities: The coefficients in a log-log model represent the elasticity between variables, providing insights into how a percentage change in one variable affects another.
  • Handling Non-linear Patterns: By transforming variables logarithmically, non-linear relationships can be modeled linearly, enhancing the analysis of complex patterns.

Disadvantages

  • Zero Values Problem: Logarithmic transformation cannot be applied directly to zero or negative values, which might require additional data adjustments or transformations.
  • Interpretation Complexity: Understanding and interpreting elasticities might be less straightforward compared to additive models, requiring a good grasp of logarithmic relationships. [1]

Estimation

  1. The natural logarithm transformation of both variables can help to capture non-linear relationships, stabilize the data variance, and clarify the relationship between the variables. The coefficients \(β_0\) and \(β_1\) are elasticities that quantify the percentage change in the dependent variable for a one percent change in the independent variable.

  2. The method of least squares is used to minimize the sum of squared residuals when estimating coefficients. This involves determining the values of \(β_0\) and \(β_1\) that minimize the difference between the observed and predicted values of the dependent variable. [5]

MODEL 1: Base Model

For reference and a point of comparison, we setup the same base model as the one used in the previous chapter on Log-Linear Regression.

data(mtcars)
attach(mtcars)

Linear Regression

fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

MODEL 2: Log-Log Model

Model 2 adopts a log-log regression approach to explore the relationship between the weight (wt) of cars and their fuel efficiency (mpg). In contrast to a simple linear regression model that predicts mpg directly from wt, this model predicts the logarithm of mpg from the logarithm of wt. This double logarithmic transformation is particularly useful for examining how percentage changes in vehicle weight influence percentage changes in fuel efficiency, reflecting an elasticity-based relationship between these variables.

fit2 <- lm(log(mpg) ~ log(wt), data = mtcars)
summary(fit2)

Call:
lm(formula = log(mpg) ~ log(wt), data = mtcars)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.18141 -0.10681 -0.02125  0.08109  0.26930 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.90181    0.08790   44.39  < 2e-16 ***
log(wt)     -0.84182    0.07549  -11.15 3.41e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1334 on 30 degrees of freedom
Multiple R-squared:  0.8056,    Adjusted R-squared:  0.7992 
F-statistic: 124.4 on 1 and 30 DF,  p-value: 3.406e-12

Model Output and Interpretation

The results from our log-log regression model are explained below:

Residuals

The residuals, or the differences between observed and predicted values of log(mpg), show a median close to zero (-0.02125), indicating that, on average, the model’s predictions closely match the actual log values. The spread of residuals from the minimum (-0.18141) to the maximum (0.26930) shows that most predictions fall within this range, suggesting a reasonable model fit.

Model Fit

  • Residual Standard Error (RSE): 0.1334 on 30 degrees of freedom indicates the typical deviation of the predicted values from the actual values.

  • Multiple R-squared (0.8056): About 80.56% of the variability in log(mpg) is explained by the model, indicating a strong fit.

  • Adjusted R-squared (0.7992): This adjusted statistic accounts for the number of predictors and still explains about 79.92% of the variability in log(mpg), supporting a robust model.

  • F-statistic (124.4): This high value and the corresponding very low p-value (3.406e-12) affirm the overall significance of the model, demonstrating that the model fits the data well and the variables are appropriate.

Coefficients

  • Intercept (3.90181): This value represents the expected value of log(mpg) when log(wt) equals zero. Since log(wt) equals zero when wt is 1 (not zero due to the nature of logarithmic transformation), this intercept can be interpreted as the expected log of mpg for a car with a weight of 1 unit.

To find the fuel efficiency (mpg) of a car with a weight of 1 unit, we can use the intercept value from the log-log regression model. The calculation involves converting log(mpg) back to mpg using the exponential function:

\[ \text{mpg} = e^{\text{intercept}} = e^{3.90181} \]

Evaluating this expression yields:

\[ \text{mpg} = 49.49 \]

This result indicates that the mpg for a car weighing 1 unit is approximately 49.49 miles per gallon.

  • log(wt) Coefficient (-0.84182): This coefficient indicates the expected change in log(mpg) for each one-unit increase in log(wt). Specifically, a one-unit increase in log(wt) is associated with a decrease of 0.84182 in log(mpg). This negative coefficient reflects a strong inverse relationship, suggesting that as car weight increases on a logarithmic scale, fuel efficiency decreases.

In practical terms, a 1% increase in a car’s weight leads to an approximate 0.84182% decrease in its fuel efficiency.

This is interpreted as the elasticity of mpg with respect to weight, indicating a highly elastic relationship where small percentage changes in weight lead to substantial percentage changes in mpg.

Therefore, the beta coefficient of -0.84182 indicates a strong negative elasticity, signifying that increases in car weight have a substantial negative impact on fuel efficiency, all else being constant.

In contrast, in the linear-linear model, the relationship was additive rather than multiplicative, with each unit increase in weight reducing the mpg linearly by 0.27178 units.

Statistical Significance

In the log-log regression model, we conduct hypothesis testing for each coefficient separately to assess their statistical significance in predicting the dependent variable, log(mpg). Here’s how we define and interpret the tests for the intercept and log(wt):

  • For the Intercept:
    • Null Hypothesis (H0): The intercept is zero, suggesting it has no effect on log(mpg) when all independent variables are zero (logarithmically speaking).
    • Alternative Hypothesis (H1): The intercept is not zero, indicating it does affect log(mpg) and provides a baseline level of mpg when wt equals 1 unit (since log(1) = 0).
    • Result: Given that the p-value for the intercept is significantly below 0.05, we reject the null hypothesis. This means there is strong statistical evidence that the intercept is a significant contributor to the model, affecting the baseline mpg calculation.
  • For log(wt):
    • Null Hypothesis (H0): The coefficient for log(wt) is zero, implying no relationship between weight and mpg on a logarithmic scale.
    • Alternative Hypothesis (H1): The coefficient for log(wt) is not zero, suggesting that changes in weight have a significant effect on mpg.
    • Result: The very low p-value associated with the log(wt) coefficient leads us to reject the null hypothesis. This rejection provides substantial evidence that an increase in weight significantly decreases mpg, affirming the predictive power of log(wt) in the model.

The statistical tests confirm that both the baseline level of mpg and the effect of weight changes are significant factors in the model.

Another Log-Log Model

We present two regression models, Model 3a (Linear-Linear Model) and Model 3b (Log-Log Model), using the mtcars dataset. Both models aim to predict the miles per gallon (mpg) of cars based on their weight (wt) and transmission type (am, with levels “Automatic” and “Manual”).

MODEL 3a - Linear-Linear Model

# Convert am to a factor variable
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))

fit3a <- lm((mpg) ~ wt + am, data = mtcars)
summary(fit3a)

Call:
lm(formula = (mpg) ~ wt + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5295 -2.3619 -0.1317  1.4025  6.8782 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.32155    3.05464  12.218 5.84e-13 ***
wt          -5.35281    0.78824  -6.791 1.87e-07 ***
amManual    -0.02362    1.54565  -0.015    0.988    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.098 on 29 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7358 
F-statistic: 44.17 on 2 and 29 DF,  p-value: 1.579e-09

MODEL 3b: Another Log-Log Model

Suppose we want to understand the relationship between the logarithm of miles per gallon (mpg) and two predictors: the logarithm of car weight (wt) and the transmission type (am). The predictor log(wt) suggests an exploration of how changes in vehicle weight impact fuel efficiency on a multiplicative scale, while am (transmission type: automatic or manual) serves as a categorical variable to compare the fuel efficiency differences between transmission types. This model helps in quantifying how much vehicle weight and transmission type influence the fuel efficiency of cars, assuming multiplicative effects rather than additive ones.

fit3b <- lm(log(mpg) ~ log(wt) + am, data = mtcars)
summary(fit3b)

Call:
lm(formula = log(mpg) ~ log(wt) + am, data = mtcars)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.194110 -0.117056 -0.008833  0.071274  0.258052 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.04162    0.14480  27.911  < 2e-16 ***
log(wt)     -0.93629    0.10822  -8.652 1.58e-09 ***
amManual    -0.08329    0.06886  -1.210    0.236    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1324 on 29 degrees of freedom
Multiple R-squared:  0.815, Adjusted R-squared:  0.8022 
F-statistic: 63.87 on 2 and 29 DF,  p-value: 2.369e-11

Model Output and Interpretation

The output from Model 3b highlights the relationship between the logarithm of miles per gallon (mpg) and factors including the logarithm of car weight (wt) and transmission type (am). Below, we detail the interpretation of the model’s results:

Residuals

  • The residuals of the model show a median very close to zero (-0.008833) and are relatively tightly distributed, ranging from -0.194110 to 0.258052. This tight clustering indicates that the model predictions are generally close to the observed data, demonstrating a good fit.

Coefficients

  • Intercept (4.04162): The intercept, statistically highly significant with a p-value much less than 0.05, represents the expected value of log(mpg) when log(wt) equals zero and when the car has an automatic transmission (am=0). For practical purposes, this value can be interpreted as the expected log of mpg for a hypothetically light car with a weight of 1 unit (since log(1) = 0) and an automatic transmission.

  • log(wt) Coefficient (-0.93629): The coefficient for log(wt) is negative, indicating that as the weight of a car increases, its fuel efficiency decreases.

    • The elasticity of mpg with respect to weight is about -0.936, meaning that a 1% increase in weight leads to roughly a 0.936% decrease in mpg.

    • This relationship is statistically significant, with a very low p-value (1.58e-09), suggesting a strong and reliable negative impact of weight on fuel efficiency.

  • am (-0.08329): This coefficient is associated with cars having manual transmission (am=1) compared to the baseline of automatic transmission (am=0).

    • The negative sign suggests a decrease in fuel efficiency for manual compared to automatic, but this result is not statistically significant (p-value = 0.236).

    • It implies that, after controlling for weight, the type of transmission (manual vs. automatic) does not have a significant effect on the mpg.

Model Fit and Statistics

  • Residual Standard Error (0.1324): The RSE is quite low, which reflects the small average distance of the data points from the fitted line, indicating a good fit.
  • Adjusted R-squared (0.8022): Adjusted for the number of predictors, it confirms that the model explains a significant amount of the variability in log(mpg).
  • F-statistic (63.87): This value tests the overall significance of the regression model, and the associated p-value (2.369e-11) indicates the model is statistically significant.

This analysis reveals that weight is a critical factor in predicting the fuel efficiency of cars, with significant impacts demonstrated through the log-log regression model. However, the transmission type does not show a statistically significant influence on mpg when considering this dataset.

Model Prediction

To illustrate the prediction of fuel efficiency using Model 3b for a car with a weight of 2 units, we’ll calculate the predicted mpg for both automatic and manual transmission types using the regression coefficients obtained from the log-log model. Below are the steps involved:

Model Equations and Coefficients

The model formula based on the regression output is expressed as:

\[ \log(\text{mpg}) = \beta_0 + \beta_1\log(\text{wt}) + \beta_2\text{am} \]

where: \[ \beta_0 \text{ (Intercept)} = 4.04162, \] \[ \beta_1 \text{ (Coefficient for } \log(\text{wt})) = -0.93629, \] \[ \beta_2 \text{ (Coefficient for } \text{am}) = -0.08329 \]

Calculation for an Automatic Transmission Car (wt = 2 units, am = 0)

  1. Log of Weight: Since ( = 2 ), we calculate ( (2) ).
  2. Substitute into Model: \[ \log(\text{mpg}) = 4.04162 + (-0.93629 \times 0.693) + (-0.08329 \times 0) \] \[ \log(\text{mpg}) = 4.04162 - 0.64874 \] \[ \log(\text{mpg}) = 3.39288 \]
  3. Exponentiate to find MPG: \[ \text{mpg} = e^{3.39288} \approx 29.76 \]

Calculation for a Manual Transmission Car (wt = 2 units, am = 1)

  1. Log of Weight: ( (2) ) (same as before).
  2. Substitute into Model: \[ \log(\text{mpg}) = 4.04162 + (-0.93629 \times 0.693) + (-0.08329 \times 1) \] \[ \log(\text{mpg}) = 4.04162 - 0.64874 - 0.08329 \] \[ \log(\text{mpg}) = 3.30959 \]
  3. Exponentiate to find MPG: \[ \text{mpg} = e^{3.30959} \approx 27.36 \]

Interpretation

  • Automatic Transmission Car: A car with an automatic transmission and a weight of 2 units is predicted to have an mpg of approximately 29.76. This demonstrates that, even with an increase in weight, the car maintains relatively high fuel efficiency.
  • Manual Transmission Car: A car with a manual transmission and the same weight has a slightly lower predicted mpg of about 27.36. The negative coefficient for manual transmission, though not statistically significant, suggests a minor reduction in fuel efficiency compared to an automatic transmission model.

These calculations clearly show how the log-log model can be applied to predict the impact of weight and transmission type on the fuel efficiency of cars. The significant negative impact of weight, as shown by the elasticity coefficient, highlights the strong influence of weight increase on reducing fuel efficiency, which is consistent with general automotive principles.

References

[1] Stock, J. H., & Watson, M. W. (2002). Introduction to econometrics. New York: Addison Wesley.

Jaccard, J., & Turrisi, R. (2003). Interaction effects in multiple regression. Sage Publications.

Long, J. S. (1997). Regression models for categorical and limited dependent variables. Sage Publications.

Greene, W. H. (2012). Econometric analysis. Pearson Education.

Hill, R. C., Griffiths, W. E., & Lim, G. C. (2018). Principles of econometrics. John Wiley & Sons.

[2] Anderson, E. T., & Simester, D. I. (2011). Price elasticity of demand in online retail markets. Journal of Marketing Research, 48(2), 316-327.

Tellis, G. J. (2004). Effective advertising: Understanding when, how, and why advertising works. Sage Publications.

Rust, R. T., Zeithaml, V. A., & Lemon, K. N. (2004). Customer-centered brand management. Harvard Business Review, 82(9), 110-118.

Wedel, M., & Kamakura, W. A. (2001). Market segmentation: Conceptual and methodological foundations. Kluwer Academic Publishers.

Kim, K. H., & Kumar, V. (2014). An empirical examination of the determinants of retail prices and promotional markdowns using store-level data. Journal of Retailing, 90(1), 43-55.

[3] Fama, E. F., & French, K. R. (1992). The cross-section of expected stock returns. The Journal of Finance, 47(2), 427-465.

Alexander, C. (2008). Market risk analysis, value at risk models. Wiley Finance.

Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637-654.

Duffie, D., & Singleton, K. (1999). Modeling term structures of defaultable bonds. Review of Financial Studies, 12(4), 687-720.

Hull, J. (2018). Options, futures, and other derivatives. Pearson Education.

[4] Hom, P. W., Caranikas-Walker, F., Prussia, G. E., & Griffeth, R. W. (1992). A meta-analytical structural equations analysis of a model of employee turnover. Journal of Applied Psychology, 77(6), 890-909.

Griffin, M. A., & Neal, A. (2000). Perceptions of safety at work: A framework for linking safety climate to safety performance, knowledge, and motivation. Journal of Occupational Health Psychology, 5(3), 347-358.

Schein, E. H. (2010). Organizational culture and leadership (4th ed.). John Wiley & Sons.

Avolio, B. J., & Bass, B. M. (1991). The full range leadership development program: Manual for the multifactor leadership questionnaire. Mind Garden.

Cascio, W. F. (1991). Costing human resources: The financial impact of behavior in organizations (3rd ed.). PWS-Kent Publishing Company.

[5] Wooldridge, J. M. (2015). Introductory econometrics: A modern approach. Nelson Education.

Gujarati, D. N. (2003). Basic econometrics (4th ed.). McGraw-Hill.

Stock, J. H., & Watson, M. W. (2002). Introduction to econometrics. Addison Wesley.

Hill, R. C., Griffiths, W. E., & Lim, G. C. (2018). Principles of econometrics (5th ed.). John Wiley & Sons.

Long, J. S. (1997). Regression models for categorical and limited dependent variables. Sage Publications.

[6]