
Log Link vs Log Transformation in R – The Difference That Misleads Your Data Analysis

The normal distribution is widely assumed, yet much real-world data is anything but normal. When faced with heavily skewed data, a common reflex is to apply a log transformation to normalize the distribution and stabilize the variance. I recently worked on a project analyzing the energy consumption of training AI models, using data from Epoch AI [1]. There was no official data on each model's energy use, so I estimated it by multiplying each model's power draw by its training time. The new variable, energy (in kWh), was heavily right-skewed, with some extreme outliers (Figure 1).

Figure 1. Histogram of energy use (kWh)

To deal with this skewness and heteroskedasticity, my first instinct was to apply a log transformation to energy. The distribution of log(energy) looks approximately normal (Figure 2), and a Shapiro-Wilk test supported borderline normality (p ≈ 0.5).

Figure 2. Histogram of the log of energy use (kWh)
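The check above can be sketched on simulated stand-in data (the real Energy_kWh values are not reproduced here); a log-normal sample is right-skewed by construction and becomes normal after a log transformation:

```r
# Simulated stand-in for the skewed energy variable (not the Epoch AI data):
# a log-normal sample is right-skewed, and its log is normal by construction.
set.seed(1)
energy <- rlnorm(200, meanlog = 8, sdlog = 1.5)  # right-skewed, like kWh values

p_raw <- shapiro.test(energy)$p.value       # very small: normality rejected
p_log <- shapiro.test(log(energy))$p.value  # much larger: consistent with normality
```

On the real data, the same calls on Energy_kWh and log(Energy_kWh), plus hist(), produce the pictures in Figures 1 and 2 and the borderline p-value reported above.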

Modeling Dilemma: Log Transformation vs Log Link

Everything looked good so far, but when I moved on to modeling, I faced a question: should I model the log-transformed response (log(Y) ~ X), or should I model the original response through a log link function (Y ~ X, link = "log")? I also considered two distributions, the Gaussian (normal) distribution and the Gamma distribution, and combined each of them with both approaches. This gave me four different models, shown below, all fitted using generalized linear models (GLMs):

all_gaussian_log_link <- glm(Energy_kWh ~ Parameters +
      Training_compute_FLOP +
      Training_dataset_size +
      Training_time_hour +
      Hardware_quantity +
      Training_hardware, 
    family = gaussian(link = "log"), data = df)
all_gaussian_log_transform <- glm(log(Energy_kWh) ~ Parameters +
                          Training_compute_FLOP +
                          Training_dataset_size +
                          Training_time_hour +
                          Hardware_quantity +
                          Training_hardware, 
                         data = df)
all_gamma_log_link  <- glm(Energy_kWh ~ Parameters +
                    Training_compute_FLOP +
                    Training_dataset_size +
                    Training_time_hour +
                    Hardware_quantity +
                    Training_hardware + 0, 
                  family = Gamma(link = "log"), data = df)
all_gamma_log_transform  <- glm(log(Energy_kWh) ~ Parameters +
                    Training_compute_FLOP +
                    Training_dataset_size +
                    Training_time_hour +
                    Hardware_quantity +
                    Training_hardware + 0, 
                  family = Gamma(), data = df)

Comparing Models: AIC Scores and Diagnostics

I compared the four models using the Akaike Information Criterion (AIC), an estimator of prediction error. In general, the lower the AIC, the better-fitting the model.

AIC(all_gaussian_log_link, all_gaussian_log_transform, all_gamma_log_link, all_gamma_log_transform)

                           df       AIC
all_gaussian_log_link      25 2005.8263
all_gaussian_log_transform 25  311.5963
all_gamma_log_link         25 1780.8524
all_gamma_log_transform    25  352.5450

Among the four models, the ones using a log-transformed response produced far lower AIC values than those using a log link. Since the gap between the log-transform and log-link AICs was large (311 and 352 vs 1780 and 2005), I also checked the model diagnostics to make sure the log-transformed models really were more appropriate:
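The diagnostics in Figures 4-7 come from base R's plot() method for fitted glm objects, which draws the residuals-vs-fitted and QQ panels discussed below. A minimal sketch on a toy model (substitute the four models fitted above for the real plots):

```r
# Toy Gamma GLM, standing in for the four fitted models above.
set.seed(2)
toy <- data.frame(x = runif(50, 1, 10))
toy$y <- exp(0.5 + 0.2 * toy$x) * rlnorm(50, meanlog = 0, sdlog = 0.3)
fit <- glm(y ~ x, family = Gamma(link = "log"), data = toy)

# 2x2 grid: residuals vs fitted, QQ plot, scale-location, residuals vs leverage.
par(mfrow = c(2, 2))
plot(fit)
```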

Figure 4. Diagnostic plots for the Gaussian model with log link. The residuals vs fitted plot looks reasonable despite a few outliers. However, the QQ plot shows clear deviation from the straight line, suggesting non-normality.
Figure 5. Diagnostic plots for the log-transformed Gaussian model. The QQ plot shows a much better fit, supporting normality. However, the residuals vs fitted plot dips to -2, which may indicate misspecification.
Figure 6. Diagnostic plots for the Gamma model with log link. The QQ plot looks acceptable, but the residuals vs fitted plot shows clear signs of non-linearity.
Figure 7. Diagnostic plots for the log-transformed Gamma model. The residuals vs fitted plot looks good, with only a small dip of 0.25 at the beginning. However, the QQ plot shows some deviation at both tails.

Based on the AIC values and the diagnostics, I decided to move forward with the log-transformed Gamma model, since it had one of the lowest AIC values and its residuals vs fitted plot looked better than that of the log-transformed Gaussian model.
I then explored which variables were meaningful and which interactions might matter. The final model I chose was:

glm(formula = log(Energy_kWh) ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(), data = df)

Interpreting Coefficients

However, when I first interpreted the model coefficients, something felt off. Since only the response is log-transformed, the predicted values are on the log scale, and we need to exponentiate the coefficients to bring them back to the original scale. A one-unit increase in 𝓍 multiplies 𝓎 by exp(β); equivalently, each additional unit of 𝓍 leads to a (exp(β) - 1) × 100% change in 𝓎 [2].
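As a quick numeric sketch of that rule (beta here is a made-up value for illustration, not one of the fitted coefficients):

```r
# For a log-scale coefficient beta, a one-unit increase in x multiplies y
# by exp(beta), i.e. a (exp(beta) - 1) * 100% change in y.
beta <- 0.05                             # hypothetical coefficient
multiplier <- exp(beta)                  # y is multiplied by ~1.051 per unit of x
percent_change <- (exp(beta) - 1) * 100  # ~5.13% increase per unit of x
```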

Looking at the model output below, we have Training_time_hour, Hardware_quantity, and their interaction term Training_time_hour:Hardware_quantity. These are continuous variables, so their coefficients represent slopes. Meanwhile, since I specified + 0 in the model formula, all levels of Training_hardware act as intercepts, meaning each hardware type serves as the intercept β₀ when its corresponding dummy variable is active.

> glm(formula = log(Energy_kWh) ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(), data = df)

Coefficients:
                                                 Estimate Std. Error t value Pr(>|t|)    
Training_time_hour                             -1.587e-05  3.112e-06  -5.098 5.76e-06 ***
Hardware_quantity                              -5.121e-06  1.564e-06  -3.275  0.00196 ** 
Training_hardwareGoogle TPU v2                  1.396e-01  2.297e-02   6.079 1.90e-07 ***
Training_hardwareGoogle TPU v3                  1.106e-01  7.048e-03  15.696  < 2e-16 ***
Training_hardwareGoogle TPU v4                  9.957e-02  7.939e-03  12.542  < 2e-16 ***
Training_hardwareHuawei Ascend 910              1.112e-01  1.862e-02   5.969 2.79e-07 ***
Training_hardwareNVIDIA A100                    1.077e-01  6.993e-03  15.409  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 40 GB         1.020e-01  1.072e-02   9.515 1.26e-12 ***
Training_hardwareNVIDIA A100 SXM4 80 GB         1.014e-01  1.018e-02   9.958 2.90e-13 ***
Training_hardwareNVIDIA GeForce GTX 285         3.202e-01  7.491e-02   4.275 9.03e-05 ***
Training_hardwareNVIDIA GeForce GTX TITAN X     1.601e-01  2.630e-02   6.088 1.84e-07 ***
Training_hardwareNVIDIA GTX Titan Black         1.498e-01  3.328e-02   4.501 4.31e-05 ***
Training_hardwareNVIDIA H100 SXM5 80GB          9.736e-02  9.840e-03   9.894 3.59e-13 ***
Training_hardwareNVIDIA P100                    1.604e-01  1.922e-02   8.342 6.73e-11 ***
Training_hardwareNVIDIA Quadro P600             1.714e-01  3.756e-02   4.562 3.52e-05 ***
Training_hardwareNVIDIA Quadro RTX 4000         1.538e-01  3.263e-02   4.714 2.12e-05 ***
Training_hardwareNVIDIA Quadro RTX 5000         1.819e-01  4.021e-02   4.524 3.99e-05 ***
Training_hardwareNVIDIA Tesla K80               1.125e-01  1.608e-02   6.993 7.54e-09 ***
Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   1.072e-01  1.353e-02   7.922 2.89e-10 ***
Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  9.444e-02  2.030e-02   4.653 2.60e-05 ***
Training_hardwareNVIDIA V100                    1.420e-01  1.201e-02  11.822 8.01e-16 ***
Training_time_hour:Hardware_quantity            2.296e-09  9.372e-10   2.450  0.01799 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.05497984)

    Null deviance:    NaN  on 70  degrees of freedom
Residual deviance: 3.0043  on 48  degrees of freedom
AIC: 345.39

When I converted the slopes into percentage changes in the response, each continuous variable's effect came out as almost exactly zero, and even slightly negative.

Back on the original scale, every effect translated to essentially no change in energy use. That made no sense: at the very least, I expected the slopes for training time and hardware quantity to reflect growing energy use. I wondered whether a log-link model with the same predictors would behave differently, so I refitted the model and got:

glm(formula = Energy_kWh ~ Training_time_hour * Hardware_quantity + 
    Training_hardware + 0, family = Gamma(link = "log"), data = df)

Coefficients:
                                                 Estimate Std. Error t value Pr(>|t|)    
Training_time_hour                              1.818e-03  1.640e-04  11.088 7.74e-15 ***
Hardware_quantity                               7.373e-04  1.008e-04   7.315 2.42e-09 ***
Training_hardwareGoogle TPU v2                  7.136e+00  7.379e-01   9.670 7.51e-13 ***
Training_hardwareGoogle TPU v3                  1.004e+01  3.156e-01  31.808  < 2e-16 ***
Training_hardwareGoogle TPU v4                  1.014e+01  4.220e-01  24.035  < 2e-16 ***
Training_hardwareHuawei Ascend 910              9.231e+00  1.108e+00   8.331 6.98e-11 ***
Training_hardwareNVIDIA A100                    1.028e+01  3.301e-01  31.144  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 40 GB         1.057e+01  5.635e-01  18.761  < 2e-16 ***
Training_hardwareNVIDIA A100 SXM4 80 GB         1.093e+01  5.751e-01  19.005  < 2e-16 ***
Training_hardwareNVIDIA GeForce GTX 285         3.042e+00  1.043e+00   2.916  0.00538 ** 
Training_hardwareNVIDIA GeForce GTX TITAN X     6.322e+00  7.379e-01   8.568 3.09e-11 ***
Training_hardwareNVIDIA GTX Titan Black         6.135e+00  1.047e+00   5.862 4.07e-07 ***
Training_hardwareNVIDIA H100 SXM5 80GB          1.115e+01  6.614e-01  16.865  < 2e-16 ***
Training_hardwareNVIDIA P100                    5.715e+00  6.864e-01   8.326 7.12e-11 ***
Training_hardwareNVIDIA Quadro P600             4.940e+00  1.050e+00   4.705 2.18e-05 ***
Training_hardwareNVIDIA Quadro RTX 4000         5.469e+00  1.055e+00   5.184 4.30e-06 ***
Training_hardwareNVIDIA Quadro RTX 5000         4.617e+00  1.049e+00   4.401 5.98e-05 ***
Training_hardwareNVIDIA Tesla K80               8.631e+00  7.587e-01  11.376 3.16e-15 ***
Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   9.994e+00  6.920e-01  14.443  < 2e-16 ***
Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  1.058e+01  1.047e+00  10.105 1.80e-13 ***
Training_hardwareNVIDIA V100                    9.208e+00  3.998e-01  23.030  < 2e-16 ***
Training_time_hour:Hardware_quantity           -2.651e-07  6.130e-08  -4.324 7.70e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 1.088522)

    Null deviance: 2.7045e+08  on 70  degrees of freedom
Residual deviance: 1.0593e+02  on 48  degrees of freedom
AIC: 1775

This time, Training_time_hour and Hardware_quantity increase total energy use by about 0.18% per additional training hour and 0.07% per additional chip, respectively. Meanwhile, their interaction decreases energy use by roughly 2.7 × 10⁻⁵% per unit. These results make far more sense, given that Training_time_hour can reach 7,000 hours and Hardware_quantity up to 16,000 units.
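Those percentages come straight from the back-transformation rule applied to the slopes in the output above:

```r
# Slopes from the log-link Gamma model above, converted to percentage changes.
b_time <- 1.818e-03   # Training_time_hour
b_qty  <- 7.373e-04   # Hardware_quantity
b_int  <- -2.651e-07  # Training_time_hour:Hardware_quantity

pct <- function(b) (exp(b) - 1) * 100
pct(b_time)  # ~0.18% more energy per additional training hour
pct(b_qty)   # ~0.07% more energy per additional chip
pct(b_int)   # ~ -2.7e-05% per unit of the interaction
```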

To visualize the difference better, I created two plots comparing the predictions (shown as dashed lines) from both models. The left panel uses the log-transformed Gamma GLM, where the predicted lines are nearly flat and close to zero, far from the actual data lines. In contrast, the right panel uses the log-link Gamma GLM, where the predicted lines track the actual lines closely.

test_data <- df[, c("Training_time_hour", "Hardware_quantity", "Training_hardware")]
prediction_data <- df %>%
  mutate(
    pred_energy1 = exp(predict(glm3, newdata = test_data)),                   # log-transformed Gamma model
    pred_energy2 = predict(glm3_alt, newdata = test_data, type = "response")  # log-link Gamma model
  )
y_limits <- c(min(df$Energy_kWh, prediction_data$pred_energy1, prediction_data$pred_energy2),
              max(df$Energy_kWh, prediction_data$pred_energy1, prediction_data$pred_energy2))

p1 <- ggplot(df, aes(x = Hardware_quantity, y = Energy_kWh, color = Training_time_group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(data = prediction_data, aes(y = pred_energy1), method = "lm", se = FALSE, 
              linetype = "dashed", size = 1) + 
  scale_y_log10(limits = y_limits) +
  labs(x="Hardware Quantity", y = "log of Energy (kWh)") +
  theme_minimal() +
  theme(legend.position = "none") 
p2 <- ggplot(df, aes(x = Hardware_quantity, y = Energy_kWh, color = Training_time_group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(data = prediction_data, aes(y = pred_energy2), method = "lm", se = FALSE, 
              linetype = "dashed", size = 1) + 
  scale_y_log10(limits = y_limits) +
  labs(x="Hardware Quantity", color = "Training Time Level") +
  theme_minimal() +
  theme(axis.title.y = element_blank()) 
p1 + p2
Figure 8. Relationships between hardware quantity and log energy use across training-time groups. In both panels, the actual data are shown as points, solid lines represent fits from linear models on the actual data, and dashed lines represent predicted values from the fitted GLMs. The left panel uses the log-transformed Gamma GLM, while the right panel uses the log-link Gamma GLM with the same predictors.

Why Does the Log Transformation Fail?

To understand why the log-transformed model fails to capture the effects that the log-link model does, let's walk through what happens when we fit a log-transformed model.

Suppose Y equals some function of X plus an error term:

Y = f(X) + ε

When we apply a log transformation to Y, we actually compress both f(X) and the error together:

log(Y) = log(f(X) + ε)

That means we are modeling a new variable, log(Y). When we fit our predictor g(X), in my case g(X) = Training_time_hour * Hardware_quantity + Training_hardware, it tries to capture the combined effect of both the "shrunk" f(X) and the compressed error term:

log(f(X) + ε) ≈ g(X)

In contrast, when we use a log link, we still model the original Y, not a transformed version of it. Instead, the model exponentiates g(X) to predict Y:

E[Y] = exp(g(X)), equivalently log(E[Y]) = g(X)

The model then minimizes the difference between the actual Y and the predicted exp(g(X)), so the error stays on the original scale rather than being compressed.
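A small simulation (made-up data, not the article's dataset) makes the difference concrete: when Y has multiplicative noise, a model of log(Y) targets E[log Y | x], and naively exponentiating its predictions systematically undershoots the actual mean of Y, while a log-link GLM targets E[Y | x] directly:

```r
# Simulated data with a multiplicative error structure.
set.seed(7)
n <- 500
x <- runif(n, 0, 5)
y <- exp(1 + 0.3 * x) * rlnorm(n, meanlog = 0, sdlog = 1)

fit_transform <- lm(log(y) ~ x)                       # models E[log Y | x]
fit_link <- glm(y ~ x, family = Gamma(link = "log"))  # models E[Y | x] directly

m_actual <- mean(y)
m_transform <- mean(exp(fitted(fit_transform)))  # naive back-transform of log-scale fit
m_link <- mean(fitted(fit_link))                 # response-scale fitted values

# m_transform undershoots m_actual (by roughly exp(sdlog^2 / 2) ~ 1.65x here);
# m_link stays close to m_actual.
```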

Conclusion

Log-transforming the response is not the same as using a log link, and it may not always give reliable results. Under the hood, the log transformation warps the response variable itself, distorting both the variation and the noise. Understanding this subtle mathematical difference behind your models is just as important as searching for the best-fitting one.


[1] Epoch AI. Data on Notable AI Models. Retrieved from

[2] University of Virginia Library. Interpreting Log Transformations in a Linear Model. Retrieved from
