Journal of Student Research 2018
isolate the effects of independent variables (such as 2010 population, region of the country, and population density) that may influence the dependent variable, taxi fare. After the linear equation is estimated, the resulting coefficients can be tested to determine whether they are statistically different from zero. If a coefficient is statistically significant, it can be concluded that the corresponding variable has a non-zero effect on taxi fare. The 2010 population of an observation location was included as a control variable in my linear estimation to serve as a proxy for time spent in a taxi due to traffic congestion.⁵ The time spent in a taxi could also reflect other confounding factors, such as a city's infrastructure at the local level, that influence the taxi fare. These confounding factors are captured by the error term. The following variables were used as control variables in the final model: the distance of the ride, regional location, and 2010 population, which are listed in Table 5 and displayed in equation (1).

Taxi fare_i = β_0 + β_1(Uber_i) + β_2(2010 Population_i) + β_3(Northeast_i) + β_4(South_i) + β_5(Midwest_i) + µ_i    (1)
Symbol   Variable          Role                Type
Y_i      Taxi fare         Dependent           Continuous
X_1      Uber              Primary regressor   Binary
X_2      2010 population   Control             Continuous
X_3      Northeast         Control             Binary
X_4      South             Control             Binary
X_5      Midwest           Control             Binary

Table 5: Variables used in the multiple linear regression
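As an illustration only, the following Python sketch estimates an equation of the form (1) by OLS and computes a t-statistic for each coefficient. It is not the estimation code used in this study. The data are synthetic, the "true" coefficients in TRUE_BETA are made up, population is assumed to be measured in millions, and all function and variable names are hypothetical. West is left out of the regressors, as in equation (1).

```python
import random

# Hypothetical "true" coefficients used only to generate synthetic data.
TRUE_BETA = {"Intercept": 20.0, "Uber": -2.0, "Population2010": 1.5,
             "Northeast": 3.0, "South": -1.0, "Midwest": 0.5}


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Return OLS coefficients, standard errors, and t-statistics."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # unbiased error variance
    se = [(sigma2 * inv[a][a]) ** 0.5 for a in range(k)]
    t_stats = [b / s for b, s in zip(beta, se)]
    return beta, se, t_stats


def simulate_and_fit(seed=0, n=500):
    """Generate synthetic observations and fit the regression."""
    rng = random.Random(seed)
    names = list(TRUE_BETA)
    X, y = [], []
    for _ in range(n):
        uber = float(rng.random() < 0.5)       # 1.0 if Uber is present
        pop = rng.uniform(0.1, 5.0)            # 2010 population, in millions
        region = rng.choice(["Northeast", "South", "Midwest", "West"])
        # West has no dummy column: it is the omitted baseline region.
        row = [1.0, uber, pop] + [float(region == r) for r in names[3:]]
        fare = sum(b * x for b, x in zip(TRUE_BETA.values(), row))
        fare += rng.gauss(0.0, 1.0)            # error term
        X.append(row)
        y.append(fare)
    beta, se, t_stats = ols(X, y)
    return names, beta, se, t_stats


if __name__ == "__main__":
    names, beta, se, t_stats = simulate_and_fit()
    for name, b, s, t in zip(names, beta, se, t_stats):
        print(f"{name:>15} {b:8.3f} {s:7.3f} {t:8.2f}")
```

In large samples, an absolute t-statistic above roughly 1.96 corresponds to rejecting the hypothesis of a zero coefficient at the 5% level in a two-sided test. The coefficients on the regional dummies are read relative to the omitted West region.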
Each of the variables used in the final model represents a determinant of taxi fare. The population of a city, the density of the population, and where the market is located can all contribute to the taxi fare. The West regional variable was removed to avoid the dummy variable trap: including a dummy for every region alongside the intercept produces perfect multicollinearity in the regression equation, in which case ordinary least squares (OLS) estimates cannot be computed.

Results

The regression model predicts that the average taxi fare (dependent variable) is a function of the presence of Uber, 2010 population, and region

⁵ I also estimated a linear-log model by taking the natural log of 2010 population. In that model, the natural log of 2010 population was less statistically significant than 2010 population was in the original model.