The Cost of Flight Tickets: Statistical Analysis of Some Factors

How Do Distance and Time of Travel Impact the Cost of Flight Tickets?

Carl Marx

Introduction

The general belief is that the longer the journey, the more expensive the flight ticket will be for the passenger. Others believe that, because flight paths seldom follow the shortest possible route, it is the in-flight duration of the journey that is correlated with the price.

In this project, the correlation between the price of a flight ticket as the dependent variable and the distance and duration of international journeys as the independent variables was evaluated.

Distance was considered as an independent variable because passengers have little interest in the in-flight route; they are mostly interested in arriving at the destination at a given cost.

Duration was also considered as an independent variable because flights do not necessarily follow the shortest route between two destinations, which influences the flight time. Flight time could in turn influence the cost of the tickets: on the one hand, more fuel is used the longer the airplane stays in the air; on the other hand, the faster it flies, the higher its fuel consumption will be.

Ensuring Comparability of Research Data

To ensure that the prices of the various flights were comparable, it was decided to find one airport from which a variety of international destinations was available on the same airline. Flights also had to depart on the same day to take seasonality out of the equation. It was further decided to use only one-way flights for the study, because the price of a return flight may vary with the duration of the stay, and the season at the destination may differ, which would also influence the price.

It was discovered that economy class is subdivided into 5 pricing categories that were not all available on the same day to the various destinations, and that not all destinations have first class availability. For these reasons it was decided to use the cost of a business class ticket for the study. To further ensure consistency, all quotes for flight bookings were obtained on the same day.

The departure airport that was selected is Dubai International Airport in the Middle East. A random date was selected, far enough into the future to ensure that business class tickets were available to all the destinations. A total of 58 destinations were selected. Their flights vary in duration from 1 hour to 16 hours and 55 minutes. The duration of each flight was taken to be the time published when the booking is made. The cost of the flights varied from AED 1 645-00 to AED 25 445-00.

For the purpose of this study, the distances between Dubai airport and the various destination airports were calculated along the surface of the earth in a direct line from airport to airport. The "Great Circle Mapper" website was used to calculate these distances.

This website uses the WGS 84 model of the earth's ellipsoid in the calculation. This website is often used by pilots doing preparation for flight planning and therefore it was concluded that the distances are accurate enough for the purposes of this study. The site does however have a warning stating that "The information may not be accurate or current and is not valid for navigation or flight planning."
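As a rough illustration of what an airport-to-airport distance along the earth's surface involves, the following Python sketch uses the haversine formula. Note that this is a simplification: it treats the earth as a sphere with an assumed mean radius of 6371 km, whereas the website uses the WGS 84 ellipsoid, so the results will differ slightly. The function name, the airport coordinates and the choice of London Heathrow are illustrative assumptions and are not taken from the study's data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; a sphere, not the WGS 84 ellipsoid


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates (latitude, longitude), for illustration only
DXB = (25.25, 55.36)   # Dubai International
LHR = (51.47, -0.45)   # London Heathrow

print(round(great_circle_km(*DXB, *LHR)), "km")
```

A spherical model of this kind is usually within a fraction of a percent of an ellipsoidal calculation, which would not materially change a correlation analysis.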

The ticket prices are all expressed in AED, as this is the currency used in Dubai. There is no benefit in converting the monetary value of the tickets into another currency, so the AED is used throughout this study.

Once it was decided to use Emirates Airlines as the subject of the study, a list of destinations was obtained from their website. These are categorized into 5 sections: Europe, Americas, Middle East, Asia Pacific & Australasia, and Africa.

A quote to fly from Dubai to each of the 58 destinations was requested online. The information supplied was then compiled in table format for ease of use.

Statistical Analysis

The purpose of the study was to establish whether there is a positive correlation between the cost of a flight and its duration, and whether the correlation between the price and the distance of a flight is similar.

The statistical technique that was used in this study was to calculate the correlation coefficient to show how strongly pairs of variables are correlated and whether the correlation is positive, neutral or negative.
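The article does not name the specific coefficient used. Assuming it is the usual Pearson product-moment correlation coefficient, the calculation can be sketched in Python as follows. The function name and the sample values are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, NOT taken from the study
durations_h = [1.0, 3.5, 7.0, 9.5, 14.0]
prices_aed = [1800, 4200, 9500, 11000, 21000]

print(round(pearson_r(durations_h, prices_aed), 3))
```

The sign of the result gives the direction of the relationship and its absolute value gives the strength.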

Correlation is a measure of the statistical relationship between the independent variable and the dependent variable.

The coefficient of correlation is most readily interpreted through the amount of variation in one variable that is shared with the variable it is correlated with.

Correlation coefficients are normally reported as r, a value between -1 and +1. Squaring r gives the coefficient of determination, which is easier to interpret.

The size of the coefficient of determination indicates the share of the variation in one variable that is accounted for by the other. Multiplied by 100, it gives the percentage of the variance in the dependent variable that can be attributed to the independent variable. The coefficient of determination is the primary information obtained when one uses a linear statistical model. An r of ±0.5 means that 25% of the variance is shared (0.5 squared = 0.25). An r of ±0.7 means that 49% of the variance is shared (0.7 squared = 0.49).

A key point is never to assume that a correlation implies that a change in one variable is causing a change in the other. For example, the sales of mobile phones and of healthy food have both risen strongly in recent years, and a high correlation can be calculated when certain geographical areas are selected, but one cannot assume that buying mobile phones causes people to buy healthy food or the other way around.

Another limitation is that the correlation technique works best with linear relationships, in which one variable increases (or decreases) at a constant rate as the other increases. It does not work as well with curved relationships. Many kinds of curved relationship are possible; in each of them the variables are clearly related, but the relationship does not follow a straight line.

The coefficient of determination, r², is a valuable measure because it gives the percentage of the variation in one variable that is predictable from the other variable. It allows us to judge how confident one can be in making predictions from the data.

It indicates how well the regression line fits the data. If the regression line passes exactly through all the points on the scatter graph, it explains all of the variation. The further the points lie from the line, the less of the variation it is able to explain.
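The link between the regression line and the coefficient of determination can be sketched in Python. The sketch fits a least-squares line and then computes the explained share of variation as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function names and sample values are hypothetical and are not the study's data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Variation left over after the line has been fitted
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of ys around their mean
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative values, NOT taken from the study
exact = r_squared([1, 2, 3, 4], [3, 5, 7, 9])                   # points on a line
noisy = r_squared([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])  # points near a line
print(round(exact, 4), round(noisy, 4))
```

For a simple regression with one independent variable, this value equals the square of the correlation coefficient.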

Interpretation of the Coefficient of Correlation

Although a correlation may be fairly obvious when data is viewed superficially, the data may also contain unsuspected correlations. One may also suspect that correlations exist without knowing which is the strongest when comparing two different data sets. An objective correlation analysis can lead to a better understanding of the behavior of the data.

Simply calculating the coefficient of correlation has little value unless one also determines how large the coefficient must be in order to be meaningful. Determining what the correlation tells one about the data is just as important. For the purposes of this study, the size of the correlation coefficient was interpreted as follows:

Perfect positive or negative correlation: ± 1
Very high positive or negative correlation: ± 0.90 to ± 0.99
High positive or negative correlation: ± 0.80 to ± 0.89
Moderate positive or negative correlation: ± 0.50 to ± 0.79
Low positive or negative correlation: ± 0.30 to ± 0.49
Very low positive or negative correlation: ± 0.10 to ± 0.29
Almost no or negligible positive or negative correlation: ± 0.01 to ± 0.09
No correlation: 0
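The interpretation scale above can be sketched as a small Python lookup function. The function name is hypothetical. The scale as stated leaves small gaps between bands (for example between 0.89 and 0.90), so the sketch assumes that each band runs from its lower bound up to the next band's lower bound, and that any non-zero value below 0.10 counts as negligible.

```python
def describe_correlation(r):
    """Label a correlation coefficient using the article's scale."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    # Lower bounds of each band, checked from strongest to weakest (assumed half-open)
    if size == 1:
        label = "perfect"
    elif size >= 0.90:
        label = "very high"
    elif size >= 0.80:
        label = "high"
    elif size >= 0.50:
        label = "moderate"
    elif size >= 0.30:
        label = "low"
    elif size >= 0.10:
        label = "very low"
    else:
        label = "negligible"
    return f"{label} {direction} correlation"


print(describe_correlation(0.85))
print(describe_correlation(-0.2))
```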

Two random variables are described as positively correlated if high values of one tend to be associated with high values of the other. They are negatively correlated if high values of one tend to be associated with low values of the other.

It should be noted that correlation does not mean causation. In other words, the fact that two events are correlated does not mean that one causes the other, or has anything to do with it. Correlations only describe the behavior of one set of data in terms of the other. No conclusions beyond this relative behavior can be drawn from correlation alone. A strong statistical correlation often warrants further investigation to determine whether there is a causal relationship.

A scatter graph, sometimes also called a scatter diagram, is a statistical diagram on which two sets of data are plotted against each other to show graphically whether a correlation exists between them. It is especially useful when one is testing a hypothesis, as is the case here, which makes it the ideal tool for this study.

Unlike most chart types, which have one value axis and one category axis, a scatter chart has two value axes.

If the data has a perfect positive correlation, the plotted points will form a straight line rising from left to right. If the data has a perfect negative correlation, the plotted points will form a straight line descending from left to right. The data points will be scattered over the entire graph if there is no relationship, or only a limited one, between the data.

Calculations

Duration/Cost Correlation

Using the data for cost and duration, the values needed for the price/duration data set were calculated. Applying the standard formula for the correlation coefficient, the coefficient for the cost/duration data was determined to be r = +0.856.

The data was also plotted on a scatter graph to confirm graphically the correlation determined with the correlation coefficient formula. The cluster seen on the scatter graph approached a line in appearance, confirming that there is a strong positive linear correlation between the duration of a flight and the price of the ticket. From the graph it was clear that as the duration increases, so does the price.

Distance/Cost Correlation

Using the data for cost and distance, the values needed for the price/distance data set were calculated. The correlation coefficient of the cost/distance data set was calculated to be r = +0.863.

The data was also plotted on a scatter graph to confirm graphically the correlation determined with the correlation coefficient formula. The distribution of the data in the scatter graph confirmed that there is a strong positive correlation between the distance of a flight and the price of the ticket. From the graph it was clear that as the distance increases, so does the price. The spread of the data points was visibly smaller than in the cost/duration scatter graph. This is consistent with the coefficient of correlation (r) for cost/distance being slightly closer to +1 than that for cost/duration.

Coefficient of Determination

Duration/cost Coefficient of Determination

From the results of the previous calculations, the coefficient of determination was determined to be 0.73, which translates to 73%. The proportion of variance in the cost of the ticket that can be explained from knowledge of the duration variable is therefore 73%.

Distance/cost Coefficient of Determination

From the results of the previous calculations, the coefficient of determination was determined to be 0.74, which translates to 74%. The proportion of variance in the cost of the ticket that can be explained from knowledge of the distance variable is therefore 74%.
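Both coefficients of determination follow directly from squaring the correlation coefficients reported earlier. The short Python check below uses only the two r values given in this article; the function and dictionary names are hypothetical.

```python
def coefficient_of_determination(r):
    """Square a correlation coefficient to get the share of explained variance."""
    return r * r


# Correlation coefficients as reported in the Calculations section
reported_r = {"duration/cost": 0.856, "distance/cost": 0.863}

for name, r in reported_r.items():
    r2 = coefficient_of_determination(r)
    print(f"{name}: r = {r:+.3f}, r squared = {r2:.2f} ({r2:.0%})")
# duration/cost: r = +0.856, r squared = 0.73 (73%)
# distance/cost: r = +0.863, r squared = 0.74 (74%)
```

The two values differ by only about one percentage point, so distance and duration explain nearly the same share of the variation in price.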

Conclusion

Despite all efforts to eliminate sampling errors and to obtain a sample large enough to be representative, it is recognized that some sources of error remain. It is almost certain that the demand for a specific destination influences the price charged for a flight ticket to that destination. The study also did not take into account that the jet stream flows from west to east in the upper portion of the troposphere, which can cut the trip time of flights in this direction.

Notwithstanding the limitations of the study, it is still clear from the data sets that there is a high positive correlation between the duration of a flight and the price of the ticket, as well as between the distance of a flight and the price of the ticket.

© Carl Marx

Published by Carl Marx
