Introduction and Research Objective

Airbnb’s website contains a collection of datasets that summarize its users’ listings internationally. Our group narrowed its focus to the listings dataset for the greater Los Angeles area because we are interested in the factors that affect rental prices. Because the dataset covers over 200 neighborhoods, we restricted the analysis to the major cities of western Los Angeles, the area that encompasses UCLA. We examine whether room type and location (recorded as neighbourhood in the dataset) affect the price of a listing and, if so, how. We begin by visualizing the relationships between these variables to identify any patterns. We then fit a multiple linear regression model to quantify how significant the variables of interest are as predictors of price.

Visualizations

Here we plot the log of the Price variable so that the wide range of prices is easier to visualize. We see immediately that, within the West Los Angeles area, Malibu has the highest average listing price regardless of room type. Westchester, on the other hand, has the lowest average listing price. Examining room type, we see that entire homes/apts dominate the higher price ranges while shared rooms mainly occupy the lower price ranges. Hotel rooms seem to rank right after entire homes/apts, followed by private rooms.
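To illustrate why the log scale helps, here is a minimal sketch in base R. The prices are synthetic (drawn from a log-normal distribution purely for illustration) and are not the Airbnb data; the variable names are our own.

```r
# Synthetic, right-skewed prices for illustration only (NOT the Airbnb data).
set.seed(1)
price <- round(exp(rnorm(500, mean = 5, sd = 0.8)))

# On the raw scale, a few very expensive listings stretch the axis.
summary(price)

# On the log scale, the distribution is much closer to symmetric.
lprice <- log(price)
summary(lprice)

# Gap between mean and median as a rough indicator of skew on each scale.
raw_gap <- mean(price) - median(price)
log_gap <- mean(lprice) - median(lprice)
c(raw_gap = raw_gap, log_gap = log_gap)
```

A large positive gap on the raw scale and a gap near zero on the log scale is the pattern that makes the log-price plots easier to read.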

In this plot, we reaffirm our earlier observations about pricing by room type. Note that there is some variation within each room type, most notably for hotel rooms and shared rooms.

Here we plot the Minimum Number of Nights against the price. Most listings have a minimum number of nights below 50. We hypothesize that the values above 50 are outliers and may be due to user input errors. The graph does not indicate a clear trend between these two variables.

Here we take a look at all entries that have a Minimum Number of Nights under 50. Again, there does not seem to be a clear trend.

The Pearson correlation test results indicate a very slight (almost negligible) negative correlation between Log(Price) and Minimum Number of Nights. Although the effect is very slight, a possible reason for it is the reduced flexibility associated with a higher minimum number of nights: guests who must commit to a longer stay may expect a lower price as compensation.
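A test of this kind can be run with base R's `cor.test`. The sketch below uses a synthetic stand-in for the listings data (hypothetical values, not the Airbnb data) and assumes columns named `price` and `minimum_nights`; it is not the code used for the report.

```r
# Synthetic stand-in for the listings data (hypothetical values, NOT the Airbnb data).
set.seed(42)
n <- 300
dat <- data.frame(
  price = round(exp(rnorm(n, mean = 5, sd = 0.7))),
  minimum_nights = sample(c(1:30, 90, 365), n, replace = TRUE)
)

# Keep only listings with a minimum stay under 50 nights, as in the filtered plot.
under50 <- subset(dat, minimum_nights < 50)

# Pearson correlation test between log(price) and minimum nights.
ct <- cor.test(log(under50$price), under50$minimum_nights, method = "pearson")
ct$estimate  # sample correlation coefficient r
ct$p.value   # p-value for H0: the true correlation is 0
```

Because the synthetic prices are generated independently of the minimum stay, the estimate here should be near zero; on the real data the sign and size of `ct$estimate` are what the paragraph above describes.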

R Code for Analysis

## # A tibble: 10 x 4
##    neighbourhood  price minimum_nights lprice
##    <fct>          <dbl>          <dbl>  <dbl>
##  1 Beverly Hills   5.50          10.1    5.50
##  2 Century City    5.30          10.1    5.30
##  3 Culver City     4.80           4      4.80
##  4 Malibu          6.65           3.91   6.65
##  5 Santa Monica    5.02          17.3    5.02
##  6 Sawtelle        4.77           6.17   4.77
##  7 Venice          5.23           5.65   5.23
##  8 West Hollywood  5.00          12.9    5.00
##  9 Westchester     4.56           6.41   4.56
## 10 Westwood        5.00          10.5    5.00
## Analysis of Variance Table
## 
## Response: price
##                 Df     Sum Sq  Mean Sq F value    Pr(>F)    
## neighbourhood    9  811443394 90160377 139.127 < 2.2e-16 ***
## room_type        3   66337192 22112397  34.122 < 2.2e-16 ***
## Residuals     7689 4982788735   648041                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:
## lm(formula = log(price) ~ neighbourhood + room_type, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3911 -0.4003 -0.0916  0.2772  4.5005 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  5.69370    0.02707 210.347   <2e-16 ***
## neighbourhoodCentury City   -0.21859    0.12059  -1.813   0.0699 .  
## neighbourhoodCulver City    -0.60792    0.04589 -13.248   <2e-16 ***
## neighbourhoodMalibu          1.02974    0.04297  23.964   <2e-16 ***
## neighbourhoodSanta Monica   -0.44940    0.03272 -13.736   <2e-16 ***
## neighbourhoodSawtelle       -0.62133    0.03640 -17.071   <2e-16 ***
## neighbourhoodVenice         -0.31666    0.02958 -10.707   <2e-16 ***
## neighbourhoodWest Hollywood -0.48348    0.03527 -13.710   <2e-16 ***
## neighbourhoodWestchester    -0.67998    0.04470 -15.214   <2e-16 ***
## neighbourhoodWestwood       -0.38763    0.03665 -10.576   <2e-16 ***
## room_typeHotel room         -0.07080    0.07992  -0.886   0.3757    
## room_typePrivate room       -0.76011    0.01835 -41.419   <2e-16 ***
## room_typeShared room        -1.44248    0.04412 -32.698   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6547 on 7689 degrees of freedom
## Multiple R-squared:  0.4155, Adjusted R-squared:  0.4146 
## F-statistic: 455.5 on 12 and 7689 DF,  p-value: < 2.2e-16
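
The code that produced this output is not shown in the report. As a minimal sketch under stated assumptions, output of the same shape could be generated in base R as follows. The data frame is synthetic, the column names (`neighbourhood`, `room_type`, `price`, `minimum_nights`) are taken from the output above, and the authors' actual code (which evidently used tibbles) may differ.

```r
# Synthetic stand-in for the listings data; all values are made up for illustration.
set.seed(7)
n <- 400
dat <- data.frame(
  neighbourhood = factor(sample(c("Beverly Hills", "Malibu", "Venice"),
                                n, replace = TRUE)),
  room_type = factor(sample(c("Entire home/apt", "Private room", "Shared room"),
                            n, replace = TRUE)),
  minimum_nights = sample(1:30, n, replace = TRUE)
)

# Build prices so that log(price) depends on both factors plus noise.
dat$price <- exp(5 +
  0.8 * (dat$neighbourhood == "Malibu") -
  0.7 * (dat$room_type == "Private room") -
  1.4 * (dat$room_type == "Shared room") +
  rnorm(n, sd = 0.5))
dat$lprice <- log(dat$price)

# Per-neighbourhood means of log price and minimum nights.
by_hood <- aggregate(cbind(lprice, minimum_nights) ~ neighbourhood,
                     data = dat, FUN = mean)
by_hood

# Sequential ANOVA on the raw price, then the log-linear regression.
anova(lm(price ~ neighbourhood + room_type, data = dat))
fit <- lm(log(price) ~ neighbourhood + room_type, data = dat)
summary(fit)
```

With factor predictors, R absorbs the first level of each factor into the intercept, which is why Beverly Hills and entire homes/apts have no coefficient rows of their own in the summary above.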

Results

The tibble displayed at the top of the analysis summarises the mean price and minimum nights for each neighbourhood. The price and lprice columns are identical, so the prices shown appear to be on the log scale. Malibu has the highest average price per listing but the lowest average minimum number of nights. This could reflect the added flexibility of a shorter minimum stay supporting a higher price, although the correlation between the two variables was almost negligible. The ANOVA table, which was fit on the untransformed price, reports p-values below 2.2e-16 for both terms, so neighbourhood and room type are each significant in predicting the price of a listing. Furthermore, the summary of the log-linear model shows that, relative to the baseline neighbourhood (Beverly Hills, which is absorbed into the intercept), every neighbourhood except Century City (p = 0.0699) differs significantly at the 5% level. For room type, private rooms and shared rooms are priced significantly lower than the baseline (entire homes/apts), while hotel rooms do not differ significantly from it (p = 0.3757). The overall F-test (p < 2.2e-16) indicates that the model as a whole is significant. Although our adjusted R-squared of 0.4146 is relatively small, the two predictors explain roughly 41% of the variation in log price, which is useful for owners deciding how to price their listing(s).
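
Because the response is log(price), each coefficient is an additive shift on the log scale, and exponentiating it gives a multiplicative effect on price relative to the baseline (an entire home/apt in Beverly Hills). The sketch below applies this to a few coefficients copied from the regression summary; the vector names are our own labels. By this calculation, Malibu listings are priced at about 2.8 times the Beverly Hills baseline, and private rooms at roughly 53% less than entire homes/apts, holding the other factor fixed.

```r
# Coefficients copied from the regression summary (log-price scale).
# The names are our own shorthand labels, not R's coefficient names.
coefs <- c(
  intercept    =  5.69370,  # entire home/apt in Beverly Hills (baseline)
  malibu       =  1.02974,
  westchester  = -0.67998,
  private_room = -0.76011,
  shared_room  = -1.44248
)

# exp(beta) is the multiplicative effect on price relative to the baseline.
ratio <- exp(coefs[-1])
pct_change <- round(100 * (ratio - 1), 1)
pct_change

# Fitted log price for a private room in Malibu, back-transformed to dollars.
# Note: exp() of a fitted log value estimates a median (geometric-mean) price,
# not the arithmetic mean price.
pred <- exp(coefs["intercept"] + coefs["malibu"] + coefs["private_room"])
unname(pred)
```

Reading the effects as percentage changes makes the coefficient table easier to use for pricing decisions than the raw log-scale estimates.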

Summary

Our group was interested in Airbnb’s Los Angeles dataset and wanted to determine whether certain variables could predict the price of a given rental. Specifically, we examined neighborhood and room type. We created graphics to visualize how price differs by room type and by neighborhood. Price appeared to differ across both variables, so we analyzed the data numerically and found that both were significant predictors. From our third graphic, we concluded that there did not seem to be a meaningful relationship between price and the minimum number of nights required for a listing, but we did see that room type and price were related, with entire homes/apartments costing more on average. Based on our analyses, we recommend that owners who wish to post listings in any of the above areas examine the average price by location, as this factor is very important in determining the price of a listing. This is especially important for Malibu, where prices tend to lie well above those of the other locations.