Loudoun Price Calculator

About the Project

A Project Caused by COVID-19

Late in spring 2020, I received a phone call telling me that my summer internship had been canceled due to concerns over the increasing severity of the COVID-19 outbreak. From that day on, I was determined not to let my summer go to waste and to put my skills to use.

Obtaining & Processing the Data

The project began as a simple regression project. I used Python's Beautiful Soup library to scrape home listings from a real estate listings website. I started with previously sold houses in Ashburn, Virginia, but decided that this scope was too small, so I expanded my dataset to include recently sold houses in Sterling and Leesburg, Virginia. This resulted in a collection of over 500 homes sold between the beginning of June and early July.

The data wasn't ready to be modeled just yet. Many inconsistencies and gaps existed within the data. For example, some realtors enter the home's lot size in acres, while others enter it in square feet. As another example, high school information is included for some listings but not others. In fact, I even saw a couple of listings with incorrect high school information. This made sense, as several Loudoun neighborhoods have been redistricted multiple times over the past decade, but for my analysis to be accurate, all of this needed to be corrected. A significant chunk of time was dedicated to cleaning the data and preparing it for model building.
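As an illustration of the lot size cleanup described above, here is a minimal Python sketch that converts mixed acre and square-foot entries to a single unit. The function name `lot_size_to_sqft` and the accepted unit spellings are assumptions made for this example, not the cleaning code actually used in the project; real listings may use other formats.

```python
import re

SQFT_PER_ACRE = 43_560  # 1 acre = 43,560 square feet

def lot_size_to_sqft(raw):
    """Convert a lot-size string such as '0.25 acres' or '10,890 sqft' to square feet.

    Returns None when the value or unit is not recognized, so that
    unparseable entries can be reviewed by hand instead of guessed.
    """
    if raw is None:
        return None
    # Normalize case and drop thousands separators before matching.
    text = raw.strip().lower().replace(",", "")
    match = re.match(r"^(\d+(?:\.\d+)?|\.\d+)\s*([a-z. ]*)$", text)
    if not match:
        return None
    value = float(match.group(1))
    # Collapse spellings like 'sq. ft.' into 'sqft'.
    unit = match.group(2).replace(".", "").replace(" ", "")
    if unit in {"acre", "acres", "ac"}:
        return value * SQFT_PER_ACRE
    if unit in {"sqft", "sf", "squarefeet", "squarefoot"}:
        return value
    return None  # hypothetical policy: a missing or unknown unit is left unresolved
```

Returning `None` for unrecognized entries keeps ambiguous values (for example, a bare number with no unit) out of the model until they have been checked manually.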

Model Building

Once the data was cleansed, I was ready to move into the model building phase. I transitioned from Python to R, as I prefer R's regression functions and output. I began by trying to construct a single model for all 500+ observations across the three towns. After several attempts and countless hours, I decided to construct three separate models (one for each town) instead of one large model. This is not what I originally intended to do; however, I wanted to produce the highest quality deliverable I could. In this case, the three individual models had higher accuracy and greater predictive power than the one larger model, so I continued with three separate models.
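The comparison between one pooled model and separate per-town models can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than the R used for the project, it uses a single predictor (square footage), and the town labels, coefficients, and data in `TOY_RULES` are made up and noise-free purely so the example is reproducible. It is not the project's data or model.

```python
from collections import defaultdict
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(rows, predict):
    """Root mean squared error of predict(town, sqft) over (town, sqft, price) rows."""
    return sqrt(sum((predict(t, x) - y) ** 2 for t, x, y in rows) / len(rows))

def compare(train, test):
    """Return (pooled_rmse, per_town_rmse), both measured on the held-out rows."""
    # One model fitted to every training row, ignoring the town.
    b0, b1 = fit_line([x for _, x, _ in train], [y for _, _, y in train])
    pooled = rmse(test, lambda t, x: b0 + b1 * x)

    # One model per town, each fitted only to that town's training rows.
    by_town = defaultdict(list)
    for t, x, y in train:
        by_town[t].append((x, y))
    models = {t: fit_line([x for x, _ in pts], [y for _, y in pts])
              for t, pts in by_town.items()}
    per_town = rmse(test, lambda t, x: models[t][0] + models[t][1] * x)
    return pooled, per_town

# Made-up toy data: each hypothetical town follows its own exact line
# (intercept in $1000s, slope in $1000s per square foot).
TOY_RULES = {"town_a": (100.0, 0.20), "town_b": (50.0, 0.30), "town_c": (200.0, 0.10)}
train = [(t, x, a + b * x) for t, (a, b) in TOY_RULES.items() for x in (1500, 2500, 3500)]
test = [(t, x, a + b * x) for t, (a, b) in TOY_RULES.items() for x in (2000, 3000, 4000)]

pooled_rmse, per_town_rmse = compare(train, test)
```

The errors are measured on held-out rows because separate models will always fit their own training data at least as well as a pooled model; a held-out comparison is what indicates whether the split actually improves prediction.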

Reporting

Lastly, I decided that I would produce a small write-up on the project. However, before I started this phase, I decided that it would be neat to create a small website where users can input characteristics of their house and view a predicted house price. They would be able to use my regression model to gain insights about their own home. The only problem was that I had never created a website before. I learned some primitive HTML for roughly a week in the seventh grade, but that could only get me so far.

I began by trying to make the form for users to input information about their house. I started looking around online for a way to do this. I stumbled across JavaScript forms, so that's what I used. I had never used JavaScript before, so it took some time to learn. I began by following a template to make a simple calculator and looking at lots of old questions on Stack Overflow. Eventually, I got the form up and running.

I've always believed that if you are going to do something, you should do it right. I didn't want to spend a significant amount of time on this project only to end up with a website that looked like it was developed in 1990, so I found a template online and learned how to alter it to my needs. I learned about CSS files and how they can be used to style my website. With time, I ended up with the finished product you see today. This project has taught me a lot, and I hope you enjoy it.