A Project Caused by COVID-19
Late in spring 2020, I received a phone call telling me that my summer internship had been canceled because of
concerns over the increasing severity of the COVID-19 outbreak. From that day on, I was determined not to let my summer go to
waste and to do my best to put my skills to use.
Obtaining & Processing the Data
The project started as a simple regression exercise. I used Python's
Beautiful Soup library to scrape home listings from a real
estate listings website. I initially scraped previously sold houses in Ashburn, Virginia, but decided that
the project's scope was too small, so I expanded my dataset to include recently sold houses in Sterling and Leesburg, Virginia.
This resulted in a collection of over 500 homes sold between the beginning of June and early July.
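As a rough illustration of what scraping with Beautiful Soup can look like, here is a minimal sketch. It is not the code used in the project: the HTML snippet, the tag names, the class names (`listing`, `address`, `price`, `sqft`), and the `parse_listings` helper are all invented for the example, since the real site's markup isn't shown here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real site's tags and class names are not given in
# this write-up, so everything in this snippet is invented for illustration.
SAMPLE_HTML = """
<div class="listing">
  <span class="address">123 Example St, Ashburn, VA</span>
  <span class="price">$525,000</span>
  <span class="sqft">2,150</span>
</div>
<div class="listing">
  <span class="address">456 Sample Ave, Sterling, VA</span>
  <span class="price">$410,000</span>
  <span class="sqft">1,640</span>
</div>
"""


def parse_listings(html):
    """Return one dict per listing card found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="listing"):
        address = card.find("span", class_="address").get_text(strip=True)
        price = card.find("span", class_="price").get_text(strip=True)
        sqft = card.find("span", class_="sqft").get_text(strip=True)
        rows.append({
            "address": address,
            # Strip "$" and thousands separators before converting to int.
            "price": int(price.replace("$", "").replace(",", "")),
            "sqft": int(sqft.replace(",", "")),
        })
    return rows


listings = parse_listings(SAMPLE_HTML)
```

On a live site, the HTML would come from a downloaded page rather than a string, and the selectors would need to match whatever markup that site actually uses.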
The data wasn't ready to be modeled just yet. It contained many inconsistencies and gaps. For example,
some realtors list the home's lot size in acres, while others enter it in square feet. As another example, high school information is
included for some listings but not others. In fact, I even saw a couple of listings with incorrect high school information. This made
sense, as a number of Loudoun neighborhoods have been redistricted several times over the past decade, but for my analysis to be accurate,
all of this needed to be corrected. I spent a significant chunk of time cleaning the data and preparing it for model building.
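To make the lot-size problem concrete, here is a small sketch of how mixed units could be normalized to square feet. The input formats (`"0.25 acres"`, `"10,890 sq ft"`) and the `lot_size_to_sqft` helper are assumptions made for the example; the actual listings may have been formatted differently. The only hard fact it relies on is that one acre is 43,560 square feet.

```python
import re

SQFT_PER_ACRE = 43_560  # 1 acre is 43,560 square feet


def lot_size_to_sqft(raw):
    """Convert a lot-size string such as '0.25 acres' or '10,890 sq ft' to square feet.

    Returns None when the value is missing or the unit is not recognized.
    """
    if not raw:
        return None
    # A number (optionally with commas or a decimal point) followed by a unit.
    match = re.match(r"\s*([\d,]*\.?\d+)\s*([a-z. ]+)", raw.lower())
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    unit = match.group(2).strip()
    if unit.startswith("acre"):
        return value * SQFT_PER_ACRE
    if unit.startswith(("sq", "square")):
        return value
    return None
```

Returning `None` for anything unrecognized, rather than guessing, keeps bad values visible so they can be reviewed by hand.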
Model Building
Once the data was cleaned, I was ready to move into the model building phase. I switched from Python to R, as I prefer R's regression functions and output. I began by trying to construct a single model for all 500+ observations across the three towns. After several attempts and countless hours, I decided to construct three separate models (one for each town) instead of one large model. This is not what I originally intended to do; however, I wanted to produce the highest quality deliverable I could. The three individual models were more accurate and had greater predictive power than the single larger model, so I continued with them.
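The pooled-versus-separate comparison can be sketched on toy numbers. The example below is in Python rather than R, uses a single predictor, and runs on invented data for two made-up towns, so it only shows the shape of the comparison, not the actual models or results. It fits one pooled line and one line per town, then compares error on held-out rows.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(model, rows):
    """Root mean squared error of a fitted line on (sqft, price) rows."""
    intercept, slope = model
    errors = [(price - (intercept + slope * sqft)) ** 2 for sqft, price in rows]
    return sqrt(sum(errors) / len(errors))


# Invented toy data: (square feet, sale price in $1000s). Not the scraped listings.
train = {
    "Town A": [(1000, 300), (1500, 400), (2000, 500), (2500, 600)],
    "Town B": [(1000, 350), (1500, 500), (2000, 650), (2500, 800)],
}
test = {
    "Town A": [(3000, 700)],
    "Town B": [(3000, 950)],
}

# One pooled model fitted on every town's training rows.
pooled_rows = [row for rows in train.values() for row in rows]
pooled_model = fit_line(*zip(*pooled_rows))

# One model per town.
town_models = {town: fit_line(*zip(*rows)) for town, rows in train.items()}

# Compare both approaches on the same held-out rows.
all_test = [row for rows in test.values() for row in rows]
pooled_rmse = rmse(pooled_model, all_test)
per_town_rmse = sqrt(
    sum(rmse(town_models[town], rows) ** 2 * len(rows) for town, rows in test.items())
    / len(all_test)
)
```

Because the toy towns follow different price-per-square-foot lines, the per-town models come out ahead here by construction. Comparing on held-out rows matters: on the training data alone, separate models can never fit worse than a pooled one, so only out-of-sample error says anything about predictive power.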
Reporting
Lastly, I decided to produce a small write-up on the project. However, before I started this phase,
I decided that it would be neat to create a small website where users can input characteristics of their house and view
a predicted house price. They would be able to use my regression model to get insights about their own home. The only problem
was that I had never created a website before. I learned some basic HTML for roughly a week in the seventh grade, but that could
only get me so far.
I started with the form where users enter information about their house. I looked around online for a way to build
it and stumbled across JavaScript forms, so that's what I used. I had never used JavaScript before, so it took some time to
learn. I followed a template for a simple calculator and read through lots of old questions on Stack Overflow.
Eventually, I got the form up and running.
I've always believed that if you are going to do something, you should do it right. I didn't want to have spent a significant amount of
time on this project only to have a website that looked like it was developed in 1990, so I found a template online and learned
how to alter it to my needs. I learned about CSS files and how they can be used to style my website. With time, I ended up
with the finished product you see today. This project has taught me a lot, and I hope you enjoy it.