Artificial intelligence – house price forecasts

By Christian Setzwein November 17, 2020

For a few years now, buying and selling houses has been a topic that concerns many people. Admittedly, it also concerns me, because my wife and I only bought a new house three years ago and sold our old property shortly thereafter. We did not use an agent to sell our property, and our biggest uncertainty was setting the price. This problem has now been taken up by online brokerage companies, some of which seem to be of Scottish descent because their name starts with a capital M and a small c. These companies offer to provide a price prediction for the house, supported by a small online questionnaire.

How does it work? How can software learn to make such forecasts? I would like to explain the core idea using a very simplified example: We take all the newspapers in the area and, for every house ad, note the values “square meters of living space” and “price” in a table.

We can now use this data to train our forecasting model. The “square meters of living space” feature serves as input, while the price serves as the target value (or label) that is to be forecast for other data sets later on. In practice, one would

  • train with many more data sets to enable the algorithm to recognize patterns,
  • train not only with the feature “square meters of living space”, but also with other features such as “number of rooms”, “location”, “year of construction” and others.

The selection of the features is crucial for the quality of the house price forecast; this process is called feature engineering.
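To make the idea of features and labels a little more tangible, here is a minimal sketch in Python. All listings, field names and the helper function are made up for illustration and are not taken from real ads; “location” is left out because a text value would first have to be turned into numbers.

```python
# Hypothetical listings: every value below is invented for illustration.
listings = [
    {"living_space_m2": 80, "rooms": 3, "year_built": 1975, "price_eur": 310_000},
    {"living_space_m2": 100, "rooms": 4, "year_built": 1988, "price_eur": 350_000},
    {"living_space_m2": 140, "rooms": 5, "year_built": 2001, "price_eur": 490_000},
    {"living_space_m2": 160, "rooms": 6, "year_built": 2010, "price_eur": 530_000},
]

# Assumed names: which columns are inputs (features) and which is the target (label).
FEATURES = ["living_space_m2", "rooms", "year_built"]
LABEL = "price_eur"


def split_features_and_label(rows, feature_names, label_name):
    """Return one feature vector per listing and the matching list of labels."""
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[label_name] for row in rows]
    return X, y


X, y = split_features_and_label(listings, FEATURES, LABEL)

# Each training example is a pair: feature vector -> price.
for features, price in zip(X, y):
    print(features, "->", price)
```

Choosing which columns end up in the feature list is exactly the feature engineering decision mentioned above.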

But let’s continue with our simple example: the learning algorithm now generates a prediction model for a specific form of representation (mathematical function, neural network), which we select in a similar way to the features. In our simple example, we opt for a linear function as the prediction model. The following diagram illustrates what the learning algorithm does:

The learning algorithm generates a straight line for the training data (the red crosses) that is as close as possible to the training data, in this case the blue straight line. It could also have generated the green straight line, but we can see with the naked eye that the blue straight line fits the training data much better. Our simple forecast model, once calculated, is now ready.

If we now want to predict the price for any house, e.g. for a house with 120 square meters (blue circle on the horizontal axis), we simply read off the corresponding price given by our prediction model: approximately €420,000 (blue circle on the vertical axis).
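For readers who like to see this in code, the following is a minimal sketch of fitting a straight line and reading off a forecast, written in plain Python. The four data points are invented, and I chose them so that the result lands on the roughly €420,000 from the example; the function names `fit_line` and `predict` are my own, not part of any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept with the closed-form least-squares formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common factor 1/n cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read the price off the fitted line for a given living space."""
    return slope * x + intercept


# Hypothetical training data: living space in square meters and price in euros.
areas = [80, 100, 140, 160]
prices = [310_000, 350_000, 490_000, 530_000]

slope, intercept = fit_line(areas, prices)
estimate = predict(slope, intercept, 120)

print(f"price = {slope:.0f} * area + {intercept:.0f}")
print(f"estimate for 120 square meters: {estimate:.0f} EUR")
```

With these made-up numbers the fitted line is price = 2900 × area + 72000, so the forecast for 120 square meters comes out at 420,000 euros.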

The model described above is greatly simplified. In practice, in addition to the points already mentioned, a more complex function would be used (not just a straight line) that better fits the patterns in the training data.

How the software finds the blue straight line can be pictured as follows:

For different possible lines, the learning algorithm checks how far each value from the training data set (red cross) is from the value predicted by the line. In the graphic above, this is illustrated by the green spacers. The best straight line, which is then taken as the forecast model, is the one that keeps these distances as small as possible across all training data sets. For those who are interested in more detail: mathematically, this is essentially achieved by finding the straight line for which the sum of the squared deviations is smallest.
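The comparison of candidate lines can be sketched in a few lines of Python. The data points and both candidate lines are invented for illustration; “blue” and “green” are only labels borrowed from the diagram, and the function name is my own.

```python
def sum_of_squared_deviations(slope, intercept, xs, ys):
    """Add up the squared vertical distances between the data points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical training data: living space in square meters and price in euros.
areas = [80, 100, 140, 160]
prices = [310_000, 350_000, 490_000, 530_000]

# Two assumed candidate lines, each given as (slope, intercept).
# "blue" is the least-squares line for these made-up points; "green" is an arbitrary alternative.
candidates = {
    "blue": (2900.0, 72_000.0),
    "green": (1500.0, 200_000.0),
}

scores = {
    name: sum_of_squared_deviations(slope, intercept, areas, prices)
    for name, (slope, intercept) in candidates.items()
}

# The line with the smallest sum of squared deviations fits the data best.
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.0f}")
print("best fit:", best)
```

A real learning algorithm does not try lines one by one like this; for a straight line the best slope and intercept can be calculated directly, but the quantity being minimized is the same.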

The type of learning presented here to predict the house price is called supervised learning. The software learns by being given examples (the red crosses) of the answers that the prediction model is expected to provide in the future. Because a continuous value, a number, is to be predicted rather than one of a few fixed categories, it is a so-called regression problem that is solved by the learning machine.

I hope that the example of “forecasting a house price” has given you an insight into how machines can learn from training data and then provide forecasts. However, getting a good forecasting model is not quite as easy as in our example. I will cover the challenges of finding and training a model in one of the many upcoming articles on AI. Nevertheless, every supervised learning regression problem is based on the principles presented here, in modified form and with increasing (mathematical) complexity.
