Data is all the buzz in real estate, but how do we categorise it, what data do we need, what are the concerns, and why should I care about the content of the local water supply before deciding whether to purchase a real estate asset?
Data is simply anything that can be measured and used for reference and analysis. For the sake of an ongoing example, let’s call “data” the usual variables of size, location, amenities and market condition upon which we would base an offer for a single residential property on any given day.
Alternative data is any data used for something other than its primary collection purpose, and so it sits outside the realm of traditional data. If, say, local crime rates are being used to decide the price a person might offer for our example property, that makes the local crime statistics an alternative data set.
Big data is traditionally defined through the three V’s: information produced with high velocity, variety and volume. All three, however, require benchmarking to know what counts as ‘high’, and there are no clear guidelines for setting those thresholds. Within real estate, big data can be thought of as data produced in near real time, too large for traditional regression and spreadsheet models to interpret. In our example, this could be the social media activity and demographic profiles of those convicted of crime in the neighbourhood of our example property.
The reason for the recent ‘data buzz’ is the rapidly increasing capability of machine learning: a set of self-refining computer algorithms able to find correlations in disparate data sets. This capability has been driven by the exponential development of more efficient processors in the computer hardware industry. The breakthrough has suddenly made the analysis of alternative big data sets possible for the world of real estate: a revolution fuelling the rise of increasingly intelligent PropTech organisations.
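As a toy sketch of what ‘finding correlation in disparate data sets’ means in practice, consider measuring how strongly a hypothetical alternative data set (a neighbourhood crime-rate series) tracks asking prices. All figures below are invented for illustration; a real PropTech model would be far more elaborate.

```python
# Minimal sketch: Pearson correlation between an alternative data set
# (hypothetical crime rates) and asking prices. All numbers invented.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical monthly figures for one neighbourhood.
crime_rate   = [9.1, 8.4, 7.9, 7.2, 6.8, 6.1]   # incidents per 1,000 residents
asking_price = [310, 322, 331, 345, 352, 365]   # GBP thousands

r = pearson_r(crime_rate, asking_price)
print(f"correlation: {r:.2f}")  # strongly negative: prices rise as crime falls
```

A correlation near −1 here would flag falling crime as a candidate price signal; the paragraphs below explain why such a signal still needs scrutiny before it drives a decision.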
To once again return to my opening questions, what data do we need? That depends on what outcome we are hoping to achieve; answered simply, we need any and all data that may offer insight or competitive advantage for our pricing decision on the example property. To tie all the previous examples together, my final opening question, on the relevance of the local water supply to property pricing decisions, is one being explored by start-up residential loan company Proportunity. They claim to offer more competitive loan terms than traditional companies based on their ability to predict the future value of a property they lend against through the analysis of alternative big data sets. Macaulay (2018) writes of some of Proportunity’s more novel methods: “Analysis of police arrests and the chemical compounds in sewers that people flush down their drains shows that when the use of crack cocaine drops gentrification could soon arrive, but when the crack is replaced by cocaine gentrification may already be complete.”
This level of granular data and attention to detail is revolutionising the way we think about the role of data in driving our investment decisions. But here is the caveat: while these statistics may seem clever, novel and objective, we need to understand the assumptions on which the machine learning algorithms are built. In Proportunity’s case, the assumption of ‘coolness’-driven regional house price growth takes Richard Florida’s famed “creative class” drivers of gentrification to hold true for all regions. This is typically exemplified in the UK by Shoreditch’s Britpop-driven growth of the 1990s, but should we blindly believe that all gentrification of undervalued, inner-city UK regions must follow that same path, and accordingly that we must all begin measuring water supplies to keep up with the market? Unless you are aware of the assumptions on which your machine learning algorithms are based and can control for them, you could end up with an altogether incorrect analysis, through the overweighting of incorrect alternative big data in your predictive models.
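The caveat above can be sketched numerically. Suppose a one-feature model is trained in a region where the ‘creative class’ assumption holds (falling drug use precedes price growth) and is then applied unchanged to a region where it does not. All figures are invented; the feature, a “decline in detected use”, is hypothetical.

```python
# Sketch of assumption risk: a model fitted where the gentrification
# assumption holds can badly overpredict elsewhere. Figures invented.

def fit_slope(xs, ys):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    b = num / den
    return b, my - b * mx

# Region A (assumption holds): falling drug use, rising prices.
decline_a = [0.0, 1.0, 2.0, 3.0, 4.0]   # % fall in detected use
growth_a  = [0.5, 2.1, 3.9, 6.2, 8.0]   # % annual price growth

slope, intercept = fit_slope(decline_a, growth_a)

# Region B (assumption fails): the same decline, but flat prices.
decline_b = 4.0
actual_growth_b = 0.4
predicted_b = slope * decline_b + intercept

print(f"predicted {predicted_b:.1f}% growth vs actual {actual_growth_b:.1f}%")
```

The model confidently projects strong growth for region B purely because the alternative data set is overweighted relative to the hidden regional assumption, which is the failure mode the paragraph above warns against.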