MLOps at Redfin
Redfin is a real estate brokerage, founded by software developers, that aims to make buying and selling homes more efficient. It is the biggest real estate brokerage site on the web. When Akshat first joined the company, Redfin's machine learning team was a separate unit responsible for both the company's early ML use cases and the infrastructure required to support them. However, this siloed structure became difficult to scale as Redfin expanded its operations. Now, the infrastructure team is working to standardize and democratize the ML platform, so that data scientists on the company's various product teams can focus on building the models they need while relying on the centralized infrastructure that Akshat's team provides.
While AI is built into many aspects of Redfin's product, its two main ML-driven features are its estimation and recommendation algorithms. Redfin Estimate calculates the market value of any given home. The algorithm takes in variables describing the market, the neighborhood, and the house itself, drawing on more than 500 data points for each calculation. Redfin Estimate was initially developed to drive user growth, since anyone could use it to estimate the value of their home. Now that it also informs Redfin's instant-buying business, it has become a key internal tool for the company's agents as well.
Redfin Recommendations is the company's newer ML-driven product and is responsible for 25% of the site's traffic. It provides customers with lists of recommended homes on the website and sends personalized emails and push notifications when a home they might be interested in comes on the market. One interesting finding Akshat shared is that the recommendation algorithm is actually better at predicting which homes people are interested in than their own saved searches are.
For example, when Akshat bought his own home in the Seattle area, the recommendation algorithm showed him a home in a different area from the one he had been searching. It turned out that this home was within a reasonable commute and a better fit for him than the others he was considering. Akshat ended up buying the home and moving in!
Data Sources & Model Types at Redfin
Akshat shared that much of the data that informs the Redfin Estimate is pulled from local multiple listing services (MLSes), which provide a central database of homes currently on the market and homes previously listed. The MLS data includes information about nearby amenities, such as Starbucks locations or recreation centers. The algorithm also pulls in geographic data, such as nearby flood zones. Another input is user engagement on Redfin.com: if a neighborhood is getting a lot of engagement on the site, that can influence the valuation of homes there.
Redfin employs both classical and deep learning models across its business applications. The Redfin Estimate uses separate models for listed and unlisted homes; the listed-home model operates closer to real time, because conditions for listed homes change more frequently. To handle this complexity, the team combines many techniques, including random forests and gradient boosting. The system also uses ensemble and hierarchical models; for example, a walk score is calculated separately and then fed into the overall estimate. Because home valuation is a complex problem, the team has built a correspondingly complex model that draws on many techniques to make the estimate as accurate as possible.
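To make the hierarchical and ensemble ideas concrete, here is a minimal sketch in Python. It assumes a hypothetical `walk_score` sub-model whose output becomes an input feature for a second stage, and, for brevity, it swaps the random forests and gradient boosting mentioned above for two hand-written linear formulas (`size_model`, `rooms_model`) whose predictions are simply averaged. Every name, coefficient, and the walk-score formula itself are illustrative assumptions, not Redfin's actual models.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Home:
    sqft: float
    bedrooms: int
    amenity_distances_km: list[float]  # hypothetical input: distances to nearby amenities


def walk_score(distances_km: list[float]) -> float:
    """First-stage sub-model (hypothetical): a toy walk score in [0, 100].

    Each amenity closer than 2 km contributes up to 20 points, decaying
    linearly to 0 at 2 km; the total is capped at 100.
    """
    points = sum(20.0 * (1 - d / 2.0) for d in distances_km if d < 2.0)
    return min(100.0, points)


def size_model(home: Home, walk: float) -> float:
    # Hypothetical base model A: price driven mainly by square footage.
    return 300.0 * home.sqft + 1_000.0 * walk


def rooms_model(home: Home, walk: float) -> float:
    # Hypothetical base model B: price driven mainly by bedroom count.
    return 150_000.0 * home.bedrooms + 2_000.0 * walk


def estimate(home: Home) -> float:
    # Hierarchical step: the sub-model's output becomes a feature for stage two.
    walk = walk_score(home.amenity_distances_km)
    # Ensemble step: average the predictions of the base models.
    return mean([size_model(home, walk), rooms_model(home, walk)])


home = Home(sqft=1500, bedrooms=3, amenity_distances_km=[0.5, 1.0, 3.0])
print(estimate(home))  # 487500.0
```

Here the toy walk score for the example home is 25, and the two base predictions (475,000 and 500,000) average to 487,500. In a real system each base model would be a trained learner, and the plain average could be replaced by a learned combination.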
Responsible AI at Redfin
When it comes to Responsible AI, Redfin's teams are very conscious of the data they use and how it can introduce bias into their models. For example, they deliberately exclude crime data to avoid building such biases into the algorithm. This creates an interesting cycle: the market shapes the algorithm, and as people buy homes based on the algorithm's recommendations, the algorithm in turn shapes the market.
Impact of the Pandemic on Real Estate
The pandemic did not dent the housing market the way it did many other industries. Instead, low mortgage rates and extra time at home pushed many people to take the leap and buy a home. By May 2021, home prices were up 26% year over year, and as much as 40% in some popular areas. The result was a very fast, competitive market in which even a few hours' head start could determine whether or not a buyer got the house.
To help users be the first to respond to popular new listings, the Redfin team created Hot Homes, which identifies early which houses are likely to sell quickly and flags them so users can bid early. The feature was initially driven by user engagement, but to give prospective buyers more time to act, the team changed it to rely on a model's prediction of how long a house would take to sell, a signal that is much quicker to compute.
Another phenomenon of the pandemic was that 63% of buyers on Redfin bought a home without even visiting it. To support these buyers, Redfin expanded its virtual tour options, including 3D scans and tagged floor plan images.
Other trends emerged as well. Sales of single-family homes rose significantly as people sought to get away from spaces where they would frequently interact with others. Because the pandemic had such a big impact on housing preferences, and the market was so topsy-turvy over the previous 18 months, many models were selectively retrained to improve accuracy. This retraining had a time and resource cost, but the market was so dynamic that the team judged it worthwhile to keep the models up to date.
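The prediction-based flagging behind Hot Homes can be sketched as a simple threshold on a model's output. This minimal Python sketch assumes some upstream model (not shown) has already produced a predicted number of days until each listing sells; the `Listing` type, the 7-day `HOT_THRESHOLD_DAYS` cutoff, and `flag_hot_homes` are hypothetical, since the source does not say how Redfin sets its cutoff. Because the flag in this sketch depends only on the prediction, it can be assigned as soon as the prediction exists, without waiting for engagement data to accumulate.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    listing_id: str
    predicted_days_to_sale: float  # output of a hypothetical days-on-market model


# Hypothetical cutoff; the real threshold (if any) is not described in the source.
HOT_THRESHOLD_DAYS = 7.0


def flag_hot_homes(listings: list[Listing], threshold: float = HOT_THRESHOLD_DAYS) -> list[str]:
    """Return IDs of listings predicted to sell within `threshold` days,
    ordered from fastest to slowest predicted sale."""
    hot = [listing for listing in listings if listing.predicted_days_to_sale <= threshold]
    hot = sorted(hot, key=lambda listing: listing.predicted_days_to_sale)
    return [listing.listing_id for listing in hot]


listings = [Listing("A", 3.5), Listing("B", 12.0), Listing("C", 6.0)]
print(flag_hot_homes(listings))  # ['A', 'C']
```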
Internal ML Uses
The teams at Redfin are very interested in making better use of different kinds of data. They created a dedicated Document Intelligence team that builds models to extract accurate information from real estate transaction documents. They are also working on generating image tags and extracting useful information from unstructured data.
Automating the Comparative Market Analysis (CMA) is another way ML improves workflows at Redfin. A CMA is a document that real estate agents typically spend a few hours preparing for customers who want to sell their homes. Redfin has models that can generate these documents automatically, saving agents those hours.
Akshat's priority for ML at the company is finding ways technology can make people more efficient. He is skeptical of the impact of conversational AI (e.g., chatbots) at Redfin, because people generally still want other people to help them find their new homes.
MLOps at Redfin
Redfin recently built its own MLOps platform, called "Redeye". Redeye brings together the company's common tools, including Kubernetes, AWS, Spark, Airflow, and MLflow, and also includes an internal feature library and data catalogue. The team has worked to make it accessible to both data scientists and machine learning engineers.
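Redeye's internal design is not described here, but the general idea of a shared feature library with a browsable catalogue can be sketched in a few lines. In this minimal, in-memory Python sketch, `FeatureSpec`, `FeatureLibrary`, the `listings` table name, and the `price_per_sqft` feature are all hypothetical; a real platform would typically back the catalogue with persistent storage and compute features at scale (for example, with Spark).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FeatureSpec:
    """One catalogue entry: what a feature means and where its data comes from."""
    name: str
    description: str
    source_table: str  # hypothetical table name in the data catalogue
    compute: Callable[[dict], float]


class FeatureLibrary:
    """Hypothetical shared registry so teams define a feature once and reuse it."""

    def __init__(self) -> None:
        self._features: dict[str, FeatureSpec] = {}

    def register(self, spec: FeatureSpec) -> None:
        # Reject duplicates so two teams cannot silently redefine the same name.
        if spec.name in self._features:
            raise ValueError(f"feature {spec.name!r} is already registered")
        self._features[spec.name] = spec

    def catalogue(self) -> list[tuple[str, str, str]]:
        # (name, source_table, description) rows, sorted by name, for browsing.
        return sorted(
            (s.name, s.source_table, s.description) for s in self._features.values()
        )

    def build_row(self, names: list[str], record: dict) -> dict[str, float]:
        # Compute the requested features for one raw record; unknown names raise KeyError.
        return {n: self._features[n].compute(record) for n in names}


library = FeatureLibrary()
library.register(
    FeatureSpec(
        name="price_per_sqft",
        description="List price divided by living area",
        source_table="listings",
        compute=lambda r: r["list_price"] / r["sqft"],
    )
)
row = library.build_row(["price_per_sqft"], {"list_price": 600_000, "sqft": 1_500})
print(row)  # {'price_per_sqft': 400.0}
```

Registering each feature once, with its description and source, is what lets other teams discover and reuse it instead of reimplementing it.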
“We are not trying to recreate SageMaker. What we’re trying to do is bring together all of these already existing tools and make it easy for those things to be used with our data.”
Akshat hopes that having a single place for centralized tools and best practices will improve the current workflow and ease onboarding for new data and machine learning engineers.
Looking ahead, Akshat expects Redfin to invest in computer vision systems to improve the accuracy of its estimation and recommendation models. He also hopes the company will build tools that make home touring easier, and perhaps even help with mortgages if Redfin moves in that direction.