Lisa: Parsing Inquiries with Seq2Seq Transformers

In our previous blog post we introduced AppFolio AI Leasing Assistant, Lisa, giving a high-level overview of her capabilities and insight into the value she can offer. One of the key technical components which enables Lisa to perform so effectively is the Inquiry Parser.

What is Lisa’s Inquiry Parser?

The leasing conversation flow is often initiated by a prospective resident submitting an inquiry via an Internet Listing Service (ILS), such as Zillow, Apartments.com, or a private homepage for the property. The first component of Lisa to spring into action in response is the inquiry parser. We use it to extract information from inquiries, and process the data collected to start and facilitate a productive conversation in hopes it will lead to a showing, an application, and finally a signed lease.

Once an inquiry is submitted, Lisa receives an email and parses it. All PII (Personally Identifiable Information) is processed and stored securely and is not disclosed to anyone not directly involved in the leasing process. At a minimum, a phone number or email address is required to begin a text conversation with the prospective resident. However, with more information, such as the prospect’s full name, desired move-in date, and unit type preference, Lisa can streamline the conversation because she doesn’t have to ask for it again.

Beyond parsing basic contact information, source attribution is another key function of the inquiry parser. Lisa determines the source of each inquiry, enabling us to generate reports showing which ILS is driving the most business for property managers.

Lisa combines two parsers: a RegEx parser and a Machine Learning (ML) parser. The RegEx parser has close to 100% precision, but over time its recall drops as new listing sites come online or existing sites change their format. We therefore continue to run the RegEx parser first and then augment its output with fields from the ML parser. The parsed information is then used to create new contacts and threads or to update existing ones.
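The "RegEx first, then augment with ML" step can be sketched as a simple merge. This is a minimal illustration, not Lisa's actual code: the function name merge_parses, the field names, and the rule that a non-empty RegEx value always wins are assumptions made for this example.

```python
from typing import Dict, Optional

Fields = Dict[str, Optional[str]]


def merge_parses(regex_fields: Fields, ml_fields: Fields) -> Fields:
    """Start from the ML fields, then let non-empty RegEx values override them.

    Assumption: because the RegEx parser is described as high precision,
    its values take priority whenever it produced one.
    """
    merged = dict(ml_fields)
    for name, value in regex_fields.items():
        if value:  # skip fields the RegEx parser could not fill
            merged[name] = value
    return merged


# Hypothetical parser outputs for one inquiry.
regex_result = {"phone": "555-0100", "first_name": None}
ml_result = {"phone": "555-0199", "first_name": "Jon", "move_in_date": "2022-06-01"}
merged = merge_parses(regex_result, ml_result)
```

Here the RegEx phone number is kept, while the first name and move-in date that the RegEx parser missed are filled in from the ML parser.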

How Does Lisa’s Inquiry Parser Work?

Because there are hundreds of different listing sites, each with a different and evolving format for collecting their customers’ data, parsing the wide array of inbound inquiries to Lisa is a difficult task. Prior to the current iteration, our solution was a file with 4,000 lines of RegEx parsing code that was frequently amended to keep up with formatting changes or the addition of new listing sites. This ended up being a significant time sink and chore for our developers.

We opted for a more effective solution. In addition to the RegEx parser, we added a Machine Learning powered parser that generalizes much better by drawing upon data collected from past listing emails and their parsed fields. Lisa now uses a Transformer-based Seq2Seq (sequence-to-sequence) model to map a message derived from an inquiry into a structured string that makes the data trivial to parse. Transformers are a state-of-the-art class of model architectures for Natural Language Processing (NLP) tasks. We leverage pre-trained language models and fine-tune them to focus on specific tasks.

As the name suggests, a Seq2Seq model transforms one sequence into another. A simple example is transforming a German sentence into French. The Transformer generates a target sequence by analyzing both the input and the output generated so far to determine the next token (subword unit). With the information learned from pre-training on a very large corpus of data, it needs only a fairly modest amount of task-specific training data to achieve strong performance.
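The token-by-token generation loop can be illustrated with a toy sketch. The next-token function below is a stand-in for a trained Transformer (it merely upper-cases the source one token at a time); the names greedy_decode, toy_next_token, and EOS are hypothetical. Only the shape of the loop matters: each step sees the full input and everything generated so far.

```python
from typing import Callable, List

EOS = "<eos>"  # hypothetical end-of-sequence marker


def greedy_decode(
    source: List[str],
    next_token: Callable[[List[str], List[str]], str],
    max_len: int = 20,
) -> List[str]:
    """Generate tokens one at a time until EOS or max_len is reached."""
    output: List[str] = []
    for _ in range(max_len):
        # The model looks at the whole input and the output so far.
        token = next_token(source, output)
        if token == EOS:
            break
        output.append(token)
    return output


def toy_next_token(source: List[str], output: List[str]) -> str:
    """Stand-in for a trained model: emit the next source token in upper case."""
    position = len(output)
    if position >= len(source):
        return EOS
    return source[position].upper()


decoded = greedy_decode(["guten", "morgen"], toy_next_token)
```

A real model would replace toy_next_token with a network that scores every token in its vocabulary and, in greedy decoding, picks the highest-scoring one.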

An illustration of the activations of the attention mechanism that underpins the Transformer architecture. As we are encoding the word "it", part of the attention mechanism focuses on "The Animal" and bakes a part of its representation into the encoding of "it". Source: The Illustrated Transformer.

In our application, we want to extract information from an ILS message. We input the entirety of a message and have the model output a structured summary sentence of that message. The following is a sample input and output from our model. The input sequence is the ILS message in the top-most text block, the middle text block contains the generated output sequence, and the bottom-most text block contains the fully parsed output:

The input sequence, consisting of the source domain, email subject, and body (we remove URLs and HTML tags before passing it into the model), is mapped to a string that resembles natural language and is trivial to parse. We then check whether each value actually exists in the input (e.g. by RegEx matching phone numbers), and compute confidence scores for each field.
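As a rough sketch of this pipeline, the snippet below builds a model input from the three parts and parses a generated string back into fields. The exact input layout and output format of Lisa's model are not given in this post, so the "source: ... subject: ... body: ..." layout and the "key: value | key: value" output format are purely hypothetical, as are the function names.

```python
import re
from typing import Dict

TAG_RE = re.compile(r"<[^>]+>")       # crude HTML tag matcher
URL_RE = re.compile(r"https?://\S+")  # crude URL matcher


def build_model_input(source_domain: str, subject: str, body: str) -> str:
    """Strip HTML tags and URLs from the body, then join the three parts."""
    text = TAG_RE.sub(" ", body)   # tags first, so URLs inside attributes vanish too
    text = URL_RE.sub(" ", text)
    text = " ".join(text.split())  # collapse runs of whitespace
    return f"source: {source_domain} subject: {subject} body: {text}"


def parse_structured_output(generated: str) -> Dict[str, str]:
    """Parse a hypothetical 'key: value | key: value' string into a dict."""
    fields: Dict[str, str] = {}
    for segment in generated.split("|"):
        key, sep, value = segment.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


body = '<p>My name is Jon. Phone: 555-0100</p> <a href="https://example.com/1">View</a>'
model_input = build_model_input("example.com", "New lead", body)
parsed = parse_structured_output("first name: Jon | phone: 555-0100")
```

Because the generated string follows a fixed pattern, the parsing step needs only a split and a partition rather than site-specific expressions.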

When generating the text in the middle block, the model decides which word to generate based on its relevance to the input text. To explain the model behavior, this relevance can be visualized as a score from each word in the input, and these scores can be added up to determine the final score for the output word (see images below). For example, to generate a part of the phone number, the model almost exclusively looks at the keyword “Phone” and the number that follows. However, when generating the first name, the model actually looks at multiple sentences in the input that mention the first name, even the email address. By looking at these visualizations we can understand how the model works and when its predictions are likely to be correct or incorrect.
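The additive nature of these scores can be shown with a small sketch. The numbers below are made up for illustration and are not real SHAP output; the helper names are hypothetical. The sketch relies only on the additivity property of SHAP-style explanations: a base value plus the per-token contributions gives the score of the output token.

```python
from typing import List, Tuple

Attribution = Tuple[str, float]  # (input token, contribution to one output token)


def output_score(base_value: float, attributions: List[Attribution]) -> float:
    """Base value plus the sum of all per-token contributions."""
    return base_value + sum(score for _, score in attributions)


def top_contributors(attributions: List[Attribution], k: int = 2) -> List[str]:
    """Input tokens with the largest absolute contribution."""
    ranked = sorted(attributions, key=lambda pair: abs(pair[1]), reverse=True)
    return [token for token, _ in ranked[:k]]


# Illustrative (invented) contributions toward generating "555".
attributions = [("Phone", 0.40), ("555", 0.35), ("Jon", -0.05), ("apartment", 0.01)]
score = output_score(0.10, attributions)
top = top_contributors(attributions)
```

In this invented example the keyword and the number itself dominate, mirroring the phone number case described above.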

Sample output of the SHAP explainer package. It shows the distribution of overall importance when generating part of the phone number (substring “555” in the green circle). The colors indicate which parts of the input the model deemed to have positive (red) and negative (blue) contributions. In this case the model mainly looked at the keyword “Phone” and the phone number itself.

Sample output of the SHAP explainer package. It shows the distribution of overall importance when generating the potential resident’s first name (substring “Jon” in the green circle). The colors indicate which parts of the input the model deemed to have positive (red) and negative (blue) contributions. In this case the model mainly looked at the keywords “first name”, “Jon” and “My name is Jon…”.

We chose this model class because label generation is straightforward and performance is strong. Lisa simply maps the input to a target string, and we do not have to annotate exactly where to copy the data from, as would be required for more traditional token classification models. Lisa can read the input and determine the relevant information. There is also no need to post-process the parsed fields to obtain their canonical representation, such as for dates and phone numbers.

One important catch during data generation is that we have to ensure that the value we want to parse is actually present in the source. Otherwise, the model will tend to generate information that is not present. We implemented the same safeguard as a post-processing step, in order to avoid returning occasional “typos.”
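A minimal sketch of such a safeguard follows, assuming US-style ten-digit phone numbers and a case-insensitive substring test for other text fields. These rules and all names are assumptions for illustration. Fields that are emitted in canonical form, such as dates, would need their own comparison logic, which is not shown.

```python
import re
from typing import Dict

# Assumed US-style pattern: optional parentheses, optional separators.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def digits_only(text: str) -> str:
    """Drop every non-digit character."""
    return re.sub(r"\D", "", text)


def phone_in_source(value: str, source: str) -> bool:
    """True if some phone-like span in the source has the same digits."""
    wanted = digits_only(value)
    return any(digits_only(match) == wanted for match in PHONE_RE.findall(source))


def keep_grounded_fields(fields: Dict[str, str], source: str) -> Dict[str, str]:
    """Keep only the fields whose values can be found in the source text."""
    grounded: Dict[str, str] = {}
    for name, value in fields.items():
        if name == "phone":
            found = phone_in_source(value, source)
        else:
            found = value.lower() in source.lower()
        if found:
            grounded[name] = value
    return grounded


source_text = "Hi, my name is Jon. Phone: (555) 010-0199"
candidate = {"first name": "Jon", "phone": "555-010-0199", "last name": "Smith"}
grounded = keep_grounded_fields(candidate, source_text)
```

The same check can serve both purposes: filtering training labels so the model is never taught to produce values absent from the source, and dropping ungrounded values at prediction time.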

In addition to the possibility of typos, another drawback of Seq2Seq models is that there is no obvious way to generate confidence scores. Seq2Seq models output a whole sequence, with the confidence of each predicted word depending on all the previously predicted words in the sentence. This makes it difficult to get a confidence score for the generated sequence or subsequences. Lisa generates confidence scores based on the similarity between the new ILS message and the messages previously used for training the model, as well as the score of the words from which we extract the information.
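One way to combine the two signals is sketched below. This is only an assumption about how such a score could be built: the post does not specify the similarity measure or the combination rule, so the Jaccard word overlap, the use of the weakest token probability, and the product of the two are all illustrative choices, as are the function names.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two messages."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def field_confidence(
    message: str, training_messages: List[str], token_probs: List[float]
) -> float:
    """Similarity to the closest training message times the weakest token score."""
    similarity = max((jaccard(message, t) for t in training_messages), default=0.0)
    token_score = min(token_probs) if token_probs else 0.0
    return similarity * token_score


# Hypothetical inputs: one new message, two training messages, and the
# model's probabilities for the tokens that make up one extracted field.
confidence = field_confidence(
    "name jon phone 555",
    ["name amy phone 555", "hello world"],
    [0.9, 0.5],
)
```

Under this sketch, a message that looks unlike anything seen in training receives a low confidence even when the model is locally certain about each token.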

Lisa’s ML parser has reduced the number of unparsed inquiries to nearly zero and greatly improved the accuracy of source attribution. Additionally, the parser has significantly reduced the workload of our operators, who would otherwise have had to parse these inquiries manually, and of our developers, who had to maintain the complex parsing code.

The inquiry parser is just one of many exciting components that make up Lisa. Stay tuned for the next post, as we deep dive into the main driver of our conversational system that will leave you questioning whether or not you are actually speaking to an AI.


Authors and contributors: Christfried Focke, Shyr-Shea Chang, Tony Froccaro, Miguel Rivera