Extractive QA

Goal: Given a user question about a product, retrieve the relevant documents and extract the answer from them.

Model: MiniLM (deepset/minilm-uncased-squad2)

Dataset: SubjQA (question-answer pairs about products)

Steps:

1. Load the SubjQA dataset and the MiniLM reader model
2. Set up the Retriever (BM25) and a document store
3. Set up the Reader
4. Run the extractive QA pipeline
5. Evaluate the Retriever, the Reader, and the full pipeline

Example: Extractive QA for an e-commerce website

DATASET: SubjQA electronics

Train examples: 1,295

Validation examples: 255

Test examples:  358
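
These splits can be loaded with the datasets library; a minimal sketch (the hub id "subjqa", the "electronics" config name, and the column names are assumptions — adjust to wherever the data actually lives):

```python
from datasets import load_dataset

# SubjQA, electronics subset (hub id and config name assumed)
subjqa = load_dataset("subjqa", "electronics")
print(subjqa)                          # DatasetDict with train / validation / test
print(subjqa["train"][0]["question"])  # a sample question
```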

Example Context

I really like this keyboard. I give it 4 stars because it doesn’t have a CAPS LOCK key so I never know if my caps are on.  But for the price, it really suffices as a wireless keyboard.  I have very large hands and this keyboard is compact, but I have no complaints.

Example Question

Does the keyboard lightweight? (kept verbatim from SubjQA; the dataset questions are sometimes ungrammatical)

Example Answer

this keyboard is compact
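
Just to see the model in action on this example, a quick sketch with the transformers question-answering pipeline (the printed span is the model's prediction, which may or may not match the gold answer above):

```python
from transformers import pipeline

# Extractive QA pipeline with the MiniLM checkpoint named above
qa = pipeline("question-answering", model="deepset/minilm-uncased-squad2")

context = (
    "I really like this keyboard. I give it 4 stars because it doesn't have "
    "a CAPS LOCK key so I never know if my caps are on. But for the price, "
    "it really suffices as a wireless keyboard. I have very large hands and "
    "this keyboard is compact, but I have no complaints."
)
question = "Does the keyboard lightweight?"  # kept verbatim from the dataset

result = qa(question=question, context=context)
print(result["answer"], result["score"])  # gold answer in SubjQA: "this keyboard is compact"
```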

MODEL


Span classification

Common QA frameworks:



Let's check the Wright Flyer example with MiniLM:

Span classification QA framework:

Example: The Wright brothers flew the motor-operated airplane on December 17, 1903. Their aircraft, the Wright Flyer, used ailerons for control and had a 12-horsepower engine.

It works well! 
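
What "span classification" means in practice: the model outputs a start logit and an end logit for every context token, and the answer is the highest-scoring span. A minimal sketch with naive argmax decoding (the question here is my own, since the notes don't state it):

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_ckpt = "deepset/minilm-uncased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForQuestionAnswering.from_pretrained(model_ckpt)

question = "When did the Wright brothers fly?"  # assumed question, not stated in the notes
context = (
    "The Wright brothers flew the motor-operated airplane on December 17, 1903. "
    "Their aircraft, the Wright Flyer, used ailerons for control and had a "
    "12-horsepower engine."
)

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One start logit and one end logit per token; take the argmax of each (naive decoding)
start_idx = int(outputs.start_logits.argmax())
end_idx = int(outputs.end_logits.argmax())
answer_ids = inputs["input_ids"][0][start_idx : end_idx + 1]
print(tokenizer.decode(answer_ids))
```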

However, in real life we only have the questions 🤔 So we need to somehow find the relevant passages in the entire corpus.

The simplest way: concatenate all reviews into one huge context. But this would have unacceptable latency ☹️ The smarter way is:

Retriever-Reader architecture

Retriever: embed the contexts and select the ones whose embeddings have the highest dot product with the query embedding (dense retrieval); sparse retrievers like BM25 score by term matching instead

Reader: extract the answer from the top documents returned by the Retriever

Document store: the database of documents that is served to the Retriever at query time

Set up Retriever

We use a BM25 retriever (a sparse, TF-IDF-style ranking function).

Let's query "What is the length of the cord?"

The retriever managed to pull up related reviews where a potential answer might be found (see the table).
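
A sketch of this retriever setup, assuming the Haystack 1.x API with an in-memory BM25 document store (the notes hint at an Elasticsearch-backed setup; the sample reviews below are toy placeholders, not SubjQA data):

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever

# In-memory store with BM25 enabled (an Elasticsearch-backed store works the same way)
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([
    {"content": "The cord is about six feet long, plenty for my desk.",  # toy review
     "meta": {"item_id": "B0074BW614"}},
    {"content": "Compact wireless keyboard, great value for the price.",  # toy review
     "meta": {"item_id": "B00001P4ZH"}},
])

retriever = BM25Retriever(document_store=document_store)
docs = retriever.retrieve(query="What is the length of the cord?", top_k=3)
for doc in docs:
    print(doc.meta["item_id"], "->", doc.content)
```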

Set up Reader

The reader is basically the QA framework's abstraction around the model we already played with (MiniLM).
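
A sketch of wrapping that checkpoint in a reader, again assuming Haystack 1.x (FARMReader is one such wrapper):

```python
from haystack.nodes import FARMReader
from haystack.schema import Document

# Wrap the MiniLM checkpoint so it can plug into a retriever-reader pipeline
reader = FARMReader(model_name_or_path="deepset/minilm-uncased-squad2",
                    return_no_answer=True)

# The reader can also be called directly on documents
# (in the full pipeline the retriever supplies them)
context = ("The Wright brothers flew the motor-operated airplane on December 17, 1903. "
           "Their aircraft, the Wright Flyer, used ailerons for control and had a "
           "12-horsepower engine.")
prediction = reader.predict(query="When did the Wright brothers fly?",
                            documents=[Document(content=context)], top_k=1)
print(prediction["answers"][0].answer)
```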


Extractive QA

Finally, we achieved our Goal!

Product: Amazon Kindle e-book (code: B0074BW614)
Query : "Is it good for reading?"
Retriever: BM25
Document dataset: SubjQA
Reader: MiniLM (BERT-based) model

Top 3 answers for the query "Is it good for reading?" and the Kindle e-book product
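
A sketch of running the full retriever-reader pipeline for this query, reusing the retriever and reader objects from the sketches above (Haystack 1.x assumed; the item_id filter key is an assumption about how the SubjQA metadata is stored):

```python
from haystack.pipelines import ExtractiveQAPipeline

# retriever and reader come from the two sketches above
pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)

result = pipe.run(
    query="Is it good for reading?",
    params={
        "Retriever": {"top_k": 3, "filters": {"item_id": ["B0074BW614"]}},
        "Reader": {"top_k": 3},
    },
)
for answer in result["answers"]:
    print(f"{answer.answer!r}  (score={answer.score:.3f})")
```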

Evaluate Retriever

Most of the answer quality comes from the neural Reader (BERT-based), but the Retriever sets the upper bound: the Reader can only find answers in the documents the Retriever provides.

Evaluation

Result:



Sparse (BM25) vs Dense (DPR) retriever evaluation
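
The usual retriever metric is recall@k: for each labeled question, does the document containing the gold answer show up among the top-k retrieved documents? A framework-agnostic sketch (function and variable names are my own):

```python
def recall_at_k(questions, gold_doc_ids, retrieve_fn, k=3):
    """Fraction of questions whose gold document appears in the top-k results.

    retrieve_fn(question, k) is assumed to return a list of document ids.
    """
    hits = 0
    for question, gold_id in zip(questions, gold_doc_ids):
        if gold_id in retrieve_fn(question, k):
            hits += 1
    return hits / len(questions)

# Usage sketch: compare a sparse (BM25) and a dense (DPR) retriever at several k
# for k in (1, 3, 5, 10):
#     print(k, recall_at_k(questions, gold_ids, bm25_retrieve, k),
#              recall_at_k(questions, gold_ids, dpr_retrieve, k))
```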

Evaluate Reader

Extractive QA readers are evaluated with two metrics: Exact Match (EM) and F1.

A representative score balances both: EM is strict (exact string match), while F1 gives partial credit for token overlap.
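
A sketch of computing them with the evaluate library's squad metric (the example strings and the answer_start offset are placeholders):

```python
import evaluate

squad_metric = evaluate.load("squad")

predictions = [{"id": "q1", "prediction_text": "this keyboard is compact"}]
references = [{"id": "q1",
               "answers": {"text": ["this keyboard is compact"],
                           "answer_start": [230]}}]  # placeholder offset

scores = squad_metric.compute(predictions=predictions, references=references)
print(scores)  # {'exact_match': 100.0, 'f1': 100.0}
```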

Domain Adaptation

EM and F1 scores on the SubjQA dataset for 3 models.

base model: MiniLM-L12-H384-uncased
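
Domain adaptation here means continuing to train the SQuAD-tuned reader on the SubjQA training split. A hedged sketch with Haystack 1.x's FARMReader.train; the data directory and file name are assumptions, and the data must first be converted to SQuAD JSON format:

```python
from haystack.nodes import FARMReader

# Start from the SQuAD-tuned checkpoint and keep training on SubjQA
reader = FARMReader(model_name_or_path="deepset/minilm-uncased-squad2")
reader.train(
    data_dir="data/subjqa",       # assumed directory
    train_filename="train.json",  # SubjQA converted to SQuAD JSON format (assumed name)
    use_gpu=True,
    n_epochs=1,
    batch_size=16,
)
reader.save(directory="models/minilm-subjqa")  # reuse later without retraining
```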



Evaluate QA Pipeline

Let's compare the Reader on its own vs the Retriever & Reader pipeline.

We can see the Retriever's impact on overall performance.

Reader-only vs Retriever & Reader scores on SubjQA
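
A sketch of how such a comparison can be run end to end with Haystack 1.x's pipeline evaluation (pipe is the pipeline from the earlier sketch; eval_labels is assumed to be a list of Haystack MultiLabel objects built from the SubjQA test split):

```python
# pipe: the ExtractiveQAPipeline from the earlier sketch
# eval_labels: list of haystack MultiLabel objects built from SubjQA (assumed)
eval_result = pipe.eval(labels=eval_labels,
                        params={"Retriever": {"top_k": 3}})
metrics = eval_result.calculate_metrics()

# Retriever quality vs end-to-end reader quality
print("Retriever recall:", metrics["Retriever"]["recall_single_hit"])
print("Reader F1:       ", metrics["Reader"]["f1"])
print("Reader EM:       ", metrics["Reader"]["exact_match"])
```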

Generative QA

So far we have implemented only extractive QA, which predicts the answer's start and end tokens in the context (span/token classification).

Generative QA can synthesize an answer from parts scattered across the entire context.

Retrieval-Augmented Generation (RAG): couple a retriever with a generative (seq2seq) reader, so the answer is generated conditioned on the retrieved documents rather than extracted as a span.
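
As a quick illustration of the idea (not what these notes built), the transformers library ships pretrained RAG models that couple a dense retriever with a seq2seq generator; the dummy index below avoids downloading the full Wikipedia index:

```python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

model_name = "facebook/rag-sequence-nq"
tokenizer = RagTokenizer.from_pretrained(model_name)
# use_dummy_dataset=True keeps this sketch lightweight
retriever = RagRetriever.from_pretrained(model_name, index_name="exact",
                                         use_dummy_dataset=True)
model = RagSequenceForGeneration.from_pretrained(model_name, retriever=retriever)

inputs = tokenizer("Is it good for reading?", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```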

Conclusion