For Keyword Terms: For each keyword term in the query, highlight the term wherever
it appears in the document, using regular-expression pattern matching.
For Formula Terms: For a query formula, its math tuples are first extracted. Then,
for each document formula, its math tuples are also extracted and compared
against the math tuples from the query. A matching percentage is computed by
inferring from the math tuples the number of symbols that are matched, divided by
the total number of symbols from the query formula. As an example, a document
formula with $ax^2 + bx + c + d$ will match a query formula $ax^2 + bx + c$ with a 100%
matching percentage, while a document formula $a$ will match the same query formula
with a 12.5% matching percentage. The matching percentages of the formulas are then
reflected in shades of the highlight color, as shown in Figure 24.
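
To make the arithmetic in the quoted example concrete, here is a minimal Python sketch. It is not Approach0's implementation: instead of extracting math tuples from an operator tree, it assumes each non-layout character is one symbol and counts matches as a multiset overlap. The function names `symbols` and `matching_percentage` are hypothetical. Under that simplification the query $ax^2 + bx + c$ has 8 symbols (a, x, 2, +, b, x, +, c), which reproduces the 100% and 12.5% figures above.

```python
from collections import Counter


def symbols(formula: str) -> Counter:
    """Count the symbols in a formula (simplifying assumption: every
    character is one symbol, except whitespace and the layout
    characters ^, { and })."""
    return Counter(ch for ch in formula if not ch.isspace() and ch not in "^{}")


def matching_percentage(query: str, document: str) -> float:
    """Matched query symbols divided by total query symbols, in percent."""
    q, d = symbols(query), symbols(document)
    matched = sum((q & d).values())  # multiset intersection of symbol counts
    total = sum(q.values())
    return 100.0 * matched / total if total else 0.0


query = "ax^2 + bx + c"
print(matching_percentage(query, "ax^2 + bx + c + d"))  # extra "+ d" costs nothing
print(matching_percentage(query, "a"))                  # only "a" is matched
```

Because the denominator is the query's symbol count, extra symbols in the document do not lower the score; only missing query symbols do.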
Last edited by hbghlyj 2023-7-31 21:03
MSE – How to search on this site?
Approach0 is able - at least to some extent - to find posts where the same expression is written differently.
Python binding to approach0: It includes the operator tree parser that approach0 uses and a linear tokenizer for LaTeX, and of course it has the core functionalities of the approach0 system. The tokenizer is useful if you want to feed LaTeX to a transformer, because it can reduce the vocabulary (e.g., \frac vs \dfrac) while still treating the tokens specially.
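
To illustrate what "reducing the vocabulary" means here, the sketch below is a plain-Python stand-in and does not use the binding's API. The regular expression, the `CANONICAL` table, and the `tokenize` function are all assumptions made for illustration. It splits LaTeX into commands and single characters, then maps display variants such as \dfrac and \tfrac onto \frac so they share one token.

```python
import re

# Hypothetical mapping of command variants to one canonical token.
CANONICAL = {r"\dfrac": r"\frac", r"\tfrac": r"\frac"}

# A token is a control word (\alpha), a control symbol (\{), or any
# single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize(latex: str) -> list[str]:
    """Split LaTeX into tokens and collapse variant commands."""
    return [CANONICAL.get(tok, tok) for tok in TOKEN_RE.findall(latex)]


print(tokenize(r"\dfrac{a}{b}"))
print(tokenize(r"\dfrac{a}{b}") == tokenize(r"\frac{a}{b}"))
```

With this normalization, a model sees the same token sequence for \dfrac{a}{b} and \frac{a}{b}, so it does not need separate vocabulary entries for each variant.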
A fork of the Tangent-CFT search engine (Python) that recognizes all LaTeX operators in arXMLiv 2019 and M-SE. It can be used to produce SLT (symbol layout tree) and OPT (operator tree) representations from MathML.