Searching for formulas

hbghlyj · 2022-7-16 01:59

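I suggest modifying Discuz's default post search so that it is easier to search for formulas; see MathDowsers for reference. (Image: Question_panel_explained[1].png)

Paper: Formula-centric: Selecting Visually Matching Formulas (searching for visually similar formulas)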

For Keyword Terms: For each keyword term in the query, highlight the term whenever it appears in the document through pattern matching from regular expressions;
For Formula Terms: For a query formula, at first its math tuples are extracted. Then for each document formula, also extract its math tuples and then compare them against the math tuples from the query. A matching percentage is computed by inferring from the math tuples the number of symbols that are matched, divided by the total number of symbols from the query formula. As an example, a document formula with $ax^2 + bx + c + d$ will match a query formula $ax^2 + bx + c$ with a 100% matching percentage, while a document formula $a$ will match the same query formula with a 12.5% matching percentage. The matching percentage of the formulas is then reflected in shades of the highlight color, as shown in Figure 24.
Matched formulas or keywords are highlighted.

Hovering the mouse over a match shows its similarity.
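
The matching percentage in the quoted passage can be sketched as follows. This is only an illustration under a simplifying assumption: it compares multisets of single-character symbols instead of the math tuples that the paper extracts, and the `symbols` tokenizer is hypothetical. It reproduces the two figures from the quote.

```python
from collections import Counter


def symbols(formula: str) -> list[str]:
    """Hypothetical tokenizer: one symbol per character, ignoring layout marks."""
    return [ch for ch in formula if ch not in " ^{}"]


def matching_percentage(query: str, document: str) -> float:
    """Share of the query's symbols that also occur in the document formula."""
    q = Counter(symbols(query))
    d = Counter(symbols(document))
    matched = sum((q & d).values())  # multiset intersection
    return 100.0 * matched / sum(q.values())


query = "ax^2 + bx + c"  # 8 symbols: a x 2 + b x + c
print(matching_percentage(query, "ax^2 + bx + c + d"))  # 100.0
print(matching_percentage(query, "a"))                  # 12.5
```

A highlighter could then map the percentage to a shade of the highlight color.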

hbghlyj · 2022-8-21 08:02

mathweb
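The MathWebSearch system (MWS) is a content-based search engine for mathematical formulas. It indexes MathML formulas using term indexing, a technique derived from automated theorem proving. The MWS search engine is optimized for fast query answering and interactive applications. Unification queries form the basis of an expressive query language with well-defined semantics. Any formula that can be referenced by a URI and translated into MathML can be indexed by MWS.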
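
To illustrate what a query with variables means, here is a minimal sketch of one-way matching, a restricted form of unification in which only the query contains variables. It is not MWS code: the nested-tuple term encoding, the `?name` variable syntax, and the function names are assumptions made for this example, and a real engine uses an index instead of a linear scan.

```python
def match(pattern, term, bindings=None):
    """Return variable bindings that make `pattern` equal to `term`, or None."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:  # a repeated variable must bind to the same subterm
            return bindings if bindings[pattern] == term else None
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def find(pattern, formulas):
    """Linear scan over {id: term}; returns the ids whose term matches the pattern."""
    return [fid for fid, term in formulas.items() if match(pattern, term) is not None]


# Terms are (operator, argument, ...) tuples, loosely mirroring Content MathML.
formulas = {
    "f1": ("plus", "x", "x"),
    "f2": ("plus", "x", "y"),
}
print(find(("plus", "?a", "?a"), formulas))  # only f1 has two identical arguments
```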
mathpix search tab

Search currently supports the following sites:

Wikipedia
Math Stack Exchange
MathOverflow
Wolfram MathWorld
Google Socratic
PlanetMath
NIST Digital Library of Mathematical Functions

searchonmath
A powerful search engine for mathematical formulas, with LaTeX support.
approach0.xyz
Search currently supports the following sites: AoPS, Math Stack Exchange, and MathOverflow.
The author is Heavy Buyer.
GitHub page

hbghlyj · 2022-12-10 09:37

Last edited by hbghlyj 2023-7-31 21:03
MSE – How to search on this site?
Approach0 is able - at least to some extent - to find posts where the same expression is written differently.

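There is currently no Python implementation of this search engine's back-end algorithm,
but the author, Wei Zhong, provides pya0, a Python binding for the a0 search engine.
pya0/tests/test-parser.py
Demo for PYA0
Unfortunately, Windows is not supported (see the comment).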

hbghlyj · 2023-2-11 09:37

A list of some math-aware search engines.
They canonicalize formulas and build a content index based on similarity.
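
As a rough illustration of canonicalizing formulas before indexing them, the sketch below normalizes LaTeX tokens and builds a small inverted index. It is not taken from any of the engines listed here: the alias table, the token regex, and the scoring by shared distinct tokens are all assumptions chosen to keep the example short.

```python
import re
from collections import defaultdict

# Hypothetical alias table: commands that render alike map to one canonical token.
ALIASES = {r"\dfrac": r"\frac", r"\tfrac": r"\frac", r"\le": r"\leq", r"\ge": r"\geq"}

# A command such as \frac, a single letter or digit, or any other non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z0-9]|[^\sA-Za-z0-9]")


def canonical_tokens(latex: str) -> list[str]:
    """Tokenize a LaTeX string and replace aliases with their canonical form."""
    return [ALIASES.get(tok, tok) for tok in TOKEN.findall(latex)]


def build_index(formulas: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: canonical token -> ids of the formulas that contain it."""
    index = defaultdict(set)
    for formula_id, latex in formulas.items():
        for tok in canonical_tokens(latex):
            index[tok].add(formula_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank formula ids by how many distinct query tokens they share."""
    scores = defaultdict(int)
    for tok in set(canonical_tokens(query)):
        for formula_id in index.get(tok, ()):
            scores[formula_id] += 1
    return sorted(scores, key=lambda fid: (-scores[fid], fid))


index = build_index({"p1": r"\dfrac{a}{b}", "p2": r"x \le y"})
print(search(index, r"\frac {a}{b}"))  # p1 is found despite \dfrac and the extra space
```

Token overlap ignores structure, so it only covers spelling variants; engines that index operator trees or symbol layout trees can also match expressions that are arranged differently.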

Python binding to approach0: it includes the operator tree parser that approach0 uses and a linear tokenizer for LaTeX, and of course it has the core functionality of the approach0 system. The tokenizer is useful if you want to apply it to a transformer because it can reduce the vocabulary (e.g., \frac vs \dfrac) while treating the tokens specially.
The WebMIaS web search engine of EuDML (also available as a Docker image [1]), and its individual components (Java):
- MathML canonicalizer
- MathML unificator
- Math processing plugin for Apache Lucene and Solr
A template for a math-aware search engine powered by the pv211-utils library (Python)
An implementation of the soft cosine document similarity measure in the Gensim [2], [3] library (Python).
A fork of the Tangent-CFT search engine (Python) that recognizes all LaTeX operators in arXMLiv 2019 and M-SE.
Can be used to produce SLT and OPT out of MathML.
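
For readers unfamiliar with the soft cosine measure mentioned in the list above, here is a plain-Python sketch of its definition. It does not use or reproduce the Gensim API; the dict-based vectors and the `term_sim` function are assumptions made for this example.

```python
import math


def soft_cosine(a: dict, b: dict, sim) -> float:
    """Soft cosine of two sparse term-weight vectors.

    `sim(t1, t2)` is a term similarity in [0, 1] with sim(t, t) == 1.
    With sim(t1, t2) == 0 for all t1 != t2 this reduces to the ordinary cosine.
    """
    def inner(u, v):
        return sum(wu * wv * sim(tu, tv)
                   for tu, wu in u.items() for tv, wv in v.items())

    denom = math.sqrt(inner(a, a)) * math.sqrt(inner(b, b))
    return inner(a, b) / denom if denom else 0.0


def term_sim(t1: str, t2: str) -> float:
    """Hypothetical similarity: the terms 'x' and 'y' count as half-similar."""
    if t1 == t2:
        return 1.0
    return 0.5 if {t1, t2} == {"x", "y"} else 0.0


# Ordinary cosine would give 0 here because the vectors share no term.
print(soft_cosine({"x": 1.0}, {"y": 1.0}, term_sim))
```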

Classification of Presentation MathML Expressions Using Multilayer Perceptron

hbghlyj · 2023-7-31 21:01

ECIR 2020 paper: Accelerating Substructure Similarity Search for Formula Retrieval. Wei Zhong, Shaurya Rohatgi, Jian Wu, C. Lee Giles, and Richard Zanibbi

arXiv: Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval. Wei Zhong, Jheng-Hong Yang, Yuqing Xie, Jimmy Lin

SIGIR ’21, Virtual Event, Canada: PYA0: A Python Toolkit for Accessible Math-Aware Search. Wei Zhong, Jimmy Lin

CLEF: Applying Structural and Dense Semantic Matching for the ARQMath Lab 2022. Wei Zhong, Yuqing Xie, and Jimmy Lin

Scharpf et al., “ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?”, ARQMath Lab @ CLEF 2020, Virtual Event

BrushSearch Doc & Demo & Video

hbghlyj · 2023-7-31 21:02

ARQMath Home - RIT

ARQMath is a cooperative evaluation task aiming to advance math-aware search and the semantic analysis of mathematical notation and texts.

Overview of ARQMath-3 (2022)
TU_DBS in the ARQMath Lab 2021, CLEF

hbghlyj · 2023-7-31 21:04

Intro
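Mathematical Information Retrieval (MIR) is concerned with finding information in documents that contain mathematics. This matters for technical disciplines that use mathematics heavily (such as physics and computer science) and for people looking for information online (for example in tutorials, Wikipedia, and WolframAlpha). Example use cases include using keywords and/or formulas to find a presentation of a mathematical concept, finding technical papers that use similar mathematical models, or browsing documents that contain a given formula while adding keywords to narrow the search to particular topics or communities (such as geometry, or a specific publisher or journal) and resources (such as tutorials, proofs, or computer programs).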

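Math retrieval

Recent surveys of MIR are available from the links below. Early research began in the mid-1990s and has become increasingly active in recent years, in part because of the NTCIR math retrieval tasks. The first general-purpose math-aware search engine was created for the NIST Digital Library of Mathematical Functions (DLMF) in the early 2000s. Additional information on MIR is available through the "References" link above.
Recognition and Retrieval of Mathematical Expressions (Zanibbi & Blostein, IJDAR 15(4): 331-357, 2012)
A Survey on Retrieval of Mathematical Knowledge (Guidi & Coen, Proc. CICM, LNAI 9150, pp. 296-315, 2015)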

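Why is MIR hard, and what are the main challenges?

Mathematical notation is used in dialects. For example, different research communities follow different conventions for naming variables (e.g., "variance" vs. "v") and for defining operators (e.g., an overline $\overline{x}$ may denote Boolean negation or the mean of a set of values). Individual authors redefine and adapt notation to suit their own needs. This flexibility benefits both authors and readers, but it makes automatic interpretation harder.

In addition, evaluating or computing a formula requires knowing the definitions and values of its symbols. Automatically recovering this computational context is a very difficult language-processing task that involves analyzing both the text and the mathematical notation.

Given this situation, the following challenging questions arise:

How do we find mathematical formulas that are semantically similar to (eq)?
How do we find mathematical formulas that are visually similar to (eq)?
Should appearance and semantics be integrated in formula retrieval?
How should text and formulas be combined in a query, i.e., what kind of query language should be supported?
What role should text matching and formula matching play in the final ranking of search results?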

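In what form are queries represented?

Queries for the task consist of a combination of formulas and keywords. Mathematical formulas are represented in machine-readable form as tree structures. We use the formula and keywords shown in the thought bubble above to provide an example query encoding.

Our example query is represented by the XML tree structure shown below. The query formula is represented in three ways: as a LaTeX string and in two XML encodings (both MathML). The appearance of the formula is described by the LaTeX string (delimited by <TeXquery> tags) and by Presentation MathML (delimited by <pquery> tags), while the mathematical operations in the expression are represented in Content MathML (delimited by <cquery> tags).

The <words> tag lists the query keywords ("infinite series conditionally convergent").

Example of a math tree structure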
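
Since the example figure is not reproduced in this thread, the following sketch shows how a query with these four tags could be assembled. Only the tag names <TeXquery>, <pquery>, <cquery>, and <words> and the keywords come from the description above; the `query` root element, the formula $x + y$, and the omission of the MathML namespace are assumptions made for this example, not the task's actual topic format.

```python
import xml.etree.ElementTree as ET


def build_query(tex: str, pmml: str, cmml: str, keywords: list[str]) -> ET.Element:
    """Assemble a query element holding three formula encodings plus keywords."""
    query = ET.Element("query")  # hypothetical root element name
    ET.SubElement(query, "TeXquery").text = tex
    ET.SubElement(query, "pquery").append(ET.fromstring(pmml))  # Presentation MathML
    ET.SubElement(query, "cquery").append(ET.fromstring(cmml))  # Content MathML
    ET.SubElement(query, "words").text = " ".join(keywords)
    return query


# Hypothetical formula x + y; real MathML would also carry the MathML namespace.
q = build_query(
    tex="x + y",
    pmml="<mrow><mi>x</mi><mo>+</mo><mi>y</mi></mrow>",
    cmml="<apply><plus/><ci>x</ci><ci>y</ci></apply>",
    keywords=["infinite", "series", "conditionally", "convergent"],
)
print(ET.tostring(q, encoding="unicode"))
```

The Presentation MathML tree records layout (an `mrow` of identifiers and an operator), while the Content MathML tree records the operation (`plus` applied to two arguments), which is why both encodings are carried alongside the LaTeX string.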

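How do I obtain example topics and the test collection?

The NTCIR-11 Math topic data and collection can be found here: research.nii.ac.jp/ntcir/permission/ntcir-11/perm-en-MATH.html
The NTCIR-10 Math topic data can be downloaded from IDR/NII (Informatics Research Data Repository): nii.ac.jp/cscenter/idr/en/ntcir/ntcir.html. The NTCIR-10 test collection is available here: research.nii.ac.jp/ntcir/data/data-en.html

Where can I learn more about previous NTCIR math retrieval tasks?

Web pages and papers for previous NTCIR math tasks can be found at the "References" link above.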

