LLM(1): Exa, LLM-based search

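1. What Exa is

Exa refers to the website https://exa.ai, an LLM-based search engine.

Exa also provides a Python API.

Behind the Exa API is a model built on the transformer architecture: you give it a piece of text, and it predicts the matching links.

The one-line description on the official homepage: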

Exa understands the context beyond the keywords you type.

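Confused? Read on in the searching guide from the official docs:

Exa uses a transformer architecture to predict links given text, and it gets its power from having been trained on the way that people talk about links on the Internet. This training produces a model that returns links that are both high in relevance and quality. However, the model does expect queries that look like how people describe a link on the Internet. For example, “best restaurants in SF” is a bad query, whereas “Here is the best restaurant in SF:” is a good query.

In plain language:

Exa's job is "you give me text, I predict links", where the text is the search content you type into Exa.
The links it returns are highly relevant to that text, and also of high quality.
As for the input text, two examples give a feel for it:
- "best restaurants in Binjiang, Hangzhou" is a poor query
- "Here is the very best restaurant in Binjiang, Hangzhou:" is a good query

Another intuitive way to see it: when you read a good WeChat official-account article or Zhihu answer, you share it to your WeChat Moments together with a one-line summary: "Found an article about the history of Roman architecture: [link]". People read the article first and then share it. Exa works the other way round: the user types text such as "Found an article about the history of Roman architecture", and Exa predicts the matching link.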
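The phrasing advice above can be sketched as a small helper that turns a bare topic into a sentence that reads like someone introducing a shared link. This is only an illustration: `as_link_description` and its wording are hypothetical and not part of Exa or `exa_py`.

```python
def as_link_description(topic: str, kind: str = "article") -> str:
    """Phrase a bare topic the way a person would introduce a shared link.

    Hypothetical helper for illustration; the sentence pattern is an assumption.
    """
    # Drop surrounding whitespace and trailing punctuation so the
    # sentence can end with the colon that precedes the "link".
    topic = topic.strip().rstrip(".:?!")
    return f"Here is a great {kind} about {topic}:"


query = as_link_description("the history of Roman architecture")
print(query)
```

The resulting string can then be passed as the query text to a search call in place of a keyword-style query.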

Getting started with the Exa API

https://docs.exa.ai/reference/getting-started

Get an API key

https://dashboard.exa.ai/overview

Install the Python package

https://github.com/exa-labs/exa-py

pip install exa_py

from exa_py import Exa
import os

# read the API key from the environment instead of hard-coding it
MY_EXA_API = os.environ["MY_EXA_API"]
exa = Exa(MY_EXA_API)

results = exa.search('hottest AI agent startups', use_autoprompt=True)
print(results)

Output:

Title: AgentOps
URL: https://www.agentops.ai/
ID: tz-F56tReaJJZ9g13dB3eA
Score: 0.
Published Date: 2000-01-01
Author: None
Text: None
Highlights: None
Highlight Scores: None

Title: HiOperator | Generative AI-Enhanced Customer Service
URL: https://www.hioperator.com/
ID: MIXJhTDGLrn9VmqKh6UOTA
Score: 0.22641
Published Date: 2000-01-01
Author: None
Text: None
Highlights: None
Highlight Scores: None

Title: imbue
URL: https://imbue.com/
ID: kOYHjR-2wEIOZc9Nv4bUHQ
Score: 0.
Published Date: 2023-09-07
Author: None
Text: None
Highlights: None
Highlight Scores: None

...
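Printing the whole response is fine for a first look, but in a script you usually want individual fields. The sketch below assumes the response object exposes a `results` list whose items carry `title`, `url` and `score` attributes, mirroring the fields printed above; that attribute layout is an assumption here, so check it against the `exa_py` version you have installed. To keep the sketch runnable without an API key, it uses stand-in objects filled with two entries from the output above; the truncated score is represented as `None`.

```python
from types import SimpleNamespace


def summarize(response, min_score=0.0):
    """Return (title, url) pairs for results with a known score >= min_score.

    Assumes `response.results` is a list of objects with `title`, `url`
    and `score` attributes (an assumption about the response shape).
    """
    rows = []
    for r in response.results:
        score = getattr(r, "score", None)
        if score is not None and score >= min_score:
            rows.append((r.title, r.url))
    return rows


# Stand-in for a real response, built from the printed output above.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(
        title="HiOperator | Generative AI-Enhanced Customer Service",
        url="https://www.hioperator.com/",
        score=0.22641,
    ),
    SimpleNamespace(title="imbue", url="https://imbue.com/", score=None),
])

for title, url in summarize(fake_response, min_score=0.1):
    print(f"{title} -> {url}")
```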

Going further with the Exa API

Exa expects to be used as a search API, so it offers a set of finer-grained features built around search:

(https://docs.exa.ai/reference/cheat-sheet)

from exa_py import Exa

# instantiate the Exa client
exa = Exa("YOUR API KEY")

# basic search: pass only the query text
results = exa.search("This is a Exa query:")

# autoprompted search: with use_autoprompt=True the query does not have to be
# phrased as a prompt; the Exa API rewrites it into one automatically
results = exa.search("autopromptable query", use_autoprompt=True)

# search with date filters
results = exa.search("This is a Exa query:", start_published_date="2019-01-01", end_published_date="2019-01-31")

# search with domain filters (restrict results to the listed sites)
results = exa.search("This is a Exa query:", include_domains=["www.cnn.com", "www.nytimes.com"])

# search and get text contents
results = exa.search_and_contents("This is a Exa query:")

# search and get highlights
results = exa.search_and_contents("This is a Exa query:", highlights=True)

# search and get contents with contents options,
# e.g. keep HTML tags and cap the text at 1000 characters
results = exa.search_and_contents(
    "This is a Exa query:",
    text={"include_html_tags": True, "max_characters": 1000},
    highlights={"highlights_per_url": 2, "num_sentences": 1, "query": "This is the highlight query:"},
)

# find similar documents
results = exa.find_similar("https://example.com")

# find similar excluding source domain
results = exa.find_similar("https://example.com", exclude_source_domain=True)

# find similar with contents
results = exa.find_similar_and_contents("https://example.com", text=True, highlights=True)

# get text contents
results = exa.get_contents(["ids"])

# get highlights
results = exa.get_contents(["ids"], highlights=True)

# get contents with contents options
results = exa.get_contents(
    ["ids"],
    text={"include_html_tags": True, "max_characters": 1000},
    highlights={"highlights_per_url": 2, "num_sentences": 1, "query": "This is the highlight query:"},
)

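Exa's search scope

Exa has built dedicated indexes; for each indexed category (or "keyword"), you can expect reasonably good search results:

Category 1: Companies

For example, enter "Here is the homepage of a company working on making space travel cheaper:"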

from exa_py import Exa
import os

MY_EXA_API = os.environ["MY_EXA_API"]
exa = Exa(MY_EXA_API)

q = "Here is the homepage of a company working on making space travel cheaper:"
result = exa.search(q, use_autoprompt=True)
print(result)

Title: Venus Aerospace :: A New Approach to Hypersonic Transportation
URL: https://www.venusaero.com/
ID: aU6USG9MOjAjE-__6sKoWw
Score: 0.
Published Date: 2020-09-29
Author: None
Text: None
Highlights: None
Highlight Scores: None

Title: Manufacturing in Microgravity
URL: https://varda.com/
ID: EurzS7kqWK4C1rPz3yFFOQ
Score: 0.
Published Date: 2022-05-20
Author: None
Text: None
Highlights: None
Highlight Scores: None

Category 2: Research papers

from exa_py import Exa
import os

MY_EXA_API = os.environ["MY_EXA_API"]
exa = Exa(MY_EXA_API)

q = "If you're looking for the most helpful academic paper on \"embeddings for document retrieval\", check this out (pdf:"
result = exa.search(q, use_autoprompt=True)
print(result)

Result:

Title: Structure with Semantics: Exploiting Document Relations for Retrieval
URL: https://arxiv.org/pdf/2201.03720v2.pdf
ID: t5QlAL4osVjsgk0lWpiLVw
Score: 0.48297
Published Date: None
Author: None
Text: None
Highlights: None
Highlight Scores: None

Title: Dense Passage Retrieval for Open-Domain Question Answering
URL: https://arxiv.org/pdf/2004.04906.pdf
ID: k6cwFCTzELLqYSt-cbnXow
Score: 0.
Published Date: None
Author: None
Text: None
Highlights: None
Highlight Scores: None

Category 3: GitHub repos

For example, search for how to convert a PyTorch model to ncnn with pnnx

from exa_py import Exa
import os

MY_EXA_API = os.environ["MY_EXA_API"]
exa = Exa(MY_EXA_API)

q = "Here's a Github repo if you want to convert pytorch to ncnn by using pnnx"
result = exa.search(q, use_autoprompt=True)
print(result)

Title: GitHub - pnnx/pnnx: PyTorch Neural Network eXchange
URL: https://github.com/pnnx/pnnx
ID: ClKdd0rQsP3TOCqrGvaVdA
Score: 0.60071
Published Date: 2023-02-17
Author: Pnnx
Text: None
Highlights: None
Highlight Scores: None

Title: GitHub - kouxichao/pytorch2ncnn: pytorch_converter
URL: https://github.com/kouxichao/pytorch2ncnn
ID: rOd-b0MCXckVuvCxKLwstQ
Score: 0.
Published Date: 2023-01-01
Author: Kouxichao
Text: None
Highlights: None
Highlight Scores: None

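Category 4: Personal pages

The results are not great and fall well short of what is advertised.

Category 5: News

Category 6: Wikipedia

Category 7: Events

Events are somewhat similar to News. My impression is that traditional search engines do not really index events.

Category 8: Blogs

Personally I do not think blogs and personal pages need to be separate categories. The blog results are considerably better, though.

For example, I searched directly on exa.ai for “If you’re a huge fan of opencv, checkout these blogs”

To my surprise it found a very old one: https://opencv.blogspot.com/ , titled “I Hate (Love) OpenCV”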

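Category 9: Jobs

Exa in fact already supports Chinese. My input was:

如果你在寻找一份在创业公司做基于LLM的健身产品的研发的工作，请点击这里

(In English: "If you are looking for an R&D job at a startup building an LLM-based fitness product, click here")

Trying it through the API: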

from exa_py import Exa
import os

MY_EXA_API = os.environ["MY_EXA_API"]
exa = Exa(MY_EXA_API)

q = "如果你在寻找一份在创业公司做基于LLM的健身产品的研发的工作，请点击这里"
result = exa.search(q, use_autoprompt=True)
print(result)

Title: Jobs — Business Model Innovation Lab | BMI Lab | Spinoff from the University of St.Gallen
URL: https://bmilab.com/jobs
ID: DwraeSmD6ibOwaLbEIP6RQ
Score: 0.56726
Published Date: 1996-01-01
Author: None
Text: None
Highlights: None
Highlight Scores: None

Title: Who We Are
URL: https://egym.com/us/careers
ID: NqYMN7Cnv-Lt68z1jOEsBQ
Score: 0.070526
Published Date: 2023-01-01
Author: None
Text: None
Highlights: None
Highlight Scores: None

Title: Careers
URL: https://8fit.com/careers/
ID: vWS1mya-1-fzkAtjhjMRpQ
Score: 0.
Published Date: None
Author: None
Text: None
Highlights: None
Highlight Scores: None

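Category 10: Places and things

This feels like another name for "where to go this weekend". For example, I entered:

在1月份的时候，在杭州去哪里玩比较有意思?

(In English: "Where are the interesting places to visit in Hangzhou in January?")

Searching through the Python API, the top 3 results are:

West Lake
Lingyin Temple
Liuhe Pagoda (Six Harmonies Pagoda)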

from exa_py import Exa
import os

MY_EXA_API = os.environ["MY_EXA_API"]
exa = Exa(MY_EXA_API)

q = "在1月份的时候，在杭州去哪里玩比较有意思?"
result = exa.search(q, use_autoprompt=True)
print(result)

Title: West Lake
URL: https://www.visitourchina.com/hangzhou/attraction/west-lake.html
ID: e8i-QT_gISqP7iwC4_wVsA
Score: 0.046844
Published Date: 2023-01-01
Author: None
Text: None
Highlights: None
Highlight Scores: None

Title: Scan to follow SHINE's official Wechat account.
URL: http://www.shine.cn/tags/lingyintemple/
ID: rq0v4ubX4uwyM1TdnRmNEg
Score: 0.
Published Date: 2020-09-12
Author: None
Text: None
Highlights: None
Highlight Scores: None

Title: Hangzhou Attractions: Hangzhou Liuhe Pagoda, Six Harmonies Pagoda
URL: https://www.hangzhouprivatetour.com/attractions/show/six_harmonies_pagoda.htm
ID: 7glcxF2Xx8-aDYEqu7RYzw
Score: 0.
Published Date: 2017-12-22
Author: None
Text: None
Highlights: None
Highlight Scores: None
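The example queries used for the categories above all follow the "describe the link" pattern, so they can be collected into reusable templates. The sketch below does that with the exact phrasings from this post. The `TEMPLATES` table and `build_query` are hypothetical names for illustration, and nothing here implies that Exa itself uses such templates.

```python
# Query templates copied from the example queries in this post.
TEMPLATES = {
    "company": "Here is the homepage of a company working on {topic}:",
    "paper": 'If you\'re looking for the most helpful academic paper on "{topic}", check this out (pdf:',
    "github": "Here's a Github repo if you want to {topic}",
    "blog": "If you're a huge fan of {topic}, checkout these blogs",
}


def build_query(category: str, topic: str) -> str:
    """Fill the template for `category` with `topic` (hypothetical helper)."""
    try:
        template = TEMPLATES[category]
    except KeyError:
        raise ValueError(f"no template for category {category!r}") from None
    return template.format(topic=topic)


print(build_query("company", "making space travel cheaper"))
print(build_query("github", "convert pytorch to ncnn by using pnnx"))
```

The returned strings are plain query text and can be passed to a search call as shown in the examples above.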

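Summary

This post gave a brief introduction to using Exa, an example of applying LLMs to a search engine.

Exa accepts input in English as well as Chinese; it presumably translates internally.

Exa's output is in English, so you need to translate it yourself.

Exa in effect improves search efficiency and quality in 10 specialized categories. Which category a query falls into, though, is presumably decided inside the API.

Exa originally required the query to be written in prompt form; it now accepts arbitrary keywords and completes them into a prompt automatically.

My guess at the rough flow:

user enters text -> translate to English -> auto-complete into a prompt -> decide which category the query belongs to -> generate within that category.

10 categories would then mean 10 domain models.

Other

Intro video for Metaphor (Exa's former name): https://www.bilibili.com/video/BV1om4y1M7XM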
