AI Large Model Framework LangChain: Getting to Know Document Loaders in LangChain


from langchain_community.document_loaders.csv_loader import CSVLoader

# Load 1.csv; CSVLoader produces one Document per row of the file.
loader = CSVLoader(file_path='1.csv')
data = loader.load()  # a list of Document objects
print(data)

Details on how to use document loaders in practice will be covered in the hands-on part later in this series.
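To make "loading" concrete, here is a minimal plain-Python sketch of what a CSV loader does conceptually. Each row becomes one document. The document's text is the row's `column: value` pairs, and its metadata records the source file and the row index. This roughly mirrors the behavior of LangChain's `CSVLoader` but is not its implementation. `SimpleDocument` and `load_csv_rows` are hypothetical stand-ins for LangChain's `Document` and `loader.load()`. The sketch also reads from an in-memory string rather than from `1.csv`.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_rows(csv_text: str, source: str) -> list[SimpleDocument]:
    """Turn each CSV row into one document (a sketch, not CSVLoader itself)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, in header order.
        content = "\n".join(
            f"{key.strip()}: {(value or '').strip()}" for key, value in row.items()
        )
        docs.append(SimpleDocument(content, {"source": source, "row": row_index}))
    return docs


if __name__ == "__main__":
    sample = "name,age\nAlice,30\nBob,25\n"
    for doc in load_csv_rows(sample, source="1.csv"):
        print(doc)
```

The key design point is the shape of the output. Every loader, whatever its data source, returns a list of documents that pair text with metadata. Later steps such as splitting can then treat all sources uniformly.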

Text Splitters

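After documents are loaded, you often need to transform them to better fit your application.
The simplest example is splitting a long document into smaller chunks that fit into the model's context window.
LangChain provides many built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.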
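A document transformer can be thought of as a function that takes a list of documents and returns a new list. The sketch below chains two hypothetical transformers, `split_fixed` and `drop_blank`. The first cuts each document into fixed-size pieces and copies the parent's metadata onto every piece. The second filters out blank pieces. Fixed-size cutting is a deliberately naive choice used only for illustration. LangChain's own text splitters provide a `split_documents` method for the document-level case.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_fixed(docs: list[SimpleDocument], max_chars: int) -> list[SimpleDocument]:
    """Transformer: cut each document into pieces of at most max_chars characters,
    giving every piece its own copy of the parent's metadata."""
    out = []
    for doc in docs:
        text = doc.page_content
        for start in range(0, len(text), max_chars):
            out.append(SimpleDocument(text[start:start + max_chars], dict(doc.metadata)))
    return out


def drop_blank(docs: list[SimpleDocument]) -> list[SimpleDocument]:
    """Transformer: filter out documents that contain only whitespace."""
    return [d for d in docs if d.page_content.strip()]


if __name__ == "__main__":
    docs = [
        SimpleDocument("abcdefghij", {"source": "a.txt"}),
        SimpleDocument("   ", {"source": "b.txt"}),
    ]
    for d in drop_blank(split_fixed(docs, max_chars=4)):
        print(d)
```

Because each transformer has the same input and output type, transformers compose freely. Copying the metadata means every chunk can still be traced back to its source file.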


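When working with long text, it is necessary to split it into multiple chunks.
Although this sounds simple, there is a lot of potential complexity.
Ideally, you want to keep semantically related pieces of text together.
What counts as "semantically related" can depend on the type of text.
At a high level, text splitters work as follows: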


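1. Split the text into small, semantically meaningful chunks (often sentences).
2. Combine these small chunks into a larger chunk until it reaches a certain size (as measured by some function).
3. Once that size is reached, make that chunk its own piece of text, then start a new chunk that keeps some overlap with the previous one (to preserve context between chunks).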
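The three steps above can be sketched in plain Python. This is a simplified illustration of the idea, not LangChain's implementation. The regex-based `split_sentences` and the `merge_with_overlap` helper are hypothetical, and `len` stands in for whatever function measures chunk size.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Step 1: break text into small units (naively: split after . ! or ?)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def merge_with_overlap(
    pieces: list[str],
    chunk_size: int,
    chunk_overlap: int,
    length: Callable[[str], int] = len,
    sep: str = " ",
) -> list[str]:
    """Steps 2 and 3: greedily combine pieces into chunks no longer than
    chunk_size (as measured by `length`). When a chunk is closed, carry over
    trailing pieces whose combined length is at most chunk_overlap."""
    chunks: list[str] = []
    current: list[str] = []
    for piece in pieces:
        if current and length(sep.join(current + [piece])) > chunk_size:
            # Step 3: close the current chunk ...
            chunks.append(sep.join(current))
            # ... and keep only a short tail of it as overlap, dropping more
            # if the tail plus the new piece would still be too long.
            while current and (
                length(sep.join(current)) > chunk_overlap
                or length(sep.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        # A single piece longer than chunk_size still becomes its own chunk.
        current.append(piece)
    if current:
        chunks.append(sep.join(current))
    return chunks


if __name__ == "__main__":
    text = "One two. Three four. Five six. Seven eight."
    for chunk in merge_with_overlap(split_sentences(text), chunk_size=25, chunk_overlap=12):
        print(chunk)
```

Run with `chunk_size=25` and `chunk_overlap=12`, this yields three chunks: `One two. Three four.`, `Three four. Five six.`, and `Five six. Seven eight.`. Each chunk repeats the last sentence of the previous one as overlap.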

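This means there are two different axes along which you can customize a text splitter:

- How the text is split
- How the chunk size is measured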
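To make the two axes concrete, here is a small plain-Python sketch with two choices on each axis. For splitting, it uses two hypothetical strategies: by line and by sentence. For measuring, it uses two hypothetical length functions: characters and whitespace-separated words. Word count serves here as a crude stand-in for a tokenizer's token count. In LangChain these choices correspond to splitter settings such as `separator` and the `length_function` parameter, which defaults to `len`. The functions below are illustrations, not LangChain APIs.

```python
import re


# Axis 1: how the text is split (two hypothetical strategies).
def split_by_line(text: str) -> list[str]:
    return [line for line in text.split("\n") if line.strip()]


def split_by_sentence(text: str) -> list[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Axis 2: how chunk size is measured (two hypothetical length functions).
def char_length(chunk: str) -> int:
    return len(chunk)


def word_length(chunk: str) -> int:
    # Whitespace word count: a crude stand-in for a real tokenizer.
    return len(chunk.split())


if __name__ == "__main__":
    text = "Loaders read data.\nSplitters cut it. Then chunks are merged."
    for splitter in (split_by_line, split_by_sentence):
        pieces = splitter(text)
        print(splitter.__name__, [(p, char_length(p), word_length(p)) for p in pieces])
```

The same text yields two pieces when split by line and three when split by sentence. The same piece can also measure 23 by characters but only 4 by words. This is why a chunk-size limit is meaningful only together with the length function that enforces it.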

Here is an example of using a text splitter:

from langchain.text_splitter import CharacterTextSplitter

# Split on newlines, then merge lines into chunks of up to 100 characters,
# with up to 20 characters of overlap between consecutive chunks.
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=100,
    chunk_overlap=20,
)

text = '''Notice that the response from the model is an AIMessage. This contains a string response along with other metadata about the response. Oftentimes we may just want to work with the string response. We can parse out just this response by using a simple output parser.
We first import the simple output parser.
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
API Reference:StrOutputParser
One way to use it is to use it by itself. For example, we could save the result of the language model call and then pass it to the parser.
result = model.invoke(messages)
parser.invoke(result)
'Ciao!'
More commonly, we can "chain" the model with this output parser. This means this output parser will get called every time in this chain. This chain takes on the input type of the language model (string or list of message) and returns the output type of the output parser (string).
We can easily create the chain using the | operator. The | operator is used in LangChain to combine two elements together.'''

# split_text returns a list of strings, one per chunk.
docs = text_splitter.split_text(text)
print(docs)

Running it prints the resulting list of chunks, each of which is a plain string.

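In the example above, CharacterTextSplitter splits the text on newline characters and merges adjacent lines into chunks of at most 100 characters (measured with len by default). It carries up to 20 characters of overlap from one chunk into the next. Because the overlap is made of whole lines, only trailing lines that together fit within 20 characters are carried over. Likewise, a single line longer than 100 characters cannot be split further by this splitter. Such a line becomes its own oversized chunk, and LangChain logs a warning about it.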

For more details on using text splitters, see the relevant how-to guides.

Summary

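LangChain's document loaders and text splitters are powerful tools for processing and transforming documents.
Document loaders load Document objects from a wide range of data sources, while text splitters break long text into smaller chunks that are easier to process.
Used correctly, these tools make it easier to manage and process text data for different application needs.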