from langchain_community.document_loaders.csv_loader import CSVLoader

# Load every row of 1.csv as a Document object
loader = CSVLoader(file_path='1.csv')
data = loader.load()
print(data)
Details on how to use document loaders will be covered in a later hands-on installment.
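To make the loader's output more concrete, here is a minimal standard-library sketch of the general idea behind a CSV loader: each row becomes one document whose text lists the column values, with the source and row index kept as metadata. The names SimpleDocument and load_csv_rows are hypothetical and are not part of LangChain; the real CSVLoader may differ in its details.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_rows(csv_file, source="<memory>"):
    """Turn each CSV row into one SimpleDocument of 'column: value' lines."""
    docs = []
    for i, row in enumerate(csv.DictReader(csv_file)):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        docs.append(SimpleDocument(content, {"source": source, "row": i}))
    return docs


# An in-memory CSV is used here so the sketch runs without a file on disk.
sample = io.StringIO("name,age\nAlice,30\nBob,25\n")
docs = load_csv_rows(sample, source="1.csv")
for doc in docs:
    print(doc)
```

Keeping the row index in the metadata makes it possible to trace a retrieved chunk back to the row it came from.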
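Text Splitters
After documents are loaded, they usually need to be transformed to better fit the application. The simplest example is splitting a long document into smaller chunks that fit within the model's context window. LangChain provides many built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.
When working with long text, it is necessary to split it into chunks. Although this sounds simple, there is a lot of potential complexity. Ideally, semantically related pieces of text stay together, and what "semantically related" means can depend on the type of text. An overview of several approaches follows: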

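This means a text splitter can be customized along two different dimensions:
1. How the text is split
2. How the chunk size is measured
Below is an example of using a text splitter: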
# Newer LangChain releases also expose this class from the langchain_text_splitters package.
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=100,
    chunk_overlap=20
)

text = '''Notice that the response from the model is an AIMessage. This contains a string response along with other metadata about the response. Oftentimes we may just want to work with the string response. We can parse out just this response by using a simple output parser.
We first import the simple output parser.
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
API Reference:StrOutputParser
One way to use it is to use it by itself. For example, we could save the result of the language model call and then pass it to the parser.
result = model.invoke(messages)
parser.invoke(result)
'Ciao!'
More commonly, we can "chain" the model with this output parser. This means this output parser will get called every time in this chain. This chain takes on the input type of the language model (string or list of message) and returns the output type of the output parser (string).
We can easily create the chain using the | operator. The | operator is used in LangChain to combine two elements together.'''

# Split on newlines, then merge the pieces into chunks
docs = text_splitter.split_text(text)
print(docs)
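The output is:
In the example above, CharacterTextSplitter splits the text on newline characters and merges the pieces into chunks of up to 100 characters, with up to 20 characters of overlap between adjacent chunks. Because it only splits at the separator, a single line longer than 100 characters is not broken further, so some chunks can exceed the limit.
For details on using text splitters, see the relevant how-to guides.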
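The two dimensions mentioned earlier (how the text is split and how chunk size is measured) can be illustrated with a small pure-Python sketch. It is a simplified illustration of the split-then-merge idea, not LangChain's actual implementation; the function split_text and its length_function parameter are assumptions made for this example.

```python
def split_text(text, separator="\n", chunk_size=100, chunk_overlap=20,
               length_function=len):
    """Split on `separator`, then merge pieces into size-limited chunks."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []

    def joined_len(parts):
        # Size of the chunk that would result from joining these pieces.
        return length_function(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # The next piece does not fit: emit the current chunk.
            chunks.append(separator.join(current))
            # Drop leading pieces until what remains fits in the overlap
            # budget and leaves room for the next piece.
            while current and (joined_len(current) > chunk_overlap
                               or joined_len(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Dimension 1: how the text is split (here, on newlines).
by_chars = split_text("aaaa\nbbbb\ncccc\ndddd", chunk_size=10, chunk_overlap=4)
print(by_chars)

# Dimension 2: how chunk size is measured (here, in words instead of characters).
by_words = split_text("one two\nthree four\nfive six", chunk_size=4,
                      chunk_overlap=0,
                      length_function=lambda s: len(s.split()))
print(by_words)
```

Swapping length_function for a token counter would measure chunks in model tokens rather than characters, which is often closer to what a context window actually limits.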
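Summary
LangChain's document loaders and text splitters are powerful tools for processing and transforming documents. Document loaders load Document objects from a variety of data sources, while text splitters break long text into smaller chunks that are easier to process. Used correctly, these tools make it easier to manage and process text data for different application needs.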