The unstructured package from Unstructured.IO extracts clean text from raw source documents like PDFs and Word documents. This page covers how to use the unstructured ecosystem within LangChain.
Installation and setup
If you are using a loader that runs locally, use the following steps to get unstructured and its
dependencies running.
- For the smallest installation footprint and to take advantage of features not available in the open-source unstructured package, install the Python SDK with pip install unstructured-client along with pip install langchain-unstructured to use the UnstructuredLoader and partition remotely against the Unstructured API. This loader lives in a LangChain partner repo instead of the langchain-community repo, and you will need an api_key. You can generate a free key on the Unstructured API key page.
  - Unstructured's documentation for the SDK can be found here: https://docs.unstructured.io/api-reference/api-services/sdk
- To run everything locally, install the open-source Python package with pip install unstructured along with pip install langchain-community and use the same UnstructuredLoader as mentioned above.
  - You can install document specific dependencies with extras, e.g. pip install "unstructured[docx]". Learn more about extras in the full installation documentation.
  - To install the dependencies for all document types, use pip install "unstructured[all-docs]".
- Install the following system dependencies if they are not already available on your system, e.g. with brew install on Mac. Depending on what document types you're parsing, you may not need all of these.
  - libmagic-dev (filetype detection)
  - poppler-utils (images and PDFs)
  - tesseract-ocr (images and PDFs)
  - qpdf (PDFs)
  - libreoffice (MS Office docs)
  - pandoc (EPUBs)
- When running locally, Unstructured also recommends using Docker by following this guide to ensure all system dependencies are installed correctly.
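Before running a local loader, it can help to confirm which of the system dependencies listed above are present. The following is a minimal sketch, not part of Unstructured or LangChain: the helper name missing_dependencies and the mapping from package name to command name (for example pdftoppm for poppler-utils and soffice for libreoffice) are assumptions and may differ on your platform. libmagic is a shared library rather than a command, so it is not covered here.

```python
import shutil

# Assumed mapping from package name to a command that package usually installs.
# These command names are illustrative assumptions, not taken from the
# Unstructured documentation; adjust them for your platform.
COMMANDS = {
    "poppler-utils": "pdftoppm",
    "tesseract-ocr": "tesseract",
    "qpdf": "qpdf",
    "libreoffice": "soffice",
    "pandoc": "pandoc",
}


def missing_dependencies(commands=COMMANDS, which=shutil.which):
    """Return the package names whose command is not found on PATH."""
    return [pkg for pkg, cmd in commands.items() if which(cmd) is None]


if __name__ == "__main__":
    for pkg in missing_dependencies():
        print(f"missing: {pkg}")
```

An empty result only means the commands were found on PATH; it does not verify versions or that every document type will parse.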
Data loaders
The primary usage of Unstructured is in data loaders.
UnstructuredLoader
See a usage example for how to use this loader to partition both locally and remotely with the serverless Unstructured API.
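As a rough illustration of the local and remote modes, here is a minimal sketch. It assumes the UnstructuredLoader from langchain-unstructured accepts file_path, api_key, and partition_via_api keyword arguments; the helper names build_loader_kwargs and load_documents, the file name example.pdf, and the UNSTRUCTURED_API_KEY environment variable are hypothetical choices for this example. The import is deferred so that the helper can be used without the package installed.

```python
import os


def build_loader_kwargs(file_path, api_key=None):
    """Build keyword arguments for the loader.

    With an api_key, partitioning is requested via the Unstructured API;
    without one, the loader is left to partition locally.
    """
    kwargs = {"file_path": file_path}
    if api_key:
        kwargs.update(api_key=api_key, partition_via_api=True)
    return kwargs


def load_documents(file_path, api_key=None):
    """Load a file into LangChain documents (requires langchain-unstructured)."""
    # Deferred import: only needed when documents are actually loaded.
    from langchain_unstructured import UnstructuredLoader

    loader = UnstructuredLoader(**build_loader_kwargs(file_path, api_key))
    return loader.load()


if __name__ == "__main__":
    # Hypothetical file name and environment variable, for illustration only.
    key = os.getenv("UNSTRUCTURED_API_KEY")
    print(build_loader_kwargs("example.pdf", key))
```

Calling load_documents("example.pdf") would attempt local partitioning, which needs the open-source unstructured package and the relevant system dependencies; passing an api_key switches to the remote path.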

