2 changes: 2 additions & 0 deletions api-reference/overview.mdx
@@ -35,6 +35,8 @@ The Unstructured API provides the following benefits beyond the [Unstructured op
* Unstructured manages code dependencies, for instance for libraries such as Tesseract.
* Unstructured manages its own infrastructure, including parallelization and other performance optimizations.

+[Learn more](/open-source/introduction/overview#limits).
+
## Pricing

To call the Unstructured API, you must have an Unstructured account.
6 changes: 6 additions & 0 deletions api-reference/partition/quickstart.mdx
@@ -8,6 +8,12 @@ sidebarTitle: Quickstart
[Skip ahead](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb) to run this quickstart as a notebook on Google Colab now!

Do you want to just copy the sample code for use on your local machine? [Skip ahead](#sample-code) to the code now!

+This quickstart uses the Unstructured Partition Endpoint and focuses on a single, local file for ease-of-use demonstration purposes. This quickstart also
+focuses only on a limited set of Unstructured's full capabilities. To unlock the full feature set, as well as use Unstructured to do
+large-scale batch processing of multiple files and semi-structured data that are stored in remote locations,
+[skip over](/api-reference/workflow/overview#quickstart) to an expanded, advanced version of this quickstart that uses the
+Unstructured Workflow Endpoint instead.
</Tip>

<iframe
41 changes: 38 additions & 3 deletions api-reference/workflow/workflows.mdx
@@ -977,7 +977,7 @@ In the request body, specify the settings for the workflow. For the specific set

## Custom workflow DAG nodes

-import EnrichmentImagesTablesHiResOnly from '/snippets/general-shared-text/enrichment-images-tables-hi-res-only.mdx';
+import EnrichmentImagesTablesOCRHiResOnly from '/snippets/general-shared-text/enrichment-images-tables-ocr-hi-res-only.mdx';

If `WorkflowType` is set to `CUSTOM` (for the Python SDK), or if `workflow_type` is set to `custom` (for `curl` or Postman), you must also specify the settings for the workflow's
directed acyclic graph (DAG) nodes. These nodes' settings are specified in the `workflow_nodes` array.
@@ -989,7 +989,7 @@ directed acyclic graph (DAG) nodes. These nodes' settings are specified in the `
- You can specify [Partitioner](#partitioner-node), [Enrichment](#enrichment-node),
[Chunker](#chunker-node), and [Embedder](#embedder-node) nodes.

-<EnrichmentImagesTablesHiResOnly />
+<EnrichmentImagesTablesOCRHiResOnly />

- The order of the nodes in the `workflow_nodes` array will be the same order that these nodes appear in the DAG,
with the first node in the array added directly after the **Source** node. The **Destination** node
@@ -1379,7 +1379,7 @@ An **Enrichment** node has a `type` of `prompter`.

[Learn about the available enrichments](/ui/enriching/overview).

-<EnrichmentImagesTablesHiResOnly />
+<EnrichmentImagesTablesOCRHiResOnly />

#### Image Description task

Expand Down Expand Up @@ -1598,6 +1598,41 @@ Allowed values for `<subtype>` include:
- `openai_ner`
- `anthropic_ner`

+#### Text Fidelity Optimization task
+
+import EnrichmentOCRHiResOnly from '/snippets/general-shared-text/enrichment-ocr-high-res-only.mdx';
+
+<EnrichmentOCRHiResOnly />
+
+<AccordionGroup>
+<Accordion title="Python SDK">
+```python
+text_fidelity_optimization_enrichment_workflow_node = WorkflowNode(
+    name="Enrichment",
+    subtype="<subtype>",
+    type="prompter",
+    settings={}
+)
+```
+</Accordion>
+<Accordion title="curl, Postman">
+```json
+{
+    "name": "Enrichment",
+    "type": "prompter",
+    "subtype": "<subtype>",
+    "settings": {}
+}
+```
+</Accordion>
+</AccordionGroup>
+
+Allowed values for `<subtype>` include:
+
+- `anthropic_ocr`
+- `openai_ocr`
+- `vertexai_ocr`
+
### Chunker node

A **Chunker** node has a `type` of `chunk`.
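
As a rough illustration of the Text Fidelity Optimization task node added in this hunk, the sketch below builds the same payload as the curl/Postman JSON and rejects subtypes outside the three listed values. It uses plain dictionaries rather than the Python SDK. `build_ocr_enrichment_node` is a hypothetical helper name, and treating the listed subtypes as the complete set is an assumption (the docs say the allowed values "include" them).

```python
import json

# Subtype values listed for the Text Fidelity Optimization task in the diff.
# Assumption: the list is treated as exhaustive here.
ALLOWED_OCR_SUBTYPES = {"anthropic_ocr", "openai_ocr", "vertexai_ocr"}


def build_ocr_enrichment_node(subtype: str) -> dict:
    """Return a node dict in the curl/Postman JSON shape from the diff.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    if subtype not in ALLOWED_OCR_SUBTYPES:
        raise ValueError(
            f"Unsupported subtype {subtype!r}; "
            f"expected one of {sorted(ALLOWED_OCR_SUBTYPES)}"
        )
    return {
        "name": "Enrichment",
        "type": "prompter",
        "subtype": subtype,
        "settings": {},
    }


if __name__ == "__main__":
    # Print the JSON body for one of the listed subtypes.
    print(json.dumps(build_ocr_enrichment_node("openai_ocr"), indent=2))
```

A dictionary built this way could be placed in the `workflow_nodes` array of a custom workflow request, after the Partitioner node's entry.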
3 changes: 2 additions & 1 deletion docs.json
@@ -123,7 +123,8 @@
"ui/enriching/image-descriptions",
"ui/enriching/table-descriptions",
"ui/enriching/table-to-html",
-"ui/enriching/ner"
+"ui/enriching/ner",
+"ui/enriching/ocr"
]
},
"ui/embedding",
Binary file added img/ui/single-file/bounding-box.png
Binary file added img/ui/single-file/download.png
Binary file added img/ui/single-file/generative-refinement.png
Binary file added img/ui/single-file/get-code.png
Binary file added img/ui/single-file/json-view.png
Binary file added img/ui/single-file/partitioning-strategies.png
Binary file added img/ui/single-file/results.png
Binary file added img/ui/single-file/show-all-bounding-boxes.png
Binary file added img/ui/single-file/smb-workflow.png
Binary file added img/ui/single-file/welcome.png
Binary file added img/ui/single-file/workflow-editor.png
Binary file modified img/ui/walkthrough/EnrichedWorkflow.png
Binary file modified img/ui/walkthrough/GoToEnrichmentNode.png
@@ -1,8 +1,8 @@
<Warning>
-Unstructured can potentially generate image summary descriptions, table summary descriptions, and table-to-HTML output only for workflows that are configured as follows:
+Unstructured can potentially generate image summary descriptions, table summary descriptions, table-to-HTML output, and text fidelity optimization, only for workflows that are configured as follows:

-- With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and an image summary description node, table summary description node, or table-to-HTML output node is added.
-- With a **Partitioner** node set to use the **VLM** partitioning strategy. No image summary description node, table summary description node, or table-to-HTML output node is needed (or allowed).
+- With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and an image summary description node, table summary description node, table-to-HTML output node, or text fidelity optimization node is added.
+- With a **Partitioner** node set to use the **VLM** partitioning strategy. No image summary description node, table summary description node, table-to-HTML output node, or text fidelity optimization node is needed (or allowed).

Even with these configurations, Unstructured actually generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and are also eligible
for processing with the following partitioning strategies:
@@ -14,4 +14,6 @@

- With a **Partitioner** node set to use the **Fast** partitioning strategy.
- With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain images or tables.

+Unstructured never generates text fidelity optimizations for workflows with a **Partitioner** node set to use the **Fast** partitioning strategy.
</Warning>
8 changes: 8 additions & 0 deletions snippets/general-shared-text/enrichment-ocr-high-res-only.mdx
@@ -0,0 +1,8 @@
+<Warning>
+Unstructured can optimize text fidelity for workflows that are configured as follows:
+
+- With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and a text fidelity optimization node is added.
+- With a **Partitioner** node set to use the **VLM** partitioning strategy. No text fidelity optimization node is needed (or allowed).
+
+Unstructured never generates text fidelity optimizations for workflows with a **Partitioner** node set to use the **Fast** partitioning strategy.
+</Warning>
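
The rules in this new warning snippet can be read as a small decision table. The sketch below encodes them as one possible interpretation: it reuses the strategy names exactly as the warning writes them (**Auto**, **High Res**, **VLM**, **Fast**), and `can_optimize_text_fidelity` is a hypothetical function name, not an Unstructured API. It also assumes that an Auto or High Res workflow without the node gets no optimization, which the warning implies but does not state outright.

```python
# Strategy names as written in the warning text; matching is case-insensitive.
NODE_BASED_STRATEGIES = {"auto", "high res"}


def can_optimize_text_fidelity(strategy: str, has_ocr_node: bool) -> bool:
    """Interpret the warning's rules for a single workflow configuration.

    Hypothetical helper for illustration only.
    """
    key = strategy.strip().lower()
    if key in NODE_BASED_STRATEGIES:
        # Auto / High Res: optimization depends on the node being added.
        return has_ocr_node
    if key == "vlm":
        # VLM: the node is neither needed nor allowed.
        if has_ocr_node:
            raise ValueError(
                "A text fidelity optimization node is not allowed with VLM"
            )
        return True
    if key == "fast":
        # Fast: text fidelity optimizations are never generated.
        return False
    raise ValueError(f"Unknown partitioning strategy: {strategy!r}")
```

A check like this could run before a workflow request is submitted, so that a disallowed VLM-plus-node combination is caught locally.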