How to correctly resume work #1616

tusik · 2025-01-12T02:20:31Z

My config is:

cache:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "cache"

I'm indexing 100M files. The embedding API had a problem after graph extraction finished, so I'm trying to resume the work:

graphrag index --root . --resume [timestamp found in index-engine.log]

# index-engine.log
09:14:04,701 graphrag.cli.index INFO Starting pipeline run for: 20250110-091404, dry_run=False
# Why is the timestamp hidden? It is only written to index-engine.log now: there is no screen output and no named cache folder.

But it just re-extracts the graph now. Did I do something wrong?
So resume doesn't work, and the 200M tokens from my last run are wasted?

Answered by VanillaTY

Aug 26, 2025

I added a workflows configuration to the settings.yaml file, and that resolved it! https://microsoft.github.io/graphrag/config/yaml/

ultrageopro · 2025-01-17T09:19:57Z

same problem

2 replies

schauerstoff Feb 20, 2025

same here

hejiang1972 Apr 1, 2025

same problem

quying · 2025-03-07T08:49:09Z

I'm in the same situation: only the embedding pipeline failed, and I don't want to rerun everything else. I checked the docs and the code and didn't find a resume option, but I did find a "workflow" config where you can specify the pipelines.
Check out the docs.

4 replies

schauerstoff Mar 8, 2025

When I just re-ran it (v1.2.0) without any changes it automatically used the cache after the graph-creation process. Might be worth a try

chenfan001 Apr 29, 2025

Have you succeeded in your attempt? Could you please share the configuration? I tried according to the documentation, but I found that I couldn't skip the workflows that have already been run. When setting the workflow list, it will show that the specific workflow is not configured. Looking forward to your reply!

schauerstoff Apr 29, 2025

Yes, it did succeed on v1.2; I did not attempt it again on v2.1. I left the workflow list as is and just called the indexing routine again, which automatically used the stored data in /cache.

chenfan001 Apr 29, 2025

Thank you.

EmmittXu · 2025-03-11T08:31:54Z

I tried --resume and skip_workflows; neither worked. But here is a hard-coded workaround: comment out the "_get_workflow_list" function.

VanillaTY · 2025-08-26T03:46:02Z

I added a workflows configuration to the settings.yaml file, and that resolved it! https://microsoft.github.io/graphrag/config/yaml/

natoverse · 2025-09-09T21:46:04Z

Maintainer

GraphRAG caches LLM calls aggressively, so re-running after a failure should skip over all the previous work. If that still takes a long time due to CPU processing as it runs through the workflows, the solution by @VanillaTY is correct: the workflows key in your settings.yaml can be used to run exactly and only the listed workflows.
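
For anyone looking for a concrete shape, here is a minimal sketch of what such a settings.yaml fragment might look like when only the embedding step needs to be re-run. The workflow name `generate_text_embeddings` is an assumption based on the built-in workflow names in GraphRAG 2.x, so check the names your installed version actually uses before relying on it. The cache block is copied from the original post.

```yaml
# Keep the same cache settings as the failed run so cached LLM calls are reused.
cache:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "cache"

# Run exactly and only the listed workflows, in order.
# ASSUMPTION: "generate_text_embeddings" is the embedding workflow name in
# GraphRAG 2.x; verify it against your installed version.
workflows:
  - generate_text_embeddings
```

With this in place, running `graphrag index --root .` again should run only the listed workflow, provided the outputs of the earlier workflows are still in the output folder. Remove the workflows key afterwards to go back to the full default pipeline.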
