I have a machine with 1 TB of CPU memory and 4x 2080 Ti 22 GB cards. I'm trying ZeRO-3 with offloading via the Gemini plugin, configured like this:

```diff
+ plugin = GeminiPlugin(precision=args.mixed_precision,
+                       initial_scale=2**16,
+                       shard_param_frac=1,
+                       offload_optim_frac=1,
+                       offload_param_frac=1,
+                       tp_size=4,
+                       max_norm=args.grad_clip)
```

I'm training a 7B model (Llama2-Chinese-7b-Chat-ms), but it reports GPU OOM.