Capturing some observations from #157
1. `Pipeline` has "batching" support - it can shard the dataset and spawn an instance of the pipeline for each shard - `batch_num_workers` and `batch_size` (see the sketch after this list)
2. `LLMBlock` has "batching" support - it can request multiple chat completions from the OpenAI server using the `n` argument - `num_instructions_to_generate`
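To make the two meanings concrete, here is a minimal, hypothetical sketch of (1) - the helper names (`run_pipeline_batched`, `run_one_shard`) are made up for illustration and this is not the library's actual implementation:

```python
# Hypothetical sketch of "batching" meaning (1): shard the rows and run one
# pipeline instance per shard. Names here are illustrative, not the real API.
from concurrent.futures import ThreadPoolExecutor


def run_pipeline_batched(rows, run_one_shard, batch_size=8, batch_num_workers=4):
    # Split the dataset into shards of at most batch_size rows each
    shards = [rows[i : i + batch_size] for i in range(0, len(rows), batch_size)]
    # Run one pipeline instance per shard, at most batch_num_workers at a time
    with ThreadPoolExecutor(max_workers=batch_num_workers) as pool:
        shard_outputs = pool.map(run_one_shard, shards)
        # Flatten the per-shard outputs back into a single result set
        return [row for out in shard_outputs for row in out]
```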
In ilab we disable (1) with llama-cpp by passing `batch_size=None` - see instructlab/instructlab#346
In `LLMBlock` we disable (2) with llama-cpp via the `server_supports_batched` check, which probes whether the `n` argument works
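For reference, that kind of check could look roughly like the sketch below; it assumes an `openai.OpenAI` client and the model id the pipeline is already configured with, and `check_server_supports_batched` is a made-up name, not the library's actual code:

```python
# Sketch only: probe whether the backend honors the OpenAI "n" argument by
# asking for several completions in one tiny request.
import openai


def check_server_supports_batched(client: openai.OpenAI, model_id: str) -> bool:
    try:
        resp = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,
            n=3,  # ask for three completions in one request
        )
    except openai.OpenAIError:
        # Some backends reject n > 1 outright; treat that as unsupported
        return False
    # Backends that ignore n return a single choice instead of three
    return len(resp.choices) == 3
```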
Resolve:
- Do we want to call both of these "batching"?
- Do we want two different ways of handling backend-specific capabilities?
- Should the library be trying to probe the backend for its capabilities, or should the library user give it information about the backend?
`server_supports_batched` should be a property on `PipelineContext`, not something we set on the OpenAI client object.
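A rough sketch of that proposal, assuming a dataclass-style `PipelineContext` and reusing the `check_server_supports_batched` probe sketched above (the field names and the lazy probe are illustrative, not the actual class):

```python
# Sketch of the proposal: server_supports_batched lives on PipelineContext
# rather than being patched onto the OpenAI client. Illustrative only.
from dataclasses import dataclass, field
from typing import Optional

import openai


@dataclass
class PipelineContext:
    client: openai.OpenAI
    model_id: str
    num_instructions_to_generate: int
    batch_size: Optional[int] = None  # None disables Pipeline-level sharding
    # Cache for the probe result; a library user who already knows the backend
    # can set the property directly instead of letting the library probe.
    _server_supports_batched: Optional[bool] = field(default=None, repr=False)

    @property
    def server_supports_batched(self) -> bool:
        # Probe the backend once, lazily, if the caller did not tell us
        if self._server_supports_batched is None:
            self._server_supports_batched = check_server_supports_batched(
                self.client, self.model_id
            )
        return self._server_supports_batched

    @server_supports_batched.setter
    def server_supports_batched(self, value: bool) -> None:
        self._server_supports_batched = value
```

Making it a settable property would also keep both answers to the probe-vs-user question open: the user can set the flag up front, and the library only probes lazily when it was not told.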