
Conversation

@sufubao (Collaborator) commented Dec 5, 2025

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @sufubao, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a benchmarking script for the LightLLM API, targeting its performance on the GSM8K mathematical reasoning dataset. The script handles data acquisition, few-shot prompt construction, parallel execution of API requests, and performance analysis, reporting both accuracy and latency. It provides a systematic way to evaluate and monitor the model's ability to solve grade-school math problems.

Highlights

  • New GSM8K Test Script: Introduces test_gsmk.py, a new Python script to benchmark the LightLLM API's performance on the Grade School Math 8K (GSM8K) dataset.
  • LightLLM API Benchmarking: The script evaluates the LightLLM API by making parallel requests for text generation, measuring accuracy on mathematical reasoning problems, and recording inference latency.
  • Parallel Processing and Data Handling: Implements ThreadPoolExecutor for concurrent API calls and includes functionality to download and cache the GSM8K dataset, ensuring efficient and reproducible testing (see the sketch below).
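
The following is a minimal, illustrative sketch of that concurrency pattern. It is not the code from test_gsmk.py: the endpoint URL, request payload, and response fields below are assumptions made only to show how ThreadPoolExecutor can drive parallel LightLLM-style generation calls.

import time
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "http://localhost:8000/generate"  # assumed LightLLM endpoint


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Assumed request/response shape; the actual script's payload may differ.
    response = requests.post(
        API_URL,
        json={"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False}},
        timeout=60,
    )
    response.raise_for_status()
    return response.json().get("generated_text", "")


def run_benchmark(prompts, parallel=8):
    # Fire all prompts concurrently and measure wall-clock latency.
    start = time.time()
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        outputs = list(pool.map(generate, prompts))
    return outputs, time.time() - start
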
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command              | Description
---------------------|----------------------|-------------------------------------------------------------
Code Review          | /gemini review       | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary      | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist  | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help         | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please flag any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a new test script for benchmarking the GSM8K task. The script is well-structured, leveraging ThreadPoolExecutor for concurrent requests and tqdm for progress indication. My review focuses on improving the script's robustness, portability, and maintainability. Key suggestions include properly handling the --data-path argument, avoiding hardcoded paths, adding type hints for clarity, and refining some implementation details for better adherence to Python best practices.


# Read data
url_data = "https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/test.jsonl"
filename = download_and_cache_file(url_data)

Severity: high

The --data-path command-line argument is defined but is not being used. The script currently ignores it and always attempts to download the data. To allow users to specify a local data file or a custom cache path, you should pass this argument to the download_and_cache_file function.

Suggested change
- filename = download_and_cache_file(url_data)
+ filename = download_and_cache_file(url_data, args.data_path)

def download_and_cache_file(url: str, filename: Optional[str] = None):
    """Read and cache a file from a url."""
    if filename is None:
        filename = os.path.join("/tmp", url.split("/")[-1])

Severity: medium

Hardcoding the /tmp directory is not portable and will fail on non-Unix systems like Windows. It's better to use the tempfile module to get the path to the system's temporary directory. You'll need to add import tempfile at the top of the file.

Suggested change
-         filename = os.path.join("/tmp", url.split("/")[-1])
+         filename = os.path.join(tempfile.gettempdir(), url.split("/")[-1])
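
For reference, the whole helper with the tempfile-based default might look like the sketch below. The requests-based download and the skip-if-already-cached check are assumptions for illustration; only the tempfile.gettempdir() default comes from the suggestion above.

import os
import tempfile
from typing import Optional

import requests


def download_and_cache_file(url: str, filename: Optional[str] = None) -> str:
    """Download url into a local cache file (unless it already exists) and return its path."""
    if filename is None:
        # Portable default instead of the hardcoded "/tmp" directory.
        filename = os.path.join(tempfile.gettempdir(), url.split("/")[-1])
    if not os.path.exists(filename):
        response = requests.get(url, timeout=60)
        response.raise_for_status()
        with open(filename, "wb") as f:
            f.write(response.content)
    return filename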

    return filename


def call_generate_lightllm(prompt, temperature, max_tokens, stop=None, url=None):

Severity: medium

Adding type hints to function signatures improves code clarity, makes it easier to understand for other developers, and enables static analysis tools to catch potential bugs.

Suggested change
- def call_generate_lightllm(prompt, temperature, max_tokens, stop=None, url=None):
+ def call_generate_lightllm(prompt: str, temperature: float, max_tokens: int, stop: Optional[list] = None, url: Optional[str] = None) -> str:


def call_generate_lightllm(prompt, temperature, max_tokens, stop=None, url=None):
    """Call LightLLM API for text generation."""
    assert url is not None

Severity: medium

Using assert for input validation is not ideal, as assertions can be disabled in optimized builds (e.g., running with python -O). It's more robust to raise a ValueError to ensure the check is always performed.

Suggested change
-     assert url is not None
+     if url is None:
+         raise ValueError("The 'url' parameter must be provided.")

Comment on lines +100 to +103
ret = "Question: " + lines[i]["question"] + "\nAnswer:"
if include_answer:
ret += " " + lines[i]["answer"]
return ret

Severity: medium

Using f-strings is generally more readable and can be more performant than repeated string concatenation with +.

Suggested change
- ret = "Question: " + lines[i]["question"] + "\nAnswer:"
- if include_answer:
-     ret += " " + lines[i]["answer"]
- return ret
+ ret = f"Question: {lines[i]['question']}\nAnswer:"
+ if include_answer:
+     ret += f" {lines[i]['answer']}"
+ return ret

Comment on lines +107 to +110
ret = ""
for i in range(k):
ret += get_one_example(lines, i, True) + "\n\n"
return ret

Severity: medium

Building a string in a loop using += can be inefficient for a large number of iterations. A more Pythonic and performant approach is to use a generator expression with str.join().

Suggested change
- ret = ""
- for i in range(k):
-     ret += get_one_example(lines, i, True) + "\n\n"
- return ret
+ return "".join(get_one_example(lines, i, True) + "\n\n" for i in range(k))

print(f"Latency: {latency:.3f} s")

# Dump results
dump_state_text("tmp_output_lightllm.txt", states)

Severity: medium

The output filename tmp_output_lightllm.txt is hardcoded. This can be inconvenient, especially if running multiple tests, as they would overwrite the same file. Consider making this configurable via a command-line argument (e.g., --output-file).
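
Assuming the script builds its flags with argparse (likely, given --data-path and --parallel, but still an assumption), the option could be wired in roughly as follows; the flag name and default are illustrative, and dump_state_text / states refer to the script's existing helper and results:

import argparse

parser = argparse.ArgumentParser()
# ... existing flags such as --data-path, --num-questions, --parallel ...
parser.add_argument(
    "--output-file",
    type=str,
    default="tmp_output_lightllm.txt",
    help="Path for the dumped generation outputs.",
)
args = parser.parse_args()

# Later, when dumping results, use the flag instead of the hardcoded name:
dump_state_text(args.output_file, states)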

Comment on lines +220 to +223
"other": {
"num_questions": args.num_questions,
"parallel": args.parallel,
},

Severity: medium

The result JSON contains num_requests at the top level and num_questions inside the other dictionary, both holding the same value. This is redundant. To improve clarity, it's best to remove the duplicate key from the other dictionary.

            "other": {
                "parallel": args.parallel,
            },
