
Conversation

@sufubao (Collaborator) commented Dec 5, 2025

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @sufubao, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a benchmarking script for the LightLLM API, targeting its performance on the GSM8K mathematical reasoning dataset. The script handles data acquisition, few-shot prompt construction, parallel execution of API requests, and performance analysis, reporting both accuracy and latency. It provides a systematic way to evaluate and monitor the model's ability to solve grade-school math problems.

Highlights

  • New GSM8K Test Script: Introduces test_gsmk.py, a new Python script to benchmark the LightLLM API's performance on the Grade School Math 8K (GSM8K) dataset.
  • LightLLM API Benchmarking: The script evaluates the LightLLM API by making parallel requests for text generation, measuring accuracy on mathematical reasoning problems, and recording inference latency.
  • Parallel Processing and Data Handling: Implements ThreadPoolExecutor for concurrent API calls and includes functionality to download and cache the GSM8K dataset, ensuring efficient and reproducible testing (see the sketch below).
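
The following is a minimal, illustrative sketch of that concurrency pattern. It is not the code from test_gsmk.py: the endpoint URL, request payload, and response fields below are assumptions made only to show how ThreadPoolExecutor can drive parallel LightLLM-style generation calls.

import time
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "http://localhost:8000/generate"  # assumed LightLLM endpoint


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Assumed request/response shape; the actual script's payload may differ.
    response = requests.post(
        API_URL,
        json={"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False}},
        timeout=60,
    )
    response.raise_for_status()
    return response.json().get("generated_text", "")


def run_benchmark(prompts, parallel=8):
    # Fire all prompts concurrently and measure wall-clock latency.
    start = time.time()
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        outputs = list(pool.map(generate, prompts))
    return outputs, time.time() - start
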
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command              | Description
---------------------|----------------------|-------------------------------------------------------------
Code Review          | /gemini review       | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary      | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist  | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help         | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please flag any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a new test script for benchmarking the GSM8K task. The script is well-structured, leveraging ThreadPoolExecutor for concurrent requests and tqdm for progress indication. My review focuses on improving the script's robustness, portability, and maintainability. Key suggestions include properly handling the --data-path argument, avoiding hardcoded paths, adding type hints for clarity, and refining some implementation details for better adherence to Python best practices.


# Read data
url_data = "https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/test.jsonl"
filename = download_and_cache_file(url_data)

Severity: high

The --data-path command-line argument is defined but is not being used. The script currently ignores it and always attempts to download the data. To allow users to specify a local data file or a custom cache path, you should pass this argument to the download_and_cache_file function.

Suggested change
- filename = download_and_cache_file(url_data)
+ filename = download_and_cache_file(url_data, args.data_path)

def download_and_cache_file(url: str, filename: Optional[str] = None):
    """Read and cache a file from a url."""
    if filename is None:
        filename = os.path.join("/tmp", url.split("/")[-1])

Severity: medium

Hardcoding the /tmp directory is not portable and will fail on non-Unix systems like Windows. It's better to use the tempfile module to get the path to the system's temporary directory. You'll need to add import tempfile at the top of the file.

Suggested change
-         filename = os.path.join("/tmp", url.split("/")[-1])
+         filename = os.path.join(tempfile.gettempdir(), url.split("/")[-1])
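
For reference, the whole helper with the tempfile-based default might look like the sketch below. The requests-based download and the skip-if-already-cached check are assumptions for illustration; only the tempfile.gettempdir() default comes from the suggestion above.

import os
import tempfile
from typing import Optional

import requests


def download_and_cache_file(url: str, filename: Optional[str] = None) -> str:
    """Download url into a local cache file (unless it already exists) and return its path."""
    if filename is None:
        # Portable default instead of the hardcoded "/tmp" directory.
        filename = os.path.join(tempfile.gettempdir(), url.split("/")[-1])
    if not os.path.exists(filename):
        response = requests.get(url, timeout=60)
        response.raise_for_status()
        with open(filename, "wb") as f:
            f.write(response.content)
    return filename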

    return filename


def call_generate_lightllm(prompt, temperature, max_tokens, stop=None, url=None):

Severity: medium

Adding type hints to function signatures improves code clarity, makes it easier to understand for other developers, and enables static analysis tools to catch potential bugs.

Suggested change
- def call_generate_lightllm(prompt, temperature, max_tokens, stop=None, url=None):
+ def call_generate_lightllm(prompt: str, temperature: float, max_tokens: int, stop: Optional[list] = None, url: Optional[str] = None) -> str:


def call_generate_lightllm(prompt, temperature, max_tokens, stop=None, url=None):
    """Call LightLLM API for text generation."""
    assert url is not None

Severity: medium

Using assert for input validation is not ideal, as assertions can be disabled in optimized builds (e.g., running with python -O). It's more robust to raise a ValueError to ensure the check is always performed.

Suggested change
-     assert url is not None
+     if url is None:
+         raise ValueError("The 'url' parameter must be provided.")

Comment on lines +100 to +103
ret = "Question: " + lines[i]["question"] + "\nAnswer:"
if include_answer:
ret += " " + lines[i]["answer"]
return ret

Severity: medium

Using f-strings is generally more readable and can be more performant than repeated string concatenation with +.

Suggested change
- ret = "Question: " + lines[i]["question"] + "\nAnswer:"
- if include_answer:
-     ret += " " + lines[i]["answer"]
- return ret
+ ret = f"Question: {lines[i]['question']}\nAnswer:"
+ if include_answer:
+     ret += f" {lines[i]['answer']}"
+ return ret

Comment on lines +107 to +110
ret = ""
for i in range(k):
ret += get_one_example(lines, i, True) + "\n\n"
return ret

Severity: medium

Building a string in a loop using += can be inefficient for a large number of iterations. A more Pythonic and performant approach is to use a generator expression with str.join().

Suggested change
- ret = ""
- for i in range(k):
-     ret += get_one_example(lines, i, True) + "\n\n"
- return ret
+ return "".join(get_one_example(lines, i, True) + "\n\n" for i in range(k))

print(f"Latency: {latency:.3f} s")

# Dump results
dump_state_text("tmp_output_lightllm.txt", states)

Severity: medium

The output filename tmp_output_lightllm.txt is hardcoded. This can be inconvenient, especially if running multiple tests, as they would overwrite the same file. Consider making this configurable via a command-line argument (e.g., --output-file).
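
Assuming the script builds its flags with argparse (likely, given --data-path and --parallel, but still an assumption), the option could be wired in roughly as follows; the flag name and default are illustrative, and dump_state_text / states refer to the script's existing helper and results:

import argparse

parser = argparse.ArgumentParser()
# ... existing flags such as --data-path, --num-questions, --parallel ...
parser.add_argument(
    "--output-file",
    type=str,
    default="tmp_output_lightllm.txt",
    help="Path for the dumped generation outputs.",
)
args = parser.parse_args()

# Later, when dumping results, use the flag instead of the hardcoded name:
dump_state_text(args.output_file, states)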

Comment on lines +220 to +223
"other": {
"num_questions": args.num_questions,
"parallel": args.parallel,
},

Severity: medium

The result JSON contains num_requests at the top level and num_questions inside the other dictionary, both holding the same value. This is redundant. To improve clarity, it's best to remove the duplicate key from the other dictionary.

            "other": {
                "parallel": args.parallel,
            },
