-
Notifications
You must be signed in to change notification settings - Fork 24
Arkane matching method name #814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for optional 4-digit year suffixes to distinguish method variants in the Arkane database (e.g., b97d3 vs b97d32023). The implementation allows users to omit the year (ARC matches the latest available), specify an explicit year, or receive warnings when a requested year is unavailable.
Key changes:
- Added
yearparameter to theLevelclass with validation requiring 4-digit integers - Implemented flexible method/basis matching logic with year-aware searching in Arkane database
- Enhanced error reporting to list available years when a requested year is not found
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/source/advanced.rst | Documents the new year parameter with usage examples |
| arc/level.py | Adds year attribute with 4-digit validation and updates string representations |
| arc/level_test.py | Adds test for year validation (rejects 2-digit, accepts 4-digit) |
| arc/statmech/arkane.py | Core implementation: adds normalization functions, year-aware matching logic, and improved error messages; also adds spinMultiplicity to non-SMILES species template and thread limit environment variables |
| arc/statmech/arkane_test.py | Adds test for year-not-found warning behavior and year in _level_to_str |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7955fd4 to
67e2cfe
Compare
1926671 to
773ab92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
docs/source/advanced.rst
Outdated
| and are not valid QC methods. Do not include year suffixes in ``level_of_theory``; instead, set | ||
| ``arkane_level_of_theory`` with a ``year`` value if you need a specific correction year. |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation incorrectly advises users to set year in arkane_level_of_theory, but based on the PR description and implementation, the year parameter should be set directly on sp_level (or other level arguments like freq_level, opt_level). The arkane_level_of_theory is typically derived from sp_level unless explicitly overridden. Users should be advised to set the year parameter on sp_level or other level_of_theory parameters, not on arkane_level_of_theory specifically.
| and are not valid QC methods. Do not include year suffixes in ``level_of_theory``; instead, set | |
| ``arkane_level_of_theory`` with a ``year`` value if you need a specific correction year. | |
| and are not valid QC methods. Do not include year suffixes in ``level_of_theory``; instead, specify a | |
| ``year`` key on ``sp_level`` (and/or other job-specific level dictionaries such as ``opt_level``, | |
| ``freq_level``, or ``scan_level``) if you need a specific correction year. |
docs/source/examples.rst
Outdated
| set ``arkane_level_of_theory`` with a ``year`` value instead. | ||
| If ``year`` is omitted, ARC will prefer the no-year Arkane entry for that method/basis; if none exists, | ||
| ARC will fall back to the latest available year in the Arkane database. |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation incorrectly advises users to set year in arkane_level_of_theory. Based on the PR implementation and examples shown earlier in the documentation (lines 98-112), the year parameter should be set directly on sp_level or other level parameters (opt_level, freq_level, etc.), not specifically on arkane_level_of_theory. The arkane_level_of_theory is typically auto-derived from sp_level unless explicitly overridden.
| set ``arkane_level_of_theory`` with a ``year`` value instead. | |
| If ``year`` is omitted, ARC will prefer the no-year Arkane entry for that method/basis; if none exists, | |
| ARC will fall back to the latest available year in the Arkane database. | |
| set a ``year`` field on the relevant level parameter (e.g., ``sp_level``, ``opt_level``, ``freq_level``, | |
| etc.) rather than on ``level_of_theory`` or ``arkane_level_of_theory``. If ``year`` is omitted, ARC will | |
| prefer the no-year Arkane entry for that method/basis; if none exists, ARC will fall back to the latest | |
| available year in the Arkane database. |
arc/statmech/arkane.py
Outdated
| f"available years: {_format_years(years)}. " | ||
| f"Specify a year to select a matching entry." | ||
| ) | ||
| return _level_to_str(sp_level) |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent behavior when Arkane AEC entry is not found: For composite methods (line 978), the function returns _level_to_str(sp_level) even when no matching entry exists in the database. However, for non-composite methods with freq_scale_factor (line 1005), it returns None. This inconsistency could lead to Arkane attempting to use a level of theory that lacks proper corrections. Both cases should handle missing entries consistently, likely by returning None after logging the warning.
| return _level_to_str(sp_level) | |
| # No matching AEC level in Arkane DB for this composite method | |
| return None |
| # Iodine SOC calculated as a weighted average of the electronic spin splittings of the lowest energy state. | ||
| # The splittings are obtained from Huber, K.P.; Herzberg, G., Molecular Spectra and Molecular Structure. IV. | ||
| # Constants of Diatomic Molecules, Van Nostrand Reinhold Co., 1979 | ||
| # Spin orbit correction for F, Si, Cl, Br and B taken form https://cccbdb.nist.gov/elecspin.asp |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling error: 'form' should be 'from'
| # Spin orbit correction for F, Si, Cl, Br and B taken form https://cccbdb.nist.gov/elecspin.asp | |
| # Spin orbit correction for F, Si, Cl, Br and B taken from https://cccbdb.nist.gov/elecspin.asp |
| def _normalize_method(method: str) -> str: | ||
| """ | ||
| Normalize method names for comparison: | ||
| - lowercase | ||
| - remove all hyphens | ||
| Examples: | ||
| "DLPNO-CCSD(T)-F12" -> "dlpnoccsd(t)f12" | ||
| "dlpnoccsd(t)f122023" -> "dlpnoccsd(t)f122023" | ||
| """ | ||
| return method.lower().replace('-', '') | ||
|
|
||
|
|
||
| def _split_method_year(method_norm: str) -> "Tuple[str, Optional[int]]": | ||
| """ | ||
| Split a normalized method into (base_method, year). | ||
| Examples: | ||
| "dlpnoccsd(t)f122023" -> ("dlpnoccsd(t)f12", 2023) | ||
| "dlpnoccsd(t)f12" -> ("dlpnoccsd(t)f12", None) | ||
| """ | ||
| m = re.match(r"^(.*?)(\d{4})$", method_norm) | ||
| if not m: | ||
| return method_norm, None | ||
| base, year_str = m.groups() | ||
| return base, int(year_str) | ||
|
|
||
|
|
||
| def _normalize_basis(basis: Optional[str]) -> Optional[str]: | ||
| """ | ||
| Normalize basis names for comparison: | ||
| - lowercase | ||
| - remove hyphens and spaces | ||
| Examples: | ||
| "cc-pVTZ-F12" -> "ccpvtzf12" | ||
| "ccpvtzf12" -> "ccpvtzf12" | ||
| """ | ||
| if basis is None: | ||
| return None | ||
| return basis.replace('-', '').replace(' ', '').lower() | ||
|
|
||
|
|
||
| def _parse_lot_params(lot_str: str) -> dict: | ||
| """ | ||
| Parse method, basis, and software from a LevelOfTheory(...) string. | ||
| Example lot_str: | ||
| "LevelOfTheory(method='dlpnoccsd(t)f122023',basis='ccpvtzf12',software='orca')" | ||
| """ | ||
| params = {'method': None, 'basis': None, 'software': None} | ||
| for key in params.keys(): | ||
| m = re.search(rf"{key}='([^']+)'", lot_str) | ||
| if m: | ||
| params[key] = m.group(1) | ||
| return params | ||
|
|
||
|
|
||
| def _iter_level_keys_from_section(file_path: str, | ||
| section_start: str, | ||
| section_end: str) -> list[str]: | ||
| """ | ||
| Return all LevelOfTheory(...) key strings that appear as dictionary keys | ||
| in a given section of data.py. | ||
| These look like: | ||
| "LevelOfTheory(method='...',basis='...',software='...')" : { ... } | ||
| """ | ||
| section = _extract_section(file_path, section_start, section_end) | ||
| if section is None: | ||
| return [] | ||
|
|
||
| # Match things like: "LevelOfTheory(...)" : { ... } | ||
| pattern = r'"(LevelOfTheory\([^"]*\))"\s*:' | ||
| return re.findall(pattern, section, flags=re.DOTALL) | ||
|
|
||
|
|
||
| def _available_years_for_level(level: "Level", | ||
| file_path: str, | ||
| section_start: str, | ||
| section_end: str) -> list[Optional[int]]: | ||
| """ | ||
| Return a sorted list of available year suffixes for a given Level in a section. | ||
| """ | ||
| if level is None or level.method is None: | ||
| return [] | ||
|
|
||
| target_method_norm = _normalize_method(level.method) | ||
| target_base, _ = _split_method_year(target_method_norm) | ||
| target_basis_norm = _normalize_basis(level.basis) | ||
| target_software = level.software.lower() if level.software else None | ||
|
|
||
| years = set() | ||
| for lot_str in _iter_level_keys_from_section(file_path, section_start, section_end): | ||
| params = _parse_lot_params(lot_str) | ||
| cand_method = params.get('method') | ||
| cand_basis = params.get('basis') | ||
| cand_sw = params.get('software') | ||
|
|
||
| if cand_method is None: | ||
| continue | ||
|
|
||
| cand_method_norm = _normalize_method(cand_method) | ||
| cand_base, cand_year = _split_method_year(cand_method_norm) | ||
|
|
||
| if cand_base != target_base: | ||
| continue | ||
| if target_basis_norm is not None: | ||
| cand_basis_norm = _normalize_basis(cand_basis) | ||
| if cand_basis_norm != target_basis_norm: | ||
| continue | ||
| if target_software is not None and cand_sw is not None: | ||
| if cand_sw.lower() != target_software: | ||
| continue | ||
|
|
||
| years.add(cand_year) | ||
|
|
||
| # Sort with None first to represent "no year suffix" | ||
| return sorted(years, key=lambda y: (-1 if y is None else y)) | ||
|
|
||
|
|
||
| def _format_years(years: list[Optional[int]]) -> str: | ||
| """ | ||
| Format a list of years for logging. | ||
| """ | ||
| if not years: | ||
| return "none" | ||
| return ", ".join("none" if y is None else str(y) for y in years) | ||
|
|
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding unit tests for the new helper functions (_normalize_method, _split_method_year, _normalize_basis, _parse_lot_params, _find_best_level_key_for_sp_level, _available_years_for_level). These functions contain important logic for parsing and matching level of theory specifications, and unit tests would help ensure correctness and prevent regressions, especially around edge cases like method names with special characters, different year formats, etc.
arc/statmech/arkane.py
Outdated
| def _available_years_for_level(level: "Level", | ||
| file_path: str, | ||
| section_start: str, | ||
| section_end: str) -> list[Optional[int]]: |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hint inconsistency: This function uses lowercase list[Optional[int]] for the return type, but the file imports and uses List from typing module elsewhere (see line 9 and line 683). For consistency with the rest of the codebase, use List[Optional[int]] instead of list[Optional[int]].
arc/statmech/arkane.py
Outdated
| return sorted(years, key=lambda y: (-1 if y is None else y)) | ||
|
|
||
|
|
||
| def _format_years(years: list[Optional[int]]) -> str: |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hint inconsistency: This function parameter uses lowercase list[Optional[int]], but the file imports and uses List from typing module elsewhere (see line 9 and line 683). For consistency with the rest of the codebase, use List[Optional[int]] instead of list[Optional[int]].
| def _format_years(years: list[Optional[int]]) -> str: | |
| def _format_years(years: List[Optional[int]]) -> str: |
arc/statmech/arkane.py
Outdated
|
|
||
| target_method_norm = _normalize_method(level.method) | ||
| target_base, method_year = _split_method_year(target_method_norm) | ||
| target_year = getattr(level, 'year', None) if getattr(level, 'year', None) is not None else method_year |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code allows year suffixes in method names (e.g., 'b97d32023') and extracts them for matching, but the documentation explicitly advises against including year suffixes in method names. Consider adding validation to warn or raise an error if a user provides both a year suffix in the method name AND an explicit year parameter with conflicting values, to prevent user confusion and potential bugs.
| target_year = getattr(level, 'year', None) if getattr(level, 'year', None) is not None else method_year | |
| explicit_year = getattr(level, 'year', None) | |
| if explicit_year is not None and method_year is not None and explicit_year != method_year: | |
| raise InputError( | |
| f"Conflicting year specifications for level '{level}': " | |
| f"explicit year={explicit_year}, method suffix year={method_year}. " | |
| "Please remove the year suffix from the method name or update the 'year' attribute to match." | |
| ) | |
| target_year = explicit_year if explicit_year is not None else method_year |
arc/statmech/arkane.py
Outdated
| If multiple entries only differ by year, the one with the *latest* year | ||
| is chosen (year=0 if no year in that entry). |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring description at lines 936-937 is misleading. It states that "the one with the latest year is chosen" but this only applies when the user does NOT specify a year AND no no-year entry exists. When a user specifies a year explicitly via the year parameter, only that exact year will be matched. The docstring should clarify that the latest-year fallback only occurs when no year is specified and no no-year entry exists.
| If multiple entries only differ by year, the one with the *latest* year | |
| is chosen (year=0 if no year in that entry). | |
| When a year is explicitly specified in the Level, only entries with that exact | |
| year are matched. If no year is specified and an entry without a year exists, | |
| that entry is used. Only when no year is specified and no no-year entry exists, | |
| if multiple entries differ only by year, the one with the *latest* year is | |
| chosen (treating entries with no year as year=0). |
| qm_corr_files = _get_qm_corrections_files() | ||
| if not qm_corr_files: | ||
| return None |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When no quantum corrections files are found (qm_corr_files is empty), the function silently returns None without any warning or error message. Consider adding a logger.warning() to inform users that no quantum corrections database files were found, which would help with debugging configuration issues.
… tests for year handling
55882cd to
aeabe6c
Compare
Adds debugging for CI workflow and configures environment variables to automatically confirm conda installations. Updates the RMG installation script to use strict channel priority. This ensures more consistent and reproducible environments, which is especially important for CI and user installations.
Updates Arkane's level of theory matching logic to prioritize entries with the latest year when no target year is specified. This ensures that the most up-to-date level of theory is selected when multiple options are available.
RMG-database for AEC has the suffix of years for methods and this can cause issues for ARC when it attempts to find a method/basis if the user does not provide the year. This PR is meant to allow for the following: