Conversation

@sfluegel05
Collaborator

No description provided.

schnamo and others added 30 commits June 6, 2024 17:41
@schnamo
Collaborator

schnamo commented Oct 31, 2025

  • Adding support for regression problems and a wider range of classification problems

@schnamo
Collaborator

schnamo commented Nov 3, 2025

Added a fix for loading a pretrained model from a checkpoint.

@sfluegel05
Collaborator Author

I added some comments. It would be great if you could have a look at them. Also, you have added quite a number of config files. Some seem to be very specific (e.g. an ELECTRA config with a different learning rate for a specific experiment). My suggestion would be to either remove those configs (and publish them in a paper-specific Zenodo archive, or mention the parameters in the paper) or group them so that new users don't get overwhelmed (e.g. all MoleculeNet dataset configs could go into one folder).

use_sigmoidal_implication: bool = False,
weight_epoch_dependent: Union[bool | tuple[int, int]] = False,
weight_epoch_dependent: Union[bool, Tuple[int, int]] = False,
weight_epoch_dependent: Union[bool, Tuple[int, int]] = False,
Collaborator Author

why does weight_epoch_dependent appear twice here?
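For reference, a deduplicated signature with a consistent `Union` spelling might look like this (a hypothetical helper; the function name and the meaning of the tuple as an epoch range are my assumptions, not part of the PR):

```python
from typing import Tuple, Union


def make_loss_config(
    use_sigmoidal_implication: bool = False,
    # either a plain on/off flag, or a (start_epoch, end_epoch) schedule
    weight_epoch_dependent: Union[bool, Tuple[int, int]] = False,
) -> dict:
    """Hypothetical helper showing the parameter declared exactly once."""
    return {
        "use_sigmoidal_implication": use_sigmoidal_implication,
        "weight_epoch_dependent": weight_epoch_dependent,
    }
```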

if self.pass_loss_kwargs:
loss_kwargs = loss_kwargs_candidates
loss_kwargs["current_epoch"] = self.trainer.current_epoch
# loss_kwargs["current_epoch"] = self.trainer.current_epoch
Collaborator Author

why is this commented out? Afaik we don't have any loss function at the moment that needs this (this was added for some experimental semantic loss features that didn't perform well). Does this break anything?
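If the concern is that some loss functions don't accept the extra argument, one option is to pass `current_epoch` only when the loss signature declares it. A sketch (the helper and the two example losses are made-up names, not code from this PR):

```python
import inspect


def build_loss_kwargs(loss_fn, base_kwargs, current_epoch):
    """Add current_epoch only for losses whose signature accepts it."""
    kwargs = dict(base_kwargs)
    params = inspect.signature(loss_fn).parameters
    if "current_epoch" in params:
        kwargs["current_epoch"] = current_epoch
    return kwargs


def plain_loss(pred, target):
    # a loss that would crash if given an unexpected current_epoch kwarg
    return abs(pred - target)


def epoch_aware_loss(pred, target, current_epoch=0):
    # an epoch-dependent loss, e.g. for a weighting schedule
    return abs(pred - target) / (current_epoch + 1)
```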


from chebai.loss.semantic import DisjointLoss as ElectraChEBIDisjointLoss # noqa
# TODO: put back in before pull request
# from chebai.loss.semantic import DisjointLoss as ElectraChEBIDisjointLoss # noqa
Collaborator Author

i guess you wanted to uncomment this :)

Collaborator Author

this will be a problem for merging. I have added new smiles tokens on a different branch (from pubchem) so the new pubchem-pretrained model (and all models based on that) will depend on those tokens.

Are the tokens you added here actually used by a model or are those just artifacts?
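To make that merge review easier, one could diff the two token vocabularies first (a sketch; it assumes each branch stores its tokens as an ordered list, e.g. one token per line in a tokens file):

```python
def diff_token_files(old_tokens, new_tokens):
    """Tokens present in new_tokens but absent from old_tokens, in order.

    Useful for reviewing which SMILES tokens a branch actually adds
    before merging two token vocabularies.
    """
    old_set = set(old_tokens)
    return [tok for tok in new_tokens if tok not in old_set]
```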

Collaborator

I have removed the part in question and will open an issue and look into what is going on with this

Collaborator Author

is there a reason for deleting this file?

restructuring of config files
fixing small issues from merging
@schnamo
Collaborator

schnamo commented Nov 11, 2025

addressed all comments

@sfluegel05 changed the title from "merge into dev" to "New regression and classification datasets for ontology pre-training" on Nov 11, 2025

def _get_token_index(self, token: str) -> int:
"""Returns a unique number for each token, automatically adds new tokens."""
print(str(token))
Collaborator Author

I assume this is a leftover from debugging?

Collaborator

Yes, will remove it!
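For reference, the method without the debug print might look like this (a minimal sketch; the class and the internal mapping attribute are my own naming, not the actual reader class):

```python
class TokenIndexer:
    """Minimal sketch of a reader that assigns a stable index per token."""

    def __init__(self):
        self._token_to_index = {}  # token -> index, in insertion order

    def _get_token_index(self, token: str) -> int:
        """Returns a unique number for each token, automatically adds new tokens."""
        if token not in self._token_to_index:
            self._token_to_index[token] = len(self._token_to_index)
        return self._token_to_index[token]
```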

@sfluegel05 added this to the v1.1 milestone on Nov 17, 2025
@schnamo
Collaborator

schnamo commented Dec 11, 2025

Lint issues fixed. Unit tests most likely need to be adjusted; missing labels might cause issues in some places.

@sfluegel05
Collaborator Author

The unit tests can be fixed by adjusting the mock data for the Tox21Challenge dataset. You just need to add the missing_labels attribute for all entries that have missing labels.

Below are the fixed functions for Tox21ChallengeMockData from tests.unit.mock_data.tox_mock_data.

    @staticmethod
    def data_in_dict_format() -> List[Dict]:
        data_list = [
            {
                "labels": [
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    0,
                    None,
                    None,
                ],
                "ident": "25848",
            },
            {
                "labels": [
                    0,
                    None,
                    None,
                    1,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                ],
                "ident": "2384",
            },
            {
                "labels": [
                    0,
                    None,
                    0,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                ],
                "ident": "27102",
            },
            {
                "labels": [
                    1,
                    1,
                    1,
                    1,
                    1,
                    1,
                    1,
                    1,
                    1,
                    1,
                    1,
                    1,
                ],
                "ident": "26792",
            },
            {
                "labels": [
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    1,
                    None,
                    1,
                    None,
                    None,
                ],
                "ident": "26401",
            },
            {
                "labels": [
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                ],
                "ident": "25973",
            },
        ]

        for dict_ in data_list:
            dict_["features"] = Tox21ChallengeMockData.FEATURE_OF_SMILES
            dict_["group"] = None
            # missing labels get added here
            if any(label is None for label in dict_["labels"]):
                dict_["missing_labels"] = [
                    label is None for label in dict_["labels"]
                ]

        return data_list

    @staticmethod
    def get_setup_processed_output_data() -> List[Dict]:
        """
        Returns mock processed data used for testing the `setup_processed` method.

        The data contains molecule identifiers and their corresponding toxicity labels for multiple endpoints.
        Each dictionary in the list represents a molecule with its associated labels, features, and group information.

        Returns:
            List[Dict]: A list of dictionaries where each dictionary contains:
                        - "features": The SMILES features of the molecule.
                        - "labels": A list of toxicity endpoint labels (0, 1, or None).
                        - "ident": The sample identifier.
                        - "group": None (default value for the group key).
        """

        # "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD", "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5",
        # "SR-HSE", "SR-MMP", "SR-p53",
        data_list = [
            {
                "labels": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                "ident": "NCGC00260869-01",
            },
            {
                "labels": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                "ident": "NCGC00261776-01",
            },
            {
                "labels": [
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                ],
                "ident": "NCGC00261380-01",
            },
            {
                "labels": [0, 0, 0, None, 0, 0, 0, 0, 0, 0, None, 1],
                "ident": "NCGC00261842-01",
            },
            {
                "labels": [0, 0, 1, None, 1, 1, 1, None, 1, 1, None, 1],
                "ident": "NCGC00261662-01",
            },
            {
                "labels": [0, 0, None, None, 1, 0, 0, 1, 0, 0, 1, 1],
                "ident": "NCGC00261190-01",
            },
        ]

        complete_list = []
        for dict_ in data_list:
            complete_list.append(
                {
                    "features": Tox21ChallengeMockData.FEATURE_OF_SMILES,
                    **dict_,
                    "group": None,
                }
            )
            # add missing labels
            if any(label is None for label in dict_["labels"]):
                complete_list[-1]["missing_labels"] = [
                    label is None for label in dict_["labels"]
                ]

        return complete_list
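The missing_labels mask built above (True where the label is absent) can then be used to skip those positions when computing metrics. A minimal sketch with names of my own choosing, not code from the PR:

```python
def masked_accuracy(preds, labels, missing_labels):
    """Accuracy over positions whose label is present (mask entry False)."""
    correct, total = 0, 0
    for pred, label, missing in zip(preds, labels, missing_labels):
        if missing:
            continue  # label is None for this endpoint; exclude it from the metric
        total += 1
        correct += int(pred == label)
    return correct / total if total else float("nan")
```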
