Egork/flake8 by egork520 · Pull Request #171 · allenai/mmda

egork520 · 2022-10-24T18:18:07Z

Refactored the code to follow PEP8 style guide

Adding flake8 command to CICD pipeline


soldni

In general, I am a little bit scared of such a giant PR touching so many files. Many of the changes make the code a lot less readable than before, and it would be good to manually refactor where necessary.

Some questions:

1. Is this a flavor of PEP8 we like?

Some non-standard things I noticed:

120 chars lines
ok with mix of single and double quotes
non-multiple indentation allowed
new lines on binary operators
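For reference, a minimal sketch of the two line-break placements the last item refers to. The variable names and values are hypothetical. PEP 8 permits either placement as long as it is locally consistent, and suggests breaking before the operator for new code; pycodestyle reports the two styles as W503 (break before) and W504 (break after).

```python
# Hypothetical values, used only to illustrate the two layouts.
gross_wages, taxable_interest, ira_deduction = 1000, 50, 200

# Break before the operator (the style PEP 8 suggests for new code).
income_before = (
    gross_wages
    + taxable_interest
    - ira_deduction
)

# Break after the operator.
income_after = (
    gross_wages +
    taxable_interest -
    ira_deduction
)
```

Both forms evaluate identically; the choice only affects which lint rule has to be ignored.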

As an example, I went with the following in smashed:

[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
(
      __pycache__
    | \.git
    | \.mypy_cache
    | \.pytest_cache
    | \.vscode
    | \.venv
    | \bdist\b
    | \bdoc\b
)
'''

[tool.isort]
profile = "black"
line_length = 79
multi_line_output = 3

[tool.autopep8]
max_line_length = 79
in-place = true
recursive = true
aggressive = 3

[tool.mypy]
python_version = 3.8
ignore_missing_imports = true
no_site_packages = true
allow_redefinition = false

[tool.mypy-tests]
strict_optional = false

[tool.flake8]
exclude = [
    ".venv/",
    "tmp/"
]
per-file-ignores = [
    '*.py:E203',
    '__init__.py:F401',
    '*.pyi:E302,E305'
]

(Note that the pyproject.toml above requires flake8-pyi and Flake8-pyproject to run properly)

We could also align with what other AI2 projects are using, e.g. Tango's.

2. We should probably adopt some auto-formatting tools we run before release

Again, in smashed I have a combo of flake8, isort, and autopep8; I ask contributors to run `black . && flake8 . && isort .`
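The chained command above could also be wrapped in a small script, for contributors without a POSIX shell. This is only a sketch: `COMMANDS` and `run_checks` are hypothetical names, the tool order is copied from the comment, and each tool is assumed to be installed and on the PATH.

```python
import subprocess
from typing import Callable, List, Sequence

# Order taken from the comment above; the tools must be installed separately.
COMMANDS: List[List[str]] = [
    ["black", "."],
    ["flake8", "."],
    ["isort", "."],
]


def run_checks(
    commands: Sequence[Sequence[str]],
    run: Callable = subprocess.run,
) -> int:
    """Run each command in order and stop at the first failure, like `&&`."""
    for command in commands:
        returncode = run(list(command)).returncode
        if returncode != 0:
            return returncode
    return 0


# Usage (not executed here): raise SystemExit(run_checks(COMMANDS))
```

The `run` parameter exists only so the sequencing logic can be exercised without invoking the real tools.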

3. Should we adopt `mypy` too?

Probably not immediately? But again, other projects use it.

4. Some of the automatic refactor kills semantics of comments

I left a note in a couple of places where this happens.

soldni · 2022-10-24T20:23:33Z


        Modified Oct 2022 (kylel): Changed return value to be List[int]
-        """
+        """ # noqa


`# noqa` is a blanket ignore; we should ignore specific error codes instead.
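A specific ignore names the code being silenced, for example `# noqa: E501` if line length is the error in question. As a rough sketch, blanket ignores could be located with a check like the one below. The pattern and the helper name `is_blanket_noqa` are assumptions made for this illustration, and flake8's own matching may differ.

```python
import re

# Hypothetical pattern for this sketch; flake8's own matching may differ.
NOQA = re.compile(
    r"#\s*noqa(?P<codes>:\s*[A-Z][0-9]+(?:[,\s]+[A-Z][0-9]+)*)?",
    re.IGNORECASE,
)


def is_blanket_noqa(line: str) -> bool:
    """Return True if the line has a `# noqa` comment with no error codes."""
    match = NOQA.search(line)
    return match is not None and match.group("codes") is None
```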

soldni · 2022-10-24T20:24:13Z


    ###################################################################
-    ##################### Necessary Model Variables ###################
+    # Necessary Model Variables #


we shouldn't refactor this automatically

soldni · 2022-10-24T20:24:57Z

+                if next_row_first_token_text[-len(plural_suffix):] == plural_suffix:
                    next_row_first_token_text = next_row_first_token_text[
-                        : -len(plural_suffix)
-                    ]
+                                                : -len(plural_suffix)
+                                                ]


this is not very legible
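One way to sidestep the awkward slice layout is to move the suffix handling into a small helper. A minimal sketch, assuming the intent is simply to drop `plural_suffix` from the end of the token text; `strip_suffix` is a hypothetical name that does not appear in the PR.

```python
def strip_suffix(text: str, suffix: str) -> str:
    """Return `text` without a trailing `suffix`, or unchanged if absent."""
    # Guard against an empty suffix: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text
```

The call site then fits on one line, e.g. `next_row_first_token_text = strip_suffix(next_row_first_token_text, plural_suffix)`. On Python 3.9+, `str.removesuffix` does the same thing.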

soldni · 2022-10-24T20:25:31Z

+                # input string list: [' Anon ', '1934', ' ', 'University and Educational Intelligence', ' ', 'Nature',
+                # ' ', '133', ' ', '805–805']
+                # tokenization removes empty string: ['[CLS]', 'an', '##on', '1934', 'university', 'and',
+                # 'educational',
+                # 'intelligence', 'nature', '133', '80', '##5', '–', '80', '##5', '[SEP]']
+                # skipping empty string results in skipping word id: [None, 0, 0, 1, 3, 3, 3, 3, 5, 7, 9, 9, 9, 9,
+                # 9, None]


This comment is now very hard to read.
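One possible layout that stays within the line limit is to give each list its own label and wrap on element boundaries. The sketch below writes the three lists as Python literals so the layout can be checked; the variable names are hypothetical, the values are copied from the comment in the diff, and the same wrapping would apply if the text stays a comment.

```python
# Input string list:
example_input = [
    " Anon ", "1934", " ", "University and Educational Intelligence",
    " ", "Nature", " ", "133", " ", "805–805",
]

# Tokenization removes the empty strings:
example_tokens = [
    "[CLS]", "an", "##on", "1934", "university", "and", "educational",
    "intelligence", "nature", "133", "80", "##5", "–", "80", "##5", "[SEP]",
]

# Skipping the empty strings also skips their word ids:
example_word_ids = [None, 0, 0, 1, 3, 3, 3, 3, 5, 7, 9, 9, 9, 9, 9, None]
```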

soldni · 2022-10-24T20:26:18Z

+from mmda.predictors.hf_predictors.bibentry_predictor.types import (BibEntryPredictionWithSpan,
+                                                                    BibEntryStructureSpanGroups)


This PR adds a ton of indentation that is not a multiple of 4. Are we ok with that?
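For comparison, a sketch of the vertical hanging-indent layout, which is what `multi_line_output = 3` selects in isort and what black produces for long imports. A standard-library import stands in here so the snippet runs without mmda installed; the mmda import is shown as a comment.

```python
# Stand-in import from the standard library so the snippet runs anywhere.
from collections import (
    OrderedDict,
    defaultdict,
)

# The same layout applied to the import from the diff above:
# from mmda.predictors.hf_predictors.bibentry_predictor.types import (
#     BibEntryPredictionWithSpan,
#     BibEntryStructureSpanGroups,
# )
```

Every continuation line is indented by exactly four spaces, independent of the length of the module path.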

soldni · 2022-10-24T20:27:17Z

 import layoutparser as lp

 from mmda.types import Document, Box, BoxGroup, Metadata
-from mmda.types.names import *


iirc, the star import was intentional here? or maybe it was intentional somewhere else.

soldni · 2022-10-24T20:28:50Z

-from typing import Union, List, Dict, Any, Optional
+from typing import List, Dict, Optional

+from PIL.Image import Image


Explicitly importing from PIL, instead of annotating types via PIL.Image, might cause import errors if the layoutparser dependencies are not installed. Please check on a minimal installation.
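If the import is needed only for annotations, one common way to avoid a hard runtime dependency is to guard it with `typing.TYPE_CHECKING`. A minimal sketch, assuming `Image` is used purely as a type hint; `count_images` is a hypothetical function, not part of mmda.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Evaluated by type checkers only; never imported at runtime.
    from PIL.Image import Image


def count_images(images: List["Image"]) -> int:
    """Hypothetical example; the quoted annotation is not resolved at runtime."""
    return len(images)
```

With this pattern the module imports cleanly even when PIL is absent, while mypy and IDEs still see the real type.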

soldni · 2022-10-24T20:30:41Z

+    mmda/types/old/boundingbox.old.py
+per-file-ignores =
+
+max-line-length = 119


Why 119 vs 79 vs something else?

Why are ai2_internal, tests, and examples not checked?

egork520 · 2022-10-24T20:52:47Z

In general, I am a little bit scared of such a giant PR touching so many files. Many of the changes make the code a lot less readable than before, and it would be good to manually refactor where necessary.

I agree, and I should probably have asked before opening this. The mix of styles in different parts of the code does not please my eyes.

Some questions:

1. Is this a flavor of PEP8 we like?

Some non-standard things I noticed:

120 chars lines
It is up for discussion. Given the resolution of current screens, I personally prefer longer lines.

ok with mix of single and double quotes
Personally I am used to single quotes unless double is needed. It is up for discussion.

non-multiple indentation allowed
new lines on binary operators


We could also align with what other AI2 projects are using, e.g. Tango's.

Agreed. It might be worth formalizing style requirements for s2. This could even be part of the 3-year vision plan (sharpen the saw).

2. We should probably adopt some auto-formatting tools we run before release

Again, in smashed I have a combo of flake8, isort, and autopep8; I ask contributors to run `black . && flake8 . && isort .`

3. Should we adopt mypy too?

Probably not immediately? But again, other projects use it.

4. Some of the automatic refactor kills semantics of comments

I left a note in a couple of places where this happens.

egork520 and others added 30 commits October 19, 2022 16:29

Moving vila test to a class so that pytest ./test works locally

4f4c5c6

Added notes on how to run unit tests locally

ff18984

Moving change of directory to the setUp class. Locally tests fails in…

7f7bd66

… pytest

Moving change of directory to the setUp class. Locally tests fails in…

747ace4

… pytest

Creating variable for the fixtures path

a425583

Adding pytest.ini no need to type the folder for the tests

38b76b3

Adding exact command for selecting tests, removing directory name, we…

425b072

… specify it in pytest.ini

Adding test install dependencies, for running tests on multiple cpus,…

5c1ec12

… and coverage reporting tool

Command for running tests on multiple cpus

062f953

Adding test requirements installation to the mmda-ci.yml

55689ae

Adding plugin for running converage more easily, removing coverage

9e1bf61

Adding coverage config file,

6fbf179

Adding coverage lower bound 57%, specifying module for which to measu…

4035404

…re coverage.

Removing individual config files in favore of setup.cfg file

9d9d4d7

Update tests/test_predictors/test_vila_predictors.py

7739963

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Update tests/test_predictors/test_vila_predictors.py

d0f7117

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Update tests/test_predictors/test_vila_predictors.py

faded77

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Update tests/test_predictors/test_vila_predictors.py

ac54653

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Update tests/test_predictors/test_figure_table_predictors.py

c1b97fd

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Update tests/test_parsers/test_pdf_plumber_parser.py

cd06980

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Update tests/test_parsers/test_pdf_plumber_parser.py

fb00985

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Update tests/test_parsers/test_pdf_plumber_parser.py

69e1c7b

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Update tests/test_parsers/test_pdf_plumber_parser.py

55cd37d

Co-authored-by: Luca Soldaini <lucas@allenai.org>

Rolling in test dependencies into dev

ffa7b56

Fixing typos

c26c46b

Adding parameters for the coverage

5e4b09f

Specifying percentage of the coverage in the builds

45eff40

Updating comments about pytest

641b542

Merge branch 'main' of github.com:allenai/mmda into egork/flake8

cf0436b

First part of lint fixing

6ba65aa

egork520 added 4 commits October 24, 2022 11:12

Second part of lint fixing

690c690

Skip old code

7ad0d57

Adding flake8 run to the compile step

0a9786a

Adding a note on how to run the flake8 test

46961d6

egork520 requested review from geli-gel, kyleclo and soldni October 24, 2022 18:18

egork520 wants to merge 34 commits into main from egork/flake8
