-
Notifications
You must be signed in to change notification settings - Fork 18
Introduce SelectFields
#118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## feat-fields-disabled #118 +/- ##
========================================================
- Coverage 99.57% 99.18% -0.40%
========================================================
Files 25 25
Lines 937 976 +39
========================================================
+ Hits 933 968 +35
- Misses 4 8 +4
|
d07f361 to
fee1e3f
Compare
fee1e3f to
e1e2451
Compare
| @property | ||
| def fields_to_extract(self) -> Iterable[str]: | ||
| """Returns an Iterable of field names which should populate the designated | ||
| ``item_cls``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm proposing to expose the fields_to_extract property method since it's a handy utility for frameworks like scrapy-poet to see which fields to populate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we choose a name in line with the corresponding attribute of @field to exclude or include fields from to_item output? (e.g. to_item_fields)
|
|
||
| unknown_fields = set(fields) - set(page_obj_fields.keys()).union({"*"}) | ||
| if unknown_fields: | ||
| self._handle_unknown_field(unknown_fields, select_fields.on_unknown_field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handling of unknown fields seems to be more optimal to be placed here as opposed to something like item_from_fields() which could've already extracted some fields (which could be expensive, i.e., having additional request) before erroring out.
| item_cls: Type[T] = dict, # type: ignore[assignment] | ||
| *, | ||
| skip_nonitem_fields: bool = False, | ||
| field_names: Optional[Iterable[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an unknown field has been unintentionally passed, it'd error out with:
# omitted stack trace
File ~/dev/z/web-poet/web_poet/fields.py:235, in <dictcomp>(.0)
233 if skip_nonitem_fields:
234 field_names = _without_unsupported_field_names(item_cls, field_names)
--> 235 return item_cls(**{name: getattr(obj, name) for name in field_names})
AttributeError: 'SomePage' object has no attribute 'unknown_field'
I think this is descriptive enough and we don't need additional handling of unknown fields here. Handling it in the .fields_to_extract property method should be enough.
| field_names=self.fields_to_extract, | ||
| ) | ||
|
|
||
| def _get_select_fields(self) -> Optional[SelectFields]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_select_fields() is not being cached.
I think this should be okay since the user might want to re-use the same page instance but with different SelectFields later on:
page = SomePage(SelectFields(fields={"*": False, "x": True}))
item1 = page.to_item()
page.select_fields = SelectFields(fields={"*": False, "x": True, "y": True})
item2 = page.to_item()Any thoughts about this usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with not caching.
However, I think we should avoid covering this scenario in the docs, and I am not sure if we should even cover it in tests. Changing the value of select_fields does not feel like a very elegant API to me.
| assert page.fields_to_extract == ["y"] | ||
| assert page.call_counter == {"y": 1} | ||
|
|
||
| # boolean-like values are not supported. They are simply ignored and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should raise an error, not ignore wrong values.
| assert page.call_counter == {"x": 1, "y": 1} | ||
|
|
||
| # If a non-SelectFields instance was passed to the `select_fields` init | ||
| # parameter, it would simply ignore it and use the default field directives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes more sense to me to raise an error in this case.
| #: | ||
| #: Setting it to ``"raise"`` would raise an :class:`AttributeError`, | ||
| #: ``"warn"`` produces a :class:`UserWarning`, while ``"ignore"`` does nothing. | ||
| on_unknown_field: UnknownFieldActions = "raise" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! Could you please elaborate on the use cases for on_unknown_field?
Also, what does unknown field mean? Does it mean that the field is not in the item, or does it mean that the field is not defined on the page object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean that the field is not defined on the page object?
It's mostly regarding about this but because of the parity between the page object and its item class, the field should generally not be in the item as well.
The unknown field can be derived by something like set(fields_to_include) - set(fields_to_exclude). This means unknown fields that are to be excluded won't be considered as actual unknown fields.
I'm also thinking of removing this parameter and simply raise any unknown fields by default. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I see.
It seems the case with field not in the item and a field not in the page object are very different.
If a field is not in the item, we should probably raise an exception. But there is a case where skip_nonitem_fields=True is used; in this case it actually could make sense to have fields in SelectFields which are not in the item.
If a field is not defined in the page object, but is present in the item, then final item.field returned by to_item is going to have a default value, None in most cases. It could also be the case that the attribute is populated in to_item method without using web-poet fields. It's not an error. It seems that we should allow to pick any fields which are present in the item, even if they're not fields of the page object.
We should have a clear answer though what happens when some of the item attributes are extracted using @fields, and some of them are added later, in the to_item method.
There is also a complicated case where there is a field defined, but then the value assigned to the final item is changed in to_item method. Probably it's fine not to support this case, but we should at least document it.
Overall the problem looks more complicated than I initially thought :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I suggested such an option, I was thinking of a scenario like this one:
B team writes BPage, subclassing APage from A team, and returning BItem, which does not subclass AItem. B team might not want new fields in APage to break BPage. But they might want a warning or error to look into it eventually in case we want to do something about it.
But it can indeed be more complex than that. When a field is removed, B team may want a different resolution. For example, they might want a warning for new fields, but an exception for missing fields.
|
Closing in favor of #138 |
This is related to #115 which implements approach 3.
swap_item_clsis omitted for now to reduce its complexity. We can add it later.TODO: