[webconnectivity]: Support user-agent strings in request headers to bypass scraper security #1741

Description

@DecFox

We observed that several e-commerce sites in Portugal had failing measurements:
https://explorer.ooni.org/domain/www.asics.com
https://explorer.ooni.org/domain/www.elcorteingles.pt

even though the websites are accessible through a browser. On further inspection, we found that this is also the case with other HTTP clients such as curl: a minimal set of user-agent-related request headers is required to access the website. For example:

curl 'https://www.elcorteingles.pt/' \
  -H 'sec-ch-ua: "Not;A=Brand";v="99", "Brave";v="139", "Chromium";v="139"' \
  -H 'sec-ch-ua-platform: "Android"' \
  -H 'user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Mobile Safari/537.36'

is successful, while:

curl -v 'https://www.elcorteingles.pt/' \
  -H 'sec-ch-ua: "Not;A=Brand";v="99", "Brave";v="139", "Chromium";v="139"' \
  -H 'user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Mobile Safari/537.36'

errors out with:

...
* Request completely sent off
* HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
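
For reference, the same behavior can be reproduced in Go (a minimal sketch using net/http; the URL and header values are copied from the curl commands above, and dropping the sec-ch-ua-platform line reproduces the failure):

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "https://www.elcorteingles.pt/", nil)
	if err != nil {
		panic(err)
	}
	// The same header set that makes the curl request above succeed;
	// removing sec-ch-ua-platform reproduces the HTTP/2 stream error.
	req.Header.Set("sec-ch-ua", `"Not;A=Brand";v="99", "Brave";v="139", "Chromium";v="139"`)
	req.Header.Set("sec-ch-ua-platform", `"Android"`)
	req.Header.Set("User-Agent", "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Mobile Safari/537.36")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body), "bytes")
}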

To this end, we should support setting user-agent strings in request headers so that we can measure such domains with webconnectivity more confidently.
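
One possible shape for the change (a hypothetical sketch, not the existing probe-cli API; the wrapper type and default values below are illustrative) is a RoundTripper that fills in browser-like headers on requests that do not already set them:

package main

import (
	"fmt"
	"net/http"
)

// browserHeadersTransport is a hypothetical wrapper that attaches a
// browser-like header set to outgoing measurement requests; the type
// name is illustrative, not part of the current probe-cli codebase.
type browserHeadersTransport struct {
	underlying http.RoundTripper
}

// defaultHeaders mirrors the header set from the successful curl command.
var defaultHeaders = map[string]string{
	"User-Agent":         "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Mobile Safari/537.36",
	"sec-ch-ua":          `"Not;A=Brand";v="99", "Brave";v="139", "Chromium";v="139"`,
	"sec-ch-ua-platform": `"Android"`,
}

func (t *browserHeadersTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	clone := req.Clone(req.Context())
	for name, value := range defaultHeaders {
		// Only fill in headers the caller did not set explicitly.
		if clone.Header.Get(name) == "" {
			clone.Header.Set(name, value)
		}
	}
	return t.underlying.RoundTrip(clone)
}

func main() {
	client := &http.Client{Transport: &browserHeadersTransport{underlying: http.DefaultTransport}}
	resp, err := client.Get("https://www.elcorteingles.pt/")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}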

Labels

enhancement (New feature request or improvement to existing functionality)
