Current `fetch` logic is bad for parallelism #5

Despite all my efforts in `postgres-lopez`, the logic behind fetching new pages to be crawled does not play nicely with parallelism.

To explain: the query fetches a batch of URLs, but those URLs tend to come from too few distinct domains; it fails to deliver a _plurality of domains_. Keep in mind that crawl speed within a single domain is a heavily limiting factor, so it is better to crawl many different domains in parallel.
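
A rough back-of-the-envelope illustration of why domain plurality matters, under assumptions that are mine and not measured from `lopez`: each domain is crawled with a fixed delay $\delta$ between consecutive requests, download time is negligible, and a batch holds $n$ URLs of which $n_j$ belong to domain $j$. Requests to different domains can overlap, but requests to the same domain cannot, so the batch takes at least

$$T \ge \left(\max_j n_j - 1\right)\delta.$$

A batch drawn entirely from one domain gives $T \ge (n - 1)\,\delta$, while the same batch spread evenly over $k$ domains gives $T \ge (n/k - 1)\,\delta$, which is smaller by roughly a factor of $k$ when $n/k$ is large.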

Idea: remove `fetch` from the `MasterBackend` interface and move it to `WorkerBackend`. Also, add a parameter `origin: url::Origin` so that the worker can control which domain to fetch from. Then, move the fancy URL-choosing logic to `Worker`.
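
A minimal sketch of what the idea above could look like, not the actual `lopez` API. Only the names `WorkerBackend`, `fetch` and the `origin` parameter come from this issue; the `batch_size` parameter, `InMemoryBackend`, `enqueue` and `next_round` are hypothetical, and `Origin` is a plain stand-in for `url::Origin` so that the example has no dependencies.

```rust
use std::collections::{HashMap, VecDeque};

/// Stand-in for `url::Origin` (assumption: keeps the sketch dependency-free).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Origin(pub String);

/// Hypothetical worker-side backend: `fetch` is scoped to a single origin.
pub trait WorkerBackend {
    /// Returns up to `batch_size` not-yet-crawled URLs belonging to `origin`.
    fn fetch(&mut self, origin: &Origin, batch_size: usize) -> Vec<String>;
}

/// In-memory backend used only for illustration; a real one would query the database.
#[derive(Default)]
pub struct InMemoryBackend {
    queues: HashMap<Origin, VecDeque<String>>,
}

impl InMemoryBackend {
    /// Adds a URL to the pending queue of its origin.
    pub fn enqueue(&mut self, origin: Origin, url: &str) {
        self.queues.entry(origin).or_default().push_back(url.to_owned());
    }
}

impl WorkerBackend for InMemoryBackend {
    fn fetch(&mut self, origin: &Origin, batch_size: usize) -> Vec<String> {
        match self.queues.get_mut(origin) {
            Some(queue) => {
                // Take at most `batch_size` URLs from the front of the queue.
                let n = batch_size.min(queue.len());
                queue.drain(..n).collect()
            }
            None => Vec::new(),
        }
    }
}

/// URL-choosing logic living in the worker: take one URL per origin,
/// so that a round never contains two URLs from the same domain.
pub fn next_round<B: WorkerBackend>(backend: &mut B, origins: &[Origin]) -> Vec<String> {
    origins.iter().flat_map(|o| backend.fetch(o, 1)).collect()
}
```

With this shape, the worker decides how to spread requests across domains (here, one URL per origin per round), and the backend only has to answer "give me pending URLs for this origin".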

This is also a step towards making `lopez` distributed. I suspect, though I have not confirmed it, that the `fetch` in `MasterBackend` is a major bottleneck.
