I’m running into a persistent issue when trying to run the scraper.
I'm using this command (taken from https://typesense.org/docs/guide/docsearch.html#run-the-scraper:~:text=host.docker.internal-,Run%20the%20scraper%3A,-docker%20run%20%2Dit)
docker run -it --env-file=.env -e "CONFIG=$(cat config.json | jq -r tostring)" typesense/docsearch-scraper:0.11.0
but the scraper fails with this error:
Traceback (most recent call last):
  File "/home/seleuser/src/config/config_loader.py", line 102, in _load_config
    data = json.loads(config, object_pairs_hook=OrderedDict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/seleuser/src/index.py", line 138, in <module>
    run_config(environ['CONFIG'])
  File "/home/seleuser/src/index.py", line 34, in run_config
    config = ConfigLoader(config)
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/seleuser/src/config/config_loader.py", line 70, in __init__
    data = self._load_config(config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/seleuser/src/config/config_loader.py", line 107, in _load_config
    raise ValueError('CONFIG is not a valid JSON')
ValueError: CONFIG is not a valid JSON
config.json is valid JSON (validated through jq, ConvertFrom-Json).
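Since the file itself validates, it may be worth checking the string that actually ends up in the CONFIG environment variable. Below is a minimal sketch that repeats the json.loads call shown in the traceback and reports where parsing fails. The helper name check_config is hypothetical, and the sample inputs (an empty value, an unexpanded $(...) literal, JSON with its quotes stripped) are only guesses at what a shell might pass, not a confirmed diagnosis.

```python
import json
import os
from collections import OrderedDict


def check_config(value):
    """Parse value the same way the traceback shows and describe any failure.

    check_config is a hypothetical helper, not part of the scraper.
    """
    try:
        json.loads(value, object_pairs_hook=OrderedDict)
    except json.JSONDecodeError as err:
        # err.pos is the character offset where the decoder gave up.
        return f"{err.msg} at char {err.pos}; value starts with {value[:20]!r}"
    return "ok"


if __name__ == "__main__":
    # Inspect whatever the current shell exported as CONFIG (empty if unset).
    print(check_config(os.environ.get("CONFIG", "")))
```

A failure at char 0, as in the error above, means the decoder rejected the very first character, which happens when the value is empty or starts with something other than JSON. JSON whose double quotes were stripped would instead fail at char 1.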
Here is the config.json used:
{
  "index_name": "docs",
  "start_urls": ["http://host.docker.internal:3000/docs/"],
  "sitemaps_urls": ["http://host.docker.internal:3000/sitemap.xml"],
  "sitemap_alternate_links": true,
  "stop_urls": [],
  "selectors": {
    "lvl0": {
      "selector": ".menu__link--sublist.menu__link--active",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "[class^='docItemContainer_'] h1",
    "lvl2": "[class^='docItemContainer_'] h2",
    "lvl3": "[class^='docItemContainer_'] h3",
    "lvl4": "[class^='docItemContainer_'] h4",
    "lvl5": "[class^='docItemContainer_'] h5",
    "text": "[class^='docItemContainer_'] p, [class^='docItemContainer_'] li"
  },
  "selectors_exclude": [
    ".hash-link"
  ]
}
Am I missing something in the configuration that could lead to this error?