
ValueError: CONFIG is not a valid JSON #100

Description

@stewashes

I'm running into a persistent issue when trying to run the scraper.
I'm using this command (taken from https://typesense.org/docs/guide/docsearch.html#run-the-scraper:~:text=host.docker.internal-,Run%20the%20scraper%3A,-docker%20run%20%2Dit):

docker run -it --env-file=.env -e "CONFIG=$(cat config.json | jq -r tostring)" typesense/docsearch-scraper:0.11.0
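For reference, the `$(cat config.json | jq -r tostring)` part is meant to collapse the file into a single-line JSON string before docker receives it as `-e CONFIG=...`. The sketch below is a rough Python approximation of that step, assuming compact separators are close enough to jq's output; `compact_json` is a hypothetical helper, not part of the scraper.

```python
import json


def compact_json(text: str) -> str:
    """Re-serialize JSON text onto one line, roughly like `jq -r tostring`."""
    # Drop a leading UTF-8 byte-order mark, if present (some Windows editors add one);
    # json.loads rejects a str that starts with a BOM.
    data = json.loads(text.lstrip("\ufeff"))
    # separators=(",", ":") removes the spaces json.dumps inserts by default;
    # ensure_ascii=False leaves non-ASCII characters unescaped.
    return json.dumps(data, separators=(",", ":"), ensure_ascii=False)
```

Calling `compact_json(open("config.json", encoding="utf-8").read())` should print the same one-line value the command substitution is expected to produce, which makes it easy to compare against what the container actually receives.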

but the scraper fails with this error:

Traceback (most recent call last):
  File "/home/seleuser/src/config/config_loader.py", line 102, in _load_config
    data = json.loads(config, object_pairs_hook=OrderedDict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/seleuser/src/index.py", line 138, in <module>
    run_config(environ['CONFIG'])
  File "/home/seleuser/src/index.py", line 34, in run_config
    config = ConfigLoader(config)
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/seleuser/src/config/config_loader.py", line 70, in __init__
    data = self._load_config(config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/seleuser/src/config/config_loader.py", line 107, in _load_config
    raise ValueError('CONFIG is not a valid JSON')
ValueError: CONFIG is not a valid JSON
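The traceback shows that `config_loader.py` passes `CONFIG` to `json.loads(config, object_pairs_hook=OrderedDict)` and turns any failure into `ValueError('CONFIG is not a valid JSON')`. The sketch below approximates that check (it is not the scraper's actual source) and adds a hypothetical `decode_error` helper that returns the underlying parser message. "Expecting value: line 1 column 1 (char 0)" means parsing failed at position 0: an empty string, or a literal `$(cat config.json | jq -r tostring)` that was never expanded by the shell, both produce exactly that message, while JSON whose double quotes were stripped fails with a different one.

```python
import json
from collections import OrderedDict


def load_config(config: str) -> OrderedDict:
    """Approximation of the check visible in the traceback (config_loader.py)."""
    try:
        return json.loads(config, object_pairs_hook=OrderedDict)
    except ValueError as exc:
        # JSONDecodeError subclasses ValueError; re-raise with the scraper's message.
        raise ValueError("CONFIG is not a valid JSON") from exc


def decode_error(config: str) -> str:
    """Return the JSONDecodeError message for a CONFIG value, or '' if it parses."""
    try:
        json.loads(config)
    except json.JSONDecodeError as exc:
        return str(exc)
    return ""
```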

config.json is valid JSON: I validated it with both jq and PowerShell's ConvertFrom-Json.
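A valid file does not guarantee a valid `CONFIG` value, because the shell performs the `$(...)` expansion and quoting before docker sees anything. Since ConvertFrom-Json points to a PowerShell environment, where `$(...)` and quote handling differ from a POSIX shell, it may help to inspect the value itself. The sketch below classifies a `CONFIG` string; `describe_config` and its categories are hypothetical diagnostics, not something the scraper provides.

```python
import json
import os
from typing import Optional


def describe_config(value: Optional[str]) -> str:
    """Summarize a CONFIG value to separate shell/quoting problems from JSON problems."""
    if value is None:
        return "CONFIG is not set"
    if not value.strip():
        return "CONFIG is empty or whitespace only"
    if value.lstrip().startswith("$("):
        # The command substitution reached the program as literal text.
        return "CONFIG looks like an unexpanded command substitution"
    try:
        json.loads(value)
    except json.JSONDecodeError as exc:
        return f"CONFIG is not valid JSON: {exc} (starts with {value[:20]!r})"
    return "CONFIG parses as JSON"


if __name__ == "__main__":
    # Reads CONFIG from the current environment; set it the same way the docker command does.
    print(describe_config(os.environ.get("CONFIG")))
```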

Here is the config.json I'm using:

{
  "index_name": "docs",
  "start_urls": ["http://host.docker.internal:3000/docs/"],
  "sitemaps_urls": ["http://host.docker.internal:3000/sitemap.xml"],
  "sitemap_alternate_links": true,
  "stop_urls": [],
  "selectors": {
    "lvl0": {
      "selector": ".menu__link--sublist.menu__link--active",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "[class^='docItemContainer_'] h1",
    "lvl2": "[class^='docItemContainer_'] h2",
    "lvl3": "[class^='docItemContainer_'] h3",
    "lvl4": "[class^='docItemContainer_'] h4",
    "lvl5": "[class^='docItemContainer_'] h5",
    "text": "[class^='docItemContainer_'] p, [class^='docItemContainer_'] li"
  },
  "selectors_exclude": [
    ".hash-link"
  ]
}
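One way to take the shell out of the picture is to build the `docker run` arguments in a program instead of through command substitution. The sketch below reuses the flags and image tag from the command above; `build_docker_args` and `run_scraper` are hypothetical helpers. Passing a list to `subprocess.run` avoids shell parsing entirely on POSIX systems; on Windows, Python converts the list to a command line with its own quoting rules, so treat that case as untested.

```python
import json
import subprocess
from typing import List

IMAGE = "typesense/docsearch-scraper:0.11.0"


def build_docker_args(config_text: str, env_file: str = ".env") -> List[str]:
    """Build a `docker run` argument list that passes the config as CONFIG."""
    # Parse and re-serialize first, so an invalid file fails here, not in the container.
    compact = json.dumps(json.loads(config_text), separators=(",", ":"))
    # Each list element is one argument; no shell strips the quotes inside the JSON.
    return ["docker", "run", "-it", f"--env-file={env_file}", "-e", f"CONFIG={compact}", IMAGE]


def run_scraper(config_path: str) -> int:
    # utf-8-sig also strips a byte-order mark if the file has one.
    with open(config_path, encoding="utf-8-sig") as f:
        args = build_docker_args(f.read())
    return subprocess.run(args).returncode
```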

Am I missing something in the configuration that could lead to this error?
