This repository was archived by the owner on Jan 15, 2026. It is now read-only.

Error unzipping large file downloaded with chunk in Windows #44

@mfagundes


I'm using Windows 10 with PowerShell (the base conda environment is activated automatically).

I tried to download the largest file (Estabelecimentos0.zip) and got the following error:

(base) PS C:\Users\mauricio\chunk_teste> ..\chunk-v1.0.0-windows-amd64.exe https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip --force-restart
Downloading 622.4MB of 878.1MB  70.88%  1.4MB/s2022/12/26 18:51:31 error downloadinf chunk #90073: error downloading https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip: All attempts fail:
#1: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
#2: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
#3: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
#4: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
#5: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
(base) PS C:\Users\mauricio\chunk_teste>

I then tried to resume the download, and the following error was reported:

(base) PS C:\Users\mauricio\chunk_teste> ..\chunk-v1.0.0-windows-amd64.exe https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip
2022/12/26 18:52:46 could not creat a progress file: error loading existing progress file: error decoding progress file C:\Users\mauricio\.chunk\c811d2999ff5d6a15340c98b44fd8126-Estabelecimentos0.zip: unexpected EOF
(base) PS C:\Users\mauricio\chunk_teste>

With the --force-restart flag the download worked, but it started again from the beginning of the file. Once again, after more than 500 MB had been downloaded, the same timeout error occurred. I can't restart the download without the --force-restart flag.

The zip file is nevertheless written to disk. When I try to unzip it with 7-Zip, it reports a data error but still extracts the content (a CSV file). That file cannot be loaded in pandas or even in spreadsheet software. In a text editor (Notepad++), the first lines (about 4,000,000) show coherent data, but everything after that is clearly garbled.

With a smaller file (Empresas1.zip), everything worked correctly: the file was downloaded, unzipped, and opened in pandas (4,494,859 lines).
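
To check whether the archive itself is damaged, independently of 7-Zip or pandas, something like the following could be used. It is only a minimal sketch built on Python's standard `zipfile` module; the helper name `check_zip` is made up for illustration and is not part of chunk.

```python
import zipfile
import zlib
from pathlib import Path


def check_zip(path):
    """Return a short diagnosis string for the archive at `path`.

    Hypothetical helper for illustration; not part of chunk.
    """
    path = Path(path)
    # is_zipfile() looks for the end-of-central-directory record and
    # returns False if the file is missing or not a zip archive.
    if not zipfile.is_zipfile(path):
        return "not a zip file (central directory missing or unreadable)"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and verifies its CRC-32.
            # It returns the name of the first bad member, or None.
            bad_member = archive.testzip()
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError) as exc:
        # A damaged compressed stream can fail before the CRC is checked.
        return f"unreadable archive: {exc}"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return "ok"
```

Calling `check_zip("Estabelecimentos0.zip")` in the download folder should return `"ok"` only if every member passes its CRC check. Any other result would suggest the file on disk is incomplete or damaged rather than a problem in the tools used to open it.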

Labels: bug, windows
