Hi, this is the same exact issue as here
minimal reproducible example
import os
from pathlib import Path
import time
import fastexcel
from openpyxl import Workbook
os.environ["POLARS_VERBOSE"] = "1"
fp = Path("repro_xfd.xlsx")
wb = Workbook()
ws_explode = wb.active
ws_explode.title = "explodes"
ws_no_explode = wb.create_sheet(title="no_explode")
# Small real dataset in A:C
ws_explode["A1"] = 1
ws_explode["B1"] = 2
ws_explode["C1"] = 3
ws_no_explode["A1"] = 1
ws_no_explode["B1"] = 2
ws_no_explode["C1"] = 3
for row in range(2, 100_000):
    ws_explode.cell(row=row, column=1, value=row)
    ws_explode.cell(row=row, column=2, value=row * 10)
    ws_explode.cell(row=row, column=3, value=row * 100)
    ws_no_explode.cell(row=row, column=1, value=row)
    ws_no_explode.cell(row=row, column=2, value=row * 10)
    ws_no_explode.cell(row=row, column=3, value=row * 100)
# Stray value in the last Excel column
ws_explode["XFD1"] = "stray"
wb.save(fp)
print("saved workbook")
reader = fastexcel.read_excel(fp)
sheet1 = reader.load_sheet_by_name(
    "no_explode",
    header_row=None,
    use_columns=[0, 1, 2],
    column_names=["A", "B", "C"],
    dtypes="string",
)
df1 = sheet1.to_polars()
print("read no_explode")
print("df1 shape:", df1.shape)
del df1
del sheet1
print("sleeping for 10 seconds")
time.sleep(10)
sheet2 = reader.load_sheet_by_name(
    "explodes",
    header_row=None,
    use_columns=[0, 1, 2],
    column_names=["A", "B", "C"],
    dtypes="string",
)
df2 = sheet2.to_polars()
print("read explodes")
print("df2 shape:", df2.shape)
log output
saved workbook
async thread count: 8
blocking thread count: 512
read no_explode
df1 shape: (99999, 3)
sleeping for 10 seconds
memory allocation of 52428275712 bytes failed
stack backtrace:
0: 0x7ffa929fabe0 - PyInit__fastexcel
1: 0x7ffa92a0b941 - PyInit__fastexcel
2: 0x7ffa929fe204 - PyInit__fastexcel
3: 0x7ffa929f73a9 - PyInit__fastexcel
4: 0x7ffa929f4acc - PyInit__fastexcel
5: 0x7ffa929f441f - PyInit__fastexcel
6: 0x7ffa929f743b - PyInit__fastexcel
7: 0x7ffa929f52e8 - PyInit__fastexcel
8: 0x7ffa92a23259 - PyInit__fastexcel
9: 0x7ffa92a23273 - PyInit__fastexcel
10: 0x7ffa9230cdd6 - PyInit__fastexcel
11: 0x7ffa92321905 - PyInit__fastexcel
12: 0x7ffa922843b8 - <unknown>
13: 0x7ffa922736eb - <unknown>
14: 0x7ffa922da2e2 - PyInit__fastexcel
15: 0x7ffa922d0a95 - PyInit__fastexcel
16: 0x7ffa92338b29 - PyInit__fastexcel
17: 0x7ffa9233b729 - PyInit__fastexcel
18: 0x7ffa922df4c2 - PyInit__fastexcel
19: 0x7ffa922dfb64 - PyInit__fastexcel
20: 0x7ffab6a81de1 - PyObject_GC_Track
21: 0x7ffab69e641c - PyObject_Vectorcall
22: 0x7ffab69e6379 - PyObject_Vectorcall
23: 0x7ffab6a055c6 - PyEval_EvalFrameDefault
24: 0x7ffab6a7bca8 - PyEval_EvalCode
25: 0x7ffab6a7bb5e - PyEval_EvalCode
26: 0x7ffab6a7ba69 - PyAST_Compile
27: 0x7ffab6a7b8ec - PyAST_Compile
28: 0x7ffab6acfd5f - PyUnicode_EqualToUTF8
29: 0x7ffab6acff48 - PyUnicode_EqualToUTF8
30: 0x7ffab6acf395 - PyEval_MakePendingCalls
31: 0x7ffab6acf232 - Py_fopen_obj
32: 0x7ffab6acf417 - PyEval_MakePendingCalls
33: 0x7ffab6a63f6b - PyInterpreterState_SetRunningMain
34: 0x7ffab6a629fc - Py_RunMain
35: 0x7ffab6a629a3 - Py_Main
36: 0x7ff6f26d1230 - <unknown>
37: 0x7ffb1eba7ac4 - BaseThreadInitThunk
38: 0x7ffb209ba8c1 - RtlUserThreadStart
Issue description
While reading data from Excel workbooks, I noticed that my Jupyter notebook's kernel would crash, but only on certain sheets of certain workbooks. On further inspection, the workbooks it crashed on had empty columns after, say, column Z, and then a single value in column XFD (the right-most column). It still crashes even when I explicitly select the columns to read.
I guess it's reading all of the null columns between the actual data and the stray value in XFD before it selects the requested columns.
Notice in the snippet above that the two sheets are identical except for one stray value. One reads fine, but the other tries to allocate roughly 52.4 GB (52,428,275,712 bytes, per the log output).
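The size of the failed allocation seems consistent with that guess. Here is a quick check, assuming the reader materialises the whole rectangle A1:XFD99999 before dropping columns. The 32 bytes per cell is only what the arithmetic implies; I have not confirmed it in the fastexcel source, and `column_index` is just a helper written for this check.

```python
def column_index(letters: str) -> int:
    """Convert an Excel column label such as 'XFD' to its 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


rows = 99_999                        # rows written by the repro (1 through 99,999)
columns = column_index("XFD")        # last column Excel allows
failed_allocation = 52_428_275_712   # bytes, copied from the log output

cells = rows * columns
bytes_per_cell = failed_allocation / cells

print(f"columns up to XFD: {columns:,}")        # columns up to XFD: 16,384
print(f"cells in A1:XFD99999: {cells:,}")       # cells in A1:XFD99999: 1,638,383,616
print(f"implied bytes per cell: {bytes_per_cell}")  # implied bytes per cell: 32.0
```

The division comes out to exactly 32 bytes per cell, which suggests the allocation scales with rows times 16,384 columns rather than with the three columns requested through `use_columns`.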
Expected behavior
It should read the sheet with the stray value without crashing, particularly since use_columns already limits the read to the first three columns.
details
this may be of use to you
polars.show_versions() returns the following.
--------Version info---------
Polars: 1.40.1
Index type: UInt32
Platform: Windows-2019Server-10.0.17763-SP0
Python: 3.13.13 (tags/v3.13.13:01104ce, Apr 7 2026, 19:25:48) [MSC v.1944 64 bit (AMD64)]
Runtime: rt32
----Optional dependencies----
Azure CLI 'az' is not recognized as an internal or external command,
operable program or batch file.
<not installed>
adbc_driver_manager <not installed>
altair <not installed>
azure.identity <not installed>
boto3 1.43.5
cloudpickle <not installed>
connectorx <not installed>
deltalake <not installed>
fastexcel 0.20.2
fsspec <not installed>
gevent <not installed>
google.auth <not installed>
great_tables <not installed>
matplotlib 3.10.8
numpy 2.4.2
openpyxl 3.1.5
pandas 3.0.0
polars_cloud <not installed>
pyarrow 23.0.0
pydantic <not installed>
pyiceberg <not installed>
sqlalchemy 2.0.49
torch <not installed>
xlsx2csv <not installed>
xlsxwriter 3.2.9