Problem writing string columns to avro file #17

@kobusherbst

When I try to convert an Arrow file to Avro using this code:

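using Arrow, Avro, DataFrames, Tables

function t_arrowtoavro(path, file)
    # Load the Arrow file into a DataFrame
    a = Arrow.Table(joinpath(path, "$(file).arrow")) |> DataFrame
    println(Tables.schema(a))
    # Write every column, including the string column, to a zstd-compressed Avro file
    Avro.writetable(joinpath(path, "$(file).avro"), a; compress=:zstd)
end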

I get the following error whenever the input file contains a string column, for example:

Tables.Schema:
 :IndividualId  Int32
 :ResultDate    Union{Missing, Date}
 :HIVResult     Union{Missing, String}
ERROR: ArgumentError: internal writing error: buffer too small, len = 1563130

This version, which drops the string column before writing, does not produce the error:

function t_arrowtoavro(path, file)
    a = Arrow.Table(joinpath(path, "$(file).arrow")) |> DataFrame
    println(Tables.schema(a))
    # Keep only the non-string columns (drops :HIVResult)
    b = select(a, :IndividualId, :ResultDate)
    println(Tables.schema(b))
    Avro.writetable(joinpath(path, "$(file).avro"), b; compress=:zstd)
end

Output:

Tables.Schema:
 :IndividualId  Int32
 :ResultDate    Union{Missing, Date}
 :HIVResult     Union{Missing, String}
Tables.Schema:
 :IndividualId  Int32
 :ResultDate    Union{Missing, Date}
"D:\\Data\\Demography\\AHRI\\Staging\\HIVResults.avro"

How do I resolve this problem if I need a string column in my output file?
