🔥 Blazingly fast DataFrames for Ruby, powered by Polars
Add this line to your application’s Gemfile:
gem "polars-df"
This library follows the Polars Python API.
Polars.scan_csv("iris.csv")
  .filter(Polars.col("sepal_length") > 5)
  .group_by("species")
  .agg(Polars.all.sum)
  .collect
You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
From a CSV
Polars.read_csv("file.csv")
# or lazily with
Polars.scan_csv("file.csv")
From Parquet
Polars.read_parquet("file.parquet")
# or lazily with
Polars.scan_parquet("file.parquet")
From Active Record
Polars.read_database(User.all)
# or
Polars.read_database("SELECT * FROM users")
From JSON
Polars.read_json("file.json")
# or
Polars.read_ndjson("file.ndjson")
# or lazily with
Polars.scan_ndjson("file.ndjson")
From Feather / Arrow IPC
Polars.read_ipc("file.arrow")
# or lazily with
Polars.scan_ipc("file.arrow")
From Avro
Polars.read_avro("file.avro")
From Iceberg (experimental, requires iceberg)
Polars.scan_iceberg(table)
From Delta Lake (experimental, requires deltalake-rb)
Polars.read_delta("./table")
# or lazily with
Polars.scan_delta("./table")
From a hash
Polars::DataFrame.new({
  a: [1, 2, 3],
  b: ["one", "two", "three"]
})
From an array of hashes
Polars::DataFrame.new([
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
])
From an array of series
Polars::DataFrame.new([
  Polars::Series.new("a", [1, 2, 3]),
  Polars::Series.new("b", ["one", "two", "three"])
])
Get number of rows
df.height
Get column names
df.columns
Check if a column exists
df.include?(name)
Select a column
df["a"]
Select multiple columns
df[["a", "b"]]
Select first rows
df.head
Select last rows
df.tail
Filter on a condition
df.filter(Polars.col("a") == 2)
df.filter(Polars.col("a") != 2)
df.filter(Polars.col("a") > 2)
df.filter(Polars.col("a") >= 2)
df.filter(Polars.col("a") < 2)
df.filter(Polars.col("a") <= 2)
And, or, and exclusive or
df.filter((Polars.col("a") > 1) & (Polars.col("b") == "two")) # and
df.filter((Polars.col("a") > 1) | (Polars.col("b") == "two")) # or
df.filter((Polars.col("a") > 1) ^ (Polars.col("b") == "two")) # xor
Basic operations
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs
Rounding
df["a"].round(2)
df["a"].ceil
df["a"].floor
Logarithm
df["a"].log # natural log
df["a"].log(10) # base-10 log
Exponentiation
df["a"].exp
Trigonometric functions
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].arcsin
df["a"].arccos
df["a"].arctan
Hyperbolic functions
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].arcsinh
df["a"].arccosh
df["a"].arctanh
Summary statistics
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var
Group
df.group_by("a").count
Works with all summary statistics
df.group_by("a").max
Multiple groups
df.group_by(["a", "b"]).count
Add rows
df.vstack(other_df)
Add columns
df.hstack(other_df)
Inner join
df.join(other_df, on: "a")
Left join
df.join(other_df, on: "a", how: "left")
One-hot encoding
df.to_dummies
Array of hashes
df.to_a
Hash of series
df.to_h
CSV
df.to_csv
# or
df.write_csv("file.csv")
Parquet
df.write_parquet("file.parquet")
JSON
df.write_json("file.json")
# or
df.write_ndjson("file.ndjson")
Feather / Arrow IPC
df.write_ipc("file.arrow")
Avro
df.write_avro("file.avro")
Iceberg (experimental)
df.write_iceberg(table, mode: "append")
Delta Lake (experimental)
df.write_delta("./table")
Numo array
df.to_numo
You can specify column types when creating a data frame:
Polars::DataFrame.new(data, schema: {"a" => Polars::Int32, "b" => Polars::Float32})
Supported types are:
- boolean - Boolean
- decimal - Decimal
- float - Float32, Float64
- integer - Int8, Int16, Int32, Int64, Int128
- unsigned integer - UInt8, UInt16, UInt32, UInt64, UInt128
- string - String, Categorical, Enum
- temporal - Date, Datetime, Duration, Time
- nested - Array, List, Struct
- other - Binary, Object, Null, Unknown
Get column types
df.schema
For a specific column
df["a"].dtype
Cast a column
df["a"].cast(Polars::Int32)
Add Vega to your application’s Gemfile:
gem "vega"
And use:
df.plot("a", "b", type: "line")
Supports line, pie, column, bar, area, and scatter plots
Group data
df.plot("a", "b", group: "c", type: "line")
Stacked columns or bars
df.plot("a", "b", group: "c", type: "column", stacked: true)
Plot a series [unreleased]
df["a"].plot.hist
Supports hist, kde, and line plots
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/ruby-polars.git
cd ruby-polars
bundle install
bundle exec rake compile
bundle exec rake test
bundle exec rake test:docs