Commit 746f18c

committed
updates to series section
1 parent b67e56b commit 746f18c

File tree

1 file changed

+16
-8
lines changed

lectures/polars.md

Lines changed: 16 additions & 8 deletions
@@ -65,6 +65,10 @@ as [statsmodels](https://www.statsmodels.org/) and [scikit-learn](https://scikit
6565

6666
This lecture will provide a basic introduction to polars.
6767

68+
```{tip}
69+
**Why use Polars over pandas?** The main reason is performance. As a general rule, pandas needs 5 to 10 times as much RAM as the size of the dataset to carry out operations, compared with 2 to 4 times for Polars. In addition, Polars is between 10 and 100 times as fast as pandas for common operations. A detailed comparison of Polars and pandas can be found [in this JetBrains blog post](https://blog.jetbrains.com/pycharm/2024/07/polars-vs-pandas/).
70+
```
71+
6872
Throughout the lecture, we will assume that the following imports have taken
6973
place
7074

@@ -89,7 +93,6 @@ A `DataFrame` is a two-dimensional object for storing related columns of data.
8993

9094
Let's start with Series.
9195

92-
9396
We begin by creating a series of four random observations
9497

9598
```{code-cell} ipython3
@@ -98,11 +101,11 @@ s
98101
```
99102

100103
```{note}
101-
You may notice the above series has no indexes, unlike in [](pandas:series).
104+
You may notice the above series has no indices, unlike in [pd.Series](pandas:series).
102105
103106
This is because Polars is column-centric, and data access is predominantly managed through filtering and boolean masks.
104107
105-
Here is [an interesting blog post discussing this in more detail](https://medium.com/data-science/understand-polars-lack-of-indexes-526ea75e413)
108+
Here is [an interesting blog post discussing this in more detail](https://medium.com/data-science/understand-polars-lack-of-indexes-526ea75e413).
106109
```
107110

108111
Polars `Series` are built on top of Apache Arrow arrays and support many similar
@@ -134,7 +137,10 @@ s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
134137
s
135138
```
136139

137-
However, in Polars you will need to use the `DataFrame` object to do the same task.
140+
However, in Polars you will need to use the `DataFrame` object to do the same task.
141+
142+
This means you will use the `DataFrame` object more often when working with Polars if you
143+
are interested in relationships between data.
138144

139145
Essentially any column in a Polars `DataFrame` can be used as an index through the `filter` method.
140146

@@ -146,15 +152,16 @@ df_temp = pl.DataFrame({
146152
df_temp
147153
```
148154

149-
To access specific values by company name, we can filter the DataFrame
155+
To access specific values by company name, we can filter the DataFrame on
156+
the `AMZN` ticker and select the `daily returns` column.
150157

151158
```{code-cell} ipython3
152-
# Get AMZN's return
153159
df_temp.filter(pl.col('company') == 'AMZN').select('daily returns').item()
154160
```
155161

162+
To update the `AMZN` return to 0, we can use the following chain of methods.
163+
156164
```{code-cell} ipython3
157-
# Update AMZN's return to 0
158165
df_temp = df_temp.with_columns(
159166
pl.when(pl.col('company') == 'AMZN')
160167
.then(0)
@@ -164,8 +171,9 @@ df_temp = df_temp.with_columns(
164171
df_temp
165172
```
166173

174+
We can also check whether `AAPL` appears in the `company` column.
175+
167176
```{code-cell} ipython3
168-
# Check if AAPL is in the companies
169177
'AAPL' in df_temp.get_column('company')
170178
```
171179
