
Commit 8b8a676

Add files via upload
1 parent 1a5269c commit 8b8a676


2 files changed: +33 additions, -0 deletions

code/Descriptive_Analysis.py

Lines changed: 9 additions & 0 deletions
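# Descriptive statistics for the employee attrition dataset.
# Originally recorded as an IDLE session (Python 3.12.0 on darwin).
from pyspark.sql import SparkSession

# Load df here so the script runs on its own (the session relied on an existing df).
spark = SparkSession.builder.appName("Employee Attrition Analysis").getOrCreate()
df = spark.read.csv("/home/sat3812/employee_attrition/WA_Fn-UseC_-HR-Employee-Attrition.csv", header=True, inferSchema=True)

# Preview the first five rows
df.show(5)

# Average monthly income across all employees
df.agg({"MonthlyIncome": "avg"}).show()

# Longest and shortest tenure at the company, in years
df.agg({"YearsAtCompany": "max"}).show()
df.agg({"YearsAtCompany": "min"}).show()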
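Each `.show()` call above runs its own Spark job. In PySpark the three figures can be computed in one pass with `from pyspark.sql import functions as F` and `df.agg(F.avg("MonthlyIncome"), F.max("YearsAtCompany"), F.min("YearsAtCompany")).show()`. As a cluster-free cross-check on a small CSV sample, a minimal plain-Python sketch of the same aggregates is below; `summarize_attrition_csv` is a hypothetical helper, and it assumes a header row with integer-valued `MonthlyIncome` and `YearsAtCompany` columns.

```python
import csv
import io
from statistics import mean
from typing import TextIO


def summarize_attrition_csv(handle: TextIO) -> dict:
    """Compute mean MonthlyIncome plus max/min YearsAtCompany from a CSV.

    Hypothetical helper (not part of the original commit). Assumes a header row
    and integer-valued columns; rows with an empty value in either column are
    skipped.
    """
    incomes: list[int] = []
    tenures: list[int] = []
    for row in csv.DictReader(handle):
        income = row.get("MonthlyIncome")
        tenure = row.get("YearsAtCompany")
        if not income or not tenure:
            continue  # skip rows missing either value
        incomes.append(int(income))
        tenures.append(int(tenure))
    if not incomes:
        raise ValueError("no rows with both MonthlyIncome and YearsAtCompany")
    # Keys follow the column names Spark gives these aggregates
    return {
        "avg(MonthlyIncome)": mean(incomes),
        "max(YearsAtCompany)": max(tenures),
        "min(YearsAtCompany)": min(tenures),
    }


if __name__ == "__main__":
    # Made-up two-row sample for illustration only (not real dataset values)
    sample = io.StringIO("MonthlyIncome,YearsAtCompany\n5000,10\n3000,2\n")
    print(summarize_attrition_csv(sample))
```

Running it against a local copy of the CSV (opened with `open(path, newline="")`) should agree with the Spark output, which makes it a quick way to confirm that `inferSchema` read the columns as numbers.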
Lines changed: 24 additions & 0 deletions
# Load, clean, and visualise the employee attrition dataset.
# Originally recorded as an IDLE session (Python 3.12.0 on darwin).
import zipfile

import matplotlib.pyplot as plt
from pyspark.sql import SparkSession

# `!unzip` is Jupyter/IPython shell syntax and is not valid in a plain .py file,
# so extract the archive with the standard-library zipfile module instead.
with zipfile.ZipFile("/home/sat3812/Downloads/archive.zip") as archive:
    archive.extractall("/home/sat3812/Downloads/")

spark = SparkSession.builder.appName("Employee Attrition Analysis").getOrCreate()

# Note: the archive is extracted to /home/sat3812/Downloads/, but the CSV is read
# from /home/sat3812/employee_attrition/; adjust one path so the two agree.
df = spark.read.csv("/home/sat3812/employee_attrition/WA_Fn-UseC_-HR-Employee-Attrition.csv", header=True, inferSchema=True)

df.printSchema()

# Drop rows that contain any null value
df = df.dropna()

# Summary statistics (count, mean, stddev, min, max) for each column
df.describe().show()

# Collect only the Attrition column to the driver for plotting
df_pandas = df.select("Attrition").toPandas()

# Bar chart of the number of employees per Attrition value
attrition_counts = df_pandas["Attrition"].value_counts()
attrition_counts.plot(kind="bar")
plt.xlabel("Attrition")
plt.ylabel("Number of Employees")
plt.title("Employee Attrition Count")
plt.show()
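The bar chart shows raw counts, but the attrition rate is often the more useful single number. Within Spark, `df.groupBy("Attrition").count().show()` gives the same counts without collecting the column to the driver via `toPandas()`. A minimal plain-Python sketch of the counting and the rate follows; `attrition_summary` is a hypothetical helper, and its default `"Yes"` label is an assumption about how the Attrition column is encoded.

```python
from collections import Counter
from typing import Iterable


def attrition_summary(values: Iterable[str], positive: str = "Yes") -> tuple[Counter, float]:
    """Count employees per Attrition label and return the share labelled `positive`.

    Hypothetical helper mirroring value_counts(); the "Yes" default is an
    assumption about the data, not something confirmed by the original code.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no Attrition values supplied")
    return counts, counts[positive] / total


if __name__ == "__main__":
    # Made-up labels for illustration; with the real data, pass df_pandas["Attrition"]
    counts, rate = attrition_summary(["Yes", "No", "No", "No"])
    print(counts.most_common(), f"attrition rate: {rate:.1%}")
```

Because a pandas Series iterates over its values, `attrition_summary(df_pandas["Attrition"])` should work directly on the column collected above.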
