LLMs for Data Science

Repository for the "LLMs for Data Science" course by Bruno Gonçalves (Data For Science, Inc).

Large Language Models (LLMs) are powerful tools that put state-of-the-art AI capabilities at our fingertips. They can process large amounts of data, understand nuance and context, and perform complex tasks on request. Over the past few years, LLMs have multiplied, as have the tools built specifically to leverage their capabilities.

In this course, you will learn how to use large language models to perform data science tasks such as summarization, translation, named entity recognition, audio generation, and data processing. We’ll explore the possibilities afforded by the tools and APIs developed by OpenAI, Hugging Face, LangChain, and Pandas AI and how best to apply them to our data science work.

Environment Setup

This project manages dependencies using uv (recommended) or standard pip.

Prerequisites

Python 3.13 or higher (as specified in pyproject.toml)
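To confirm that the active interpreter meets this requirement before installing anything, a small check along the following lines may help. This is only a sketch: the helper name `python_is_supported` and the constant `MIN_PYTHON` are illustrative and not part of the repository.

```python
import sys

# Minimum version taken from the prerequisite above (hypothetical constant name).
MIN_PYTHON = (3, 13)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`.

    Only the major and minor components are compared.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the requirement.")
    else:
        print(f"Python {current} is older than 3.13; please upgrade.")
```

Run it with the same interpreter you plan to use for the notebooks, since `uv` and `pip` may each pick up a different Python installation.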

Option 1: Using `uv` (Recommended)

This repository includes a uv.lock file for reproducible environments.

Install uv: Follow instructions at docs.astral.sh/uv.
Sync dependencies:
```
uv sync
```
Run Jupyter:
```
uv run jupyter notebook
```

Option 2: Using `pip`

You can install the dependencies directly from the pyproject.toml file.

pip install .

Data

The data/ directory contains datasets and media files used in the notebooks, including:

Datasets: Apple-Twitter-Sentiment-DFE.csv, Northwind_small.sqlite
Audio: gettysburg10.wav, pratchett.mp3
Scripts: EpiModel.py
Images: Logo and other assets.
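A quick way to see what the two datasets contain is to list the tables in the SQLite file and the column names in the CSV file. The sketch below uses only the standard library. The helper names are illustrative, the code assumes it is run from the repository root, and the `utf-8` default is an assumption: if the CSV fails to decode, pass a different `encoding`.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path

# Assumes the working directory is the repository root.
DATA_DIR = Path("data")


def list_sqlite_tables(db_path):
    """Return the table names in a SQLite database file, sorted by name."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    # closing() ensures the connection is actually closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(query).fetchall()
    return [name for (name,) in rows]


def csv_column_names(csv_path, encoding="utf-8"):
    """Return the header row of a CSV file (empty list if the file is empty)."""
    with open(csv_path, newline="", encoding=encoding) as handle:
        return next(csv.reader(handle), [])


if __name__ == "__main__":
    db_file = DATA_DIR / "Northwind_small.sqlite"
    csv_file = DATA_DIR / "Apple-Twitter-Sentiment-DFE.csv"
    # Guard on existence: sqlite3.connect would otherwise create an empty file.
    if db_file.exists():
        print("Tables:", list_sqlite_tables(db_file))
    if csv_file.exists():
        print("Columns:", csv_column_names(csv_file))
```

Both helpers only read metadata, so they are safe to run before opening the notebooks.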

Author

Bruno Gonçalves

Data For Science, Inc.

Web: www.data4sci.com
Twitter/X: @bgoncalves
LinkedIn: @bmtgoncalves
Email: [email protected]
Schedule a Call: https://data4sci.com/call

Contents

1. Generative AI for Data Science

2. Prompt Engineering

3. NLP with HuggingFace

4. Text to Speech with OpenAI
