# clean-chatgpt-html.py
A Python script that cleans saved ChatGPT HTML pages by removing unnecessary elements like external references, scripts, and extraneous HTML. It outputs a stand-alone HTML file that does not rely on any external resources (images, CSS, or JavaScript). The output file can be easily viewed offline.
## Table of Contents
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [Features](#features)
- [Dependencies](#dependencies)
- [License](#license)
- [Contact](#contact)
## Overview
The `clean-chatgpt-html.py` script is designed to clean up saved HTML files of ChatGPT outputs by removing unwanted scripts, external references, and tags. The resulting HTML file is simplified, self-contained, and can be viewed offline without dependencies. This is particularly useful for saving and sharing ChatGPT interactions in a clean format, making them portable and easy to read.
## Installation
### Prerequisites
To run this script, you will need the following Python packages:
- `BeautifulSoup` (from `bs4`): A library for parsing HTML content.
- `lxml`: A fast HTML parser that BeautifulSoup uses as its backend (BeautifulSoup prefers it when it is installed).
### Step 1: Install Dependencies
You can install the required dependencies using `pip`:
```bash
pip install beautifulsoup4 lxml
```
### Step 2: Clone or Download the Script
Clone the repository containing this script or download the Python file directly.
```bash
git clone https://github.com/yourusername/clean-chatgpt-html.git
```
Make sure you are running this script in an environment where the dependencies are installed (either a virtual environment or globally).
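To confirm that the active environment can actually see both dependencies, a quick check along these lines may help. This is a minimal sketch and not part of the script itself; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names, not pip names: the beautifulsoup4 package is imported as "bs4".
REQUIRED_MODULES = ["bs4", "lxml"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If anything is reported as missing, re-run the `pip install` command inside the same environment you use to run the script.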
## Usage
To use the script, follow these steps:
1. Run the Python script:
   ```bash
   python clean-chatgpt-html.py
   ```
2. The script will prompt you to enter the full path of the HTML file you want to process.
   ```
   Enter the full path of the HTML file to be processed: /path/to/chatgpt_output.html
   ```
3. The script will clean up the HTML file, remove unnecessary content, and create a new file with the suffix `-clean` added to the original file name. Example:
   - Input file: `chatgpt_output.html`
   - Output file: `chatgpt_output-clean.html`
4. The cleaned file will be saved in the same directory as the original file.
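The naming rule in steps 3 and 4 can be sketched with `pathlib`. This is an illustration of the rule as described here, not the script's actual code; the function name `clean_output_path` is hypothetical.

```python
from pathlib import Path


def clean_output_path(input_path):
    """Return a path in the same directory with '-clean' added before the extension."""
    source = Path(input_path)
    # with_name() swaps only the file name, so the parent directory is preserved.
    return source.with_name(f"{source.stem}-clean{source.suffix}")


if __name__ == "__main__":
    print(clean_output_path("/home/user/chatgpt_output.html"))
```

Because only the file name changes, the cleaned copy always lands next to the original file.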
Example session:
```
Enter the full path of the HTML file to be processed: /home/user/chatgpt_output.html
File is opened.
Processed HTML has been saved to /home/user/chatgpt_output-clean.html
```
## Features
- Removes Unnecessary Tags: Automatically removes `<script>`, `<iframe>`, and unwanted `<div>` or `<button>` elements.
- Self-contained HTML: Cleans the page so it doesn't require any external images, CSS, or JavaScript files.
- Title Injection: Adds a custom title to the `<head>` section based on the file name.
- Invert Color Button: Adds a button to toggle between light and dark modes for better readability.
- Stand-alone Output: The output HTML file is fully self-contained and can be opened offline without relying on external resources.
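To give a sense of how tag removal and title injection can be done with BeautifulSoup, here is a minimal sketch. It is not the script's actual implementation: the function name `clean_html` and the tag list are assumptions, and the real script may remove a different set of elements. It does not cover `<div>` filtering, external images or CSS, or the invert color button.

```python
from bs4 import BeautifulSoup

# Hypothetical tag list for illustration; the real script may differ.
TAGS_TO_REMOVE = ["script", "iframe", "button"]


def clean_html(html, title):
    """Strip the listed tags and set the page title (illustrative only)."""
    # "html.parser" ships with Python; pass "lxml" instead if it is installed.
    soup = BeautifulSoup(html, "html.parser")

    # find_all() accepts a list of tag names; decompose() deletes each match.
    for tag in soup.find_all(TAGS_TO_REMOVE):
        tag.decompose()

    # Set the title only when the document has a <head> to hold it.
    if soup.head is not None:
        title_tag = soup.head.find("title")
        if title_tag is None:
            title_tag = soup.new_tag("title")
            soup.head.append(title_tag)
        title_tag.string = title

    return str(soup)


if __name__ == "__main__":
    sample = (
        "<html><head></head><body><p>Hi</p>"
        '<script src="x.js"></script><button>Copy</button></body></html>'
    )
    print(clean_html(sample, "chatgpt_output"))
```

Passing the title in as an argument keeps the sketch independent of how the file name is obtained.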
## Dependencies
This script requires the following Python libraries:
- BeautifulSoup4: For parsing HTML and making modifications.
- lxml: Efficient HTML parser for BeautifulSoup.
Install them via pip:
```bash
pip install beautifulsoup4 lxml
```
## License
This script is licensed under the MIT License. Feel free to modify and redistribute it under the terms of the license.
## Contact
For questions, feedback, or suggestions, please reach out to the author:
- Author: Yahya Hamidaddin
- Email: [email protected]