A Python script that automatically retrieves and stores landlord registration data from the Landlord Registration Scotland website.
- Scrapes Addresses: Fetches all registered addresses for a given postcode from the Landlord Registration Scotland website.
- Extracts Address Details: Collects detailed information about each registered address, such as the application owner, joint owners, agent details, local authority, and contact address.
- Stores Data in JSON: Saves the collected data in a JSON file for easy access and analysis.
- Multithreaded Scraping: Uses a thread pool to scrape multiple postcodes concurrently, reducing the total run time.
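The thread-pool idea from the feature list can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `scrape_postcode` and `scrape_all` are hypothetical names, the scraper is a stub that makes no network requests, and the sketch uses the standard library's `concurrent.futures` instead of working with `threading` directly.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_postcode(postcode: str) -> dict:
    """Hypothetical stand-in for the real per-postcode scraper (no network access)."""
    return {"postcode": postcode, "addresses": []}


def scrape_all(postcodes: list[str], max_workers: int = 4) -> list[dict]:
    """Run the scraper for each postcode on a worker thread.

    pool.map returns results in the same order as the input postcodes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_postcode, postcodes))


results = scrape_all(["EH1 1AA", "G1 1AA"])
```

Because the work is dominated by waiting on HTTP responses, threads are a reasonable fit here; `max_workers` is an assumed tuning knob and should stay small enough to avoid overloading the target site.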
- Python 3.x
- `curl_cffi` library for making HTTP requests
- `fake_headers` for generating realistic user agents
- `BeautifulSoup` for parsing HTML
- `urllib.parse` for URL encoding
- `logging` for error handling and logging
- `threading` for multithreaded scraping
- `datetime` for timestamp generation
- `contextlib` for concurrency management
- `sqlite3` for database operations
- Clone the repository:
  `git clone https://github.com/Tentoxa/LandlordScraper`
  `cd LandlordScraper`
- Install required Python packages:
  `pip3 install -r requirements.txt`
- Prepare the `postcodes.txt` file:
  - Create a text file named `postcodes.txt` in the project directory.
  - Add one or more postcodes (e.g., `EH1 1AA`) to the file, each on a new line.
- Run the script:
  `python3 main.py`
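To make the expected `postcodes.txt` format concrete, here is a small sketch of how such a file could be read and how a postcode could be URL-encoded with `urllib.parse`. The helper names `parse_postcodes` and `load_postcodes` are hypothetical, and skipping blank lines is an assumption; the script's own parsing may differ.

```python
from pathlib import Path
from urllib.parse import quote_plus


def parse_postcodes(text: str) -> list[str]:
    """Return one postcode per non-empty line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_postcodes(path: str = "postcodes.txt") -> list[str]:
    """Read the postcode file from disk (hypothetical helper)."""
    return parse_postcodes(Path(path).read_text(encoding="utf-8"))


# Example file content: one postcode per line, blank lines ignored.
sample = "EH1 1AA\n\n  G1 1AA  \n"
postcodes = parse_postcodes(sample)

# quote_plus encodes the space so the postcode is safe in a query string.
encoded = [quote_plus(p) for p in postcodes]
```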
The database is named address_data.db and contains a single table called addresses. This table is designed to store essential information related to property addresses.
| Column Name | Data Type | Description |
|---|---|---|
| `id` | Integer (Auto-increment) | Unique identifier for each address record |
| `postcode` | Text | The postcode associated with the address |
| `application_by` | Text | The applicant for the landlord registration |
| `joint_owners` | Text | The joint owners of the property |
| `agent_details` | Text | The agent details for the property |
| `local_authority` | Text | The local authority responsible for the property |
| `contact_address` | Text | The contact address for the property |
| `address` | Text | The full address of the property |
| `created_at` | Timestamp | The timestamp when the address record was created |
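As an illustration of the schema described above, the following sketch creates an equivalent `addresses` table with `sqlite3` and inserts one row. The exact column constraints (such as the `CURRENT_TIMESTAMP` default) are assumptions, the inserted address is a placeholder, and an in-memory database is used instead of `address_data.db` so the example leaves no file behind.

```python
import sqlite3

# Column names and types follow the table above; constraints are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS addresses (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    postcode TEXT,
    application_by TEXT,
    joint_owners TEXT,
    agent_details TEXT,
    local_authority TEXT,
    contact_address TEXT,
    address TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

conn = sqlite3.connect(":memory:")  # the real script would open address_data.db
conn.execute(SCHEMA)

# Parameterized insert; the values are placeholders, not real registration data.
conn.execute(
    "INSERT INTO addresses (postcode, address) VALUES (?, ?)",
    ("EH1 1AA", "1 Example Street, Edinburgh"),
)
conn.commit()

row = conn.execute(
    "SELECT id, postcode, address, created_at FROM addresses"
).fetchone()
conn.close()
```

Columns left out of the insert (for example `joint_owners`) are stored as `NULL`, while `id` and `created_at` are filled in by SQLite.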
This project is licensed under the MIT License. See the License file for more information.