🔍 Website Risk Checker

A Java 17 security tool with GUI and CLI interfaces that analyzes websites for potential security risks. Built for the Cal Hacks Hackathon.

📋 Table of Contents

Overview
Quick Start
How It Works
Project Structure
Usage
Development
Additional Guides

🎯 Overview

Website Risk Checker scans URLs and analyzes them for security red flags commonly found in phishing sites, malware distributors, and other malicious websites. It provides a risk score (0-100+) and detailed findings to help users make informed decisions.

Risk Scoring

| Score Range | Rating | Meaning |
|-------------|--------|---------|
| 0-20 | 🟢 LOW | Appears relatively safe |
| 21-50 | 🟡 MEDIUM | Some concerning characteristics |
| 51+ | 🔴 HIGH | Multiple security red flags |

Features

- 🖥️ GUI - JavaFX interface with split buttons (quick test, plus a dropdown of 10 URLs per risk level)
- 💻 CLI - Full-featured command line for automation
- Advanced Security Checks:
  - URL Analysis: HTTP/HTTPS, IP addresses, suspicious TLDs/keywords, phishing indicators, SQL injection patterns, XSS patterns
  - Content Analysis: Forms, redirects, obfuscated JS, hidden iframes, malware downloads, SQL injection in forms, XSS in content, unvalidated redirects, credential stuffing indicators
  - Reputation Check: Optional Google Safe Browsing API integration
- Batch Scanning - Scan multiple URLs from a file
- JSON Output - Machine-readable output for automation
- Detailed Reports - Comprehensive findings with explanations, grouped by category

🚀 Quick Start

# Clone and build
git clone https://github.com/aidan-marra/Cyber-Hacks.git
cd Cyber-Hacks/sitecheck
mvn clean package

# Launch GUI (default - includes test site buttons!)
java -jar target/sitecheck.jar

# Or use CLI
java -jar target/sitecheck.jar scan https://example.com

💡 Tip: The GUI includes test site buttons so you can test functionality without searching for sites online!

🔧 How It Works

High-Level Flow

User Input (URL)
    ↓
Main/MainGUI (Entry Point)
    ↓
SiteScanner (Orchestrator)
    ↓
┌─────────────────────────────────────┐
│  Three Parallel Analysis Phases     │
├─────────────────────────────────────┤
│  1. URL Heuristics                  │
│  2. Content Heuristics              │
│  3. Reputation Check                │
└─────────────────────────────────────┘
    ↓
Aggregate Findings → Calculate Score → Determine Rating
    ↓
ScanResult (Output)

Detailed Workflow

1. Entry Point (`Main.java` / `MainGUI.java`)

CLI: Parses command-line arguments (scan <url>, scan-file <file>, --json, --verbose)
GUI: JavaFX application with URL input field and test site buttons
Routes to SiteScanner for actual scanning
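
As a rough illustration of the CLI side, the argument handling described above could look something like the sketch below. `CliOptions` and `CliArgsSketch` are hypothetical names invented for this example, not the project's actual classes, and the sketch covers only the `scan` and `scan-file` commands with the `--json` and `--verbose` flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for parsed options; Main.java may model this differently.
record CliOptions(String command, String target, boolean json, boolean verbose) {}

final class CliArgsSketch {
    private CliArgsSketch() {}

    // Accepts e.g. {"scan", "https://example.com", "--json"}; flags may appear anywhere.
    static CliOptions parse(String[] args) {
        boolean json = false;
        boolean verbose = false;
        List<String> positional = new ArrayList<>();
        for (String arg : args) {
            switch (arg) {
                case "--json" -> json = true;
                case "--verbose" -> verbose = true;
                default -> positional.add(arg);
            }
        }
        // Exactly two positional arguments are expected: the command and its target.
        if (positional.size() != 2) {
            throw new IllegalArgumentException("Expected: scan <url> | scan-file <file>");
        }
        String command = positional.get(0);
        if (!command.equals("scan") && !command.equals("scan-file")) {
            throw new IllegalArgumentException("Unknown command: " + command);
        }
        return new CliOptions(command, positional.get(1), json, verbose);
    }
}
```

Returning a small immutable record keeps the parsing step separate from the scanning step, so the same options object could be handed to whatever calls SiteScanner.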

2. Orchestration (`SiteScanner.java`)

- Input Validation: Ensures the URL has a scheme (adds https:// if missing)
- URL Parsing: Converts the string to a URI object and validates that a host exists
- Three-Phase Analysis:
  1. URL Heuristics - Analyzes URL structure (no network needed)
  2. Content Heuristics - Fetches and analyzes HTML content (requires network)
  3. Reputation Check - Queries external API (optional, requires API key)
- Error Handling: Each phase runs in its own try-catch, so a failure in one phase does not stop the entire scan
- Result Aggregation: Combines all findings into a ScanResult
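
The validation and error-handling steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `SiteScanner.java`: `ScanStepsSketch`, `normalize`, and `runPhase` are hypothetical names, and returning an empty list when a phase fails is an assumption about how failures are absorbed.

```java
import java.net.URI;
import java.util.List;
import java.util.function.Supplier;

final class ScanStepsSketch {
    private ScanStepsSketch() {}

    // Adds https:// when no scheme is present, then parses and checks for a host.
    static URI normalize(String input) {
        String candidate = input.trim();
        if (!candidate.contains("://")) {
            candidate = "https://" + candidate;
        }
        URI uri = URI.create(candidate); // throws IllegalArgumentException on malformed input
        if (uri.getHost() == null || uri.getHost().isEmpty()) {
            throw new IllegalArgumentException("URL has no host: " + input);
        }
        return uri;
    }

    // Runs one analysis phase; a runtime failure yields no findings instead of aborting the scan.
    static <T> List<T> runPhase(Supplier<List<T>> phase) {
        try {
            return phase.get();
        } catch (RuntimeException e) {
            return List.of();
        }
    }
}
```

With this shape, a network error while fetching HTML would only empty the content findings, and the URL findings would still contribute to the score.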

3. URL Analysis (`UrlHeuristics.java`)

Static analysis of URL structure - No network required, deterministic results:

HTTP vs HTTPS: Checks scheme (http:// = +15 points)
IP Address Detection: Raw IP in hostname (192.168.1.1 = +25 points)
URL Length: Long URLs (>100 chars = +10 points)
Suspicious Characters: @, %, = in path (+10 points)
Suspicious TLDs: .zip, .xyz, .tk (+15 points)
Suspicious Keywords: login, verify, secure in path (+12 each)
Subdomain Spoofing: Excessive subdomains (+12 points)
Brand Spoofing: Patterns like paypal-security.com (+20 points)
Phishing Indicators: Homoglyph attacks, typosquatting (+25 points)
SQL Injection Patterns: Detects SQL keywords in URL (union select, or 1=1, etc.) (+30 points)
XSS Patterns: Detects script injection patterns in URL (<script>, javascript:, etc.) (+30 points)

Output: List<Finding> with category URL, message, and score impact
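
To make the scoring concrete, here is a small sketch of a few of the checks listed above (HTTP scheme, raw IPv4 host, URL length, suspicious TLDs, and path keywords). It is an illustration under assumptions, not the contents of `UrlHeuristics.java`: the `Finding` record shape, the class name, and the exact matching rules are guesses based on the descriptions in this README.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Assumed shape of a finding: category, message, score impact.
record Finding(String category, String message, int scoreImpact) {}

final class UrlHeuristicsSketch {
    private static final Pattern IPV4 = Pattern.compile("^\\d{1,3}(\\.\\d{1,3}){3}$");
    private static final List<String> SUSPICIOUS_TLDS = List.of(".zip", ".xyz", ".tk");
    private static final List<String> KEYWORDS = List.of("login", "verify", "secure");

    private UrlHeuristicsSketch() {}

    static List<Finding> evaluate(URI uri) {
        List<Finding> findings = new ArrayList<>();
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);

        if ("http".equalsIgnoreCase(uri.getScheme())) {
            findings.add(new Finding("URL", "Uses HTTP instead of HTTPS", 15));
        }
        if (IPV4.matcher(host).matches()) {
            findings.add(new Finding("URL", "Contains raw IP address in hostname", 25));
        }
        if (uri.toString().length() > 100) {
            findings.add(new Finding("URL", "URL is longer than 100 characters", 10));
        }
        for (String tld : SUSPICIOUS_TLDS) {
            if (host.endsWith(tld)) {
                findings.add(new Finding("URL", "Suspicious TLD: " + tld, 15));
            }
        }
        // Each matching keyword adds its own finding (+12 each).
        for (String keyword : KEYWORDS) {
            if (path.contains(keyword)) {
                findings.add(new Finding("URL", "Path contains suspicious keyword: " + keyword, 12));
            }
        }
        return findings;
    }
}
```

For example, `http://192.168.1.1/login/verify` would trigger the HTTP, raw IP, and two keyword checks in this sketch, for a total of 15 + 25 + 12 + 12 = 64.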

4. Content Analysis (`ContentHeuristics.java`)

Dynamic analysis of HTML content - Requires fetching the page:

- Form Analysis:
  - Forms posting to external domains (+20 points)
  - Forms using HTTP instead of HTTPS (+18 points)
  - SQL injection patterns in forms (+30 points)
- Redirect Detection:
  - Meta refresh redirects (+20 points)
  - JavaScript redirects via location.href (+15 points)
  - Unvalidated redirects with external domains (+25 points)
- Script Analysis:
  - Obfuscated scripts using eval() (+25 points)
  - String.fromCharCode() obfuscation (+25 points)
  - unescape() obfuscation (+25 points)
  - document.write() usage (+8 points)
  - XSS patterns in content (+30 points)
- Malware Detection: Suspicious file downloads (.exe, .bat, .scr, etc.) (+35 points)
- Iframe Detection: Hidden iframes (+20 points)
- Password Fields: Password inputs on HTTP pages (+25 points)
- Credential Stuffing: Weak login forms (GET method, HTTP, no CSRF) (+20 points)
- Link Analysis: High ratio of external links (>80% = +8 points)

Output: List<Finding> with category CONTENT, message, and score impact
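
The script and redirect checks above can be approximated with a table of patterns and score impacts. The project parses pages with jsoup; to stay dependency-free, the sketch below instead runs regular expressions over the raw HTML string, which is a simplification and not how `ContentHeuristics.java` necessarily works. The class name, the `Check` record, and the patterns themselves are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Assumed shape of a finding: category, message, score impact.
record Finding(String category, String message, int scoreImpact) {}

final class ContentChecksSketch {
    // One entry per check: what to look for, what to report, and how much it adds.
    record Check(Pattern pattern, String message, int scoreImpact) {}

    private static final List<Check> CHECKS = List.of(
        new Check(Pattern.compile("\\beval\\s*\\("), "Obfuscated script using eval()", 25),
        new Check(Pattern.compile("String\\.fromCharCode\\s*\\("), "String.fromCharCode() obfuscation", 25),
        new Check(Pattern.compile("\\bunescape\\s*\\("), "unescape() obfuscation", 25),
        new Check(Pattern.compile("document\\.write\\s*\\("), "document.write() usage", 8),
        new Check(Pattern.compile("<meta[^>]*http-equiv\\s*=\\s*[\"']?refresh",
                Pattern.CASE_INSENSITIVE), "Meta refresh redirect", 20));

    private ContentChecksSketch() {}

    // Reports each check at most once, no matter how often its pattern occurs.
    static List<Finding> evaluate(String html) {
        List<Finding> findings = new ArrayList<>();
        for (Check check : CHECKS) {
            if (check.pattern().matcher(html).find()) {
                findings.add(new Finding("CONTENT", check.message(), check.scoreImpact()));
            }
        }
        return findings;
    }
}
```

Keeping the checks in a data table makes it easy to adjust a score impact or add a pattern without touching the evaluation loop. Checks that depend on document structure, such as form targets or hidden iframes, are better served by a real HTML parser than by regular expressions.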

5. Reputation Check (`ReputationClient.java`)

External API integration - Optional, requires API key:

- Default: Stub implementation returns UNKNOWN (0 points)
- Optional: Google Safe Browsing API integration
  - Set the GOOGLE_SAFE_BROWSING_API_KEY environment variable
  - Queries Google's database of known malicious sites
  - Returns SAFE, MALICIOUS, or UNKNOWN

Output: ReputationResult converted to Finding with category REPUTATION
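
One way to picture the stub-versus-API choice is the sketch below. It assumes `ReputationResult` can be modeled as a three-value enum and that the real client is constructed from the API key; the README confirms neither detail, and `StubReputationClient` and `ReputationClients.select` are hypothetical names. No network call is made here: the real client is supplied by the caller as a factory.

```java
import java.util.function.Function;

// Assumption: the three outcomes named above, modeled as an enum.
enum ReputationResult { SAFE, MALICIOUS, UNKNOWN }

interface ReputationClient {
    ReputationResult check(String url);
}

// Default behavior when no API key is configured: always UNKNOWN.
final class StubReputationClient implements ReputationClient {
    @Override
    public ReputationResult check(String url) {
        return ReputationResult.UNKNOWN;
    }
}

final class ReputationClients {
    private ReputationClients() {}

    // Falls back to the stub when the key is missing or blank; otherwise builds the real client.
    static ReputationClient select(String apiKey, Function<String, ReputationClient> realClientFactory) {
        if (apiKey == null || apiKey.isBlank()) {
            return new StubReputationClient();
        }
        return realClientFactory.apply(apiKey);
    }
}
```

A caller would pass `System.getenv("GOOGLE_SAFE_BROWSING_API_KEY")` as the first argument and a factory for the Safe Browsing client as the second, so a scan works unchanged whether or not the key is set.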

6. Score Calculation (`ScoreCalculator.java`)

- Sum all findings: Adds up scoreImpact from all findings
- Determine rating:
  - Score 0-20 → LOW
  - Score 21-50 → MEDIUM
  - Score 51+ → HIGH
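
The two steps above amount to a sum followed by a threshold lookup. A minimal sketch, assuming a `Finding` record with a `scoreImpact` component and a `Rating` enum (the method name `getRating(int)` comes from the module table; everything else is illustrative):

```java
import java.util.List;

// Assumed shape of a finding: category, message, score impact.
record Finding(String category, String message, int scoreImpact) {}

enum Rating { LOW, MEDIUM, HIGH }

final class ScoreCalculatorSketch {
    private ScoreCalculatorSketch() {}

    // Total score is the plain sum of every finding's impact; there is no upper cap.
    static int totalScore(List<Finding> findings) {
        return findings.stream().mapToInt(Finding::scoreImpact).sum();
    }

    // Thresholds follow the Risk Scoring table: 0-20 LOW, 21-50 MEDIUM, 51+ HIGH.
    static Rating getRating(int score) {
        if (score <= 20) {
            return Rating.LOW;
        }
        if (score <= 50) {
            return Rating.MEDIUM;
        }
        return Rating.HIGH;
    }
}
```

For instance, a page with only "Uses HTTP instead of HTTPS" (+15) and "Contains raw IP address in hostname" (+25) totals 40, which falls in the MEDIUM band.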

7. Result Formatting (`ScanResult.java`)

CLI Output: Formatted text with banner, findings list, score, rating
JSON Output: Machine-readable JSON (when --json flag used)
GUI Display: Visual cards with color-coded ratings and findings
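
For the CLI text output, a formatter along these lines would produce the layout shown under Example Output. This is only a sketch: `TextReportSketch` is a hypothetical name, the banner and rating emoji are left out, and the rating is passed in as a plain string to keep the example small.

```java
import java.util.List;

// Assumed shape of a finding: category, message, score impact.
record Finding(String category, String message, int scoreImpact) {}

final class TextReportSketch {
    private TextReportSketch() {}

    // Builds the rating line, the score line, and one indented line per finding.
    static String format(String rating, int score, List<Finding> findings) {
        StringBuilder out = new StringBuilder();
        out.append("Overall risk rating: ").append(rating).append('\n');
        out.append("Score: ").append(score).append("\n\n");
        out.append("Findings:\n");
        for (Finding f : findings) {
            out.append("  [").append(f.category()).append("] ")
               .append(f.message())
               .append(" (+").append(f.scoreImpact()).append(")\n");
        }
        return out.toString();
    }
}
```

The JSON output mode would serialize the same fields (url, score, rating, findings) rather than formatting them as text.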

Data Flow

URL String
    ↓
URI Object (parsed)
    ↓
┌─────────────────┐
│ URL Heuristics  │ → List<Finding>
└─────────────────┘
    ↓
┌─────────────────┐
│ Fetch HTML      │ → Document (jsoup)
└─────────────────┘
    ↓
┌─────────────────┐
│ Content Heur.   │ → List<Finding>
└─────────────────┘
    ↓
┌─────────────────┐
│ Reputation API  │ → ReputationResult → Finding
└─────────────────┘
    ↓
All Findings → ScoreCalculator → Rating
    ↓
ScanResult (url, score, rating, findings)

📁 Project Structure

sitecheck/
├── pom.xml                          # Maven configuration & dependencies
├── README.md                         # This file
├── run-gui.bat                       # Windows GUI launcher script
│
├── src/
│   ├── main/java/com/team/sitecheck/
│   │   ├── Main.java                 # CLI entry point (argument parsing)
│   │   ├── MainGUI.java              # GUI application (JavaFX)
│   │   ├── SiteScanner.java          # Core orchestration logic
│   │   ├── UrlHeuristics.java        # URL structure analysis
│   │   ├── ContentHeuristics.java    # HTML content analysis
│   │   ├── ReputationClient.java     # Reputation API interface
│   │   ├── SafeBrowsingClient.java   # Google Safe Browsing implementation
│   │   ├── Finding.java              # Individual finding model
│   │   ├── ScanResult.java           # Complete scan result model
│   │   ├── ReputationResult.java     # Reputation check result
│   │   └── ScoreCalculator.java      # Score → Rating conversion
│   │
│   └── test/java/com/team/sitecheck/ # Comprehensive test suite
│       ├── GUITest.java              # GUI test cases (LOW/MEDIUM/HIGH URLs)
│       ├── UrlHeuristicsTest.java    # URL analysis tests
│       ├── ContentHeuristicsTest.java # Content analysis tests
│       ├── SiteScannerTest.java      # Integration tests
│       └── ... (8 test classes total)
│
├── target/                           # Build output (generated)
│   └── sitecheck.jar                 # Executable JAR with dependencies
│
└── guides/                           # Detailed documentation
    ├── QUICK_START.md                # Quick reference
    ├── GUI_GUIDE.md                  # GUI usage guide
    ├── TESTING.md                    # Testing guide
    ├── INTELLIJ_SETUP.md             # IntelliJ setup
    └── ... (10+ guides)

Module Responsibilities

| Module | Purpose | Key Methods |
|--------|---------|-------------|
| Main | CLI entry point | `main()`, argument parsing, output formatting |
| MainGUI | GUI entry point | `start()`, UI components, result display |
| SiteScanner | Orchestration | `scan()`, `scanMultiple()`, coordinates all analysis |
| UrlHeuristics | URL analysis | `evaluate(URI)` - static URL pattern checks |
| ContentHeuristics | HTML analysis | `evaluate(Document, URI)` - dynamic content checks |
| ReputationClient | API interface | `check(String)` - external reputation lookup |
| ScoreCalculator | Rating logic | `getRating(int)` - converts score to rating |
| Finding | Data model | Individual security finding with category/score |
| ScanResult | Result model | Aggregates findings, score, rating, error info |

💻 Usage

GUI Mode (Recommended)

java -jar target/sitecheck.jar

Features:

Split Buttons: Main button for quick test (random URL) + dropdown arrow for selecting specific URLs (10 URLs per risk level)
URL input field with "Scan URL" button
Visual results with color-coded ratings and explanations
Findings list with detailed information and contextual explanations
Detailed output tab with comprehensive grouped reports (more detailed than findings list)

CLI Mode

# Scan single URL
java -jar target/sitecheck.jar scan https://example.com

# Verbose output (shows analysis steps)
java -jar target/sitecheck.jar scan https://example.com --verbose

# JSON output (for automation)
java -jar target/sitecheck.jar scan https://example.com --json

# Batch scan from file
java -jar target/sitecheck.jar scan-file urls.txt

# Help
java -jar target/sitecheck.jar --help

Example Output

Overall risk rating: 🔴 HIGH
Score: 154

Findings:
  [URL] Uses HTTP instead of HTTPS (+15)
  [URL] Contains raw IP address in hostname (+25)
  [URL] Path contains suspicious keyword: login (+12)
  [URL] Path contains suspicious keyword: verify (+12)
  [URL] Possible SQL injection pattern detected: or 1=1 (+30)
  [CONTENT] Suspicious file download detected: .exe (+35)
  [CONTENT] Unvalidated redirect detected: http://evil.com (+25)
  [REPUTATION] Unknown (not in database) (+0)

Note: Terminal warnings (JavaFX, Maven) are normal and not security issues. See guides/TERMINAL_WARNINGS.md for details.

👨‍💻 Development

Prerequisites

Java 17+
Maven 3.8+ (see guides/INSTALL_MAVEN.md)
Git

Setup with IntelliJ (Recommended)

Download IntelliJ IDEA Community (Free)
Open Project: File → Open → Select sitecheck/pom.xml
Wait for Maven Sync - IntelliJ handles dependencies automatically
Run GUI: Right-click MainGUI.java → Run 'MainGUI.main()'

Advantages: No Maven installation needed, one-click run, built-in debugger

Command Line Setup

# Build
mvn clean package

# Run tests
mvn test

# Run GUI
mvn exec:java -Dexec.mainClass="com.team.sitecheck.MainGUI"

# Run CLI
mvn exec:java -Dexec.mainClass="com.team.sitecheck.Main" -Dexec.args="scan https://example.com"

Testing

GUI Testing:

Use test site buttons in GUI (🟢 LOW, 🟡 MEDIUM, 🔴 HIGH)
Or test manually with known URLs

Unit Tests:

mvn test  # Runs all tests including GUITest.java

Git Workflow

main (protected) ────────────────►
              │         ▲
              │         │ PR
              ▼         │
develop ──────┼─────────┘
              │
feature/name ─┘

Making Changes:

1. git checkout develop && git pull origin develop
2. git checkout -b feature/your-feature
3. Make changes and test them
4. git add . && git commit -m "Description"
5. git push origin feature/your-feature
6. Create a Pull Request (base: develop)

📚 Additional Guides

See the guides/ folder for detailed documentation:

RUN_GUI.md - How to run the GUI
TERMINAL_WARNINGS.md - Explains terminal warnings (not security issues)
MAVEN_ERRORS_EXPLAINED.md - Why Maven errors occur and permanent fix
RESTART_TESTING.md - How to restart local project testing
TESTING.md - Comprehensive testing guide
INTELLIJ_SETUP.md - IntelliJ setup instructions
GUI_CRASH_FIX.md - GUI troubleshooting
And more in the guides/ folder...

📄 License

MIT License

Acknowledgments

jsoup - HTML parsing library
Jackson - JSON processing
Google Safe Browsing - URL reputation API

Built with ❤️ for CyberHacks
