This repository contains an analysis pipeline for investigating the environmental weathering of polymer nanoparticles and the metabolic degradation of polystyrene (PS) nanoparticles using non-targeted mass spectrometry. The weathering study examines three polymer types (PET, PP, PS) across four weathering conditions; the metabolism study follows weathered PS nanoparticles through a microsomal incubation time course.
- Polymers: Polyethylene terephthalate (PET), Polypropylene (PP), Polystyrene (PS)
- Weathering Conditions:
  - RAW: Raw polymer material
  - SEP C: Nanoparticles without weathering (control)
  - SEP: Accelerated weathering (UV + mechanical stress)
  - Semi: Environmental weathering (natural conditions)
- Samples: 12 total (3 polymers × 4 conditions)
- Analysis: Non-targeted LC-MS/MS
- Features: 21,928 molecular features detected
- Test Sample: PS SEP (weathered polystyrene nanoparticles)
- Control Sample: Caffeine (positive control for microsomal activity)
- Incubation System: Liver microsomes + NADPH + buffers (MgCl₂, Tris-HCl)
- Time Points: 0, 2, 15, 30, 90 minutes
- Blank: No sample control
- Features: 17,051 molecular features detected
├── README.md # This file
├── RESULTS_SUMMARY.md # Key findings and conclusions
│
├── Data Processing Scripts/
│ ├── DataProcess1.py # Initial data processing and feature filtering
│ ├── meta_processing_find_features.py # Feature annotation and MS/MS matching
│ └── batch_correction.py # Batch effect correction (ComBat)
│
├── Weathering Analysis/
│ ├── pca_analysis.py # PCA visualization (original data)
│ ├── clustermap_analysis.py # Hierarchical clustering heatmaps
│ ├── weathering_analysis.py # Comprehensive weathering trajectory analysis
│ └── distance_visualizations.py # Alternative distance visualization methods
│
├── Metabolic Degradation Analysis/
│ └── metabolism_analysis.py # Time-course analysis for microsomal incubation
│
├── Input Data/
│ ├── alignment_mini.csv # Feature table with m/z, RT, intensities
│ ├── Batches.xlsx # Batch assignment for correction
│ └── alignment_metadata.csv # Sample metadata
│
└── Output Files/
    ├── Weathering Results/
    │ ├── corrected_combat_style.csv
    │ ├── weathering_pca_trajectories.png
    │ ├── weathering_distance_analysis.png
    │ ├── weathering_distances.csv
    │ ├── weathering_feature_changes.csv
    │ └── distance_*.png (various visualizations)
    │
    └── Metabolism Results/
      ├── metabolism_pca_timecourse.png
      ├── metabolism_increasing.png
      ├── metabolism_decreasing.png
      ├── metabolism_comparison.png
      ├── metabolism_full_results.csv
      └── metabolism_ps_specific.csv

Purpose: Process raw MZmine feature tables and filter for significant features
Input:
- alignment_full_feature_table.csv - Raw MZmine output
- alignment_metadata.csv - Sample information
Output:
- alignment_mini.csv - Processed feature table
- alignment_mini_stats5.csv - Statistical results
- PS_chemicals_with_time_trend.csv - Time-dependent features
- PS_chemicals.csv - PS-specific chemicals
Key Steps:
- Extract area and height measurements
- Calculate blank-corrected intensities
- Perform linear regression vs time
- Identify PS-specific features
- Filter by statistical significance (p < 0.05)
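The blank-correction, regression, and significance steps above can be sketched for a single feature as follows. This is a minimal illustration, not the code in DataProcess1.py: the function name `time_trend`, the subtract-and-clip blank correction, and the example intensities are all assumptions; only the time points and the p < 0.05 threshold come from this README.

```python
import numpy as np
from scipy.stats import linregress

# Incubation time points in minutes (from the study design in this README).
time_points = np.array([0, 2, 15, 30, 90], dtype=float)

def time_trend(sample_intensities, blank_intensity, alpha=0.05):
    """Blank-correct one feature and regress it against incubation time.

    Assumption: blank correction is a simple subtraction clipped at zero.
    """
    raw = np.asarray(sample_intensities, dtype=float)
    corrected = np.clip(raw - blank_intensity, 0.0, None)
    fit = linregress(time_points, corrected)
    return {
        "slope": float(fit.slope),
        "p_value": float(fit.pvalue),
        "significant": bool(fit.pvalue < alpha),
    }

# Hypothetical feature whose intensity rises during the incubation.
rising = time_trend([1200, 1350, 2100, 3000, 6500], blank_intensity=200)
# Hypothetical feature that only fluctuates around a constant level.
flat = time_trend([1000, 1040, 980, 1020, 990], blank_intensity=200)

print(rising["significant"], flat["significant"])
```

With only five time points per feature, a per-feature p-value is a coarse filter; applying it across thousands of features without multiple-testing correction will let some false positives through.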
Usage:
python3 DataProcess1.py

Purpose: Link weathering results with MS/MS spectral data
Input:
- alignment_mini.csv
- weathering_feature_changes.csv
Output:
- weathering_features_with_msms.csv - Annotated features with MS/MS scans
Usage:
python3 meta_processing_find_features.py

Purpose: Remove technical variation while preserving biological differences
Input:
- alignment_mini.csv
- Batches.xlsx
Output:
- corrected_combat_style.csv - ComBat-corrected data
- corrected_*.csv - Results from 4 correction methods
- Comparison visualizations
Methods Compared:
- Median Normalization
- Mean Centering
- ComBat-style (SELECTED - standard for metabolomics)
- Ratio-based Normalization
Usage:
python3 batch_correction.py
Key Finding: ComBat-style correction selected despite higher CV (194.25%) because it's the publication-standard method for metabolomics data.

Purpose: Initial exploratory visualization of sample relationships
Input: corrected_combat_style.csv
Output:
- pca_scree.png - Variance explained by each PC
- pca_scores.png - 2D scores plot (PC1 vs PC2)
- pca_scores_3d.png - 3D scores plot
- pca_loadings.png - Top contributing features
- pca_results.csv - PC coordinates for all samples
Key Statistics:
- PC1: 18.3% variance
- PC2: 16.9% variance
- PC3: 9.0% variance
- Cumulative (PC1-PC3): 44.2%
Usage:
python3 pca_analysis.py

Purpose: Visualize sample relationships through clustering
Input: corrected_combat_style.csv
Output:
- clustermap_combat.png - Feature × sample heatmap
- clustermap_combat_correlation.png - Sample correlation matrix
- clustermap_combat_distance.png - Sample distance matrix
- sample_correlation_matrix.csv
- sample_distance_matrix.csv
Methods:
- Top 1,000 most variable features selected
- Log₁₀ + Z-score normalization
- Average linkage, Euclidean distance
- Dual color annotations (polymer type + weathering category)
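The feature selection and clustering settings listed above can be illustrated on synthetic data. This is a sketch under stated assumptions rather than the contents of clustermap_analysis.py: the helper `prepare_for_clustering`, the synthetic matrix, and the choice to rank variance on log-transformed values are hypothetical, and `top_n` is reduced from 1,000 to 50 to fit the toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Synthetic stand-in for the intensity table: 6 samples x 200 features.
# The first three samples are raised in 50 features to form two groups.
intensities = rng.lognormal(mean=8.0, sigma=0.3, size=(6, 200))
intensities[:3, :50] *= 20.0

def prepare_for_clustering(matrix, top_n):
    """Log10-transform, keep the top_n most variable features, z-score each feature."""
    logged = np.log10(matrix + 1.0)
    # Assumption: variability is ranked on the log scale.
    keep = np.argsort(logged.var(axis=0))[::-1][:top_n]
    selected = logged[:, keep]
    return (selected - selected.mean(axis=0)) / selected.std(axis=0)

scaled = prepare_for_clustering(intensities, top_n=50)

# Average linkage on Euclidean distances between samples.
tree = linkage(pdist(scaled, metric="euclidean"), method="average")
groups = fcluster(tree, t=2, criterion="maxclust")
print(groups)
```

The same `tree` can be passed to a dendrogram or heatmap routine; cutting it into two clusters here is only a quick check that the simulated groups are recovered.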
Usage:
python3 clustermap_analysis.py

Purpose: Quantify weathering trajectories and compare polymer degradation
Input: corrected_combat_style.csv
Output:
- weathering_pca_trajectories.png - Trajectories with arrows
- weathering_distance_analysis.png - 4-panel distance comparison
- weathering_distances.csv - All pairwise distances
- weathering_feature_changes.csv - Top 50 features per polymer
Key Analyses:
- Weathering trajectory PCA - Shows RAW → SEP C → SEP → Semi progression
- Euclidean distance calculations - In original 21,928D feature space
- Feature-level fold changes - Top changing features (RAW to Semi)
Research Questions Answered:
- How do polymers degrade from RAW to Semi?
  - Total weathering distance (RAW → Semi):
    - PS: 189.75 (most altered)
    - PP: 160.11 (moderate)
    - PET: 138.82 (most stable)
- Does accelerated weathering (SEP) replicate environmental weathering (Semi)?
  - Across all polymers, SEP is 20-29% closer to SEP C than to Semi
  - Conclusion: Accelerated weathering does NOT fully replicate environmental weathering
Usage:
python3 weathering_analysis.py

Purpose: Alternative visualizations of original feature space distances
Input: corrected_combat_style.csv
Output:
- distance_heatmap.png - Color-coded distance matrix
- distance_dendrogram.png - Hierarchical clustering tree
- distance_trajectories.png - Weathering path distances
- distance_mds.png - 2D projection preserving distances
- distance_network.png - Network graph
Key Difference from PCA:
- MDS (Multidimensional Scaling): Optimizes distance preservation in 2D
- PCA: Optimizes variance explanation
- MDS better represents actual 21,928D distances
Usage:
python3 distance_visualizations.py

Purpose: Evaluate metabolic degradation of PS nanoparticles
Input: alignment_mini.csv
Output:
- metabolism_pca_timecourse.png - Time trajectory (T0 → T90)
- metabolism_increasing.png - Features increasing (metabolites)
- metabolism_decreasing.png - Features decreasing (degradation)
- metabolism_comparison.png - PS vs Caffeine comparison
- metabolism_full_results.csv - Complete results (17,051 features)
- metabolism_ps_specific.csv - PS-specific features (130 features)
Key Analyses:
- Time trend analysis - Linear regression for each feature vs time
- PS-specific feature identification - Filters out blank and caffeine signals
- Metabolite formation - Features increasing over time
- Parent compound degradation - Features decreasing over time
- Microsomal activity validation - Caffeine control assessment
Key Results:
- Total features: 17,051
- PS-specific features (strict): 130 with significant time trends
- PS increasing features: 1,557 (potential metabolites)
- PS decreasing features: 162 (parent compounds metabolized)
- Caffeine control: 2,921 significant features (✓ microsomes active)
Interpretation:
- PS nanoparticles are metabolically degraded by liver microsomes
- More metabolite formation than degradation (1,557 vs 162)
- PS degradation differs from caffeine metabolism
- Microsomal system is metabolically active (validated by caffeine)
Usage:
python3 metabolism_analysis.py

Original Feature Space (21,928D):
d(A,B) = √[Σᵢ₌₁²¹⁹²⁸ (Aᵢ - Bᵢ)²]
- Uses ALL features
- Captures complete chemical differences
- Used for weathering distance analysis
PCA Space (2D):
d(A,B) = √[(PC1ₐ - PC1ᵦ)² + (PC2ₐ - PC2ᵦ)²]
- Uses only PC1 and PC2 (~35% variance)
- For visualization only
- Does NOT capture full chemical differences
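The gap between the two distance definitions can be seen on a small synthetic matrix. The sketch below is illustrative only: the random data, the helper `euclidean`, and the use of an SVD in place of a PCA library call are assumptions. It shows that a distance measured on PC1 and PC2 can never exceed the distance in the full feature space, because the scores are an orthogonal projection of the centered data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 12 samples x 500 standardized features.
X = rng.normal(size=(12, 500))

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# PCA scores obtained from the SVD of the column-centered matrix.
centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores_2d = U[:, :2] * S[:2]

d_full = euclidean(X[0], X[1])                 # uses every feature
d_pca = euclidean(scores_2d[0], scores_2d[1])  # uses PC1 and PC2 only
explained_2d = float((S[:2] ** 2).sum() / (S ** 2).sum())

print(f"full-space distance: {d_full:.2f}")
print(f"PC1/PC2 distance:    {d_pca:.2f}")
print(f"variance explained by PC1+PC2: {explained_2d:.1%}")
```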
Standard Pipeline:
- Replace zeros with min_value/2
- Log₁₀ transformation: log₁₀(x + 1)
- Z-score standardization: (x - μ) / σ
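A minimal sketch of the three steps above, assuming samples are in rows and features in columns. The function name `preprocess` is hypothetical, and reading min_value as the smallest non-zero intensity in the whole table is an assumption; the scripts may define it per feature instead.

```python
import numpy as np

def preprocess(matrix):
    """Zero replacement, log10(x + 1), then per-feature z-score (samples in rows)."""
    data = np.asarray(matrix, dtype=float).copy()
    # Assumption: min_value is the smallest non-zero intensity in the table.
    min_value = data[data > 0].min()
    data[data == 0] = min_value / 2.0
    logged = np.log10(data + 1.0)
    mu = logged.mean(axis=0)
    sigma = logged.std(axis=0)
    # Guard against (near-)constant features, which would divide by ~0.
    sigma[sigma < 1e-12] = 1.0
    return (logged - mu) / sigma

# Toy table: 3 samples x 3 features, with zeros and one constant feature.
raw = np.array([
    [0.0, 1500.0, 300.0],
    [200.0, 1800.0, 300.0],
    [800.0, 0.0, 300.0],
])
z = preprocess(raw)
print(np.round(z, 3))
```

The constant-feature guard is a design choice for this sketch: it maps such features to zero rather than producing NaN or infinite values.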
Model:
Y*ᵢⱼ = (Yᵢⱼ - α - Xβ - γᵢ)/δᵢ + α + Xβ
Where:
- Y*ᵢⱼ = Corrected intensity
- γᵢ = Additive batch effect
- δᵢ = Multiplicative batch effect
- α, β = Overall effects
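The model above can be illustrated with a simplified location/scale adjustment. This sketch is not the implementation in batch_correction.py and it is not full ComBat: it assumes no covariates (Xβ = 0), estimates γᵢ and δᵢ directly from batch means and standard deviations, and omits the empirical Bayes shrinkage that ComBat applies. The function name `location_scale_correct` and the simulated data are hypothetical.

```python
import numpy as np

def location_scale_correct(Y, batches):
    """Per-feature batch adjustment: Y* = (Y - alpha - gamma_i) / delta_i + alpha.

    Samples are in rows. No covariates and no empirical Bayes shrinkage.
    """
    Y = np.asarray(Y, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    alpha = Y.mean(axis=0)  # overall feature mean
    # Pooled within-batch standard deviation, used as the reference scale.
    resid = np.concatenate(
        [Y[batches == b] - Y[batches == b].mean(axis=0) for b in labels]
    )
    pooled_sd = resid.std(axis=0)
    corrected = np.empty_like(Y)
    for b in labels:
        rows = batches == b
        gamma = Y[rows].mean(axis=0) - alpha     # additive batch effect
        delta = Y[rows].std(axis=0) / pooled_sd  # multiplicative batch effect
        corrected[rows] = (Y[rows] - alpha - gamma) / delta + alpha
    return corrected

rng = np.random.default_rng(2)
batches = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Y = rng.normal(loc=10.0, scale=1.0, size=(8, 5))
Y[batches == 1] = Y[batches == 1] * 3.0 + 4.0  # simulated shift and scale change
Y_star = location_scale_correct(Y, batches)
```

Because this version forces every batch to the same mean and spread, it also removes any real group difference that is confounded with batch; supplying covariates (the Xβ term) is what protects such differences in the full model.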
- Python 3.8 or higher
pip install pandas numpy matplotlib seaborn scikit-learn scipy

Detailed Requirements:
- pandas >= 1.3.0
- numpy >= 1.21.0
- matplotlib >= 3.4.0
- seaborn >= 0.11.0
- scikit-learn >= 0.24.0
- scipy >= 1.7.0
# Create virtual environment (optional but recommended)
python3 -m venv ms_analysis_env
source ms_analysis_env/bin/activate # On Windows: ms_analysis_env\Scripts\activate
# Install requirements
pip install -r requirements.txt

# 1. Data preprocessing
python3 DataProcess1.py
python3 batch_correction.py
# 2. Weathering analysis
python3 pca_analysis.py
python3 clustermap_analysis.py
python3 weathering_analysis.py
python3 distance_visualizations.py
# Run after weathering_analysis.py, which writes weathering_feature_changes.csv
python3 meta_processing_find_features.py
# 3. Metabolic degradation analysis
python3 metabolism_analysis.py

# Generate key weathering results
python3 batch_correction.py
python3 weathering_analysis.py
# Generate metabolic degradation results
python3 metabolism_analysis.py

- PET: 138.82 (most chemically stable)
- PP: 160.11 (intermediate stability)
- PS: 189.75 (most chemically altered)
Interpretation: PS undergoes the greatest chemical transformation during weathering, consistent with literature showing PS is more susceptible to UV degradation than PET.
- All polymers: SEP is 20-29% closer to SEP C than to Semi
- Interpretation: Accelerated weathering protocols do not fully replicate environmental weathering chemistry
- 130 features with significant time-dependent changes (p < 0.05)
- Strict criteria: PS > Blank AND PS > Caffeine AND significant trend
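The strict criteria above amount to a three-way boolean filter. The sketch below shows one way to express it with pandas; the table, its column names (`ps_mean`, `blank_mean`, `caffeine_mean`, `p_value`), and the function `ps_specific` are hypothetical and do not reflect the actual layout of metabolism_full_results.csv.

```python
import pandas as pd

# Hypothetical per-feature summary table; column names are illustrative only.
features = pd.DataFrame({
    "feature_id": [1, 2, 3, 4],
    "ps_mean": [5000.0, 4000.0, 300.0, 9000.0],
    "blank_mean": [100.0, 4500.0, 50.0, 200.0],
    "caffeine_mean": [200.0, 100.0, 100.0, 150.0],
    "p_value": [0.01, 0.001, 0.02, 0.30],
})

def ps_specific(table, alpha=0.05):
    """Keep features with PS > Blank AND PS > Caffeine AND a significant time trend."""
    mask = (
        (table["ps_mean"] > table["blank_mean"])
        & (table["ps_mean"] > table["caffeine_mean"])
        & (table["p_value"] < alpha)
    )
    return table.loc[mask]

selected = ps_specific(features)
print(selected["feature_id"].tolist())
```

In this toy table, feature 2 is dropped because its blank signal exceeds the PS signal, and feature 4 is dropped because its trend is not significant.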
- 1,557 features increasing over 90 minutes
- Suggests PS breaks down into multiple metabolic products
- 162 features decreasing over 90 minutes
- Represents original PS components being metabolized
From Literature Search:
- PET degradation time: ~1,179 years (most persistent)
- PS degradation time: ~900 years
- PP degradation time: ~0.27 years (with UV exposure)
Note: PP's rapid degradation in pure form vs. our intermediate results (PP between PS and PET) is explained by commercial stabilizers in our PP samples. This makes our results more representative of real-world commercial plastics.
Our results align with literature:
- PET most resistant to weathering
- PS susceptible to UV-induced chemical changes
- Commercial PP (with stabilizers) shows intermediate behavior
Issue 1: "FileNotFoundError: alignment_mini.csv"
- Solution: Ensure input files are in the current directory or provide full path
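A quick pre-flight check can confirm which inputs are missing before a script is run. This is an optional helper sketched for illustration, not part of the repository; the function name `missing_inputs` is hypothetical, and the file list is taken from the Input Data folder described in this README.

```python
from pathlib import Path

# Input files named in this README; adjust data_dir to where they actually live.
REQUIRED_INPUTS = ["alignment_mini.csv", "Batches.xlsx", "alignment_metadata.csv"]

def missing_inputs(data_dir="."):
    """Return the required input files that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in REQUIRED_INPUTS if not (base / name).is_file()]

if __name__ == "__main__":
    missing = missing_inputs(".")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```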
Issue 2: Memory errors with large datasets
- Solution: Reduce the top_n parameter in filtering functions
- Use filter_features() with a lower top_n value (e.g., 500 instead of 1000)
Issue 3: PCA plots appear crowded
- Solution: Adjust the figsize parameter in plotting functions
- Increase DPI for better resolution: dpi=300 → dpi=600
Issue 4: ComBat correction warnings
- Solution: These are usually harmless; check that batch assignments are correct
- Verify that Batches.xlsx has the correct sample-to-batch mapping
- Initial release
- Complete weathering analysis pipeline
- Metabolic degradation analysis
- Comprehensive visualization suite