Statistical Analysis of Website Performance Metrics

Cloud Infrastructure Optimization using Python & Hypothesis Testing

๐Ÿ“Œ Project Objective

To analyze website performance metrics using Python libraries like Pandas, Matplotlib, and SciPy. This project identifies patterns in website speed, throughput, response time, and category performance to support real cloud optimization decisions such as auto scaling, monitoring, SLA management, and user experience enhancement.

๐Ÿ“‚ Dataset Details

Total Rows Total Columns File Name File Type
734 9 labeled_dataset.csv CSV

Column Names
Sr No
website_url
Category
Page Size (KB)
Load Time(s)
Response Time(s)
Throughput
Performance_Label
User Response

โš™๏ธ Python Workflow Used

๐Ÿ“„ STEP 1: Load Dataset

Dataset Loaded Successfully First 5 Rows Displayed Columns: ['Sr No','website_url','Category','Page Size (KB)', 'Load Time(s)','Response Time(s)', 'Throughput','Performance_Label','User Response'] Shape: (734, 9)

๐Ÿงน STEP 2: Clean Dataset

Before Cleaning : (734, 9) drop_duplicates() dropna() After Cleaning : (733, 9) Performance_Label standardized Category column cleaned

โœ” Dataset cleaned successfully.

๐Ÿ“Š STEP 3: Descriptive Statistics

Metric Load Time(s) Response Time(s) Throughput
Count 733 733 733
Mean 1.786 1.013 317.69
Median 1.38 0.599 97.30
Max 7.94 7.419 15227.28

โœ” Statistical summary generated.

๐Ÿ“ˆ STEP 4: Visualizations

1. Load Time Distribution



2. Average Throughput by Performance Label



3. Performance Label Distribution

โœ” Graphs generated successfully.

๐Ÿงช STEP 5: Hypothesis Test 1

T-Test: Fast vs Slow Load Time p-value = 0.000632 Result: Significant difference in Load Time between Fast and Slow websites.

Interpretation: Website speed categories differ significantly. This helps cloud teams identify slow services and optimize resources.

๐Ÿงช STEP 6: Hypothesis Test 2

Chi-Square Test: Category vs Performance Label p-value = 0.006116 Result: Category and Performance Label are significantly related.

Interpretation: Website type (E-commerce, Travel, Media etc.) influences performance behavior.

โ˜๏ธ Real Cloud Computing Use Cases

Metric Cloud Use Case
Load Time Auto Scaling Decisions
Response Time Latency Monitoring
Throughput Load Balancer Analysis
Performance_Label SLA Monitoring
Category Capacity Planning

โœ… Final Conclusion

This project successfully performed end-to-end statistical analysis on a website performance dataset using Python. The dataset was cleaned, visualized, and tested using T-Test, ANOVA, and Chi-Square methods.