Swansea Harriers Analytics Platform
Comprehensive athlete performance analytics system integrating Power of 10 API data with custom Python data processing, SQL database architecture, and intuitive management interfaces for coaches and team managers.
TECHNOLOGY STACK
Project Overview
Challenge
Swansea Harriers needed a comprehensive system to track athlete performance, analyze training progress, and provide coaches with actionable insights. The existing manual processes were time-consuming and limited analytical capabilities.
Solution Architecture
- • Power of 10 API integration for official race results
- • Python data cleaning and transformation pipeline
- • SQLite database for performance data storage
- • Flask web application with coach/manager interfaces
- • Automated data synchronization and validation
System Capabilities
Data Integration & Processing
The system integrates with the Power of 10 rankings database, which contains official race results for UK athletics. Raw data requires significant cleaning and normalization before it can be used for meaningful analysis.
Power of 10 API Integration
Pythonimport requests
import pandas as pd
from datetime import datetime, timedelta
class PowerOf10Connector:
def __init__(self, api_base="https://www.thepowerof10.info/api/"):
self.api_base = api_base
self.session = requests.Session()
def get_athlete_results(self, athlete_id, years_back=5):
"""Fetch athlete results with data validation"""
end_date = datetime.now()
start_date = end_date - timedelta(days=365 * years_back)
params = {
'athlete_id': athlete_id,
'start_date': start_date.strftime('%Y-%m-%d'),
'end_date': end_date.strftime('%Y-%m-%d')
}
try:
response = self.session.get(
f"{self.api_base}athlete/results",
params=params,
timeout=30
)
response.raise_for_status()
raw_data = response.json()
return self._clean_results_data(raw_data)
except requests.exceptions.RequestException as e:
self.logger.error(f"API request failed: {e}")
return []
def _clean_results_data(self, raw_data):
"""Clean and normalize race result data"""
cleaned_results = []
for result in raw_data.get('results', []):
# Normalize performance times
performance = self._parse_performance(result.get('performance'))
# Validate event classification
event = self._standardize_event_name(result.get('event'))
# Extract venue and competition info
venue_info = self._parse_venue_data(result.get('venue'))
if performance and event:
cleaned_results.append({
'athlete_id': result.get('athlete_id'),
'performance_seconds': performance,
'event': event,
'date': result.get('date'),
'venue': venue_info.get('name'),
'competition': result.get('competition'),
'wind_speed': result.get('wind'),
'position': result.get('position')
})
return cleaned_resultsData Processing Pipeline
Data Cleaning Challenges
Performance Format Variations
Race times in different formats (MM:SS.ss, H:MM:SS, etc.) requiring standardization to seconds.
Event Name Inconsistencies
Multiple naming conventions for the same event across different competitions and venues.
Data Quality Issues
Missing wind readings, incomplete venue information, and duplicate result entries.
Performance Time Normalization
Pythondef _parse_performance(self, perf_string):
"""Convert various time formats to seconds"""
if not perf_string:
return None
# Remove common suffixes and clean
clean_perf = perf_string.strip().upper()
clean_perf = re.sub(r'[^0-9:.]', '', clean_perf)
try:
# Handle different time formats
if ':' in clean_perf:
parts = clean_perf.split(':')
if len(parts) == 2: # MM:SS.ss
minutes = float(parts[0])
seconds = float(parts[1])
return minutes * 60 + seconds
elif len(parts) == 3: # H:MM:SS.ss
hours = float(parts[0])
minutes = float(parts[1])
seconds = float(parts[2])
return hours * 3600 + minutes * 60 + seconds
else: # Pure seconds
return float(clean_perf)
except (ValueError, IndexError):
self.logger.warning(f"Could not parse performance: {perf_string}")
return None
return NoneDatabase Architecture
Schema Design
- • Athletes Table: Personal details, categories, club membership
- • Results Table: Performance data with event classification
- • Events Table: Standardized event definitions and categories
- • Competitions Table: Meet information and venue details
- • Personal Bests: Calculated PB tracking by event
- • Season Bests: Annual performance tracking
Database Schema (SQLite)
SQL-- Athletes table with comprehensive tracking
CREATE TABLE athletes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
po10_id VARCHAR(20) UNIQUE,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
date_of_birth DATE,
gender VARCHAR(1) CHECK (gender IN ('M', 'F')),
category VARCHAR(10), -- U17, U20, Senior, etc.
club_membership_start DATE,
is_active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Results with comprehensive performance data
CREATE TABLE results (
id INTEGER PRIMARY KEY AUTOINCREMENT,
athlete_id INTEGER REFERENCES athletes(id),
event_id INTEGER REFERENCES events(id),
competition_id INTEGER REFERENCES competitions(id),
performance_seconds REAL NOT NULL,
wind_speed REAL,
position INTEGER,
is_personal_best BOOLEAN DEFAULT FALSE,
is_season_best BOOLEAN DEFAULT FALSE,
result_date DATE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Indexed for performance analysis queries
CREATE INDEX idx_results_athlete_event ON results(athlete_id, event_id);
CREATE INDEX idx_results_date ON results(result_date);
CREATE INDEX idx_results_performance ON results(performance_seconds);Management Interface
The Flask web application provides coaches and team managers with intuitive interfaces for athlete monitoring, performance analysis, and training planning. The system automatically calculates personal bests, season bests, and performance trends.
Key Features
Athlete Profiles
Comprehensive athlete pages showing PBs, SBs, recent results, and performance progression charts.
Team Analysis
Group performance summaries, team rankings, and comparative analysis across age categories.
Performance Trends
Visual progression tracking with seasonal comparisons and improvement identification.
Competition Planning
Entry standard tracking, qualification monitoring, and target time setting tools.
Technical Implementation
Architecture Decisions
SQLite Database
Chosen for simplicity and portability while providing full SQL capabilities for complex analytical queries.
Flask Framework
Lightweight web framework ideal for internal tools with rapid development and easy deployment.
pandas Processing
Powerful data manipulation capabilities for cleaning, transforming, and analyzing performance data.
System Benefits
- • Automated data synchronization reduces manual entry
- • Historical trend analysis identifies training patterns
- • Performance benchmarking across age groups
- • Competition readiness assessment tools
- • Injury risk indicators through performance drops
- • Training load optimization recommendations
Performance Analytics
The system provides advanced analytics capabilities that help coaches make data-driven decisions about training programs, competition scheduling, and athlete development strategies.
Performance Analysis Functions
Pythondef calculate_performance_trends(athlete_id, event, months=12):
"""Calculate performance trends and improvement rates"""
query = """
SELECT result_date, performance_seconds
FROM results r
JOIN athletes a ON r.athlete_id = a.id
JOIN events e ON r.event_id = e.id
WHERE a.id = ? AND e.name = ?
AND result_date >= date('now', '-{} months')
ORDER BY result_date
""".format(months)
results = db.execute(query, [athlete_id, event]).fetchall()
if len(results) < 3:
return None
# Convert to pandas for analysis
df = pd.DataFrame(results, columns=['date', 'performance'])
df['date'] = pd.to_datetime(df['date'])
# Calculate trend line
x = (df['date'] - df['date'].min()).dt.days
z = np.polyfit(x, df['performance'], 1)
trend_slope = z[0] # seconds per day
# Performance improvement rate
annual_improvement = trend_slope * -365 # negative slope = improvement
# Recent form (last 3 months vs previous period)
recent_avg = df[df['date'] >= df['date'].max() - pd.Timedelta(days=90)]['performance'].mean()
previous_avg = df[df['date'] < df['date'].max() - pd.Timedelta(days=90)]['performance'].mean()
return {
'trend_slope': trend_slope,
'annual_improvement_seconds': annual_improvement,
'recent_form_change': previous_avg - recent_avg,
'consistency_score': 1 / df['performance'].std() if df['performance'].std() > 0 else 0
}