Sports AnalyticsProduction

Swansea Harriers Analytics Platform

Comprehensive athlete performance analytics system integrating Power of 10 API data with custom Python data processing, SQL database architecture, and intuitive management interfaces for coaches and team managers.

TECHNOLOGY STACK

PythonFlaskSQLitePower of 10 APIpandasHTML/CSS/JavaScriptChart.jsREST API

View Repository

Project Overview

Challenge

Swansea Harriers needed a comprehensive system to track athlete performance, analyze training progress, and provide coaches with actionable insights. The existing manual processes were time-consuming and limited analytical capabilities.

Solution Architecture

• Power of 10 API integration for official race results
• Python data cleaning and transformation pipeline
• SQLite database for performance data storage
• Flask web application with coach/manager interfaces
• Automated data synchronization and validation

System Capabilities

500+

Athletes tracked across all categories

15+ Events

Track & field disciplines monitored

5 Years

Historical performance data

Data Integration & Processing

The system integrates with the Power of 10 rankings database, which contains official race results for UK athletics. Raw data requires significant cleaning and normalization before it can be used for meaningful analysis.

Power of 10 API Integration

Python

import requests
import pandas as pd
from datetime import datetime, timedelta

class PowerOf10Connector:
    def __init__(self, api_base="https://www.thepowerof10.info/api/"):
        self.api_base = api_base
        self.session = requests.Session()
        
    def get_athlete_results(self, athlete_id, years_back=5):
        """Fetch athlete results with data validation"""
        end_date = datetime.now()
        start_date = end_date - timedelta(days=365 * years_back)
        
        params = {
            'athlete_id': athlete_id,
            'start_date': start_date.strftime('%Y-%m-%d'),
            'end_date': end_date.strftime('%Y-%m-%d')
        }
        
        try:
            response = self.session.get(
                f"{self.api_base}athlete/results", 
                params=params,
                timeout=30
            )
            response.raise_for_status()
            
            raw_data = response.json()
            return self._clean_results_data(raw_data)
            
        except requests.exceptions.RequestException as e:
            self.logger.error(f"API request failed: {e}")
            return []
    
    def _clean_results_data(self, raw_data):
        """Clean and normalize race result data"""
        cleaned_results = []
        
        for result in raw_data.get('results', []):
            # Normalize performance times
            performance = self._parse_performance(result.get('performance'))
            
            # Validate event classification
            event = self._standardize_event_name(result.get('event'))
            
            # Extract venue and competition info
            venue_info = self._parse_venue_data(result.get('venue'))
            
            if performance and event:
                cleaned_results.append({
                    'athlete_id': result.get('athlete_id'),
                    'performance_seconds': performance,
                    'event': event,
                    'date': result.get('date'),
                    'venue': venue_info.get('name'),
                    'competition': result.get('competition'),
                    'wind_speed': result.get('wind'),
                    'position': result.get('position')
                })
                
        return cleaned_results

Data Processing Pipeline

Data Cleaning Challenges

Performance Format Variations

Race times in different formats (MM:SS.ss, H:MM:SS, etc.) requiring standardization to seconds.

Event Name Inconsistencies

Multiple naming conventions for the same event across different competitions and venues.

Data Quality Issues

Missing wind readings, incomplete venue information, and duplicate result entries.

Performance Time Normalization

Python

def _parse_performance(self, perf_string):
    """Convert various time formats to seconds"""
    if not perf_string:
        return None
        
    # Remove common suffixes and clean
    clean_perf = perf_string.strip().upper()
    clean_perf = re.sub(r'[^0-9:.]', '', clean_perf)
    
    try:
        # Handle different time formats
        if ':' in clean_perf:
            parts = clean_perf.split(':')
            
            if len(parts) == 2:  # MM:SS.ss
                minutes = float(parts[0])
                seconds = float(parts[1])
                return minutes * 60 + seconds
                
            elif len(parts) == 3:  # H:MM:SS.ss
                hours = float(parts[0])
                minutes = float(parts[1])
                seconds = float(parts[2])
                return hours * 3600 + minutes * 60 + seconds
                
        else:  # Pure seconds
            return float(clean_perf)
            
    except (ValueError, IndexError):
        self.logger.warning(f"Could not parse performance: {perf_string}")
        return None
        
    return None

Database Architecture

Schema Design

• Athletes Table: Personal details, categories, club membership
• Results Table: Performance data with event classification
• Events Table: Standardized event definitions and categories
• Competitions Table: Meet information and venue details
• Personal Bests: Calculated PB tracking by event
• Season Bests: Annual performance tracking

Database Schema (SQLite)

SQL

-- Athletes table with comprehensive tracking
CREATE TABLE athletes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    po10_id VARCHAR(20) UNIQUE,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    date_of_birth DATE,
    gender VARCHAR(1) CHECK (gender IN ('M', 'F')),
    category VARCHAR(10),  -- U17, U20, Senior, etc.
    club_membership_start DATE,
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Results with comprehensive performance data
CREATE TABLE results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    athlete_id INTEGER REFERENCES athletes(id),
    event_id INTEGER REFERENCES events(id),
    competition_id INTEGER REFERENCES competitions(id),
    performance_seconds REAL NOT NULL,
    wind_speed REAL,
    position INTEGER,
    is_personal_best BOOLEAN DEFAULT FALSE,
    is_season_best BOOLEAN DEFAULT FALSE,
    result_date DATE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Indexed for performance analysis queries
CREATE INDEX idx_results_athlete_event ON results(athlete_id, event_id);
CREATE INDEX idx_results_date ON results(result_date);
CREATE INDEX idx_results_performance ON results(performance_seconds);

Management Interface

The Flask web application provides coaches and team managers with intuitive interfaces for athlete monitoring, performance analysis, and training planning. The system automatically calculates personal bests, season bests, and performance trends.

Key Features

Athlete Profiles

Comprehensive athlete pages showing PBs, SBs, recent results, and performance progression charts.

Team Analysis

Group performance summaries, team rankings, and comparative analysis across age categories.

Performance Trends

Visual progression tracking with seasonal comparisons and improvement identification.

Competition Planning

Entry standard tracking, qualification monitoring, and target time setting tools.

Technical Implementation

Architecture Decisions

SQLite Database

Chosen for simplicity and portability while providing full SQL capabilities for complex analytical queries.

Flask Framework

Lightweight web framework ideal for internal tools with rapid development and easy deployment.

pandas Processing

Powerful data manipulation capabilities for cleaning, transforming, and analyzing performance data.

System Benefits

• Automated data synchronization reduces manual entry
• Historical trend analysis identifies training patterns
• Performance benchmarking across age groups
• Competition readiness assessment tools
• Injury risk indicators through performance drops
• Training load optimization recommendations

Performance Analytics

The system provides advanced analytics capabilities that help coaches make data-driven decisions about training programs, competition scheduling, and athlete development strategies.

Performance Analysis Functions

Python

def calculate_performance_trends(athlete_id, event, months=12):
    """Calculate performance trends and improvement rates"""
    query = """
        SELECT result_date, performance_seconds
        FROM results r
        JOIN athletes a ON r.athlete_id = a.id
        JOIN events e ON r.event_id = e.id
        WHERE a.id = ? AND e.name = ?
        AND result_date >= date('now', '-{} months')
        ORDER BY result_date
    """.format(months)
    
    results = db.execute(query, [athlete_id, event]).fetchall()
    
    if len(results) < 3:
        return None
        
    # Convert to pandas for analysis
    df = pd.DataFrame(results, columns=['date', 'performance'])
    df['date'] = pd.to_datetime(df['date'])
    
    # Calculate trend line
    x = (df['date'] - df['date'].min()).dt.days
    z = np.polyfit(x, df['performance'], 1)
    trend_slope = z[0]  # seconds per day
    
    # Performance improvement rate
    annual_improvement = trend_slope * -365  # negative slope = improvement
    
    # Recent form (last 3 months vs previous period)
    recent_avg = df[df['date'] >= df['date'].max() - pd.Timedelta(days=90)]['performance'].mean()
    previous_avg = df[df['date'] < df['date'].max() - pd.Timedelta(days=90)]['performance'].mean()
    
    return {
        'trend_slope': trend_slope,
        'annual_improvement_seconds': annual_improvement,
        'recent_form_change': previous_avg - recent_avg,
        'consistency_score': 1 / df['performance'].std() if df['performance'].std() > 0 else 0
    }