Contributing

We welcome contributions to The Data Packet! This guide will help you get started with contributing to the project.

Development Setup

Prerequisites

  • Python 3.9 or higher

  • Git

  • Hatch (recommended for dependency management)

Installation

  1. Fork and Clone the Repository

    git clone https://github.com/YourUsername/The-Data-Packet.git
    cd The-Data-Packet
    
  2. Install Hatch (if not already installed)

    pip install hatch
    
  3. Create Development Environment

    hatch env create dev
    
  4. Install Pre-commit Hooks (optional but recommended)

    hatch run dev:pre-commit install
    

Development Workflow

Code Quality

The project maintains high code quality standards:

# Run all quality checks
hatch run dev:python -m black the_data_packet tests
hatch run dev:python -m isort the_data_packet tests
hatch run dev:python -m mypy --exclude 'docs' the_data_packet

Testing

# Run all tests
hatch test

# Run with coverage
hatch run test-cov

# Run specific test file
hatch run dev:python -m unittest tests.test_models -v

Building Documentation

# Build documentation
hatch run dev:docs

# Serve documentation locally
hatch run dev:docs-serve

Complete Build Pipeline

# Run the full build pipeline (code quality, tests, docs, packaging)
bb  # This is an alias for the complete build process

Making Changes

Code Style Guidelines

  • PEP 8 Compliance: Code must follow PEP 8 style guidelines

  • Type Hints: All public APIs must have complete type annotations

  • Docstrings: All public classes and methods must have comprehensive docstrings

  • Import Sorting: Use isort for consistent import organization

Example Function:

def extract_article_data(soup: BeautifulSoup, url: str) -> ArticleData:
    """
    Extract article data from a BeautifulSoup object.

    Args:
        soup: Parsed HTML content
        url: The article URL

    Returns:
        ArticleData object with extracted information

    Raises:
        ValueError: If the HTML structure is invalid
        RuntimeError: If extraction fails
    """
    # Implementation here

Testing Requirements

  • New Features: Must include comprehensive tests

  • Bug Fixes: Must include regression tests

  • Coverage: Maintain or improve test coverage

  • Documentation: Update documentation for API changes

Commit Guidelines

Use clear, descriptive commit messages:

feat: add support for extracting article publication dates

- Parse publication dates from meta tags and JSON-LD
- Add date validation and formatting
- Include tests for various date formats

Fixes #123

Types of Contributions

Bug Reports

When reporting bugs, please include:

  • Clear Description: What happened vs what you expected

  • Reproduction Steps: Minimal example to reproduce the issue

  • Environment: Python version, OS, package version

  • Error Messages: Full traceback if applicable

Feature Requests

For new features:

  • Use Case: Describe the problem this solves

  • Proposed Solution: How you envision it working

  • Alternatives: Other approaches you considered

  • Breaking Changes: Whether this affects existing APIs

Code Contributions

  1. Check Issues: Look for existing issues or create a new one

  2. Create Branch: Use descriptive branch names (feature/add-date-parsing)

  3. Write Tests: Include comprehensive tests for your changes

  4. Update Documentation: Add or update relevant documentation

  5. Submit PR: Create a pull request with detailed description

Pull Request Process

PR Requirements

Before submitting a pull request:

  • [ ] Tests pass locally (hatch test)

  • [ ] Code follows style guidelines (black, isort, mypy)

  • [ ] Documentation is updated

  • [ ] Commit messages are clear and descriptive

  • [ ] PR description explains the changes

Review Process

  1. Automated Checks: CI pipeline runs tests and quality checks

  2. Code Review: Maintainers review the code for quality and design

  3. Feedback: Address any requested changes

  4. Approval: Once approved, maintainers will merge the PR

Release Process

The project follows semantic versioning (SemVer):

  • Major: Breaking changes to public APIs

  • Minor: New features without breaking changes

  • Patch: Bug fixes and internal improvements

Project Structure

The-Data-Packet/
├── the_data_packet/          # Main package source code
│   ├── __init__.py           # Package initialization and public API
│   ├── models/               # Data models
│   ├── clients/              # HTTP and RSS clients
│   ├── extractors/           # Content extraction logic
│   ├── scrapers/             # Main scraper orchestration
│   └── cli.py                # Command-line interface
├── tests/                    # Test suite
│   ├── test_*.py             # Test modules
│   └── conftest.py           # Shared test configuration
├── docs/                     # Documentation
│   └── source/               # Sphinx documentation source
├── pyproject.toml            # Project configuration
└── README.md                 # Project overview

Key Files

  • pyproject.toml: Project metadata, dependencies, and tool configuration

  • the_data_packet/__init__.py: Public API exports

  • tests/conftest.py: Shared test fixtures and configuration

  • docs/source/conf.py: Sphinx documentation configuration

Architecture Guidelines

Design Principles

  • Single Responsibility: Each class has a clear, focused purpose

  • Dependency Injection: Components accept dependencies via constructor

  • Error Handling: Graceful degradation with informative error messages

  • Resource Management: Proper cleanup of network resources

  • Type Safety: Complete type annotations for better maintainability

Component Boundaries

  • Models: Pure data containers with validation

  • Clients: External service interactions (HTTP, RSS)

  • Extractors: HTML parsing and content extraction

  • Scrapers: High-level orchestration of components

  • CLI: User interface and argument handling

Getting Help

If you need help with development:

  • Documentation: Check the API documentation first

  • Issues: Search existing issues for similar questions

  • Discussions: Use GitHub Discussions for general questions

  • Code Examples: Look at the test suite for usage examples

Thank you for contributing to The Data Packet! 🎉