Contributing¶
We welcome contributions to The Data Packet! This guide will help you get started with contributing to the project.
Development Setup¶
Prerequisites¶
Python 3.9 or higher
Git
Hatch (recommended for dependency management)
Installation¶
Fork and Clone the Repository
git clone https://github.com/YourUsername/The-Data-Packet.git
cd The-Data-Packet
Install Hatch (if not already installed)
pip install hatch
Create Development Environment
hatch env create dev
Install Pre-commit Hooks (optional but recommended)
hatch run dev:pre-commit install
Development Workflow¶
Code Quality¶
The project maintains high code quality standards:
# Run all quality checks
hatch run dev:python -m black the_data_packet tests
hatch run dev:python -m isort the_data_packet tests
hatch run dev:python -m mypy --exclude 'docs' the_data_packet
Testing¶
# Run all tests
hatch test
# Run with coverage
hatch run test-cov
# Run specific test file
hatch run dev:python -m unittest tests.test_models -v
Building Documentation¶
# Build documentation
hatch run dev:docs
# Serve documentation locally
hatch run dev:docs-serve
Complete Build Pipeline¶
# Run the full build pipeline (code quality, tests, docs, packaging)
bb # This is an alias for the complete build process
Making Changes¶
Code Style Guidelines¶
PEP 8 Compliance: Code must follow PEP 8 style guidelines
Type Hints: All public APIs must have complete type annotations
Docstrings: All public classes and methods must have comprehensive docstrings
Import Sorting: Use isort for consistent import organization
Example Function:
from bs4 import BeautifulSoup

from the_data_packet.models import ArticleData  # illustrative import path


def extract_article_data(soup: BeautifulSoup, url: str) -> ArticleData:
    """
    Extract article data from a BeautifulSoup object.

    Args:
        soup: Parsed HTML content
        url: The article URL

    Returns:
        ArticleData object with extracted information

    Raises:
        ValueError: If the HTML structure is invalid
        RuntimeError: If extraction fails
    """
    # Implementation here
    ...
Testing Requirements¶
New Features: Must include comprehensive tests
Bug Fixes: Must include regression tests (see the sketch after this list)
Coverage: Maintain or improve test coverage
Documentation: Update documentation for API changes
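The project runs its tests with unittest (see the commands above). Below is a minimal sketch of a regression test; parse_publication_date is a hypothetical helper standing in for real extractor code, and the bug it pins down is invented for illustration:

import unittest


def parse_publication_date(raw: str) -> str:
    """Hypothetical helper standing in for a real extractor function."""
    # Normalise an ISO 8601 timestamp to a plain date string.
    return raw.split("T")[0]


class TestPublicationDateRegression(unittest.TestCase):
    """Pin the exact input that triggered the (hypothetical) original bug."""

    def test_iso_timestamp_is_normalised_to_date(self):
        self.assertEqual(
            parse_publication_date("2024-01-15T09:30:00Z"), "2024-01-15"
        )


if __name__ == "__main__":
    unittest.main()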
Commit Guidelines¶
Use clear, descriptive commit messages:
feat: add support for extracting article publication dates
- Parse publication dates from meta tags and JSON-LD
- Add date validation and formatting
- Include tests for various date formats
Fixes #123
Types of Contributions¶
Bug Reports¶
When reporting bugs, please include:
Clear Description: What happened vs what you expected
Reproduction Steps: Minimal example to reproduce the issue
Environment: Python version, OS, package version
Error Messages: Full traceback if applicable
Feature Requests¶
For new features:
Use Case: Describe the problem this solves
Proposed Solution: How you envision it working
Alternatives: Other approaches you considered
Breaking Changes: Whether this affects existing APIs
Code Contributions¶
Check Issues: Look for existing issues or create a new one
Create Branch: Use descriptive branch names (feature/add-date-parsing)
Write Tests: Include comprehensive tests for your changes
Update Documentation: Add or update relevant documentation
Submit PR: Create a pull request with detailed description
Pull Request Process¶
PR Requirements¶
Before submitting a pull request:
[ ] Tests pass locally (hatch test)
[ ] Code follows style guidelines (black, isort, mypy)
[ ] Documentation is updated
[ ] Commit messages are clear and descriptive
[ ] PR description explains the changes
Review Process¶
Automated Checks: CI pipeline runs tests and quality checks
Code Review: Maintainers review the code for quality and design
Feedback: Address any requested changes
Approval: Once approved, maintainers will merge the PR
Release Process¶
The project follows semantic versioning (SemVer):
Major: Breaking changes to public APIs
Minor: New features without breaking changes
Patch: Bug fixes and internal improvements
Project Structure¶
The-Data-Packet/
├── the_data_packet/ # Main package source code
│ ├── __init__.py # Package initialization and public API
│ ├── models/ # Data models
│ ├── clients/ # HTTP and RSS clients
│ ├── extractors/ # Content extraction logic
│ ├── scrapers/ # Main scraper orchestration
│ └── cli.py # Command-line interface
├── tests/ # Test suite
│ ├── test_*.py # Test modules
│ └── conftest.py # Shared test configuration
├── docs/ # Documentation
│ └── source/ # Sphinx documentation source
├── pyproject.toml # Project configuration
└── README.md # Project overview
Key Files¶
pyproject.toml: Project metadata, dependencies, and tool configuration
the_data_packet/__init__.py: Public API exports (see the sketch below)
tests/conftest.py: Shared test fixtures and configuration
docs/source/conf.py: Sphinx documentation configuration
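As a rough sketch of what the public API re-exports in __init__.py look like (the class names and import paths here are illustrative, not the actual exports):

# the_data_packet/__init__.py -- illustrative sketch; actual exports may differ
from the_data_packet.models import ArticleData
from the_data_packet.scrapers import ArticleScraper  # hypothetical class name

__all__ = ["ArticleData", "ArticleScraper"]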
Architecture Guidelines¶
Design Principles¶
Single Responsibility: Each class has a clear, focused purpose
Dependency Injection: Components accept dependencies via constructor (sketched after this list)
Error Handling: Graceful degradation with informative error messages
Resource Management: Proper cleanup of network resources
Type Safety: Complete type annotations for better maintainability
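A minimal sketch of the dependency-injection principle; the HttpClient and ArticleScraper classes below are made up for illustration and do not reflect the project's actual interfaces:

from dataclasses import dataclass


class HttpClient:
    """Stand-in for an HTTP client interface (hypothetical)."""

    def get(self, url: str) -> str:
        raise NotImplementedError


@dataclass
class ArticleScraper:
    """Receives its client through the constructor rather than creating one."""

    client: HttpClient

    def fetch(self, url: str) -> str:
        # Delegating to an injected client keeps network concerns out of the
        # scraper and makes it easy to test with a fake implementation.
        return self.client.get(url)


class FakeClient(HttpClient):
    """Test double that avoids any network access."""

    def get(self, url: str) -> str:
        return "<html><title>stub</title></html>"


scraper = ArticleScraper(client=FakeClient())
print(scraper.fetch("https://example.com"))

Because the scraper never constructs its own client, tests can substitute FakeClient without patching or touching the network.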
Component Boundaries¶
Models: Pure data containers with validation (see the example after this list)
Clients: External service interactions (HTTP, RSS)
Extractors: HTML parsing and content extraction
Scrapers: High-level orchestration of components
CLI: User interface and argument handling
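For example, a model in this layout holds and validates data but never performs I/O or HTML parsing; the Article class and its fields below are illustrative only:

from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Illustrative model: data plus validation, no I/O or parsing."""

    title: str
    url: str

    def __post_init__(self) -> None:
        # Validation lives with the data; fetching belongs to clients and
        # parsing belongs to extractors.
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"Invalid URL: {self.url}")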
Getting Help¶
If you need help with development:
Documentation: Check the API documentation first
Issues: Search existing issues for similar questions
Discussions: Use GitHub Discussions for general questions
Code Examples: Look at the test suite for usage examples
Thank you for contributing to The Data Packet! 🎉