
Wired

the_data_packet.sources.wired

Wired.com article source implementation.

This module collects articles from Wired.com using RSS feeds and web scraping. Wired.com publishes per-category RSS feeds that list recent article URLs; each URL is then scraped for the full article text.

Features
  • RSS feed-based article discovery
  • Multiple category support (security, guides, business, science, AI)
  • Robust content extraction with fallback methods
  • Content cleaning and validation
  • Error handling for network issues and malformed content
RSS Feed Strategy
  1. Fetch category-specific RSS feed
  2. Parse feed to extract article URLs
  3. Scrape individual articles for full content
  4. Clean and validate extracted content
  5. Return standardized Article objects
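Steps 2 and 3 depend on turning a feed into a list of article URLs. Below is a minimal sketch of that parsing step. It assumes standard RSS 2.0 markup (`<item>` elements with a `<link>` child) and uses the standard library's `xml.etree.ElementTree` rather than whatever parser the module actually uses. `extract_article_urls` and the sample feed with its placeholder URLs are hypothetical, and fetching the feed over the network is left out.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_article_urls(rss_xml: str) -> List[str]:
    """Return the <link> of every <item> in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_xml)
    urls: List[str] = []
    # RSS 2.0 nests entries as <rss><channel><item>; iter() finds them at any depth.
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            urls.append(link.strip())
    return urls


# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><link>https://www.wired.com/story/a/</link></item>
<item><title>B</title><link>https://www.wired.com/story/b/</link></item>
</channel></rss>"""

urls = extract_article_urls(SAMPLE_FEED)
```

Because URLs come back in feed order, taking the first one yields the newest article only if the feed lists newest items first. That ordering is typical for RSS but is an assumption here.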
Content Extraction
  • Primary: Article body containers and paragraph tags
  • Fallback: Main content areas and text containers
  • Cleaning: Remove navigation, ads, and boilerplate text
  • Validation: Ensure sufficient content length
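The primary/fallback order, cleaning, and length check could look roughly like the sketch below. It uses the standard library's `html.parser` instead of whatever HTML library the module relies on. Several details are assumptions, not Wired's real markup or the module's real settings: the selectors (`<p>` inside `<article>` as primary, `<p>` inside `<main>` as fallback), the ignored containers (`<nav>`, `<aside>`, `<footer>`, `<script>`, `<style>`), and the 200-character `min_length` default. `ParagraphCollector` and `extract_content` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Assumed boilerplate containers; not taken from Wired's actual markup.
IGNORED_TAGS = {"nav", "aside", "footer", "script", "style"}


class ParagraphCollector(HTMLParser):
    """Collects <p> text, noting whether it sits inside <article> or only inside <main>."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = {"article": 0, "main": 0}
        self.ignored_depth = 0
        self.in_p = False
        self._parts: List[str] = []
        self.article_paras: List[str] = []
        self.main_paras: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.depth:
            self.depth[tag] += 1
        elif tag in IGNORED_TAGS:
            self.ignored_depth += 1
        elif tag == "p":
            self.in_p = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1
        elif tag in IGNORED_TAGS and self.ignored_depth > 0:
            self.ignored_depth -= 1
        elif tag == "p" and self.in_p:
            self.in_p = False
            # Collapse runs of whitespace left over from the HTML source.
            text = " ".join("".join(self._parts).split())
            if not text or self.ignored_depth:
                return  # empty paragraph, or one inside navigation/ads
            if self.depth["article"]:
                self.article_paras.append(text)
            elif self.depth["main"]:
                self.main_paras.append(text)

    def handle_data(self, data):
        # Text of nested inline tags (e.g. <a>) is also captured while inside <p>.
        if self.in_p:
            self._parts.append(data)


def extract_content(html: str, min_length: int = 200) -> Optional[str]:
    """Return cleaned article text, or None if it is too short to be a real article."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    # Primary: paragraphs inside <article>; fallback: paragraphs inside <main>.
    paragraphs = parser.article_paras or parser.main_paras
    content = "\n\n".join(paragraphs)
    return content if len(content) >= min_length else None
```

Returning `None` instead of raising lets a caller move on to the next URL when a page yields too little text.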
Supported Categories
  • security: Security and cybersecurity articles
  • guide: How-to guides and tutorials
  • business: Business and industry news
  • science: Science and technology research
  • ai: Artificial intelligence and machine learning
Rate Limiting
  • Respectful delays between requests
  • Connection reuse via HTTP session
  • Proper User-Agent identification
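The first and third points can be illustrated with the standard library. Below is a minimal sketch that assumes a fixed minimum interval between requests. The 1-second default, the User-Agent string, and the `PoliteFetcher` class are hypothetical. The sketch does not reproduce connection reuse, because `urllib.request` opens a new connection per call, whereas a session object such as `requests.Session` pools connections.

```python
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical User-Agent string; the module's real value is not documented here.
USER_AGENT = "the-data-packet-example/0.1"


class PoliteFetcher:
    """Keeps requests at least `min_interval` seconds apart and sends a fixed User-Agent."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so the delay logic can be tested without waiting
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep only as long as needed since the previous request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

    def build_request(self, url: str) -> urllib.request.Request:
        return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

    def fetch(self, url: str, timeout: float = 10.0) -> bytes:
        """Throttled GET; urllib opens a new connection for each call."""
        self.wait()
        with urllib.request.urlopen(self.build_request(url), timeout=timeout) as resp:
            return resp.read()
```

Injecting the clock and sleep functions lets the delay logic be checked without real waiting.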
Example Usage

    from the_data_packet.sources.wired import WiredSource

    source = WiredSource()

    # Get latest security article
    article = source.get_latest_article("security")

    # Get multiple guide articles
    articles = source.get_multiple_articles("guide", count=3)

    # Check supported categories
    if "ai" in source.supported_categories:
        ai_articles = source.get_multiple_articles("ai", count=5)

logger = get_logger(__name__) module-attribute

WiredSource

Article source for Wired.com.

http_client = HTTPClient() instance-attribute

RSS_FEEDS = {
    'security': 'https://www.wired.com/feed/category/security/latest/rss',
    'science': 'https://www.wired.com/feed/category/science/latest/rss',
    'ai': 'https://www.wired.com/feed/tag/ai/latest/rss',
} class-attribute instance-attribute
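A category name has to be mapped to one of these feed URLs before anything is fetched. Below is one possible sketch of that lookup, which copies the three entries shown above. The helper names `feed_url_for` and `supported_categories` are hypothetical. The case-insensitive matching and raising `ValueError` for unknown names are also assumptions, not the module's documented behavior.

```python
from typing import Dict, List

# Copied from the RSS_FEEDS attribute documented above.
RSS_FEEDS: Dict[str, str] = {
    "security": "https://www.wired.com/feed/category/security/latest/rss",
    "science": "https://www.wired.com/feed/category/science/latest/rss",
    "ai": "https://www.wired.com/feed/tag/ai/latest/rss",
}


def supported_categories() -> List[str]:
    """Derive the category list from the feed table so the two cannot drift apart."""
    return list(RSS_FEEDS)


def feed_url_for(category: str) -> str:
    """Resolve a category to its feed URL, ignoring surrounding whitespace and case."""
    key = category.strip().lower()
    try:
        return RSS_FEEDS[key]
    except KeyError:
        raise ValueError(
            f"Unsupported category {category!r}; expected one of {sorted(RSS_FEEDS)}"
        ) from None
```

Under this lookup, only keys present in `RSS_FEEDS` resolve, and any other name raises before a network request is made.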

SKIP_PATTERNS = ['subscribe to wired', 'most popular', 'related stories', 'advertisement', 'get wired', 'sign up', 'newsletter'] class-attribute instance-attribute
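These patterns suggest a paragraph-level filter. Below is a sketch that assumes case-insensitive substring matching; the module may match differently. `clean_paragraphs` is a hypothetical helper.

```python
from typing import Iterable, List

# Copied from the SKIP_PATTERNS attribute documented above.
SKIP_PATTERNS = ["subscribe to wired", "most popular", "related stories",
                 "advertisement", "get wired", "sign up", "newsletter"]


def clean_paragraphs(paragraphs: Iterable[str]) -> List[str]:
    """Drop empty paragraphs and any paragraph containing a skip pattern."""
    kept: List[str] = []
    for para in paragraphs:
        text = para.strip()
        lowered = text.lower()  # patterns are lowercase, so compare case-insensitively
        if text and not any(pattern in lowered for pattern in SKIP_PATTERNS):
            kept.append(text)
    return kept
```

Substring matching is simple but blunt. A genuine sentence such as "Users must sign up for the beta" would also be dropped, so stricter matching would trade recall for precision.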

name: str property

Source name identifier.

supported_categories: List[str] property

List of supported categories.

__init__() -> None

Initialize Wired source.

get_latest_article(category: str) -> Article

Get the latest article from a category.

get_multiple_articles(category: str, count: int) -> List[Article]

Get multiple articles from a category.
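The sketch below shows how the two retrieval methods might work together with the error handling described under Features. It scrapes feed URLs in order, skips any URL that fails or yields no content, and stops once enough articles are collected. These are free functions that take the scraper as a callable, not the real methods, which take a category. `Article` here is a hypothetical stand-in for the project's model. Treating `OSError` (which covers `urllib.error.URLError`) and `ValueError` as recoverable is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Article:  # hypothetical stand-in for the project's Article model
    url: str
    content: str


Scraper = Callable[[str], Optional[Article]]


def get_multiple_articles(urls: List[str], count: int, scrape: Scraper) -> List[Article]:
    """Scrape URLs in feed order until `count` articles succeed; skip failures."""
    if count < 1:
        raise ValueError("count must be at least 1")
    articles: List[Article] = []
    for url in urls:
        try:
            article = scrape(url)
        except (OSError, ValueError):
            continue  # network error or malformed page: move on to the next URL
        if article is not None:
            articles.append(article)
        if len(articles) == count:
            break
    return articles


def get_latest_article(urls: List[str], scrape: Scraper) -> Article:
    """Return the first URL that scrapes successfully, assuming newest-first feed order."""
    found = get_multiple_articles(urls, 1, scrape)
    if not found:
        raise RuntimeError("no article could be extracted from the feed")
    return found[0]
```

Passing the scraper in keeps the selection logic testable without network access. `get_latest_article` reuses the multi-article path with a count of 1.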