Watchdog is composed of several specialized components that work together to provide reliable uptime monitoring. This page describes each component's implementation and responsibilities.
Orchestrator
The orchestrator is the main entry point and coordinator for the entire monitoring system.
Location: orchestrator/orchestrator.go
Responsibilities
- Initialize the system (logger, event bus, supervisor)
- Register event listeners on the event bus
- Create and manage parent worker groups for each monitoring interval
- Prefill Redis with URL data from the database
- Coordinate graceful shutdown
Key methods
The orchestrator uses a mutex-protected map (intervals) to safely manage worker groups across goroutines.
Configuration
The orchestrator reads these environment variables:
- SUPERVISOR_POOL_FLUSH_BATCHSIZE: Number of tasks to batch before flushing (default: 100)
- SUPERVISOR_POOL_FLUSH_TIMEOUT: Seconds to wait before flushing incomplete batches (default: 5)
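A minimal sketch of the two ideas above: reading the flush settings with their documented defaults, and guarding a map of worker groups with a mutex. The names envInt, workerGroup, registry, and getOrCreate are hypothetical and are not taken from the Watchdog source; only the environment variable names and defaults come from this page.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
)

// envInt reads an integer environment variable, falling back to def when the
// variable is unset or not a valid integer. (Hypothetical helper.)
func envInt(key string, def int) int {
	v, ok := os.LookupEnv(key)
	if !ok {
		return def
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return def
	}
	return n
}

// workerGroup is a hypothetical stand-in for a parent worker group.
type workerGroup struct {
	intervalSeconds int
}

// registry guards the intervals map with a mutex so that several goroutines
// can register or look up worker groups safely.
type registry struct {
	mu        sync.Mutex
	intervals map[int]*workerGroup
}

func newRegistry() *registry {
	return &registry{intervals: make(map[int]*workerGroup)}
}

// getOrCreate returns the group for an interval, creating it on first use.
func (r *registry) getOrCreate(interval int) *workerGroup {
	r.mu.Lock()
	defer r.mu.Unlock()
	if g, ok := r.intervals[interval]; ok {
		return g
	}
	g := &workerGroup{intervalSeconds: interval}
	r.intervals[interval] = g
	return g
}

func main() {
	batchSize := envInt("SUPERVISOR_POOL_FLUSH_BATCHSIZE", 100)
	flushTimeout := envInt("SUPERVISOR_POOL_FLUSH_TIMEOUT", 5)
	fmt.Println("batch size:", batchSize, "flush timeout (s):", flushTimeout)

	// Several goroutines register intervals concurrently; duplicates share a group.
	reg := newRegistry()
	var wg sync.WaitGroup
	for _, interval := range []int{30, 60, 30, 60} {
		wg.Add(1)
		go func(iv int) {
			defer wg.Done()
			reg.getOrCreate(iv)
		}(interval)
	}
	wg.Wait()
	fmt.Println("worker groups:", len(reg.intervals))
}
```

Holding the lock across both the lookup and the insert is what keeps two goroutines from creating separate groups for the same interval.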
Parent worker
Each parent worker manages a pool of child workers for a specific monitoring interval.
Location: worker/parent_worker.go
Responsibilities
- Spawn and manage a pool of child workers
- Receive tick signals from the orchestrator
- Fetch URL IDs from Redis for the assigned interval
- Divide work into chunks and distribute to child workers
- Forward work chunks through the work pool channel
Implementation details
The parent worker chunks URL IDs based on MAXIMUM_WORK_POOL_SIZE to prevent overwhelming the work pool channel.
Configuration
- MAXIMUM_CHILD_WORKERS: Number of child workers to spawn (must be > 0)
- MAXIMUM_WORK_POOL_SIZE: Maximum size of work chunks distributed to children
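The chunking step can be illustrated with a short sketch. The function name chunkIDs, the use of integer IDs, and the channel layout are assumptions made for illustration; the size argument plays the role of MAXIMUM_WORK_POOL_SIZE.

```go
package main

import "fmt"

// chunkIDs splits ids into consecutive slices of at most size elements.
// A non-positive size yields a single chunk containing every ID.
func chunkIDs(ids []int, size int) [][]int {
	if len(ids) == 0 {
		return nil
	}
	if size <= 0 {
		return [][]int{ids}
	}
	chunks := make([][]int, 0, (len(ids)+size-1)/size)
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		chunks = append(chunks, ids[start:end])
	}
	return chunks
}

func main() {
	ids := []int{1, 2, 3, 4, 5, 6, 7}

	// Buffered channel standing in for the work pool shared with child workers.
	workPool := make(chan []int, 3)
	for _, c := range chunkIDs(ids, 3) {
		workPool <- c
	}
	close(workPool)

	for c := range workPool {
		fmt.Println(c)
	}
}
```

With seven IDs and a chunk size of three, the work pool receives three chunks, the last holding a single ID.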
Child worker
Child workers perform the actual HTTP checks and report results to the supervisor.
Location: worker/child_worker.go
Responsibilities
- Listen for work chunks from the parent worker’s work pool
- Fetch full URL details from Redis hash storage
- Perform HTTP requests with configured method and timeout
- Evaluate HTTP response status codes
- Submit check results to the supervisor as tasks
HTTP check logic
A URL is considered healthy if the HTTP status code is in the 2xx range (200-299). Network errors or other status codes mark the URL as unhealthy.
Configuration
HTTP_REQUEST_TIMEOUT: Timeout in seconds for HTTP requests (default: 5)
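The health rule described under HTTP check logic can be sketched as follows. The function names isHealthyStatus and checkURL are hypothetical, and the local test server exists only to make the example runnable without external network access; the 5-second client timeout mirrors the documented HTTP_REQUEST_TIMEOUT default.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// isHealthyStatus reports whether an HTTP status code is in the 2xx range.
func isHealthyStatus(code int) bool {
	return code >= 200 && code <= 299
}

// checkURL performs one request and reports whether the URL is healthy.
// Any error from the client, including a timeout, counts as unhealthy.
func checkURL(client *http.Client, method, url string) bool {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return false
	}
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return isHealthyStatus(resp.StatusCode)
}

func main() {
	// Local test server: "/ok" answers 200, every other path answers 503.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/ok" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	fmt.Println("ok healthy:", checkURL(client, http.MethodGet, srv.URL+"/ok"))
	fmt.Println("down healthy:", checkURL(client, http.MethodGet, srv.URL+"/down"))
}
```

Note that Go's default client follows redirects, so a 3xx response is judged by the status of the final response unless the client is configured otherwise.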
Supervisor
The supervisor receives check results from workers, applies decision logic, and publishes domain events.
Location: supervisor/supervisor.go
Responsibilities
- Receive check results through a buffered work pool channel
- Batch tasks for efficient processing
- Apply decision logic to determine success or failure
- Publish ping.successful or ping.unsuccessful events to the event bus
- Implement timeout-based flushing for incomplete batches
Batching logic
- Batch size: Flushes when the buffer reaches SUPERVISOR_POOL_FLUSH_BATCHSIZE
- Timeout: Flushes incomplete batches after SUPERVISOR_POOL_FLUSH_TIMEOUT seconds
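The two flush triggers can be sketched with a channel, a slice buffer, and a ticker. This is one plausible way to implement size-or-timeout batching, not necessarily how the supervisor does it; the task type and the runBatcher function are hypothetical, and the sketch assumes a positive batch size and timeout.

```go
package main

import (
	"fmt"
	"time"
)

// task is a hypothetical check result submitted by a child worker.
type task struct {
	urlID   int
	healthy bool
}

// runBatcher buffers tasks from in and hands them to flush when either
// batchSize tasks have accumulated or the ticker fires while the buffer is
// non-empty. It flushes any remainder and returns once in is closed.
func runBatcher(in <-chan task, batchSize int, timeout time.Duration, flush func([]task)) {
	ticker := time.NewTicker(timeout)
	defer ticker.Stop()

	buf := make([]task, 0, batchSize)
	emit := func() {
		if len(buf) == 0 {
			return
		}
		// Copy so the caller's batch is not overwritten when buf is reused.
		batch := make([]task, len(buf))
		copy(batch, buf)
		buf = buf[:0]
		flush(batch)
	}

	for {
		select {
		case t, ok := <-in:
			if !ok {
				emit()
				return
			}
			buf = append(buf, t)
			if len(buf) >= batchSize {
				emit()
			}
		case <-ticker.C:
			emit()
		}
	}
}

func main() {
	in := make(chan task, 8)
	for i := 1; i <= 5; i++ {
		in <- task{urlID: i, healthy: i%2 == 0}
	}
	close(in)

	runBatcher(in, 2, 5*time.Second, func(b []task) {
		fmt.Println("flushed", len(b), "tasks")
	})
}
```

A ticker flushes on a fixed cadence; resetting a timer after each flush would instead measure the wait from the last flush. Which of the two the supervisor uses is not stated on this page.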
Event publishing
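As a rough sketch of how a check result might map onto the two topics the supervisor publishes, assuming a hypothetical checkResult type and publisher interface (neither is taken from the Watchdog source; only the topic names ping.successful and ping.unsuccessful come from this page):

```go
package main

import "fmt"

const (
	topicPingSuccessful   = "ping.successful"
	topicPingUnsuccessful = "ping.unsuccessful"
)

// checkResult is a hypothetical shape for a task received by the supervisor.
type checkResult struct {
	URLID   int
	Healthy bool
}

// publisher is a hypothetical interface standing in for the event bus.
type publisher interface {
	Publish(topic string, payload checkResult)
}

// topicFor maps a check result onto one of the two documented topics.
func topicFor(r checkResult) string {
	if r.Healthy {
		return topicPingSuccessful
	}
	return topicPingUnsuccessful
}

// publishBatch publishes one event per result in a flushed batch.
func publishBatch(p publisher, batch []checkResult) {
	for _, r := range batch {
		p.Publish(topicFor(r), r)
	}
}

// printPublisher prints events instead of dispatching them to handlers.
type printPublisher struct{}

func (printPublisher) Publish(topic string, payload checkResult) {
	fmt.Printf("%s url=%d\n", topic, payload.URLID)
}

func main() {
	publishBatch(printPublisher{}, []checkResult{
		{URLID: 1, Healthy: true},
		{URLID: 2, Healthy: false},
	})
}
```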
Event bus
The event bus provides a lightweight pub/sub mechanism for decoupling event producers from consumers.
Location: core/event_bus.go
Responsibilities
- Manage subscriptions to event topics
- Dispatch events to registered handlers
- Execute handlers asynchronously in separate goroutines
- Provide thread-safe subscription management
Implementation
Event handlers run asynchronously in separate goroutines, allowing the supervisor to continue processing without waiting for side effects to complete.
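A minimal sketch of such a bus, assuming hypothetical names (eventBus, handler, Subscribe, Publish, Wait) rather than the actual API in core/event_bus.go. The Wait method is added here only so the example can drain its goroutines before exiting.

```go
package main

import (
	"fmt"
	"sync"
)

// handler processes the payload of a published event.
type handler func(payload any)

// eventBus is a minimal topic-based pub/sub registry.
type eventBus struct {
	mu       sync.RWMutex
	handlers map[string][]handler
	wg       sync.WaitGroup // tracks in-flight handlers so callers can drain them
}

func newEventBus() *eventBus {
	return &eventBus{handlers: make(map[string][]handler)}
}

// Subscribe registers h for topic under the write lock.
func (b *eventBus) Subscribe(topic string, h handler) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[topic] = append(b.handlers[topic], h)
}

// Publish starts every handler for topic in its own goroutine and returns
// without waiting for them to finish.
func (b *eventBus) Publish(topic string, payload any) {
	// Copy the handler list under the read lock, then release it before
	// starting goroutines so slow handlers never block subscribers.
	b.mu.RLock()
	hs := append([]handler(nil), b.handlers[topic]...)
	b.mu.RUnlock()

	for _, h := range hs {
		b.wg.Add(1)
		go func(h handler) {
			defer b.wg.Done()
			h(payload)
		}(h)
	}
}

// Wait blocks until all handlers started so far have returned.
func (b *eventBus) Wait() { b.wg.Wait() }

func main() {
	bus := newEventBus()
	bus.Subscribe("ping.successful", func(p any) { fmt.Println("up:", p) })
	bus.Subscribe("ping.unsuccessful", func(p any) { fmt.Println("down:", p) })

	bus.Publish("ping.successful", "https://example.com")
	bus.Publish("ping.unsuccessful", "https://example.org")
	bus.Wait()
}
```

Because each handler runs in its own goroutine, the order in which the two lines print is not guaranteed.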
Event listeners
Listeners subscribe to events and handle side effects such as persistence and notifications.
Ping successful listener
Location: events/listeners/ping_successful_listener.go
Handles successful ping events:
- State transition detection: Checks if the URL was previously unhealthy
- Incident resolution: Marks incidents as resolved in the database
- Recovery notification: Sends “Site is UP” email to the contact address
- Metrics persistence: Stores the successful check in the url_statuses hypertable
- Status update: Updates the URL's current status to healthy in the urls table
Ping unsuccessful listener
Location: events/listeners/ping_unsuccessful_listener.go
Handles unsuccessful ping events:
- State transition detection: Checks if the URL was previously healthy
- Incident creation: Logs a new incident in the database
- Downtime notification: Sends “Site is DOWN” email to the contact address
- Status update: Updates the URL's current status to unhealthy in the urls table
- Metrics persistence: Stores the failed check in the url_statuses hypertable
Notifications are only sent on state transitions (healthy → unhealthy or unhealthy → healthy), not on every check. This prevents notification spam.
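The transition rule can be expressed as a small pure function. The status type and the function names are hypothetical, and the sketch assumes the previous state is always known; the email subjects are the ones quoted above.

```go
package main

import "fmt"

type status string

const (
	healthy   status = "healthy"
	unhealthy status = "unhealthy"
)

// shouldNotify reports whether a check result changes the URL's state.
func shouldNotify(previous, current status) bool {
	return previous != current
}

// notificationFor returns the email subject for a transition, or an empty
// string when the state is unchanged and no email should be sent.
func notificationFor(previous, current status) string {
	if !shouldNotify(previous, current) {
		return ""
	}
	if current == healthy {
		return "Site is UP"
	}
	return "Site is DOWN"
}

func main() {
	// A sequence of check results for one URL that starts out healthy.
	checks := []status{healthy, unhealthy, unhealthy, healthy}
	previous := healthy
	for _, current := range checks {
		if subject := notificationFor(previous, current); subject != "" {
			fmt.Println("send:", subject)
		}
		previous = current
	}
}
```

Only the second and fourth checks in this sequence trigger an email; the repeated unhealthy result is recorded without notifying anyone.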
Database repositories
Repositories encapsulate SQL operations for data persistence.
URL repository
Location: database/url_repository.go
Provides methods for:
- Add: Insert new monitored URLs
- FindById: Fetch URL by ID
- FetchAll: List URLs with filtering and pagination
- UpdateStatus: Update URL health status
- Remove: Delete a URL
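To make the shape of this repository concrete, here is a sketch of an interface with the method names listed above, backed by an in-memory map instead of SQL. Every signature, the URL struct, and the choice to start new URLs as healthy are assumptions for illustration; the real repository may differ in all of them.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// URL is a hypothetical record for a monitored URL.
type URL struct {
	ID     int
	Target string
	Status string
}

var errNotFound = errors.New("url not found")

// URLRepository uses the documented method names; the signatures are assumed.
type URLRepository interface {
	Add(target string) URL
	FindById(id int) (URL, error)
	FetchAll(status string, limit, offset int) []URL
	UpdateStatus(id int, status string) error
	Remove(id int) error
}

// memoryRepo is an in-memory stand-in for the SQL-backed repository.
// It is not safe for concurrent use.
type memoryRepo struct {
	nextID int
	urls   map[int]URL
}

func newMemoryRepo() *memoryRepo {
	return &memoryRepo{nextID: 1, urls: make(map[int]URL)}
}

func (m *memoryRepo) Add(target string) URL {
	u := URL{ID: m.nextID, Target: target, Status: "healthy"}
	m.urls[u.ID] = u
	m.nextID++
	return u
}

func (m *memoryRepo) FindById(id int) (URL, error) {
	u, ok := m.urls[id]
	if !ok {
		return URL{}, errNotFound
	}
	return u, nil
}

// FetchAll returns URLs ordered by ID, filtered by status when status is
// non-empty, then applies offset and limit (a limit of 0 means no limit).
func (m *memoryRepo) FetchAll(status string, limit, offset int) []URL {
	var out []URL
	for _, u := range m.urls {
		if status == "" || u.Status == status {
			out = append(out, u)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	if offset < 0 {
		offset = 0
	}
	if offset >= len(out) {
		return nil
	}
	out = out[offset:]
	if limit > 0 && limit < len(out) {
		out = out[:limit]
	}
	return out
}

func (m *memoryRepo) UpdateStatus(id int, status string) error {
	u, ok := m.urls[id]
	if !ok {
		return errNotFound
	}
	u.Status = status
	m.urls[id] = u
	return nil
}

func (m *memoryRepo) Remove(id int) error {
	if _, ok := m.urls[id]; !ok {
		return errNotFound
	}
	delete(m.urls, id)
	return nil
}

func main() {
	var repo URLRepository = newMemoryRepo()
	a := repo.Add("https://example.com")
	repo.Add("https://example.org")
	_ = repo.UpdateStatus(a.ID, "unhealthy")
	fmt.Println(repo.FetchAll("unhealthy", 10, 0))
}
```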
URL status repository
Location: database/url_status_repository.go
Provides methods for time-series data:
- Add: Insert status check results into the hypertable
- GetRecentStatus: Fetch most recent status matching a condition
- GetLastStatus: Fetch the most recent status regardless of value
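The difference between GetRecentStatus and GetLastStatus is easiest to see on a small in-memory example. The StatusRecord struct, the statusStore type, and all signatures below are assumptions; the sketch also assumes records are added in chronological order, so the last element is the newest.

```go
package main

import (
	"fmt"
	"time"
)

// StatusRecord is a hypothetical row in the url_statuses hypertable.
type StatusRecord struct {
	URLID     int
	Healthy   bool
	CheckedAt time.Time
}

// statusStore is an in-memory stand-in that keeps records in insertion order.
type statusStore struct {
	records []StatusRecord
}

// Add appends a check result.
func (s *statusStore) Add(r StatusRecord) {
	s.records = append(s.records, r)
}

// GetRecentStatus returns the newest record for urlID that satisfies match.
func (s *statusStore) GetRecentStatus(urlID int, match func(StatusRecord) bool) (StatusRecord, bool) {
	for i := len(s.records) - 1; i >= 0; i-- {
		r := s.records[i]
		if r.URLID == urlID && match(r) {
			return r, true
		}
	}
	return StatusRecord{}, false
}

// GetLastStatus returns the newest record for urlID regardless of its value.
func (s *statusStore) GetLastStatus(urlID int) (StatusRecord, bool) {
	return s.GetRecentStatus(urlID, func(StatusRecord) bool { return true })
}

func main() {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	store := &statusStore{}
	store.Add(StatusRecord{URLID: 1, Healthy: true, CheckedAt: start})
	store.Add(StatusRecord{URLID: 1, Healthy: false, CheckedAt: start.Add(time.Minute)})
	store.Add(StatusRecord{URLID: 1, Healthy: false, CheckedAt: start.Add(2 * time.Minute)})

	last, _ := store.GetLastStatus(1)
	lastHealthy, _ := store.GetRecentStatus(1, func(r StatusRecord) bool { return r.Healthy })
	fmt.Println("last check healthy:", last.Healthy)
	fmt.Println("last healthy check at:", lastHealthy.CheckedAt.Format(time.RFC3339))
}
```

A listener detecting a state transition would typically need the last status, while a query such as "when was this site last up" needs the most recent status matching a condition.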
Next steps
- Architecture overview: Understand the high-level system design
- Event flow: Follow the complete event-driven workflow