System Architecture
Searchlock is designed as a modular distributed system that separates responsibilities among data collection, data processing, and user interaction.
The platform is composed of several independent components that communicate through APIs and shared storage. This architecture allows each component to scale independently, without affecting the rest of the platform.
High-Level Architecture
At a high level, the system consists of the following layers:
- Frontend Application — User interface for searching and tracking products
- Backend API — Handles business logic and communication with the database
- Database — Stores products, price history, and user information
- Scraping Engine — Collects product data from external websites
- Scheduler — Coordinates scraping tasks and periodic updates
Frontend Layer
The frontend provides a web interface for interacting with the system.
Users can:
- search for products
- compare prices between stores
- track products
- view historical price data
- receive notifications
The frontend communicates with the backend through a REST API, which handles all business logic and data retrieval.
Backend API
The backend acts as the central coordination layer of the platform.
Its responsibilities include:
- managing product data
- processing user requests
- storing price history
- managing user tracking lists
- exposing REST endpoints for the frontend
The API also ensures that incoming data from scraping services is properly validated and stored.
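The validation step can be sketched as a small gatekeeper that rejects malformed records before they reach the database. This is an illustrative sketch only; the field names and rules below are assumptions, not the platform's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ScrapedPrice:
    # Hypothetical record shape for data arriving from the scraping services.
    product_name: str
    price: float
    product_url: str
    store_id: str
    timestamp: datetime

def validate_scraped_price(raw: dict) -> ScrapedPrice:
    """Reject malformed scraped records before they are stored."""
    name = str(raw.get("product_name", "")).strip()
    if not name:
        raise ValueError("missing product name")
    price = float(raw["price"])
    if price <= 0:
        raise ValueError(f"implausible price: {price}")
    url = str(raw.get("product_url", ""))
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"invalid product URL: {url!r}")
    ts = raw.get("timestamp") or datetime.now(timezone.utc)
    return ScrapedPrice(name, price, url, str(raw["store_id"]), ts)
```

Centralizing validation in the API keeps the spiders simple: they can emit best-effort data, and a single choke point decides what is allowed into the database.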
Database Layer
The database stores all persistent data required by the system.
Core entities include:
- users
- products
- stores
- price records
- tracking relationships
Each price observation is stored as a separate record rather than overwriting the previous value, allowing the system to generate price history and trend analysis.
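One way to model these entities is the relational schema sketched below. The table and column names are assumptions based on the entities listed above, shown here with SQLite purely for illustration.

```python
import sqlite3

# Illustrative schema only; names are assumptions, not the actual data model.
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE stores   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, url TEXT,
                       store_id INTEGER REFERENCES stores(id));
-- Every observation is a new row, so history is never overwritten.
CREATE TABLE price_records (
    id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(id),
    price REAL NOT NULL,
    observed_at TEXT NOT NULL
);
-- Which users track which products.
CREATE TABLE tracking (
    user_id INTEGER REFERENCES users(id),
    product_id INTEGER REFERENCES products(id),
    PRIMARY KEY (user_id, product_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

With this layout, a price-history chart is a single query, e.g. `SELECT price, observed_at FROM price_records WHERE product_id = ? ORDER BY observed_at`.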
Scraping System
The scraping system is responsible for collecting product information from supported e-commerce websites.
Each supported store has a dedicated spider responsible for extracting product information such as:
- product name
- price
- product URL
- store identifier
- timestamp
These spiders run through a scraping engine that manages crawling and data extraction.
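The record each spider yields can be sketched as below. The field names and the price-normalization rules are assumptions for illustration; real store pages format prices differently (e.g. "1 299,00 kr" versus "$12.99"), so some normalization step like this is typically needed.

```python
from datetime import datetime, timezone

def make_item(name: str, price_text: str, url: str, store: str) -> dict:
    """Normalize raw page values into the common item format (hypothetical)."""
    # Strip non-breaking spaces, currency markers, and thousands separators,
    # then normalize the decimal comma so the price parses as a float.
    cleaned = price_text.replace("\xa0", " ").strip("kr$€ ")
    price = float(cleaned.replace(" ", "").replace(",", "."))
    return {
        "product_name": name.strip(),
        "price": price,
        "product_url": url,
        "store_id": store,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Keeping every spider's output in one common shape is what lets a single downstream pipeline validate and store data from any store.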
Scraping Workflow
The scraping process follows a structured pipeline.
1. The scheduler triggers scraping tasks.
2. The engine starts the appropriate spider.
3. The spider visits product pages.
4. Data is extracted and validated.
5. The pipeline processes the information.
6. The results are stored in the database.
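The steps above can be sketched as plain functions that an engine threads an item through in order. Everything here is illustrative: the stage names are assumptions, and the list stands in for a real database write.

```python
def extract(page: dict) -> dict:
    """Pull the raw fields of interest out of a fetched page."""
    return {"product_name": page["title"], "price": page["price_text"]}

def validate(item: dict) -> dict:
    """Coerce and sanity-check fields before storage."""
    item["price"] = float(item["price"])
    if item["price"] <= 0:
        raise ValueError("implausible price")
    return item

def store(item: dict, db: list) -> dict:
    db.append(item)  # stand-in for a real database write
    return item

def run_pipeline(page: dict, db: list) -> dict:
    """Run one page through extract -> validate -> store."""
    return store(validate(extract(page)), db)
```

Because each stage takes and returns a plain item, stages can be tested in isolation and new ones (deduplication, currency conversion) can be slotted in without touching the others.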
This structure allows scraping tasks to run independently from the main application.
Scheduler
The scheduler is responsible for triggering scraping jobs periodically.
Typical responsibilities include:
- scheduling crawling intervals
- distributing scraping workloads
- triggering updates for monitored products
This ensures the platform continuously updates price information without requiring manual intervention.
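Interval-based scheduling can be sketched as a function that, given each spider's last run time, returns the spiders due for a crawl. The spider names and intervals below are made up for illustration; a real deployment would typically delegate this to a cron-like service or job queue.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-store crawl intervals.
INTERVALS = {
    "store_a": timedelta(hours=6),
    "store_b": timedelta(hours=1),
}

def due_jobs(last_run: dict, now: datetime) -> list:
    """Return the spiders whose crawl interval has elapsed."""
    epoch = datetime.min.replace(tzinfo=timezone.utc)  # never-run spiders are always due
    return [
        spider for spider, interval in INTERVALS.items()
        if now - last_run.get(spider, epoch) >= interval
    ]
```

Per-store intervals also make it easy to crawl volatile stores more often than stable ones, and to prioritize products that users are actively tracking.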
Scalability Considerations
The system was designed to allow horizontal scaling of the scraping layer.
Because scraping tasks are independent, additional workers can be deployed to:
- increase crawling capacity
- reduce update latency
- support additional stores
Separating scraping services from the backend API also prevents scraping workloads from affecting user-facing performance.
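The independence of scraping tasks is what makes horizontal scaling straightforward: tasks go on a shared queue, and any number of identical workers drain it. The sketch below shows the shape of this pattern with threads standing in for deployed worker processes; names are illustrative.

```python
import queue
import threading

def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Drain the shared task queue until it is empty."""
    while True:
        try:
            store = tasks.get_nowait()
        except queue.Empty:
            return
        with lock:
            results.append(f"crawled {store}")  # stand-in for running a spider
        tasks.task_done()

def crawl_all(stores: list, n_workers: int) -> list:
    """Crawl all stores with a fixed pool of interchangeable workers."""
    tasks: queue.Queue = queue.Queue()
    for s in stores:
        tasks.put(s)
    results: list = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Adding capacity means raising `n_workers` (or deploying more worker machines); no other component has to change, which is the essence of scaling this layer horizontally.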
Design Principles
Several principles guided the system design:
- Separation of concerns: Each component has a clearly defined responsibility.
- Fault isolation: Failures in scraping services do not affect the main application.
- Scalability: New stores or spiders can be added without modifying the core system.
- Extensibility: The architecture allows future additions such as analytics, machine learning models, or automated price prediction.
Next Sections
The following documentation pages describe each subsystem in more detail:
- Scraping System
- Backend API
- Data Model
- Notification System
- Engineering Challenges
These sections explain the implementation details behind each component of the platform.