Dataset: Multi-Engine SERP Propagation and Feature Signals from News-Derived Queries (Google, Brave, Mojeek)
Comprehensive SERP Data (v2)
This dataset is a second-generation release of the SERP profiling workflow, built for multi-engine analysis and robustness checks.
It includes (i) the keyword seeds derived from news headlines, (ii) indexed SERP records, and (iii) the final derived feature dataset used for statistical analysis.
Highlights
- Engines: Google, Brave, Mojeek
- Rank range: 1–20
- Core files:
  - keywords.csv: news headline–derived search terms (seed file)
  - index.parquet: indexed SERP universe (post-collection reconciliation)
  - dataset-20260222_141303.parquet: derived dataset used for analysis (features after extraction/acceptance)
- Reports: dataset generation summary, index statistics, outlier & consistency checks
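As a rough illustration of how the three core files could be opened together, here is a minimal sketch using pandas. It assumes the files sit directly in the root of a local copy of the repository, which this page does not confirm; the helper names `core_file_paths` and `load_core_files` are hypothetical. Reading Parquet with pandas also requires `pyarrow` or `fastparquet`.

```python
from pathlib import Path

# File names as listed under "Core files". Their location inside the
# repository is an assumption: adjust `root` if they live in a subfolder.
CORE_FILES = {
    "keywords": "keywords.csv",
    "index": "index.parquet",
    "dataset": "dataset-20260222_141303.parquet",
}


def core_file_paths(root):
    """Map a short name to the expected path of each core file under root."""
    root = Path(root)
    return {name: root / filename for name, filename in CORE_FILES.items()}


def load_core_files(root):
    """Load the seed CSV and both Parquet files into pandas DataFrames."""
    # Imported lazily so the path helper works without pandas installed.
    import pandas as pd

    paths = core_file_paths(root)
    return {
        "keywords": pd.read_csv(paths["keywords"]),
        "index": pd.read_parquet(paths["index"]),
        "dataset": pd.read_parquet(paths["dataset"]),
    }
```

Calling `load_core_files("comp-serp-data-v2")` on a local clone and then inspecting `frames["dataset"].columns` is a reasonable first step, since the column schema is not described on this page.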
Research Use Cases
- Cross-engine SERP comparison beyond Google/Bing (Google vs Brave vs Mojeek)
- Robustness / consistency analysis under data-quality constraints
- Feature trend analysis across rank buckets and source/no-source conditions
- Reproducible dataset generation pipelines for SERP research
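To make the rank-bucket use case concrete, the sketch below groups records by engine and rank bucket and averages one feature. Everything specific here is an assumption for illustration: the bucket edges (1-3, 4-10, 11-20), the field names `engine` and `rank`, and the example feature name are hypothetical and are not taken from the dataset schema. Only the 1–20 rank range comes from this page.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical bucket edges covering the documented rank range 1-20.
BUCKETS = [(1, 3, "1-3"), (4, 10, "4-10"), (11, 20, "11-20")]


def rank_bucket(rank):
    """Return the label of the bucket that contains the given rank."""
    for low, high, label in BUCKETS:
        if low <= rank <= high:
            return label
    raise ValueError(f"rank {rank} is outside the 1-20 range")


def mean_by_engine_and_bucket(rows, feature):
    """Average `feature` for every (engine, rank bucket) pair.

    `rows` is an iterable of dicts with assumed keys "engine" and "rank",
    plus the feature being averaged.
    """
    grouped = defaultdict(list)
    for row in rows:
        key = (row["engine"], rank_bucket(row["rank"]))
        grouped[key].append(row[feature])
    return {key: mean(values) for key, values in grouped.items()}
```

With a pandas DataFrame, the same rows could be produced with `df.to_dict("records")` once the real column names are known.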
📂 Dataset is hosted on Hugging Face:
Clone (Hugging Face): git clone https://huggingface.co/datasets/goker/comp-serp-data-v2
Dataset page: https://huggingface.co/datasets/goker/comp-serp-data-v2
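As an alternative to `git clone`, the repository can probably be fetched with the `huggingface_hub` Python package. The sketch below is one way to do that, not an official instruction from the dataset authors: the local directory name and the helper names `download_dataset` and `list_data_files` are hypothetical, and the listing helper exists only because the folder layout of the repository is not described here.

```python
from pathlib import Path

# Repository id taken from the dataset URL on this page.
REPO_ID = "goker/comp-serp-data-v2"


def download_dataset(local_dir="comp-serp-data-v2"):
    """Download a snapshot of the dataset repository and return its path."""
    # Imported lazily so the listing helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # required: this is a dataset, not a model repo
        local_dir=local_dir,
    )
    return Path(path)


def list_data_files(root):
    """List CSV and Parquet files under root, as sorted relative paths."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in {".csv", ".parquet"}
    )
```

Running `list_data_files(download_dataset())` shows where the data files actually live in the downloaded copy before any loading code is written.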
Notes on raw artifacts
Raw SERP artifacts (HTML / JSON / screenshots / runtime metrics) may contain third-party content.
This release focuses on research-ready derived data products and documentation. If raw artifacts are released, they will be distributed separately and linked from this page.