Comprehensive SERP Data (v2)

This dataset is a second-generation release of the SERP profiling workflow, built for multi-engine analysis and robustness checks.
It includes (i) the keyword seeds derived from news headlines, (ii) indexed SERP records, and (iii) the final derived feature dataset used for statistical analysis.

Highlights

Engines: Google, Brave, Mojeek
Rank range: 1–20
Core files:
- keywords.csv: news headline–derived search terms (seed file)
- index.parquet: indexed SERP universe (post-collection reconciliation)
- dataset-20260222_141303.parquet: derived dataset used for analysis (features after extraction/acceptance)
Reports: dataset generation summary, index statistics, outlier & consistency checks
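
The core files listed above can be located and loaded from a local clone roughly as follows. This is a minimal sketch, not part of the release: it assumes the three files sit at the root of the cloned repository, and the helper names `locate_core_files` and `load_core_files` are hypothetical. Loading relies on pandas, and reading Parquet additionally needs a Parquet engine such as pyarrow.

```python
from pathlib import Path

# File names as listed under "Core files"; their location at the
# repository root is an assumption.
CORE_FILES = {
    "keywords": "keywords.csv",
    "index": "index.parquet",
    "dataset": "dataset-20260222_141303.parquet",
}


def locate_core_files(root):
    """Return {label: Path} for the core files, or raise if any is missing."""
    root = Path(root)
    paths = {label: root / name for label, name in CORE_FILES.items()}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing core files: " + ", ".join(missing))
    return paths


def load_core_files(root):
    """Load the three core files into pandas DataFrames."""
    # Third-party dependency; read_parquet also needs pyarrow or fastparquet.
    import pandas as pd

    paths = locate_core_files(root)
    return {
        "keywords": pd.read_csv(paths["keywords"]),
        "index": pd.read_parquet(paths["index"]),
        "dataset": pd.read_parquet(paths["dataset"]),
    }
```

Calling `load_core_files("comp-serp-data-v2")` on a complete clone would return one DataFrame per core file. The column layout is not described on this page, so inspect `.columns` before relying on any field name.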

Research Use Cases

Cross-engine SERP comparison beyond Google/Bing (Google vs Brave vs Mojeek)
Robustness / consistency analysis under data-quality constraints
Feature trend analysis across rank buckets and source/no-source conditions
Reproducible dataset generation pipelines for SERP research
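
As an illustration of the rank-bucket use case, the sketch below averages a feature per engine and rank bucket. Everything specific in it is an assumption: the bucket boundaries (1-3, 4-10, 11-20), the field names `engine`, `rank`, and `word_count`, and the sample rows, which are synthetic and not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical bucket boundaries over the documented rank range 1-20.
RANK_BUCKETS = [("1-3", 1, 3), ("4-10", 4, 10), ("11-20", 11, 20)]


def rank_bucket(rank):
    """Map a rank in 1..20 to its bucket label."""
    for label, low, high in RANK_BUCKETS:
        if low <= rank <= high:
            return label
    raise ValueError(f"rank {rank} is outside the 1-20 range")


def mean_feature_by_engine_and_bucket(rows, feature):
    """Average `feature` per (engine, rank bucket) over an iterable of dict rows."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["engine"], rank_bucket(row["rank"]))
        groups[key].append(row[feature])
    return {key: mean(values) for key, values in groups.items()}


# Synthetic rows for illustration only; field names are hypothetical.
rows = [
    {"engine": "google", "rank": 1, "word_count": 900},
    {"engine": "google", "rank": 2, "word_count": 1100},
    {"engine": "brave", "rank": 5, "word_count": 700},
    {"engine": "mojeek", "rank": 15, "word_count": 400},
]
summary = mean_feature_by_engine_and_bucket(rows, "word_count")
```

Here `summary` maps `(engine, bucket)` pairs to mean values. With the real data, the same grouping could be expressed with a pandas `groupby` once the actual column names are known.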

📂 Dataset is hosted on Hugging Face:

Clone (Hugging Face): git clone https://huggingface.co/datasets/goker/comp-serp-data-v2

Dataset page: https://huggingface.co/datasets/goker/comp-serp-data-v2
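
As an alternative to `git clone`, the repository can probably also be fetched from Python with the `huggingface_hub` package. This is a sketch under that assumption, not a documented part of the release; the helper names `dataset_page_url` and `download_snapshot` are hypothetical, and only the repository id comes from the URL above.

```python
REPO_ID = "goker/comp-serp-data-v2"  # taken from the dataset URL above


def dataset_page_url(repo_id=REPO_ID):
    """Build the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def download_snapshot(repo_id=REPO_ID, local_dir=None):
    """Download all files of the dataset repository and return the local folder path."""
    # Third-party dependency: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)
```

Calling `download_snapshot(local_dir="comp-serp-data-v2")` requires network access and would place the files in that folder. Without `local_dir`, the files go to the default Hugging Face cache.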

Notes on raw artifacts

Raw SERP artifacts (HTML / JSON / screenshots / runtime metrics) may contain third-party content.
This release focuses on research-ready derived data products and documentation. Any raw artifacts that are released will be distributed separately and linked from this page.

datasets-comp-serp-data-v2

Dataset: Multi-Engine SERP Propagation and Feature Signals from News-Derived Queries (Google, Brave, Mojeek)
