How to Select Top N Rows or Random Samples Per Group in SQL / BigQuery

When working with grouped data, you often need either the top N rows per group (e.g., highest scores per category) or a random sample per group. Here’s how to do both efficiently, especially in BigQuery.

You can check also how to get Random Sample per group in Pandas.

1. Selecting Top N Rows Per Group

To get the top N records for each group (based on a sort order), use window functions with ROW_NUMBER():

SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY sort_col DESC) AS rn
  FROM my_table
)
WHERE rn <= 3;

Explanation:

PARTITION BY groups the data by group_col.
ORDER BY defines “top” within each group.
rn <= 3 keeps the top 3 rows per group.

2. Taking a Random Sample Per Group

To randomly sample N rows per group in BigQuery:

SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY RAND()) AS rn
  FROM my_table
)
WHERE rn <= 5;

Explanation:

ORDER BY RAND() randomizes rows within each group.
The outer filter keeps the first N (here 5) random rows per group.

3. Example & Demo

You can test the SQL on this link: Get Random N results per group in SQL:

CREATE TABLE yourtable (
  year int,
  id varchar(20),
  rate decimal(10,2)
);

INSERT INTO yourtable VALUES
(2006,    'p01', 8),
(2003,    'p01', 7.4),
(2008,    'p01', 6.8),
(2001,    'p01', 5.9),
(2007,    'p01', 5.3),
(2009,    'p01', 4.4),
(2002,    'p01', 3.9),
(2004,    'p01', 3.5),
(2005,    'p01', 2.1),
(2000,    'p01', 0.8),
(2001,    'p02', 12.5),
(2004,    'p02', 12.4),
(2002,    'p02', 12.2),
(2003,    'p02', 10.3),
(2000,    'p02', 8.7),
(2006,    'p02', 4.6),
(2007,    'p02', 3.3);

SELECT yourtable.*
FROM   yourtable INNER JOIN (
  SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
  FROM     yourtable
  GROUP BY id) group_max
  ON yourtable.id = group_max.id
     AND FIND_IN_SET(year, grouped_year) <=3
ORDER BY
  yourtable.id, yourtable.year DESC;

The result will be 3 samples for each group:

year	id	rate
2008	p01	6.80
2006	p01	8.00
2003	p01	7.40
2004	p02	12.40
2002	p02	12.20
2001	p02	12.50

> Python Basics

> Advanced Python Tutorials

> Python Errors

> Pandas Advanced

> Pandas Count

> Pandas Column

> Pandas Basics

> Pandas DataFrame

> Pandas Row

> User Interface

> Advanced Linux

> Troubleshoot

> Video & Sound

> Linux Commands

> MySQL

> SQL Basics

> Python

> DB apps

> JupyterLab

> Jupyter Tips

> Jupyter Display

> Regex in Text Editor

> Regex Basics

> Regex Match

> Regex Date

> PyCharm Advanced

> Git and PyCharm

> PyCharm Error

> PyCharm Tips

> Linux Mint Applications

> VIrtual Machine

> Miscellaneous

> Java

> Automation

> Windows

> Office

> Cheat Sheet

1. Selecting Top N Rows Per Group

2. Taking a Random Sample Per Group

3. Example & Demo

Resources