Here are some commonly used scenarios that can be met by custom queries:
1. Query based on the user's distinct_id
Directly query the specific actions of a user on a certain day using distinct_id:
SELECT * FROM events WHERE distinct_id = 'wahaha' AND date = '2015-09-10' LIMIT 100
SQL
2. Query the number of users who placed orders between 10 AM and 11 AM every day
Use standard SQL date function EXTRACT to retrieve the hour information.
SELECT date, COUNT(*) FROM events WHERE EXTRACT(HOUR FROM time) IN (10, 11) AND event = 'SubmitOrder' GROUP BY 1
SQL
3. Query the distribution of the number of user orders over a period of time.
First calculate the number of orders for each user, and then use the CASE..WHEN syntax to group them.
SELECT CASE WHEN c < 10 THEN '<10' WHEN c < 20 THEN '<20' WHEN c < 100 THEN '<100' ELSE '>100' END, COUNT(*) FROM ( SELECT user_id, COUNT(*) AS c FROM events WHERE date BETWEEN '2015-09-01' AND '2015-09-20' AND event = 'SubmitOrder' GROUP BY 1 )a GROUP BY 1
SQL
Use LEFT OUTER JOIN to calculate the difference set.
SELECT a.user_id FROM ( SELECT DISTINCT user_id FROM events WHERE date='2015-10-1' AND event = 'BuyGold' ) a LEFT OUTER JOIN ( SELECT DISTINCT user_id FROM events WHERE date='2015-10-1' AND event = 'SaleGold' ) b ON a.user_id = b.user_id WHERE b.user_id IS NULL
SQL
5. Calculate the user's usage duration.
Use analytical functions to estimate the cumulative usage duration based on the intervals between two consecutive events for each user, excluding intervals longer than 10 minutes.
SELECT user_id, SUM( CASE WHEN end_time - begin_time < 600 THEN end_time - begin_time ELSE 0 END ) FROM ( SELECT user_id, EXTRACT(EPOCH FROM time) AS end_time, LAG(EXTRACT(EPOCH FROM time), 1, NULL) OVER (PARTITION BY user_id ORDER BY time asc) AS begin_time FROM events WHERE date='2015-5-1' ) a GROUP BY 1
SQL