This page explains how to use the `sample` operator in APL.
The `sample` operator in APL pseudo-randomly selects rows from the input dataset at a rate specified by a parameter. This operator is useful when you want to analyze a subset of data, reduce the dataset size for testing, or quickly explore patterns without processing the entire dataset. The sampling algorithm is not statistically rigorous, but it provides a way to explore and understand a dataset. For statistically rigorous analysis, use `summarize` instead.
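The selection described above can be pictured as Bernoulli sampling: each row is kept independently with probability equal to the rate. The following is a minimal Python sketch under that assumption. The function name `bernoulli_sample` is hypothetical, and this page does not say which algorithm APL actually uses. The range check mirrors the documented constraint on `ProportionOfRows` (greater than 0 and less than 1).

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def bernoulli_sample(rows: Sequence[T], proportion: float, seed: Optional[int] = None) -> List[T]:
    """Keep each row independently with probability `proportion`.

    Assumption: plain Bernoulli sampling, used here only to illustrate the idea;
    it is not necessarily how APL implements `sample`.
    """
    # Same constraint as ProportionOfRows: strictly between 0 and 1.
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must be greater than 0 and less than 1")
    rng = random.Random(seed)  # seeded pseudo-random generator for reproducible runs
    # One independent draw per row; input order is preserved.
    return [row for row in rows if rng.random() < proportion]

rows = list(range(100_000))
kept = bernoulli_sample(rows, 0.1, seed=42)
print(len(kept))  # roughly 10,000; the exact count depends on the seed
```

Because each row is an independent draw, the number of rows kept varies around `proportion × N` rather than matching it exactly. Fixing the seed only makes the sketch reproducible.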
The `sample` operator is most useful with large datasets, where processing every row is resource-intensive or unnecessary. It suits scenarios such as log analysis, performance monitoring, or sampling for data quality checks.
Splunk SPL users
In Splunk SPL, the `sample` command works similarly and returns a random subset of rows. However, the APL `sample` operator uses a simpler syntax, without additional arguments for biasing the randomness.

ANSI SQL users
ANSI SQL has no direct equivalent of the `sample` operator, but you can achieve similar results with the `TABLESAMPLE` clause. In APL, `sample` operates independently and is more flexible, as it's not tied to a table scan.

`ProportionOfRows`: A float greater than 0 and less than 1 that specifies the proportion of rows to return from the dataset. Rows are selected randomly.

_time | req_duration_ms | id | status | uri | method | geo.city | geo.country |
---|---|---|---|---|---|---|---|
2023-10-16 12:45:00 | 234 | user1 | 200 | /index | GET | New York | US |
2023-10-16 12:47:00 | 120 | user2 | 404 | /login | POST | Paris | FR |
2023-10-16 12:48:00 | 543 | user3 | 500 | /checkout | POST | Tokyo | JP |
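The caveat that sampling is not statistically rigorous can be illustrated with synthetic data. The Python sketch below generates hypothetical rows that reuse the `status` column name from the table above. It then compares the exact share of 5xx responses, computed over every row, with an estimate computed from a 5% sample. All data and helper names (`make_synthetic_logs`, `sample_rows`, `server_error_rate`) are hypothetical, and the sampling step is an assumed Bernoulli scheme, not APL's implementation.

```python
import random
from typing import Dict, List, Optional

def make_synthetic_logs(n: int, seed: int = 0) -> List[Dict[str, int]]:
    """Generate n hypothetical log rows with a `status` column (random, illustrative values)."""
    rng = random.Random(seed)
    statuses = [200, 200, 200, 200, 404, 500]  # hypothetical mix: about 1 in 6 rows is a 500
    return [{"status": rng.choice(statuses)} for _ in range(n)]

def sample_rows(rows: List[Dict[str, int]], proportion: float, seed: Optional[int] = None) -> List[Dict[str, int]]:
    """Assumed Bernoulli sampling: keep each row with probability `proportion`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < proportion]

def server_error_rate(rows: List[Dict[str, int]]) -> float:
    """Fraction of rows whose status code is in the 5xx range."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if 500 <= row["status"] < 600) / len(rows)

logs = make_synthetic_logs(200_000)
exact = server_error_rate(logs)  # uses every row
estimate = server_error_rate(sample_rows(logs, 0.05, seed=1))  # uses about 5% of rows
print(f"exact={exact:.4f} estimate={estimate:.4f}")
```

The estimate usually lands close to the exact value but generally not on it, and it shifts with the seed. When a precise figure matters, aggregate over the full dataset rather than a sample.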