SELECT
statementThe most important (and most commonly used) MSQL statement is the SELECT
statement, which is used for retrieving data from all the components of the Mireo SpaceTime Cluster, processing it in various ways and producing meaningful results.
The following is the supported syntax of the SELECT
statement:
[ WITH cte AS (query), ... ]
SELECT
expr [ [AS] alias ], ...
[ FROM (tableName | subQuery [[AS] alias])
[
([ LEFT | RIGHT ] [ INNER | OUTER ] JOIN (TABLE | subQuery) [ [AS] tbl ]
(ON condition | USING (columnName, ...)) ], ... )
| ( (CROSS | OUTER) APPLY tableFunction(args...) [ [AS] tbl ] )
]
[ WHERE condition ]
[ GROUP BY expr, ... ]
[ HAVING condition ]
[ UNION ALL query ... ]
[ ORDER BY expr [(ASC | DESC)] [(NULLS FIRST | NULLS LAST)], ... ]
[ LIMIT m ]
[ OFFSET n ]
The only required part of the SELECT
statement is the select-list, i.e. a list of value expressions coming immediately after the SELECT
keyword. These expressions are evaluated for each input row (e.g.) and their results become the column values of the output row (this operation is often called projection). Each expression in the list can be given a name (i.e. an alias, specified by the AS name
construct) to make it easier to refer to it later.
Aliased expression names aren't visible inside the WHERE
and JOIN
clauses, because those clauses are evaluated before the query select-list.
A special kind of projection expression is *
(e.g. in SELECT * FROM st.segment
), which signifies that all the columns of the input table should simply be copied to the output row.
Two examples of valid MSQL queries using projection:
SELECT to_char(1611577608::timestamp)::text; -- Note: FROM clause is not mandatory in MSQL queries
--> '2021-01-25 12:26:48.000000'
SELECT t.vid AS vehicle_id, t.* FROM st.trip AS t;
-- Selects all fields from the st.trip table (aliased as `t`) and also duplicates the `vid` field, aliasing it as `vehicle_id`
To return only those rows that satisfy a specified filtering condition, an (optional) WHERE
clause can be added to the SELECT
statement. This filtering is always done before projections and aggregations, so as to process only the rows which will enter into the final result.
A filtering condition must be a boolean-valued expression (i.e. its result must be of a bool type) and can refer to any of the input columns, but it cannot refer to items from the select-list, as they aren't yet evaluated at the moment when the filtering is done.
Only the rows for which the filtering condition evaluates to true will be retained in the result. Rows for which the condition evaluates to NULL
will be dropped just as if it evaluated to false. For example, SELECT COUNT(*) FROM (VALUES 15, NULL::int64) AS t(x) WHERE x > 10
will return 1
because the second row (where x = NULL
) is dropped.
The MSQL query planner can convert certain filtering conditions into index constraints (e.g. for SpaceTime range queries), which can dramatically increase the query performance. Whether this happened or not can be seen from the query execution plan.
An example of a filter which can be converted to SpaceTime range query constraints:
SELECT sum(len) FROM st.trip
WHERE vid = 3267
AND t[0] BETWEEN timestamp '2020-07-01 00:00:00' AND timestamp '2020-07-31 23:59:59'
AND ST_Intersects(point '(x0, y0)', ST_ToGeom(geogpolygon '((16.2598 48.1487, 16.5015 48.1487, 16.5015 48.2750, 16.2598 48.2750, 16.2598 48.1487))'));
An example of a filter which cannot be so converted:
SELECT count(*) FROM st.segment WHERE flags & 4 = 1; -- Bitwise operation which cannot be converted into SpaceTime range query constraint
The results of a query are typically returned in an unspecified order, i.e. the ordering of result rows can be arbitrary and even differ between two runs of the same query. But an exact ordering of the results can be accomplished by specifying the ORDER BY
clause, in which case the result rows will be sorted before being returned from the query.
The ORDER BY
clause specifies a list of columns or expressions by which to sort the output rows. Instead of column names, column indices (starting at 1
) can also be specified, e.g. ORDER BY 1
will order the rows by the value of the first column.
Each column/expression in the ORDER BY
list can have a sorting direction, either ASC
(ascending) or DESC
(descending), with ascending being the default if none is specified. Likewise, each column/expression can also have an individual policy for treating NULL
values: NULLS FIRST
which places them before anything else (the default when sorting is descending) or NULLS LAST
which places them after everything else (default when sorting is ascending).
An example of a query which specifies an ordering:
SELECT *
FROM (VALUES (1, 10), (5, 7), (NULL::int64, 17), (5, 11), (5, NULL::int64)) AS t(x, y)
ORDER BY x, y DESC;
--> +------+------+
--> | x | y |
--> +------+------+
--> | 1 | 10 |
--> | 5 | NULL | -- (y = NULL) before other values
--> | 5 | 11 |
--> | 5 | 7 |
--> | NULL | 17 | -- (x = NULL) after other values
--> +------+------+
A query declaring null-handling policies for ordering (the specified policies are the opposite of the defaults):
SELECT *
FROM (VALUES (1, 10), (5, 7), (NULL::int64, 17), (5, 11), (5, NULL::int64)) AS t(x, y)
ORDER BY x NULLS FIRST, y DESC NULLS LAST;
--> +------+------+
--> | x | y |
--> +------+------+
--> | NULL | 17 | -- (x = NULL) before other values
--> | 1 | 10 |
--> | 5 | 11 |
--> | 5 | 7 |
--> | 5 | NULL | -- (y = NULL) after other values
--> +------+------+
Specifying a column index instead of a column name:
SELECT x % 3, x
FROM (VALUES 5, 4, 6, 10) AS t(x)
ORDER BY 1; -- Ordering by the first column, i.e. x % 3
--> +-------+----+
--> | x % 3 | x |
--> +-------+----+
--> | 0 | 6 |
--> | 1 | 4 |
--> | 1 | 10 |
--> | 2 | 5 |
--> +-------+----+
The maximum number of rows that the query should return can be specified using the LIMIT
clause. The argument of the LIMIT
clause must be an unsigned integer specifying the maximum number of result rows. If there are fewer rows than the amount specified, all rows will be returned.
Likewise, a starting offset can be specified with the OFFSET
clause, which also takes an unsigned integer argument specifying the number of rows to skip before starting to return results. If this is larger than the total number of rows, the result will be empty.
When both LIMIT num
and OFFSET idx
are specified, the query will return the next num
rows starting from the idx
-th row. The rows skipped by the OFFSET
clause will still have to be computed, therefore a large OFFSET
might result in a slow query.
The limiting and offsetting operations are logically applied after result ordering, so they can be used to e.g. select top 10 rows according to a certain criterion.
If an ORDER BY
is not specified before the LIMIT
and OFFSET
clauses, the results of the query will be non-deterministic, because in that case the LIMIT
and OFFSET
clause will be applied on input rows which will come in an unspecified order.
The execution plan resulting from adding a LIMIT
clause to a query involving joins or aggregates might not be what a first-time user will expect.
There is no way the MSQL query planner can know how many rows to fetch from the database in order to produce a maximum of e.g. 100 result rows, if the rows will first be aggregated or joined (and dropped if they don't have a join match). Therefore, queries such as SELECT SUM(len) FROM st.trip LIMIT 100
will actually process all the rows of the st.trip
table, and the query will be slow.
A basic example of a limiting query:
SELECT * FROM (VALUES 1, 2, 3, 4) AS t(x) ORDER BY x LIMIT 2;
--> +---+
--> | x |
--> +---+
--> | 1 |
--> | 2 |
--> +---+
A case of fewer rows than the amount specified in LIMIT
(all rows are returned):
SELECT * FROM (VALUES 1, 2, 3, 4) AS t(x) ORDER BY x LIMIT 10;
--> +---+
--> | x |
--> +---+
--> | 1 |
--> | 2 |
--> | 3 |
--> | 4 |
--> +---+
A query specifying both LIMIT
and OFFSET
:
SELECT * FROM (VALUES 1, 2, 3, 4) AS t(x) ORDER BY x LIMIT 2 OFFSET 1;
--> +---+
--> | x |
--> +---+
--> | 2 |
--> | 3 |
--> +---+
An offset larger than the total amount of rows (the query returns nothing):
SELECT * FROM (VALUES 1, 2, 3, 4) AS t(x) ORDER BY x OFFSET 10;
-- Returns an empty resultset
Data from several tables can be queried and joined together by specifying the JOIN
clause after the FROM
.
Join condition can be specified in two ways, either as JOIN table ON condition
, where condition
is a boolean-valued expression, or as JOIN table USING (colName1 [, colName2, ...])
, where colName1..N
are names of the columns (they must be the same in both tables) to be compared for equality. USING (vid)
is mostly just a shortcut for ON t1.vid = t2.vid
, with the only difference being that the columns specified in USING
won't be duplicated in the result, while in the case of ON
they will be (e.g. in the example above there will be two vid
columns, one from t1
and the other from t2
).
Type | Description |
---|---|
INNER |
Only rows which have a match will end up in the result. The default. |
LEFT (OUTER) |
Rows from the left side that don't have a match will also end up in the result |
RIGHT (OUTER) |
Rows from the right side that don't have a match will also end up in the result |
FULL (OUTER) |
Rows from both sides that don't have a match will also end up in the result |
CROSS |
Returns a Cartesian product of two (or more) tables |
NATURAL JOINs
aren't supported in the MSQL language. You should instead list the join columns explicitly using JOIN ... USING (col1, col2, ...)
syntax.
It is important to note that a single row from either side can be matched with multiple rows from the opposite side, if they all satisfy the given condition.
Two or more tables can also be joined together implicitly, by listing them all in the FROM
clause, and putting the join condition inside WHERE
(e.g. SELECT * FROM st.segment, ndb.vehicle WHERE vid = id
). The MSQL query planner should always be able to convert this into proper joins.
MSQL also supports a similar syntax, the CROSS/OUTER APPLY
, which is logically equivalent to INNER/LEFT JOIN
, only it usually results in a correlated subquery. This syntax is most commonly used when invoking table-valued functions, e.g. SELECT formatted_address FROM st.trip CROSS APPLY map.rev_geocode(x0, y0)
.
A basic example of a query involving JOIN
(i.e. an inner join, which is the default):
WITH
t1(x) AS (VALUES 1, 2, 3, 4),
t2(y) AS (VALUES 1, 2, 5, 6)
SELECT * FROM t1 JOIN t2 ON x = y
ORDER BY 1, 2;
--> +---+---+
--> | x | y |
--> +---+---+
--> | 1 | 1 |
--> | 2 | 2 |
--> +---+---+
Join which produces multiple matches for each row from the left/right (i.e. it isn't an equi-join):
WITH
t1(x) AS (VALUES 1, 2, 3, 4),
t2(y) AS (VALUES 1, 2, 5, 6)
SELECT * FROM t1 JOIN t2 ON x > y
ORDER BY 1, 2;
--> +---+---+
--> | x | y |
--> +---+---+
--> | 2 | 1 |
--> | 3 | 1 |
--> | 3 | 2 |
--> | 4 | 1 |
--> | 4 | 2 |
--> +---+---+
Specifying the join condition via USING
(note there is just one x
column, not two as in the previous example):
WITH
t1(x) AS (VALUES 1, 2, 3, 4),
t2(x) AS (VALUES 1, 2, 5, 6)
SELECT * FROM t1 JOIN t2 USING(x)
ORDER BY x;
--> +---+
--> | x |
--> +---+
--> | 1 |
--> | 2 |
--> +---+
Left (outer) join (note that all rows from the left side are returned, but some results have NULL
in the y
column):
WITH
t1(x) AS (VALUES 1, 2, 3, 4),
t2(y) AS (VALUES 1, 2, 5, 6)
SELECT * FROM t1 LEFT OUTER JOIN t2 ON x = y -- OUTER is optional, LEFT JOIN would work the same
ORDER BY 1, 2;
--> +---+------+
--> | x | y |
--> +---+------+
--> | 1 | 1 |
--> | 2 | 2 |
--> | 3 | NULL |
--> | 4 | NULL |
--> +---+------+
Right (outer) join (note that all rows from the right side are returned, but some results have NULL
in the x
column):
WITH
t1(x) AS (VALUES 1, 2, 3, 4),
t2(y) AS (VALUES 1, 2, 5, 6)
SELECT * FROM t1 RIGHT OUTER JOIN t2 ON x = y -- OUTER is optional
ORDER BY 1, 2;
--> +------+---+
--> | x | y |
--> +------+---+
--> | 1 | 1 |
--> | 2 | 2 |
--> | NULL | 5 |
--> | NULL | 6 |
--> +------+---+
Full (outer) join (all rows are returned from both sides, but some results have NULL
in either x
or y
column):
WITH
t1(x) AS (VALUES 1, 2, 3, 4),
t2(y) AS (VALUES 1, 2, 5, 6)
SELECT * FROM t1 FULL OUTER JOIN t2 ON x = y -- OUTER is optional
ORDER BY 1, 2;
--> +------+------+
--> | x | y |
--> +------+------+
--> | 1 | 1 |
--> | 2 | 2 |
--> | 3 | NULL |
--> | 4 | NULL |
--> | NULL | 5 |
--> | NULL | 6 |
--> +------+------+
Cross join (Cartesian product of rows):
WITH
t1(x) AS (VALUES 1, 2, 3),
t2(y) AS (VALUES 4, 5, 6)
SELECT * FROM t1 CROSS JOIN t2
ORDER BY 1, 2;
--> +---+---+
--> | x | y |
--> +---+---+
--> | 1 | 4 |
--> | 1 | 5 |
--> | 1 | 6 |
--> | 2 | 4 |
--> | 2 | 5 |
--> | 2 | 6 |
--> | 3 | 4 |
--> | 3 | 5 |
--> | 3 | 6 |
--> +---+---+
Rows of the result can be grouped and aggregated together by specifying the GROUP BY
clause. The GROUP BY
clause defines a list of columns or expressions which should be used in grouping. Rows having the same values of the specified columns/expressions will then end up in the same group. Instead of column names, column indices (starting at 1) can also be specified, e.g. GROUP BY 1
will group the rows by the value of the first column (i.e. first expression specified in the select-list).
In a select-list of an aggregating query, the only expressions allowed are either aggregate expressions or references to expressions being used for grouping (e.g. those listed in the GROUP BY
).
If you want to select some value in an aggregating query without grouping by it (and it doesn't matter which precise value it is from within a group), you can use the ANY_VALUE aggregate function for this purpose.
If an aggregate expression appears in a select-list but there is no GROUP BY
clause, all the rows of the result will be aggregated together in a single resulting row.
Grouping without any aggregate expressions specified in the select-list effectively calculates the set of distinct values. This can be useful because MSQL currently doesn't support SELECT DISTINCT
, so grouping without aggregates can be a workaround.
Related to the GROUP BY
clause is the HAVING
clause, which specifies a boolean condition for filtering the resulting aggregate rows. The difference between WHERE
and HAVING
clauses is that the WHERE
clause is evaluated on the input rows (i.e. it determines whether a given row will be aggregated or not) and the HAVING
clause is evaluated on the rows of the aggregate result (i.e. it determines which result rows will be kept and which dropped).
A simple example of an aggregate query:
WITH t(id, val) AS (VALUES (1, 32), (1, 45), (2, 17), (4, 22))
SELECT id, sum(val) AS sum FROM t
GROUP BY id
ORDER BY id;
--> +----+-----+
--> | id | sum |
--> +----+-----+
--> | 1 | 77 |
--> | 2 | 17 |
--> | 4 | 22 |
--> +----+-----+
Grouping by more than one column (columns specified by index):
WITH t(x, y, z) AS (VALUES (1, 2, 3), (1, 1, 5), (2, 4, 1), (1, 2, 5))
SELECT x, min(z) AS min, y FROM t
GROUP BY 1, 3
ORDER BY 1, 3;
--> +---+-----+---+
--> | x | min | y |
--> +---+-----+---+
--> | 1 | 5 | 1 |
--> | 1 | 3 | 2 |
--> | 2 | 1 | 4 |
--> +---+-----+---+
Doing an aggregation over all rows (no grouping):
SELECT avg(x) FROM (VALUES 1, 4, 5, 9, -2, 10) AS t(x);
--> 4.5
Doing an aggregation over an empty rowset:
SELECT avg(x) FROM (VALUES 1, 2, 3) AS t(x) WHERE x > 10;
--> NULL
Specifying a HAVING
clause:
WITH t(id, val) AS (VALUES (1, 17), (2, 31), (1, 41), (3, 12), (4, 8), (2, 55))
SELECT id, sum(val) AS sum FROM t
GROUP BY id
HAVING sum > 20
ORDER BY sum DESC;
--> +----+-----+
--> | id | sum |
--> +----+-----+
--> | 2 | 86 |
--> | 1 | 58 |
--> +----+-----+
Using aggregation to produce a set of distinct values:
WITH t(id, val) AS (VALUES (1, 17), (2, 31), (1, 41), (3, 12), (4, 8), (2, 55))
SELECT id FROM t
GROUP BY 1
ORDER BY 1;
--> +----+
--> | id |
--> +----+
--> | 1 |
--> | 2 |
--> | 3 |
--> | 4 |
--> +----+
Individual results of several queries can be combined into a single resultset using the set operations, of which currently only UNION ALL
is supported.
The UNION ALL
set operation effectively concatenates the result rows of two or more queries into a single resultset, but it doesn't remove duplicate rows.
UNION DISTINCT
(which does remove duplicates) isn't currently supported in the MSQL language.
The syntax for the UNION ALL
operation is:
query1
UNION ALL
query2
[UNION ALL ...]
The queries being combined must all return the same number of columns, and corresponding columns must have compatible data types that can be converted to a common result type.
An example of combining queries with UNION ALL
:
WITH
t1(a, b, c) AS (SELECT 1, 'abc', true),
t2 AS (SELECT 3.14, NULL::text, false)
(SELECT * FROM t1) UNION ALL (SELECT * FROM t2);
--> +------+-------+-------+
--> | a | b | c |
--> +------+-------+-------+
--> | 1.0 | 'abc' | true |
--> | 3.14 | NULL | false |
--> +------+-------+-------+
MSQL offers full support for writing subqueries. Subqueries can be used in place of any scalar or table-valued expression, e.g. in the select-list, WHERE
clause, HAVING
clause (see scalar subqueries), as well as in FROM
and JOIN/APPLY
clauses (see table expressions).
Most of the subqueries are converted into other operations (such as joins) by the MSQL query planner, but a special kind of subqueries called correlated subqueries often cannot be so converted. They are rather executed as a separate query being invoked for each row of the parent query, which might carry some performance penalties.
There also exists a special syntax for joining with the results of a correlated subquery, CROSS/OUTER APPLY
. It causes the right side of the CROSS/OUTER APPLY
to be executed for each row from the left side, and combines the results the same as in JOIN
. The CROSS APPLY
construct is equivalent to INNER JOIN
, as it will drop those rows from the left for which the right side didn't return any results, while the OUTER APPLY
is equivalent to the LEFT JOIN
as it will keep those rows from the left (and set right-side columns to NULL
).
An example of CROSS APPLY
:
WITH places(name) AS (VALUES 'Colloseum', 'Stadio San Siro', 'Pyramid of Giza')
SELECT name AS search_name, city FROM places
CROSS APPLY (SELECT * FROM map.geocode_pts(name || ', Italy') LIMIT 1)
ORDER BY name;
--> +-----------------------+----------+
--> | search_name | city |
--> +-----------------------+----------+
--> | 'Colloseum' | 'Roma' |
--> | 'Stadio San Siro' | 'Milano' |
--> +-----------------------+----------+
An example of OUTER APPLY
(note the first row which didn't match anything):
WITH places(name) AS (VALUES 'Colloseum', 'Stadio San Siro', 'Pyramid of Giza')
SELECT name AS search_name, city FROM places
OUTER APPLY (SELECT * FROM map.geocode_pts(name || ', Italy') LIMIT 1)
ORDER BY name;
--> +-----------------------+----------+
--> | search_name | city |
--> +-----------------------+----------+
--> | 'Colloseum' | 'Roma' |
--> | 'Pyramid of Giza' | NULL |
--> | 'Stadio San Siro' | 'Milano' |
--> +-----------------------+----------+
Constant-value tables can be generated inside queries using the VALUES
keyword. These tables are only logical (i.e. the data is never materialized on disk), but they can be used in queries just as any other table. Such tables can be very useful, e.g. for defining and naming constant categories, generating sequences of numbers, converting numeric codes to textual descriptions etc.
The syntax of the VALUE
construct is:
VALUES
(r1_c1, r1_c2, ...),
(r2_c1, r2_c2, ...),
...
but a shorter version VALUES v1, v2, v3, ...
is also supported when there is only one column per row.
By default, the columns of the VALUES
result have an auto-generated name such as EXPR$0
, but it is always advised to assign your own name to them by specifying an alias (e.g. SELECT x FROM (VALUES 1, 2, 3) AS t(x)
), to be able to refer to the results in the rest of the query.
The VALUES (1, 'abc'), (2, 'def'), ...
is effectively equivalent to
(SELECT 1, 'abc')
UNION ALL
(SELECT 2, 'def')
UNION ALL
...
There is also a special table-valued function CONST.SEQUENCE, which can be used to generate value tables containing a sequence of integers.
An example of a query using VALUES
:
SELECT x, name
FROM (VALUES 1, 2, 3, 2) AS t(x)
JOIN (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS n(x, name) USING(x)
ORDER BY x;
--> +---+---------+
--> | x | name |
--> +---+---------+
--> | 1 | 'one' |
--> | 2 | 'two' |
--> | 2 | 'two' |
--> | 3 | 'three' |
--> +---+---------+
An example of a query using const.sequence
:
SELECT num, num*num AS num2 FROM const.sequence(1, 10);
--> +---+----+
--> | x | x2 |
--> +---+----+
--> | 1 | 1 |
--> | 2 | 4 |
--> | 3 | 9 |
--> | 4 | 16 |
--> | 5 | 25 |
--> | 6 | 36 |
--> | 7 | 49 |
--> | 8 | 64 |
--> | 9 | 81 |
--> +---+----+
MSQL query language also supports Common Table Expressions (CTE), which are auxiliary statements that define temporary tables for use in larger queries. These temporary tables are only logical (i.e. they are never physically materialized) and they exist just for this one query.
Common Table Expressions are specified using the keyword WITH
, via the following syntax:
WITH t1 AS (query1),
t2(col1, col2, ...) AS (query2), -- Note that column names can be specified next to the CTE name
...
SELECT ... -- The rest of query which can use `t1` and `t2` as tables in FROM, JOIN, subqueries etc.
Only regular SELECT
statements are supported inside Common Table Expressions, i.e. DDL statements, DESCRIBE
or EXPLAIN
aren't allowed.
Recursive CTEs aren't supported in the MSQL language.
CTEs are very useful in the context of larger queries, where they allow for reusing a part of the query instead of copying it around, they can help break deeply nested queries into shorter and more shallow ones, and in general they can substantially increase readability and intelligibility of queries.
An example of a query using CTEs:
WITH
data(id, val) AS (VALUES (1, 43), (2, 57), (3, 32), (4, 50), (5, 22)),
totals AS (SELECT avg(val) AS mean, stddev_samp(val) AS sd FROM data)
SELECT
id, val, mean, val - mean AS delta
FROM data, totals
WHERE abs(val - mean) > sd;
--> +----+-----+------+-------+
--> | id | val | mean | delta |
--> +----+-----+------+-------+
--> | 2 | 57 | 40.8 | 16.2 |
--> | 5 | 22 | 40.8 | -18.8 |
--> +----+-----+------+-------+