Questions about unaggregated variables and the usage of variables in GROUP BY #3879

@nkaralis

Version

6.0.0

Question

Hello.

I would like to ask some questions about the handling of unaggregated variables.

For the examples below, I used Jena v6.0.0 (Fuseki and ARQ) and the following graph:

<http://www.example.org/s2> <http://www.example.org/p1> <http://www.example.org/o3>.
<http://www.example.org/s2> <http://www.example.org/p2> "test literal".
<http://www.example.org/s2> <http://www.example.org/p2> <http://www.example.org/o2>.
<http://www.example.org/s1> <http://www.example.org/p1> <http://www.example.org/o1>.
<http://www.example.org/s3> <http://www.example.org/p1> <http://www.example.org/o5>.
<http://www.example.org/s3> <http://www.example.org/p2> <http://www.example.org/o6>.

Section 11.4 of the specification states: "In a query level which uses aggregates, only expressions consisting of aggregates and constants may be projected, with one exception. When GROUP BY is given with one or more simple expressions consisting of just a variable, those variables may be projected from the level."

Based on the above statement, the queries

Q1. SELECT ?s (COUNT(?o) AS ?c) WHERE { ?s ?p ?o } and

Q2. SELECT (?s AS ?s2) (COUNT(?o) AS ?c) WHERE { ?s ?p ?o }

raise the errors "Non-group key variable in SELECT: ?s" and "Non-group key variable in SELECT: ?s in expression ?s", respectively.

However, the algorithm given in Section 18.2.4.1 does not follow the statement in Section 11.4: it replaces each unaggregated variable V with SAMPLE(V).
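To make the substitution concrete, here is a toy Python model of what Q2 would return if the unaggregated ?s were simply replaced by SAMPLE(?s). This is only a sketch of my reading, not Jena's code and not the normative algorithm. The helper names (`sample`, `count`), the `ex:` prefix, and the choice of the first row as the sampled value are assumptions made for illustration.

```python
# Toy model: Q2 evaluated as if ?s had been rewritten to SAMPLE(?s).
# Terms are plain strings; "ex:" abbreviates http://www.example.org/.
TRIPLES = [
    ("ex:s2", "ex:p1", "ex:o3"),
    ("ex:s2", "ex:p2", '"test literal"'),
    ("ex:s2", "ex:p2", "ex:o2"),
    ("ex:s1", "ex:p1", "ex:o1"),
    ("ex:s3", "ex:p1", "ex:o5"),
    ("ex:s3", "ex:p2", "ex:o6"),
]


def solutions():
    """Solutions of the pattern { ?s ?p ?o } over the graph."""
    return [{"s": s, "p": p, "o": o} for s, p, o in TRIPLES]


def sample(group, var):
    # SAMPLE may return any value from the group; this sketch picks the first.
    return group[0][var]


def count(group, var):
    # COUNT(?var) counts the rows in which the variable is bound.
    return sum(1 for row in group if var in row)


def q2_with_sample_substitution():
    # No GROUP BY, but an aggregate is projected: one implicit group.
    group = solutions()
    return [{"s2": sample(group, "s"), "c": count(group, "o")}]


if __name__ == "__main__":
    print(q2_with_sample_substitution())
```

Under this reading Q2 would yield a single solution with ?c = 6 and an arbitrary subject bound to ?s2, whereas Jena rejects the query in line with Section 11.4.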

First question: Which statement should be followed?

Second question: Based on the algorithm of paragraph 18.2.4.1, should Q2 be rejected?

Third question: Why does the algorithm treat (?X AS VAR) and ?X differently?

Fourth question: For the ORDER BY X and HAVING(X) cases of the algorithm, shouldn't Q3 and Q4 (below) return the same results?

Q3. SELECT ?s (COUNT(DISTINCT ?o) AS ?c) WHERE { ?s ?p ?o } GROUP BY ?s HAVING(ISIRI(?p)) # empty result set

Q4. SELECT ?s (COUNT(DISTINCT ?o) AS ?c) WHERE { ?s ?p ?o } GROUP BY ?s HAVING(ISIRI(SAMPLE(?p))) # explicit SAMPLE, three solutions
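The difference I observe can be reproduced with the toy Python model below. It is a sketch under an assumption of mine, not a description of ARQ internals: I assume that in Q3 the non-key variable ?p is simply unbound when HAVING is evaluated (so ISIRI raises an error and the group is dropped), while in Q4 SAMPLE(?p) is computed over the rows of each group. The helper names and the `(kind, value)` term encoding are hypothetical.

```python
from collections import defaultdict

IRI, LIT = "iri", "literal"

# The example graph, with each term tagged as an IRI or a literal.
TRIPLES = [
    ((IRI, "s2"), (IRI, "p1"), (IRI, "o3")),
    ((IRI, "s2"), (IRI, "p2"), (LIT, "test literal")),
    ((IRI, "s2"), (IRI, "p2"), (IRI, "o2")),
    ((IRI, "s1"), (IRI, "p1"), (IRI, "o1")),
    ((IRI, "s3"), (IRI, "p1"), (IRI, "o5")),
    ((IRI, "s3"), (IRI, "p2"), (IRI, "o6")),
]


def group_by_s():
    """GROUP BY ?s: map each subject to the rows of its group."""
    groups = defaultdict(list)
    for s, p, o in TRIPLES:
        groups[s].append({"s": s, "p": p, "o": o})
    return groups


def is_iri(term):
    # An unbound argument is an evaluation error in this model.
    if term is None:
        raise ValueError("unbound variable")
    return term[0] == IRI


def having(test):
    """Keep groups whose HAVING test is true; errors count as false."""
    results = []
    for key, rows in group_by_s().items():
        try:
            keep = test(key, rows)
        except ValueError:
            keep = False
        if keep:
            distinct_objects = {row["o"] for row in rows}
            results.append((key[1], len(distinct_objects)))
    return sorted(results)


# Q3 (assumed reading): ?p is not a group key, so it is unbound in HAVING.
q3 = having(lambda key, rows: is_iri(None))

# Q4: SAMPLE(?p) picks some ?p from the group; this sketch takes the first row.
q4 = having(lambda key, rows: is_iri(rows[0]["p"]))

if __name__ == "__main__":
    print("Q3:", q3)
    print("Q4:", q4)
```

In this model Q3 returns nothing and Q4 returns one row per subject, which matches what I see in Jena. If the substitution in Section 18.2.4.1 were applied to HAVING, I would expect Q3 to behave like Q4 instead.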


I also have a few questions about the variables appearing in GROUP BY.

The table in Section 18.2.1 of the specification states that v is in scope for an expression GROUP BY (expr AS v).

Fifth question: Should query Q5 (below) raise a syntax error?

Q5. SELECT (COUNT(DISTINCT ?o) AS ?c) WHERE { ?s ?p ?o } GROUP BY (1 AS ?c) # empty result set

Sixth question: Since GROUP BY is processed before the SELECT clause, is ?c always unbound (UNDEF) during grouping in Q6 (below)?

Q6. SELECT (COUNT(DISTINCT ?o) AS ?c) WHERE { ?s ?p ?o } GROUP BY ?c # one solution
