
Feat/embeds #12

Draft
retro wants to merge 6 commits into master from feat/embeds


retro (Owner) commented Jun 12, 2022

This PR implements the ability to embed relations under a column in Penkala queries. It is inspired by jOOQ's MULTISET operator and implemented in a similar fashion.

Problem

Penkala is a single-query builder. It generates SQL, passes it to JDBC, and then runs the resulting rows through Penkala's decomposition function, which transforms SQL rows into a tree structure. Some queries are hard to express using only joins (for instance: get the first 5 users and the 5 latest articles for each of those users), which is where embedded relations come into play.

Example

(deftest embedded-relations
  (let [{:keys [film film-actor film-category actor category]} *env*
        film-actors (-> actor
                        (r/with-parent film)
                        (r/inner-join film-actor :film-actor
                                      [:and
                                       [:= [:parent-scope :film-id] :film-actor/film-id]
                                       [:= :actor-id :film-actor/actor-id]] [])
                        (r/select [:first-name :last-name]))
        film-categories (-> category
                            (r/with-parent film)
                            (r/inner-join film-category :film-category
                                          [:and
                                           [:= [:parent-scope :film-id] :film-category/film-id]
                                           [:= :category-id :film-category/category-id]] [])
                            (r/select [:name]))

        films (-> film
                  (r/extend-with-embedded :actors film-actors)
                  (r/extend-with-embedded :categories film-categories)
                  (r/select [:title :actors :categories])
                  (r/order-by [:title])
                  (r/limit 5))
        res (select! *env* films)]
    (fact
     res => [#:film{:title "ACADEMY DINOSAUR",
                    :categories [#:category{:name "Documentary"}],
                    :actors
                    [#:actor{:last-name "GUINESS", :first-name "PENELOPE"}
                     #:actor{:last-name "GABLE", :first-name "CHRISTIAN"}
                     #:actor{:last-name "TRACY", :first-name "LUCILLE"}
                     #:actor{:last-name "PECK", :first-name "SANDRA"}
                     #:actor{:last-name "CAGE", :first-name "JOHNNY"}
                     #:actor{:last-name "TEMPLE", :first-name "MENA"}
                     #:actor{:last-name "NOLTE", :first-name "WARREN"}
                     #:actor{:last-name "KILMER", :first-name "OPRAH"}
                     #:actor{:last-name "DUKAKIS", :first-name "ROCK"}
                     #:actor{:last-name "KEITEL", :first-name "MARY"}]}
             #:film{:title "ACE GOLDFINGER",
                    :categories [#:category{:name "Horror"}],
                    :actors
                    [#:actor{:last-name "FAWCETT", :first-name "BOB"}
                     #:actor{:last-name "ZELLWEGER", :first-name "MINNIE"}
                     #:actor{:last-name "GUINESS", :first-name "SEAN"}
                     #:actor{:last-name "DEPP", :first-name "CHRIS"}]}
             #:film{:title "ADAPTATION HOLES",
                    :categories [#:category{:name "Documentary"}],
                    :actors
                    [#:actor{:last-name "WAHLBERG", :first-name "NICK"}
                     #:actor{:last-name "FAWCETT", :first-name "BOB"}
                     #:actor{:last-name "STREEP", :first-name "CAMERON"}
                     #:actor{:last-name "JOHANSSON", :first-name "RAY"}
                     #:actor{:last-name "DENCH", :first-name "JULIANNE"}]}
             #:film{:title "AFFAIR PREJUDICE",
                    :categories [#:category{:name "Horror"}],
                    :actors
                    [#:actor{:last-name "DEGENERES", :first-name "JODIE"}
                     #:actor{:last-name "DAMON", :first-name "SCARLETT"}
                     #:actor{:last-name "PESCI", :first-name "KENNETH"}
                     #:actor{:last-name "WINSLET", :first-name "FAY"}
                     #:actor{:last-name "KILMER", :first-name "OPRAH"}]}
             #:film{:title "AFRICAN EGG",
                    :categories [#:category{:name "Family"}],
                    :actors
                    [#:actor{:last-name "PHOENIX", :first-name "GARY"}
                     #:actor{:last-name "TAUTOU", :first-name "DUSTIN"}
                     #:actor{:last-name "LEIGH", :first-name "MATTHEW"}
                     #:actor{:last-name "CARREY", :first-name "MATTHEW"}
                     #:actor{:last-name "TEMPLE", :first-name "THORA"}]}])))

(This example implements the same query as the one on the jOOQ blog.)

Generated SQL

SELECT
  (
    SELECT
      json_build_object (
        'heading',
        array_to_json("data-and-types-embedded-43819"."heading"),
        'body',
        array_to_json("data-and-types-embedded-43819"."body")
      )
    FROM
      (
        SELECT
          array [ array [ 'first-name',
          pg_typeof("data-embedded-43819"."first-name") :: text ],
          array [ 'last-name',
          pg_typeof("data-embedded-43819"."last-name") :: text ] ] as heading,
          array_agg(
            array [ to_json("data-embedded-43819"."first-name"),
            to_json("data-embedded-43819"."last-name") ]
          ) AS body
        FROM
          (
            SELECT
              "actor"."first_name" AS "first-name",
              "actor"."last_name" AS "last-name"
            FROM
              "actor" AS "actor"
              INNER JOIN (
                SELECT
                  "film_actor"."actor_id" AS "actor-id",
                  "film_actor"."film_id" AS "film-id",
                  "film_actor"."last_update" AS "last-update"
                FROM
                  "film_actor" AS "film_actor"
              ) "film-actor" ON (
                "film"."film_id" = "film-actor"."film-id"
                AND "actor"."actor_id" = "film-actor"."actor-id"
              )
          ) "data-embedded-43819"
        GROUP BY
          heading
      ) "data-and-types-embedded-43819"
  ) AS "actors",
  (
    SELECT
      json_build_object (
        'heading',
        array_to_json("data-and-types-embedded-43820"."heading"),
        'body',
        array_to_json("data-and-types-embedded-43820"."body")
      )
    FROM
      (
        SELECT
          array [ array [ 'name',
          pg_typeof("data-embedded-43820"."name") :: text ] ] as heading,
          array_agg(array [ to_json("data-embedded-43820"."name") ]) AS body
        FROM
          (
            SELECT
              "category"."name" AS "name"
            FROM
              "category" AS "category"
              INNER JOIN (
                SELECT
                  "film_category"."category_id" AS "category-id",
                  "film_category"."film_id" AS "film-id",
                  "film_category"."last_update" AS "last-update"
                FROM
                  "film_category" AS "film_category"
              ) "film-category" ON (
                "film"."film_id" = "film-category"."film-id"
                AND "category"."category_id" = "film-category"."category-id"
              )
          ) "data-embedded-43820"
        GROUP BY
          heading
      ) "data-and-types-embedded-43820"
  ) AS "categories",
  "film"."title" AS "title"
FROM
  "film" AS "film"
ORDER BY
  "film"."title" FETCH NEXT 5 ROWS ONLY

Embedded relations are converted to JSON, and we also pick up the SQL types of all columns so we can coerce them during decomposition. Coercion usually happens at the JDBC level, but since JSON can represent only a subset of the types a query can return, we need to implement coercion in our codebase (with the com.verybigthings.penkala.decomposition/coerce-embedded-value multimethod).
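
To make the heading/body shape concrete, here is a minimal sketch of how an already-parsed embedded value could be turned back into rows. It is not the code from this PR: `coerce-value` is a hypothetical stand-in for `coerce-embedded-value` (whose real dispatch function and arguments are not shown here), `rows-from-embedded` and `sample` are made-up names, and the sketch assumes the JSON has already been parsed into a map with string keys.

```clojure
;; Hypothetical stand-in for Penkala's coerce-embedded-value multimethod.
;; Assumption: dispatch on the SQL type name reported by pg_typeof.
(defmulti coerce-value
  (fn [sql-type _value] sql-type))

;; Types that JSON already represents faithfully pass through unchanged.
(defmethod coerce-value :default [_ value] value)

;; to_json renders timestamps as ISO 8601 strings, so parse them back.
(defmethod coerce-value "timestamp without time zone" [_ value]
  (when value (java.time.LocalDateTime/parse value)))

(defn rows-from-embedded
  "Zips each body row with the heading ([column-name sql-type] pairs)
  and coerces every value according to its SQL type."
  [{:strs [heading body]}]
  (mapv (fn [row]
          (into {}
                (map (fn [[col-name sql-type] value]
                       [(keyword col-name) (coerce-value sql-type value)])
                     heading
                     row)))
        body))

;; Made-up sample in the same shape as the json_build_object output above.
(def sample
  {"heading" [["actor-id" "integer"]
              ["last-update" "timestamp without time zone"]]
   "body"    [[1 "2006-02-15T04:34:33"]]})

(rows-from-embedded sample)
```

Dispatching on the type name keeps the default path cheap (values pass through untouched) and lets callers add a `defmethod` for any type that JSON flattens to a string or number.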

neektza (Collaborator) commented Jun 13, 2022

What are the performance implications of using this feature? Are there any online resources that discuss this?

retro (Owner, Author) commented Jun 13, 2022

@neektza See https://blog.jooq.org/the-performance-of-various-to-many-nesting-algorithms/ for a discussion. It's still N+1, but at the DB level, so there are no extra network round trips. TL;DR: it will depend on the size of the dataset.
