Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.3k views
in Technique[技术] by (71.8m points)

postgresql - Improving a function that UPSERTs based on an input array

I am hoping to get some help improving a method for UPSERTing rows passed in as an array. I'm on Postgres 11.4 with deployment on RDS. I'm got a lot of tables to sort out, but am starting with a simple table for experimentation:

BEGIN;
DROP TABLE IF EXISTS "data"."item" CASCADE;

CREATE TABLE IF NOT EXISTS "data"."item" (
    "id" uuid NOT NULL DEFAULT NULL,
    "marked_for_deletion" boolean NOT NULL DEFAULT false,
    "name_" citext NOT NULL DEFAULT NULL,

CONSTRAINT item_id_pkey
    PRIMARY KEY ("id")
);

CREATE INDEX item_marked_for_deletion_ix_bgin ON "data"."item" USING GIN("marked_for_deletion") WHERE marked_for_deletion = true;

ALTER TABLE "data"."item" OWNER TO "user_change_structure";
COMMIT;

The function, so far, looks like this:

DROP FUNCTION IF EXISTS data.item_insert_array (item[]);

CREATE OR REPLACE FUNCTION data.item_insert_array (data_in item[]) 
  RETURNS int
AS $$
INSERT INTO item (
    id, 
    marked_for_deletion, 
    name_)

SELECT
    d.id, 
    d.marked_for_deletion,
    d.name_

FROM unnest(data_in) d

ON CONFLICT(id) DO UPDATE SET 
    marked_for_deletion = EXCLUDED.marked_for_deletion,
    name_ = EXCLUDED.name_;

SELECT cardinality(data_in); -- array_length() doesn't work. ˉ\_(ツ)_/ˉ

$$ LANGUAGE sql;

ALTER FUNCTION data.item_insert_array(item[]) OWNER TO user_bender;

And a call looks like this:

select * from item_insert_array(

    array[
        ('2f888809-2777-524b-abb7-13df413440f5',true,'Salad fork'),
        ('f2924dda-8e63-264b-be55-2f366d9c3caa',false,'Melon baller'),
        ('d9ecd18d-34fd-5548-90ea-0183a72de849',true,'Fondue fork')
        ]::item[]
    );

I'm trying to develop a system for UPSERT that is injection-safe and that performs well. I'll be replacing a more naive multi-value insert where the INSERT is composed completely on the client side. Meaning, I can't be certain that I'm not introducing defects when concatenating the text. (I asked about this here: Postgres bulk insert/update that's injection-safe. Perhaps a function that takes an array?)

I've gotten this far with the help of various excellent answers:

https://dba.stackexchange.com/questions/224785/pass-array-of-mixed-type-into-stored-function

https://dba.stackexchange.com/questions/131505/use-array-of-composite-type-as-function-parameter-and-access-it

https://dba.stackexchange.com/questions/225176/how-to-pass-an-array-to-a-plpgsql-function-with-variadic-parameter/

I'm not trying for the most complex version of all of this, for instance, I am fine with a single function per table, and fine that every array element has exactly the same format. I'll write code generators to build out everything I need, once I've got the basic pattern sorted out. So, I don't think that I need VARIADIC parameter lists, polymorphic elements, or everything-packaged-as-JSON. (Although I will need to insert JSON from time to time, that's just data.)

I can still use some remedial help with some questions:

  • Is the code above injection-safe, or do I need to rewrite it in PL/pgSQL to use something like FOREACH with an EXECUTE...USING or FORMAT or quote_literal, etc.?

  • I'm setting the input array to item[]. That's fine as I'm passing in all of the fields for this tiny table, but I won't always want to pass in all columns. I thought that I could use anyarray as the type within the function, but I can't figure out how to pass in an array in that scenario. Is there a generic array-of-stuff type? I can create custom types for each of these functions, but I'd rather not. Mainly, because I would only use the type in that one situation.

  • It seems like it would make sense to implement this as a procedure rather than a function so that I can handle the transaction within the function. Am I off base on that?

  • Any stylistic (or otherwise) on what to return? I'm returning a count now, which is at least a little useful.

I'm out over my skis a bit here, so any general comments will be much appreciated. For clarity, what I'm after is a schema for inserting multiple rows safely and with decent performance that, ideally, doesn't involve a custom type per function or COPY.

Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

We've got a lot of different servers pushing up to central tables in Postgres, which adds another wrinkle. What if I add a column to my table:

ALTER TABLE item ADD COLUMN category citext;

Now the table has four columns instead of three.

All of my existing pushes immediately break because now there's a column missing from the inputs. There is a 0% chance that we can update all of the server simultaneously, so that's no an option.

One solution is to create a custom type for each version of the table:

CREATE TYPE item_v1 AS (
    id uuid,
    marked_for_deletion boolean,
    name_ citext);

CREATE TYPE item_v2 AS (
    id uuid,
    marked_for_deletion boolean,
    name_ citext,
    category citext);

And then a function for each type:

CREATE OR REPLACE FUNCTION data.item_insert_array (data_in item_v1[]) 
etc.

CREATE OR REPLACE FUNCTION data.item_insert_array (data_in item_v2[]) 
etc.

I guess you could have a single ginormous method that takes anyarray and uses a CASE to sort out what code to run. I wouldn't do that for a few reasons, but I suppose you could. (I've seen that approach turn gangrenous in more than one language in a real hurry.)

All of that seems like a fair bit of work. Is there a simpler technique I'm missing? I'm imagining that you could submit structured text/XML/JSON, unpack it and work from there. But I would not file that under "simpler."

I'm still working through the design here, obviously. I've written up enough code to test out what I've shown, but want to sort out the details before going back and implementing this on dozens of tables.

Thanks for any help.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...