jueves, 22 de junio de 2017

Script - PostgreSQL multiple async parallel execution in PL/pgSQL

If you need to split and parallelize a pl/pgsql procedure execution to get better performance (due to the fact that a single pl/pgsql executes only in 1 thread cpu, it´s constraint to 1 single thread cpu), you can try the use of dblink async calls, which will save you a lot of time, without need to do more complicated solutions.

By example, we need to do a  massive insert into a table with a insert as select query, the server has a lot of IO capacity that isn´t used by a single pl/pgsql execution. Having a method to do a partition (by index date ranges or similar), you could do this:

-- Create the dblink extension
CREATE EXTENSION dblink;
GRANT EXECUTE ON FUNCTION dblink_connect_u(text) TO <user>;
GRANT EXECUTE ON FUNCTION dblink_connect_u(text, text) TO <user>;

select dblink_connect_u('dblink1','dbname=<database> user=<user>');
select dblink_connect_u('dblink2','dbname=<database> user=<user>');
select dblink_connect_u('dblink3','dbname=<database> user=<user>');



-- These are the aync calls, they will return immediatly and will begin to execute in backend
select * from dblink_send_query('dblink1','INSERT INTO <dest_table> SELECT * FROM <orig_table> WHERE AUD_FEC BETWEEN <FEC_INI1> AND <FEC_FIN1>');
select * from dblink_send_query('dblink2','INSERT INTO <dest_table> SELECT * FROM <orig_table> WHERE AUD_FEC BETWEEN <FEC_INI2> AND <FEC_FIN2>');
select * from dblink_send_query('dblink3','INSERT INTO <dest_table> SELECT * FROM <orig_table> WHERE AUD_FEC BETWEEN <FEC_INI3> AND <FEC_FIN3>');

-- Here we wait for each finished results
select * from dblink_get_result('dblink1') as t1(a text);
select * from dblink_get_result('dblink2') as t1(a text);
select * from dblink_get_result('dblink3') as t1(a text);

SELECT dblink_disconnect('dblink1');
SELECT dblink_disconnect('dblink2');
SELECT dblink_disconnect('dblink3');


If we have a date interval (fec_ini, fec_fin) and orig_table is indexed by aud_fec, we can use this to do an automatic date partition to parallelize up to maxParall variable inside a PL/pgSQL procedure:

SELECT cast((fec_fin-fec_ini)/maxParall as text) INTO vInterval;

FOR numParall IN 1..maxParall LOOP
    select dblink_connect_u('dblink'||numParall,'dbname=database user=user') into vText;
    select * from dblink_send_query('dblink'||numParall,'INSERT INTO dest_table SELECT * FROM orig_table WHERE AUD_FEC >= '||fec_ini||'+'||numParall-1||'*CAST('||chr(39)||vInterval||chr(39)||' AS interval) AND AUD_FEC < '||fec_ini||'+'||numParall||'*CAST('||chr(39)||vInterval||chr(39)||' AS interval)') into vInt;
END LOOP;       
FOR numParall IN 1..maxParall LOOP
    select * from dblink_get_result('dblink'||numParall) as t1(a text) into vText;
    SELECT dblink_disconnect('dblink'||numParall) into vText;
END LOOP;       

Hope this can help you.