Thursday, December 28, 2017

Script - MongoDB stats, execution plans and clearing

Some useful methods to review and clear execution plans in MongoDB.

MongoDB stats, collections stats

db.stats()
db.<collection>.stats()
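Both commands accept an optional scale argument. As a sketch (the connection values below are placeholders), the same stats can be pulled from the command line with the echo | mongo pattern used elsewhere in these posts; it prints the command instead of running it:

```shell
# Placeholder connection values; adjust to your environment
SERVER="localhost"; PORT="27017"; DATABASE="mydb"

# db.stats() takes an optional scale factor; 1024*1024 reports sizes in MB
JS="db.stats(1024*1024)"

# Dry run: print the command instead of executing it (drop the outer echo to run)
echo "echo '$JS' | mongo $SERVER:$PORT/$DATABASE"
```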

Execution plan cache clearing, needed if an incorrect execution plan is being cached and you want the optimizer to re-evaluate it.

db.<collection>.getPlanCache().clear()

Show execution plans by collection or by query

db.<collection>.getPlanCache().listQueryShapes()
db.<collection>.getPlanCache().getPlansByQuery({
        "query" : {
            "$or" : [
                {
                    "key1" : "value1",
                    "key2" : "value2"
                }
            ],
            "key3" : "value3"
        },
        "sort" : {},
        "projection" : {}
    })
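To clear the plan cache of every collection at once, the per-collection call above can be scripted with the same echo | mongo pattern; a sketch with placeholder connection values, printing the command instead of running it:

```shell
# Placeholder connection values; adjust to your environment
SERVER="localhost"; PORT="27017"; DATABASE="mydb"

# Clear the plan cache of every non-system collection in the database
JS="db.getCollectionNames().forEach(function(c) { if (c.indexOf('system.') == -1) db.getCollection(c).getPlanCache().clear(); });"

# Dry run: print the command instead of executing it (drop the outer echo to run)
echo "echo \"$JS\" | mongo $SERVER:$PORT/$DATABASE"
```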

Thursday, November 9, 2017

Script - MongoDB online data refresh method

Here is a useful command to refresh data from one MongoDB database to another.
No extra space is required (no backup storage is needed; the copy happens on the fly through a pipe).

mongodump --host=<host_orig> --username=<user_orig> --password=<password_orig> \
  --authenticationDatabase=admin -d <db_orig> --archive --numParallelCollections=2 \
| mongorestore --host=<host_dest> --username=<user_dest> --password=<password_dest> \
  --authenticationDatabase=admin --nsFrom='<db_orig>.*' --nsTo='<db_dest>.*' \
  --archive --numParallelCollections=2 --numInsertionWorkersPerCollection=2 2>&1

Where

<host_orig> = Original server IP address
<user_orig> = Original username with backup privileges
<password_orig> = Password for the original user
<db_orig> = Original database name

<host_dest> = Destination server IP address
<user_dest> = Destination username with restore privileges
<password_dest> = Password for the destination user
<db_dest> = Destination database name (can be different from the original database name)
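The one-liner can be wrapped in a small script so the placeholders live in variables. A sketch; every value below is made up, and the script prints the assembled pipeline instead of running it:

```shell
#!/bin/sh
# All values are placeholders; substitute your own servers, users and databases
HOST_ORIG="10.0.0.1"; USER_ORIG="backup_user"; PASS_ORIG="secret1"; DB_ORIG="proddb"
HOST_DEST="10.0.0.2"; USER_DEST="restore_user"; PASS_DEST="secret2"; DB_DEST="devdb"

CMD="mongodump --host=$HOST_ORIG --username=$USER_ORIG --password=$PASS_ORIG \
--authenticationDatabase=admin -d $DB_ORIG --archive --numParallelCollections=2 \
| mongorestore --host=$HOST_DEST --username=$USER_DEST --password=$PASS_DEST \
--authenticationDatabase=admin --nsFrom='$DB_ORIG.*' --nsTo='$DB_DEST.*' \
--archive --numParallelCollections=2 --numInsertionWorkersPerCollection=2"

# Dry run: print the assembled pipeline (pipe it to sh, or eval it, to execute)
echo "$CMD"
```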

We can use this data refresh method in multiple situations, for example to rename a database or to create a dev environment from production.

Monday, September 4, 2017

Páginas a la Carta - Lectópolis

On September 8, starting at 14:00, voting opens on the Plataforma Talento at ElPais.com. If you like the idea, don't hesitate to vote for it at the following link: https://t.co/1WTvLi7Kh6 (you must register at elpais.com beforehand).

Thanks!

Thursday, June 22, 2017

Script - PostgreSQL multiple async parallel execution in PL/pgSQL

If you need to split and parallelize a PL/pgSQL procedure to get better performance (a single PL/pgSQL execution is constrained to one CPU thread), you can use dblink asynchronous calls, which can save you a lot of time without resorting to more complicated solutions.

For example, suppose we need to do a massive insert into a table with an INSERT ... SELECT query, and the server has a lot of I/O capacity that a single PL/pgSQL execution does not use. Given a way to partition the work (by indexed date ranges or similar), you could do this:

-- Create the dblink extension
CREATE EXTENSION dblink;
GRANT EXECUTE ON FUNCTION dblink_connect_u(text) TO <user>;
GRANT EXECUTE ON FUNCTION dblink_connect_u(text, text) TO <user>;

select dblink_connect_u('dblink1','dbname=<database> user=<user>');
select dblink_connect_u('dblink2','dbname=<database> user=<user>');
select dblink_connect_u('dblink3','dbname=<database> user=<user>');



-- These are the async calls; they return immediately and the queries begin executing in the background
select * from dblink_send_query('dblink1','INSERT INTO <dest_table> SELECT * FROM <orig_table> WHERE AUD_FEC BETWEEN <FEC_INI1> AND <FEC_FIN1>');
select * from dblink_send_query('dblink2','INSERT INTO <dest_table> SELECT * FROM <orig_table> WHERE AUD_FEC BETWEEN <FEC_INI2> AND <FEC_FIN2>');
select * from dblink_send_query('dblink3','INSERT INTO <dest_table> SELECT * FROM <orig_table> WHERE AUD_FEC BETWEEN <FEC_INI3> AND <FEC_FIN3>');

-- Wait for each query to finish and collect its result
select * from dblink_get_result('dblink1') as t1(a text);
select * from dblink_get_result('dblink2') as t1(a text);
select * from dblink_get_result('dblink3') as t1(a text);

SELECT dblink_disconnect('dblink1');
SELECT dblink_disconnect('dblink2');
SELECT dblink_disconnect('dblink3');


If we have a date interval (fec_ini, fec_fin) and orig_table is indexed by aud_fec, we can use the following inside a PL/pgSQL procedure to partition the date range automatically and parallelize across up to maxParall connections:

SELECT cast((fec_fin - fec_ini) / maxParall AS text) INTO vInterval;

FOR numParall IN 1..maxParall LOOP
    SELECT dblink_connect_u('dblink'||numParall, 'dbname=database user=user') INTO vText;
    -- quote_literal() quotes the date and interval values inside the generated SQL
    SELECT * FROM dblink_send_query('dblink'||numParall,
        'INSERT INTO dest_table SELECT * FROM orig_table'
        ||' WHERE AUD_FEC >= '||quote_literal(fec_ini)||'::timestamp + '||(numParall-1)||' * '||quote_literal(vInterval)||'::interval'
        ||' AND AUD_FEC < '||quote_literal(fec_ini)||'::timestamp + '||numParall||' * '||quote_literal(vInterval)||'::interval') INTO vInt;
END LOOP;
FOR numParall IN 1..maxParall LOOP
    -- dblink_get_result() blocks until the query on that connection finishes
    SELECT * FROM dblink_get_result('dblink'||numParall) AS t1(a text) INTO vText;
    SELECT dblink_disconnect('dblink'||numParall) INTO vText;
END LOOP;
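The same date-range partitioning can also be driven from outside the database: instead of dblink, launch one psql session per partition as a background process and wait for them all. This is a different technique from the dblink approach; a sketch where the table, column and interval values are placeholders, and the generated SQL is printed instead of executed:

```shell
#!/bin/sh
# Placeholder values; INTERVAL is assumed precomputed as (fec_fin - fec_ini) / maxParall
DB="database"; MAXPARALL=3
FEC_INI="2017-01-01"; INTERVAL="30 days"

i=1
while [ "$i" -le "$MAXPARALL" ]; do
  SQL="INSERT INTO dest_table SELECT * FROM orig_table \
WHERE AUD_FEC >= '$FEC_INI'::timestamp + $((i-1)) * '$INTERVAL'::interval \
AND AUD_FEC < '$FEC_INI'::timestamp + $i * '$INTERVAL'::interval"
  # Dry run: print the SQL (replace 'echo "$SQL"' with: psql -d "$DB" -c "$SQL" &)
  echo "$SQL"
  i=$((i+1))
done
# wait   # uncomment when actually launching the background psql sessions
```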

Hope this can help you.

Wednesday, February 22, 2017

Script - MongoDB mongorestore performance


The following technique is useful when we want a high degree of parallelism in the full restore of a MongoDB database.

For MongoDB databases (without sharding) we can use it to improve the performance of full restores.

It relies on vertical scaling for MongoDB databases without sharding (I know Mongo was not designed for vertical scaling, but sometimes you need it...), so if we deploy a server (AWS or similar) with plenty of CPU threads, memory and IOPS capacity, we can get very low recovery times.

The idea is to prevent mongorestore from rebuilding the indexes afterwards: we create the indexes on the empty database before running mongorestore, so the system itself maintains them as documents are inserted.


  • Index definition extraction (from same database or another with similar dbname, collections and indexes)

echo "db.getCollectionNames().forEach(function(c) { if (c.indexOf('system.') == -1)
{ind=db.getCollection(c).getIndexes();
print('db.runCommand({ createIndexes:\"'+c+'\",indexes:'+JSON.stringify(ind)+'});');
}
});"|mongo <server>:<port>/<database> --username=<username> --password=<password> --authenticationDatabase=admin >> indexes.txt


  • Collections drop

echo "db.getCollectionNames().forEach(function(c) { if (c.indexOf('system.') == -1) db[c].drop();});"|mongo <server>:<port>/<database> --username=<username> --password=<password> --authenticationDatabase=admin

  • Index creation (without collections, empty database)

cat indexes.txt|mongo <server>:<port>/<database> --username=<username> --password=<password> --authenticationDatabase=admin

  • MongoRestore command without indexes and with high degree of parallelism

mongorestore -u <username> -p <password> --authenticationDatabase=admin <backup_dir>/<database> -d <database> --numParallelCollections 5 --numInsertionWorkersPerCollection 20 --gzip --noIndexRestore
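The steps above can be chained in one script once indexes.txt has been produced by the extraction step. A sketch; every connection value is a placeholder, and each command is printed instead of run:

```shell
#!/bin/sh
# Placeholder values; adjust before use
SERVER="localhost"; PORT="27017"; DATABASE="mydb"
USERNAME="admin"; PASSWORD="secret"; BACKUP_DIR="/backup"

MONGO="mongo $SERVER:$PORT/$DATABASE --username=$USERNAME --password=$PASSWORD --authenticationDatabase=admin"
DROP_JS="db.getCollectionNames().forEach(function(c) { if (c.indexOf('system.') == -1) db[c].drop();});"

# Dry run: print each step in order (drop the leading 'echo ' prefixes to execute)
echo "echo \"$DROP_JS\" | $MONGO"     # drop collections
echo "cat indexes.txt | $MONGO"       # recreate indexes on the empty database
echo "mongorestore --username=$USERNAME --password=$PASSWORD --authenticationDatabase=admin \
$BACKUP_DIR/$DATABASE -d $DATABASE --numParallelCollections 5 \
--numInsertionWorkersPerCollection 20 --gzip --noIndexRestore"   # restore without index rebuild
```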