Friday, October 21, 2016

ETL - Pentaho PDI, parallelizing database input steps

There is plenty of documentation on the web about parallelizing and clustering Pentaho PDI destination (output) steps, but not much information about parallelizing database input steps.

For example, if we want to improve the performance of a Pentaho PDI ETL process at a MongoDB Input step by using parallelism, we could set the "Number of Copies to Start" (right-click on the MongoDB Input step) to 10:



With this configuration alone we would have 10 step copies consuming the same dataset, so we need something to partition it. We can use the "mod" function over a field, matching the result against the step copy number at execution time. We do this in the "Query" tab of the step:


{$where:"this.field%10 == ${Internal.Step.CopyNr}"}

So, at execution time, this JavaScript criterion retrieves a subset of the original dataset, based on a mod partition over the configured parallelism and the step copy number (the Internal.Step.CopyNr variable).

We need a numeric field with high cardinality for this technique to be effective.
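As a sanity check outside PDI, the mod partitioning can be simulated in the shell. The field values and the copy count of 3 below are made-up examples, not taken from the transformation above:

```shell
# Simulate the $where mod partition: each of COPIES hypothetical step
# copies receives only the documents where field % COPIES == copy_nr.
COPIES=3
for COPY_NR in $(seq 0 $((COPIES - 1))); do
  printf 'copy %d gets:' "$COPY_NR"
  for FIELD in 10 11 12 13 14 15; do   # sample high-cardinality values
    if [ $((FIELD % COPIES)) -eq "$COPY_NR" ]; then
      printf ' %d' "$FIELD"
    fi
  done
  echo
done
```

Every value lands in exactly one copy, so the union of the subsets is the original dataset with no overlap or duplication.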

Wednesday, July 27, 2016

Script - Kill inactive non-dedicated connections

This procedure allows us to close inactive, non-dedicated connections. (To kill active or dedicated connections we already have database/OS tools such as db.killOp and kill.)

    1) Locate the process:
  
netstat -lnap|grep 127

tcp        0      0 127.0.0.1:27017  127.0.0.1:55798 ESTABLISHED 2516/mongod


    2) Locate the socket's FD (file descriptor):

lsof -np 2516|grep 127

COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
mongod  2516 mongod   98u  IPv4 458536      0t0    TCP 127.0.0.1:27017->127.0.0.1:55798 (ESTABLISHED)
  
    3) Attach to the process, close the socket and exit:
  
gdb -p 2516
call close(98)
quit
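Step 2 can be scripted: the FD column in the lsof output ends with an access-mode letter (u = read/write) that is not part of the descriptor number and must be stripped before calling close(). A minimal sketch, parsing the sample lsof line from above:

```shell
# Extract the numeric FD from a captured lsof line; the trailing
# access-mode letter (r/w/u) is not part of the descriptor number.
LSOF_LINE='mongod  2516 mongod   98u  IPv4 458536      0t0    TCP 127.0.0.1:27017->127.0.0.1:55798 (ESTABLISHED)'
FD=$(printf '%s\n' "$LSOF_LINE" | awk '{print $4}' | tr -d 'rwu')
echo "$FD"
# With the FD in hand, the gdb step can run non-interactively
# (requires ptrace permission on the target process):
#   gdb -p 2516 -batch -ex "call (int)close($FD)"
```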

Tuesday, March 22, 2016

Script - Check another session's parameters

We can check the parameter values of any session in our Oracle database using the v$ses_optimizer_env view. In the example below we can see that for session 183 a value of 500 MB has been set for the hash_area_size parameter:

select * from v$ses_optimizer_env where sid=183;

SID  ID NAME                           ISDEFAULT  VALUE
--- --- ------------------------------ ---------- --------------
183   2 parallel_execution_enabled     YES        true
183   9 optimizer_features_enable      YES        10.2.0.5
183  11 cpu_count                      YES        24
183  12 active_instance_count          YES        1
183  13 parallel_threads_per_cpu       YES        2
183  14 hash_area_size                 NO         500000000
183  15 bitmap_merge_area_size         YES        1048576
183  16 sort_area_size                 YES        65536
183  17 sort_area_retained_size        YES        0
183  24 pga_aggregate_target           YES        3145728 KB
183  25 _pga_max_size                  NO         629140 KB
183  35 parallel_query_mode            YES        enabled
183  36 parallel_dml_mode              YES        disabled
183  37 parallel_ddl_mode              YES        enabled
183  38 optimizer_mode                 YES        all_rows
183  48 cursor_sharing                 NO         force
183  50 star_transformation_enabled    YES        false
183  66 optimizer_index_cost_adj       YES        100
183  67 optimizer_index_caching        YES        0
183  70 query_rewrite_enabled          YES        true
183  71 query_rewrite_integrity        YES        enforced
183 101 workarea_size_policy           YES        auto
183 105 optimizer_dynamic_sampling     YES        2
183 112 statistics_level               NO         basic
183 114 skip_unusable_indexes          YES        true
183 165 optimizer_secure_view_merging  YES        true
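When a session has many parameters, it's usually the overridden ones (ISDEFAULT = NO) that matter; in SQL that is just adding "and isdefault = 'NO'" to the where clause. From a captured listing, the same filter can be applied in the shell (sample rows taken from the output above):

```shell
# Keep only the rows where ISDEFAULT (column 4) is NO and print the
# parameter name (column 3).
cat <<'EOF' | awk '$4 == "NO" {print $3}'
183  14 hash_area_size                 NO         500000000
183  25 _pga_max_size                  NO         629140 KB
183  35 parallel_query_mode            YES        enabled
EOF
```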